
BOOK II DEFINITIONS AND CERTAIN FORMULAS

Time Value of Money


1. Present Value
2. Future Value
3. Discounting
4. Compounding
5. Cost of capital
6. Opportunity cost
7. Real risk free rate
8. Discount rate
9. Annuity
10. Perpetuity
Ch. 2 Probabilities, Miller, Mathematics and Statistics for Financial Risk Management
1. Random variable - an uncertain quantity or number.
2. Outcome - the observed value of a random variable from an experiment.
3. Event - a single outcome or a set of outcomes.
4. Mutually exclusive events - events that cannot happen at the same time.
5. Exhaustive events - events that include all possible outcomes.
6. Probability distribution - describes the probabilities of all possible outcomes for a random variable. The probabilities must sum to 1. Assigning probabilities to the possible outcomes of discrete and continuous random variables gives discrete probability distributions and continuous probability distributions, respectively.
7. Discrete random variable - the number of possible outcomes can be counted, and for each possible outcome,
there is a measurable and positive probability.
8. Probability (mass) function - denoted p(x), specifies the probability that a random variable is equal to a specific value: p(x) is the probability that the random variable X takes on the value x, i.e., p(x) = P(X = x).
9. Continuous random variable - one for which the number of possible outcomes is infinite, even if lower and
upper bounds exist.
10. Discrete distribution - p(x) = 0 when x cannot occur, or p(x) > 0 if it can. p(x) is read as the probability that the random variable X equals x.
11. Continuous distribution - p(x) = 0 even though x can occur. We can only consider the probability that X lies within certain bounds, such as P(x1 ≤ X ≤ x2), where x1 and x2 are actual numbers. P(X = 2) = 0, because it is a single point for a continuous random variable that can take an infinite range of possible values.


12. Probability density function (pdf) - denoted f(x), can be used to generate the probability that outcomes of a continuous distribution lie within a particular range. The pdf is used to calculate the probability of an outcome that lies between two values.
13. Cumulative distribution function (cdf) - defines the probability that a random variable, X, takes on a value equal to or less than a specific value, x. It represents the sum, or cumulative value, of the probabilities for the outcomes up to and including the specified outcome.
14. Inverse cumulative distribution function - used to find the value that corresponds to a specific probability, e.g., in value at risk (VaR).
15. Discrete uniform random variable - the probabilities for all possible outcomes of a discrete random variable are equal.
16. Unconditional probability - (marginal probability) the probability of an event regardless of the past or future
occurrence of other events
17. Conditional probability - where the occurrence of one event affects the probability of the occurrence of another event, e.g., P(recession | monetary authority increases interest rates).
a. P(A|B) = P(AB) / P(B)
18. Joint probability of two events - the probability that they will both occur together.
19. Multiplication rule of Probability - multiplication of conditional and unconditional probability.
a. P(AB) = P(A|B) P(B)
20. Independent events - refers to events for which the occurrence of one has no influence on the occurrence of
the others.
a. P(A|B) = P(A) or equivalently P(B|A) = P(B)
21. Dependent events - if the independence condition is not satisfied, the events are dependent events. (the
occurrence of one is dependent on the occurrence of the other)
22. Addition rule for probabilities - used to determine the probability that at least one of two events will occur. E.g., given two events A and B, the addition rule gives the probability that either A or B will occur: P(A or B) = P(A) + P(B) − P(AB).

23. Mutually exclusive events - the joint probability of (AB) is zero, the probability that either A or B will occur is
simply the sum of the unconditional probabilities.
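A minimal sketch of the rules above (multiplication rule, conditional probability, and addition rule); the probabilities used are made-up assumptions for illustration only.

```python
# Illustrative probabilities (assumed, not from the notes).
p_rate_hike = 0.40                 # unconditional P(B)
p_recession_given_hike = 0.30      # conditional P(A|B)
p_recession = 0.25                 # unconditional P(A)

# Multiplication rule: P(AB) = P(A|B) * P(B)
p_joint = p_recession_given_hike * p_rate_hike

# Conditional probability recovered from the joint: P(A|B) = P(AB) / P(B)
p_conditional = p_joint / p_rate_hike

# Addition rule: P(A or B) = P(A) + P(B) - P(AB)
p_either = p_recession + p_rate_hike - p_joint

print(f"P(AB) = {p_joint:.2f}, P(A|B) = {p_conditional:.2f}, P(A or B) = {p_either:.2f}")
```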
Ch 3 Basic Statistics, Miller, Math and Stats for Risk Management
1. Statistics- word used to refer to data and methods used to analyze data
2. Descriptive Statistics - summary of important characteristics of large data sets
3. Inferential Statistics - pertain to procedures used to make forecasts, estimates, or judgments about a large set of data on the basis of the statistical characteristics of a smaller set (a sample).
4. Population - set of all possible members of a stated group
5. Measures of Central Tendency - center, average of a data set. Can be used to represent the typical or
expected value of a data set.
6. Population mean - describes the expected value/average of the entire data set (population)
7. Sample mean - the mean of a sample of n observations drawn from a population of N > n members. Used to make inferences about the population mean when it is infeasible or not possible to observe all members of the population.
8. Arithmetic mean - the only measure of central tendency for which the sum of the deviations from the mean is zero.
9. Median - the midpoint of a data set when ordered in ascending or descending order; half of the observations lie above the median and half below.
10. Mode - most frequent value observed in the dataset.
11. Geometric mean - used when calculating investment returns over multiple periods or when measuring compound growth rates; the nth root of the product of all n observed values.
12. Expected Value - weighted average of all possible outcomes of a random variable.
13. Properties of Expectation - see handwritten lecture notes
14. Variance - a measure of the expected spread or dispersion of a random variable about its mean (in squared units). Defined for a single variable.
15. Sample Variance - the variance of a sample, computed with n − 1 degrees of freedom so that it is an unbiased estimator of the population variance.
16. Standard Deviation - the square root of the variance; the spread or dispersion about the mean, expressed in the same units as the mean/expected value.
17. Properties of Variance - see handwritten notes
18. Covariance - the expected value of the product of the deviations of two random variables from their respective expected values. Describes the co-movement of two variables. It captures a linear relationship between two variables but does not mean much until reduced to a correlation: it gives the direction of co-movement but not the size (level) of the relationship.
19. Properties of Covariance - see handwritten notes
20. Correlation - covariance is easier to interpret when scaled by the standard deviations of the two variables. Measures the strength of the linear relationship between two random variables. Ranges between −1 ≤ ρ ≤ 1.

21. Properties of Correlation - see handwritten notes


22. Scatter plot collection of points which represent the values of two variables. (X/Y) pair.
23. Interpretation of Correlation Coefficients see handwritten notes
24. Moments - describe the shape of a probability distribution; each moment is an expected value of the variable (or its deviation from the mean) raised to an appropriate power.
25. kth raw moment - the expected value of X^k: E(X^k) = Σᵢ pᵢ xᵢ^k
26. Central moments - kth moments measured around the mean: E[(X − μ)^k] = Σᵢ pᵢ (xᵢ − μ)^k
27. 1st raw moment - mean
28. 2nd central moment - variance
29. 3rd central moment - skewness
30. 4th central moment - kurtosis
31. Interpret skewness and positive/negative skewness - see handwritten notes
32. Interpret kurtosis (leptokurtic, platykurtic, mesokurtic) - see handwritten notes
33. Coskewness - see handwritten notes
34. Cokurtosis - see handwritten notes
35. Unbiased estimator - one for which the expected value of the estimator is equal to the parameter you are trying to estimate. An unbiased estimator is also efficient if the variance of its sampling distribution is smaller than that of all other unbiased estimators of the parameter. E.g., the sample mean is an unbiased estimator of the population mean.
36. Biased estimator - the expected value of the estimator is not equal to the parameter you are trying to estimate, and may be far off.
37. Consistent estimator - one for which the accuracy of the parameter estimate increases as the sample size
increases.

38. A point estimator is a linear estimator when it can be expressed as a linear function of the sample data. If the estimator is the best available (has minimum variance), exhibits linearity, and is unbiased, it is said to be BLUE (best linear unbiased estimator).
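A small sketch of several Ch. 3 definitions (arithmetic mean, sample variance with n − 1 degrees of freedom, standard deviation, sample covariance, and correlation) computed on two made-up return series; the data are assumptions for illustration only.

```python
# Two small, made-up samples (assumed data).
x = [0.02, -0.01, 0.03, 0.00, 0.01]
y = [0.01, -0.02, 0.02, 0.00, 0.02]
n = len(x)

mean_x = sum(x) / n                                   # arithmetic mean (item 8)
mean_y = sum(y) / n

# Sample variance uses n - 1 degrees of freedom (unbiased estimator, item 15).
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
std_x, std_y = var_x ** 0.5, var_y ** 0.5             # standard deviations (item 16)

# Sample covariance (item 18) and correlation (item 20).
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
corr_xy = cov_xy / (std_x * std_y)

print(round(mean_x, 4), round(var_x, 6), round(cov_xy, 6), round(corr_xy, 3))
```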

Ch 4 Distributions, Miller, Math and Stats for Risk Management


1. Parametric Distribution - can be described using a mathematical function.
2. Nonparametric Distribution - like a historical distribution, cannot be described with a mathematical function. It makes no assumptions about the data, so the distribution fits the data perfectly, but it is difficult to draw conclusions or infer much about the random variable.
3. Continuous Uniform Distribution - all outcomes between the bounds a and b are equally likely; everywhere else the probability is zero.
4. Cdf of a continuous Uniform Distribution see handwritten notes
5. The Bernoulli Distribution - a trial that determines the success or failure of an experiment; a two-point distribution with probability p of success and q = 1 − p of failure.
6. Binomial Random Variable - a random variable defined by an experiment of n Bernoulli trials; the number of successes in n Bernoulli trials.
7. Binomial distribution see above.
8. Expected Value of a Binomial Random Variable see hand notes but simply expected number of successes.
9. Poisson Distribution - the pdf of a discrete distribution for a random variable X that counts the number of successes per unit or per experiment. The parameter lambda (λ) is the average (expected) number of successes per unit, e.g., the number of calls per hour arriving at a switchboard, the number of defects per batch in a production process, or the number of patients affected out of 200 procedures when the average affected rate is 2 per procedure.
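A sketch of the Poisson probability mass function described above, P(X = x) = e^(−λ) λ^x / x!, computed directly; λ = 2 is an assumed value mirroring the example.

```python
import math

lam = 2.0  # assumed average number of successes per unit

def poisson_pmf(x: int, lam: float) -> float:
    # P(X = x) = exp(-lambda) * lambda**x / x!
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Probability of observing 0, 1, ..., 5 successes per unit.
for x in range(6):
    print(x, round(poisson_pmf(x, lam), 4))
```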
10. Pdf of a normal distribution
11. Cdf of a normal distribution
12. Confidence interval - a range of values around the expected outcome within which we expect the actual
outcome to be some specified percentage of the time. 95% confidence interval is a range that we expect the
random variable to be in 95% of the time.
13. Confidence intervals for Normal Distribution - interval is based on the expected value (sometimes called a
point estimate) of a random variable and on its variability, which we measure with a standard deviation.
a. For any normally distributed random variable, 68% are within one standard deviation.
b. Approx. 95% are within 2 standard deviations.

c. 90% confidence interval for X: x̄ − 1.65σ to x̄ + 1.65σ
d. 95% confidence interval for X: x̄ − 1.96σ to x̄ + 1.96σ
e. 99% confidence interval for X: x̄ − 2.58σ to x̄ + 2.58σ
f. The numbers 1.65, 1.96, and 2.58 are z-scores of the standard normal distribution.
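A minimal sketch of item 13: building confidence intervals around a point estimate of a normally distributed variable using the z-scores listed above; the point estimate and standard deviation are assumed numbers.

```python
# Assumed point estimate and standard deviation.
mu_hat, sigma = 0.05, 0.02

for conf, z in [(0.90, 1.65), (0.95, 1.96), (0.99, 2.58)]:
    lower, upper = mu_hat - z * sigma, mu_hat + z * sigma
    print(f"{conf:.0%} CI: [{lower:.4f}, {upper:.4f}]")
```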


14. Standard Normal Distribution - normal distribution that has been standardized so mean=zero and standard
deviation = 1 .
15. Standardization - the process of converting an observed value of a random variable to its z-value: z = (observed value − population mean) / standard deviation = (x − μ) / σ
16. Z-Values - standardized random variables; a z-value gives the number of standard deviations an observation lies from the mean. A standard normal random variable has mean 0 and standard deviation 1.
17. Lognormal Distribution, pdf
18. Lognormal Distribution, cdf
19. Normal vs. Lognormal Distribution
20. Central Limit Theorem - states that for simple random samples of size n from a population with mean μ and finite variance σ², the sampling distribution of the sample mean x̄ approaches a normal probability distribution with mean μ and variance σ²/n as the sample size becomes large. This is because, when the sample size is large, the sum of independent and identically distributed random variables is approximately normally distributed.
21. Properties of Central Limit Theorem
a.

22. Students t Distribution


23. Properties of Students t
24. Pdf of Students t
25. Cdf of Students t
26. Normal vs. t Distribution
27. Chi-Squared Distribution
28. Pdf of Chi-Squared Distribution
29. F Distribution
30. Properties of F-Distribution
31. Mixture Distributions
Ch. 6 Bayesian Analysis, Miller, Mathematics and Statistics for Financial Risk Management
1. Bayes' Theorem
2. Unconditional Probability of events
3. Joint probability of events
4. Bayesian Approach
5. Frequentist Approach
6. Bayes vs. Frequentist
7. Bayesian Theorem with Multiple States

Ch. 7 Hypothesis Testing and Confidence Intervals, Miller, Mathematics and Statistics for Financial Risk Management
1. Simple Random Sampling - method of selecting a sample in such a way that each item or person in the population
being studied has the same likelihood of being included in the sample.
2. Sampling Error - the difference between a sample statistic (mean, variance, std dev of the sample) and its corresponding population parameter (true mean, variance, std dev of the population).
a. Sampling error of the mean = sample mean − population mean = x̄ − μ
3. Sampling Distribution of the sample statistic - the sample statistic is itself a random variable and therefore has a probability distribution. This is the probability distribution of all possible sample statistics computed from a set of equal-size samples randomly drawn from the same population.
4. Sampling Distribution of the mean repeated process of sampling the mean from n observations from the population.
5. Mean of the sample average - e.g., for two random variables X₁ and X₂ with common mean μ_x: E(X₁ + X₂) = μ_x + μ_x = 2μ_x, so E(X̄) = E[(X₁ + X₂)/2] = μ_x.
6. Variance of the Sample Average - recall Var(aX₁ + cX₂) = a²Var(X₁) + c²Var(X₂); with a = c = 0.5 and Var(X₁ + X₂) = 2σ_x²,
a. Var(X̄) = Var[(X₁ + X₂)/2] = 2σ_x²/4 = σ_x²/2; in general terms, Var(X̄) = σ²/n.
7. Standard Error = std dev of the sample average is known as standard error.

8. Population mean all observed values in population are summed and divided by number of observations in
population.
9. Sample mean = the sum of all values in a sample of the population divided by the number of observations in the sample.
10. Dispersion defined as the variability around the central tendency. Theme in finance is tradeoff between reward
and variability. Central tendency is measure of reward and dispersion is a measure of risk.
11. Population variance - average of the squared deviations from the mean. Population variance uses all members of a
population.

12. Standard deviation of the population - σ = √[ Σᵢ₌₁ᴺ (Xᵢ − μ)² / N ]
13. Sample variance - s² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1). Using n in the denominator would systematically underestimate the population variance, especially for small sample sizes (a biased estimator); using n − 1 improves the statistical properties of s² as an estimator of σ².
14. Sample standard deviation square root of the sample variance.


15. Standard error of the sample mean - the standard deviation of the distribution of sample means: s_X̄ = se(X̄) = s / √n

16. Covariance - between two random variables, a statistical measure of the degree to which the two variables move together; it captures the linear relationship between one variable and another. Positive covariance: they tend to move in the same direction; negative covariance: they tend to move in opposite directions.
17. Population Covariance - cov_xy = Σᵢ₌₁ᴺ (Xᵢ − μ_x)(Yᵢ − μ_y) / N
18. Sample Covariance - s.cov_xy = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / (n − 1)

19. Confidence Intervals - estimates that result in a range of values within which the actual value of a parameter will lie, given the probability 1 − α.
20. Level of significance - α
21. Degree of confidence - α is called the level of significance for the confidence interval, and the probability 1 − α is the degree of confidence. E.g., an estimate that the population mean of a random variable will range from 15 to 25 with a 95% degree of confidence, or at the 5% level of significance.
22. Construction of a confidence interval - usually constructed by adding or subtracting an appropriate value from the point estimate. In general, confidence intervals take the following form:
a. point estimate ± (reliability factor × standard error)

23. Point estimate - the value of a sample statistic of the population parameter.
24. Reliability factor - a number that depends on the sampling distribution of the point estimate and the probability that the point estimate falls in the confidence interval (1 − α).
25. Standard error of the point estimate - the standard deviation of the point estimate.
26. Confidence interval for the population mean (normal distribution) - x̄ ± z_{α/2} × σ/√n

27. Commonly used normal distribution reliability factors - z_{α/2}, the standard normal value for which the probability in the right-hand tail of the distribution is α/2; in other words, the z-value that leaves α/2 probability in the upper tail.
a. z_{α/2} = 1.65 for 90% confidence intervals (significance level 10%, 5% in each tail)
b. z_{α/2} = 1.96 for 95% confidence intervals (significance level 5%, 2.5% in each tail)
c. z_{α/2} = 2.58 for 99% confidence intervals (significance level 1%, 0.5% in each tail)

28. Probabilistic Interpretation - after repeatedly sampling the population and constructing a confidence interval for each sample mean, 99% of the resulting confidence intervals will, in the long run, include the population mean.
29. Practical Interpretation - we are 99% confident that the population mean lies between x₁ and x₂ for samples from this population.


30. Population is normal with unknown variance (t-distribution) - when the population is normal but the variance is unknown, we use the t-distribution to construct a confidence interval: x̄ ± t_{α/2} × s/√n, where s/√n is the standard error of the sample mean and s is the sample standard deviation.


31. Reliability factors for t-distribution - the t-reliability factor (t-statistic or critical t-value) corresponds to a t-distributed random variable with n − 1 degrees of freedom, where n is the sample size. The area under the tail of the t-distribution to the right of t_{α/2} is α/2.
32. Reliability factors depend on the sample size and degrees of freedom (n − 1), so we cannot rely on a standard set of factors. Confidence intervals built with the t-statistic will be more conservative (wider) than those built with z-reliability factors.
33. Selecting the appropriate test statistic
a. The size of the sample influences whether or not we can construct the appropriate confidence interval for the sample mean.
b. Distribution non-normal but population variance known - the z-statistic can be used as long as the sample size is large (n > 30). We can use this because the central limit theorem assures us that the distribution of the sample mean is approximately normal when n is large.
c. Distribution normal but population variance unknown - the t-statistic can be used as long as the sample size is large (n > 30). The z-statistic may also be used, but the t-statistic is more conservative.
34. Hypothesis testing - the statistical assessment of a statement or idea regarding a population.
35. Hypothesis - a statement about the value of a population parameter developed for the purpose of testing a theory or belief. Hypotheses are stated in terms of the population parameter to be tested, such as the population mean, μ.
Hypothesis testing procedures - based on sample statistics and probability theory, used to determine whether a hypothesis is a reasonable statement that should not be rejected, or an unreasonable statement that should be rejected. The steps are:
a. State the hypothesis
b. Select the appropriate test statistic
c. Specify the level of significance
d. State the decision rule regarding the hypothesis
e. Collect the sample and calculate the sample statistic
f. Make a decision regarding the hypothesis
g. Make a decision based on the results of the test

36. Null hypothesis - designated H₀, the hypothesis the researcher wants to reject. It is the hypothesis that is actually tested and is the basis for the selection of the test statistic. The null is generally a simple statement about a population parameter.
a. Typical statements: H₀: μ = μ₀, H₀: μ ≤ μ₀, or H₀: μ ≥ μ₀, where μ is the population mean and μ₀ is the hypothesized value of the population mean.
37. Alternative hypothesis - designated H_A, what is concluded if there is sufficient evidence to reject the null hypothesis. It is usually the alternative hypothesis the researcher is really trying to assess. Why? Since you cannot really prove anything with statistics, when the null is rejected the implication is that the alternative is valid.
38. Choice of null and alternative
a. The most common null will be an "equal to" hypothesis, and the alternative is often the hoped-for hypothesis. When the null is that a coefficient is equal to zero, we hope to reject it and show the significance of the relationship.
39. Mutually exclusive alternative - when the null is "less than or equal to", the mutually exclusive alternative is framed as "greater than".
40. Hypothesis testing parameters - include two statistics:
a. The test statistic calculated from the sample data
b. The critical value of the test statistic
41. Test statistic - calculated by comparing the point estimate of the population parameter with the hypothesized value of the parameter (i.e., the value specified in the null hypothesis).
42. Test statistic = (sample statistic − hypothesized value) / standard error of the sample statistic

43. Standard error of the sample statistic - the adjusted standard deviation of the sample.
44. Critical value of the test statistic
45. Alternative hypothesis - can be one-sided or two-sided. Whether the test is one-sided or two-sided depends on the proposition being tested.
46. One-tailed test - used when the question is whether a parameter is simply greater than or less than a value. If you do not know on which side it lies, a two-tailed test is preferred.

a. Upper tail: H₀: μ ≤ μ₀ versus H_A: μ > μ₀
b. Upper tail decision: if the calculated test statistic is greater than 1.645 at the 5% significance level, we conclude that the sample statistic is sufficiently greater than the hypothesized value, and we reject the null hypothesis.
i. If the calculated test statistic is less than 1.645, we conclude that the sample statistic is not sufficiently different from the hypothesized value, and we fail to reject the null.
c. Lower tail: H₀: μ ≥ μ₀ versus H_A: μ < μ₀
d. The appropriate set of hypotheses depends on whether we believe the population mean, μ, to be greater than (upper tail) or less than (lower tail) the hypothesized value, μ₀.

47. Two-tailed test - allows for deviations on both sides of the hypothesized value (in the general case, zero).
a. A two-tailed test can be structured as H₀: μ = μ₀ versus H_A: μ ≠ μ₀.
b. Since the alternative allows for values above and below the hypothesized parameter, a two-tailed test uses two critical values (or rejection points).
48. Decision/rejection rule for a two-tailed z-test:
a. Reject H₀ if: test statistic > upper critical value, or test statistic < lower critical value.

49. Type I Error - the rejection of the null hypothesis when it is actually true.
50. Type II Error - the failure to reject the null hypothesis when it is actually false. (Difficult to assess in practice; it depends on the sample size and the critical value chosen. The alternative may fail to appear statistically significant because of the sample size and critical value, or because of collinearity when more than two variables are involved.)
51. Power of a test - the probability of correctly rejecting the null hypothesis when it is actually false = 1 − P(Type II error).
52. Type I and Type II errors in hypothesis testing:
a. If the true condition is that H₀ is true: not rejecting H₀ is a correct decision; rejecting H₀ is an incorrect decision (Type I error). The significance level, α = P(Type I error).
b. If the true condition is that H₀ is false: not rejecting H₀ is an incorrect decision (Type II error); rejecting H₀ is a correct decision. The power of the test = 1 − P(Type II error).

53. Confidence intervals and hypothesis testing
a. A confidence interval is a range of values within which the researcher believes the true population parameter may lie.
b. [sample statistic − (critical value)(standard error)] ≤ population parameter ≤ [sample statistic + (critical value)(standard error)]
c. Restated in terms of the test statistic: −critical value ≤ test statistic ≤ +critical value, where
i. test statistic = (sample statistic − population parameter under H₀) / standard error
d. This is the range over which we fail to reject the null for a two-tailed hypothesis test at a given level of significance.
54. Statistical significance does not imply economic significance. When something is statistically significant based on the data, the economic benefits may be diminished by the costs of executing and maintaining the strategy (transaction costs, taxes, downside risk from short sales), which can reduce returns and make the strategy not economically viable in the long term, even if its return is statistically significantly above zero.
55. Economic significance - see directly above.
56. P-value - the probability of obtaining a test statistic that would lead to a rejection of the null hypothesis, assuming the null hypothesis is true. It is the smallest level of significance for which the null hypothesis can be rejected.
57. One-tailed test p-value - the probability that lies above the computed test statistic for upper-tail tests, or below the computed test statistic for lower-tail tests.
58. Two-tailed test p-value - the probability that lies above the positive value of the computed test statistic plus the probability that lies below the negative value of the computed test statistic.
59. T-test - employs a t-statistic in a hypothesis test based on the t-distribution. It is used when the population variance is unknown and either the sample is large (n > 30) or the sample is small (n < 30) but the population distribution is normal.
60. Critical z-values:
a. 10% level of significance (0.10): two-tailed ±1.65, one-tailed ±1.28
b. 5% level of significance (0.05): two-tailed ±1.96, one-tailed ±1.65
c. 1% level of significance (0.01): two-tailed ±2.58, one-tailed ±2.33
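A minimal sketch of a two-tailed z-test using the critical values above; the sample mean, hypothesized mean, population standard deviation, and sample size are assumed numbers for illustration.

```python
# Assumed inputs for a two-tailed z-test of H0: mu = 0.
x_bar, mu_0 = 0.012, 0.0       # sample mean and hypothesized mean
sigma, n = 0.05, 100           # population std dev and sample size

standard_error = sigma / n ** 0.5
z_stat = (x_bar - mu_0) / standard_error       # test statistic

z_crit = 1.96                                  # 5% significance, two-tailed
if abs(z_stat) > z_crit:
    print(f"z = {z_stat:.2f}: reject H0 at the 5% level")
else:
    print(f"z = {z_stat:.2f}: fail to reject H0 at the 5% level")
```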

61. Chi-squared test - used for hypothesis tests concerning the variance of a normally distributed population. Let σ² represent the true population variance and σ₀² represent the hypothesized variance.
62. Hypotheses for a two-tailed test of a single population variance - H₀: σ² = σ₀² versus H_A: σ² ≠ σ₀²

63. Hypotheses for a one-tailed chi-squared test are structured as:
a. H₀: σ² ≤ σ₀² versus H_A: σ² > σ₀² (upper tail test)
b. H₀: σ² ≥ σ₀² versus H_A: σ² < σ₀² (lower tail test)
64. Hypothesis testing of the population variance requires the use of a chi-squared distributed test statistic, denoted χ². The chi-squared distribution is asymmetrical and approaches the normal distribution in shape as the degrees of freedom increase.

65. Chi-squared test statistic: χ² = (n − 1)s² / σ₀², with n − 1 degrees of freedom.
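A small sketch of the chi-squared test statistic for a single population variance from item 65; the sample variance, hypothesized variance, and sample size are assumed values.

```python
# Assumed inputs.
n = 30
s2 = 0.0009          # sample variance
sigma0_2 = 0.0004    # hypothesized variance under H0

chi2_stat = (n - 1) * s2 / sigma0_2
print(f"chi-squared statistic with {n - 1} df: {chi2_stat:.2f}")
# Compare against the chi-squared critical value(s) for n - 1 degrees of freedom.
```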

66. F-test - hypotheses concerning the equality of the variances of two populations are tested with an F-distributed test statistic.
a. Used under the assumption that the populations from which the samples are drawn are normally distributed and that the samples are independent.
b. σ₁² and σ₂² are the variances of normal populations 1 and 2.
c. Two-tailed F-test: H₀: σ₁² = σ₂² versus H_A: σ₁² ≠ σ₂²
d. One-tailed F-test: H₀: σ₁² ≤ σ₂² versus H_A: σ₁² > σ₂², or H₀: σ₁² ≥ σ₂² versus H_A: σ₁² < σ₂²
e. F = s₁² / s₂². Always put the larger variance in the numerator (s₁²); following this convention, we only need to consider the critical value for the right-hand tail. The F critical value also takes into account the degrees of freedom for the calculation: n₁ − 1 and n₂ − 1.
f. s₁² = variance of the sample of n₁ observations from population 1
g. s₂² = variance of the sample of n₂ observations from population 2
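A minimal sketch of the F-test in item 66, using assumed sample variances and sample sizes; the larger variance goes in the numerator.

```python
# Assumed sample variances and sizes.
s1_sq, n1 = 0.0025, 41    # larger sample variance, population 1
s2_sq, n2 = 0.0016, 31    # smaller sample variance, population 2

f_stat = s1_sq / s2_sq
df_num, df_denom = n1 - 1, n2 - 1
print(f"F = {f_stat:.2f} with ({df_num}, {df_denom}) degrees of freedom")
# Reject H0 of equal variances if F exceeds the right-tail critical value.
```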

67. Chebyshev's inequality - states that for any set of observations, whether sample or population data, and regardless of the distribution, the percentage of the observations that lie within k standard deviations of the mean is at least 1 − 1/k², for all k > 1.

68. Relationships from Chebyshev's inequality:
a. 36% lie within ±1.25 standard deviations of the mean
b. 56% lie within ±1.5 standard deviations of the mean
c. 75% lie within ±2 standard deviations of the mean
d. 89% lie within ±3 standard deviations of the mean
e. 94% lie within ±4 standard deviations of the mean
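A quick sketch reproducing the Chebyshev bounds listed above directly from the 1 − 1/k² formula.

```python
# Chebyshev's inequality: at least 1 - 1/k**2 of observations lie within
# k standard deviations of the mean, for any distribution and k > 1.
for k in (1.25, 1.5, 2, 3, 4):
    print(f"k = {k}: at least {1 - 1 / k ** 2:.0%} within +/-{k} std devs")
```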

69. Importance of Chebyshev's inequality - it applies to any distribution. If we know the underlying distribution is normal, we can be even more precise about the percentage of observations that will fall within a given number of standard deviations of the mean.
a. Events for non-normal distributions may not be so rare, occurring up to about 11% of the time beyond ±3 standard deviations.

70. Backtesting - involves comparing expected outcomes against actual data. It is common for risk managers to backtest VaR models to ensure the model is forecasting losses with the same frequency predicted by the confidence level.
a. When the VaR measure is exceeded during a given testing period, it is known as an EXCEPTION or an EXCEEDANCE. After backtesting, if the number of exceptions is greater than expected, the risk manager may be underestimating the actual risk and VaR may be understated. If the number of exceptions is less than expected, the risk manager may be overestimating the actual risk.
71. Limits of backtesting VaR
a. Backtesting VaR can cause issues because exceptions are often serially correlated: there is a high probability that an exception will occur after a previous period had an exception.
b. The occurrence of exceptions also tends to be correlated with overall market volatility: many exceptions when market volatility is high, few exceptions when it is low.
c. Failure of the VaR model to react quickly to changing risk levels.
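A minimal sketch of the exception count in item 70: compare daily P&L against a VaR estimate and tally exceedances versus the expected number; all inputs are made-up illustrative values.

```python
# Assumed daily P&L (in $ millions), VaR estimate, and confidence level.
daily_pnl = [-1.2, 0.4, -0.1, -2.5, 0.9, -0.3, -2.1, 1.1, 0.2, -0.6]
var_level = 2.0            # 1-day VaR (loss), $ millions
confidence = 0.95          # VaR confidence level

exceptions = sum(1 for pnl in daily_pnl if -pnl > var_level)   # loss exceeds VaR
expected = (1 - confidence) * len(daily_pnl)

print(f"exceptions: {exceptions}, expected: {expected:.1f}")
# More exceptions than expected suggests the VaR model understates risk.
```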
Ch. 11 Correlations and Copulas, Hull, Risk Management and Financial Institutions
1. Correlation - measures the strength of the linear relationship btw two variables over time.
2. Covariance - measures the direction of the co-movement between two variables over time.
3. Rho (ρ) - ranges between −1 and +1; this standardized measure is more convenient in risk analysis applications than covariance, which can take any value between −∞ and +∞.
a. ρ_xy = cov(X, Y) / (σ_x σ_y)
b. cov_xy = ρ_xy σ_x σ_y
c. cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)
d. The first term, E(XY), is the expected value of the product of X and Y. Variables are defined as independent if knowledge of one variable does not impact the probability distribution of the other variable; in other words, the conditional probability of V₂ given information about V₁ equals the unconditional probability of V₂, as expressed in the following equation.
e. P(V₂ | V₁ = x) = P(V₂)
f. A correlation of zero does not imply that there is no dependence between the two variables; it implies there is no linear relationship between them. The value of one variable can still have a nonlinear relationship with the other variable.

4. EWMA Model
a. Conventional wisdom suggests that more recent observations should carry more weight because they more accurately reflect the current market environment. We can calculate a new covariance on day n using an exponentially weighted moving average (EWMA) model. The model is designed to vary the weight given to more recent observations by adjusting λ.
b. cov_n = λ cov_{n−1} + (1 − λ) X_{n−1} Y_{n−1}, where λ is the weight on the most recent covariance estimate, X_{n−1} is the percentage change in variable X on day n−1, and Y_{n−1} is the percentage change in variable Y on day n−1.
c. λ can also be used to update the variance estimates.
d. The correlation can then be computed from the EWMA covariance and the EWMA variances.
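A minimal sketch of the EWMA covariance update in item 4b; λ = 0.94 and the previous-day inputs are assumed values.

```python
# Assumed inputs.
lam = 0.94                        # weight on the previous covariance estimate
cov_prev = 0.000180               # yesterday's covariance estimate
x_prev, y_prev = 0.012, -0.008    # yesterday's percentage changes in X and Y

# EWMA update: cov_n = lam * cov_{n-1} + (1 - lam) * x_{n-1} * y_{n-1}
cov_new = lam * cov_prev + (1 - lam) * x_prev * y_prev
print(f"updated EWMA covariance: {cov_new:.6f}")
# The same recursion with x_prev*x_prev (or y_prev*y_prev) updates the variances,
# and the correlation is then cov / (sigma_x * sigma_y) from the updated estimates.
```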
5. GARCH(1,1) Model - an alternative method for updating the covariance rate between two variables X and Y.
a. The GARCH(1,1) model is: cov_n = ω + α X_{n−1} Y_{n−1} + β cov_{n−1}
b. X_{n−1} Y_{n−1} is the most recent observation of the product of the percentage changes.
c. β is the weight given to the most recent covariance estimate (cov_{n−1}).
d. ω carries the weight given to the long-term average covariance rate.
e. α is the weight given to the most recent observation.
f. EWMA is a special case of GARCH(1,1) with ω = 0, α = 1 − λ, and β = λ.
The model can also be written cov_n = γ V_L + α X_{n−1} Y_{n−1} + β cov_{n−1}, where γ is the weight given to the long-term average covariance rate V_L. This form requires the three weights to sum to 100% (γ + α + β = 1), and the long-term average covariance rate must equal ω / (1 − α − β).
6. Variance-covariance matrix - can be constructed using the calculated estimates of variance and covariance rates for a set of variables. The diagonal of the matrix contains the variance rates (where i = j); the covariance rates are all other elements of the matrix (where i ≠ j).

7. Positive semidefinite - a matrix is positive semidefinite if it is internally consistent. The following expression defines the necessary condition for an N×N variance-covariance matrix Ω to be internally consistent for all N×1 vectors w, where wᵀ is the transpose of the vector w:
a. wᵀ Ω w ≥ 0
b. When small changes are made to a small positive-semidefinite matrix, the matrix will most likely remain positive semidefinite. However, changes to a large matrix (e.g., 1000×1000) will most likely cause it to no longer be positive semidefinite.
8. Internally consistent condition
a. For a 3×3 correlation matrix: ρ₁,₂² + ρ₁,₃² + ρ₂,₃² − 2 ρ₁,₂ ρ₁,₃ ρ₂,₃ ≤ 1
9. Generating samples from a bivariate normal distribution
a. Suppose we have a bivariate normal distribution for two variables X and Y.
b. If the value of variable X is known, the value of variable Y is conditional on the value of X.
c. The conditional distribution of Y is normal with mean and standard deviation:
i. E(Y | X) = μ_Y + ρ_XY σ_Y (X − μ_X) / σ_X
ii. σ_{Y|X} = σ_Y √(1 − ρ_XY²)
d. Steps for generating two sample sets of variables from a bivariate normal distribution:
i. Independent samples Z_x and Z_y are obtained from a univariate standard normal distribution (using inverse-normal functions in programming languages).
ii. Samples ε_x and ε_y are then generated. The sample of the X variable is simply the random sample from the univariate standard normal distribution: ε_x = Z_x
iii. The conditional sample of the Y variable is determined as follows: ε_y = ρ_XY Z_x + Z_y √(1 − ρ_XY²)
iv. where ρ_XY is the correlation between X and Y in the bivariate normal distribution.
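A minimal sketch of the sampling steps in item 9d, using Python's standard library; the correlation ρ = 0.6 and the seed are assumptions for illustration.

```python
import math
import random

random.seed(42)
rho = 0.6   # assumed bivariate-normal correlation

def bivariate_normal_sample(rho: float) -> tuple:
    z_x = random.gauss(0.0, 1.0)      # independent standard normal draws
    z_y = random.gauss(0.0, 1.0)
    eps_x = z_x                        # sample for X (step ii)
    eps_y = rho * z_x + z_y * math.sqrt(1 - rho ** 2)   # conditional Y (step iii)
    return eps_x, eps_y

for _ in range(5):
    eps_x, eps_y = bivariate_normal_sample(rho)
    print(f"{eps_x:+.3f}  {eps_y:+.3f}")
```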


10. Factor models - a factor model can be used to define correlations between normally distributed variables. The following equation is a one-factor model:
a. U_i = ρ_i F + √(1 − ρ_i²) Z_i
One-factor models are structured as follows:
i. Every U_i has a standard normal distribution (mean = 0, std dev = 1).
ii. The constant ρ_i is between −1 and 1.
iii. F and Z_i have standard normal distributions and are uncorrelated with each other.
iv. Each Z_i is uncorrelated with every other Z_j.
v. All correlations between U_i and U_j result from their dependence on the common factor F.
b. Advantages of one-factor models:
i. The covariance matrix for a one-factor model is positive semidefinite.
ii. The number of correlations to estimate is greatly reduced. With N variables there would otherwise be N(N − 1)/2 correlations to estimate; a one-factor model only requires N estimates, one correlation of each variable with the factor F.
iii. The best-known one-factor model is the CAPM.
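A small sketch of sampling from the one-factor model U_i = ρ_i F + √(1 − ρ_i²) Z_i; the factor loadings and seed are assumed values.

```python
import math
import random

random.seed(7)
loadings = [0.3, 0.5, 0.8]           # assumed rho_i for three variables

f = random.gauss(0.0, 1.0)           # common factor F
u = [rho_i * f + math.sqrt(1 - rho_i ** 2) * random.gauss(0.0, 1.0)
     for rho_i in loadings]          # each U_i is standard normal

print([round(v, 3) for v in u])
# The correlation implied between U_i and U_j is rho_i * rho_j.
```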
11. Copula - a copula creates a joint probability distribution between two or more variables while maintaining their
individual marginal distributions. Accomplished by mapping the marginal distributions to a new known
distribution. Mapping of each variable to a new distribution is done based on percentiles.
a. Using a copula is a way to indirectly define a correlation structure between two variables when it is not possible to define it directly. This is commonly done by assuming the two mapped univariate distributions have a joint bivariate normal distribution.
12. Marginal Distributions individual unconditional distribution of a random variable.
13. Key properties of copulas
a. Preservation of the original marginal distributions while defining a correlation between them.
14. Correlation copula - created by converting two distributions that may be unusual or have unique shapes and mapping them to known distributions with well-defined properties, such as the normal distribution; done by mapping on a percentile-to-percentile basis.
a. E.g., the 5th percentile observation of the marginal distribution of variable X is mapped to the 5th percentile point of a standard normal distribution U_x. This is done for every observation, and the marginal distribution of Y is likewise mapped to a standard normal distribution U_y. The correlation between U_x and U_y is referred to as the copula correlation.
b. The conditional mean of U_y is linearly dependent on U_x, and the conditional standard deviation of U_y is constant, because the two mapped distributions are bivariate normal.

15. Types of copulas
a. Student's t-copula - the variables are mapped to distributions U₁ and U₂ that have a bivariate Student's t-distribution rather than a normal distribution.
i. The procedure used to create a Student's t-copula, assuming a bivariate Student's t-distribution with f degrees of freedom and correlation ρ:
1. Step 1: Obtain values of χ by sampling from the inverse chi-square distribution with f degrees of freedom.
2. Step 2: Obtain values by sampling from a bivariate normal distribution with correlation ρ.
3. Step 3: Multiply √(f/χ) by the normally distributed samples.

b. Gaussian (normal) copula - maps the marginal distribution of each variable to the standard normal distribution. The mapping of each variable to the new distribution is done based on percentiles.
c. Multivariate copula - used to define a correlation structure for more than two variables. Suppose the marginal distributions are known for N variables V₁, V₂, ..., V_N. Each variable V_i is mapped to a standard normal distribution U_i, so the correlation structure for all the variables is then based on a multivariate normal distribution.
d. One-factor copula model - often used to define the correlation structure in multivariate copula models. The nature of the dependence between the variables is affected by the choice of the U_i distribution. The following equation defines a one-factor copula model, where F and Z_i are standard normal distributions:
i. U_i = ρ_i F + √(1 − ρ_i²) Z_i
ii. The U_i have a multivariate Student's t-distribution if Z_i and F are assumed to have a normal distribution and a Student's t-distribution, respectively. The choice of the U_i distribution determines the dependency of the U variables, which in turn defines the copula for the V variables.
16. Tail dependence
a. There is a greater tail dependence in a bivariate Students t-distribution than a bivariate normal
distribution.
b. It is more common for two variables to have the same tail values at the same time using the bivariate
Students t distribution.
c. During a financial crisis or some other extreme market condition, it is common for assets to be highly
correlated and exhibit large losses at the same time. Suggests that the Students t Copula is better
than a Gaussian copula in describing the correlation structure of assets that historically have extreme
outliers in the distribution tails at the same time.
Ch4 Linear Regression with One Variable, Watson, Introduction to Econometrics
1. Dependent(explained) variable - variable attempting to be explained by an independent (X) variable.
2. Parameters of an equation - indicate the relationship (change in the relationship) between two variables.
(Linear in an OLS Regression)
3. Scatter plot - a visual representation of the relationship between the dependent variable and a given
independent variable. It uses a standard two-dimensional graph where the values of the dependent, or Y
Variable , are on the vertical axis
a. Can indicate the nature of the relationship between the dependent and independent variable.
b. A closer inspection can indicate if the relationship is linear or nonlinear.
4. Population regression coefficients - the coefficients of the equation describing the relationship between the dependent and independent variables for the entire population. They can be described as the true parameters.
5. Regression coefficients - parameters of the population regression equation.
a. E(Y_i | X_i) = B₀ + B₁X_i

6. Error term (noise component) - the difference between each Y_i and its corresponding conditional expectation (the line that fits the data).
a. ε_i = Y_i − E(Y_i | X_i)
b. The deviation from the expected value is the result of factors other than the included X variable(s).
c. The equation breaks Y_i down into a deterministic (systematic) component, E(Y_i | X_i), and a nonsystematic or random component.
d. The error term represents effects from independent variables not included in the model.
7. Slope Coefficient - expected change in Y for a unit change in X
8. Sample Regression Function - equation that represents a relationship between the Y and X variable(s) that is
only based on the information in a sample of the population. In almost all cases, the slope and intercept
coefficients will be different than the population regression function.
9. Residual - difference between the actual and expected value (sample regression estimate)
10. Key properties of regression
a. The term "linear" has implications for both the independent variable(s) and the coefficients.

b. One interpretation of the term "linear" relates to the independent variable(s) and specifies that the independent variables enter the equation without a transformation (such as a square root or logarithm), e.g., X = ln(amount consumed).
c. The second interpretation of the term "linear" applies to the parameters. It specifies that the dependent variable is a linear function of the parameters, but it does not require that there is linearity in the variables.
d. It would not be appropriate to apply linear regression to estimate the parameters of functions that are nonlinear in the parameters. The primary concern for linear models is that they display linearity in the parameters. When we refer to a linear regression model we generally assume that the equation is linear in the parameters; it may or may not be linear in the variables.

11. Ordinary least squares (OLS) - a process that estimates the population parameters B_i with corresponding values b_i that minimize the squared residuals (i.e., error terms). Recall the expression e_i = Y_i − (b₀ + b₁X_i); the OLS sample coefficients are those that:
a. Minimize Σ e_i² = Σ [Y_i − (b₀ + b₁X_i)]²
b. The slope coefficient b₁ is:
c. b₁ = Σᵢ₌₁ⁿ (X_i − X̄)(Y_i − Ȳ) / Σᵢ₌₁ⁿ (X_i − X̄)²

12. Intercept term - the line's intersection with the Y-axis at X = 0. It can be positive, negative, or zero.
a. b₀ = Ȳ − b₁X̄, where Ȳ is the mean of Y and X̄ is the mean of X.

13. Method of OLS - to minimize the sum of squared errors.


a. Most of the major assumptions pertain to the regression models residual term.
14. Key assumptions of OLS
a. Three key assumptions:
i. The expected value of the error term, conditional on the independent variable, is zero: E(ε_i | X_i) = 0.
ii. All (X, Y) observations are independent and identically distributed (i.i.d.).
iii. It is unlikely that large outliers will be observed in the data. Large outliers have the potential to create misleading regression results.
Additional assumptions:
a. A linear relationship exists between the dependent and independent variable.
b. The model is correctly specified in that it includes the appropriate independent variable and does not omit variables.
c. The independent variable is uncorrelated with the error terms.
d. The variance of ε_i is constant for all X_i: Var(ε_i | X_i) = σ²
e. No serial correlation of the error term exists, i.e., Corr(ε_i, ε_j) = 0 for i ≠ j. The point is that knowing the value of the error for one observation does not reveal information concerning the value of the error for another observation.
f. The error term is normally distributed.
15. Benefits of OLS Estimators a. Interpretation and Analysis of Regression outputs are easily understood across fields of study
b. Unbiased, consistent and under special conditions efficient
16. Properties of OLS estimators and their sampling distributions
17. Unbiased estimator of the population mean - the mean of the sampling distribution used as an estimator of the population mean is an unbiased estimator when the expected value of the estimator is equal to the parameter being estimated: E(X̄) = μ.

18. Central Limit Theorem - with large sample sizes, it is reasonable to assume that the sampling distribution will
approach the normal distribution. This means that the estimator is also a consistent Estimator.

19. Consistent Estimator - an unbiased estimator is one for which the expected value of the estimator is equal to
the parameter you are trying to estimate. A consistent estimator is one for which the accuracy of the
parameter estimate increases as the sample size increases.
20. Sum of squared errors/residuals (SSE/SSR) - the sum of squares that results from placing a given intercept and slope coefficient into the equation, computing the residuals, squaring the residuals, and summing them:
a. Σ (Y_i − Ŷ_i)² = Σ e_i²
21. Total sum of squares (SST) - Σ (Y_i − Ȳ)²
22. Relationship between the three sums of squares - total sum of squares = explained sum of squares + sum of squared residuals
23. Coefficient of Determination,

R2

- a measure of the goodness of fit of the regression. It is interpreted as a

percentage of variation in the dependent variable explained by the independent variable. (% explained by
regression parameters)
a. The underlying concept is that for the dependent variable there is a total sum of squares (SST or TSS) around the sample mean, and the regression explains some portion of that TSS.
b. Total sum of squares = explained sum of squares + sum of squared residuals
c. TSS = ESS + SSR
d. R² = ESS/TSS
e. R² = 1 − SSR/TSS
24. Correlation coefficient from an OLS regression - r = √R², a standard measure of the strength of the linear relationship between two variables. It is not the same as the coefficient of determination.
a. The correlation coefficient indicates the sign of the relationship, whereas the coefficient of determination does not.
b. The coefficient of determination can apply to an equation with several independent variables, and it implies explanatory power,
c. while the correlation coefficient only applies to two variables and does not imply causation between the variables.
25. Standard error of the regression (SER) - measures the degree of variability of the actual Y-values relative to the estimated Y-values from a regression equation.
a. SER gauges the fit of the regression line.
b. The smaller the standard error, the better the fit.
c. SER is the standard deviation of the error terms in the regression.
d. SER is also referred to as the standard error of the residual, or the standard error of estimate (SEE).
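A minimal sketch tying together the OLS slope and intercept (items 11-12) with the fit measures (items 20-25), computed on a tiny made-up sample; the data are assumptions for illustration.

```python
# Assumed sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 3.6, 4.4, 5.2]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)          # slope (item 11c)
b0 = y_bar - b1 * x_bar                            # intercept (item 12)

y_hat = [b0 + b1 * xi for xi in x]                 # fitted values
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # sum of squared residuals
sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
r_squared = 1 - ssr / sst                               # coefficient of determination
ser = (ssr / (n - 2)) ** 0.5                            # standard error of the regression

print(round(b0, 3), round(b1, 3), round(r_squared, 4), round(ser, 4))
```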
Ch5 Regression with Single Regressor- Hypothesis Tests and Confidence Intervals, Watson, Introduction
to Econometrics
1. Confidence intervals for regression coefficients
a. b₁ ± (t_c × s_{b₁}), or equivalently [b₁ − (t_c × s_{b₁}) < B₁ < b₁ + (t_c × s_{b₁})]
b. t_c is the critical two-tailed t-value for the selected confidence level with the appropriate number of degrees of freedom, which is equal to the number of sample observations minus 2 (i.e., n − 2).
c. s_{b₁} is the standard error of the regression coefficient. It is a function of the SER: as SER rises, s_{b₁} also increases and the confidence interval widens.
d. This makes sense because SER measures the variability of the data about the regression line, and the more variable the data, the less confidence there is in the regression model's estimate of a coefficient.
2. Hypothesis tests of regression coefficients (t-test) - a t-test may also be used to test the hypothesis that the true slope coefficient, B₁, is equal to some hypothesized value. Letting b₁ be the point estimate for B₁, the appropriate test statistic with n − 2 degrees of freedom is:
a. t = (b₁ − B₁) / s_{b₁}
b. Reject H₀ if t > +t_critical or t < −t_critical
c. Rejection of the null means that the slope coefficient is different from the hypothesized value of B₁. The appropriate test of whether B₁ is statistically significant is H₀: B₁ = 0 versus H_A: B₁ ≠ 0.

3. P-value - the smallest level of significance for which the null hypothesis can be rejected. A small p-value provides support for rejecting the null hypothesis.
a. For two-tailed tests, the p-value is the probability that lies above the positive value of the computed test statistic plus the probability that lies below the negative value of the computed test statistic.
b. The p-value gives a general idea of statistical significance without requiring a significance level to be selected.
4. Predicted Values - values of the dependent variable based on the estimated regression coefficients and a
prediction about the value of the independent variable. They are the values that are predicted by the
regression equation, given an estimate of the independent variable.

a. Ŷ = b₀ + b₁X_p, where X_p is the forecasted value of the independent variable

5. Confidence intervals for predicted values
a. Ŷ ± (t_c × s_f), i.e., Ŷ − (t_c × s_f) < Y < Ŷ + (t_c × s_f)

6. Standard error of the forecast
a. s_f is challenging to calculate. The variance of the forecast is:
s_f² = SER² × [1 + 1/n + (X − X̄)² / ((n − 1) s_x²)]

7. Dummy variable a. Often used to quantify the impact of qualitative events.


b. Dummy variables are assigned 0 or 1
c. Reason for including a dummy variable is to see if variable is significant or not on the dependent
variable.
d. Estimated regression coefficients for dummy variables indicates the difference in the dependent
variable for the category represented by the dummy variable and the average value of the dependent
variable for all classes except the dummy variable class.
8. Homoscedasticity - the variance of the residuals is constant across all observations in the sample.
9. Heteroscedasticity - the variance of the residuals is NOT constant across observations in the sample. This
happens when there are subsamples that are more spread out than the rest of the sample.
10. Unconditional heteroscedasticity - occurs when the heteroscedasticity is not related to the level of the independent variables; it does not systematically increase or decrease with changes in the value of the independent variables.
a. While this is a violation of the equal variance assumption, it usually causes no major problems with the regression.
11. Conditional heteroscedasticity - heteroscedasticity that is related to the level of the independent variable. It exists if the variance of the residual term increases as the value of the independent variable increases.
a. Creates significant problems for statistical inference.
12. Effects of heteroscedasticity on regression analysis
a. Standard errors are usually unreliable estimates.
b. Coefficient estimates (the b_j) aren't affected.
c. If the standard errors are too small, but the coefficient estimates themselves are not affected, the t-statistics will be too large and the null hypothesis of no statistical significance will be rejected too often. The opposite is true when the standard errors are too large.
13. Detecting Heteroscedasticity a. A scatter plot of the residuals versus the independent variables can reveal patterns among
observations.
b.
14. Correcting Heteroscedasticity - beyond scope of FRM. Some available techniques. such as robust standard
errors.
15. Robust standard Errors - used to recalculate the t-statistics using the original regression coefficients.
16. Gauss-Markov Theorem - if the linear regression model assumptions are true and the regression errors
display homoscedasticity, then the OLS estimators have the following properties:

a. The OLS estimated coefficients have the minimum variance compared to other methods of estimating
the coefficients (i.e. they are the most precise) BEST
b. OLS estimated coefficients are based on linear functions. LINEAR
c. The OLS estimated coefficients are unbiased, which means that in repeated sampling the averages of the coefficients from the samples will be distributed around the true population parameters, i.e., E(b₀) = B₀ and E(b₁) = B₁. UNBIASED
d. The OLS estimate of the variance of the errors is unbiased, i.e., E(σ̂²) = σ². UNBIASED
e. BLUE: Best Linear Unbiased Estimators


17. OLS heteroscedasticity can cause problems and there can be better estimators:
a. WLSE - Weighted least squares estimator (produce estimator with a smaller variance)
b. LADE least absolute deviations estimator (less sensitive to extreme outliers)
18. T-statistic when the sample size is small
a. When the sample size is small, the distribution of a t-statistic becomes more complicated to interpret.
b. When the sample is small, we must rely on the assumptions underlying the linear regression model. In order to apply and interpret the t-statistic, the error terms must be homoscedastic (constant variance of the error terms) and normally distributed.
Ch6 Linear Regression with Multiple Regressors, Watson, Introduction to Econometrics
1. Omitted variable Bias - when relevant variables are absent from a linear regression model, results will likely
lead to incorrect conclusions as the OLS estimators may not accurately portray the actual data.
a. Omitted variable is correlated with the movement of the independent variable in the model.
b. Omitted variable is a determinant of the dependent variable.
2. Methods for addressing omitted variable bias - (if the omitted variable is correlated with an included independent variable, the error term will also be correlated with that independent variable)
a. Multiple Regression
b. if bias is found, it can be addressed by dividing data into groups and examining one factor at a time
while holding other factors constant. We need to utilize multiple independent coefficients
(MLRegression)
3. Multiple Regression - linear regression with more than one independent variable.

a. Y_i = B₀ + B₁X_{1i} + B₂X_{2i} + ... + B_kX_{ki} + e_i

4. OLS estimators Multiple Regression a. Intercept term is the value of the dependent variable when the independent variables are all equal to
zero
b. Slope coefficients estimated change in the dependent variable for a one-unit change in the
independent variable holding the other independent variables constant.
c. Slope coefficients (parameters) are sometimes called partial slope coefficients.
5. Homoscedasticity - refers to the condition that the variance of the error term is constant for all independent variables, X: Var(ε_i | X_i) = σ²

6. Heteroscedasticity - dispersion to the error terms varies over the sample. (variance is a function of the
independent variables)
7. Standard error of the regression (SER) - measures the uncertainty about the accuracy of the predicted values of the dependent variable, Ŷ_i = b₀ + b₁X_{1i} + ... + b_kX_{ki}. Graphically, the fit is stronger when the actual (x, y) data points lie closer to the regression line (the e_i are smaller).
a. Formally, SER is the standard deviation of the error terms in the regression and is sometimes denoted s_e:
i. SER = √[ SSR / (n − k − 1) ] = √[ Σ (Y_i − Ŷ_i)² / (n − k − 1) ]
ii. The smaller the standard error, the better the fit.

8. Multiple coefficient of determination, R² - can be used to test the overall effectiveness of the entire set of independent variables in explaining the dependent variable.
a. Same calculation as in the single-regressor case: R² = ESS/TSS.
9. Adjusted R² - used because R² by itself may not be a sufficient or reliable measure of the explanatory power of the multiple regression model.
a. R² almost always increases as independent variables are added to the model, even if the marginal contribution of the new variable is not statistically significant; this is referred to as overestimating the regression.
b. R²_a = 1 − [ (n − 1) / (n − k − 1) ] × (1 − R²), where n = number of observations and k = number of independent variables.
c. R²_a is less than or equal to R². Adding a new variable may either increase or decrease the adjusted R².
10. Assumptions of Multiple Regression a. A linear relationship exists between the dependent and independent variables.
b. Independent variables are not random, and there is no exact linear relation between two or more
independent variables.
c. The expected value of the error term, conditional on the independent variables, is zero: E(e_i | X₁, X₂, ..., X_k) = 0
d. The variance of the error terms is constant for all observations.
e. The error term for one observation is not correlated with that of another observation: E(e_i e_j) = 0 for j ≠ i

f. The error term is normally distributed.


11. Multicollinearity a. The condition when two or more independent variables, or linear combinations of independent
variables , are highly correlated with each other.
b. Condition distorts the standard error of the regression and the coefficient standard errors. leads to
problems with t-tests for statistical significance.
c. Degree of correlation will determine the difference between Perfect and imperfect multicollinearity.
12. Perfect Multicollinearity one independent variable is a perfect linear combination of the other independent
variables.

13. Imperfect Multicollinearity - two or more independent variables are highly correlated, but less than perfectly
correlated.
14. Effect of MultiCollinearity a. Greater probability that we will incorrectly conclude that a variable is not statistically significant (Type
II Error)
b.
15. Detecting multicollinearity
a. The classic situation is when t-tests indicate that none of the individual coefficients is significantly different from zero, while the R² of the multiple regression model is high.
b. This indicates that the variables together explain much of the variation in the dependent variable, but the individual independent variables do not.
c. The only way this happens is when the independent variables are highly correlated with each other.
d. If the absolute value of the sample correlation between two independent variables is greater than 0.7, multicollinearity may be a problem.
e. Even if the individual variables are not highly correlated, linear combinations of them might lead to multicollinearity; so low pairwise correlations do not necessarily mean that multicollinearity is not a problem.
16. Correcting Multicollinearity
a. Usually requires omitting the variable with the highest correlation and a low individual R^2.
b. Stepwise regression until multicollinearity is minimized.
Ch7 Hypothesis Tests and Confidence Intervals in Multiple Regression, Watson, Introduction to
Econometrics
1. Hypothesis Testing of Regression coefficients (Multiple Regression) a. Needed to test estimated slope coefficients to determine if the independent variables make a
significant contribution to explaining the variation in the dependent variable.
2. Determining Statistical Significance - t-statistic used to test the significance of the individual coefficients in a
multiple regression is calculated using the same formula.
a. $t = \dfrac{b_j - B_j}{s_{b_j}}$ = (estimated regression coefficient − hypothesized value) / (coefficient standard error of $b_j$); the t-statistic has n-k-1 degrees of freedom.
b. Testing statistical significance: $H_0: b_j = 0$ versus $H_A: b_j \ne 0$.

3. Interpreting p-values - the p-value is the smallest level of significance for which the null hypothesis can be rejected.
a. If the p-value is less than the significance level, the null hypothesis can be rejected.
b. If the p-value is greater than the significance level, the null hypothesis cannot be rejected.
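A minimal sketch of the coefficient t-test and its p-value (hypothetical estimates; scipy assumed available):

    from scipy.stats import t

    # Hypothetical output for one coefficient in a multiple regression
    b_j, B_j, s_bj = 0.42, 0.0, 0.15     # estimate, hypothesized value, standard error
    n, k = 60, 4                         # observations, independent variables

    t_stat = (b_j - B_j) / s_bj
    df = n - k - 1
    p_value = 2 * t.sf(abs(t_stat), df)  # two-tailed p-value

    print(t_stat, p_value, p_value < 0.05)   # reject H0: b_j = 0 at the 5% level if True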
4. Other tests of the Regression Coefficients a.
5. Confidence Intervals for Regression Coefficients a.

$b_j \pm t_c \times s_{b_j}$

b. $t_c$ is the critical t-value with n-k-1 degrees of freedom at the chosen significance level (e.g., 5%), where n is the number of observations and k is the number of independent variables.
6. Predicting the Dependent Variable a. We can make predictions about the dependent variable based on forecasted values of the independent
variables.
b. But we need predicted values for more than one independent variable.
c. $\hat{Y} = b_0 + b_1 \hat{X}_{1i} + b_2 \hat{X}_{2i} + \cdots + b_k \hat{X}_{ki}$
7. Joint Hypothesis Testing - tests two or more coefficients at the same time. We could develop a hypothesis for
a linear regression model with three independent variables that sets two of these coefficients equal to zero ,

$H_0: b_1 = 0$ and $b_2 = 0$, versus the alternative hypothesis that one of them is not equal to zero. If just one of the equalities in the null does not hold, we can reject the entire null hypothesis.
a. Using a joint hypothesis is preferred in certain scenarios since testing coefficients individually leads to
a greater chance of rejecting the null hypothesis.
8. F-Statistic a. F-test assesses how well the set of independent variables, as a group, explains the variation in the
dependent variable.
b. F-statistic is used to test whether at least one of the independent variables explains a significant
portion of the variation of the dependent variable.

c. E.g., with 4 independent variables, the hypotheses are structured as:
i. $H_0: B_1 = B_2 = B_3 = B_4 = 0$ versus $H_A:$ at least one $B_j \ne 0$
d. The F-statistic, which is always a one-tailed test:
$F = \dfrac{ESS / k}{SSR / (n - k - 1)}$, where ESS = explained sum of squares and SSR = sum of squared residuals; both are found in the ANOVA table.
9. One-tailed critical F-value, $F_c$, at the appropriate level of significance. The degrees of freedom for the numerator and denominator are:
a. $df_{num} = k$
b. $df_{denom} = n - k - 1$
c. Decision rule: reject $H_0$ if F (test statistic) > $F_c$ (critical value).

d. Rejection of the null hypothesis at a stated level of significance indicates that at least one of the
coefficients is significantly different than zero, which is interpreted to mean that at least one of the
independent variables in the regression model makes a significant contribution to the explanation of
the dependent variable.
e. When testing the hypothesis that all the regression coefficients are simultaneously equal to zero, the F-test is always a one-tailed test, despite the fact that it looks like it should be a two-tailed test because there is an equal sign in the null hypothesis.
f. It should be noted that rejecting the null hypothesis indicates that one or both of the coefficients are significant (a minimal F-test sketch follows).
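A minimal sketch of the F-test (hypothetical ANOVA figures; scipy assumed available):

    from scipy.stats import f

    # Hypothetical ANOVA output
    ESS, SSR = 260.0, 140.0                      # explained sum of squares, sum of squared residuals
    n, k = 45, 4                                 # observations, independent variables

    F = (ESS / k) / (SSR / (n - k - 1))
    F_crit = f.ppf(0.95, dfn=k, dfd=n - k - 1)   # one-tailed 5% critical value

    print(F, F_crit, F > F_crit)                 # reject H0 if F > F_crit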
10. Specification Bias - refers to how the slope coefficient and other statistics for a given independent variable are
usually different in a simple regression when compared to those of the same variable when included in a
multiple regression.
a. Indicated by the extent to which the coefficient for each independent variable differs when compared across equations.
11. Decision rule for the F-test -
12. Method of rejection -
13. Coefficient of multiple correlation - simply the square root of R-squared. It is always positive in multiple regression.
14. ANOVA - a table that summarizes all of the key output for a multiple linear regression: F-tests, t-stats, p-values, 95% confidence intervals, parameter coefficients, R^2, ESS, SSR, TSS.
15. $R^2$ vs. adjusted $R^2$ -

16. Single Restrictions involving multiple coefficients


17. Model misspecification -

Ch5 Modelling and Forecasting Trend, Diebold Elements of Forecasting


1. Mean Squared Error - a statistical measure computed as the sum of squared residuals divided by the total number of observations in the sample.
a. $MSE = \dfrac{\sum_{t=1}^{T} e_t^2}{T}$, where $e_t = y_t - \hat{y}_t$ (the residual for observation t, i.e., the difference between the observed and predicted value).
b. The fitted trend model is $y_t = \hat{\beta}_0 + \hat{\beta}_1 TIME_t$, i.e., a regression model.
c. MSE is based on in-sample data.
d. Selecting by MSE picks the regression model with the smallest sum of squared residuals. The residuals are calculated as the difference between the actual value observed and the predicted value based on the regression model.
e. MSE is closely related to $R^2$, and thus the equation with the smallest MSE also has the highest $R^2$.
2. Model Selection - choosing the right model is one of the most important steps in forecasting.
a. Unfortunately, selecting a model based on the smallest MSE or highest R^2 is not effective in providing good out-of-sample models.
b. A better methodology to select the best forecasting model is to find the model with the smallest out-fsample , one-step-ahead MSE.
3. Reducing MSE Bias a. In sample MSE to estimate out-of-sample MSE is not very effective because in-sample MSE cannot
increase when more variables are included in the forecasting model. MSE will have DOWNWARD bias
when predicting out-of-sample bias. (too small)
b. Must adjust the MSE bias with S62 measure.
4. $s^2$ measure - an unbiased estimate of the MSE because it corrects for degrees of freedom as follows:
a. $s^2 = \dfrac{\sum_{t=1}^{T} e_t^2}{T - k}$

5. Data mining - as more variables are included in a regression equation, the model is at greater risk of overfitting the in-sample data.
a. Problem with data mining is that the regression model does a very good job of explaining the sample
data but does a poor job of forecasting out-of-sample data.
b. As more parameters are introduced to a regression model, it will explain the data better, but may be
worse at forecasting out-of-sample data. Increasing the number of parameters will not necessarily
improve the forecasting model.
6. Model Selection Criteria - selection criteria are often compared based on a penalty factor.
a. The unbiased estimator $s^2$ defined earlier can be rewritten by multiplying the numerator and denominator by T:
7. Penalty factor - rewriting $s^2$ in this way expresses it as a penalty for degrees of freedom times the MSE:
a. $s^2 = \left(\dfrac{T}{T-k}\right)\dfrac{\sum_{t=1}^{T} e_t^2}{T}$
b. This takes the form of a penalty factor × MSE, where the penalty factor is $\dfrac{T}{T-k}$.

8. Akaike Information Criterion (AIC) - $AIC = e^{2k/T} \cdot \dfrac{\sum_{t=1}^{T} e_t^2}{T}$

9. Schwarz Information Criterion (SIC) - $SIC = T^{\,k/T} \cdot \dfrac{\sum_{t=1}^{T} e_t^2}{T}$
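A minimal sketch comparing the selection criteria on one set of hypothetical in-sample residuals (numpy assumed):

    import numpy as np

    # Hypothetical residuals from a fitted trend model with k parameters
    e = np.array([0.3, -0.5, 0.2, 0.7, -0.4, 0.1, -0.2, 0.6, -0.3, 0.4])
    T, k = len(e), 2
    sse = np.sum(e ** 2)

    MSE = sse / T
    s2 = sse / (T - k)                      # unbiased variant (degrees-of-freedom correction)
    AIC = np.exp(2 * k / T) * sse / T       # penalty factor e^(2k/T) times MSE
    SIC = T ** (k / T) * sse / T            # penalty factor T^(k/T) times MSE
    print(MSE, s2, AIC, SIC)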

10. Evaluating Consistency -
11. Consistency - a key property that is used to compare different selection criteria. Two conditions are required for a model selection criterion to be consistent:
a. When the TRUE Model or data-generating process (DGP) is one of the defined regression models, then
the probability of selecting the true model approaches one as the sample size increases.
b. When the TRUE model is not one of the defined regression models being considered, then the
probability of selecting the BEST APPROXIMATION MODEL approaches one as the sample size
increases.
12. NOTE: because we live in a very complex world, almost all economic and financial models have assumptions
that simplify this complex environment.
a. MSE does not penalize for degrees of freedom and therefore not consistent
b. Unbiased MSE, S^2 penalizes for degrees of freedom but adjustment is too small for consistency
c. AIC penalty> s^2 ; however with large sample sizes, n, AIC tends to select models that have too many
variables or parameters
d. SIC has the greatest penalty factor and appears to be the most consistent criterion as the number of parameters increases relative to the sample size.
e. BUT if we think the true model is very complex, then the AIC should be considered/examined.
13. Asymptotic Efficiency
a. Is the property that chooses a regression model with one-step-ahead forecast error variances closest to
the variance of the true model. Interestingly, the AIC is asymptotically efficient and the SIC is not
asymptotically efficient.
b.

Ch7 Characterizing Cycles, Diebold Elements of Forecasting


1. Time series is a set of observations for a variable over successive periods of time
2. Trend - a consistent pattern that can be observed when plotting the data, e.g., a seasonal trend during a certain time period. To FORECAST a time series, one needs to understand and characterize its structure.
3. Autoregression - refers to the process of regressing a variable on lagged or past values of itself.
4. Autoregressive (AR) model - when a dependent variable is regressed on one or more lagged values of itself (e.g., past values of sales used to predict the current value of the variable).
5. Covariance stationary - if its mean, variance and covariance with lagged and leading values do not change
over time. Covariance stationary is a requirement of using AR models.
6. Autocovariance function - the tool used to quantify the stability of the covariance structure. Its importance lies in its ability to summarize cyclical dynamics in a series that is covariance stationary.
7. Autocorrelation function - refers to the degree of correlation and interdependency between data points in a time series. It is used because correlations lend themselves to clearer interpretation than covariances.
8. Partial autocorrelation function - refers to the partial correlation and interdependency between data in a time series; it measures the association between data in a series after controlling for the effects of lagged observations.
9. Requirements for series to be covariance stationary a. Constant and finite expected value. (Expected value of a time series is constant over time)
b. Constant and finite variance (the time series' volatility around its mean does not change over time).
c. Constant and finite covariance between values at any given lag (The covariance of the time series with
leading or lagged values is constant)
10. Implications of working with models that ARE NOT covariance stationary a. Covariance stationarity is achieved by working with models that provide special treatment to the trend and seasonality components, which allows the remaining residual, cyclical components to be covariance stationary.
b. A forecasting model whose probabilistic nature changes over time would not lend itself well to predicting the future (it lacks covariance stationarity).
c. Such a process would make the process of characterizing a cycle difficult, if not impossible. However,
nonstationary series can be transformed to appear covariance stationary by using transformed data,
such as growth rates.
11. White Noise - a process with zero mean, constant variance, and no serial correlation is referred to as white
noise. SIMPLEST type of time series process and it is used as a fundamental building block for more complex
time series processes. EVEN THOUGH it is serially uncorrelated, it may not be serially independent or normally distributed.
12. Independent White Noise - time series process that exhibits serial independence and lack of serial correlation
(Strong white noise)
13. Variants of White noise Independent white noise and Normal(Gaussian White noise)
14. Normal (Gaussian) White Noise - time series process that exhibits serial independence, serially uncorrelated
and normally distributed is a normal white noise.
15. Dynamic structure of white noise a. Unconditional mean and variance MUST be constant for any covariance stationary process.
b. The lack of any correlation in white noise means that all autocovariances and autocorrelations are zero beyond displacement zero (displacement refers to the lag, or distance, between observations). The same result holds for the partial autocorrelation function of white noise.
c. Both conditional and unconditional means and variances are the same for an independent white noise.
(lack forecastable dynamics)
d. Events in a white noise process exhibit no correlation between past and present.
16. Lag Operators - quantifies how a time series evolves by lagging data series. It enables a model to express
how past data links to the present and how present data links to the future.
a. E.g., the lag operator: $L y_t = y_{t-1}$
b. E.g., a common lag operator, the first-difference operator: $\Delta y_t = (1 - L) y_t = y_t - y_{t-1}$, which applies a polynomial in the lag operator.


17. Distributed lag - key component of an operator. It is a weighted sum of present and past values in a data
series, achieved by lagging present values upon past values.
18. Wold's representation theorem - a model for the covariance stationary residual (i.e., a model that is constructed after making provisions for trend and seasonal components).
a. The theorem enables the selection of the correct model to evaluate the evolution of covariance stationarity.
b. Wold's theorem utilizes an infinite number of distributed lags, where the one-step-ahead forecast error terms are known as innovations.
19. General linear process a. A component in the creation of forecasting models for a covariance stationary time series. It uses Wold's representation theorem to express innovations that capture an evolving information set. These evolving information sets move the conditional mean over time (recall that a requirement of stationarity is a constant unconditional mean).
b. Thus it can model the dynamics of a time series process that is outside of covariance stationarity (unstable).
20. Rational polynomials a. Applying Wolds Theorem to infinite distributed lags is not practical, therefore, we need to restate this
lag model as infinite polynomials in the lag operator because infinite polynomials do not necessarily
contain an infinite number of parameters.
b. Infinite polynomials that are a ratio of finite-order polynomials are known as rational polynomials.
21. Rational distributed lags - the distributed lags constructed from these rational polynomials. With these lags, we can approximate the Wold representation theorem. An autoregressive moving average (ARMA) process is a practical approximation of Wold's representation theorem.
22. Sample mean and sample autocorrelation
a. Sample Mean - an approximation of the mean of the population, which can be used to estimate the autocorrelation function.
i. $\bar{y} = \dfrac{1}{T}\sum_{t=1}^{T} y_t$
b. Sample Autocorrelation - estimates the degree to which white noise characterizes a series of data. Recall that for a time series to be classified as a white noise process, all autocorrelations must be zero in the population data set. The sample autocorrelation, as a function of displacement $\tau$, is computed as follows:
i. $\hat{\rho}(\tau) = \dfrac{\sum_{t=\tau+1}^{T} \left[(y_t - \bar{y})(y_{t-\tau} - \bar{y})\right]}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$
c. Sample partial autocorrelation can also be used to determine whether a time series exhibits white noise. It differs from sample autocorrelation in that it performs a linear regression on a finite or feasible data series.
i. However, the outcome of sample partial autocorrelations is typically identical to that achieved through sample autocorrelations.
ii. Sample partial autocorrelations usually plot within two-standard-error bands (i.e., a 95% confidence interval) when the time series is white noise.
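A minimal sketch of the sample autocorrelation at displacement tau (simulated data; numpy assumed):

    import numpy as np

    def sample_autocorrelation(y, tau):
        """Sample autocorrelation at displacement tau."""
        y = np.asarray(y, dtype=float)
        y_bar = y.mean()
        num = np.sum((y[tau:] - y_bar) * (y[:len(y) - tau] - y_bar))
        den = np.sum((y - y_bar) ** 2)
        return num / den

    rng = np.random.default_rng(0)
    y = rng.normal(size=500)                 # hypothetical white-noise-like series
    print([round(sample_autocorrelation(y, tau), 3) for tau in range(1, 4)])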
23. Q-statistic - can be used to measure the degree to which autocorrelations vary from zero and whether white noise is present in a dataset.
a. This is done by evaluating the overall statistical significance of the autocorrelations.
b. This statistical measure is approximately Chi-Squared with m degrees of freedom in large samples
under the null hypothesis of no autocorrelations.
24. Box-Pierce Q-statistic a. Reflects the absolute magnitudes of the correlations, because it sums the squared autocorrelations.
b. Thus the signs do not cancel each other out, and large positive or negative autocorrelations
coefficients will result in large Q-statistics.
25. Ljung-Box Q-Statistic a. Is similar to the Box-Pierce Q-statistic, EXCEPT that it replaces the sum of squared autocorrelations with a weighted sum of squared autocorrelations. For large sample sizes, the weights for both statistics are roughly equal. It is used more on small data sets but gives the same results on large data sets.
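A minimal sketch of the Box-Pierce and Ljung-Box Q-statistics (simulated data; numpy and scipy assumed):

    import numpy as np
    from scipy.stats import chi2

    def acf(y, tau):
        ybar = y.mean()
        return np.sum((y[tau:] - ybar) * (y[:len(y) - tau] - ybar)) / np.sum((y - ybar) ** 2)

    rng = np.random.default_rng(1)
    y = rng.normal(size=300)                 # hypothetical series tested for white noise
    T, m = len(y), 10                        # m = number of lags tested

    rho = np.array([acf(y, tau) for tau in range(1, m + 1)])
    Q_bp = T * np.sum(rho ** 2)                                          # Box-Pierce
    Q_lb = T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, m + 1)))    # Ljung-Box
    p_value = chi2.sf(Q_lb, df=m)            # approx. chi-squared with m df under H0: no autocorrelation
    print(Q_bp, Q_lb, p_value)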
Ch8 Modelling Cycles : MA, AR, and ARMA Models , Diebold Elements of Forecasting
1. Moving Average Process
a. Used to capture the relationship between a time series variable and its current and lagged random
shocks.
b. Is a linear regression of the current values of a time series against both the current and previous
unobserved white noise error terms, which are random shocks.
2. MA(1) process - the first-order moving average process has a mean of zero and a constant variance and can be defined as:
a. $y_t = \varepsilon_t + \theta \varepsilon_{t-1}$
b. $\varepsilon_t$ = current random white noise shock
c. $\varepsilon_{t-1}$ = one-period lagged random white noise shock (unobservable shock)
d. $\theta$ = coefficient for the lagged random shock
e. The MA(1) process is considered first order because it only has one lagged error term.
f. This yields very short-term memory because it only incorporates what happens one period ago. If we ignore the lagged error term for a moment and assume $\varepsilon_t > 0$, then $y_t > 0$. This is equivalent to saying that a positive error term will yield a positive dependent variable ($y_t$).
g. When adding back the lagged error term, we are now saying that the dependent variable is impacted not only by the current error term, but also by the previous period's unobserved error term, which is amplified by a coefficient ($\theta$).

3. Autocorrelation cutoff
a. $\rho_1 = \dfrac{\theta}{1+\theta^2}$, where $\rho_\tau = 0$ for $\tau > 1$.
b. For any lag beyond the first lagged error term, the autocorrelation will be zero in an MA(1) process. This is important because it is one condition of being covariance stationary (mean = 0 and constant variance $\sigma^2(1+\theta^2)$), which is a condition of this process being a useful estimator.


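A minimal simulation sketch of an MA(1) process and its autocorrelation cutoff (numpy assumed; theta is a hypothetical value):

    import numpy as np

    rng = np.random.default_rng(42)
    theta, T = 0.6, 10_000
    eps = rng.normal(size=T + 1)
    y = eps[1:] + theta * eps[:-1]           # MA(1): y_t = eps_t + theta * eps_{t-1}

    def acf(x, tau):
        xbar = x.mean()
        return np.sum((x[tau:] - xbar) * (x[:len(x) - tau] - xbar)) / np.sum((x - xbar) ** 2)

    print(theta / (1 + theta ** 2))          # theoretical rho_1
    print(acf(y, 1), acf(y, 2), acf(y, 3))   # rho_1 near theory; rho_2, rho_3 near zero (cutoff)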
4. Moving average representation - has both a current random shock and an unobservable shock on the
independent side of this equation.
a. This presents a problem for forecasting in the real world because it does not incorporate observable shocks. The solution to this problem is known as the autoregressive representation.
5. Autoregressive representation - the MA(1) process is inverted so we have a lagged shock and a lagged value of the time series itself. The condition for inverting the MA(1) process is $|\theta| < 1$. The autoregressive representation, which is an algebraic rearrangement of the MA(1) process, is expressed in the following formula:
a. $\varepsilon_t = y_t - \theta \varepsilon_{t-1}$
b. The process of inversion enables the forecaster to express current observables in terms of past observables.
6. MA(q) process - forecasters can broaden their horizon to a finite-order MA process of order q, which essentially adds lag operators out to the qth observation and potentially improves on the MA(1) process.
a. $y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$
b. $\varepsilon_{t-q}$ = qth-period lagged random white noise shock
c. $\theta_1, \ldots, \theta_q$ = coefficients for the lagged random shocks
d. The MA(q) process theoretically captures complex patterns in greater detail, which can potentially provide for more robust forecasting. This also lengthens the memory from one period to the qth period.
e. The MA(q) process also exhibits autocorrelation cutoff after the qth lagged error term. Again, this is important because covariance stationarity is essential to the predictive ability of the model.
7. First-order autoregressive (AR(1)) process a. More capable of capturing a robust relationship and not in need of being inverted, because it is already in a favorable arrangement compared to the unadjusted moving average process.
b. Must have mean = 0 and constant variance.
c. Specified in the form of a variable regressed against itself in lagged form. This relationship is shown below:
i. $y_t = \phi y_{t-1} + \varepsilon_t$
ii. $\phi$ = coefficient of the lagged observation of the variable being estimated
iii. $\varepsilon_t$ = current random white noise shock
d. Its predictive ability depends on it being covariance stationary, i.e., $|\phi| < 1$.
e. The process allows us to use a past observed variable to predict a current observed variable.
8. Forecasters need a. To accurately estimate the autoregressive parameters, we need to accurately estimate the autocovariance of the data series. Enter the Yule-Walker equation.
9. Yule-Walker Equation a. The Yule-Walker concept is used to solve for the autocorrelations of an AR(1) process.
b. $\rho_\tau = \phi^\tau$ for $\tau = 0, 1, 2, \ldots$
c. Used to reinforce a very important distinction between autoregressive processes and moving average processes.
d. Recall that moving average processes exhibit autocorrelation cutoff, which means autocorrelations are essentially zero beyond the order of the process.
e. The significance of the Yule-Walker equation is that for autoregressive processes, the autocorrelation decays very gradually.
i. E.g., $\phi = 0.65$: 1st period = 0.65, 2nd period = $0.65^2 = 0.4225$, and so on.
f. If the coefficient were negative, we would observe an oscillating autocorrelation decay (a simulation sketch follows).


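A minimal simulation sketch of the Yule-Walker decay for an AR(1) process (numpy assumed; phi is a hypothetical value):

    import numpy as np

    rng = np.random.default_rng(7)
    phi, T = 0.65, 10_000
    eps = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi * y[t - 1] + eps[t]       # AR(1): y_t = phi * y_{t-1} + eps_t

    def acf(x, tau):
        xbar = x.mean()
        return np.sum((x[tau:] - xbar) * (x[:len(x) - tau] - xbar)) / np.sum((x - xbar) ** 2)

    for tau in range(1, 4):
        print(tau, round(acf(y, tau), 3), round(phi ** tau, 3))   # sample ACF vs. phi^tau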
10. AR(p) process (general pth-order autoregressive process) a. Expands the AR(1) process out to the pth observation, as seen below:
i. $y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t$
b. The AR(p) process is also covariance stationary if $|\phi| < 1$, and it exhibits the same decay in autocorrelations that was found in the AR(1) process.
c. While an AR(1) process only evidences oscillation in its autocorrelations (switching from positive to negative)
11. Autoregressive Moving Average Process (ARMA)
a. It is possible for a time series to show signs of both MA and AR processes, and an ARMA model can theoretically capture this richer relationship.
b. A combination of unobservable shocks and lagged (past behavior) values.
c. $y_t = \phi y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}$
d. The process is covariance stationary if $|\phi| < 1$.
e. Autocorrelations still decay gradually.
f. Provides the richest set of combinations for time series forecasting of the three models discussed in this topic.

12. Applications of AR and ARMA processes a. Must pay attention to the decay of the autocorrelation function.
b. Determine if there are any spikes in the autocorrelation function, which may indicate using an AR or ARMA model.
c. E.g., if every 12th autocorrelation jumps up, there may be a seasonality effect.
d. Useful to test various models using regression results (it is easiest to see differences using data that follows some pattern of seasonality).

Ch23 Estimating Volatilities and Correlations for Risk Management , Hull Options Futures and Other
Derivatives 9th ed
1. Weighting Schemes to estimate volatility
a. Equally weighted (average)
b. Autoregressive conditional heteroscedasticity model (ARCH)
c. EWMA model
d. Generalized ARCH: GARCH(1,1)
e. Maximum likelihood estimator
2. Equally weighted (Average)
a. Traditional models first used the change in asset value from period to period. The continuously compounded return over successive days is represented as:
i. $u_i = \ln\left(\dfrac{S_i}{S_{i-1}}\right)$, where $S_i$ = asset price at time i.
ii. $\bar{u} = \dfrac{1}{m}\sum_{i=1}^{m} u_{n-i}$, where n = the present period and m = the number of observations leading up to the current period.
iii. The maximum likelihood estimator of variance, assuming $\bar{u} = 0$, is: $\sigma_n^2 = \dfrac{1}{m}\sum_{i=1}^{m} u_{n-i}^2$
1. In simplest terms, historical data are used to generate returns in an asset pricing series.
2. Historical returns are used to generate the volatility parameter.
3. Can be used to infer expected realizations of risk.
4. Equally weighted by 1/m: all periods have the same weight. If we want to place more weight on recent observations, we can use a:
b. General Weighting Scheme
i. $\sigma_n^2 = \sum_{i=1}^{m} \alpha_i u_{n-i}^2$, where $\alpha_i$ = weight on the return i days ago. Weights must satisfy the condition $\sum_{i=1}^{m} \alpha_i = 1$.
ii. The objective is to give greater influence to recent observations, so the alphas decline in value for older observations (a numerical sketch follows).
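A minimal sketch of the equally weighted variance estimator, assuming the mean return is zero (hypothetical prices; numpy assumed):

    import numpy as np

    # Hypothetical daily closing prices, most recent last
    S = np.array([100.0, 101.2, 100.5, 102.3, 101.8, 103.1, 102.6, 104.0])

    u = np.log(S[1:] / S[:-1])        # continuously compounded daily returns
    m = len(u)
    var_n = np.sum(u ** 2) / m        # equally weighted estimate (all weights equal 1/m)
    print(var_n, np.sqrt(var_n))      # daily variance and volatility estimates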
3. Autoregressive conditional heteroscedasticity model ( ARCH(m) )
a. One extension is to assume a long-run variance through ARCH:
b. $\sigma_n^2 = \gamma V_L + \sum_{i=1}^{m} \alpha_i u_{n-i}^2$, with $\gamma + \sum_{i=1}^{m} \alpha_i = 1$
c. $V_L$ = long-run variance (not weighted)
d. $\omega = \gamma V_L$ = long-run variance weighted by the parameter $\gamma$
e. $\sigma_n^2 = \omega + \sum_{i=1}^{m} \alpha_i u_{n-i}^2$
4. EWMA
a. The model is a specific case of the general weighting model presented in the previous section.
b. The main difference is that the weights are assumed to decline exponentially back through time.
c. $\sigma_n^2 = \lambda \sigma_{n-1}^2 + (1-\lambda) u_{n-1}^2$, where $\lambda$ = weight on the previous volatility estimate and $u_{n-1}$ = return (change) in the previous period.
d. A benefit of EWMA is that it requires few data points.
e. The current estimate of variance will then feed into the next period's estimate, as will this period's squared return.
f. Technically, the only new piece of information for the volatility calculation will be the most recent squared return.

5. Weights of EWMA model - the weight applied to the squared return from i periods ago is $(1-\lambda)\lambda^{i-1}$, so the weights decline exponentially at rate $\lambda$.
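A minimal sketch of the EWMA recursion (hypothetical returns; lambda = 0.94 is an assumed value; numpy assumed):

    import numpy as np

    lam = 0.94                                    # assumed weight on the previous variance estimate
    u = np.array([0.010, -0.012, 0.008, 0.015, -0.007, 0.011])   # hypothetical daily returns

    var = u[0] ** 2                               # seed the recursion with the first squared return
    for r in u[1:]:
        var = lam * var + (1 - lam) * r ** 2      # sigma_n^2 = lam*sigma_{n-1}^2 + (1-lam)*u_{n-1}^2
    print(var, np.sqrt(var))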
6. GARCH(1,1) model - one of the most popular methods of estimating volatility. It not only incorporates the most
recent estimates of variances and squared return, but also a variable that accounts for a long-run average
level of variance.
7. GARCH(p,q) - p is the number of lagged terms on historical returns squared, and q is the number of lagged terms on historical volatility.
8. Weights of the GARCH(1,1) model
a. $\sigma_n^2 = \omega + \alpha u_{n-1}^2 + \beta \sigma_{n-1}^2$
i. $\alpha$ = weighting on the previous period's squared return
ii. $\beta$ = weighting on the previous volatility estimate
iii. $\omega$ = weighted long-run variance = $\gamma V_L$
iv. $V_L$ = long-run average variance = $\dfrac{\omega}{1 - \alpha - \beta}$
v. $\alpha + \beta + \gamma = 1$
vi. $\alpha + \beta < 1$ for stability, so that $\gamma$ is not negative.

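A minimal sketch of one GARCH(1,1) variance update and the implied long-run variance (hypothetical parameter values; numpy assumed):

    import numpy as np

    # Hypothetical GARCH(1,1) parameters (alpha + beta < 1 for stability)
    omega, alpha, beta = 0.000002, 0.13, 0.86

    u_prev = 0.01            # previous period's return
    var_prev = 0.0004        # previous period's variance estimate (2% daily volatility)

    var_n = omega + alpha * u_prev ** 2 + beta * var_prev   # sigma_n^2 update
    V_L = omega / (1 - alpha - beta)                        # implied long-run variance
    print(var_n, np.sqrt(var_n), V_L, np.sqrt(V_L))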
9. Set GARCH parameters to become an EWMA model.


a. $\omega = 0, \quad \alpha = 1 - \lambda, \quad \beta = \lambda$

b. GARCH model adds to the information generated by the EWMA model in that it also assigns a
weighting to the average long-run variance estimate.
c. GARCH also has an additional characteristic which is the implicit assumption that variance tends to
revert to a long-term average level.
d. Recognition of a mean-reverting characteristic in volatility is an important feature when pricing
derivative securities such as options.
10. Mean Reversion
a. Experience indicates that volatility exhibits a mean-reverting characteristic. A GARCH model tends to
display a better theoretical justification than the EWMA model.
b. Methods for estimating GARCH parameters (or weights), however, often generate outcomes that are not consistent with the model's assumptions.
c. The sum of the weights α and β is sometimes greater than 1, which causes instability in the volatility estimation.
d. In this case, the analyst must resort to using the EWMA model.
e. If the model is to be stationary over time, the persistence (the sum α + β) must be less than one (with respect to reversion to the mean).
f. The PERSISTENCE describes the rate at which the volatility will revert to its long-term value following a large movement.
g. The HIGHER the persistence (given that it is less than one), the longer it will take to revert to the mean following a shock or large movement.
h. A persistence of 1 means that there is no reversion, and with each change in volatility a new level is attained.
11. Volatility using Maximum Likelihood Estimator
a. MLE selects values of model parameters that maximize the likelihood that the observed data will occur
in the sample.
b. Requires formulating an expression or function for the underlying probability distribution of the data
and then searching for the parameters that maximize the value generated by the expression.
12. GARCH Estimation
a. Estimated using maximum likelihood techniques.
b. Estimation begins with a guess of the model's parameters.
c. Then a calculation of the likelihood function based on those parameter estimates is made.
d. The parameters are then slightly adjusted until the likelihood function fails to increase, at which point the estimation process assumes it has maximized the function and stops.
e. The values of the parameters at the point of maximum value in the likelihood function are then used to
estimate GARCH model volatility.
13. Volatility term structure a. GARCH does a good job of modeling volatility clustering, where periods of high volatility tend to be followed by other periods of high volatility and periods of low volatility tend to be followed by subsequent periods of low volatility.
b. Thus, there is autocorrelation in $u_i^2$.
c. If the GARCH model does a good job of explaining volatility changes, there should be very little autocorrelation in $u_i^2 / \sigma_i^2$.
d. Does a good job at forecasting volatility from a term structure perspective (time to maturity) Even
though the actual volatility term structure figures are somewhat different from those forecasted by
GARCH models, GARCH generated volatility data does an excellent job in predicting how the volatility
term structure responds to changes in volatility.
14. Impact of volatility changes
15. Estimating covariances and correlations a.

16. Consistency condition for covariances


Ch 4 Fabozzi Simulation Modelling, Fabozzi Simulation and Optimization in Finance.
1. Simulation models generate random inputs that are assumed to follow a probability distribution.
a. With inputs, simulation model then generates scenarios or trials based on probabilities associated with
probability distributions.
b. The last step is to analyze the characteristics of the probability-generated inputs (samples): mean, variance, skewness, confidence intervals.
c.
2. Choosing Simulation models a. Bootstrapping Technique
b. Parameter Estimate Technique
c. Best Fit technique
d. Subjective Guess Technique
3. Bootstrapping technique -
4. Parameter Estimate Technique -
5. Best-Fit Technique -
6. Subjective Guess Technique -
7. Monte Carlo Simulation -
8. Advantages of Simulation Modeling -
9. Correlations can be incorporated into simulation modeling -
10. Relationship between Accuracy and Number of Scenarios
11. Estimator Bias
12. Discretization Error Bias
13. Identifying Most Efficient Estimator -
14. Inverse Transform Method -
15. Inverse Transform Method for Discrete Distributions -
16. Inverse Transform Method for Continuous Distributions -
17. Hazard Rate -
18. Pseudorandom Number Generators
19. Seed random numbers -
20. Midsquare technique -
21. Congruential pseudorandom number generator -
22. Linear congruential pseudorandom generator -
23. Quasirandom Sequences / Low-discrepancy Sequences -
24. Stratified Sampling -
25. Latin Hypercube Sampling Method -
26. Generalized permutation matrix -
