Cfal1 QM

CFA Level 1 – Quantitative Methods
CFA Level 1 - Quantitative Methods
Session 2
Time Value of Money
Statistical Concepts & Market Returns
Probability Concepts
Common Probability Distributions
Session 3
Sampling & Estimation
Hypothesis Testing
Correlation and Regression
Correlation & Regression
SESSION 2
Reading 1.A: Time Value of Money
LOS
a: Calculate the future value (FV) and present value (PV) of a single sum of money, an ordinary
annuity, and an annuity due.
Future Value Example:

Using a financial calculator, with FV of a $300 investment (PV), given you earn a compound rate of
return (I/Y) of 8% over a 10-year (N) period of time:
N = 10, I/Y = 8, PV = 300; CPT FV = $647.68 (ignore the sign).
Present Value Example:
Using a financial calculator, with PV of a $1,000 cash flow (FV) to be received in 5 (N) years, given a
discount rate of 9% (I/Y).
N = 5, I/Y = 9, FV = 1,000; CPT PV = $649.93 (ignore the sign).
Calculate the FV of an ordinary annuity: Find the FV of an ordinary annuity that will pay $150 per
year at the end of each of the next 15 years, given the investment is expected to earn a 7% rate of
return.
N = 15, I/Y = 7%, PMT = $150; CPT FV = $3,769.35 (ignore the sign).
The time line for the cash flows in this problem is depicted below.
1
Calculate the FV of an annuity due: Find the FV of an annuity due that will pay $100 per year for each
of the next three years, given the cash flows can be invested at an annual rate of 10%.
Note: you MUST put your calculator in the beginning of year mode (BGN)
N = 3, I/Y = 10%, PMT = $100; CPT FV = $364.10 (ignore the sign).
Calculate the PV of an ordinary annuity: Find the PV of an annuity that will pay $200 per year at the
end of each of the next 13 years, given a 6% rate of return.
N = 13, I/Y = 6, PMT = 200; CPT PV = $1,770.54
Calculate the PV of an annuity due: Find the PV of a 3-year annuity due that will make a series of $100
beginning of year payments, given a 10% discount rate.
BGN mode: N = 3, I/Y = 10, PMT = 100; CPT PV = $273.55
LOS b: Calculate the PV of a perpetuity.
Example: Assume a certain preferred stock pays $4.50 per year in annual dividends (and they're
expected to continue indefinitely). Given an 8% discount rate, what's the PV of this stock?
PVperpetuity = PMT / I/Y
PVperpetuity = 4.50 / .08 = $56.25
This means that if the investor wants to earn an 8% rate of return, she should be willing to pay $56.25
for each share of this preferred stock.
LOS c: Calculate an unknown variable, given the other relevant variables, in single-sum problems,
annuity problems, and perpetuity problems.
Example 1: Solving for I/Y
2
In this example, you want to find the rate of return (I/Y) that you'll have to earn on a $500 investment
(PV) in order for it to grow to $2,000 (FV) in 15 years (N).
N = 15, PV = -500, FV = 2,000; CPT I/Y = 9.68%
Example 2: Solving for PMT
In this example, you want to find out the dollar amount of payments (PMT) it will take at 7% (I/Y) to
achieve $3,000 (FV) in 15 years (N).
N = 15, I/Y = 7, FV = 3,000; CPT PMT = -$119.38 (ignore sign)
LOS d: Calculate the FV and PV of a series of uneven cash flows.
FV Example: Given a 10% discount rate and cash flows of (starting with year 1) -1,000, -500, 0, 4,000,
3,500, and 2,000, compute the FV.
PV = -1,000, I/Y = 10, N = 5, CPT FV = -1,610.51
PV = -500, I/Y = 10, N = 4, CPT FV = -732.05
PV = 0, I/Y = 10, N = 3, CPT FV = 0
PV = 4,000, I/Y = 10, N = 2, CPT FV = 4,840.00
PV = 3,500, I/Y = 10, N =1 , CPT FV = 3,850.00
PV = 2,000, I/Y = 10, N = 0, CPT FV = 2,000.00
Total cash flow stream = $8,347.44
PV Example: Given a 10% discount rate and cash flows of (starting with year 1) -1,000, -500, 0, 4,000,
3,500, and 2,000, compute the PV.
FV = -1,000, I/Y = 10, N = 1, CPT PV = -909.09

FV = -500, I/Y = 10, N = 2, CPT PV = -413.22
FV = 0, I/Y = 10, N = 3, CPT PV = 0
FV = 4,000, I/Y = 10, N = 4, CPT PV = 2,732.05
FV = 3,500, I/Y = 10, N =5 , CPT PV = 2,173.22
FV = 2,000, I/Y = 10, N = 6, CPT PV = 1,128.95
Total cash flow stream = $4,711.91
LOS e: Solve time value of money problems when compounding periods are other than annual.
Example: PV = $100, N = 1 year, I = 12%. Find the FV for various compounding periods.
Annual: N=1 I = 12% PV = $100 CPT FV = $112.00
3
Semi-annual: N=2 I = 6% PV = $100 CPT FV = $112.36
Quarterly: N= 4 I = 3% PV = $100 CPT FV = $112.55
Monthly: N = 12 I = 1% PV = $100 CPT FV = $112.68
Daily: N = 365 I = .03287 PV = $100 CPT FV = $112.74
Continuous: FV = (PV) e(i rate)(n) =100e(.12)(1) CPT FV = $112.75
In the continuous compounding equation, the interest rate is the stated or nominal annual rate.
Example: Given a 10% annual rate paid quarterly; PV = 500; time is 5 years; CPT FV.
Solve: I/Y = 10/4 = 2.5; N = 5*4 = 20; PV = 500: CPT FV = $819.31
LOS f: Distinguish between the stated annual interest rate and the effective annual rate.
The stated rate of interest is known as the nominal rate, and represents the contractual rate. The
periodic rate, in contrast, is the rate of interest earned over a single compound period - e.g., a stated
(nominal) rate of 12%, compounded quarterly, is equivalent to a periodic rate of 12/4 = 3%. The true
rate of interest is known as the effective rate and represents the rate of return actually being earned,
after adjustments have been made for different compounding periods.
LOS g: Calculate the effective annual rate, given the stated annual interest rate and the frequency of
compounding.
Example: Compute the effective rate of 12%, compounded quarterly. Given m = 4, and periodic rate =
12/4 = 3%.
Effective rate = (1 + periodic rate)m - 1
Where m = the number of compounding periods in a year.
(1 + .03)4 - 1 = 1.1255 - 1 = 12.55%
LOS h: Draw a time line, specify a time index, and solve problems involving the time value of money
as applied to mortgages, credit card loans, and saving for college tuition or retirement.
Example: Paying off a Loan (or Mortgage)
A company wants to borrow $50,000 for five years. The bank will lend the money at a 9% rate of
interest and will require that the loan be paid off in five equal, annual (end-of-year) installment
payments. What are the annual loan payments that this company will have to make in order to pay off
this loan?
N = 5, I/Y = 9, PV = 50,000; CPT PMT = $12,854.62
Example: Loan Amortization
4
An individual borrows $10,000 at 10% today amortized over 5 years. What are his payments?
PV = 10,000, N = 5, I/Y = 10; CPT PMT = $2,637.97
Example: Funding a Retirement Program
A 35-year old investor wants to retire in 25 years at age 60. Given he expects to earn 12.5% on his
investments prior to his retirement, and then 10% thereafter, how much must he deposit annually (at the
end of each year) for the next 25 years in order to be able to withdraw $25,000 per year (at the
beginning of each year) for the next 30 years?
This is a two-part problem. First, use PV to compute the present value of the 30-year, $25,000 annuity
due and second, use FV to find the amount of the fixed annual deposits that must be made at the end
of the first 25-year period to come up with the needed funds.
Step 1: N = 29, I/Y = 10, PMT = 25,000; CPT PV = 234,240 + 25,000 = $259,240
Step 2: N = 25, I/Y = 12.5, FV =259,240; CPT PMT = $1,800.02
5
SESSION 2
Reading 1.B: Statistical Concepts and Market Returns
LOS a: Differentiate between a population and a sample.
A population is defined as all members of a specified group. Any descriptive measure of a population
characteristic is called a parameter.
A sample is defined as a portion, or subset of the population of interest. Once the population has been
defined, we can take a sample of the population with the view of describing the population as a whole.
LOS b: Explain the concept of a parameter.
Any descriptive measure of a population characteristic is called a parameter.
LOS c: Explain the differences among the types of measurement scales.
 Nominal scale: Observations are classified or counted with no particular order.
 Ordinal scale: All observations are placed into separate categories and the categories are
placed in order with respect to some characteristic.
 Interval scale: This scale provides ranking and assurance that differences between scale
values are equal.
 Ratio scale: These represent the strongest level of measurement. In addition to providing
ranking and equal differences between scale values, ratio scales have a true zero point as the
origin.
LOS d: Define and interpret a frequency distribution.
A frequency distribution is a grouping of raw data into categories (called classes) so that the number
of observations in each of the nonoverlapping classes can be seen and tallied. The purpose of
constructing a frequency distribution is to group raw data into a useable visual framework for analysis
and presentation.
LOS e: Define, calculate, and interpret a holding period return.
Holding period return (HPR) measures the total return for holding an investment over a certain period
of time, and can be calculated using the following formula:
HPR = [Pt - Pt - 1 + Dt ] / P t - 1
6
Where: Pt = price per share at the end of time period t, and Dt = cash distributions received during time
period t.
Example: A stock is currently worth $60. If you purchased the stock exactly one year ago for $50 and
received a $2 dividend over the course of the year, what is your HPR?
(60 - 50 + 2) / 50 = 24%
LOS f: Define and explain the use of intervals to summarize data.
 Define the intervals. An interval (classes) is the set of return values within which an
observation falls. Each observation falls into only one interval, and the total number of intervals
covers the entire population. Intervals must be all-inclusive and non-overlapping.
 Tally the observations.
 Count the number of observations.
LOS g: Calculate relative frequencies, given a frequency distribution.
Relative frequency is calculated by dividing the frequency of each return interval by the total number of
observations. Simply, relative frequency is the percentage of total observations falling within each
interval.
LOS h: Describe the properties of data presented as a histogram or a frequency polygon.
A histogram is the graphical equivalent of a frequency distribution. It is a bar chart of continuous data
that has been grouped into a frequency distribution. The advantage of a histogram is that we can
quickly see where most of the observations lie.
To construct a frequency polygon, we plot the midpoint of each interval on the horizontal axis and the
absolute frequency for that interval on the vertical axis. Each point is then connected with a straight line.
LOS i: Define, calculate, and interpret measures of central tendency, including the population mean,
sample mean, arithmetic mean, geometric mean, weighted mean, median, and mode.
A population mean is the entire group of objects that are being studied. To find the population's mean,
sum up all the observed values in the population (sum X) and divide this sum by the number of
observations (N) in the population.
A sample mean is sum of all the values in a sample of a population divided by the number of values in
the sample. The sample mean is used to make inferences about the population mean.
7
Arithmetic mean is the sum of the observation values divided by the number of observations. It is the
most widely used measure of central tendency, and is the only measure where the sum of the
deviations of each value from the mean is always zero.
Geometric mean is often used when calculating investment returns over multiple periods, or to find a
compound growth rate. It is computed by taking the nth root of the product of n values.
Weighted mean is a special case of the mean that allows different weights on different observations.
The median is the mid-point of the data when the data is arranged from the largest to the smallest
values. Half the observations are above the median and half are below the median. To determine the
median, arrange the data from highest to the lowest and find the middle observation.
The mode of a data set is the value of the observation that appears most frequently.
LOS j: Describe and interpret quartiles, quintiles, deciles, and percentiles.
 Quartiles: The distribution is divided into quarters.
 Quintiles: The distribution is divided into fifths.
 Decile: The distribution is divided into tenths.
 Percentile: The distribution is divided into hundredths (percents).
LOS k: Define, calculate, and interpret (1) a portfolio return as a weighted mean, (2) a weighted
average or mean, (3) a range and mean absolute deviation, and (4) a sample and a population variance
and standard deviation.
Refer to LOS 1.B.i for a review of weighted mean and weighted average.
 Range is the distance between the largest and the smallest value in the data set.
 Mean absolute deviation (MAD) is the average of the absolute values of the deviations of
individual observations from the arithmetic mean.
 Population variance is the mean of the squared deviations from the mean. The population
variance is computed using all members of a population.
 Population standard deviation is the square root of the population variance.
 Sample variance applies when we are dealing with a subset, or sample of the total population.
 Sample standard deviation can be found by taking the positive square root of the sample
variance.
LOS l: Calculate the proportion of items falling within a specified number of standard deviations of the
8
mean, using Chebyshev's inequality.
Chebyshev's inequality states that for any set of observations, the proportion of the observations within
k standard deviations of the mean is at least 1 - 1/k2 for all k > 1.
Chebyshev's inequality states that for any distribution, approximately:
 36% of observations lie within +/- 1.25 standard deviations of the mean
 56% of observations lie within +/- 1.50 standard deviations of the mean
 75% of observations lie within +/- 2 standard deviations of the mean
 89% of observations lie within +/- 3 standard deviations of the mean
 94 of observations lie within +/- 4 standard deviations of the mean
Example: What is the approximate percentage of any distribution that lie within +/- 2 standard
deviations of the mean?
Answer: Applying Chebyshev’s inequality, we have:
1 -1/k2 = 1-1/22 = 1-1/4 = 0.75 or 75%.
LOS m: Define, calculate, and interpret the coefficient of variation.
The coefficient of variation expresses how much dispersion exists relative to the mean of a
distribution and allows for direct comparison of dispersion across different data sets.
CV = [standard deviation of returns] / [expected rate of return]
Example: Investment A has an ER of 7% and an s of .05. Investment B has an ER of 12% and an s of .

07. Which is riskier?
A’s CV is .05/.07 = .714
B’s CV is .07/.12 = .583
A has .714 units of risk for each unit of return while B has .583 units of risk for each unit of return. A is it
has more risk per unit of return.
LOS n: Define, calculate, and interpret the Sharpe measure of risk-adjusted performance.
The Sharpe measure seeks to measure excess return per unit of risk. The numerator of the Sharpe
measure recognizes the existence of a risk-free return. You want a large Sharpe measure.
Example: The mean monthly return on T-bills is 0.25%. The mean monthly return on the S&P 500 is
1.30% with a standard deviation of 7.30%. Calculate the Sharpe measure for the S&P 500 and interpret
the results.
Sharpe measure = (1.30 - 0.25) / 7.30 = 0.144
9
The S&P 500 earned 0.144% of excess return per unit of risk, where risk is measured by standard
deviation.
LOS o: Describe the relative locations of the mean, median, and mode for a nonsymmetrical
distribution.
 For a symmetrical distribution, the mean, median, and mode are equal.
 For a positively skewed distribution, the mode is less than the median, which is less than the
mean. Recall that the mean is affected by outliers. In a positively skewed distribution, there are
large, positive outliers which will tend to "pull" the mean upward.
 For a negatively skewed distribution, the mean is less than the median, which is less than the
mode. In this case, there are large, negative outliers which tend to "pull" the mean downward.
The relative location of the mean, median, and mode for different distribution shapes is shown below:
LOS p: Define and interpret skewness and explain why a distribution might be positively or negatively
skewed.
Skewness refers to a distribution that is not symmetrical.
10
 A positively skewed distribution is characterized by many outliers in its upper or right tail.
Recall that an outlier is defined as an extraordinarily large outcome in absolute value. Positively
skewed distributions have long right tails.
 A negatively skewed distribution is the opposite of a positively skewed distribution. A

negatively skewed distribution has a disproportionately large amount of outliers on its left side. In
other words, a negatively skewed distribution is said to have a long tail on its left side.
LOS q: Define and interpret kurtosis and explain why a distribution might have positive excess
kurtosis.
Kurtosis deals with whether or not a distribution is more or less "peaked" than a normal distribution.
 A distribution that is more peaked than normal is leptokurtic. A leptokurtic return

distribution will have more returns clustered around the mean and more returns with large
deviations from the mean (fatter tails).
 A distribution that is less peaked, or flatter than normal is said to be platykurtic.
For all normal distributions, kurtosis is equal to three.
As indicated in the figure below, a leptokurtic return distribution will have more returns clustered around
the mean and more returns with large deviations from the mean (fatter tails).
LOS r: Describe and interpret measures of skewness and kurtosis.
There are two measures of skewness:
 Absolute skewness is computed in the same way as the variance except that cubed
deviations from the mean are used instead of squared deviations.
 Relative skewness is equal to absolute skewness divided by the cubed standard

deviation.
If relative skewness is equal to zero, then the data is NOT skewed. For kurtosis, positive values indicate
a distribution that is leptokurtic and negative values indicate a platykurtic distribution.
11
12
SESSION 2
Reading 1.C: Probability Concepts
LOS
a: Define a random variable, an outcome, an event, mutually exclusive events, and exhaustive events.
A random variable is a quantity whose outcomes are uncertain. A realized random variable is a
number associated with the outcome of an experiment. An event is a set of outcomes, or a specified
outcome. Mutually exclusive means that only one even can occur at a time. Exhaustive means that
the set of events includes all possible outcomes.
LOS b: Explain the two defining properties of probability.
 The probability of any event "i" is between zero and one.
 If a set of events: E1, E2, ....En, are mutually exclusive and exhaustive, then the sum of the
probabilities of those events equals one.
LOS c: Distinguish among empirical, subjective, and a priori probabilities.
We can assign probabilities to events three ways:
 We calculate an empirical probability by analyzing past data.
 We calculate an a priori probability by using formal reasoning and inspection.
 A subjective probability is less formal and involves personal judgment.
LOS d: Describe the investment consequences of probabilities that are inconsistent.
With respect to investment opportunities, when two assets are price based upon different probabilities
being assigned to the same event, this is called inconsistent probabilities. It is best defined by a
general example.
Example: Event E will increase the return of both stock A and B. The price of stock A incorporates a
higher probability of E than does stock B. All other things equal, stock A is overpriced when compared to
stock B. Therefore, an investor should lower holdings of stock A and increase holdings of stock B. An
investor that is not too risk averse might engage in a pairs arbitrage trade, where he/she short sells A
and uses the proceeds to buy stock B.
LOS e: Distinguish between unconditional and conditional probabilities.
An unconditional probability is also called a marginal probability, and it is the most basic type of
probability. It is the probability of an event where the occurrence of other events is not important.
13
A conditional probability is one where the knowledge of some other event is important. The key thing
to look for is "the probability of A given B." This is noted by a vertical bar symbol.
LOS f: Define a joint probability.
An joint probability is the probability that both events occur at the same time, but neither is certain or a
given. We write the probability of A and B as P(AB). Unless both A and B occur, it does not qualify as
the event "A and B."
LOS g: Calculate, using the multiplication rule, the joint probability of two events.
The multiplication rule for probabilities: P(AB) = P(A I B) * P(B)
We can manipulate this to give the following representation for a conditional probability:
P (A I B) = P(AB) / P(B)
Example: Assume the probability of increasing interest rates "I" is 40%, the probability of recession "R"
given increasing interest rates is 70%, and the probability of "R" without an increase in interest rates is
10%.
Since these events are mutually exclusive and exhaustive, they are called complements.
P(IC) = 1 - P(I) = .60
What is the probability of "recession and an increase in interest rates?"
P(RI) = P(R given I) * P(I) = 0.7 x 0 .4 = 0.28
A diagram such as the one below helps to organize the relationship between unconditional, conditional,
and joint probabilities
LOS h: Calculate,using the addition rule, the probability that at least one of two events will occur.
The general rule of addition states that if two events A and B are not mutually exclusive then you
must account for the joint probability of events. That is the possibility that the two events will occur at
14
exactly the same time. Joint probability is shown by the overlap of the occurrence circles in the
traditional Venn diagram shown below.
P (A or B) = P (A) + P(B) - P(A and B) , where P(A and B) is the joint probability of A and B.
The joint probability [P(A and B)] is defined as the probability that measures the likelihood that 2 or
more events will happen concurrently.
LOS i: Distinguish between dependent and independent events.
Independent events are a list of events where knowledge of one has no influence on the other. That is
easily expressed using conditional probabilities. A and B are independent if:
P (A I B) = P (A), and P(B I A) = P(B)
If this condition is not satisfied, the events are dependent events (i.e., the occurrence of one is
dependent on the occurrence of the other).
LOS j: Calculate a joint probability of any number of independent events.
The multiplication rule for independent events is:
P(A I B) x P(B) = P(A) x P(B) = P(AB)
P(B I A) x P(A) = P(B) x P(A) = P(AB)
Example: On the roll of two dice, the probability of getting two "4s" is:
P(4 on the first die and 4 on the second die) = P(4 on first die) x P(4 on second die)
P(4 on the first die and 4 on the second die) = (1/6) x (1/6) = 1/36 = 0.0278
Hint: When dealing with independent events, the word "and" indicates multiplication, and the word "or"
indicates addition.
LOS k: Calculate, using the total probability rule, an unconditional probability.
15
The total probability rule highlights the relationship between unconditional and conditional
probabilities of mutually exclusive and exhaustive events. It is used to explain the unconditional
probability of an event in terms of probabilities that are conditional upon other events. The total
probability rule is used to demonstrate how joint probabilities tie in with unconditional probabilities.
Example: Assume that a recession can only occur with either two events, an interest rate increase "I",
or interest rates staying the same "IC." Furthermore, assuming that the events "I" and "IC" are mutually
exclusive and exhaustive, a recession can only occur with either of these two events. In that case, the
sum of these two joint probabilities is the unconditional probability of a recession.
P(R) = P(R given I) * P(I) + P(R given IC) * P(IC)
LOS l: Define, calculate, and interpret expected value, variance, and standard deviation.
The expected value is the probability-weighted average of the possible outcomes of the random
variable.
E(X) = ∑xi*P(xi) = x1*P(x1) + x2*P(x2) + …+ xn*P(xn)
Here, the "E" denotes expected value. The symbol x1 is the first realization of random variable X. The
symbol x2 is the second realization, etc. In the long run, the realizations should average to the expected
value. This is most easily seen using the a priori probabilities associated with a coin toss.
For a fair coin where P(head) = P(X = 1) = 0.5 and P(tail) = P(X = 0) = 0.5, the probability weighted
average or expected value is:
E(X) = P(X = 0) * 0 + P(X = 1) = 0.5
For the coin flip, X cannot assume a value of 0.5 in any single experiment. Over the long term, however,
the average of all the outcomes should be 0.5.
The variance is the expected value of the squared deviations of each observation from the random
variable's expected value. As an expected value, the variance uses the probability of each observation
xi to weight the associated squared deviation: [xi - E(X)]2. The formula for variance is:
σ2(X) = ∑[xi - E(X)]2 * P(xi)
The standard deviation is the positive square root of the variance. It may be represented by σ(X) or
just σ.
LOS m: Explain the use of conditional expectation in investment applications.
Example: Historically, on the day a company announces earnings, the stock has fallen three percent,
increased one percent, or increased four percent, depending on whether the earnings announcement
fell short of, met, or exceeded the earnings forecast, respectively.
16
Suppose the probability of falling short, meeting or exceeding expectations depends upon some
external event like weather conditions. If the weather in the relevant time period has been "good," the
probabilities are P(fall short | good) = 0.10, P(meet | good) = 0.50, P(exceed | good) = 0.40. If the
weather has been "poor," the corresponding probabilities are 0.3, 0.4, 0.3. For each type of weather, the
conditional expected value is:
Good weather: E(X | good) = (-0.03 x 0.10) + (0.01 x 0.50) + (0.04 x 0.40) = 0.018
Poor weather: E(X | poor) = (-0.03 x 0.30) + (0.01 x 0.40) + (0.04 x 0.30) = 0.007
LOS n: Calculate an expected value using the total probability rule.
The total probability rule for expected value says that the unconditional probability is the weighted
average of the conditional probabilities.
Example: Continuing with our good vs. poor weather example from LOS 2.C.m:
E(X) = E(X I good) * P(good) + E(X I poor) * P(poor)
E(X) = [.018 * 0.5] + [.007 * 0.5] = .0125
We can apply this procedure to any set of mutually exclusive and exhaustive scenarios S1, S2,...Sn. The
total probability rule for expected value is then represented by:
E(X) = E(X1 I S1) * P(S1) + E(X2 I S2) * P(S2) + E(X3 I S3) * P(S3) +...+ E(Xn I Sn) * P(Sn)
LOS o: Define, calculate, and interpret covariance and correlation.
The covariance is the most basic measure of how two assets move together. The covariance is the
expected value of the product of the deviations of the two random variables around their respective
means. Since the formula is often applied to the returns of assets, the formula below has been written in
terms of the covariance of the return of asset "i" and the return of asset "j:"
Cov(Ri,Rj) = E{(Ri - E(Ri)] * [Rj - E(Rj)]}
The covariance is difficult to interpret because it can take on large values and is measured in units
squared.
The correlation coefficient measures the strength of the relationship between the two variables.
Correlation can only take on values from -1 to +1. The correlation coefficient is equal to the covariance /
σ1 x σ2
A correlation of +1 indicates a strong positive relationship. A correlation of -1 indicates a strong negative

correlation.
LOS p: Explain the relationship among covariance, standard deviation, and correlation.
17
The relationship between these three is that given two of the three, you can calculate the third - be it
covariance, standard deviation, or correlation. Go back to LOS 1.C.o to see this relationship.
Correlation coefficient = Cov / [σ1 x σ2]
LOS q: Calculate the expected return and the variance for return on a portfolio.
An analyst can determine the expected value and variance of a portfolio of assets using the
corresponding properties of the assets in the portfolio. To do this, we must first introduce the concept of
portfolio weights:
wi = market value of investment in asset i / market value of the portfolio
For the exam, memorize the formula for the two-stock portfolio:
Var(Rp) = [wA2 x Var(RA)] + [wB2 x Var(RB)] + 2 x wA x wB x Cov(RA,RB)
LOS r: Calculate covariance given a joint probability function.
Example: Assume the joint probabilities given below. The asset weights are wA = 0.40 and wB = 0.60,
with and E(RA) = 0.13 and E(RB) = 0.18. Further assume that the variances are Var(RA) = 0.0030 and
Var(RB) = 0.0156.
Joint Probability Table
Joint probabilities RB = 0.40 RB = 0.20 RB = 0.00
RA = 0.20 0.15 0 0
RA = 0.15 0 0.60 0
RA = 0.04 0 0 0.25
Cov (RA, RB) = 0.15(0.20 - 0.13)(0.40 - 0.18) + 0.60(0.15 - 0.13)(0.20 - 0.18) + 0.25(0.04 - 0.13)(0.00 -
0.18) = 0.0066
LOS s: Calculate an updated probability, using Bayes' formula.
Bayes' formula says that given a set of prior probabilities for an event of interest, if you can receive
new information, the rule for updating the probability of the event is:
Updated probability = {probability of new information given event / unconditional probability of

new information}{prior probability of event}.
Example: Electcomp Corporation manufactures electronic components for computers and other
devices. There is speculation that Electcomp will soon announce a major expansion into overseas
markets. Electcomp would only do this if its managers estimated the demand to be sufficient to support
18
the sales. If demand is sufficient, Electcomp would also be more likely to raise prices. For ease of
notation, let expand overseas = "O" and let increase prices = "I."
An industry analyst determines the following probabilities:
P(I) = 0.3 and P(IC) = 0.7
P(O given I) = 0.6 and P(O given IC) = 0.4
The probabilities P(I) and P(IC) are also called the priors. Bayes' formula now allows us to compute P(I
given O) where this is the updated probability given new information about "I." We recall the following
formulas:
Conditional probability: P(I given O) = P(IO) / P(O)
Joint probability: P(IO) = P(O given I) * P(I)
For this case, Bayes' formula becomes:
P(I given O) = {P(O given I) / P(O)} * P(I)
Use the total probability rule to compute P(O):
P(O) = P(O given I) * P(I) + P(O given IC) * P(IC)
P(O) = 0.6 * 0.3 + 0.4 * 0.7 = 0.46
P(I given O) = {.6 / .46} * .3 = .3913
If the new information of expand overseas is announced, the prior probability of P(I) = 0.30 must be
updated with the new information to give P(I given O) = .3913.
LOS t: Calculate the number of ways a specified number of tasks can be performed using the
multiplication rule of counting.
The multiplication rule of counting applies when we have a list of k choices, and each choice "i" has
ni ways of being chosen. The number of total choices is n1 x n2 x...xnk.
Example: An analyst is interested in whether a firm raises prices, lowers prices, or keeps prices the
same. The analyst also is interested in whether the firm expands overseas. Each of these represent
separate choices that can occur in different ways: n1 = 3 and n2 = 2. This gives n x n2 = 2 x 3 = 6
possible ways of arranging the choices. The list of possible pairs of choices is: (raise prices, expand),
(raise prices, not expand), (lower price, expand), (lower prices, not expand), (keep prices the same,
expand), (keep prices the same, not expand).
If a third item is up for consideration for which there are n3 choices, then there will be n1 x n2 x n3 ways
of arranging the items.
LOS u: Solve counting problems using the factorial, combination, and permutation notations.
19
Labeling is where there are n items of which each can receive one of k labels. The number of items
that receive label "1" is n1 and the number that receive label "2" is n2, etc. Also, the following relationship
holds:
n1 + n2 + n3 +...+nk = n.
The number of ways labels can be assigned is:
n! / [(n1!) * (n2!) *...*(nk!)]
On your TI financial calculator, factorial is [2nd], [x!].
Example: A portfolio consists of eight stocks. The goal is to designate four of the stocks as "long-term
holds," designate three of the stocks as "short-term holds," and designate one stock a "sell." How many
ways can these labels be assigned to the eight stocks? The answer is (8!) / (4!x3!x1!) = (40,320) /
(24x6x1) = 280.
LOS v: Distinguish among problems for which different counting methods are appropriate.
The multiplication rule of counting is used when there are two or more groups. The key is that we
choose only one item from each group.
Factorial is used by itself when there are no groups - we are only arranging a given set of n items.
Given n items, there are n! ways of arranging them.
The labeling formula applies to three or more sub-groups of predetermined size. Each element of the
entire group must be assigned a place or label in one of the three or more sub-groups.
The combination formula applies to only two groups of predetermined size. Look for the word
"choose" or "combination."
The permutation formula applies to only two groups of predetermined size. Look for a specific
reference to "order" being important.
LOS w: Calculate the number of ways to choose r objects from a total of n objects, when the order in
which the r objects is listed does or does not matter.
A special case of this labeling is when k = 2. That is that n items can only be in one of two groups and
n1 + n2 = n. In that case, we can let r = n1 and n2 = n - r. Since there are only two categories, we usually
talk about choosing r items. Then (n - r) are not chosen. The general formula for labeling when k = 2 is
called the combination formula (or binomial formula):
n! / (n - r)! x r!
Example: In a portfolio of eight stocks, we decide to sell three stocks. How many ways can we choose
three of the eight to sell? The answer is 8! / (5! x 3!) = 56.
20
21
SESSION 2
Reading 1.D: Common Probability Distributions
LOS
a: Explain a probability distribution.
A probability distribution specifies the probabilities of all the possible outcomes of a random variable.
Those outcomes may be discrete or continuous.
LOS b: Distinguish between and give examples of discrete and continuous random variables.
A discrete random variable is one where we can list all the possible outcomes and for each possible
outcome there is a measurable, positive probability. An example of a discrete random variable is the
number of days it rains in a given month. A continuous distribution would define the probabilities for the
actual amount of rainfall. A continuous random variable is one where we cannot list the possible
outcomes because we can always list a third number between any two numbers on the list. The number
of outcomes is essentially infinite even if lower and upper bounds exist. The number of points between
the lower and upper bounds are essentially infinite.
LOS c: Describe the range of possible outcomes of a specified random variable.
In the discrete case, such as the number of days of rain in a month, there are a finite number of
outcomes as defined by the number of days in the month. In the continuous case, such as the amount
of rainfall, the outcome can be recorded out to many decimal places. We say the probability of two
inches of rainfall is essentially zero because it is a single point. We must talk about the probability of the
amount of rain being between two and three inches. In other words:
 For a discrete distribution p(x) = 0 when "x" cannot occur, or p(x) > 0 if it can.
 For a continuous distribution p(x) = 0 even though "x" can occur, but we can only consider
p(x1≤ X ≤ x2), where x1 and x2 are actual numbers.
LOS d: Define a probability function and state whether a given function satisfies the conditions for a
probability function.
The probability function specifies the probability that the random variable takes on a specific value.
Example: The following is a probability function:
For X: (1, 2, 3, 4), p(x) = x / 10, else p(x) = 0.
The probabilities are p(1) = 0.1, p(2) = 0.2, p(x) = 0.3, and p(x) = 0.4, all of which are between zero and
one. Also, 0.1 + 0.2 + 0.3 + 0.4 = 1.
22
LOS e: State the two key properties of a probability function.
The two key properties of a probability function are:
 0 ≤ p(x) ≤ 1
 The sum of all the probabilities for mutually exclusive and exhaustive outcomes must equal
one.
LOS f: Define a cumulative distribution function and calculate probabilities for a random variable, given
a cumulative distribution function.
The cumulative distribution function or just "distribution function" defines the probability that a
random variable takes a value equal to or less than a given number: P(X ≤ x) or F(X). Using the
probability function defined earlier: X:(1, 2, 3, 4), p(x) = x / 10, F(1) = 0.1, F(2) = 0.3, F(3) = 0.6, F(4) =
1. In other words, F(3) represents the cumulative probability that outcomes 1, 2, and 3 occur.
LOS g: Define a probability density function.
The probability function for a continuous random variable is called the probability density function
(pdf), or simply density, denoted as f(x).
f(x) = P(a<X)
As shown in the figure below, the graph of the pdf for a contininous distribution is simply a horizontal
line with a height of f(x) = 1/(b-a) over the range of values froma to b
LOS h: Define a discrete uniform random variable and calculate probabilities, given a discrete uniform
probability distribution.
The discrete uniform random variable is one where the probabilities are equal for all possible
outcomes
23
LOS i: Define a binomial random variable and calculate probabilities, given a binomial probability
distribution.
The binomial random variable is the number of "success" in a given number of "trial" where the
outcome can either be "success" or "failure." The probability of success is constant for each trial, and
the trials are independent. A trial is like a mini-experiment and the final outcome is the number of
successes in the series of n-trials. Under these conditions, the probability of "x" success in "n" trials is
calculated using the following formula:
p(x) = P(X = x) = [number of ways to choose x from n]px(1 - p)n - x
Where the term in brackets = n! / {(n - x)!x!}, and p = the probability of "success" on each trial.
Example: Assuming a binomial distribution, compute the probability of drawing three black beans from
a bowl of black and white beans if the probability of selecting a black bean in any given attempt is 0.6.
You will draw five beans from the bowl.
P(X = 3) = p(3) = [5! / 2!3!](0.6)3(0.4)2 = (120/12)(0.216)(0.160) = 0.3456
LOS j: Calculate the expected value and variance of a binomial random variable.
Example: Using empirical probabilities, for any given day the probability that the DJIA will increase is
0.67. We will assume that the only other outcome is that it decreases in a day. Hence, p(UP) = 0.67 and
p(DOWN) = 0.33. We will assume that whether the DJIA increases in one day is independent of
whether it decreases in another day. What is the probability that the DJIA will increase three out of five
days? What is the expected number of up days in a five-day period?
Here we define success as UP, so p = 0.67. The definition of success is critical to any binomial problem.
The n items are the five days: n = 5. The number of successes we are computing the probability for is x
= 3. The formula is:
p(3) = P(X = 3) = 5! / [(5 - 3)! (3!)] x (0.673) x (0.335 - 3)
p(3) = (10)(0.301)(0.109)
p(3) = 0.328
Expected value of X: E(X I n = 5, p = 0.67) = (5)(0.67) = 3.35
Variance of X: V(X) = (n)(p)(1 - p) = (5)(0.67)(0.33) = 1.106
LOS k: Describe the continuous uniform distribution and calculate probabilities, given a continuous
uniform probability distribution.
24
The continuous uniform distribution is defined over a range that spans between some lower limit "a"
and some upper limit "b" which serve as the parameters of the distribution. Outcomes can only occur
between a and b. Because the distribution is continuous, even if a < x < b, p(x) = 0. For a ≤ x1 < x2 ≤ b,
P(X < a or X > b) = 0
P(x1 ≤ X ≤ x2) = (x2 - x1) / (b - a).
Example: Random variable X follows a continuous uniform distribution over 12 to 28, that is a = 12, and
b = 28. The probability of an outcome between 15 and 25 is:
P(15 ≤ X ≤ 25) = (25 - 15) / (28 - 12)
= 10 / 16 = 0.625
LOS l: Explain the key properties of the normal distribution.
The normal distribution has the following key properties:
 It is completely described by its mean, m, and variance, σ 2, stated as X ~ N(μ, σ2). In words,
this says that "X is normally distributed with mean m and variance σ2."
 Skewness = 0, meaning that the normal distribution is symmetric about its mean, so that P(X ≤
μ) = P(μ ≤ X) = 0.5, and mean = median = mode.
 Kurtosis = 3; this is a measure of how "flat" the distribution is. Recall that excess kurtosis is
measured relative to the number "3."
 A linear combination of normally distributed random variables is also normally distributed.
LOS m: Construct and interpret confidence intervals for a normally distributed random variable.
The 90 percent confidence interval for X is +/- 1.65s.
25
Example: Using a 20-year sample, the average return of a mutual fund has been 10.5 percent per year
with a standard deviation of 18 percent. What is the 95 percent confidence interval for the mutual fund
return next year?
Here the point estimates for and s are 10.5 percent and 18 percent, respectively. Thus, the 95 percent
confidence interval for the return, R, is:
10.5+/-1.96(18) = -4.78 percent to 45.78 percent.
Symbolically, this result can be expressed as:
P(-4.78 < R < 45.78) = 0.95 or 95%.
LOS n: Define the standard normal distribution, explain how to standardize a random variable, and
calculate probabilities using the standard normal probability distribution.
The standard normal distribution is a normal distribution that has been "normalized" so that it has a
mean of zero and a standard deviation of one. To standardize an observation from any given normal
curve, you must calculate the observation's Z-value. The Z-value tells you how far away the given
observation is from the population mean in units of standard deviation. This is how we standardize a
random variable.
Z = (observation - population mean) / standard deviation = (X – μ)/ σ
Example: The Z table tells us that F(2) = 0.9772, thus F(-2) = 1 - .9772 = 0.0228. This tells us that
0.0228 or 2.28% of the area falls below Z = -2 and an equal amount falls above Z = +2. Furthermore,
P(-2 ≤ Z ≤ 2) = 1 - 0.0228 - 0.0228 = 0.9544. Another way to do this is to write:
P(-2 ≤ Z ≤ 2) = F(2) - F(-2) = 0.9772 - 0.0228 = 0.9544
LOS o: Distinguish between a univariate and a multivariate distribution.
A univariate distribution is a single random variable, while a multivariate distribution specifies the
probabilities for a group of random variables. A multivariate distribution is meaningful when the random
variable in the group are not independent. A joint probability table describes the multivariate distribution
between two discrete random variables. Multivariate relationships can exist between two or more
continuous random variables, e.g., inflation, unemployment, interest rates, and exchange rates.
LOS p: Explain the role of correlation in the multivariate normal distribution.
A multivariate normal distribution can be described by the mean and variance of the individual random
variables. It is necessary to specify the correlation between the individual pairs of variables when
describing a multivariate distribution. Correlation is the feature that distinguishes a multivariate
26
distribution from a univariate normal distribution. Correlation indicates the strength of the linear
relationships between a pair of random variables.
Using asset returns as our random variables, the multivariate normal distribution for the returns on n
assets can be completely defined by the following three sets of parameters:
 n means of the n series of returns (μ1, μ2,,…, μn).
 n variances of the n series of returns (σ21, σ22, …, σ2n).
 0.5n(n -1) pair-wise correlations.
LOS q: Define shortfall risk
Shortfall risk is the risk that a portfolio will fall below a particular value over a given horizon and is the
focus of safety-first rules.
LOS r: Calculate the safety-first ratio and select an optimal portfolio using Roy's safety-first criterion.
Roy's safety-first criterion states that the optimal portfolio minimizes the probability that the return of
the portfolio falls below some minimum acceptable level. This minimum acceptable level is called the
"threshold" level.
Symbolically, Roy safety-first criterion can be stated as:
Maximize the SFRatio where SFRatio = [E(Rp) - RL] / standard deviation sp
Where:
Rp = portfolio return
RL = threshold level return
Example: Let's assume that you were given the following information assuming a threshold (R L) level of
0.030:
Portfolio A: SFRatio = 0.5 and F(-SFRatio) = 0.3085
Portfolio B: SFRatio = 0.4 and F(-SFRatio) = 0.3446
Portfolio C: SFRatio = 0.44 and F(-SFRatio) = 0.3300
The best choice is Portfolio A because its value for F(-SFRatio) = 0.3085 is the lowest of the three
choices. Note, choosing the portfolio with the largest SFRatio gives the same result as choosing
based on the smallest F(-SFRatio).
LOS s: Explain the relationship between the lognormal and normal distributions.
27
A lognormal distribution is simply the distribution of a random variable Y = e X, where X ~ N(? σ2) and e =
2.718, the base of the natural logarithms.
What you need to know for the exam:
 The lognormal distribution is skewed to the right.
 The lognormal distribution is bounded from below by zero.
 The random variable Y = eX. Or alternatively, from the properties of logarithms, ln[Y] = X and X
is normally distributed. To prove this, remember the following fun fact about logarithms: ln[eX] =
X.
LOS t: Distinguish between discretely and continuously compounded rates of return.
A discretely compounded rate of return is based on the relationship of the change of an asset's price
and its starting price. It is denoted in the readings as "R." If the beginning price and ending price of an
asset are S0 and S1 respectively, R = S1 / S0 – 1.
A continuously compounded rate of return measures the change over the average of all numbers
between S0 and S1. This gives a useful perspective on the return earned over a period. This is done with
the use of the natural logarithm ( ln). Given a holding period return of R, the continuously compounded
rate of return is:
ln(1 + R) = (S1 / S).
LOS u: Calculate a continuously compounded return, given a specific holding period return.
Example: A stock increases from $100 to $110 during a year.
A continuously compounded return measures the nominal return with respect to the average of all
values between the beginning and ending value: r = ln($110 / $100) = 0.0953 and it is also true that r =
ln($100 / $110) = -0.0953. The fact that the stock ends at the same value it started at and (0.0953 -
0.0953) = 0 more accurately depicts the true movement of the stock.
LOS v: Explain Monte Carlo simulation and historical simulation and describe their major applications
and limitations.
Monte Carlo simulation is used to find approximate solutions to complex problems. The procedure
usually uses a random number generator from a computer to generate sets of realized random
variables from specified distributions. The results combine to form a new series of variables for which
the true distributions are too mathematically complex to define. After generating a large number of sets
of realizations, the statistics of the generated numbers can be used as estimates of the true parameters
of the complex distribution.
28
Major applications:
 Projecting the interaction of pension plan assets and liabilities.
 Developing estimates of value at risk.
 Estimating the potential success of a given trading strategy.
 Estimating the distribution of the return of a portfolio composed of assets that do not have
normally distributed returns.
 Estimating the distribution of an asset that has features such as embedded options, call
features, and parameters that change as market conditions change.
Historical simulation uses historical data to generate the sets of realized random variables (as
opposed to a random number generator as in Monte Carlo simulation).
Limitations:
 Historical simulation cannot take into account the effect of significant events that did not
occur during the sample period. For example, if a particular security only came into existence after
1987, we do not have observations for its behavior during a "market crash."
 It cannot perform "what if" analysis. The source of the sample data is a fixed set, and we
cannot investigate the effects of changing parameters in certain ways.
29
SESSION 3
Reading 1.A: Sampling and Estimation
LOS
a: Define simple random sampling.
Simple random sampling is a method of selecting a sample in such a way that each item or person in
the population begin studied has the same (non-zero) likelihood of being included in the sample. This is
the standard sampling design.
LOS b: Define and interpret sampling error.
Sampling error is the difference between a sample statistic (the mean, variance, or standard deviation
of the sample) and its corresponding population parameter (the mean, variance or standard deviation of
the population).
The sampling error of the mean = sample mean - population mean = X - μ
LOS c: Define a sampling distribution.
The sample statistic itself is a random variable, so it also has a probability distribution. The sampling
distribution of the sample statistic is a probability distribution made up of all possible sample statistics
computed from samples of the same size randomly drawn from the same population, along with their
associated probabilities.
LOS d: Distinguish between simple random and stratified random sampling.
Simple random sampling is where the observations are drawn randomly from the population. In a
random sample each observation must have the same chance of being drawn from the population. This
is the standard sampling design.
Stratified random sampling first divides the population into subgroups, called strata, and then a
sample is randomly selected from each stratum. The sample drawn can be either a proportional or a
non-proportional sample. A proportional sample requires that the number of items drawn from each
stratum be in the same proportion as that found in the population.
LOS e: Distinguish between time-series and cross-sectional data.
A time-series is a sample of observations taken at a specific and equally spaced points in time. The
monthly returns on Microsoft stock from January 1990 to January 2000 are an example of time-series
data.
Cross-sectional data is a sample of observations taken at a single point in time. The sample of
30
reported earnings per share of all Nasdaq companies as of December 31, 2000 is an example of cross-
sectional data.
LOS f: State the central limit theorem and describe its importance.
The central limit theorem tells us that for a population with a mean ?and a finite variance σ 2, the
sampling distribution of the sample means of all possible samples of size n will be approximately
normally distributed with a mean equal to μ and a variance equal to σ2/n.
What you need to know for the exam:
 If the sample size n is sufficiently large (greater than 30), the sampling distribution of the
sample means will be approximately normal.
 The mean of the population, μ and the mean of all possible sample means, are equal.
 The variance of the distribution of sample means is σ2/n.
LOS g: Calculate and interpret the standard error of the sample mean.
Example: The mean hourly wage for Iowa farm workers is $13.50 with a standard deviation of $2.90.
Let x be the mean wage per hour for a random sample of Iowa farm workers. Find the mean and
standard error of the sample means, x, for a sample size of 30.
The mean of the sampling distribution of x is ?sub>x = ?= $13.50. Since σ is known, the standard error
of the sample means is: σx = σ/ √n = 2.90 / √30 = $0.53. In conclusion, if you were to take all possible
samples of size 30 from the Iowa farm worker population and prepare a sampling distribution of the
sample means you will get a mean of $13.50 and standard error of $0.53.
LOS h: Distinguish between a point estimate and a confidence interval estimate of a population
parameter.
Point estimates are single (sample) values used to estimate population parameters. The formula we
use to compute the point estimate is called the estimator. For example, the sample mean X is an
estimator of the population mean and is computed using the following formula:
X = (Σ x / n)
The value we obtain from this calculation for a specific sample is called the point estimate of the mean.
A confidence interval is a range of estimated values within which the actual value of the parameter will
lie with a given probability of 1 - a. The term a is also called the significance level of the test. It is also
known as the confidence level.
31
LOS i: Identify and describe the desirable properties of an estimate.
 An unbiased estimator is one for which the expected value of the estimator is equal to the
parameter you are trying to estimate.
 An unbiased estimator is also efficient if the variance of its sampling distribution is smaller
than all the other unbiased estimators of the parameter you are trying to estimate. The sample
mean, for example, is an efficient estimator of the population mean.
 A consistent estimator provided a more accurate estimate of the parameter as the sample
size increases. As the sample size increases, the standard deviation (standard error) of the sample
mean falls and the sampling distribution bunches more closely around the population mean.
LOS j: Calculate and interpret a confidence interval for a population mean, given a normal
distribution with 1) a known population variance or 2) an unknown population variance.
Example: Suppose we administer a practice exam to 100 CFA Level I candidates, and we discover the
mean score on this practice exam for all 36 of the candidates in the sample who studied at least 10
hours a week in preparation for the exam is 80. Assume the population standard deviation is 15.
Construct a 99% confidence interval for the mean score on the practice exam of candidates who study
at least 10 hours a week.
80 +/- 2.575 (15 / √36) = 80 +/- 6.4
The 99% confidence has a lower limit of 73.6 and an upper limit of 86.4.
Example: Suppose you take a sample of the past 30 monthly returns for McCreary Inc. The mean
return is 2%, and the sample standard deviation is 20%. The standard error of the sample was found to
be 3.6%. Construct a 95% confidence interval for the mean monthly return.
Remember, because this is a two-tailed test, α = 5%; the probability in each tail will be 2.5% when df =
29. From the t-table, we can determine that the reliability factor for tα/2, or t2 5, is 2.045. Then the
confidence interval is:
2 +/- 2.045 (20 / √30) = 2% +/- 7.4%
The 95% confidence interval has a lower limit of -5.4% and an upper limit of 9.4%.
LOS k: Discuss the issues surrounding selection of the appropriate sample size.
When the distribution is non-normal, the size of the sample influences whether or not we can construct
the appropriate confidence interval for the sample mean. If the distribution is non-normal, but the
variance is known, we can still use the Z-statistic as long as the sample size is large (n > 30). Why??
Because of the central limit theorem.
32
If the distribution is non-normal and the variance is unknown, we can use the t-statistic as long as the
sample size is large (n > 30).
This means that if we are sampling from a non-normal distribution, we cannot create a confidence
interval if the sample size is less than 30. So, all else equal, make sure you have a sample larger than
30, and the larger, the better.
LOS l: Define and discuss data-snooping/data-mining bias.
Data-snooping bias occurs when the researcher bases his research on the previously reported
empirical evidence of others, rather than on the testable predictions of well-developed economic
theories.
Data snooping often leads to data mining, when the analyst continually uses the same database to
search for patterns or trading rules until he finds one that "works."
LOS m: Define and discuss sample selection bias, survivorship bias, look-ahead bias, and time-period
bias.
 Sample selection bias occurs when some data is systematically excluded from the
analysis, usually because of the lack of availability.
 Survivorship bias is the most common form of sample selection bias. A good example of
the existence of survivorship bias in investments is the study of mutual fund performance. Most
mutual fund databases, like Morningstar?/font>, only include funds currently in existence 鍟 he
"survivors." They do not include funds that have ceased to exist due to closure or merger.
 Look-ahead bias occurs when a study tests a relationship using sample data that was not
available on the test date.
 Time-period bias can result if the time period over which the data is gathered is either too
short or too long.
33
SESSION 3
Reading 1.B: Hypothesis Testing
LOS a: Define a hypothesis and describe the steps of hypothesis testing.
A hypothesis is a statement about the value a of population parameter developed for the purpose of
testing a theory or belief. The process of hypothesis testing is shown below:
Hypothesis Testing
Ste Action Steps

p
1 Write the Null H0 Will always contain an equality
Hypothesis and the H1 Never contains an equality
Alternative
Hypothesis
For a one tailed test H0 is greater than or equal to (or less than or equal to) a
given test value; H1 is subsequently less than (or greater than) the test value
For a two tailed test H0 is equal to a test value and H1 does not equal that
value
2 Select level of The significance level is the probability of rejecting the Null Hypothesis when it
significance is true (related to confidence intervals):
5% significance level: 95% confidence level . . . . ± 1.96 (for 2 tailed test)
1% significance level: 99% confidence level . . . . ± 2.58  (for 2 tailed test)
Risk of rejecting a correct H0 is risk; Risk of accepting a wrong H0 is risk
H0 Accept Reject
Is True correct  error
Is False  error correct
34
Ste Action Comments

p
3 Calculate the test x−µ
statistic Z=
Use: σ *
n
Where:
Z = Test statistic
x = Sample Mean
µ = Population Mean (H ) 0
σ = Standard error of sample means (if unavailable use sample std. dev.**)
* for single observations use just Z value

Z=x-µ
σ
s
** sx =
n
4 Establish the The decision rule states when to reject the null hypothesis:
decision rule • How many Standard Deviations (Z) the sample mean x must be from
H0 in order to be rejected
• Driven by the significance level
Critical Z – value
Significance Level One tailed Two tailed
10% 1.29 σ 1.65 σ
5% 1.645 σ 1.96 σ
1% 2.33 σ 2.58 σ
If the sample mean is less than the critical Z value from H0 then the null
hypothesis is accepted
5 Make the decision Based on the data
LOS b: Define and interpret the null hypothesis and alternative hypothesis.
The null hypothesis is the hypothesis that the researcher wants to reject. This is the hypothesis that is
actually tested and is the basis for the selection of the test statistics. The null is generally stated as a
simple hypothesis, such as:
H0: μ = μ0
The null hypothesis always includes some form of the equal sign.
35
The alternative hypothesis, Ha, is what is concluded if there is sufficient evidence to reject the null
hypothesis.
Ha: μ ≠ μ0
LOS c: Distinguish between one-tailed and two-tailed hypothesis tests.
If a researcher wants to test whether something is greater than zero, she may use a one-tailed test.
However, if the research question is whether something is different from zero, the researcher may use
a two-tailed test (which allows for deviation on both sides of the null value).
LOS d: Define and interpret a test statistic and a significance level and explain how significance levels
are used in hypothesis testing.
A test statistic is calculated by comparing the point estimate of the population parameter with the
hypothesized value of the parameter (i.e., the value specified in the null hypothesis).
The significance level is the probability of making a Type I error (rejecting the null when it is true) and
is designated by the Greek letter alpha ( ). For instance, a significance level of 5 percent (  = 0.05)
means that there is a 5 percent chance of rejecting a true null hypothesis. When conducting hypothesis
tests, a significance level must be specified when selecting the critical values against which test
statistics are compared.
LOS e: Define and interpret a Type I and a Type II error.
 A Type I error is the probability of rejecting the null hypothesis when, in fact, the null
hypothesis is correct. The probability of a Type I error equals the selected level of significance.
 A Type II error is the probability of accepting the null hypothesis when, in fact, the null
hypothesis is wrong.
LOS f: Define the power of a test.
The power of a test is the probability of correctly rejecting the null hypothesis when the null hypothesis
is indeed false. The power of a test statistic is important, because we wish to use the test statistic that
provides the most powerful test among all the possible tests.
LOS g: Define and interpret a decision rule.
36
The decision rule for rejecting or failing to reject the null hypothesis is based on the test statistic's
distribution. For example, if a z-statistic is used, the decision rule is based on critical values determined
from the normal distribution.
The critical values are determined based on the distribution and the significance level chosen by the
researcher. The significance level may be 1%, 5%, 10%, or any other level. The most common is 5%.
If we reject the null hypothesis, the result is statistically significant; if we fail to reject the null hypothesis,
the result is not statistically significant.
The following graph illustrates a two-tailed test using a z-distributed test statistic (a z-test) at a 5 percent
level of significance with the critical z-values of ±1.96.
LOS h: Explain the relationship between confidence intervals and tests of significance.
A confidence interval is a range of values within which the researcher believes the true population
parameter may lie. The strict interpretation of a confidence interval is that for a level of confidence of,
say 95%, we expect 95% of the intervals formed in this manner to have captured the true population
parameter.
LOS i: Distinguish between a statistical decision and an economic decision.
 A statistical decision is based solely on the sample information, the test statistic, and the
hypotheses. That is, the decision to reject the null hypothesis is a statistical decision.
 An economic decision considers the relevance of that decision after transaction costs,
taxes, risk, and other factors - things that don't enter into the statistical decision.
LOS j: Identify the test statistic and interpret the results for a hypothesis test about the population
mean of a normal distribution with (1) known or (2) unknown variance.
The choice between the t-distribution and the normal (or z) distribution is dependent on sample size and
whether the variance of the underlying population is known.
37
Use the t-test when the population variance is unknown and either of the following exists:
 The sample size is large (n > 30)
 The sample is small (< 30) but the distribution of the population is normal or approximately
normal.
Use the z-test when the population is normally distributed with known variance.
LOS k: Explain the use of the z-test in relation to the central limit theorem.
The standard deviation of the standard normal distribution is 1. The standard deviation of the t-
distribution is df/ df - 2), where df is the degrees of freedom.
It is appropriate to use the z-test with an unknown variance if the sample size is large enough (n > 30)
regardless of the distribution of the population. This is because of the central limit theorem that says
that for any given distribution with a mean of μ and a variance of σ2, the sampling distribution of the
mean approaches a normal distribution as the sample size increases.
LOS l: Identify the test statistic and interpret the results for a hypothesis test about the equality of two
population means of two normally distributed populations based on independent samples.
A test of differences in means requires using a test statistic chosen depending on two factors: 1)
whether the population variances are equal, and 2) whether the population variances are known. This
test can be used to test the null hypothesis:
H0: μ1 - μ2 = 0 vs. the alternative of HA: μ1 - μ2 ≠ 0
The test when population means are normally distributed and the population variances are unknown but
assumed equal uses a pooled variance. Use the t-distributed test statistic:
t = {(X1 - X2) - (μ1 - μ2)} / {(sp2 / n1) + (sp2 / n2)}1/2
The test when the population means are normally distributed and population variances unknown and
cannot be assumed to be equal uses the t-distributed test statistic that uses both samples' variances:
t = {(X1 - X2) - (μ1 - μ2)} / {(s12 / n1) + (s22 / n2)}1/2
LOS m: Identify the test statistic and interpret the results for a hypothesis test about the mean
difference for two normal distributions (paired comparisons test).
Frequently, we are interested in the difference of paired observations. If questions about the differences
can be expressed as a hypothesis, we can test the validity of the hypothesis. When paired observations
are compared, the test becomes a test of paired differences. The hypothesis becomes:
H0: population mean d = population mean d0
38
HA: population mean d does not equal population mean d0
The alternative may be one-sided:
HA: μd >μd0
HA: μd <μd0
For paired differences, the test statistic is: t = {(d -μd0) / sd}
LOS n: Identify the test statistic and interpret the results for a hypothesis test about the variance of a
normally distributed population.
Given that many financial market observers measure risk as variance, the difference in variance is a
common focus of statistical analysis.
A test of the population variance requires the use of a Chi-squared distributed test statistic.
The hypotheses tested are:
H0: σ12 = σ02
HA: σ12 ≠ σ02
The alternative hypothesis may also be one-sided.
The Chi-squared distribution is asymmetrical and approaches the normal distribution in shape as the
degrees of freedom increase.
The Chi-squared test statistic is:
X 2n-1= (n - 1)s2 / σ02
The following graph illustrates a chi-square distribution: a two-tailed test with a 5 percent level of
significance and 30 degrees of freedom.
LOS o: Identify the test statistic and interpret the results for a hypothesis test about the equality of the
variances of two normally distributed populations, based on two independent random samples.
39
The equality of variances of two populations can be tested with an F-distributed test statistic. The
hypotheses tested are:
H0: σ12 = σ22
HA: σ12 ≠ σ22
One-sided alternative tests are also permissible. It is assumed that the populations from which the
samples are drawn are normally distributed. The test statistic is F-distributed:
F = s12 / s22
With n1 - 1 and n2 - 1 degrees of freedom.
The F-distribution is right-skewed and is truncated at zero on the left-hand side. The shape of the F-
distribution is determined by two degrees of freedom (one pertaining to the numerator, one pertaining to
the denominator). The rejection region is always in the right side tail of the distribution. Therefore, when
constructing this test statistic, always put the larger variance in the numerator.
The following graph illustrates an F-distribution at the 5 percent level of significance with 10 degrees of
freedom in the numerator and denominator.
LOS p: Distinguish between parametric and nonparametric tests.
 Parametric tests rely on assumptions regarding the distribution of the population and are
specific to parameters.
 Nonparametric tests either do not consider a parameter or have few assumptions about
the population that is sampled.
 Often nonparametric tests are used along with parametric tests. In this way, the
nonparametric test is a backup in case the assumptions underlying the parametric test do not hold.
40
SESSION 3
Reading 1.C: Correlation and Regression
LOS
a: Define and calculate the covariance between two random variables.
The covariance between two random variables is a statistical measure of the degree to which the two
variables move together. The covariance captures how one variable changes when the other variable
changes. A positive covariance indicates that the variables tend to move together; a negative
covariance indicates that the variables tend to move in opposite directions. The covariance is calculated
as:
Covariance = the sum of i = 1 to n of (Xi - X)(Yi - Y) / n - 1
where n is the sample size, Xi is the ith observation on variable X, X is the mean of the variable X
observations, Yi is the ith observation on variable Y, and Y is the mean of the variable Y observations.
LOS b: Define, calculate, and interpret a correlation coefficient.
The correlation coefficient, r, is a measure of the strength of the relationship between or among
variables. Correlation is a unitless measure of the tendency of two variables to move together. The
correlation coefficient is bounded by +1 (the variables move together perfectly) and -1 (the variables
move exactly opposite of each other).
r = covariance of X and Y / (σX)(σY)
Assume there are 10 observations and you are given the data below (these are sum figures):
 X = 135 and Y = 416
 X - X = 0.00 and (X - X)2 = 374.50
41
 Y - Y = 0.00 and (Y - Y)2 = 2,342.40
 (X - X)(Y - Y) = 445.00
Calculations:
X = 135 / 10 = 13.5
Y = 416 / 10 = 41.6
s2X = 374.5 / 9 = 41.611
s2Y = 2,342.4 / 9 = 260.267
r = (445 / 9) / (square root of 41.611)(square root of 260.267) = 49.444 / (6.451)(16.133) = 0.475
LOS c: Formulate a test of the hypothesis that the population correlation coefficient equals zero and
determine whether the hypothesis is rejected at a given level of significance.
Example: Suppose the correlation coefficient is 0.2, and the number of observations is 32. What is the
calculated test statistic? Is this correlation significantly different from zero using a 5% level of
significance?
The hypotheses are:
H0: ρ = 0
HA: ρ ≠ 0
The calculated t-statistic is:
t = .2(√32 - 2) / (√1 - 0.04) = 0.2 √30 / √0.96 = 1.11803
Degrees of freedom = 32 - 2 = 30. Hence, the critical t-value for a 5% level of significance and 30 df is
2.042. Therefore, there is no significant correlation (1.11803 falls between the two critical values of
-2.042 and +2.042).
LOS d: Differentiate between the dependent and independent variables in a linear regression.
 The dependent variable is the variable that is being predicted or estimated.
 The independent variable is the variable used to predict the dependent variable.
Example: The percentage change in profit for the auto industry can be predicted by the number of
percentage change in personal income. Percentage change in profit is the dependent variable; the
percentage change in personal income is the independent variable. There may be more than one
independent variable.
Notation often used for a regression line is: Y = a + bX + ε , where Y is the dependent variable, X is the
independent variable, a is the intercept, b is the slope, and  is the regression error.
42
LOS e: Distinguish between the slope and the intercept terms in a regression equation.
The parameters in a simple regression equation are the slope (b1) and the intercept(b0):
Yi = b0 + b1 Xi + εi
Where:
Yi = the ith observation on the dependent variable
Xi = the ith observation on the independent variable
b0 = the intercept
b1 = the slope coefficient
εi = the residual for the ith observation
The slope, b1, is the change in Y for a given one-unit change in X. The slope can be positive, negative,
or zero.
The intercept,b0, is the line's intersection with the Y-axis at X = 0. The intercept can be positive,
negative, or zero.
LOS f: List the assumptions underlying linear regression.
 A linear relationship exists between the dependent and independent variable.
 The independent variable is uncorrelated with the residuals.
 The expected value of the disturbance term is zero, that is the mean of the residuals is zero.
 There is a constant variance of the distribution term. In other words, the disturbance terms
are homoskedastistic.
 The residuals are independently distributed; that is, the residual or disturbance for one
observation is not correlated with that of another observation.
 The disturbance term (residual, or error term) is normally distributed.
LOS g: Define and calculate the standard error of the estimate and the coefficient of determination.
The standard error of the estimate (SEE) is the standard deviation of the predicted dependent
variable values about the estimated regression line.
The SEE is easy to calculate. The sum of the squared prediction errors is called the sum of squared
errors. If the relationship between the variables in the regression is very strong, then the prediction
errors (and the SSE) will be small. Hence, the standard error of the estimate is a function of the SSE.
43
Standard error of the estimate (SEE) =√se2 = √(SSE / n - 2).
The coefficient of determination is another way to measure the relationship between the X and Y
variables. The coefficient of determination tells you what proportion of the total variation of the
dependent variable (Y) is explained or accounted for by the variation in the independent variable (X).
The coefficient of determination is called R2 because mathematically it turns out to be just the square of
the coefficient of correlation (r). Assuming a correlation coefficient of .86, we discover that the R2 of the
index and stock returns is (.86)2 = .74.
LOS h: Calculate a confidence interval for a regression coefficient.
A confidence interval is the range of regression coefficient values for a given value estimate of the
coefficient and a given level of probability. The confidence interval for a regression coefficient b1 is
calculated as:
b1 ± tcsb1
Where tc is the critical t-value for the selected confidence level. Although this looks slightly different than
what we've seen before, it is precisely the same. All confidence intervals take the predicted value then
add and subtract the critical test statistic times the variability of the statistic.
LOS i: Identify the test statistic and interpret the results for a hypothesis test about the population
value of a regression coefficient.
To test the hypothesis concerning the slope coefficient (e.g., to see whether the estimated slope is
equal to a hypothesized value, say b0) we calculate a t-distributed statistic.
If the t-statistic is greater than the critical t-value for the appropriate df, (or less than the critical t-value
for a negative slope) we can say that the slope coefficient is different from the hypothesized value, b1.
To test whether an independent variable explains the variation in the dependent variable, the hypothesis
that is tested is whether the slope is zero:
H0: b1 = 0 versus HA: b1 ≠ 0
LOS j: Interpret a regression coefficient.
 The estimated intercept is interpreted as the value of the dependent variable (the Y) if the
independent variable (the X) takes on a value of zero.
 The estimated slope coefficient is interpreted as the change in the dependent variable for a
given one-unit change in the independent variable.
44
 Any conclusions regarding the important of an independent variable in explaining a dependent

variable requires determining the statistical significance of the slope coefficient. Simply looking
at the magnitude of the slope coefficient does not address the issue of the importance of the
variable (i.e., you must perform a hypothesis test or create a confidence interval to assess the
importance of the variable).
LOS k: Calculate a predicted value for the dependent variable, given an estimated regression model
and a value for the independent variable.
Example: Suppose you estimate a regression model with the following parameters:
Y = 1.50 + 2.5 X1
In addition, you have forecasted the value of the independent variable to be 20 (i.e., X1 = 20). What is
the forecasted value of the Y variable?
Y = 1.50 + 2.50(20) = 1.50 + 50 = 51.5
LOS l: Calculate and interpret a confidence interval for the predicted value of a dependent variable.
Example: Suppose an analyst generates the following regression results:
Y = 0.01 + 1.2X
SEE = 0.23 (square to get se2), sx = 0.16 (square to get sx2), n = 32, and X = 0.06.
Calculate the value of the dependent variable given that the forecast value of X is 0.05. Calculate a
confidence interval on the forecasted value. Use a significance level of 5%.
Y = 0.01 + 1.2(0.05) = 0.07
Using a 5% significance level, the critical t-value is 2.042 (t-table, 32-2 = 30 df). The variance of the
forecast is:
sf2 = 0.0529 [ (1) + (1 / 32) + (0.05 - 0.06)2 / (32 - 1)(0.0256) ] = 0.05456.
The standard error of the forecast is the square root of 0.05456 or 0.23358. Hence, the prediction
interval is:{-0.40697 < Y < 0.54697}
LOS m: Describe the use of analysis of variance (ANOVA) in regression analysis.
An ANOVA table is a summary of the explanation of the variation in the dependent variable and is
included in the regression output of many statistical software packages. You can think of the ANOVA
table as the source of the data for the computation of many of the concepts discussed in this summary.
For instance, the data to compute the R2 and the standard error of the estimate (SEE) comes from the
45
ANOVA table. There are many ways the data from this table can be used in the statistical inference
process (most beyond the scope of the CFA curriculum).
ANOVA – ANALYSIS OF VARIANCE TABLE
Source of Degrees of Sum of Mean Square F-Statistics

Variation Freedom (df) Squares
Regression (R) k SSR SSR MSR
MSR = F =
k
MSE
Sampling Error n - (k+1) SSE SSE

(E) MSE =
n − ( k +1)
Total (T) n-1 SST
SEE, Standard Error of the Estimate
SSE
= MSE =
n − ( k +1)
LOS n: Define and interpret an F-statistic.
 The F-statistic is used to test whether at least one independent variable in the set of independent
variables explains a significant portion of the variation of the dependent variable. This is a
goodness of fit test and is calculated by:
 F = Mean square regression, MSR / Mean square error, MSE = {(SSR / k) / (SSE / n - k - 1)}
 To determine whether an F-statistic is statistically significant, we compare the calculated F-statistic

with the critical F-statistic for k(numerator) and n - k - 1 (denominator) degrees of freedom (n =
number of observations, k = number of slope coefficients).
 The analysis of the F-statistic is similar to the t-statistic, except you use the F-table, and you need
to worry about the degrees of freedom in both the numerator and denominator of the previous
equation. The numerator df is the number of independent variables. The denominator df is [n -
(number of independent variable + 1)] or n - k - 1.
LOS o: Discuss the limitations of regression analysis.
 Regression relations change over time. This is referred to as a non-stationarity.
 If the assumptions of regression analysis are not valid, the interpretation and tests of hypotheses
are not valid. For example, if the data is heteroskedastic (non-constant variance of the error terms)
or exhibits autocorrelation (error terms are not independent), then it is very difficult to use the
regression to forecast the dependent variable given information about the independent variables.
46
47

Cfal1 QM

Загружено:

Сведения о документе

Исходное описание:

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Cfal1 QM

Загружено:

Авторское право:

Доступные форматы

CFA Level 1 – Quantitative Methods

CFA Level 1 - Quantitative Methods

Future Value Example:

LOS b: Calculate the PV of a perpetuity.

PVperpetuity = PMT / I/Y

PVperpetuity = 4.50 / .08 = $56.25

Example 1: Solving for I/Y

N = 15, PV = -500, FV = 2,000; CPT I/Y = 9.68%

Example 2: Solving for PMT

N = 15, I/Y = 7, FV = 3,000; CPT PMT = -$119.38 (ignore sign)

LOS d: Calculate the FV and PV of a series of uneven cash flows.

PV = -1,000, I/Y = 10, N = 5, CPT FV = -1,610.51

PV = -500, I/Y = 10, N = 4, CPT FV = -732.05

PV = 0, I/Y = 10, N = 3, CPT FV = 0

PV = 4,000, I/Y = 10, N = 2, CPT FV = 4,840.00

PV = 3,500, I/Y = 10, N =1 , CPT FV = 3,850.00

PV = 2,000, I/Y = 10, N = 0, CPT FV = 2,000.00

Total cash flow stream = $8,347.44

FV = -1,000, I/Y = 10, N = 1, CPT PV = -909.09

FV = 0, I/Y = 10, N = 3, CPT PV = 0

FV = 4,000, I/Y = 10, N = 4, CPT PV = 2,732.05

FV = 3,500, I/Y = 10, N =5 , CPT PV = 2,173.22

FV = 2,000, I/Y = 10, N = 6, CPT PV = 1,128.95

Total cash flow stream = $4,711.91

Annual: N=1 I = 12% PV = $100 CPT FV = $112.00

Semi-annual: N=2 I = 6% PV = $100 CPT FV = $112.36

Quarterly: N= 4 I = 3% PV = $100 CPT FV = $112.55

Monthly: N = 12 I = 1% PV = $100 CPT FV = $112.68

Daily: N = 365 I = .03287 PV = $100 CPT FV = $112.74

Continuous: FV = (PV) e(i rate)(n) =100e(.12)(1) CPT FV = $112.75

Solve: I/Y = 10/4 = 2.5; N = 5*4 = 20; PV = 500: CPT FV = $819.31

Effective rate = (1 + periodic rate)m - 1

Where m = the number of compounding periods in a year.

(1 + .03)4 - 1 = 1.1255 - 1 = 12.55%

Example: Paying off a Loan (or Mortgage)

N = 5, I/Y = 9, PV = 50,000; CPT PMT = $12,854.62

Example: Loan Amortization

PV = 10,000, N = 5, I/Y = 10; CPT PMT = $2,637.97

Example: Funding a Retirement Program

Step 2: N = 25, I/Y = 12.5, FV =259,240; CPT PMT = $1,800.02

LOS a: Differentiate between a population and a sample.

LOS b: Explain the concept of a parameter.

Any descriptive measure of a population characteristic is called a parameter.

LOS c: Explain the differences among the types of measurement scales.

 Nominal scale: Observations are classified or counted with no particular order.

LOS d: Define and interpret a frequency distribution.

LOS e: Define, calculate, and interpret a holding period return.

LOS f: Define and explain the use of intervals to summarize data.

 Tally the observations.

 Count the number of observations.

LOS g: Calculate relative frequencies, given a frequency distribution.

LOS h: Describe the properties of data presented as a histogram or a frequency polygon.

LOS j: Describe and interpret quartiles, quintiles, deciles, and percentiles.

 Quartiles: The distribution is divided into quarters.

 Quintiles: The distribution is divided into fifths.

 Decile: The distribution is divided into tenths.

 Percentile: The distribution is divided into hundredths (percents).

 Population standard deviation is the square root of the population variance.

mean, using Chebyshev's inequality.

Chebyshev's inequality states that for any distribution, approximately:

 75% of observations lie within +/- 2 standard deviations of the mean

 89% of observations lie within +/- 3 standard deviations of the mean

E(X) = ∑xiP(xi) = x1P(x1) + x2P(x2) + …+ xnP(xn)

n! / [(n1!) * (n2!) ...(nk!)]