Chapters 1 and 2

If a researcher uses daily data to examine a particular problem and creates a variable that assigns a
numerical value of 1 to Monday observations, what term would best describe this type of number?

a) Continuous

b) Cardinal

c) Ordinal

d) Nominal

Incorrect! This would be a good example of a nominal number, since it does not even produce an
ordering - the numbers assigned to each day of the week are entirely arbitrary. There is no sense that
Tuesday is "better" than Monday because it is assigned a higher value. We could instead and equally
validly have assigned the value 5 to Monday, 4 to Tuesday, and so on. Clearly since the numbers
assigned to the days of the week would only comprise 5 values, we would not term it a continuous
variable.

The price of a house is best described as what type of number?

a) Discrete

b) Cardinal

c) Ordinal

d) Nominal

Incorrect! The price of a house is not a discrete variable because, at least in principle, it can take on any
value (limited only by the granularity of the currency it is traded in). It is a cardinal number because the
actual numerical values that it takes have a meaning, and for example, a house valued at £500,000 is
worth twice as much as one valued at £250,000.

Which of the following is NOT a feature of continuously compounded returns (i.e. log-returns)?

a) They can be interpreted as continuously compounded changes in the prices

b) They can be added over time to give returns for longer time periods

c) They can be added across a portfolio of assets to give portfolio returns

d) They are usually fat-tailed

Incorrect! Log-returns can indeed be interpreted as continuously compounded changes in the price or
index value over time. This is useful since it means we don't have to worry about the compounding
frequency. Log-returns can also be added up over time, so that the return over a year is simply the sum
of the daily returns for all trading days in that year. Asset returns are usually fat-tailed (leptokurtic), and
this is true whether they are measured as log-returns or simple returns. However, log-returns cannot be
aggregated across a portfolio to get a portfolio return. This would be possible with simple returns but
does not work for log-returns because taking the log is a non-linear transformation. Therefore
the log of a sum is not the same as the sum of the logs. In order to calculate portfolio log-returns, it is
necessary to calculate the value of the whole portfolio first at each point in time and then to take the log
of the portfolio price changes.
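
The two aggregation properties described above can be checked numerically. The following is a minimal sketch in Python; the prices and the one-unit-of-each-asset portfolio are made-up illustrative assumptions, not data from the course.

```python
import math

# Hypothetical prices for two assets at three dates (illustrative numbers only).
prices_a = [100.0, 110.0, 99.0]
prices_b = [50.0, 45.0, 54.0]


def log_returns(prices):
    """Continuously compounded returns between consecutive prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def simple_returns(prices):
    """Simple (percentage) returns between consecutive prices."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


# Adding over time: per-period log-returns sum to the whole-period log-return.
total_log_a = sum(log_returns(prices_a))
whole_period_log_a = math.log(prices_a[-1] / prices_a[0])

# Adding across assets: assume the portfolio holds one unit of each asset.
portfolio = [a + b for a, b in zip(prices_a, prices_b)]
w_a = prices_a[0] / portfolio[0]  # initial value weight of asset A
w_b = prices_b[0] / portfolio[0]  # initial value weight of asset B

# First-period portfolio return versus the weighted sum of asset returns.
port_simple = simple_returns(portfolio)[0]
weighted_simple = w_a * simple_returns(prices_a)[0] + w_b * simple_returns(prices_b)[0]
port_log = log_returns(portfolio)[0]
weighted_log = w_a * log_returns(prices_a)[0] + w_b * log_returns(prices_b)[0]

print(total_log_a, whole_period_log_a)  # equal: log-returns add over time
print(port_simple, weighted_simple)     # equal: simple returns add across assets
print(port_log, weighted_log)           # different: log-returns do not
```

The portfolio log-return is obtained here exactly as the explanation suggests: the portfolio value is computed first, and the log of its change is taken afterwards.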

Which of the following are alternative names for the dependent variable (usually denoted by y) in linear
regression analysis?

(i) The regressand


(ii) The regressor

(iii) The explained variable

(iv) The explanatory variable

a) (ii) and (iv) only

b) (i) and (iii) only

c) (i), (ii), and (iii) only

d) (i), (ii), (iii), and (iv)

Correct! Regressand and explained variable are alternative names for the variable whose
movements we are trying to explain. Regressor and explanatory variable are names for x, the variable
that is doing the explaining in the model.

Which of the following are alternative names for the independent variable (usually denoted by x) in
linear regression analysis?

(i) The regressor

(ii) The regressand

(iii) The causal variable

(iv) The effect variable


a) (ii) and (iv) only

b) (i) and (iii) only

c) (i), (ii), and (iii) only

d) (i), (ii), (iii), and (iv)

Correct! The independent variable, usually denoted by x, is also known as the regressor or the causal
variable. The regressand and effect variable are alternative names for y.

Which of the following statements is TRUE concerning the standard regression model?

a) y has a probability distribution

b) x has a probability distribution

c) The disturbance term is assumed to be correlated with x

d) For an adequate model, the residual (u-hat) will be zero for all sample data points

Incorrect! Since y depends on u as well as x, and since u is a random variable, y will also be a random
variable. x is assumed to be non-stochastic, i.e. to be fixed and it is therefore not a random variable.
Since x is assumed to be non-stochastic, it cannot be correlated with a random variable u, otherwise it
would be stochastic! A good model would be one where the residuals are as close to zero as possible.
However, unless there is a perfect relationship between y and x (i.e. all of the points lie on a straight
line), the residuals cannot all be zero.

Which of the following statements is TRUE concerning OLS estimation?

a) OLS minimises the sum of the vertical distances from the points to the line

b) OLS minimises the sum of the squares of the vertical distances from the points to the line

c) OLS minimises the sum of the horizontal distances from the points to the line

d) OLS minimises the sum of the squares of the horizontal distances from the points to the
line.

Correct! OLS minimises the sum of the squares of the vertical distances from the points to the line. The
reason that vertical rather than horizontal distances are chosen is due to the set up of the classical linear
regression model that assumes x is non-stochastic. Therefore, the question becomes one of how to find
the best fitting values of y given the values of x. If we took horizontal distances, this would mean that we
were choosing fitted values for x, which wouldn't make sense since x is fixed. The reason that squares of
the vertical distances are taken rather than the vertical distances themselves is that some of the points
will lie above the fitted line and some below, cancelling each other out. Therefore, a criterion that
minimised the sum of the distances would not give unique parameter estimates since an infinite number
of lines would satisfy this.
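
The argument above can be illustrated with a short sketch. It assumes the usual closed-form OLS formulas for a regression of y on a constant and x, and uses made-up data; "distance" is taken here to mean the signed vertical distance y minus the fitted value.

```python
# Hypothetical data (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def ols_fit(x, y):
    """Closed-form OLS estimates for y = alpha + beta * x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


def residuals(alpha, beta, x, y):
    """Signed vertical distances from each point to the line."""
    return [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]


alpha_hat, beta_hat = ols_fit(x, y)
ols_resid = residuals(alpha_hat, beta_hat, x, y)

# A deliberately poor line: flat, but passing through the point of means.
y_bar = sum(y) / len(y)
flat_resid = residuals(y_bar, 0.0, x, y)

# Both lines have signed distances that sum to (numerically) zero...
sum_ols = sum(ols_resid)
sum_flat = sum(flat_resid)

# ...but the sums of squared distances differ sharply.
rss_ols = sum(u * u for u in ols_resid)
rss_flat = sum(u * u for u in flat_resid)

print(alpha_hat, beta_hat)
print(sum_ols, sum_flat)
print(rss_ols, rss_flat)
```

Any line through the point of means makes the signed distances cancel, which is why that criterion cannot single out one line, whereas the sum of squares is minimised by the OLS line alone.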

The residual from a standard regression model is defined as

a) The difference between the actual value, y, and the mean, y-bar

b) The difference between the fitted value, y-hat, and the mean, y-bar

c) The difference between the actual value, y, and the fitted value, y-hat

d) The square of the difference between the fitted value, y-hat, and the mean, y-bar

Incorrect! The residual is defined as the difference between the actual value, y, and the fitted value,
y-hat.

Which one of the following statements best describes the algebraic representation of the fitted
regression line?

a)

b) y-hat = alpha-hat + beta-hat x

c)

d) y = alpha-hat + beta-hat x + u-hat

Correct! The fitted value for y is obtained by taking the value of the explanatory variable for a particular
observation, multiplying it by the slope estimate and adding the intercept estimate. This then gives a
value for y-hat from the fitted line for that observation. The answers for a and c are not plausible
equations for anything, since the fitted value from the regression model cannot include either a residual
or a disturbance in its calculation. The equation in d is a valid equation that splits the actual value y into
a part that is explained by the model and a part which the model cannot explain (the residual). However,
the equation in d is not the equation for the fitted value.

10

Which of the following statements concerning the regression population and sample is FALSE?

a) The population is the total collection of all items of interest

b) The population can be infinite

c) In theory, the sample could be larger than the population

d) A random sample is one where each individual item from the population is equally likely
to be drawn.

Incorrect! By definition, the population is indeed the collection of all items of interest, and this can be
either infinite or finite depending on the context. Also by definition, a random sample is one where each
item from the population is equally likely to be drawn. It is of course impossible for the sample to be
larger than the population, since the sample takes just some items from the population.

11

Which of the following statements is true concerning the population regression function (PRF) and
sample regression function (SRF)?

a) The PRF is the estimated model

b) The PRF is used to infer likely values of the SRF

c) Whether the model is good can be determined by comparing the SRF and the PRF

d) The PRF is a description of the process thought to be generating the data.

Incorrect! The PRF is the true population model for the relationship between the variables x and y. Some
researchers draw a distinction between the PRF and data generating process, but the two terms have
been used synonymously on this course. The sample is used to estimate a SRF, which is used to
determine what are the likely values of the population parameters described by the PRF. Therefore a, b,
and c are false and d is a true statement.

12

Which of the following models can be estimated using OLS, following suitable transformations if
necessary? (Note that "e" denotes the exponential).

i)

ii)

iii)

iv) .

a) (i) only

b) (i) and (iii) only


c) (i), (iii), and (iv) only

d) (i), (ii), (iii), and (iv)

Incorrect! In fact, all of models (i) to (iv) can be estimated using OLS, following suitable transformations
where necessary. Clearly (i) is simply the standard model. For (ii), creating a new variable (call it z) as z =
e^x, would give the standard model as a regression of y on a constant and z. In (iii), substituting Y = ln(y)
and X = ln(x) and regressing Y on a constant and X would again give the standard model. Finally, to
estimate (iv), set z = x^2, and regress y on a constant and z.
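
The substitutions z = e^x and z = x^2 described above can be sketched as follows. The data are generated without noise from assumed parameter values (hypothetical, chosen only for illustration), so that OLS on the transformed regressor should recover them.

```python
import math


def ols_fit(z, y):
    """Closed-form OLS estimates for y = alpha + beta * z + u."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    szz = sum((zi - z_bar) ** 2 for zi in z)
    beta = szy / szz
    return y_bar - beta * z_bar, beta


x = [0.0, 1.0, 2.0, 3.0, 4.0]

# Quadratic case: y = 1 + 2 * x^2 (assumed values), regress y on z = x^2.
y_quad = [1.0 + 2.0 * xi ** 2 for xi in x]
z_quad = [xi ** 2 for xi in x]
alpha_quad, beta_quad = ols_fit(z_quad, y_quad)

# Exponential case: y = 0.5 + 3 * e^x (assumed values), regress y on z = e^x.
y_exp = [0.5 + 3.0 * math.exp(xi) for xi in x]
z_exp = [math.exp(xi) for xi in x]
alpha_exp, beta_exp = ols_fit(z_exp, y_exp)

print(alpha_quad, beta_quad)
print(alpha_exp, beta_exp)
```

In both cases the model is non-linear in x but linear in the parameters once the new variable z is defined, which is all that OLS requires.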

13

Which of the following is an equivalent expression for saying that the explanatory variable is "non-
stochastic"?

a) The explanatory variable is partly random

b) The explanatory variable is fixed in repeated samples

c) The explanatory variable is correlated with the errors

d) The explanatory variable always has a value of one

Correct! The word "stochastic" means random, so non-stochastic means non-random! One of the
classical linear regression model assumptions is that the explanatory variable x is non-stochastic or fixed
in repeated samples. These are approximately equivalent expressions, although the latter is a slightly
stronger statement. Note that "fixed in repeated samples" does not mean that the value of x is always the
same (answer d). It does, however, rule out x being partly random (answer a) or correlated with
the errors (answer c), since either would imply that x is stochastic.

14

If an estimator is said to be consistent, it is implied that


a) On average, the estimated coefficient values will equal the true values

b) The OLS estimator is unbiased and no other unbiased estimator has a smaller variance

c) The estimates will converge upon the true values as the sample size increases

d) The coefficient estimates will be as close to their true values as possible for small and
large samples.

Incorrect! By definition, a consistent estimator is one where the sample estimates converge on their
true (population) values as the sample size increases. Answer a is the definition for an unbiased
estimator, while b is the result that is proved by the Gauss-Markov theorem. Answer d is a slightly
different way of stating the unbiasedness property.
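
Consistency can be illustrated with a small simulation. This is only a sketch under assumed settings: the true parameter values, the regressor values, the standard normal disturbances and the seed are all hypothetical choices.

```python
import random

TRUE_ALPHA = 1.0  # assumed population intercept
TRUE_BETA = 2.0   # assumed population slope


def ols_slope(x, y):
    """Closed-form OLS slope estimate for y = alpha + beta * x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulated_slope(n, seed):
    """Draw one sample of size n and return the estimated slope."""
    rng = random.Random(seed)
    x = [float(i % 10) for i in range(n)]  # fixed regressor values 0..9
    y = [TRUE_ALPHA + TRUE_BETA * xi + rng.gauss(0.0, 1.0) for xi in x]
    return ols_slope(x, y)


# Slope estimates for increasingly large samples.
estimates = {n: simulated_slope(n, seed=42) for n in (20, 200, 20000)}
for n, b in estimates.items():
    print(n, b, abs(b - TRUE_BETA))
```

With larger samples the estimate should typically sit closer to the assumed true slope of 2, although any single small sample may happen to land close by chance.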

15

If an estimator is said to have minimum variance, which of the following statements is NOT implied?

a) The probability that the estimate is a long way away from its true value is minimised

b) The estimator is efficient

c) Such an estimator would be termed "best"

d) Such an estimator will always be unbiased

Incorrect! An estimator that has minimum variance would also be defined as efficient and "best" - these
terms are equivalent to one another. A minimum variance estimator means that the sampling variation
in the parameter estimates between one sample and another will be minimised. This is also equivalent
to stating that the probability that the estimate for any given sample is a long way off from its true value
will be minimised. An estimator can have minimum variance but be a biased estimator. Typically there is
an implicit trade off between choosing an unbiased but inefficient estimator and choosing an estimator
with a smaller variance that is biased.

16

Consider the OLS estimator for the standard error of the slope coefficient. Which of the following
statement(s) is (are) true?

(i) The standard error will be positively related to the residual variance

(ii) The standard error will be negatively related to the dispersion of the observations on the
explanatory variable about their mean value

(iii) The standard error will be negatively related to the sample size

(iv) The standard error gives a measure of the precision of the coefficient estimate.

a) (ii) and (iv) only

b) (i) and (iii) only

c) (i), (ii), and (iii) only

d) (i), (ii), (iii), and (iv)

Incorrect! All of statements (i) to (iv) are true. The bigger the residual variance is, the bigger must be the
RSS, and therefore the further away are the points from the line. Therefore, the bigger the residual
variance is, the bigger will be the coefficient standard errors. This can be seen since the term "s"
appears positively in the standard error formulae for the intercept and the slope. The more dispersed
are the observations on the explanatory variable (x) about its mean value, the more precisely the
coefficient estimates can be calculated since we would have information about the relationship
between y and x over a wider range of values for x. In the formulae, the variation of x about its mean
value enters into the denominator for both the slope and the intercept standard errors, so the bigger
the dispersion is, the smaller will be the standard errors. The bigger the sample size, the more pieces of
information are available from which to estimate the model parameters. The number of observations
appears explicitly in the formula for the intercept standard error and implicitly in the formula for the
slope standard error. In the latter case, the standard error is inversely related to the sample size since
the sum of the squares of the observations on x about their mean value appears in the denominator,
and the larger the sample size is, the more terms will be included in this sum.
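
The relationships described above can be checked with a short sketch. It assumes the standard textbook formulas for a regression with an intercept and one slope: s = sqrt(RSS / (T - 2)), SE(slope) = s / sqrt(sum of squared deviations of x from its mean), and SE(intercept) = s * sqrt(sum of x squared / (T * sum of squared deviations)). The x values are made up.

```python
import math


def residual_std_error(residuals):
    """s = sqrt(RSS / (T - 2)) for a model with an intercept and one slope."""
    t = len(residuals)
    rss = sum(u * u for u in residuals)
    return math.sqrt(rss / (t - 2))


def sum_sq_dev(x):
    """Sum of squared deviations of x about its mean."""
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x)


def slope_se(s, x):
    """Standard error of the slope estimate."""
    return s / math.sqrt(sum_sq_dev(x))


def intercept_se(s, x):
    """Standard error of the intercept estimate."""
    t = len(x)
    return s * math.sqrt(sum(xi * xi for xi in x) / (t * sum_sq_dev(x)))


x_narrow = [1.0, 2.0, 3.0, 4.0, 5.0]
x_wide = [2.0, 4.0, 6.0, 8.0, 10.0]  # same sample size, more dispersed
x_more = x_narrow * 2                # same values, twice as many observations

base = slope_se(1.0, x_narrow)
print(base)
print(slope_se(2.0, x_narrow))  # larger s gives a larger standard error
print(slope_se(1.0, x_wide))    # more dispersion in x gives a smaller one
print(slope_se(1.0, x_more))    # more observations gives a smaller one
print(intercept_se(1.0, x_narrow))
```

The sample size enters the slope formula only through the number of terms in the sum of squared deviations, which is the implicit dependence the explanation refers to.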

17

Which of the following statements is INCORRECT concerning the classical hypothesis testing framework?

a) If the null hypothesis is rejected, the alternative is accepted

b) The null hypothesis is the statement being tested while the alternative encompasses the
remaining outcomes of interest

c) The test of significance and confidence interval approaches will always give the same
conclusions

d) Hypothesis tests are used to make inferences about the population parameters.

Incorrect! Hypothesis tests are used to make statements about the plausibility of certain values for the
population parameters given the estimates made from the sample. By definition, the null hypothesis is
the statement being tested while the alternative encompasses other outcomes of interest. The test of
significance and confidence interval approaches will always give the same answer (so long as a fixed
significance level is used for both) since one can be viewed as just a rearrangement of the other. It is
never said that the alternative hypothesis is accepted. The reason that this is not done is that, in general
terms, it is possible to reject a null hypothesis without the alternative hypothesis being correct.
Therefore a is the only incorrect statement.

18

Suppose that a hypothesis test is conducted using a 5% significance level. Which of the following
statements are correct?

(i) The significance level is equal to the size of the test

(ii) The significance level is equal to the power of the test


(iii) 2.5% of the total distribution will be in each tail rejection region for a 2-sided test

(iv) 5% of the total distribution will be in each tail rejection region for a 2-sided test.

a) (ii) and (iv) only

b) (i) and (iii) only

c) (i), (ii), and (iii) only

d) (i), (ii), (iii), and (iv)

Correct! (i) and (iii) are true while (ii) and (iv) are false. The significance level and the size of the test are
different ways of saying the same thing: they measure the proportion of the total distribution that the
test statistic is assumed to follow which is placed in the rejection region(s). The significance level is equal
to the probability of type I error. The probability of type II error is one minus the power of the test, not the
significance level. If a 5% significance level is used, this implies that in total 5% of the whole distribution
must be in the rejection region and if the test is 2-sided, this means that 2.5% of the distribution will be
in each of the rejection regions, and not 5% in each tail. 5% in each tail would imply a 10% significance
level for a 2-sided test, while the question specifies a 5% significance level.
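
The split of the rejection region can be sketched numerically. For simplicity this assumes the test statistic follows a standard normal distribution (the course would typically use a t-distribution, whose critical values are slightly larger), and it uses Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_critical_value(significance_level):
    """Critical value that leaves half the significance level in each tail."""
    return STANDARD_NORMAL.inv_cdf(1.0 - significance_level / 2.0)


def one_tail_area(critical_value):
    """Probability mass in the upper tail beyond the critical value."""
    return 1.0 - STANDARD_NORMAL.cdf(critical_value)


# A 5% two-sided test: 2.5% in each tail.
crit_5pct = two_sided_critical_value(0.05)
area_each_tail = one_tail_area(crit_5pct)

# Putting 5% in each tail instead gives a test whose size is 10%.
crit_5pct_each_tail = STANDARD_NORMAL.inv_cdf(0.95)
total_size_if_5pct_each_tail = 2.0 * one_tail_area(crit_5pct_each_tail)

print(crit_5pct, area_each_tail)
print(crit_5pct_each_tail, total_size_if_5pct_each_tail)
```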

19

Consider an identical situation to that of question 21, except that now a 2-sided alternative is used.
What would now be the appropriate conclusion?

a) H0 is rejected

b) H0 is not rejected

c) H1 is rejected

d) There is insufficient information given in the question to reach a conclusion

Incorrect! Now, if a 2-sided test is used, the test statistic would still take the same value, and rejection
would occur if the test statistic fell in either region. Since the 5% 2-sided critical values are close to -2
and +2, the statistic is clearly now in the rejection region, and hence a is correct.

20

Which one of the following would be the most appropriate as a 95% (two-sided) confidence interval for
the intercept term of the model given in question 21?

a) (-4.79,2.19)

b) (-4.16,4.16)

c) (-1.98,1.98)

d) (-5.46,2.86)

Incorrect! Recall that the formula for estimating a confidence interval for the intercept parameter would
be

(alpha-hat - critical value × SE(alpha-hat), alpha-hat + critical value × SE(alpha-hat))

Putting in the relevant terms gives the interval in this case as

(-1.3 - 1.98 × 2.1, -1.3 + 1.98 × 2.1) or (-5.46, 2.86). Therefore d is the correct answer. Errors that you could
have made would include using the one-sided 5% critical value, which would be about 1.66 instead of
1.98. This would have given answer a. The second possible error would be to forget to add in the
coefficient value, so that the interval would be wrongly calculated as (-1.98 × 2.1, 1.98 × 2.1), which would
give answer b. Answer c would have been obtained if the critical values alone had been used!
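
The arithmetic above can be reproduced with a short sketch. The function name is a hypothetical helper; the numbers (estimate -1.3, standard error 2.1, critical values 1.98 and 1.66) are those quoted in the explanation.

```python
def confidence_interval(estimate, std_error, critical_value):
    """Two-sided interval: estimate plus or minus critical value times SE."""
    half_width = critical_value * std_error
    return (estimate - half_width, estimate + half_width)


def rounded(interval, digits=2):
    """Round both end points for comparison with the listed answers."""
    return tuple(round(v, digits) for v in interval)


alpha_hat = -1.3
se_alpha_hat = 2.1

# Two-sided 5% critical value of about 1.98.
correct = rounded(confidence_interval(alpha_hat, se_alpha_hat, 1.98))

# Mistake 1: using the one-sided 5% critical value of about 1.66.
one_sided_mistake = rounded(confidence_interval(alpha_hat, se_alpha_hat, 1.66))

# Mistake 2: forgetting to add in the coefficient estimate.
no_estimate_mistake = rounded(confidence_interval(0.0, se_alpha_hat, 1.98))

print(correct)
print(one_sided_mistake)
print(no_estimate_mistake)
```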

21

Which one of the following is the most appropriate definition of a 99% confidence interval?

a) 99% of the time in repeated samples, the interval would contain the true value of the
parameter

b) 99% of the time in repeated samples, the interval would contain the estimated value of
the parameter

c) 99% of the time in repeated samples, the null hypothesis will be rejected

d) 99% of the time in repeated samples, the null hypothesis will not be rejected when it
was false

Incorrect! Although from a philosophical perspective, some researchers would disagree with this
definition, on this course a 99% confidence interval is taken to mean that 99% of the time in repeated
samples, the interval would contain the true parameter value. Thus a is correct. Of course, by
construction the interval will always contain the parameter estimate exactly in the middle, so b is
incorrect. For a 99% confidence interval, we can say that 99% of the time the null would not be rejected
when the null was correct (i.e. we made the right decision), which is not the formulation of d, so d is
incorrect. We cannot say how often the null hypothesis will be rejected - it depends on whether it is
right or wrong! All we could say is how often the null would be rejected as a result of chance alone.
Therefore c is incorrect.

22

Which one of the following statements best describes a Type II error?

a) It is the probability of incorrectly rejecting the null hypothesis

b) It is equivalent to the power of the test

c) It is equivalent to the size of the test

d) It is the probability of failing to reject a null hypothesis that was wrong

Incorrect! By definition, a type II error occurs when failing to reject a null hypothesis that was wrong.
Thus d is correct. The situation of incorrectly rejecting the null hypothesis when the null was true is a
type I error. This probability of type I error is equal to the size of the test. The power of the test is one
minus the probability of type II error, not the probability of type II error itself.

23

Suppose that a test statistic has associated with it a p-value of 0.08. Which of the following
statements are true?

(i) If the size of the test were exactly 8%, we would be indifferent between rejecting and not
rejecting the null hypothesis

(ii) The null would be rejected if a 10% size of test were used

(iii) The null would not be rejected if a 1% size of test were used

(iv) The null would be rejected if a 5% size of test were used.

a) (ii) and (iv) only

b) (i) and (iii) only

c) (i), (ii), and (iii) only

d) (i), (ii), (iii), and (iv)

Incorrect! (i) to (iii) are correct while (iv) is incorrect. The p-value is defined as the marginal significance
level where we would be indifferent between rejecting and not rejecting the null hypothesis. This is also
sometimes termed the exact significance level, or the plausibility level for the null hypothesis. Thus,
under this definition, it is clear that if a significance level (size of test) were exactly 8%, we would be
indifferent between rejecting and not rejecting the null hypothesis (so i is true). If we used a larger size
of test (i.e. a bigger rejection region), the null would be rejected (so ii is true), while if we used a smaller
size of test (a smaller rejection region), the null would not be rejected (so iv is false and iii is true).
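
The decision rule described above can be written as a small sketch. The function name and the returned labels are hypothetical; the rule itself (reject when the p-value is below the size of the test) is the one in the explanation.

```python
import math


def decision(p_value, size):
    """Compare a p-value with the size (significance level) of the test."""
    if math.isclose(p_value, size):
        return "indifferent"
    return "reject" if p_value < size else "do not reject"


p = 0.08
for size in (0.08, 0.10, 0.01, 0.05):
    print(size, decision(p, size))
```

With a p-value of 0.08, the null is rejected at the 10% level but not at the 5% or 1% levels, matching statements (ii), (iii) and the falsity of (iv).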

24

Suppose that observations are available on the monthly bond prices of 100 companies for 5 years. What
type of data are these?

a) Cross-sectional

b) Time-series

c) Panel

d) Qualitative

Incorrect! Since the data have the dimensions of both time series (5 years of observations) and of cross-
sections (100 companies), this would be known as a panel data set. A cross-sectional series would not
have data over a period of time, while a time-series data set would use information on one company at
a time. Bond prices are clearly an example of quantitative rather than qualitative data, since they can
take on any (non-negative) values and are not constrained to take on only certain values as qualitative
data would be.
