
We learned that the estimate, 𝑠², of residual variance for an underspecified model is biased by the quantity

$$
\frac{1}{n-p}\sum_{i=1}^{n}\left[\operatorname{Bias}\,\hat{y}(x_i)\right]^2
$$

As a result, if 𝜎² were known, an estimate of the quantity in Eq. (4.17) (the estimate is called the 𝐶𝑝 statistic) is given by

$$
C_p = p + \frac{(s^2 - \sigma^2)(n - p)}{\sigma^2}
$$
There are many different ways of writing Eq. (4.20). Some prefer expressing it in terms of 𝑆𝑆𝑅𝑒𝑔. At any rate, it expresses variance + bias, and if an independent estimate of 𝜎², say 𝜎̂², can be found, the 𝐶𝑝 statistic can be extremely useful as a criterion for discriminating between models. The 𝐶𝑝 for a p-parameter regression model would then be written as

$$
C_p = p + \frac{(s^2 - \hat{\sigma}^2)(n - p)}{\hat{\sigma}^2}
$$
One then favors the candidate model with the smallest 𝐶𝑝 value.
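
To make the computation concrete, here is a minimal sketch in Python. The function name cp_statistic and its argument names are illustrative choices, not notation from the text; it assumes the candidate model's residual mean square, the independent variance estimate, n, and p have already been computed.

```python
def cp_statistic(s2, sigma2_hat, n, p):
    """C_p for a p-parameter candidate model, as in Eq. (4.21).

    s2         -- residual mean square of the candidate model
    sigma2_hat -- independent estimate of the error variance
    n          -- number of observations
    p          -- number of model parameters, including the intercept
    """
    return p + (s2 - sigma2_hat) * (n - p) / sigma2_hat
```

One would evaluate this for every candidate model and favor the smallest value, judging each against the norm 𝐶𝑝 = p discussed next.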

A reasonable norm by which to judge the 𝐶𝑝 value of a model is 𝐶𝑝 = p, a value that suggests that the model contains no estimated bias. That is, all of the error in 𝑦̂ is variance, and the model is not underspecified. Of course, clear interpretation is often clouded because of the questionable nature of the estimate 𝜎̂². In many practical situations, the residual mean square for the full, or most complete, model is used as this estimate. Since the residual mean square for the complete model need not be the smallest estimate of 𝜎² among those for the candidate models, it is quite possible that Eq. (4.21) will yield 𝐶𝑝 < p for a few of the candidate models.

Often the 𝐶𝑝 values for various candidate models can be displayed on a plot, with the line 𝐶𝑝 = p representing the norm. A 𝐶𝑝 much larger than p occurs with a heavily biased model. A typical 𝐶𝑝 plot appears in Figure 4.2. Models A and D appear to be undesirable, having 𝐶𝑝 values well above the variance line. Model D is clearly the poorest performer. Models B and C appear to be reasonable candidates. For model C, a 𝐶𝑝 below p = 3 implies that the 𝑠² value is smaller than 𝜎̂².
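
A plot of this kind is straightforward to produce. The sketch below uses matplotlib with hypothetical (p, 𝐶𝑝) pairs for models labeled A through D; they are invented stand-ins that mimic the qualitative pattern described for Figure 4.2, not the figure's actual values.

```python
import matplotlib.pyplot as plt

# Hypothetical (p, Cp) pairs for candidate models A-D -- stand-ins only,
# chosen to echo the pattern described for Figure 4.2.
models = {"A": (3, 8.0), "B": (3, 3.4), "C": (3, 2.1), "D": (2, 12.0)}

fig, ax = plt.subplots()
for name, (p, cp) in models.items():
    ax.scatter(p, cp)
    ax.annotate(name, (p, cp), textcoords="offset points", xytext=(5, 5))

# The norm Cp = p: models far above this line are heavily biased.
ax.plot([1, 4], [1, 4], linestyle="--", label="Cp = p")

ax.set_xlabel("p (number of parameters)")
ax.set_ylabel("Cp")
ax.legend()
plt.show()
```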

Example 4.2 Sales Data


In this section we consider an example in which the 𝐶𝑝 statistic is used for model discrimination. The example considers the sales data in Table 4.1. It is always desirable to use an estimate 𝜎̂² that is independent of the 𝑠² value for the candidate model. Unfortunately, such an estimate is not always available. Common practice among data analysts is to use the residual mean square for the complete model as 𝜎̂². In the case of the sales data, this is 𝜎̂² = 26.2073. For a candidate model with p = 3 and 𝑠² = 44.5552,

$$
C_p = 3 + \frac{(44.5552 - 26.2073)(12)}{26.2073} = 11.40
$$

Clearly, this value is well above 3.0 and thus reflects what would seem to be a biased model. Similar computations can be carried out for the model (𝑥₁, 𝑥₂, 𝑥₃); the 𝐶𝑝 and PRESS statistics lead to the same conclusion regarding the desirability of model (𝑥₁, 𝑥₂, 𝑥₃).
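
Using the cp_statistic sketch from earlier, the arithmetic of this example can be checked directly; n = 15 is implied by n − p = 12 with p = 3.

```python
# Sales data example: s^2 = 44.5552 for the candidate model,
# sigma_hat^2 = 26.2073 from the complete model, n = 15, p = 3.
cp = cp_statistic(s2=44.5552, sigma2_hat=26.2073, n=15, p=3)
print(round(cp, 2))  # 11.4
```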

[Figure 4.2: A typical 𝐶𝑝 against p plot]

PROPERTIES OF RESIDUAL MEAN SQUARE FOR AN OVERFITTED MODEL

We learned from Eq. (4.11) that the residual mean square is biased upward in cases where the fitted model is biased, i.e., when the analyst is underfitting. In order to gain even more insight into the role of 𝑠² as a model discriminator, it is instructive to consider its properties when the analyst uses an overfitted model. Suppose one assumes the model

$$
y = X_1\beta_1 + X_2\beta_2 + \varepsilon \qquad (m \text{ parameters})
$$

when in fact 𝛽₂ = 0, and thus the m-parameter model is an overfitted model. Letting

$$
X = [X_1 \;\vdots\; X_2]
$$

the residual mean square for the overfitted model is given by

$$
s_m^2 = \frac{y'[I - X(X'X)^{-1}X']y}{n - m}
$$
The expected value is given by

$$
\begin{aligned}
E(s_m^2) &= \frac{1}{n-m}\,E\{y'[I - X(X'X)^{-1}X']y\} \\
&= \frac{1}{n-m}\left\{\sigma^2\,\mathrm{tr}[I - X(X'X)^{-1}X'] + [E(y)]'[I - X(X'X)^{-1}X'][E(y)]\right\} \\
&= \frac{1}{n-m}\left\{\sigma^2(n-m) + \beta_1'X_1'[I - X(X'X)^{-1}X']X_1\beta_1\right\}
\end{aligned}
$$
Details of this development follow the same lines as those given in Appendix B.2. From Eq. (4.23), the expected value of the residual mean square for the overfitted model is given by

$$
E(s_m^2) = \sigma^2 + \frac{1}{n-m}\,\beta_1'X_1'[I - X(X'X)^{-1}X']X_1\beta_1
$$

Since $X'[I - X(X'X)^{-1}X'] = 0$, we also have $X_1'[I - X(X'X)^{-1}X'] = 0$, and thus

$$
E(s_m^2) = \sigma^2
$$

Even if the investigator overfits, i.e., includes model terms whose true coefficients are zero, the residual mean square remains unbiased for 𝜎². However, the estimator contains fewer degrees of freedom than the error mean square computed from fitting the “correct” model.
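
This unbiasedness is easy to verify by simulation. The sketch below uses arbitrary illustrative choices of n, 𝛽₁, and 𝜎²: it generates data from a two-parameter true model, fits an overfitted model with two superfluous columns (true coefficients zero), and averages 𝑠𝑚² over many replications.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 30, 4.0
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # true model terms
X2 = rng.normal(size=(n, 2))                            # superfluous terms (beta2 = 0)
X = np.hstack([X1, X2])                                 # overfitted design, m = 4
beta1 = np.array([1.0, 2.0])

estimates = []
for _ in range(20_000):
    y = X1 @ beta1 + rng.normal(scale=np.sqrt(sigma2), size=n)
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    estimates.append(resid @ resid / (n - X.shape[1]))  # s_m^2 on n - m df

print(np.mean(estimates))  # close to sigma2 = 4.0 despite the overfitting
```

The spread of these estimates also anticipates the variance result derived next: unbiased, but computed on fewer degrees of freedom than the correct model would allow.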

In the normal theory case, we know that


$$
\frac{s_m^2\,(n-m)}{\sigma^2} \sim \chi^2_{n-m}
$$
See Graybill (1976). As a result,

$$
s_m^2 \sim \frac{\sigma^2\,\chi^2_{n-m}}{n-m}
$$

so that

$$
\mathrm{Var}(s_m^2) = \frac{\sigma^4}{(n-m)^2}\,\mathrm{Var}(\chi^2_{n-m})
$$

The variance of a 𝜒𝑣2 random variable is 2v, where v is the degrees of freedom. Thus

$$
\mathrm{Var}(s_m^2) = \frac{2\sigma^4}{n-m}
$$
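
As a short addendum not worked out in the text, comparing this with the residual mean square of the correctly specified p-parameter model (m > p) makes the degrees-of-freedom penalty explicit:

$$
\frac{\mathrm{Var}(s_m^2)}{\mathrm{Var}(s_p^2)} = \frac{2\sigma^4/(n-m)}{2\sigma^4/(n-p)} = \frac{n-p}{n-m} > 1
$$

so each superfluous parameter leaves 𝑠𝑚² unbiased but inflates its variance.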
