$$C_p = p + \frac{(s^2 - \sigma^2)(n - p)}{\sigma^2} \tag{4.20}$$
There are many different ways of writing Eq. (4.20); some prefer expressing it in
terms of $SS_{Reg}$. At any rate, it expresses variance plus bias, and if an independent
estimate of $\sigma^2$, say $\hat{\sigma}^2$, can be found, the $C_p$ statistic can be extremely useful as a
criterion for discriminating between models. The $C_p$ for a $p$-parameter regression
model would then be written as
$$C_p = p + \frac{(s^2 - \hat{\sigma}^2)(n - p)}{\hat{\sigma}^2}$$
One then favors the candidate model with the smallest 𝐶𝑝 value.
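The statistic can be computed directly from its definition (a minimal Python sketch; the function name and argument names are my own):

```python
def mallows_cp(s2, sigma2_hat, n, p):
    """Mallows' C_p for a p-parameter regression model.

    s2         -- residual mean square of the candidate model
    sigma2_hat -- independent estimate of the error variance
    n          -- number of observations
    """
    return p + (s2 - sigma2_hat) * (n - p) / sigma2_hat
```

Candidate models would then be ranked by their $C_p$ values, favoring the smallest; note that an unbiased model with $s^2$ equal to $\hat{\sigma}^2$ gives $C_p = p$ exactly.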
Often the $C_p$ values for various candidate models can be displayed on a plot, with the
line $C_p = p$ representing the norm. A $C_p$ much larger than $p$ occurs with a heavily
biased model. A typical $C_p$ plot appears in Figure 4.2. Models A and D appear to be
undesirable, having $C_p$ values well above the variance line. Model D is clearly the
poorest performer. Models B and C appear to be reasonable candidates. For model C, a
$C_p$ below $p = 3$ implies that the $s^2$ value is smaller than $\hat{\sigma}^2$.
$$C_p = 3 + \frac{(44.5552 - 26.2073)(12)}{26.2073} \approx 11.40$$
Clearly, this value is well above 3.0 and thus reflects what would seem to be a biased
model. Simple computations reveal that the $C_p$ and PRESS statistics lead to the same
conclusion regarding the desirability of the model $(x_1, x_2, x_3)$.
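The arithmetic above can be verified in a couple of lines (the values are those of the worked example, with $p = 3$ and $n - p = 12$):

```python
s2, sigma2_hat = 44.5552, 26.2073   # residual mean square and error-variance estimate
p, n_minus_p = 3, 12
cp = p + (s2 - sigma2_hat) * n_minus_p / sigma2_hat
print(cp)  # roughly 11.4, well above p = 3
```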
Figure 4.2: $C_p$ plotted against $p$.
We learned from Eq. (4.11) that the residual mean square is biased upward in cases
where the fitted model is biased, i.e., when the analyst is underfitting. To gain
even more insight into the role of $s^2$ as a model discriminator, it is instructive
to consider its properties when the analyst uses an overfitted model. Suppose one
assumes the model
$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon \qquad (m \text{ parameters})$$
when in fact $\beta_2 = 0$, so the $m$-parameter model is an overfitted model. Letting
$$X = [X_1 \;\vdots\; X_2],$$
the residual mean square for this model is
$$s_m^2 = \frac{y'[I - X(X'X)^{-1}X']y}{n - m}$$
The expected value is given by
$$E(s_m^2) = \frac{1}{n-m}\,E\{y'[I - X(X'X)^{-1}X']y\}$$
$$= \frac{1}{n-m}\left\{\sigma^2\,\mathrm{tr}[I - X(X'X)^{-1}X'] + [E(y)]'[I - X(X'X)^{-1}X'][E(y)]\right\}$$
$$= \frac{1}{n-m}\left\{\sigma^2(n-m) + \beta_1'X_1'[I - X(X'X)^{-1}X']X_1\beta_1\right\}$$
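The last step uses the trace identity $\mathrm{tr}[I - X(X'X)^{-1}X'] = n - m$ and the fact that, with $\beta_2 = 0$, $E(y) = X_1\beta_1$. The trace identity, and the related fact that the residual projector annihilates $X_1$, can be checked numerically (a sketch with a random design matrix; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 5
X = rng.normal(size=(n, m))              # full (overfitted) design with m columns
X1 = X[:, :3]                            # columns of the "true" submodel
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual projector I - H

print(np.trace(M))                # n - m = 15, up to rounding error
print(np.abs(M @ X1).max())       # essentially zero: M annihilates X1
```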
Details of this development follow the same lines as those given in Appendix B.2. From
Eq. (4.23), the expected value of the residual mean square for the overfitted model is
given by
$$E(s_m^2) = \sigma^2 + \frac{1}{n-m}\,\beta_1'X_1'[I - X(X'X)^{-1}X']X_1\beta_1$$
Since the columns of $X_1$ lie in the column space of $X$, we have $[I - X(X'X)^{-1}X']X_1 = 0$, and the
second term vanishes. Thus, even if the investigator overfits, i.e., includes model terms that are zero, the residual
mean square is unbiased for $\sigma^2$. However, the estimator contains fewer degrees of
freedom than the error mean square computed from fitting the “correct” model.
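A small simulation illustrates the point: fitting a model with a superfluous regressor still yields a residual mean square centered at $\sigma^2$ (a sketch; the design, sample size, and $\sigma^2 = 4$ are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 50, 4.0
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # true model: 2 parameters
X = np.hstack([X1, rng.normal(size=(n, 1))])            # overfitted: one useless column added
beta1 = np.array([1.0, 2.0])
m = X.shape[1]                                          # m = 3

s2_vals = []
for _ in range(2000):
    y = X1 @ beta1 + rng.normal(scale=np.sqrt(sigma2), size=n)
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    s2_vals.append(resid @ resid / (n - m))

print(np.mean(s2_vals))  # close to sigma2 = 4.0
```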
$$s_m^2 \sim \frac{\sigma^2\,\chi^2_{n-m}}{n - m}$$
$$\mathrm{Var}(s_m^2) = \frac{\sigma^4}{(n-m)^2}\,\mathrm{Var}(\chi^2_{n-m})$$
The variance of a $\chi^2_v$ random variable is $2v$, where $v$ is the degrees of freedom. Thus
$$\mathrm{Var}(s_m^2) = \frac{2\sigma^4}{n - m}$$
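Since $s_m^2$ is a scaled $\chi^2_{n-m}$ variable, this variance formula can be checked by simulation (a sketch; the particular $\sigma^2$, $n$, and $m$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, n, m = 4.0, 50, 3
df = n - m
draws = sigma2 * rng.chisquare(df, size=200_000) / df   # simulated draws of s_m^2
print(draws.var())            # sample variance of s_m^2
print(2 * sigma2**2 / df)     # theoretical value 2*sigma^4/(n - m)
```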