The Scenario
multiple regression
Assessing Systematic Relationships
Graphical Methods for Analyzing Data
Techniques
• scatterplots
• scatterplot matrices
» also referred to as “casement plots”
• time sequence plots
[Scatterplot: DISCOLOR vs. FLUORIDE - trend, possibly nonlinear?]
[Scatterplots: DISCOLOR vs. BRUSHING - significant trend? doesn't appear to be present]
[Scatterplot matrix of FLUORIDE, AGE, BRUSHING, DISCOLOR]
[Time sequence plot: 90% point (degrees F) vs. time - meandering about the average operating point indicates correlation in the data]
chee824 - Winter 2004 J. McLellan 13
What do dynamic data look like?
[Time sequence plots of var1 and var2]
Quantitative Methods
• correlation
» formal def’n plus sample statistic (“Pearson’s r”)
• covariance
» formal def’n plus sample statistic
Covariance
Formal Definition
Cov(X, Y) = E{(X - μX)(Y - μY)}

[Paired scatterplots of Y vs. X, with points shown relative to the means of X and Y]
Correlation
Sample Correlation

       (1/(N-1)) Σ_{i=1}^{N} (Xi - X̄)(Yi - Ȳ)
  r = -----------------------------------------
                      sX sY
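The sample correlation above can be computed directly from its definition. A minimal sketch in Python (function and variable names are illustrative):

```python
import math

def pearson_r(x, y):
    """Sample correlation: the sample covariance of x and y divided by
    the product of the sample standard deviations (N-1 divisors)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
    return sxy / (sx * sy)

# perfectly linear data should give r = 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```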
Making Inferences

√(N - 3) (tanh⁻¹(r) - tanh⁻¹(ρ)) is approximately distributed as a standard normal random variable

» derive confidence limits for tanh⁻¹(ρ) and convert to confidence limits for the true correlation using tanh
Confidence Interval for Correlation
Procedure
1. find zα/2 for the desired confidence level
2. the confidence interval for tanh⁻¹(ρ) is

   tanh⁻¹(r) ± zα/2 (1/√(N - 3))

3. convert the limits to confidence limits for the correlation by taking tanh of the limits in step 2
[Scatterplot: solder thickness vs. temperature]
Example - Solder Thickness
Confidence Interval
Empirical Modeling - Terminology
• response
» “dependent” variable - responds to changes in other
variables
» the response is the characteristic of interest which we are
trying to predict
• explanatory variable
» “independent” variable, regressor variable, input, factor
» these are the quantities that we believe have an
influence on the response
• parameter
» coefficients in the model that describe how the
regressors influence the response
Models

Y = f(X, θ) + ε

where X are the explanatory variables, θ are the parameters, and ε is the random error term
The Random Error Term
Types of Models
Linear Regression Models
Nonlinear Regression Models

r = k0 exp(-E / (R T))

nonlinear in the parameters k0 and E; linear in k0 if E is fixed
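One way to see the linear/nonlinear distinction: taking logs gives ln r = ln k0 - (E/R)(1/T), which is linear in the parameters ln k0 and E. A sketch with illustrative (not course) values:

```python
import numpy as np

R = 8.314                         # gas constant, J/(mol K)
k0_true, E_true = 1.0e7, 50000.0  # illustrative "true" parameter values

T = np.array([300.0, 320.0, 340.0, 360.0, 380.0])
r = k0_true * np.exp(-E_true / (R * T))   # noise-free rate data

# ln r = ln k0 - (E/R) * (1/T): ordinary linear regression on x = 1/T
X = np.column_stack([np.ones_like(T), 1.0 / T])
beta, *_ = np.linalg.lstsq(X, np.log(r), rcond=None)
k0_est = np.exp(beta[0])
E_est = -beta[1] * R
print(k0_est, E_est)   # recovers the true values for noise-free data
```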
Ordinary LS vs. Multi-Response
Linear Multiple Regression
Model Equation

Yi = β1 Xi1 + … + βp Xip + εi
Assumptions for Least Squares Estimation

[Scatterplot of response (solder thickness) vs. T, showing the deterministic "true" relationship through the data and the prediction error ("residual") for an observation]
More Notation and Terminology

Measurement: y = β0 + β1 x + ε
Matrix Representation for Multiple Regression
We can arrange the observations in "tabular" form - vector of
observations, and matrix of explanatory values:

  [ Y1   ]   [ X11     X12     ...  X1p    ]          [ ε1   ]
  [ Y2   ]   [ X21     X22     ...  X2p    ] [ β1 ]   [ ε2   ]
  [ ...  ] = [ ...     ...          ...    ] [ β2 ] + [ ...  ]
  [ YN-1 ]   [ XN-1,1  XN-1,2  ...  XN-1,p ] [ ...]   [ εN-1 ]
  [ YN   ]   [ XN,1    XN,2    ...  XN,p   ] [ βp ]   [ εN   ]
Matrix Representation for Multiple Regression

Y = X β + ε

where Y is an N×1 vector, X is an N×p matrix, β is a p×1 vector, and ε is an N×1 vector
Least Squares Parameter Estimates
Residual Vector
Given a set of parameter values β̃, the residual vector is formed
from the matrix expression:

  e = Y - Xβ̃

i.e.,

  [ e1   ]   [ Y1   ]   [ X11     X12     ...  X1p    ] [ β̃1 ]
  [ e2   ]   [ Y2   ]   [ X21     X22     ...  X2p    ] [ β̃2 ]
  [ ...  ] = [ ...  ] - [ ...     ...          ...    ] [ ...]
  [ eN-1 ]   [ YN-1 ]   [ XN-1,1  XN-1,2  ...  XN-1,p ] [ β̃p ]
  [ eN   ]   [ YN   ]   [ XN,1    XN,2    ...  XN,p   ]
Sum of Squares of Residuals

SSE = eᵀe = Σ_{i=1}^{N} ei²

Least Squares Parameter Estimates
Minimizing the sum of squares of residuals over β̃ yields the least squares estimates:

β̂ = (XᵀX)⁻¹XᵀY
Example - Solder Thickness
Let’s analyze the data considered for the straight line case:
Example - Solder Thickness
In matrix form:

  Y = Xβ + ε  ⇔

  [ 171.6 ]   [ 1  245 ]            [ ε1  ]
  [ 201.1 ]   [ 1  215 ]            [ ε2  ]
  [ 213.2 ]   [ 1  218 ]            [ ε3  ]
  [ 153.3 ]   [ 1  265 ]            [ ε4  ]
  [ 178.9 ] = [ 1  251 ] [ β0 ]  +  [ ε5  ]
  [ 226.6 ]   [ 1  213 ] [ β1 ]     [ ε6  ]
  [ 190.3 ]   [ 1  234 ]            [ ε7  ]
  [ 171.0 ]   [ 1  257 ]            [ ε8  ]
  [ 197.5 ]   [ 1  244 ]            [ ε9  ]
  [ 209.8 ]   [ 1  225 ]            [ ε10 ]
Example - Solder Thickness

  (XᵀX) = [   10    2367 ]      XᵀY = [   1910 ]
          [ 2367  563335 ]            [ 449420 ]
Example - Wave Solder Defects
In matrix form:

  Y = Xβ + ε  ⇔

  [ 100 ]   [ 1  -1  -1  -1 ]            [ ε1  ]
  [ 119 ]   [ 1   1  -1  -1 ]            [ ε2  ]
  [ 118 ]   [ 1  -1   1  -1 ]            [ ε3  ]
  [ 217 ]   [ 1   1   1  -1 ]   [ β0 ]   [ ε4  ]
  [  20 ]   [ 1  -1  -1   1 ]   [ β1 ]   [ ε5  ]
  [  42 ] = [ 1   1  -1   1 ]   [ β2 ] + [ ε6  ]
  [  41 ]   [ 1  -1   1   1 ]   [ β3 ]   [ ε7  ]
  [ 113 ]   [ 1   1   1   1 ]            [ ε8  ]
  [ 101 ]   [ 1   0   0   0 ]            [ ε9  ]
  [  96 ]   [ 1   0   0   0 ]            [ ε10 ]
  [ 115 ]   [ 1   0   0   0 ]            [ ε11 ]
Example - Wave Solder Defects

  (XᵀX) = [ 11  0  0  0 ]      XᵀY = [ 1082 ]
          [  0  8  0  0 ]            [  212 ]
          [  0  0  8  0 ]            [  208 ]
          [  0  0  0  8 ]            [ -338 ]
Example - Wave Solder Defects

  β̂ = (XᵀX)⁻¹XᵀY = [ 1/11   0    0    0  ] [ 1082 ]   [  98.36 ]
                    [  0    1/8   0    0  ] [  212 ]   [  26.50 ]
                    [  0     0   1/8   0  ] [  208 ] = [  26.00 ]
                    [  0     0    0   1/8 ] [ -338 ]   [ -42.25 ]
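Because XᵀX is diagonal for this coded design, each estimate is simply the corresponding entry of XᵀY divided by the diagonal entry. A quick check:

```python
import numpy as np

XtX = np.diag([11.0, 8.0, 8.0, 8.0])   # X'X for the coded design
XtY = np.array([1082.0, 212.0, 208.0, -338.0])

beta_hat = np.linalg.solve(XtX, XtY)
print(beta_hat)  # intercept and three factor effects
```

This is the practical payoff of an orthogonal (coded) design: the estimates decouple, one division per parameter.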
Examples - Comments
Graphical Diagnostics

[Plot of residuals vs. predicted values ŷi: roughly half the residuals are positive, half negative]
Graphical Diagnostics

[Residual plot showing NON-CONSTANT VARIANCE: the spread of the residuals changes across the plot]
Graphical Diagnostics
Graphical Diagnostics
residual *
ei *
* * * * * w
* * *
** * systematic trend
not accounted for in model
- include a linear term in “w”
62
Quantitative Diagnostics - Ratio Test

  SSR = Σ_{i=1}^{N} (ŷi - ȳ)²      SSE = Σ_{i=1}^{N} (yi - ŷi)²

  TSS = Σ_{i=1}^{N} (yi - ȳ)²
Analysis of Variance (ANOVA) for Regression
Quantitative Diagnostics - R2

R² = [corr(y, ŷ)]²

» relationship to sums of squares:

  R² = 1 - SSE/TSS = SSR/TSS

» values typically reported in "%", i.e., 100 R²
» ideal - R² near 100%
Issues with R2
Adjusted R2

  R²adj = 1 - MSE/(TSS/(N-1)) = 1 - (SSE/(N-p)) / (TSS/(N-1))

» want value close to 1 (or 100%), as before
» if N>>p, adjusted R² is close to R²
» provides measure of agreement, but does not account for magnitude of residual error
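Both quantities follow directly from the sums of squares defined earlier. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def r2_stats(y, y_hat, p):
    """R^2 and adjusted R^2 from observations y, fitted values y_hat,
    and number of model parameters p."""
    y = np.asarray(y, float)
    y_hat = np.asarray(y_hat, float)
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)        # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - sse / tss
    r2_adj = 1.0 - (sse / (n - p)) / (tss / (n - 1))
    return r2, r2_adj

# a perfect fit gives R^2 = adjusted R^2 = 1
print(r2_stats([1, 2, 3, 4, 5], [1, 2, 3, 4, 5], p=2))
```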
Testing the Need for Groups of Terms
Test
» compare difference in residual variance between full and
reduced model
» benchmark against an estimate of the inherent variation
» if significant, conclude that the group of terms ARE
required
» if not significant, conclude that the group of terms can be
dropped from the model - not explaining significant trend
» note that remaining parameters should be re-estimated in
this case
Testing the Need for Groups of Terms
Test:
A - denotes the full model (with all terms)
B - denotes the reduced model (group of terms deleted)
Form the ratio:

  (SSE_model B - SSE_model A) / (s² (pA - pB))
Testing the Need for Groups of Terms

Compare the computed ratio to F(pA - pB, νinherent, 0.95)
Lack of Fit Test
Replicates -
» repeated runs at the SAME experimental conditions
» note that all explanatory variables must be at fixed
conditions
» indication of inherent variance because no other factors
are changing
» measure of repeatability of experiments
Using Replicates
The Lack of Fit Test
"pure error" sum of squares:

  SSEP = Σ_{i=1}^{m} Σ_{j=1}^{ni} (yij - ȳi•)²

i.e., add together sums of squares associated with each replicate
group (there are "m" replicate groups in total)
Example - Wave Solder Defects
Many statistical software packages will perform the Lack of Fit test
in their Regression modules - Excel does NOT
The Parameter Estimate Covariance Matrix
Parameter Estimate Covariance Matrix

  Σ = E{(β̂ - β)(β̂ - β)ᵀ}

Compare this expression with the variance for a single parameter:

  Var(β̂) = E{(β̂ - β)²}

For linear regression, the covariance matrix is obtained as:

  Σ = (XᵀX)⁻¹ σε²
Parameter Estimate Covariance Matrix
For the wave solder defect data, the sample variance of the replicates
is 384.86 with 7 degrees of freedom, and the parameter
covariances are:

  Σ̂ = (XᵀX)⁻¹ se² = (384.86) [ 1/11   0    0    0  ]   [ 34.99    0      0      0   ]
                              [  0    1/8   0    0  ] = [   0    48.11    0      0   ]
                              [  0     0   1/8   0  ]   [   0      0    48.11    0   ]
                              [  0     0    0   1/8 ]   [   0      0      0    48.11 ]

(se² is the estimate of the residual variance, here from the replicates rather than the MSE)
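Scaling (XᵀX)⁻¹ by the variance estimate is a one-liner; a check against the numbers above:

```python
import numpy as np

s2_e = 384.86                          # replicate variance, 7 df
XtX_inv = np.diag([1/11, 1/8, 1/8, 1/8])

cov_beta = XtX_inv * s2_e              # (X'X)^{-1} * s_e^2
print(np.diag(cov_beta))               # parameter estimate variances
```

The diagonal entries are the variances of the individual estimates; off-diagonals (zero here, thanks to the orthogonal design) are the covariances.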
Using the Covariance Matrix
Correlation of the Parameter Estimates
Note that

  β̂0 = Ȳ - β̂1 x̄
Getting Rid of the Covariance

Confidence interval for an individual parameter:

  β̂i ± tν,α/2 s_β̂i

The degrees of freedom for the t-statistic come from the
estimate of the inherent noise variance
» the degrees of freedom will be the same for all of the
parameter estimates

Test statistic:  β̂i / s_β̂i

» compare absolute value to tν,α/2
» if test statistic is greater ("outside the fence"), parameter
is significant -- retain
» inside the fence? - consider deleting the term
Example - Wave Solder Defects Data
From Excel: the standard deviation of each parameter estimate, the test statistic for each parameter, the probability that a value greater than the computed test ratio occurs (2-tailed test), and the confidence limits.
Precision of the Predicted Responses
Precision of the Predicted Responses
In general, both the variances and covariances of the parameter
estimates must be taken into account.
For prediction at the k-th data point:

  Var(ŷk) = xkᵀ (XᵀX)⁻¹ xk σε²

where xkᵀ = [xk1  xk2  ...  xkp]
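The quadratic form above is compact to evaluate; a sketch using the wave solder design, where at a centre point x = [1, 0, 0, 0] only the intercept variance contributes:

```python
import numpy as np

def pred_variance(x_k, XtX_inv, sigma2):
    """Var(y-hat at x_k) = x_k' (X'X)^{-1} x_k * sigma^2."""
    x_k = np.asarray(x_k, float)
    return float(x_k @ XtX_inv @ x_k) * sigma2

XtX_inv = np.diag([1/11, 1/8, 1/8, 1/8])    # wave solder coded design
print(pred_variance([1, 0, 0, 0], XtX_inv, 384.86))  # equals 384.86/11
```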
Example - Wave Solder Defects Model

  Var(ŷ11) = Var(β̂0) + Var(β̂1)(0) + Var(β̂2)(0) + Var(β̂3)(0)
           = Var(β̂0)
Precision of “Future” Predictions
Estimating Precision of Predicted Responses
Confidence Limits for Predicted Responses

  ŷk,future ± tν,α/2 √(s²_ŷk + se²)
Practical Guidelines for Model Development

1) Coding - one standard form:

  x̃i = (xi - x̄i) / (range(xi)/2)

» places designed experiment into +1,-1 form
» if run conditions are from an experimental design, this
coding must be used in order to obtain all of the benefits
from the design - uncorrelated parameter estimates
» if conditions are not from an experimental design, such a
coding improves numerical conditioning of the problem --
similar numerical scales for all variables
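The coding formula is simple to apply per variable; a sketch (the function name is illustrative):

```python
def code_variable(x):
    """Scale a variable toward the +1/-1 coding: subtract its mean,
    divide by half its range."""
    xbar = sum(x) / len(x)
    half_range = (max(x) - min(x)) / 2.0
    return [(xi - xbar) / half_range for xi in x]

# a balanced two-level factor maps exactly onto -1, +1
print(code_variable([200.0, 250.0, 200.0, 250.0]))
```

For a balanced two-level design the mean coincides with the midpoint of the range, so the coded values are exactly ±1.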
Practical Guidelines for Model Development
2) Types of models -
» linear in the explanatory variables
» linear with two-factor interactions (xi xj)
» general polynomials
Polynomial Models
Order - maximum over the p terms in the model of the sum of the
exponents in a given term
e.g.,

  Y = β0 + β1 x1 + β2 x2² + β3 x1² x2³ + ε

is a fifth-order model
Polynomial Models
Comments -
» polynomial models can sometimes suffer from collinearity
problems - coding helps this
» polynomials can provide approximations to nonlinear
functions - think of Taylor series approximations
» high-order polynomial models can sometimes be
replaced by fewer nonlinear function terms
• e.g., ln(x) vs. 3rd order polynomial
Joint Confidence Region (JCR)
Joint Confidence Region

Step 3) Rearrange this interval to obtain the interval β̂i ± tν,α/2 s_β̂i,
which contains the true value of the parameter 100(1 - α)% of the time
Joint Confidence Region
Joint Confidence Region
Sequence:
Step 1) Identify a statistic which is a function of the parameter
estimate statistics
Joint Confidence Region
The quantity

  ( (β̂ - β)ᵀ XᵀX (β̂ - β) / p ) / sε²   ~   F(p, n-p)

where sε² is an estimate of the inherent noise variance (if MSE is
used, the degrees of freedom is n - p),
is the ratio of two sums of squares, and is distributed as an F-
distribution with p degrees of freedom in the numerator, and n-p
degrees of freedom in the denominator
Joint Confidence Region

  (β̂ - β)ᵀ XᵀX (β̂ - β) ≤ p sε² F(p, n-p, 1-α)
Joint Confidence Region - Definition

  (β̂ - β)ᵀ XᵀX (β̂ - β) ≤ p sε² F(p, n-p, 1-α)

Interpretation:
» the region defined by this inequality contains the true
values of the parameters 100(1 - α)% of the time
» if values of zero for one or more parameters lie in this
region, those parameters are plausibly zero, and
consideration should be given to dropping the
corresponding terms from the model
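Checking whether a candidate parameter vector (such as zero) lies inside the region is a direct evaluation of the inequality. A sketch using the solder thickness matrices; the noise variance sε² = 100 is illustrative, since the residual variance is not given in this excerpt:

```python
import numpy as np
from scipy import stats

def in_joint_confidence_region(beta0, beta_hat, XtX, s2, n, alpha=0.05):
    """True if beta0 satisfies
    (beta_hat - beta0)' X'X (beta_hat - beta0) <= p * s2 * F(p, n-p, 1-alpha)."""
    d = np.asarray(beta_hat, float) - np.asarray(beta0, float)
    p = len(d)
    lhs = float(d @ XtX @ d)
    rhs = p * s2 * stats.f.ppf(1 - alpha, p, n - p)
    return lhs <= rhs

# solder thickness example: is beta = (0, 0) plausible?
XtX = np.array([[10.0, 2367.0], [2367.0, 563335.0]])
beta_hat = np.linalg.solve(XtX, np.array([1910.0, 449420.0]))
print(in_joint_confidence_region([0.0, 0.0], beta_hat, XtX, s2=100.0, n=10))
```

The centre of the region (β̂ itself) is always inside; the origin is far outside for this data, so neither parameter is plausibly zero.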
Joint Confidence Region - Example with 2 Parameters

[Plot of an elliptical joint confidence region in the (b0, b1) plane]

What is the motivation for the ratio

  ( (β̂ - β)ᵀ XᵀX (β̂ - β) / p ) / sε²

used to define the joint confidence region?
Consider the joint distribution for the parameter estimates:

  (2π)^(-p/2) det(Σ_β̂)^(-1/2) exp{ -(1/2) (β̂ - β)ᵀ Σ_β̂⁻¹ (β̂ - β) }

[Sketch: for one parameter, the area under the density between the lower and upper limits is 1 - α; for two parameters, the volume under the joint density over the Joint Confidence Region is 1 - α]
Relationship to Marginal Confidence Limits

[Plot: the elliptical joint confidence region in the (intercept, slope) plane, with the marginal confidence interval for the slope marked]

Relationship to Marginal Confidence Limits

[Plot: the rectangular 95% confidence region implied by considering the parameters individually, overlaid on the joint confidence region]
Relationship to Marginal Confidence Intervals
Going Further - Nonlinear Regression Models
Model:

  Yi = η(xi, θ) + εi

where xi are the explanatory variables, θ are the parameters, and εi
is the random noise component.

Estimation Approach:
» linearize model with respect to parameters
» treat linearization as a linear regression problem
» iterate by repeating linearization/estimation/linearization
about new estimates, … until convergence to parameter
values - Gauss-Newton iteration - or solve numerical
optimization problem
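The linearize/estimate/repeat loop above can be sketched as a bare-bones Gauss-Newton iteration; the exponential model, starting values, and data here are illustrative, not from the course:

```python
import numpy as np

def gauss_newton(eta, jac, x, y, theta0, n_iter=50):
    """Gauss-Newton: repeatedly linearize eta about the current theta
    and solve the resulting linear least squares problem for the step."""
    theta = np.asarray(theta0, float)
    for _ in range(n_iter):
        r = y - eta(x, theta)      # residuals at the current estimate
        J = jac(x, theta)          # sensitivity matrix d(eta)/d(theta), N x p
        step, *_ = np.linalg.lstsq(J, r, rcond=None)
        theta = theta + step
    return theta

# illustrative model: eta(x, theta) = theta0 * exp(theta1 * x)
eta = lambda x, th: th[0] * np.exp(th[1] * x)
jac = lambda x, th: np.column_stack([np.exp(th[1] * x),
                                     th[0] * x * np.exp(th[1] * x)])
x = np.linspace(0.0, 1.0, 20)
y = eta(x, [2.0, -1.5])            # noise-free data for the sketch
theta_est = gauss_newton(eta, jac, x, y, theta0=[1.8, -1.4])
print(theta_est)
```

A production implementation would add step damping and a convergence test rather than a fixed iteration count.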
Interpretation - Columns of X

[Sketch of the model surface, with the predictions ŷ lying on the surface]
Properties of LS Parameter Estimates
E{ofβ$repeated
» “average” } = β data collection / estimation
sequences will be true value of parameter vector
Consistent
» behaviour as number of data points tends to infinity
» with probability 1,
lim β$ = β
N →∞
» distribution narrows as N becomes large
Efficient
» variance of least squares estimates is less than that of
other types of parameter estimates
Covariance Structure
» summarized by variance-covariance matrix
$ T −1 2
Cov( β ) = ( X X) σ
… in matrix form -
var( y$ k ) = x Tk ( X T X) −1 x k σ 2
where is vector of conditions at k-th data point
xk
β1
marginal confidence limits
chee824 - Winter 2004 J. McLellan 135