
Management 6381: Managerial Statistics

Lectures and Outlines

2016 Bruce Cooil

TABLE OF CONTENTS

Lecture 1: Descriptive Statistics (Including Stem & Leaf Plots, Box Plots, Regression Example) ... 1
    Stem & Leaf Display ... 1
    Descriptive Statistics: Means, Median, Std. Dev., IQR ... 2
    Box Plot ... 3
    Regression ... 10

Lecture 2: Central Limit Theorem & CIs ... 12
    Statement of Theorem ... 12
    Simulations ... 13
    Practical Issues & Examples ... 15
    Tail Probabilities & Z-Values ... 16
    Z-Value Notation ... 17
    Picture of CLT ... 19
    Everything There Is to Know ... 20
    Summary & 3 Types of CIs ... 21
    Glossary ... 22

Lecture 3: CIs & Introduction to Hypothesis Testing ... 23
    Examples of Two Main Types of CIs ... 23
    Hypothesis Testing ... 25
    Type I & Type II Error ... 27
    Pictures of the Right and Left Tail P-Values ... 29
    Big Picture Recap ... 30
    Glossary ... 31

Lecture 4: One- & Two-Tailed Tests, Tests on Proportion, & Two Sample Test ... 32
    When to Use t-Values (Case 2) ... 34
    Test on Sample Proportion (Case 3) ... 34
    Means from Two Samples (Case 4) ... 35
    About t-Distribution ... 38

Lecture 5: More Tests on Means from Two Samples ... 39
    Tests on Two Proportions ... 39
    Odds, Odds Ratio, & Relative Risk ... 44
    Tests on Paired Samples ... 45
    Finding the Right Case ... 47

Lecture 6: Simple Linear Regression ... 48
    Purposes ... 48
    The Three Assumptions, Terminology, & Notation ... 49
    Modeling Cost in Terms of Units ... 50
    Estimation & Interpretation of Coefficients ... 51
    Decomposition of SS(Total) ... 52
    Main Regression Output ... 53
    Measures of Fit ... 54
    Correlation ... 56
    Discussion Questions ... 57
    Interpretation of Plots ... 59
    How to Do Regression in Minitab ... 61
    How to Do Regression in Excel ... 62

Lecture 6 Addendum: Terminology, Examples & Notation ... 63
    Synonym Groups
    Main Ideas
    Examples of Correlation
    Notation for Types of Variation and R²

Lecture 7: Inferences About Regression Coefficients & Confidence/Prediction Intervals for μY / Y ... 66
    Modeling Home Prices Using Rents
    Regression Output
    Two Basic Tests
    Test for Lack-of-Fit
    Test on Coefficients
    Prediction Intervals & Confidence Intervals
    How to Generate These Intervals in Minitab 17

Lecture 8: Introduction to Multiple Regression ... 80
    Application to Predicting Product Share (Super Bowl Broadcast)
    3-D Scatterplot
    Regression Output
    Sequential Sums of Squares
    Squared Coefficient t-Ratio Measures Marginal Value
    Discussion Questions on Interpreting Output

Lecture 9: More Multiple Regression Examples ... 90
    Modeling Salaries (NFL Coaches, 2015)
    Modeling Home Prices
    Regression Dialog Boxes

Lecture 10: Strategies for Finding the Best Model ... 102
    Stepwise Approach
    Best Subsets Approach
    Procedure for Finding Best Model
    Studying Successful Products (TV Shows)
    Best Subsets Output
    Stepwise Options
    Stepwise Output
    Best Predictive Model
    Regression on All Candidate Predictors to Find Redundant Predictors
    Other Criteria for Selecting Models
    Discoverers

Lecture 11: 1-Way Analysis of Variance (ANOVA) as a Multiple Regression ... 116
    Comparing Different Types of Mutual Funds
    Meaning of the Coefficients
    Purpose of Overall F-test and Coefficient t-Tests
    Comparing Network Share by Location of Super Bowl Broadcast
    Standard Formulation of 1-Way ANOVA
    Analysis of Covariance
    Looking Up F Critical Values

Lecture 12: Chi-Square Tests for Goodness-of-Fit & Independence ... 129
    Goodness-of-Fit Test ... 129
    Using Minitab ... 130
    Test for Independence ... 130
    Using Minitab ... 132

Lecture 13: Executive Summary & Notes for Final Exam, Outline of the Course & Review Questions ... 133
    Executive Summary & Notes for Final ... 133
    Outline of Course ... 135
    Review Questions with Answers ... 140
    Appendix for Review Questions ... 145

The Outlines: Tests Concerning Means and Proportions & Outline of Methods for Regression ... 149
    Tests Concerning Means and Proportions ... 151
    Confidence Intervals for the Seven Cases ... 153
    Outline of Methods for Regression ... 154

See the Bottom Right Corner of Each Page for the Document Page Numbers Listed Here.

Lecture 1: Descriptive Statistics


Managerial Statistics
Reference: Ch. 2: 2.4, 2.6, pp. 56-59, 67-68; Ch. 3: 3.1-3.4, pp. 98-105, 108-116, 118-129
Outline:
    Stem and Leaf displays
    Descriptive Statistics
        Measures of the Center: mean, quartiles, trimmed mean, median
        Measures of Dispersion: standard deviation, interquartile range
    Box plots & Regression as Descriptive Tools

Stem and Leaf Displays
The rules:
1) List extremes separately;
2) Divide the remaining observations into from 5 to 15 intervals;
3) The stem represents the first part of each observation and is used to label the interval, while the leaf represents the next digit of each observation;
4) Don't hesitate to bend or break these rules.
Famous Ancient Example (modified slightly): Salaries of 10 college graduates in thousands of dollars (1950s):

    2.1, 2.9, 3.2, 3.3, 3.5, 4.6, 4.8, 5.5, 7.9, 50.

Stem and Leaf Display (with trimming), Units: Leaf Unit = 0.10 Thousand $

    2 | 19
    3 | 235
    4 | 68
    5 | 5
    6 |
    7 | 9
    High: 500

If the numbers above were instead in units of $100,000 (i.e., the same data x 100), the display itself would be the same; only the units change: Leaf Unit = 0.1 x 100 = 10 Thousand $.
MINITAB's Version: This is an option in the Graph menu, or you can give the commands shown.

Stem-and-Leaf Display With Trimming

MTB > Stem c1;
SUBC> trim.

Leaf Unit = 0.10
     2   2  19
     5   3  235
     5   4  68
     3   5  5
     2   6
     2   7  9
    HI   500

No Trimming! (Here the extreme observation is included in the main part of the display.)

MTB > Stem c1

Leaf Unit = 1.0
    (9)  0  223334457
     1   1
     1   2
     1   3
     1   4
     1   5  0
Page 1 of 156

Another Example: Make a stem-and-leaf display of 11 customer expenditures at an electronics store (dollars): 235, 403, 539, 705, 248, 350, 909, 506, 911, 418, 283.

Units: $10
     3   2  348
     4   3  5
    (2)  4  01
     5   5  30
     3   6
     3   7  0
     2   8
     2   9  01
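If you want to reproduce a display like this outside Minitab, here is a minimal Python sketch (my own illustration, not part of the course materials); it prints a bare-bones display with sorted leaves and no depth column, and the function name and leaf_unit argument are my own choices.

    from collections import defaultdict

    def stem_and_leaf(data, leaf_unit=10):
        """Print a bare-bones stem-and-leaf display.
        Each observation is scaled by leaf_unit; the last digit of the
        scaled value is the leaf, the rest is the stem."""
        scaled = sorted(int(x // leaf_unit) for x in data)
        groups = defaultdict(list)
        for v in scaled:
            groups[v // 10].append(v % 10)
        for stem in range(min(groups), max(groups) + 1):
            leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
            print(f"{stem:>3} | {leaves}")

    spending = [235, 403, 539, 705, 248, 350, 909, 506, 911, 418, 283]
    stem_and_leaf(spending, leaf_unit=10)   # stems are hundreds of dollars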

Now reconsider the first example with the 10 salaries!

I put those 10 observations into the first column of a Minitab spreadsheet (or worksheet) and then asked for descriptive statistics.

MTB > desc c1

Descriptive Statistics

Variable      N    Mean   Median   TrMean    StDev   SE Mean
Salaries     10    8.78     4.05     4.46    14.58      4.61

Variable   Minimum   Maximum      Q1      Q3
Salaries      2.10     50.00    3.12    6.10

What do the Mean, TrMean, Median, Q1, and Q3 represent?

Mean: average of the sample.
TrMean (5% trimmed mean): average of the middle 90% of the sample.
Median: middle observation (when n is even: the average of the middle two observations), or 50th percentile.
Q1 & Q3 (1st and 3rd quartiles): 25th and 75th percentiles.

Page 2 of 156

Note how the median is a much better measure of a typical central value in this case.

Recall how the standard deviation is calculated. First the sample variance is calculated:

    s² = estimate of the average squared distance from the mean
       = [sum of squared differences (Obs − Mean)²]/(n − 1)
       = [(2.1 − 8.78)² + (2.9 − 8.78)² + ... + (50 − 8.78)²]/9 = 212.6 .

Then the sample standard deviation is calculated as the square root of the sample variance:

    s = √212.6 = 14.58 .

As a descriptive statistic, s is usually interpreted as the typical distance of an observation from the mean. But what does s actually measure?

    The square root of the average squared distance from the mean.

What's the disadvantage of s as a measure of dispersion (or spread)?

    It is sensitive to extreme observations (large and small).

What's an alternative measure of dispersion that is insensitive to extremes?

    0.75 × (Q3 − Q1)

[Q3 − Q1] is referred to as the interquartile range (IQR). If the distribution is approximately normal, then
    (0.75)(Q3 − Q1) = (0.75) IQR
provides an estimate of the population standard deviation (σ).

For the sample of 10 salaries: (0.75) IQR = 0.75(6.10 − 3.12) = 2.2.
(Compare with s = 14.58.)
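As a quick cross-check of these numbers outside Minitab, here is an illustrative Python sketch. Note that quartile conventions differ across packages, so the Q1/Q3 (and hence 0.75·IQR) computed here will not match Minitab's 3.12 and 6.10 exactly.

    import numpy as np
    from scipy import stats

    salaries = np.array([2.1, 2.9, 3.2, 3.3, 3.5, 4.6, 4.8, 5.5, 7.9, 50.0])

    print("mean   :", salaries.mean())                 # 8.78
    print("median :", np.median(salaries))             # 4.05
    print("trmean :", stats.trim_mean(salaries, 0.1))  # about 4.46 (cf. Minitab's TrMean)
    print("s      :", salaries.std(ddof=1))            # 14.58 (sample standard deviation)
    q1, q3 = np.percentile(salaries, [25, 75])         # interpolated quartiles
    print("0.75*IQR:", 0.75 * (q3 - q1))               # robust spread estimate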

The Boxplot
Elements: Q1, median, and Q3 are represented as a box, and 2 sets of fences
(inner and outer) are graphed at intervals of 1.5 IQR below Q1 and above Q3.
The figures on pages 122-125 (in our text by Bowerman et al.) provide good
illustrations.

Page 3 of 156

[Figure: illustration of the inner fences on a boxplot.]

Page 4 of 156

MINITAB Boxplot of the 10 Salaries

Result of menu commands: Graph > Boxplot

[Figure: Boxplot of Salaries; vertical axis (Salaries) runs from 10 to 50, with the single extreme value of 50 shown as an outlier.]

Page 5 of 156

More Examples with Another Data Set Where We Compare Distributions


These data are from http://www.google.com/finance and consist of daily closing prices and
returns (in %) for Google stock and the S&P500 index (see the variables Google_Return and
S&P500_Return below), and a column of standard normal observations.

Page 6 of 156

Page 7 of 156

[Figure: Boxplots of Google_Return, S&P_Return, and Standard_Normal on a common vertical axis (roughly −6 to 2). The extreme low Google return (about −5 to −6%) occurred on 22-Apr-16, apparently due to the announcement of disappointing quarterly results.]

(Recall what the Standard Normal distribution looks like, e.g. http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg .)
MTB > describe c3 c5 c6

(Or, to do the same analysis from the menus: start from the Stat menu, go to Basic Statistics and then to Display Descriptive Statistics; then in the dialog box select c3, c5, and c6 as the variables.)

Descriptive Statistics: Google_Return, S&P_Return, Standard_Normal

Variable           N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median     Q3  Maximum
Google_Return     29   1  -0.207    0.260  1.401   -5.414  -0.772  -0.086  0.771    1.674
S&P_Return        29   1   0.026    0.116  0.624   -1.198  -0.414   0.017  0.534    1.051
Standard_Normal   30   0  -0.134    0.170  0.931   -1.778  -0.813  -0.184  0.598    1.871

Page 8 of 156

In contrast to the boxplots on the previous page, many business distributions are
positively skewed. For example, here is a comparison of the revenue distributions
for the largest firms in three health care industries.
[Figure: Boxplot of 2014 revenues (in billions) in three health care industries, for firms that are among the largest 1000 in the U.S. Revenue axis runs from 0 to 120. Groups: Insurance & Managed Care (12 firms; high outlier UnitedHealth Group), Medical Facilities (13 firms; high outlier HCA Holdings), and Pharmacy & Other Services (13 firms; high outlier Express Scripts Holding).]

Page 9 of 156


Page 10 of 156


Fitted Line Plot

[Figure: scatterplot of Google_Return versus S&P_Return with the fitted line
    Google_Return = −0.2317 + 0.9408 S&P_Return;
S = 1.29557, R-Sq = 17.5%, R-Sq(adj) = 14.5%.]

The regression equation is approximately:


Google_Return = - 0.2317 + 0.9408 [S&P_Return]
This equation describes the relationship between Google returns and
returns on the S&P500. If we assume the S&P500 represents the market
as a whole, then this regression model is a form of the market model (or
security characteristic line). It summarizes the relationship between
contemporaneous returns, since the Google returns and S&P returns
occur at the same time. Consequently, it has no direct predictive value
(i.e., it does not allow me to predict future Google returns because that
would require me to know future S&P returns). Nevertheless, it allows
me to study the relationship between Google returns and the market as a
whole. For example, the expected return on Google stock, when the
S&P500 return is 0, is:

Google_Return = - 0.2317 + 0.9408 [0] = -0.23% (approximately)


The regression analysis also shows that there is a relatively weak
relationship between the two types of returns: the R-squared (adjusted)
value (to the right of the plot) indicates that only 14.5% of the variance
in Google returns is explained by the market return.
This type of regression is generally done with weekly or monthly
returns, rather than daily returns (as was done here).
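For readers curious how a fitted line like this arises: in simple least-squares regression the slope equals the sample covariance of the two series divided by the sample variance of the predictor (this is the usual "beta" calculation for a market model). A minimal Python sketch using made-up return series, not the actual Google/S&P data shown above:

    import numpy as np

    # Hypothetical daily % returns (NOT the Google / S&P data in this lecture)
    market = np.array([0.5, -0.3, 1.0, -1.2, 0.8, 0.2, -0.6])
    stock  = np.array([0.7, -0.5, 1.1, -1.6, 0.9, 0.1, -0.4])

    beta  = np.cov(stock, market, ddof=1)[0, 1] / np.var(market, ddof=1)  # slope
    alpha = stock.mean() - beta * market.mean()                           # intercept
    print(f"fitted line: stock_return = {alpha:.4f} + {beta:.4f} * market_return")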
Page 11 of 156

Lecture 2: The Central Limit Theorem & Confidence Intervals


Outline (Reference: Ch. 7-8: 7.1-7.3, App 7.3, 8.1-8.5, App 8.3):
The Central Limit Theorem (CLT):
Sample Means Have Approximately a Normal Distribution (given a
sufficiently large sample, from virtually any parent distribution)
Illustration of How Sample Proportion is a Sample Mean (with simulations)
Two Examples: Example 1: sample mean; Example 2: sample proportion
Introduction to Confidence Intervals (CLT application): Z-values, Picture of
CLT, a Quiz, & Major Types of CIs

Assume you have a "large" sample size n, and that you find the sample mean, x̄, as the average of n observations, each of which is from a parent distribution (or population) with mean μ and standard deviation σ.

Statement of the Central Limit Theorem:
The sample mean x̄ has approximately a normal distribution with mean μ and standard deviation σ/√n.

Note: A sample proportion (p̂) is simply the mean of n independent observations where each observation takes on the value "1" with probability p, and takes on the value "0" with probability 1 − p.

For example, suppose a company studies the probability (p) that an individual customer complains. Think of each customer response as a random variable X, which takes on the value "1" if they complain, and "0" otherwise. Imagine that each customer complains with a probability p = 0.1.

What is the name of the parent distribution of X?
X has a Bernoulli distribution, which is also referred to as a binomial distribution with 1 trial (n = 1, p = 0.1). The mean (μ) is n·p = 1 × 0.1 = 0.1, and the standard deviation (σ) is [np(1−p)]^(1/2) = [1 × 0.1 × 0.9]^(1/2) = 0.3.

By the Central Limit Theorem: if I sample 100 customers, then the proportion who complain (p̂) will have a normal distribution (approximately), with the same mean (0.1) but a much smaller standard deviation of

    σ/√n = 0.3/√100 = 0.03.


Page 12 of 156

Here is a picture of the parent distribution.

[Figure: bar chart of the parent distribution, Binomial (n = 1, p = 0.1); horizontal axis is the value of the observation (1: Complaint, 0: No Complaint), vertical axis is frequency (0.9 at 0 and 0.1 at 1).]

In a simulation, I repeatedly took a random sample of 100 observations from the parent distribution above, and calculated the mean of each sample of 100 observations. I did this 1000 times. Here is a histogram of those 1000 means.

[Figure: Histogram of 1000 means (each is the average of 100 observations), with a comparison normal curve; N = 1000, Mean = 0.09962, StDev = 0.02918. Note: this standard deviation is approximately 0.03.]

As predicted by the Central Limit Theorem: this distribution is approximately normal, the sample mean (the mean of the means) is approximately 0.1 (same as the parent), and the sample standard deviation (of the means) is approximately 0.03 (= [parent distribution's std. dev.]/√n = 0.3/√100).
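A minimal Python sketch of the same kind of simulation (my own illustration, not the Minitab macro used to produce the figure):

    import numpy as np

    rng = np.random.default_rng(1)
    p, n, reps = 0.1, 100, 1000

    # Each row is one sample of 100 Bernoulli(0.1) observations;
    # the row mean is that sample's proportion of complainers.
    samples = rng.binomial(1, p, size=(reps, n))
    means = samples.mean(axis=1)

    print(means.mean())         # close to 0.1
    print(means.std(ddof=1))    # close to 0.3 / sqrt(100) = 0.03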
Page 13 of 156

Another simulation: suppose we toss one fair die. Here is the probability distribution of the outcome.

[Figure: parent distribution of the outcome of a tossed die; the integers 1-6 are equally probable, each with probability 1/6 ≈ 0.167.]

I repeatedly take a random sample of 2 observations from the parent distribution above, and calculate the mean of each sample of 2 observations. I do this 1000 times. Here is a histogram of those 1000 means (each mean is of only 2 observations).

[Figure: Histogram of 1000 means (each is the average of 2 observations); N = 1000, Mean = 3.539, StDev = 1.184. Note: this standard deviation is approximately 1.2.]

As predicted by the Central Limit Theorem: this distribution is approximately normal, the sample mean (the mean of the means) is approximately 3.5 (same as the parent), and the sample standard deviation (of the means) is approximately 1.2 (1.2 ≈ [parent distribution's std. dev.]/√n = 1.7/√2).

Page 14 of 156

Practical Issues & Two Examples

How large should n be? Here are two guides:
1) for a typical sample mean: n > 30 (this is a conservative rule);
2) for a sample proportion: n large enough so that n·p̂ ≥ 5 and n·(1 − p̂) ≥ 5.

Example 1
Here are descriptive statistics for 40 annual returns on the S&P500 (these returns are the simple annual percent gain or loss on the index, without compounding or inclusion of dividends), 1975-2014. (Note that SE Mean = StDev/√N = 16.57/√40.)

MTB > desc 'S&P_Return'

Descriptive Statistics: S&P_Return

Variable      N  N*   Mean  SE Mean  StDev  Minimum    Q1  Median     Q3  Maximum
S&P_Return   40   0  13.41     2.62  16.57   -36.55  4.99   15.75  27.74    37.20

This summary shows that:
n = 40, x̄ = 13.41 (this is an estimate of μ), s = 16.57 (an estimate of σ);
and s/√n = 16.57/√40 = 2.62.

Describe the distribution of x̄ (the sample mean), assuming the actual distribution of S&P_Return remains unchanged during 1975-2014:
The distribution is approximately Normal with a mean of approximately 13.41 and a std. dev. of approximately 2.62.
Example 2
Suppose I interview 100 people and 20 prefer a new product (to competing brands). I want to estimate: p ≡ the proportion of the population that prefers the new brand. (Each customer preference is a Bernoulli observation, with an approximate mean of 0.20 and an approximate variance of [0.20 × 0.8] = 0.16.)

In summary, the sample proportion, p̂, is 20/100 = 0.2. Recall that for the Bernoulli distribution: μ = p and σ² = p(1 − p); consequently σ/√n = [p(1 − p)/n]^(1/2). So p̂ behaves as though it has a normal distribution, with a mean of approximately 0.2 (this is our estimate) and a standard deviation of approximately [0.2 × 0.8/100]^(1/2) = 0.04.
Page 15 of 156

Tail Probabilities & the Corresponding Normal Values (Z-values)

[Figure: a general normal distribution with right-tail probabilities of 0.10, 0.05, and 0.025 marked; horizontal axis is the value of the normal random variable, vertical axis is the density.]

Page 16 of 156

Z-Value Notation
z_α is used to represent the standard normal value above which there is a tail probability of α.

Verify that z0.10 = 1.28, z0.05 = 1.645, and that z0.025 = 1.96. (Use a normal table, e.g.,
http://www2.owen.vanderbilt.edu/bruce.cooil/cumulative_standard_normal.pdf.)

To verify that z0.10 = 1.28: the tail probability is 0.10, so find the z-value that corresponds to a cumulative probability of 0.90.  => It's 1.28.

To verify that z0.05 = 1.645: the tail probability is 0.05, so find the z-value that corresponds to a cumulative probability of 0.95.  => It's 1.645.

To verify that z0.025 = 1.96: the tail probability is 0.025, so find the z-value that corresponds to a cumulative probability of 0.975.  => It's 1.96.
Page 17 of 156

Cumulative probabilities for POSITIVE z-values are in the following table:

  z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 0.0  .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
 0.1  .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
 0.2  .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
 0.3  .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
 0.4  .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
 0.5  .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
 0.6  .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
 0.7  .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
 0.8  .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
 0.9  .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
 1.0  .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
 1.1  .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
 1.2  .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
 1.3  .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
 1.4  .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
 1.5  .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
 1.6  .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
 1.7  .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
 1.8  .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
 1.9  .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
 2.0  .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
 2.1  .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
 2.2  .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
 2.3  .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
 2.4  .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
 2.5  .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
 2.6  .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
 2.7  .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
 2.8  .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
 2.9  .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
 3.0  .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990

Page 18 of 156

Picture of the Central Limit Theorem

Acknowledgment: This picture of the Central Limit Theorem is based on a much prettier graph made for this course by Tim Keiningham,
Global Chief Strategy Officer and Executive Vice President, Ipsos Loyalty (also a student in an earlier version of this course).

Page 19 of 156

Everything There Is to Know About the Normal Distribution,
the Central Limit Theorem, and Confidence Intervals

The Central Limit Theorem states that the distribution of x̄ (the distribution of sample means) is approximately normal with mean μ and variance σ²/n, abbreviated:

    x̄ is approximately N(μ, σ²/n), where:

μ is the mean of the distribution of x̄ (μ is also the mean of the population from which the observations were sampled).
x̄ is the sample mean. (The sample is taken from a population with mean μ and variance σ². Think of x̄ as an "estimate" of μ.)
σ²/n is the variance of the distribution of x̄, also referred to as the variance of x̄.
σ/√n is the standard error of x̄ (the sample mean). It is also sometimes called the SE mean or the standard deviation of x̄.

The figure on the top of the previous page indicates:
    x̄ is within 1.28 standard errors* of μ with probability 80%.
    x̄ is within 1.645 standard errors* of μ with probability 90%.
    x̄ is within 1.96 standard errors* of μ with probability 95%.
    * Remember that the standard error of x̄ is σ/√n.

Another Way of Saying the Same Thing

    x̄ ± (1.28)(σ/√n) is an 80% confidence interval for μ.
    x̄ ± (1.645)(σ/√n) is a 90% confidence interval for μ.
    x̄ ± (1.96)(σ/√n) is a 95% confidence interval for μ.
Page 20 of 156

Brief Summary of Chapter 8

Three Types of Confidence Intervals Are Introduced in Chapter 8

1) 100(1−α)% confidence interval for μ when n > 30:
       x̄ ± z_(α/2) (σ/√n), with s in place of σ.
   [Section 8.1, p. 297 (shaded box)]

2) 100(1−α)% confidence interval for p:
       p̂ ± z_(α/2) √(p̂(1−p̂)/n).
   This assumes n is large enough so that n·p̂ ≥ 5 and n·(1−p̂) ≥ 5.
   [Section 8.4, p. 311 (shaded box)]

3) 100(1−α)% confidence interval for μ when n < 30:
       x̄ ± t_(α/2)^(n−1) (s/√n).
   This is the same as confidence interval (1), except that a t-value is now used in place of the z-value.
   [Section 8.2, p. 302 (shaded box)]
   NOTE: The text refers to "t_(α/2)^(n−1)" as "t_(α/2)".

General Form of Confidence Intervals:
    Estimate ± [(t- or z-value) × (Standard Deviation of the Estimate)]

Example 1: Consider the return data: n = 40, x̄ = 13.41, s/√n = 2.62.
    Find a 90% confidence interval for μ:  13.41 ± 1.645 (2.62)
    Find a 95% C.I. for μ:  13.41 ± 1.96 (2.62)

Example 2: Consider the product preference example above. Here p̂ = 0.2 and n = 100.
    Find a 90% confidence interval for p (the actual proportion):  0.2 ± (1.645)√[0.2(0.8)/100]
    Find an 80% C.I. for p:  0.2 ± 1.28 (0.04)

Example 3: Suppose we consider only the last 16 changes in the S&P:
    n = 16, x̄ = 6.93, s/√n = 4.69. Must use a t-value because n < 30.
    Find a 99% C.I. for μ:  6.93 ± 2.947 (4.69)


Page 21 of 156


Glossary
Reference: Chapter 5 (pp. 188, 190) versus Chapter 3 (pp. 100, 110)

The Mean of a Distribution:
    μ = Σ x·P(x).        (1)
The mean of a distribution (or of a random variable X) is simply the weighted average of its realizable outcomes, where each realizable value is weighted by its probability, P(x). Contrast this definition with the definition of a sample mean:
    x̄ = Σ x·(1/n).        (2)
The only difference is that (1/n), the frequency with which each observation occurs in the sample, replaces P(x) in equation (1).

The Variance of a Distribution:
    σ² = E[(X − μ)²] = Σ (x − μ)²·P(x).        (3)
The variance of a distribution (of a random variable X) may also be calculated as σ² = E(X²) − μ². Note that the first term in this last expression is just the expectation, or average value, of X².

Standard Deviation of a Distribution:
    σ = [Σ (x − μ)²·P(x)]^(1/2) = [E(X²) − μ²]^(1/2).

Compare this with the definition of the sample standard deviation:
    s = [Σ (x − x̄)² / (n − 1)]^(1/2).        (4)
(The sample variance is: s² = Σ (x − x̄)² / (n − 1).)
_________________________________________________________________
ANSWERS to Examples (on Bottom of Previous Page)

Example 1
1) 90% CI:  x̄ ± z_0.05 (s/√n) = 13.41 ± 1.645 (2.62) = 13.41 ± 4.31,  OR: (9.1, 17.7)
2) 95% CI:  x̄ ± z_0.025 (s/√n) = 13.41 ± 1.96 (2.62) = 13.41 ± 5.13,  OR: (8.3, 18.5)

Example 2
1) 90% CI:  p̂ ± z_0.05 √(p̂(1−p̂)/n) = 0.2 ± (1.645)√[0.2(0.8)/100] = 0.2 ± 0.066,  OR: (13%, 27%)
2) 80% CI:  p̂ ± z_0.10 √(p̂(1−p̂)/n) = 0.2 ± 1.28 (0.04) = 0.2 ± 0.051,  OR: (15%, 25%)

Example 3
99% CI:  x̄ ± t_0.005^(15) (s/√n) = 6.93 ± 2.947 (4.69) = 6.93 ± 13.82,  OR: (−6.9, 20.8)
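These intervals are easy to reproduce with scipy; here is a small illustrative sketch (not part of the course's Minitab workflow) using the same summary numbers as the examples above.

    from math import sqrt
    from scipy.stats import norm, t

    def ci(estimate, se, crit):
        """Confidence interval: estimate +/- critical value * standard error."""
        half = crit * se
        return estimate - half, estimate + half

    # Example 1: mean return, n = 40, x_bar = 13.41, s/sqrt(n) = 2.62
    print(ci(13.41, 2.62, norm.ppf(0.95)))        # 90% CI, about (9.1, 17.7)

    # Example 2: proportion, p_hat = 0.2, n = 100
    se_p = sqrt(0.2 * 0.8 / 100)
    print(ci(0.2, se_p, norm.ppf(0.90)))          # 80% CI, about (0.15, 0.25)

    # Example 3: small sample, n = 16, x_bar = 6.93, s/sqrt(n) = 4.69
    print(ci(6.93, 4.69, t.ppf(0.995, df=15)))    # 99% CI, about (-6.9, 20.8)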


Page 22 of 156

Lecture 3
Confidence Intervals for Means and Proportions
& Introduction to Hypothesis Testing (Large Sample Mean)
Outline (Ch.9: 9.1-9.2)
Recap of C.I.s for Means and Proportions
One-tailed tests on sample mean
What is type I error? Type II error? Power?
Everything to Know About the Test Statistic and P-value
Recap of Confidence Intervals
Example 1
I have just designed a new type of mid-size car with a hybrid engine.
To determine its average fuel efficiency (mpg), I sample 30 mile per
gallon measurements from 30 different cars (city driving).
MTB > print c1

MPG
70.4  70.5  70.8  71.2  72.5  73.5  75.1  77.0  77.2  77.4  77.9  78.3
80.3  80.9  81.1  81.4  84.2  84.2  84.3  85.4  85.6  86.3  86.3  86.7
89.3  89.7  89.9  90.6  91.0  92.1

MTB > stem c1;
SUBC> trim.

Stem-and-Leaf Display: MPG
Stem-and-leaf of MPG  N = 30; Leaf Unit = 1.0

     4   7  0001
     6   7  23
     7   7  5
    11   7  7777
    12   7  8
    (4)  8  0011
    14   8
    14   8  44455
     9   8  666
     6   8  999
     3   9  01
     1   9  2

[Figure: Boxplot of MPG; vertical axis (MPG) runs from 70 to 95.]

MTB > desc c1

Descriptive Statistics: MPG

Variable    N   Mean  Median  TrMean  StDev  SE Mean  Minimum     Q1     Q3  Maximum
MPG        30  81.37   81.25   81.43   6.80     1.24    70.40  76.53  86.40    92.10

Note: SE Mean (or 1.24) = s/√n = 6.8/√30.


Page 23 of 156

Find a 95% confidence interval for the real mean mpg (μ) and interpret it.
    C.I.: 81.37 ± 1.96 (1.24) = 81.37 ± 2.43, or (78.9, 83.8).
    Interpretation: This covers the real mean (μ) with 95% probability.

Would an 80% confidence interval be longer or shorter?
    Shorter! (Use z0.10 = 1.28, and the interval becomes (79.8, 83.0).)
(The convention is: use t-values when n < 30, or simply always use t-values!!)
Example 2
I randomly sample 150 customers to ask whether or not they would recommend our new product to others, and 120 answer affirmatively. Find a 90% confidence interval for the actual proportion of customers in the population who would recommend our product.

    Note: p̂ = 120/150 = 0.8.
    C.I.: p̂ ± z0.05 √(p̂(1−p̂)/n) = 0.8 ± 1.645 √[0.8(0.2)/150],  OR: (0.75, 0.85).

(To use this approach, we must have n·p̂ ≥ 5 and n·(1−p̂) ≥ 5. In this case: n·p̂ = 120 (successes) and n·(1−p̂) = 30 (failures).)

Page 24 of 156

Hypothesis Testing
Reconsider the new hybrid car example (Example 1). Suppose that I want to show that my new car has an average mpg (μ) that is better than that of the best performing competitor, for which the average mpg is 78. Formally, I want to "disprove" a null hypothesis
    H0: μ = 78 (sometimes written as μ ≤ 78)
in favor of the alternative hypothesis:
    H1: μ > 78.
Note that: n = 30, x̄ = 81.37, s = 6.8, s/√n = 1.24. (For n < 30, the procedure is identical except when we find the critical value. That case will also be discussed.)

To build a case for H1, I follow 3 logical steps (typical of all hypothesis testing).

1) Assume H0 is true.

2) Construct a test statistic with a known distribution (using H0). In this case I use the test statistic
       z ≡ [x̄ − 78]/(s/√n),
   which should have approximately a standard normal distribution if H0 is true. (WHY? The CLT, since n is large.)

3) Reject H0 in favor of H1 if the value of z supports H1. ("Large" values of z support H1 in this case.)

Regarding step 3: if H0 is true, I would see values of z greater than z0.05 = 1.645 only 5% of the time. Such a large value would be improbable under H0 and would support H1, so a reasonable decision rule is: reject H0 in favor of H1 if z is greater than 1.645. This assumes that I am willing to make a mistake 5% of the time.

Page 25 of 156

[Figure: standard normal curve with the critical value z0.05 = 1.645 marked.]

In this sample,
    z = [x̄ − 78]/(s/√n) = [81.37 − 78]/1.24 = 2.72 > 1.645.
Therefore, I reject H0 in favor of H1.

SUMMARY: to test H0: μ = 78 versus H1: μ > 78 we use the decision rule: reject H0 if
    z = [x̄ − 78]/(s/√n) > z_α,
or equivalently if:
    x̄ > 78 + z_α (s/√n).
Otherwise we accept H0.

In this case z = 2.72, so I reject the null hypothesis H0 at the 0.05 level, and conclude in favor of the alternative hypothesis H1. That is, I conclude that the average mpg of the new hybrid automobile is significantly greater than 78; but using this decision rule (i.e., rejecting H0 whenever z > z0.05) there is a 5% chance that I have erroneously rejected H0 and that the real average mpg (μ) really is only 78 (or less).
Above we chose α = 0.05, so that z0.05 = 1.645. This probability α is referred to as the significance level, and it is the maximum probability of making a type I error: type I error refers to the error we make if we reject H0 when H0 is in fact true. Typically we use
    α = 0.001, 0.010, 0.025, 0.05, 0.1, or 0.2,
so that
    z_α = 3.09, 2.33, 1.96, 1.645, 1.28, or 0.84, respectively
(the corresponding t-values are very similar for moderate values of n:
    for n = 20: t_α^(19) = 3.6, 2.5, 2.1, 1.7, 1.3, or 0.86;
    for n = 30: t_α^(29) = 3.4, 2.5, 2.0, 1.7, 1.3, or 0.85).

Suppose that I had chosen α = 0.001; then, since z0.001 = 3.09 and z = 2.72, I would accept H0 because z = 2.72 is not greater than z0.001 = 3.09. In this case, I would be concerned that I made a type II error. Type II error refers to the case where the null hypothesis H0 is really false but I fail to reject it! The following table summarizes the situation with type I and II errors.

Page 26 of 156

                        WHAT IS REALLY TRUE
DECISION           H0 IS TRUE           H1 IS TRUE
REJECT H0          Type I Error         Correct Decision
ACCEPT H0          Correct Decision     Type II Error

Good lingo: "Cannot reject H0" can be used for "Accept H0."
Bad lingo: "Accept H1" should not be used for "Reject H0."

How do we protect against:
    Type I error?  A small α.
    Type II error?  A large n.

Note that to make a decision on whether to reject or accept H0: μ = 78, we simply need to compare the test statistic z = [x̄ − 78]/(s/√n) with an appropriate normal value, z_α, that corresponds to the significance level α that is chosen beforehand. If z > z_α, we reject H0 (otherwise accept H0).
[Figure: distribution of the test statistic Z when H0 is true, with z0.05 = 1.645, the observed z = 2.72, and z0.001 = 3.09 marked; the p-value is the probability to the right of z.]

Alternatively, we could simply look up the tail probability that corresponds to the test statistic z (this is called the p-value) and compare it to the significance level α. If the p-value is less than α (p-value < α), we reject H0 (otherwise accept H0).

In this case z = 2.72, and the p-value for H0: μ = 78 versus H1: μ > 78 is the right tail probability (because this is a one-tailed test where the alternative hypothesis goes to the right side). What is the p-value in this case?

    P-value (probability to the right of 2.72) = 1 − [cumulative probability at 2.72] = 1 − 0.9967 = 0.0033.

Can we reject H0 at the 0.05 level?  YES.  At the 0.001 level?  NO!
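As an illustrative check of this test from the summary statistics (Python, not the course's Minitab workflow):

    from math import sqrt
    from scipy.stats import norm

    n, xbar, s, mu0 = 30, 81.37, 6.8, 78
    z = (xbar - mu0) / (s / sqrt(n))    # test statistic, about 2.7
    p_value = norm.sf(z)                # right-tail probability for H1: mu > 78
    print(z, p_value)                   # p-value about 0.003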

Page 27 of 156

Another One-Tailed Test (in the other direction):

Suppose I make the claim that my car's average mpg is 80 (the observations on page 1 were really drawn from a normal distribution with μ = 80). My competitor might be interested in testing:
    H0: μ = 80 (sometimes written μ ≥ 80) versus H1: μ < 80.
And suppose my competitor chooses a significance level of α = 0.10. In this case the test statistic is:
    z = [x̄ − 80]/(s/√n) = [81.37 − 80]/1.24 = 1.10,
and small values of z support the alternative hypothesis H1: μ < 80.

Consequently the decision rule is to reject H0 in favor of H1 if we see small values of z. How small? Since the significance level was chosen to be 0.1, we reject if we see a value of z that is as small as values we would expect to see only 10% of the time (i.e., values of z smaller than −z0.1 = −1.28).

Specifically, we reject H0 if z < −z0.1 = −1.28 (and otherwise accept H0). In this case z = 1.10 is not less than −1.28, so my competitor is forced to accept H0 at the 10% (or 0.1) level.

How do we remember the decision rule and critical value that go with each type of one-tailed test? Use the alternative hypothesis to decide! Consider the two examples:

Details                          Example 1               Example 2
Alternative Hypothesis           H1: μ > 78              H1: μ < 80
Significance Level               α = 0.05                α = 0.10
Test Statistic z                 2.72                    1.10
Reject H0 (in favor of H1) if    z > z0.05 = 1.645       z < −z0.1 = −1.28
Page 28 of 156

Alternatively, we can find the p-value that corresponds to the test statistic, z, for this hypothesis test and compare it with α, and (as always) we only reject H0 if the p-value is less than α. Remember that when the alternative hypothesis goes to the left side, the p-value refers to the tail probability to the left of the test statistic z. Given the way the p-value is calculated, we always reject H0 when p-value < α, and accept H0 otherwise.

Given the test statistic z = 1.10 for H0: μ = 80 versus H1: μ < 80, what is the p-value?

[Figure: standard normal density with the area to the left of z = 1.10 shaded. Here the p-value is a left-tail probability because H1 goes to the left!]

    p-value = [cumulative probability at 1.10] = 0.8643.

Can we reject H0 at the 0.1 level?  NO.  At the 0.2 level?  NO.
(We cannot reject H0 at any reasonable level.)

What was the p-value in the first situation, where we were testing H0: μ = 78 versus H1: μ > 78, and the test statistic was z = 2.72? This was calculated on page 5 as 0.0033.

[Figure: standard normal density with the area to the right of z = 2.72 shaded. Here the p-value is a right-tail probability because H1 goes to the right!]

Page 29 of 156

Big Picture Recap

Let μ0 represent the constant benchmark to which we wish to compare μ, and consider three scenarios. (In each case the test statistic is z = (x̄ − μ0)/(s/√n).)

                          Scenario 1                Scenario 2                Scenario 3
H0 (also written as)      μ = μ0 (μ ≤ μ0)           μ = μ0 (μ ≥ μ0)           μ = μ0 (no other way)
Alternative H1            H1: μ > μ0                H1: μ < μ0                H1: μ ≠ μ0
Critical value            z_α                       −z_α                      z_(α/2)
Reject H0 if              z > z_α                   z < −z_α                  |z| > z_(α/2)
Definition of p-value     tail prob. above z        tail prob. below z        tail prob. beyond ±|z| (both tails)

Example                   Example 1 (bottom p. 6)   Example 2 (bottom p. 6)   Example 3 (new example)
Null hypothesis           H0: μ = 78                H0: μ = 80                H0: μ = 80
Alternative               H1: μ > 78                H1: μ < 80                H1: μ ≠ 80
Significance level        α = 0.05                  α = 0.10                  α = 0.10
Test statistic            z = 2.72                  z = 1.10                  |z| = 1.10
Critical value            z0.05 = 1.645             −z0.10 = −1.28            z0.10/2 = z0.05 = 1.645
Decision                  Reject H0                 Accept H0                 Accept H0 (because |1.10| ≤ 1.645)
P-value                   0.0033                    0.86                      0.27

[For each scenario, the original table also shows a picture of the test statistic and p-value (shaded area) relative to the standard normal distribution.]

Page 30 of 156

Glossary
α = significance level = maximum probability of making a type I error.
p-value = the tail probability that corresponds to the test statistic, calculated for the specific alternative hypothesis H1.
β = probability of making a type II error (not rejecting H0 when H1 is true).
Power = 1 − β = probability of making the correct decision when H1 is true.

How does power change with sample size?
Power increases as sample size increases (ceteris paribus). As n increases, the test statistic becomes larger in absolute value, and is more likely to exceed the critical value in the appropriate direction. See the 3rd-to-last row of the table on the last page (i.e., the test statistic formulas). Another way to think about it: as the test statistic becomes larger in absolute value in the direction supporting H1, the p-value decreases.

How does power change with α?
Power increases as α increases (ceteris paribus). As α increases, the critical value decreases in absolute value, and is more likely to be exceeded by the test statistic; see the penultimate row of the last page (i.e., the critical values and how they change with α).

Bruce Cooil, 2016


Page 31 of 156

Lecture 4
One and Two-Tailed Tests, Tests on a Sample
Proportion, & Introduction to Tests on Two Samples
Main References
(1) Ch.9: 9.3-9.4, Summary, Glossary, App. 9.3;
Ch.10: 10.1
(2) The Outline "Tests on Means and
Proportions" (referred to as "The Outline")
Topics
I. Tests on Means and Proportions from One Sample (Reference:
9.3-9.4)
Example of a two-tailed test (Case 1)
When to use t-values (Case 2)
Tests on a sample proportion (Case 3)
II. Tests on Means from Two Samples (Ref: 10.1)
Tests on means from two large samples (Case 4)
Tests on means when it is appropriate to assume variances are
equal(Case 5)

I. Tests on Means & Proportions from One Sample


Summary of Last Time (1-Tailed Versions of Case 1)
Last time we first considered the one-tailed hypothesis test:
    H0: μ = 78 (OR H0: μ ≤ 78) versus H1: μ > 78.
In this case we use the decision rule: reject H0 if
    z = [x̄ − 78]/(s/√n) > z_α,
or equivalently if x̄ > 78 + z_α (s/√n). Otherwise we accept H0.
Page 32 of 156

Then we considered the one-tailed test going the other way (μ still represents the mean mpg of my new hybrid). I make the claim that the average mpg is 80, and so my competitor wants to test:
    H0: μ = 80 versus H1: μ < 80.
(Recall: z = [x̄ − 80]/(s/√n) = [81.37 − 80]/1.24 = 1.10.)

Small values of z support H1: if x̄ is calculated using observations from a distribution where μ < 80 (as my competitor believes is the case), then we will tend to get small values of z. So the decision rule is to reject H0 in favor of H1 if
    z = [x̄ − 80]/(s/√n) < −z_α
(or equivalently if x̄ < 80 − z_α (s/√n)).

[Note that this is just Case 1 in the outline: μ0 refers to the constant used in the null hypothesis, which is "80" in this last case.]

Example of a 2-Tailed Test

A two-tailed test would be:
    H0: μ = 80 versus H1: μ ≠ 80.
So, for example, if α = 0.05, we would reject H0 in favor of H1 if |z| > z0.025 (because α/2 = 0.025).

What do we conclude if we do this 2-tailed test?
(Recall that: n = 30, x̄ = 81.37, s/√n = 1.24.)
    Test statistic: z = (81.37 − 80)/1.24 = 1.10 (SAME as above)
    Critical value: z0.025 = 1.96
    Conclusion: Accept H0.
    (x̄ is not significantly different from 80.)
Page 33 of 156

When to Use t-Values

Case 2 (p. 1 of the outline) is identical to Case 1, except that the critical values in Case 2 are based on the t-statistic. When should we use Case 2? A good conservative approach is to always use t-values when you have to estimate σ (which is always), but it does not make much of a difference if n > 30.

Let's do the two-tailed test above, using Case 2 (α = 0.05):
    H0: μ = 80 versus H1: μ ≠ 80 (SAME)
    Test statistic: t = 1.10 (same)
    Critical value: t_0.025^(29) = 2.045   (the only difference from Case 1)
    Conclusion: Accept H0.

Tests on a Sample Proportion
Example: Let p = the proportion of customers in the population that prefer my new product. Suppose I need to test (α = 0.05):
    H0: p = 0.1 versus H1: p > 0.1.
This is Case 3 in the outline (with p0 = 0.1). The details are just like Case 1 except we use a different standard error of the mean. If 30 of 100 randomly selected customers prefer my product (p̂ = 0.3), can I show that more than 10% of the population of all customers prefer my product at the 0.05 level?

    Test statistic: z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.3 − 0.1)/√(0.1 × 0.9/100) = 0.2/0.03 = 6.67 > 1.645 = z0.05
    Critical value: z0.05 = 1.645
    Conclusion: Reject H0 (YES!).
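A quick illustrative check of this proportion test in Python (not the course's Minitab workflow):

    from math import sqrt
    from scipy.stats import norm

    n, successes, p0 = 100, 30, 0.1
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # about 6.7
    print(z, norm.sf(z))                         # one-sided p-value, essentially 0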
Page 34 of 156

II. Means from Two Samples

Case 4: What To Do When Both Samples Are Large

Example: The owner of two fast-food restaurants wants to compare the average drive-thru experience times for lunch customers at each restaurant (experience time is the time from when the vehicle entered the line to when the entire order was received). There is reason to believe that Restaurant 1 has lower average experience times than Restaurant 2 because its staff has more training. Suppose n1 experience times during lunch are randomly selected for Restaurant 1 and n2 from Restaurant 2, with the following results (units: minutes):
    n1 = 100, x̄1 = ___, s1 = 0.7;
    n2 = 50,  x̄2 = ___, s2 = 0.5.

Why do we use Case 4 on page 1 of the outline?
    Both samples are ≥ 30 (and independent).

If we want to show Restaurant 1 has a lower average experience time, what are the appropriate hypotheses and what can we conclude (at the 0.1 level)?
    H0: μ1 − μ2 = 0 (OR: ≥ 0);  in the outline, D0 = 0.
    H1: μ1 < μ2, i.e., μ1 − μ2 < 0.
    Test statistic: z = (x̄1 − x̄2 − 0)/√(0.7²/100 + 0.5²/50) = ___
    Critical value: −z0.10 = −1.28    Conclusion: Reject H0. (YES!)

What would happen if we test at the 0.01 level?
    New critical value: −z0.01 = −2.33 (still reject H0).
Is there any reason to pick α in advance?
    Yes, it's more objective!
Page 35 of 156

Would Welch's t-test (p. 376) make a difference?

In this case we use the same test statistic but compare it with a critical value from the t-distribution with degrees of freedom

    df ≈ (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
       = (0.7²/100 + 0.5²/50)² / [ (0.7²/100)²/99 + (0.5²/50)²/49 ] ≈ 133.

So for α = 0.1 and α = 0.01, the critical values are t_0.1^(133) = 1.29 and t_0.01^(133) = 2.35, respectively, and the conclusions are the same in each case!
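For reference, here is a small Python helper that computes this Welch-Satterthwaite degrees-of-freedom approximation from summary statistics (an illustrative sketch; the function name is my own):

    def welch_df(s1, n1, s2, n2):
        """Welch-Satterthwaite degrees of freedom from summary statistics."""
        a, b = s1**2 / n1, s2**2 / n2
        return (a + b)**2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

    print(welch_df(0.7, 100, 0.5, 50))   # prints about 130; compare with the df of roughly 133 used above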
Case 5: What If We Are Willing To Assume Equal Variances?

Example: I'm comparing weekly returns on the same stock over two different periods. The average sample return is larger during period 2. Can one show that the return during period 2 is significantly higher than during period 1 at the 0.01 level? The data are:
    n1 = 21, x̄1 = ___%, s1 = ___;
    n2 = 11, x̄2 = ___%, s2 = ___.

What are the appropriate hypotheses?
    H0: μ1 − μ2 = 0
    H1: μ1 < μ2, i.e., μ1 − μ2 < 0.

It may be risky to rely only on the CLT. (Why? The samples are small; n2 = 11.) Technically I make 3 additional assumptions if I use Case 5:
(1) the observations are approximately normal,
(2) the two populations have equal variances, and
(3) the samples are independent.
Page 36 of 156

The test statistic in Case 5 allows us to use a pooled estimate of the variance:

    s_p² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2).

The test statistic is just like Case 4, with s_p used in place of s1 and s2:

    t = (x̄1 − x̄2 − 0) / [s_p √(1/n1 + 1/n2)] = −2.6.

Suppose I do this test at the 0.01 significance level. What would be the critical value for the test statistic "t" and what would be the conclusion?
    Critical value: −t_0.01^(30) = −2.457   (df = n1 + n2 − 2 = 30)
    Conclusion: Reject H0. (YES!)

What would be the two-tailed test in this case? (Specify H0 and H1.) Also give the critical value and conclusion if testing at the 0.01 level.
    H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0
    Test statistic: t = −2.6 (same as for the one-tailed test)
    Critical value: t_0.005^(30) = 2.75
    Conclusion: Accept H0. (No!)  (Because |t| = 2.6 < 2.75.)
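Here is an illustrative Python sketch of the Case 5 computation from summary statistics. The example numbers are hypothetical (the lecture's actual period means and standard deviations are not reproduced above), and the function name is my own.

    from math import sqrt
    from scipy.stats import t

    def pooled_t(x1, s1, n1, x2, s2, n2):
        """Two-sample t statistic with a pooled variance estimate (Case 5)."""
        sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
        return (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2

    # Hypothetical weekly-return summaries (NOT the lecture's data)
    t_stat, df = pooled_t(0.1, 2.0, 21, 1.9, 2.1, 11)
    print(t_stat, df, -t.ppf(0.99, df))   # compare t_stat with the lower 0.01 critical value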

Page 37 of 156

About the t-Distribution (Reference: Bowerman, et al., pp. 344-346)

According to the Central Limit Theorem, x̄ (the sample mean of n observations) has approximately a normal distribution with mean μ and standard deviation σ/√n. Also, this approximation improves as the sample size, n, increases. Consequently, by the Central Limit Theorem, the standardized mean,

    z = (x̄ − μ)/(σ/√n),

has approximately a standard normal distribution. We have been using this single result to justify the construction of confidence intervals and hypothesis tests.

When using this result, we have generally been approximating σ by substituting the sample standard deviation, s, for it. If the sample is large enough, this doesn't impose much additional error. But when samples are smaller (e.g., n < 30), the convention is to accommodate the additional error (caused by using s for σ) by using the fact that if the original distribution was normal, then the t-statistic,

    t = (x̄ − μ)/(s/√n),

really has what is referred to as a t-distribution with n − 1 degrees of freedom. The degrees-of-freedom number, n − 1, refers to the amount of information that the sample standard deviation, s, contains about the true standard deviation σ. If we have only 1 observation, we have no information about σ (n − 1 = 1 − 1 = 0); if we have 2 observations we have essentially 1 piece of information about σ, and so on. This is the reason we divide by the degrees of freedom, n − 1, when calculating s:

    s = [Σ (x − x̄)² / (n − 1)]^(1/2).

The real question becomes: why should we use the t-distribution when it relies on the strong assumption that the original distribution is normal, which is exactly the type of assumption we were trying to avoid by using the Central Limit Theorem?! The answer is essentially this: by using t-values in place of z-values we are doing something that accommodates the additional inaccuracy we generate by using s to estimate σ, and in practice it works quite well even when the parent distribution is not normal! Of course, t-values converge to z-values as the sample size increases: see the t-table.
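To see that convergence numerically, here is a short illustrative check in Python:

    from scipy.stats import norm, t

    print(norm.ppf(0.975))              # z_0.025 = 1.96
    for df in (5, 10, 30, 100, 1000):
        print(df, t.ppf(0.975, df))     # t_0.025 approaches 1.96 as df grows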
Page 38 of 156

Lecture 5: More Tests on Means from Two Samples


Outline: (Reference: Bowerman et al., 10.2-10.3, Appendix 10.3; the Outline
Tests Concerning Mean and Proportions)

Tests on Two Proportions (Case 6, Ch. 10.3)


Everything to Know About Odds, Odds Ratios and Relative Risk
Tests on Paired Samples (Case 7, Ch. 10.2)

Tests on Two Proportions


(Case 6: Large Samples)
This example comes from an article, 10 Most Popular
Franchises published in the Small Business section of
CNN.com (April, 2010):
http://money.cnn.com/galleries/2010/smallbusiness/1004/gallery.Franchise_failure_rates/index.html .

(More recent data through early 2016 consist primarily of a


smaller sample of settled loans from the same period:
http://fitsmallbusiness.com/best-franchises-sba-default-rates/# . )
It provides franchise failure rates based on loan data from the
Small Business Administration (October, 2000 through
September, 2009) and it illustrates all of the issues one will
typically face when comparing rates (expressed as proportions).
The 10 most popular franchises are: 1)Subway, 2)Quiznos,
3)The UPS Store, 4)Cold Stone Creamery, 5)Dairy Queen,
6)Dunkin Donuts, 7)Super 8 Motel, 8)Days Inn, 9)Curves for
Women, and 10)Matco Tools. Super 8 Motel and Days Inn have
the highest start-up costs (average SBA loan sizes are 0.91 and
1.02 million dollars, respectively), and nominally Super 8
Motels seem to have a lower failure rate. Here are the data.
                  SBA Loans   Failures*
Super 8 Motel          456          18
Days Inn               390          23

*Failures are loans in liquidation or charged off.


Page 39 of 156

Is there a higher failure rate for SBA loans to Days Inn than for Super 8 Motel at the 0.05 level?
    H0: p1 − p2 = 0 (or ≤ 0)      (D0 = 0)
    H1: p1 − p2 > 0
(where p1 = the proportion of Days Inn failures and p2 = the proportion of Super 8 Motel failures).

Are the sample sizes sufficiently large to use the normal approximation in Case 6?
(In Case 6, the relevant sample sizes are the numbers of successes and failures in each sample; each must be at least 5, i.e., n1·p̂1, n1(1 − p̂1), n2·p̂2, n2(1 − p̂2) ≥ 5.)
    YES, all 4 counts are ≥ 5.

The sample estimates of p1 and p2 are:
    p̂1 = 23/390 = 0.0590;    p̂2 = 18/456 = 0.0395.

Consequently, the test statistic is:

    z = (p̂1 − p̂2 − 0) / √[ p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 ]
      = (0.0590 − 0.0395 − 0) / √[ 0.0590(0.9410)/390 + 0.0395(0.9605)/456 ]
      = 0.0195/0.0151 = 1.30.

Page 40 of 156

OR, following the text's approach (which is appropriate only when the null hypothesis states that the proportions are equal), we could also use the overall rate of failure to calculate the standard error of the test statistic. Since

    p̄ = (23 + 18)/(390 + 456) = 0.0485 (see the data on p. 1),

the test statistic becomes:

    z = (0.0590 − 0.0395 − 0) / √[ 0.0485(0.9515)/390 + 0.0485(0.9515)/456 ] = 0.0195/0.0148 = 1.32.

With either test statistic we get essentially the same result:
    Critical value: z0.05 = 1.645
    Conclusion: Accept H0
    (No, the rate at Days Inn is not significantly higher.)
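A short illustrative Python sketch of both versions of this two-proportion z statistic (not the course's Minitab workflow):

    from math import sqrt
    from scipy.stats import norm

    x1, n1 = 23, 390    # Days Inn failures
    x2, n2 = 18, 456    # Super 8 Motel failures
    p1, p2 = x1 / n1, x2 / n2

    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    p_bar = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))

    for se in (se_unpooled, se_pooled):
        z = (p1 - p2) / se
        print(z, norm.sf(z))    # z about 1.3, one-sided p-value about 0.10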

Which approach does MINITAB take?

[Minitab screenshot: the Basic Statistics menu, annotated with the corresponding cases: Case 1, Case 2, Cases 4 & 5, Case 7, Case 3, and Case 6.]

Page 41 of 156

Three options are provided here:
1) Both samples in one column
2) Each sample in its own column
3) Summarized data.

(Here D0 = 0. I could have selected the appropriate one-sided alternative here but instead used the default option, the two-sided test. The default setting is to not pool!)

If We Do NOT Pool (which is the default unless we click on the "use pooled estimate..." option):

Test and CI for Two Proportions

Sample     X    N  Sample p
1         23  390  0.058974
2         18  456  0.039474

Difference = p (1) - p (2)
Estimate for difference:  0.0195007
95% CI for difference:  (-0.00992794, 0.0489293)
Test for difference = 0 (vs not = 0):  Z = 1.30   P-Value = 0.194

[Figure: standard normal density; the two-sided P-value of 0.194 is the sum of the two tail probabilities, so each tail probability is 0.194/2 = 0.097.]

Wait!! This p-value is for a two-sided test!
We need the p-value for H1: p1 > p2, which is: 0.194/2 = 0.097  =>  Accept H0.

Page 42 of 156

If We Pool:

Test and CI for Two Proportions

Sample     X    N  Sample p
1         23  390  0.058974
2         18  456  0.039474

Difference = p (1) - p (2)
Estimate for difference:  0.0195007
95% CI for difference:  (-0.00992794, 0.0489293)
Test for difference = 0 (vs not = 0):  Z = 1.32   P-Value = 0.188   (1-sided p-value = 0.094)

Other Caveats and Notes

1) p̂1 & p̂2 may seriously underestimate the actual rates of failure, since the study includes recent loans to franchises that probably will fail within 5 years (but had not yet failed during the study period). To get better estimates, each loan should be observed over a period of equal duration. For example, we might observe each over a 5-year period (from the time the loan is granted), and p̂1 & p̂2 would then be legitimate estimates of the failure rate of SBA loans to each franchise.
2) Sometimes data of this type are summarized in terms of
odds and odds ratios, especially in health/medical care
applications.

Page 43 of 156

Odds, Odds Ratio and Relative Risk

Definition
If an event occurs with probability p, then the odds of it occurring are defined as p/(1 − p).

In the Example
If we use p̂1 = 6% as an estimate of the failure rate at Days Inn franchises and p̂2 = 4% as the corresponding estimate for Super 8 Motel franchises, then the odds of failure are:
    Days Inn franchises: 0.06/(1 − 0.06) = 0.0638;
    Super 8 franchises: 0.04/(1 − 0.04) = 0.0417.
And the odds ratio (or ratio of the odds of failure for Days Inn versus Super 8 Motel) is:

    Odds Ratio = [p̂1/(1 − p̂1)] / [p̂2/(1 − p̂2)] = 0.0638/0.0417 ≈ 1.5,        (1)

indicating that the odds of failure are about 1.5 times higher for the Days Inn franchises. (To turn this into a health care example: imagine companies are people, and that failure is a disease to which certain people are more susceptible.)

Alternatively, since this is a prospective study, sometimes the results are summarized in terms of the relative risk of failure (for Days Inn versus Super 8), which is simply the ratio of p̂1 to p̂2:

    Relative Risk = p̂1/p̂2 = 0.06/0.04 = 1.5,        (2)

indicating that failure is about 1.5 times more likely for the Days Inn franchises. Of course, remember that p̂1 & p̂2 are not good estimates, which is a common problem in health/medical applications. Also, p̂1 is not even significantly larger than p̂2 at the 0.05 level!
Page 44 of 156

The Relative Size of These Numbers

Note that odds, odds ratios, and relative risk ratios can each be anywhere between 0 and infinity. Also note the difference between the probability and odds scales.

    Probability scale (p):     0___1/4___1/2___3/4___1
    Odds scale (p/[1−p]):      0___1/3____1_____3____∞

Finally, whenever p̂1 > p̂2, the odds ratio will be greater than the relative risk:

    Odds Ratio = [p̂1/(1 − p̂1)] / [p̂2/(1 − p̂2)] = (p̂1/p̂2)·[(1 − p̂2)/(1 − p̂1)] > p̂1/p̂2 = Relative Risk.

Tests on Paired Samples


(Case 7: Large or Small Samples)
In late December of 2009, Forbes did a study of the best and worst
mutual funds of the prior decade. I thought it would be interesting to
compare the best and worst (among funds that still exist from that
period) in terms of annual returns during the six subsequent years
(2010-2015).
Fund                                                        Annualized Return 1999-2009
Best:  CGM Focus Fund (CGMFX)                                           18.8%
Worst: Fidelity Growth Strategies (FDEGX)                               -9.5%
S&P 500                                                                 -2.6%

(Current Morningstar ratings for the two funds were also shown.)
Page 45 of 156

If I expect the CGM Focus Fund to outperform the Fidelity Growth Strategies Fund during 2010-2015, I might ask the following research question: Does the CGM Focus Fund have an average return that is significantly more than 0.5% higher than the average annual return of the Fidelity Growth Strategies Fund during 2010-2015 (α = 0.1)?

Then: H0: μCGM − μFidelity = 0.5 (OR ≤ 0.5);  H1: μCGM − μFidelity > 0.5.

The actual data are below.

Year    CGM Focus Fund   Fidelity Growth Strategies Fund   Difference: d = CGM − Fidelity
2010         16.94                  25.63                            -8.69
2011        -26.29                  -8.95                           -17.34
2012         14.23                  11.78                             2.45
2013         37.61                  37.87                            -0.26
2014          1.39                  13.69                           -12.30
2015         -4.11                   3.17                            -7.28
Mean          6.63                  13.87                            -7.24

(The average difference makes it clear we cannot reject H0, since Fidelity outperforms CGM! But we formally apply the test anyway, as an illustration.)

We can't apply Cases 4 or 5 to this problem because the annual returns are from the same years and are affected by the same market forces. Consequently, the two samples are not independent!

But we can take differences (CGM minus Fidelity; see the last column in the table above) and apply Case 2 to the single sample of differences. The following hypotheses are equivalent to the ones above but are written in terms of the differences:
    H0: μDifferences = 0.5 (OR ≤ 0.5)    H1: μDifferences > 0.5.

The mean and standard deviation of the six differences are:
    d̄ = −7.24;    s_d = √[ Σ(d − d̄)² / (n − 1) ] = 7.38.

Thus, the standard error of the mean is s_d/√n = 7.38/√6 = 3.01. Here are the details of the Case 2 test.

    Test statistic:  t = (d̄ − D0)/(s_d/√n) = (−7.24 − 0.5)/3.01 = −2.57
    Critical value:  t_0.10^(5) = 1.48
    Conclusion: Accept H0 (No!), because t is not greater than 1.48.
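An illustrative check of this paired test in Python, using the six differences from the table above:

    import numpy as np
    from scipy.stats import t

    d = np.array([-8.69, -17.34, 2.45, -0.26, -12.30, -7.28])   # CGM minus Fidelity
    n = len(d)
    t_stat = (d.mean() - 0.5) / (d.std(ddof=1) / np.sqrt(n))    # about -2.6
    print(t_stat, t.ppf(0.90, df=n - 1))                        # critical value about 1.48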


Page 46 of 156

Addendum: Finding the Right Case

One sample:
    Mean
        Large sample: Case 1
        Small sample: Case 2
    Proportion: Case 3

Two samples:
    Means
        Both samples large: Case 4
        When either sample is not large, use Welch's t (OR: always use Welch's t)
        Willing to assume equal variances: Case 5
        Paired samples: Case 7
    Proportions: Case 6

Bruce Cooil, 2016

Page 47 of 156

Lecture 6: Simple Linear Regression


Outline: Main reference is Ch. 13, especially 13.1-13.2, 13.5, 13.8
The Why, When, What, and How of Regression
    Why: Purposes of Regression
    When: Three Basic Assumptions: Linearity, Homoscedasticity, Random Error
    What: Estimation and Interpretation of the Coefficients
    How: Decomposition of SS(Total) = Σ_{i=1}^{n} (y_i − ȳ)²
        (See the third equation on page 492: SS(Total) is referred to there as Total Variation.)
        Measures of fit: MS(Error) (the variance of error), R²(adjusted)
Purposes

1. To predict values of one variable (Y) given the values of another (X). This is important because the value of X may be easier to obtain, or may be known earlier.

2. To study the strength or nature of the relationship between two variables.

3. To study the variable Y by controlling for the effects (or removing the effects) of another variable X.

4. To provide a descriptive summary of the relationship between X and Y.
Assumptions

The basic model is of the form:
    y = β0 + β1·x + ε,        (1)
where β0 and β1 are called coefficients and represent unknown constants (that will be estimated in the regression analysis), and ε is used to represent random error. The error, ε, is assumed to come from a distribution with mean 0 and constant variance σ_ε². The main result of the regression analysis is to provide estimates of the coefficients so that we can use the estimated regression equation,
    ŷ = b0 + b1·x        (2)
to predict Y.

Page 48 of 156

Notes on Terminology and Notation

ŷ is the predicted value and is referred to as the "fit" or the "fitted value."

The residuals, e_i (the observed errors), are defined as the difference between the actual and the predicted value of Y, i.e.,
    e_i = [residual for observation i] = y_i − ŷ_i.
Note that the theoretical error term, ε_i, from equation (1) is slightly different from the residuals:
    ε_i ≡ y_i − (β0 + β1·x_i)   versus   e_i ≡ y_i − (b0 + b1·x_i).

Formally the model makes the assumption that the errors (the ε_i) are a random sample from a distribution with mean 0 and variance σ_ε². This one assumption is sometimes referred to in 3 parts.

1. Linearity: there is a basic linear relationship between y and x as shown in (1), which is equivalent to saying that the real mean of the errors (the ε_i) is 0.

2. Homoscedasticity: the variance of the errors ε_i is constant for all y_i.

3. Random Error: the errors ε_i are independent from one another.
Page 49 of 156

Two plots provide a way of checking these assumptions:
    To check linearity: the plot of y versus x;
    To check linearity, homoscedasticity, and randomness: the plot of the residuals, (y − ŷ), versus the fitted values, ŷ. Plots of standardized residuals versus fits are especially useful.

Imagine I have developed a special new product and that I develop a model to estimate the cost of producing it using data from the first 5 orders.

Order   Number of Units (x)   Cost (y, $1000)   Predicted Cost (or fit, ŷ)   Residual (y − ŷ)
1               1                    6                      5                       1
2               3                   14                     11                       3
3               4                   10                     14                      -4
4               5                   14                     17                      -3
5               7                   26                     23                       3

[Figure: plot of cost (y) versus units (x), showing the observations, the fitted values, and the estimated regression line.]

Page 50 of 156

Estimation and Interpretation of Coefficients

In the plot above, the open circles are the actual observations of
y & x (cost & units), and the solid circles are the values of ŷ & x
(predicted, or fitted, cost & units). The vertical distances
between open circles and solid circles represent the observed
errors or residuals of the regression model. The estimated
regression line is:

   ŷ = b₀ + b₁x                                        (3)
     = 2 + 3x.

The two estimated coefficients,

   b₀ = 2 and b₁ = 3,

are chosen to minimize the sum of squared residuals or errors
that are made when we use the estimated regression equation to
predict cost (y). Note that in this case the sum of squared
residuals (or errors) is (see the last column of the table on the previous
page):

   SS(Error) = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
             = 1² + 3² + (-4)² + (-3)² + 3² = 44.

This is the smallest value of the sum of squared errors
obtainable among all possible choices of b₀ and b₁.
Please interpret these coefficients.
   b₀: predicted (or average) value of Y when X = 0
       (in this application it is the fixed cost);
   b₁: average change in Y per unit change in X
       (in this application it is the variable cost).
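
A minimal numerical sketch (not part of the original notes; it assumes Python with numpy is available) showing that the least-squares coefficients for these five orders work out to b₀ = 2 and b₁ = 3:

```python
import numpy as np

# The five orders: number of units (x) and cost in $1000 (y)
x = np.array([1, 3, 4, 5, 7], dtype=float)
y = np.array([6, 14, 10, 14, 26], dtype=float)

# Least-squares estimates from the usual closed-form expressions
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

fits = b0 + b1 * x          # predicted cost for each order
residuals = y - fits        # observed errors

print(b0, b1)               # 2.0 3.0
print(fits)                 # [ 5. 11. 14. 17. 23.]
print(residuals)            # [ 1.  3. -4. -3.  3.]
```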
Page 51 of 156

Decomposition of SS(Total)

Without this regression model, we might be forced to use the
average, ȳ, to predict future values of y. To get an indication of
how well ȳ would do as a prediction, we can find the sum of
squared differences between each yᵢ and ȳ:

   Σᵢ₌₁ⁿ (yᵢ − ȳ)²
      = (6-14)² + (14-14)² + (10-14)² + (14-14)² + (26-14)² = 224

(see the 3rd column of the table on the next page). This sum of
squares is referred to as SS(Total), i.e.,

   SS(Total) = Σᵢ₌₁ⁿ (yᵢ − ȳ)² = 224.

The regression model succeeds in reducing the uncertainty about
y if SS(Error) is significantly less than SS(Total). Also,
regression models actually allow us to decompose SS(Total)
into two parts, SS(Error) and SS(Regression):

   SS(Total) = SS(Regression) + SS(Error);

where:

   SS(Regression) = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²
      = the sum of squares of the fitted values around
        their mean (the mean of the ŷ values is also ȳ)
      = (5-14)² + (11-14)² + (14-14)² + (17-14)² + (23-14)²
      = 180

(see the 4th column of the table on the next page).
Page 52 of 156

So in this case, the decomposition of SS(Total) works out as
follows:

   SS(Total) = SS(Regression) + SS(Error)
        224  =      180       +    44.

Summary of the Decomposition of SS(Total)

Units(x)   Cost(y)   (yᵢ − ȳ)²    (ŷᵢ − ȳ)²     (yᵢ − ŷᵢ)²
   1          6      (6-14)²      (5-14)²         1²
   3         14      (14-14)²     (11-14)²        3²
   4         10      (10-14)²     (14-14)²       (-4)²
   5         14      (14-14)²     (17-14)²       (-3)²
   7         26      (26-14)²     (23-14)²        3²
TOTALS:               224      =    180       +    44
Name of SS:        SS(Total) = SS(Regress.) +  SS(Error)
Minitab Summary: Main Regression Output of Version 17
(See Page 11 for a Comparison with Excel)

Regression Analysis: Cost(y) versus Units(x)

["MS" refers to "Mean Square," which is always the corresponding SS
(Sum of Squares) divided by DF (degrees of freedom): MS = SS/DF.]

Analysis of Variance

Source      DF  Adj SS  Adj MS  F-Value  P-Value
Regression   1  180.00  180.00    12.27    0.039
  Units(x)   1  180.00  180.00    12.27    0.039
Error        3   44.00   14.67    <-- Variance of Error
Total        4  224.00   224/4    <-- Variance of Y

Model Summary  (Measures of Fit)

      S    R-sq  R-sq(adj)  R-sq(pred)
3.82971  80.36%     73.81%      38.11%

Coefficients
(Text notation:  b = Coef,  s_b = SE Coef,  t = b/s_b = T-Value)

Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant   2.00     3.83     0.52    0.638
Units(x)  3.000    0.856     3.50    0.039   1.00

Regression Equation
Cost(y) = 2.00 + 3.000 Units(x)

Page 53 of 156

Measures of Fit (Model Summary)

Note that on the line just below the Analysis of Variance table in
the MINITAB output, there are 4 primary measures of fit:
   s = 3.83, R-sq = 80.4%, R-sq(adj) = 73.8%, R-sq(pred) = 38.1%.
The first three can be calculated using the information in the
Analysis of Variance table. The standard deviation s
represents the estimated standard deviation of the residuals
(the observed errors):
   s = [Variance of observed errors]^(1/2)
     = [SS(Error)/(n − [# parameters in model])]^(1/2)
     = [44/(5 − 2)]^(1/2) = [14.67]^(1/2) = 3.83.
[The text calls "s" the "standard error," and writes it as simply s.
See the shaded expressions on page 479.]

Note that the variance of the errors, 14.67, is also provided in
the Analysis of Variance table, and is often referred to there as
"MS(Error)" or "Adj MS(Error)," which is shorthand for
"adjusted mean squared error." "Mean Square" is generally used
as a synonym for "variance." For these data:
   s = [MS(Error)]^(1/2) = [14.67]^(1/2) = 3.83.
Another obvious overall measure of how well the model
performs would be:

   R² = SS(Regression)/SS(Total) = [SS(Total) − SS(Error)]/SS(Total),

which is the proportion of SS(Total) generated or "explained" by
the model. R² is often referred to as the "coefficient of
determination."

Page 54 of 156

In this example:

   R² = SS(Regression)/SS(Total) = 180/224 = 0.8036, or 80.36%

(OR: R² = [SS(Total) − SS(Error)]/SS(Total) = [224 − 44]/224,
where "180," "44," and "224" are all shown in the Analysis of
Variance table.)

A better measure of fit is found by adjusting R² so that it
estimates the proportion of the variance of y that is explained
by the fitted values from the model. This proportion is referred
to as "R²(Adjusted),"

   R²(Adjusted) = 1 − MS(Error)/MS(Total) = 1 − Variance(Error)/Variance(Y).

In this case:

   R²(Adjusted) = 1 − [44/(5 − 2)] / [224/(5 − 1)] = 1 − 14.67/56 = 0.738, or 73.8%.

Note that "14.67" is shown in the Analysis of Variance
table.
R2(Adj) represents the proportion of the variance of Y that is
"explained" (or generated) by the regression equation, while s
represents the estimated standard deviation of the residuals. In
this example, 73.8% of the variance in cost (Y) is "explained"
by the model that uses units (X) as a predictor and the
standard deviation of the errors made by this model is 3.8
thousand dollars.
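
A short sketch (an addition, not from the original notes; assumes numpy) that reproduces these measures of fit for the five-order example:

```python
import numpy as np

x = np.array([1, 3, 4, 5, 7], dtype=float)      # units
y = np.array([6, 14, 10, 14, 26], dtype=float)  # cost ($1000)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
fits = b0 + b1 * x

ss_total = np.sum((y - y.mean()) ** 2)           # 224
ss_error = np.sum((y - fits) ** 2)               # 44
ss_regression = ss_total - ss_error              # 180

n, p = len(y), 2                                 # p = number of parameters (b0, b1)
ms_error = ss_error / (n - p)                    # 14.67 (variance of the residuals)
s = np.sqrt(ms_error)                            # 3.83
r_sq = ss_regression / ss_total                  # 0.8036
r_sq_adj = 1 - ms_error / (ss_total / (n - 1))   # 0.7381

print(s, r_sq, r_sq_adj)
```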
Page 55 of 156

In simple linear regression, R2 (unadjusted) is generally written


as "r2" and it represents the squared correlation coefficient (also
see page 494 of the text). The estimated correlation between
cost (Y) and units (X) is 0.896. See the correlation matrix
below.
In the spreadsheet:
c1: Units(x); c2: Cost(y); and c3: FITS1

MTB > corr c1 c2 c3

Correlations (Pearson)

          Units(x)   Cost(y)
Cost(y)      0.896
             0.039
FITS1        1.000     0.896
                 *     0.039

Cell Contents: Correlation (r)
               P-value

Formulas for Correlation (Pearson Correlation):

   r = [ Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ)/(n − 1) ]
       / { [ Σᵢ₌₁ⁿ (xᵢ − x̄)²/(n − 1) ]^(1/2) × [ Σᵢ₌₁ⁿ (yᵢ − ȳ)²/(n − 1) ]^(1/2) }

(See pp. 125-127, 492-495 of the text for more examples and discussion.)

Alternatively:
   r = (sign of b₁) × [square root of R² (from the simple regression)].
In the example:  r = +√0.804 ≈ 0.896.

Note that in general, r is between -1 and +1.
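
A small sketch (an addition; assumes numpy) computing the Pearson correlation from its definition and confirming that r² equals the unadjusted R² in simple linear regression:

```python
import numpy as np

x = np.array([1, 3, 4, 5, 7], dtype=float)       # units
y = np.array([6, 14, 10, 14, 26], dtype=float)   # cost ($1000)

# Pearson correlation from the definition
r = (np.sum((x - x.mean()) * (y - y.mean())) /
     np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)))
print(round(r, 3))        # 0.896

# In simple linear regression, R-squared (unadjusted) equals r squared
print(round(r ** 2, 4))   # 0.8036, i.e., 80.36%
```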


Page 56 of 156


10

Discussion Questions
(The Regression Output Is Redisplayed on the Next Page.)

1. Use the regression equation to predict the cost (Y) when the number of units
   (X) is 4.

   ŷ = b₀ + b₁X = 2 + 3(4) = 14 thousand dollars

2. What was the actual cost for an order when units = 4? (What is the
   residual or error at that point?)

   From page 3: when Units (X) is 4, Y is 10,
   thus: residual = y − ŷ = 10 − 14 = −4.

3. What is the sample variance of cost (Y)? (See next page.)

   S²_Y = SS(Total)/4 = 224/4 = 56
   ( S_Y = √56 = 7.5 thousand )

4. What is the estimated variance of the residuals (or errors) of the
   regression?

   S² = MS(Error) = 14.67   (Find this in the Analysis of Variance Table!)

5. How good is the fit?

   There are two ways of measuring fit (see the Model Summary):
   S = 3.83 thousand dollars
   R²(Adjusted) = 73.8%
   (74% of the variance in cost (Y) is explained by the model.)

6. Show how R²(Adjusted) is related to the variance of cost and the variance
   of the residuals.

   R²(Adjusted) = 1 − MS(Error)/Variance(Y) = 1 − 14.67/56

7. Show how R² (unadjusted) is related to the correlation between cost (Y) and units (X).

   R² = r²   (r represents the sample correlation; this only works in
              simple linear regression!!)
   (On the next page: R² = 0.804; on page 9: r = 0.896.)


Page 57 of 156

11

Appendix
Comparison of MINITAB Output (Versions 14-17) with Excel

MINITAB 17 Output

Regression Analysis: Cost(y) versus Units(x)

Analysis of Variance
Source      DF  Adj SS  Adj MS  F-Value  P-Value
Regression   1  180.00  180.00    12.27    0.039
  Units(x)   1  180.00  180.00    12.27    0.039
Error        3   44.00   14.67    <-- #4
Total        4  224.00            <-- 224/4 = Var(Y) for #3

Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
3.82971  80.36%     73.81%      38.11%   <-- #5

Coefficients
Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant   2.00     3.83     0.52    0.638
Units(x)  3.000    0.856     3.50    0.039   1.00

Regression Equation
Cost(y) = 2.00 + 3.000 Units(x)

Excel Output

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.896421
R Square             0.803571
Adjusted R Square    0.738095
Standard Error       3.829708
Observations         5

ANOVA
             df   SS        MS        F         Significance F
Regression    1  180       180        12.27273  0.039389
Residual      3   44        14.66667
Total         4  224

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept          2            3.829708    0.522233  0.637618  -10.18784  14.18784    -10.18784    14.18784
X Variable 1       3            0.856349    3.503245  0.039389   0.274716   5.725284     0.274716    5.725284
Page 58 of 156

12

Interpreting the Plot of Residuals versus Fit

One way to check on the three assumptions (linearity,
homoscedasticity and random error) is to plot the residuals
(errors) against the predicted (or fitted) values ŷ.

There are hardly enough observations here to be very confident
in the assumptions. But in general we look for symmetry around
the horizontal line through zero as an indication that the
assumptions of linearity and randomness are met. To confirm
homoscedasticity, we look for roughly constant vertical
dispersion around the horizontal line through zero.
The ideal situation generally looks something like the following
plot.
[Figure: Residuals Versus the Fitted Values (response is C3) — residuals scattered
symmetrically, with roughly constant vertical spread, around the horizontal line at 0.]

Page 59 of 156

13

Here is a situation where the linearity assumption is violated.

[Figure: Residuals Versus the Fitted Values (response is C3) — the residuals follow a
systematic (non-random) pattern rather than falling symmetrically around the horizontal
line at 0.]

Here is a common situation (below) where homoscedasticity is
violated: notice how the residuals show increasing vertical
dispersion around the horizontal line through zero as the fitted
values increase.

[Figure: Residuals Versus the Fitted Values (response is C3) — the vertical spread of the
residuals fans out as the fitted values increase.]
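
A sketch (an addition to the notes; assumes numpy and matplotlib are installed) of how a residual-versus-fit plot like the ones above can be drawn in Python:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 3, 4, 5, 7], dtype=float)
y = np.array([6, 14, 10, 14, 26], dtype=float)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
fits = b0 + b1 * x
residuals = y - fits

plt.scatter(fits, residuals)
plt.axhline(0, linestyle="--")        # horizontal reference line at 0
plt.xlabel("Fitted Value")
plt.ylabel("Residual")
plt.title("Residuals Versus the Fitted Values")
plt.show()
```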

Page 60 of 156

14

How to Do This Regression Analysis in MINITAB


Minitab 17

Page 61 of 156

15

How to do a Regression Analysis in Excel


Click into the Data menu and check for the Data Analysis
option (far right).
If the Data Analysis option is not there:
   Start from the File menu > Click on Options > Click on
   Add-Ins > Select Analysis ToolPak & hit "Go" near
   the bottom of the dialog box.
Otherwise start from the Data menu:
Click on Data Analysis (far right), Select Regression and
then specify the Y- and X-range in the dialog box.
(You can simply click into each range box and then move the
mouse directly into the spreadsheet to select the numerical
data cells from the appropriate column(s) of the spreadsheet.
The appropriate range of cells should then appear in the range
box. The Input X Range may consist of several columns,
each column for a different predictor.)
Other Good References
See page 519 of the text for an example with great screen
pictures. Another good reference is:
www.wikihow.com/Run-Regression-Analysis-in-Microsoft-Excel.

Page 62 of 156

Lecture 6 Addendum: Terminology, Examples, and Notation


Regression Terminology
Synonym Groups
1) Y, Dependent Variable, Response Variable
2) X, Predictor Variable, Independent Variable
3) ŷ, Prediction, Predicted Value, Fit, Fitted Value
4) Variance of Y, MS(Total), Adj MS(Total)
5) Variance of Error, MS(Error), Adj MS(Error)
6) ε (or e), Error, Residual
7) Coefficients are sometimes referred to using the more general term "parameters."
   Coefficients are the parameters that are used in linear models.

Main Ideas
Simple linear regression refers to a regression model with only one predictor. The underlying
theoretical model is:

   y = β₀ + β₁x + ε,

where y represents a value of the dependent variable, x is a value of the predictor, ε represents
random error, and β₀ and β₁ represent unknown constants.
The corresponding estimated regression equation is:

   ŷ = b₀ + b₁x.

The regression coefficients b₀ and b₁ refer to sample estimates of the true coefficients β₀ and β₁,
respectively.
The sample correlation coefficient, r, estimates the true (or population) value of the correlation,
ρ, which is a measure of the degree to which two variables (Y and X) are linearly related.
Of course, the sample correlation (r) and the slope coefficient (b₁) are closely related:

   b₁ = r (s_Y / s_X),                                   (*)

where s_Y and s_X are the sample standard deviations of Y and X, respectively.
The corresponding relationship between the true values, β₁ and ρ, is:

   β₁ = ρ (σ_Y / σ_X) = Cov(X,Y) / σ²_X.
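
A quick numerical check of relationship (*) (an addition, using the five-order example from Lecture 6 and assuming numpy):

```python
import numpy as np

x = np.array([1, 3, 4, 5, 7], dtype=float)       # units
y = np.array([6, 14, 10, 14, 26], dtype=float)   # cost

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

r = np.corrcoef(x, y)[0, 1]               # sample correlation
sy, sx = y.std(ddof=1), x.std(ddof=1)     # sample standard deviations

print(b1)              # 3.0
print(r * sy / sx)     # 3.0  -- the same value, illustrating b1 = r * (sY / sX)
```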

Page 63 of 156

Examples of Correlation

Change in GDP vs Consumer Sentiment (1995-2015)
[Scatterplot: Change in GDP (Y axis, about -500 to 500) versus Consumer Sentiment
(X axis, 60 to 110); labeled points include 1999, 2009, and 2015.]

Correlation: r = 0.725 (R² (unadjusted) = r² × 100% = 52.6%);  y = -756.6 + 12.25x.

Change in GDP (Y): change in annual U.S. GDP in billions of dollars.
Consumer Sentiment (X): index of financial well-being and the degree to which consumer
expectations are positive (based on five questions on a survey conducted by the University of
Michigan). (https://data.sca.isr.umich.edu/fetchdoc.php?docid=24770)

Life Expectancy (Both Sexes) vs Literacy Rate (>=15 years) for 112 Nations
[Scatterplot: Life Expectancy (Y axis, 50 to 85 years) versus Literacy Rate for people
>= 15 years old (X axis, 20 to 100).]

Correlation: r = 0.725 (R² (unadjusted) = r² × 100% = 51.3%);  y = 44.45 + 0.3045x.

Data are from the World Health Organization (Life Expectancy (Y) as of 2015, Literacy Rate (X)
for 2007-2012).
Page 64 of 156

MPG (City) vs Weight (Lbs)
[Scatterplot: MPG (Y axis, 16 to 28) versus Weight in pounds (X axis, 2500 to 4000).]

Correlation: r = -0.891 (R² (unadjusted) = r² × 100% = 79.4%);
   MPG = 41.71 − 0.006263 Weight.
Data are for 14 automobiles (2005) from www.chegg.com.

Random Y vs Random X
[Scatterplot: Random Y versus Random X, both ranging roughly from -4 to 4, with no
visible pattern.]

Correlation: r = 0.018 (R² (unadjusted) = r² × 100% = 0.1%);
   Random Y = 0.06153 + 0.01688 Random X.
Y and X are two sets of 1000 standard normal random numbers.
Page 65 of 156

Notation for Types of Variation and R²

For linear regression models (with one or more predictor variables), the basic types of sums of
squares represent three types of variation.

1. Total Variation = Σ(yᵢ − ȳ)² = SS(Total)
   The sum of squares of the observations of Y around their mean.

2. Explained Variation = Σ(ŷᵢ − ȳ)² = SS(Regression)
   The sum of squares of the predicted values of Y (ŷᵢ) around their mean (which is also ȳ).

3. Unexplained Variation = Σ(yᵢ − ŷᵢ)² = SS(Error)
   The sum of squares of the differences between each observation and the corresponding
   predicted value.

Note:
   Total Variation = Explained Variation + Unexplained Variation
Or:
   SS(Total) = SS(Regression) + SS(Error)

The R² (Unadjusted) is sometimes called the simple coefficient of determination and it is
the square of the correlation:

   R² = SS(Regression)/SS(Total) = [SS(Total) − SS(Error)]/SS(Total)
      = Explained Variation / Total Variation = r².

R² (Adjusted) is a more accurate assessment of the strength of the relationship between Y
and X. In general:

   R²(Adjusted) = 1 − MS(Error)/MS(Total)
                = 1 − [SS(Error)/(n − [# of parameters in model])] / [SS(Total)/(n − 1)].

For simple linear regression, which includes a constant and a slope coefficient, [# of
parameters in model] = 2.
[Reference: pp. 493-495, Essentials of Business Statistics (2015), 5th Edition, Bowerman
et al.]

Page 66 of 156

Lecture 7
Inferences About Regression Coefficients &
Confidence/Prediction Intervals for Y /Y

Outline:(Ref: Ch. 13: 13.3-13.4, 13.6-13.7)


Recap of Main Ideas from Lecture 6
Testing Lack-of-Fit
Inferences Based on Regression Coefficients (Ch. 13.3)
Prediction Intervals versus Confidence Intervals for Y (Ch. 13.4)
(Please read pp. 486-489, not for details on how PIs and
CIs are calculated but for the main idea of what they tell
us about Y!)
Summary of Ideas from Lecture 6
3 Assumptions:
The basic relationship between Y and X is linear up to a
random error term that has mean 0 (linearity) and constant
variance (homoscedasticity). Errors are random in the sense
that they are independent of each other and do not depend on
the value of Y.
One way to check these assumptions is to plot residuals versus
fitted values.
The coefficient estimates, b0 and b1, are chosen to minimize the
sum of squared errors (or residuals).
b1 represents the average change in Y that is associated with a
one-unit change in X.
Regression is useful because it allows us to reduce the
uncertainty regarding Y. We can think about this in terms of
the decomposition of SS(Total) (NOTE: SS is used in
regression to refer to Sums of Squares):
Page 67 of 156

SS(Total)           =   SS(Regression)        +   SS(Error)

Σᵢ₌₁ⁿ (yᵢ − ȳ)²     =   Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²       +   Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

SS of observed          SS of fitted              SS of errors (errors
y-values around         values around             are the actual y minus
their mean              their mean                the fitted or predicted y)

This decomposition of total uncertainty (SS(Total)) suggests two
useful summaries of how well the model fits:

1) R²(Adjusted) = 1 − [Variance of Error]/[Variance of Y]
                = 1 − {SS(Error)/(n − [# of parameters])} / {SS(Total)/(n − 1)}

   (Recall that: Variance of Y = Σᵢ₌₁ⁿ (yᵢ − ȳ)²/(n − 1) = SS(Total)/(n − 1).)

2) s = [Variance of Error]^(1/2) = {SS(Error)/(n − [# of parameters])}^(1/2)

Application

Data are available from Blackboard (with this lecture note).


These data are from 43 urban communities (2012)
Citation: Cost of Living Index, Council for Community and
Economic Research, January, 2013.
MTB > info c1-c3

Information on the Worksheet

Column  Count  Name
C1-T       43  URBAN AREA AND STATE
C2         43  HOME PRICE   (Avg for 2400 sq. ft. new home, 4 bed, 2 bath, on 8000 sq. ft. lot)
C3         43  Apt Rent     (Avg for 950 sq. ft. unfurnished apt., 2 bed, 1.5-2 bath,
                             excluding all utilities except water.)
Other interesting data sets on home prices and rental rates by city:
https://smartasset.com/mortgage/price-to-rent-ratio-in-us-cities
https://smartasset.com/mortgage/rent-vs-buy#map .
Page 68 of 156

Regression for All 43 Cities

Fitted Line Plot:  HOME PRICE = -61366 + 339.3 Apt Rent
   S = 70561.6   R-Sq = 88.2%   R-Sq(adj) = 87.9%
[Scatterplot of HOME PRICE versus Apt Rent ($1000-$4000) with the fitted line; the
labeled high-rent cities are New York (Manhattan) NY, New York (Brooklyn) NY,
San Francisco CA, Honolulu HI, and New York (Queens) NY.]

Regression Using 38 Cities Where Rent < $1500

Fitted Line Plot:  HOME PRICE = 1894 + 286.5 Apt Rent
   S = 59728.1   R-Sq = 35.9%   R-Sq(adj) = 34.2%
[Scatterplot of HOME PRICE ($200,000-$500,000) versus Apt Rent ($1000-$1500) with
the fitted line.]

Page 69 of 156

[Residuals Versus Fits plot (response is HOME PRICE): residuals (roughly -150,000 to
100,000) plotted against fitted values (roughly 300,000 to 440,000).]

Page 70 of 156

Do the assumptions of regression hold (approximately)?

Linearity (and randomness):
   Residuals fall symmetrically around the horizontal line
   where Residual = 0. This indicates that the linearity
   and randomness assumptions hold (approximately).

Homoscedasticity:
   There is approximately constant vertical dispersion
   of residuals, which supports homoscedasticity.

How would one identify possible investment opportunities?
   Negative residuals (Ŷ > Y, i.e., the actual home price is below the
   price predicted from the rent).

Interpret the estimated slope coefficient, b1.
   On average, home price (Y) increases $286.5 per $1 increase in
   Rent (X).

Interpret the constant coefficient, b0.
   Hypothetical home price when apartment rent is 0.
   (It's an extrapolation, since no rents are close to zero!)

Page 71 of 156

Regression Analysis: HOME PRICE versus Apt Rent

Analysis of Variance
Source         DF        Adj SS       Adj MS  F-Value  P-Value
Regression      1   72076453090  72076453090    20.20    0.000
  Apt Rent      1   72076453090  72076453090    20.20    0.000
Error          36   1.28428E+11   3567451386
  Lack-of-Fit  34   1.24610E+11   3664999695     1.92    0.401
  Pure Error    2    3818260289   1909130145
Total          37   2.00505E+11

Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
59728.1  35.95%     34.17%      28.48%

Coefficients
Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant   1894    77528     0.02    0.981
Apt Rent  286.5     63.7     4.49    0.000   1.00

Regression Equation
HOME PRICE = 1894 + 286.5 Apt Rent

Recall how R², R²(adj), and s summarize information provided
in the Analysis of Variance table.

   R² = SS(Regression)/SS(Total) = 7.21 × 10¹⁰ / 2.01 × 10¹¹  (OR 72.1/201) ≈ 36%

   s = [MS(Error)]^(1/2) = (3.57 × 10⁹)^(1/2) = 59.7 thousand $

   R²(adj) = 1 − MS(Error)/MS(Total) = 1 − (3.57 × 10⁹)/(2.01 × 10¹¹/37) ≈ 34.2%

Interpret R²(adjusted):

   34% of the variance in home price is explained by the
   model. R-sq(pred) (28.48%) refers to the predicted R²
   and represents the estimated proportion of the variance of
   home price that the model would explain in a different
   sample from the same population.

Interpret s:

   The estimated standard deviation of the theoretical error is about 60
   thousand dollars.
Page 72 of 156

The analysis on the last page also includes two types of statistical tests.

1) Test for Lack-of-Fit
   H0: The Model Is Appropriate   versus   H1: The Model Is Not Appropriate
Here we hope that we do not reject H0. That is, we hope to see a
large p-value (e.g., p-value > 0.2). If the p-value is small and
forces us to reject H0, then we should try to find another model.
If there is substantial information that the model is inappropriate,
the Lack-of-Fit variance will be significantly larger than the
Pure-Error variance, and the ratio of these two variances should
be significantly greater than 1. Here the p-value (0.4) indicates this
ratio, called the F-value (1.92), is not significantly greater than 1.
(Please find these numbers in the Analysis of Variance Table.)

   F-value = [Lack-of-Fit Variance]/[Pure-Error Variance]
           = (3.66 × 10⁹)/(1.91 × 10⁹) = 1.92

According to the p-value (0.4), there is a 40% probability that the
F-value would be 1.92 or larger, even when the real population
variances are equal. (The estimate of the pure-error variance is
not very accurate; it is based on only 2 degrees of freedom.) Thus, there
is no significant indication that the model is inappropriate.

2) Test of Whether Each Coefficient (β₀ and β₁) Is Significantly
   Nonzero
   H0: βᵢ = 0   versus   H1: βᵢ ≠ 0.
Here we hope to at least be able to reject H0 when testing β₁ (the
coefficient of the predictor), i.e., we hope to reject H0: β₁ = 0 in
favor of H1: β₁ ≠ 0. Consequently we would prefer to see a small
p-value in this case (e.g., p-value < 0.05).
If we test at the 0.05 level, we accept H0 for β₀ (p-value = 0.981 > 0.05)
and reject H0 for β₁ (p-value = 0.000 < 0.05). (Please find these p-values
in the coefficient table.) We conclude that there is a significant
relationship between home price and apartment rent.
Page 73 of 156

The Test for Lack-of-Fit

It is only possible to do this test when there are two or more
observations that have the same predictor values (x-values). In these
cases, SS(Error) can be further decomposed into two components:

   SS(Error) = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = SS(Lack-of-Fit) + SS(Pure Error).

SS(Pure Error) is calculated as follows: for each group of observations
that have the same predictor values (i.e., the same apartment rent value),
the sum of squares of the y-values around the group mean is calculated, and
these sums are then added up across all groups. In this case there are
two groups of cities at the same rent level.

Urban Area                   Home Price (Y)      Apt Rent (X)
Jacksonville FL                     218265              1019
Bryan-College Station TX            246858              1019
                            Mean:   232562

Rochester MN                        250723              1122
Minot ND                            333300              1122
                            Mean:   292012

Consequently:
   SS(Pure Error) = (218265 − 232562)² + (246858 − 232562)²
                  + (250723 − 292012)² + (333300 − 292012)² = 3.818 billion
   SS(Lack-of-Fit) = SS(Error) − SS(Pure Error) = 128.4 billion − 3.8 billion
                   = 124.6 billion
Analysis of Variance Table (again)

Source         DF        Adj SS       Adj MS  F-Value  P-Value
Regression      1   72076453090  72076453090    20.20    0.000
  Apt Rent      1   72076453090  72076453090    20.20    0.000
Error          36   1.28428E+11   3567451386
  Lack-of-Fit  34   1.24610E+11   3664999695     1.92    0.401
  Pure Error    2    3818260289   1909130145
Total          37   2.00505E+11

The degrees of freedom of SS(Pure Error) is number of observations with


common predictor values (x-values) minus the number of group means
(4-2= 2). The degrees of freedom for SS(Lack-of-Fit) is the number of
distinct predictor values minus the number of parameters in the model
(36-2=34).
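
A sketch (an addition, not from the notes; assumes numpy) of the SS(Pure Error) calculation for the two groups of cities that share the same Apt Rent value:

```python
import numpy as np

groups = {
    1019: [218265, 246858],   # Jacksonville FL, Bryan-College Station TX
    1122: [250723, 333300],   # Rochester MN, Minot ND
}

# Sum of squares of each group's y-values around that group's mean, added over groups
ss_pure_error = sum(
    np.sum((np.array(prices) - np.mean(prices)) ** 2) for prices in groups.values()
)
print(ss_pure_error / 1e9)                 # about 3.818 (billion)

ss_error = 1.28428e11                      # SS(Error) from the Analysis of Variance table
print((ss_error - ss_pure_error) / 1e9)    # SS(Lack-of-Fit), about 124.6 (billion)
```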
Page 74 of 156

Tests & Other Inferences Based on Regression Coefficients

Note that we can make inferences (i.e., do hypothesis tests and form
confidence intervals) about the coefficients from a regression
analysis by treating the coefficients as though they were CASE 2
sample means and using t-values based on the degrees of
freedom of SS(Error). This is also true in multiple regression.

We have already considered the basic significance test on β₁:

   H0: β₁ = 0   versus   H1: β₁ ≠ 0

   Test Statistic: t = (b₁ − 0)/s_b₁ = (286.5 − 0)/63.7 = 4.49;  p-value = 0.000
   (Please find these numbers in the Coefficient Table!!!!!!)

   Conclusion: Reject H0 even for α = 0.001 (β₁ is significantly nonzero).

Is the average increase in home price per $1 increase in rent
significantly greater than $100 (α = 0.05)?

   H0: β₁ = 100   versus   H1: β₁ > 100

   Test Statistic: t = (286.5 − 100)/63.7 = 2.93

   Critical Value: t(df = 36, α = 0.05) = 1.69

   Conclusion: Reject H0   (Yes!!)

Find a 90% confidence interval for the average increase in
home prices associated with a $1 increase in monthly
apartment rent.

   286.5 ± (1.69) × (63.7) = $(179, 394)
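
A sketch (an addition; assumes scipy is available, and uses the coefficient and standard error from the table above) of the one-sided test and the 90% confidence interval:

```python
from scipy import stats

b1, se_b1, df = 286.5, 63.7, 36

# Test H0: beta1 = 100 versus H1: beta1 > 100
t_stat = (b1 - 100) / se_b1
p_value = 1 - stats.t.cdf(t_stat, df)        # right-tail p-value
print(round(t_stat, 2), round(p_value, 4))   # 2.93, about 0.003

# 90% confidence interval for beta1
t_crit = stats.t.ppf(0.95, df)               # about 1.69
print(b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # roughly (179, 394)
```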

Page 75 of 156

10

Prediction Intervals Versus Confidence Intervals for Y

If we use the model to make predictions, two types of intervals are
provided at the end of the regression analysis: a confidence interval
for the average value of Y for all observations in the population at a
specified value of X (labeled "C.I."), and a prediction interval for
the predicted value of Y for a single observation with the specified
value of X (labeled "P.I."). Note that in each case the interval is
centered at ŷ.

A 95% confidence interval for μ_Y (the true average of Y at specific
predictor values X) is of the form:

   ŷ ± t₀.₀₂₅(d.f.) × [SE Fit],

where "SE Fit" stands for the standard error of the fit and represents
the uncertainty due to the fact that we only have estimates of the
regression coefficients. In calculating the prediction, these
estimated coefficients are multiplied by the corresponding predictor
value (or predictor values in the case of multiple regression), so that
the resulting error is partly a function of the actual value(s) of the
predictor variable(s). SE Fit IS NOT something to be calculated
by hand!

Page 76 of 156

11

A 95% prediction interval for Y (at specific predictor value(s) X)
is of the form:

   ŷ ± t₀.₀₂₅(d.f.) × [(SE Fit)² + MS(Error)]^(1/2).

Here we use a standard deviation that is the square root of the sum
of two variances. The first variance is the one which we discussed
on the last page. The second variance is MS(Error): this is the
estimated variance of the error term in the model and it affects all
individual predictions but does not depend on the predictor values.
Here is an illustration of how you would do this analysis in Minitab.
Minitab 17

After running the regression analysis (via the Fit Regression


Model dialog box), one can then select the Predict option (third
item in the last menu on the right) and this opens the following
dialog box.

Page 77 of 156

12

Prediction for HOME PRICE

Regression Equation
HOME PRICE = 1894 + 286.5 Apt Rent

Variable  Setting
Apt Rent     1100
     Fit   SE Fit           95% CI                        95% PI
  317061  11839.4  (293049, 341072) <-- #3   (193569, 440552) <-- #2

Variable  Setting
Apt Rent     1300
     Fit   SE Fit           95% CI                        95% PI
  374364  11367.6  (351309, 397418)           (251055, 497672)

1. What is the predicted home price when rent is $1100?

   317,061  ( = 1894 + 286.5 × (1100) )

2. What is the 95% prediction interval for the home price in a
   community where the rent is $1100 per month?

   See the interval labeled #2 above.
   (PI calculation: ŷ ± t₀.₀₂₅(d.f.) × [(SE Fit)² + MS(Error)]^(1/2).)

3. What is the 95% confidence interval for the average home price in
   all communities where the rent is $1100 per month?

   See the interval labeled #3 above.
   (CI calculation: ŷ ± t₀.₀₂₅(d.f.) × [SE Fit].)
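
For readers working outside Minitab, here is a generic sketch (an addition; the "rent" and "price" arrays below are placeholders, not the course data, and statsmodels is assumed installed) of how the same CI and PI columns can be obtained in Python:

```python
import numpy as np
import statsmodels.api as sm

rent = np.array([1019, 1122, 1200, 1350, 1450], dtype=float)               # hypothetical
price = np.array([218265, 292012, 340000, 390000, 420000], dtype=float)    # hypothetical

X = sm.add_constant(rent)
model = sm.OLS(price, X).fit()

new_X = sm.add_constant(np.array([1100.0, 1300.0]), has_constant="add")
pred = model.get_prediction(new_X)
print(pred.summary_frame(alpha=0.05))
# Columns: mean (the fit), mean_se (SE Fit), mean_ci_lower/upper (the 95% CI),
# and obs_ci_lower/upper (the 95% PI).
```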
Page 78 of 156

13

The Fitted Line Plot analysis (listed in the Regression menu) will
draw pictures of the confidence and prediction intervals, along with
the regression line (use the Options button).

Fitted Line Plot "Options" Provides a Picture of the CIs and PIs

[Fitted Line Plot: HOME PRICE = 1894 + 286.5 Apt Rent, showing the regression line,
the 95% CI band, and the 95% PI band over Apt Rent from $1000 to $1500;
S = 59728.1, R-Sq = 35.9%, R-Sq(adj) = 34.2%.]

Note:
1. CIs are always narrower than PIs (and sometimes much narrower)
because they are confidence intervals for the line itself and do not
reflect the random error at individual points.
2. Both the CIs and PIs get narrower as X approaches its mean
because this is where the value of SE FIT is smallest.
Page 79 of 156

Lecture 8: Introduction to Multiple Regression

(Ref: Ch.14: 14.1-14.5, pp. 524-544)


Outline:
1. Model: y = β₀ + β₁x₁ + β₂x₂ + . . . + β_k x_k + ε
   Fit:   ŷ = b₀ + b₁x₁ + b₂x₂ + . . . + b_k x_k.
2. Check Linearity, Homoscedasticity and Random Error
   (with the plot of residuals versus fits).
3. Interpretation of coefficient bᵢ: avg. change in Y associated with a
   one unit increase in xᵢ (other predictors held constant).
4. Decomposition of SS(Total)
   Sums of Squares:
      SS(Total) = SS(Regression) + SS(Error)
      (dropping predictors moves SS from Regression to Error;
       adding predictors moves SS from Error to Regression).
   Model with constant:
      Σᵢ₌₁ⁿ (yᵢ − ȳ)² = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)² + Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)².
   Model with no constant:
      Σᵢ₌₁ⁿ yᵢ² = Σᵢ₌₁ⁿ ŷᵢ² + Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)².
   SS(Total) remains unchanged as predictors are added to, or dropped
   from, the model. But as predictors are added, SS(Regression)
   increases by the amount that SS(Error) decreases, and as predictors
   are dropped, SS(Regression) decreases by the amount that SS(Error)
   increases.
   Degrees of Freedom:
      Total        =  Regression         +  Error
      [(n-1) or n] =  [# of predictors]  +  [n − {# of parameters}].
   Model with constant:     n − 1  =  k  +  [n − (k+1)].
   Model with no constant:  n      =  k  +  [n − k].
   (In simple linear regression: {# of predictors} = 1, and if there is a
   constant in the model, {# of parameters} = 2.)
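
A small sketch (an addition with simulated data, not the course data; assumes numpy and statsmodels) that fits a two-predictor model and verifies the SS decomposition and degrees of freedom above:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 48
x1 = rng.normal(70, 5, n)            # e.g., a continuous predictor (made up)
x2 = rng.integers(0, 2, n)           # e.g., a 0/1 indicator (made up)
y = 40 + 0.4 * x1 + 4 * x2 + rng.normal(0, 4, n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

ss_total = np.sum((y - y.mean()) ** 2)
ss_error = np.sum(fit.resid ** 2)
ss_regression = ss_total - ss_error

print(round(ss_total, 1), round(ss_regression + ss_error, 1))  # the two totals agree
print(fit.df_model, fit.df_resid)    # k = 2 and n - (k+1) = 45 degrees of freedom
```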
Page 80 of 156

Data from the First 49 Super Bowls


MTB > info c7 c60 c20 c37 c40

Information on the Worksheet

Column  Count  Missing  Name          Description
C7         49        0  Share         % of households that watch the game among all
                                      households watching TV (Market Share)
C60        49           Share_LastYR  Share for last year's Super Bowl
C20        49           Over/Under    Predicted sum of the two scores (median of wagers)
C37        49           |Line|        Predicted value of {[winning score] − [losing score]}
                                      (median of wagers)
C40        49           Dallas        Indicator for whether Dallas played

MTB > print c7 c1 c60 c20 c37 c40

Row  Year  Share  Share_lastYR  Over/Under  |Line|  Dallas  Teams (Winner)
  1  1967     79             *           *    14.0       0  Green Bay, Kansas City
  2  1968     68            79        43.0    13.5       0  Green Bay, Oakland
  .     .      .             .           .       .       .  .
 47  2013     69            71        48.0     4.5       0  San Fran, Baltimore
 48  2014     69            69        47.5     2.5       0  Seattle, Denver
 49  2015     72            69        47.5     1.0       0  Seattle, New England

MTB > corr c7 c60 c20 c37 c40    (Correlations: c7 c60 c20 c37 c40)

                 Share   Share_lastYR   Over/Under   |Line|
Share_lastYR     0.462
                 0.001
Over/Under      -0.385         -0.466
                 0.007          0.001
|Line|          -0.115         -0.095        0.241
                 0.433          0.522        0.098
Dallas           0.365          0.070       -0.204   -0.017
                 0.010          0.637        0.164    0.905

Cell Contents: Correlation (r)
               P-value

1. Which of these variables would be the best predictor of Share in
   a simple linear regression?
   Share_lastYR

2. What would be the value of R² (Unadj) in this simple regression?
   (0.462)² = 0.21, or 21%

Page 81 of 156

Page 82 of 156

A Multiple Regression with the Predictors Share_lastYR and Dallas

Minitab 17 Commands                            Minitab 17 Menus
MTB > regr;                                    Stat Menu > Regression > Regression >
SUBC> response Share;                            Fit Regression Model
SUBC> continuous 'Share_lastYR' Dallas;
SUBC> terms 'Share_lastYR' Dallas;
SUBC> GFITS.

Regression Equation:

Share = 38.82 + 0.411 Share_lastYR + 4.60 Dallas

3. Do the assumptions hold?


Approximate symmetry around line
where residuals=0 supports
linearity & randomness.

Approximate constant vertical


dispersion supports
homoscedasticity.

Page 83 of 156

5
Regression Analysis: Share versus Share_lastYR, Dallas

Analysis of Variance (ANOVA Table)

Source          DF  Adj SS  Adj MS  F-Value  P-Value
Regression       2   344.9  172.44    12.62    0.000
  Share_lastYR   1   180.7  180.71    13.22    0.001
  Dallas         1   140.2  140.21    10.26    0.002
Error           45   615.0   13.67
  Lack-of-Fit   23   327.4   14.24     1.09    0.422
  Pure Error    22   287.5   13.07
Total           47   959.9

[For the predictors, the F-values are just the squares of the T-values, and
the P-values are the same. Thus, the important information in these rows of
this type of Adjusted Analysis of Variance table is already provided in the
Coefficient table.]

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
3.69681  35.93%     33.08%      26.84%

Coefficients

Term           Coef  SE Coef  T-Value  P-Value   VIF
Constant      38.82     7.64     5.08    0.000
Share_lastYR  0.411    0.113     3.64    0.001   1.00
Dallas         4.60     1.44     3.20    0.002   1.00

Regression Equation: Share = 38.82 + 0.411 Share_lastYR + 4.60 Dallas

4. Now consider the multiple regression model.
   a. What should be done before significance testing?
      Check Assumptions
   b. Is it appropriate to go ahead and use this model for significance testing
      and prediction?
      Yes (See #3)

5. Is the model significant at the 0.01 level?
   H0: β_Share_lastYR = β_Dallas = 0   versus   H1: Not H0
   P-value: 0.000   (see the Regression line of the ANOVA table)
   Conclusion: Reject H0   (The model is significant!)
   (See test #2 in the outline of methods for regression.)

6. Interpret the coefficient of Dallas & provide an 80% confidence interval.
   Interpretation:
   {Avg. Share when Dallas plays} − {Avg. Share when Dallas is not in the game},
   after adjusting for Share_lastYR.

   4.60 ± t(df = 45, α = 0.1) × [1.44] = 4.60 ± 1.30 × 1.44 = (2.7, 6.5)%

Page 84 of 156

Notes on Sequential SS(Regression)

The Analysis of Variance table on the previous page (reproduced below)
is the default format, where each SS and MS is "Adjusted" so that it
reflects the marginal effect of each Source. In this case the F-value
for each predictor is simply the square of that predictor's t-value.

Analysis of Variance
Source          DF  Adj SS  Adj MS  F-Value  P-Value
Regression       2   344.9  172.44    12.62    0.000
  Share_lastYR   1   180.7  180.71    13.22    0.001
  Dallas         1   140.2  140.21    10.26    0.002
Error           45   615.0   13.67
Total           47   959.9

In the Fit Regression Model dialog box, we can click on the
Options button to request Sequential (Type I) Sums of Squares.

Analysis of Variance (predictors entered in the order Share_lastYR, then Dallas)
Source          DF  Seq SS  Seq MS  F-Value  P-Value
Regression       2   344.9  172.44    12.62    0.000
  Share_lastYR   1   204.7  204.67    14.98    0.000
  Dallas         1   140.2  140.21    10.26    0.002
Error           45   615.0   13.67
Total           47   959.9

Analysis of Variance (predictors entered in the order Dallas, then Share_lastYR)
Source          DF  Seq SS  Seq MS  F-Value  P-Value
Regression       2   344.9  172.44    12.62    0.000
  Dallas         1   164.2  164.18    12.01    0.001
  Share_lastYR   1   180.7  180.71    13.22    0.001
Error           45   615.0   13.67
Total           47   959.9

Note that the three Analysis of Variance tables (above) are identical except
for the rows that refer to predictors. The last two tables show the sequential
(rather than marginal) contribution of each predictor to SS(Regression).
The marginal effect of a predictor is what it contributes when it comes in
last. The sequential contribution refers to the amount by which
SS(Regression) increases when a predictor is entered in the order indicated.
Page 85 of 156

For example, the shaded rows show the amount that each predictor would
contribute to SS(Regression) if it were the only predictor in the model.

7. What would be the value of SS(Regression) in a simple linear
   regression where Share is regressed on Share_lastYR?
   204.7

8. What would be the value of SS(Regression) in a simple linear
   regression where Share is regressed on Dallas?
   164.2

Also note that in each of the Sequential tables the values of Seq SS
for the predictors add up to the Seq SS value for Regression, which is
SS(Regression):
   1st Sequential Table: 204.7 + 140.2 = 344.9;
   2nd Sequential Table: 164.2 + 180.7 = 344.9.
Ultimately, a predictor's value in a given model depends on its
marginal contribution, i.e., what it contributes to SS(Regression) when
added last. Thus, these values are the ones provided in the default
version of the Analysis of Variance table (near the top of the last page).
Each coefficient's t-value also measures the corresponding predictor's
marginal value in the following way: the squared t-value of the
coefficient of each predictor is equal to that predictor's marginal
contribution to SS(Regression) divided by MS(Error). Here is the
coefficient table with t-values:

Coefficients
Term           Coef  SE Coef  T-Value  P-Value   VIF
Constant      38.82     7.64     5.08    0.000
Share_lastYR  0.411    0.113     3.64    0.001   1.00
Dallas         4.60     1.44     3.20    0.002   1.00

(Find these F-values near the top of p. 6.)

To illustrate:
   t²_Share_lastYR = Adj SS(Share_lastYR)/MS(Error) = 180.7/13.67 = (3.64)² = 13.22
   t²_Dallas       = Adj SS(Dallas)/MS(Error)       = 140.2/13.67 = (3.20)² = 10.26

Since the square of each t-value reflects the marginal value of the
predictor relative to MS(Error), the fit of a model will improve
(MS(Error) will decrease) whenever we remove predictors with
absolute t-values less than 1 (or if we add predictors with |t-value| > 1).
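
A sketch (an addition using simulated data, not the Super Bowl data; assumes pandas and statsmodels) confirming that each predictor's squared t-value equals its marginal ("adjusted") SS divided by MS(Error):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 48
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.integers(0, 2, n)})
df["y"] = 40 + 0.4 * df.x1 + 4 * df.x2 + rng.normal(0, 4, n)

fit = smf.ols("y ~ x1 + x2", data=df).fit()
aov = anova_lm(fit, typ=2)                  # marginal ("adjusted") sums of squares

ms_error = aov.loc["Residual", "sum_sq"] / aov.loc["Residual", "df"]
for term in ["x1", "x2"]:
    print(term,
          round(fit.tvalues[term] ** 2, 2),              # squared t-value
          round(aov.loc[term, "sum_sq"] / ms_error, 2))  # marginal SS / MS(Error)
```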
Page 86 of 156

Now I Add the Predictor |Line|*Over/Under to the Model

Analysis of Variance (ANOVA Table)
Source              DF  Seq SS  Seq MS  F-Value  P-Value
Regression           3  383.15  127.72     9.74    0.000
  Share_lastYR       1  204.67  204.67    15.61    0.000
  Dallas             1  140.21  140.21    10.70    0.002
  |Line|xOver/Under  1   38.27   38.27     2.92    0.095
Error               44  576.72   13.11
Total               47  959.87

[The highlighted p-values in the ANOVA table are based on Sequential MS values and
therefore do NOT refer to the significance of each predictor's coefficient in this
model. The real significance of each coefficient is actually provided in the
Coefficient Table. In this case the p-value for Dallas is the same in both tables.]

Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
3.62039  39.92%     35.82%      28.72%

Coefficients
Term                   Coef  SE Coef  T-Value  P-Value   VIF
Constant              42.55     7.80     5.46    0.000
Share_lastYR          0.376    0.113     3.33    0.002   1.04
Dallas                 4.57     1.41     3.25    0.002   1.01
|Line|xOver/Under  -0.00420  0.00246    -1.71    0.095   1.04

Regression Equation

Share = 42.55 + 0.376 Share_lastYR + 4.57 Dallas - 0.00420 |Line|xOver/Under

In the previous model:
Share = 38.82 + 0.411 Share_lastYR + 4.60 Dallas

Page 87 of 156

9. Do the basic assumptions seem to hold (i.e., linearity, homoscedasticity,
   random residuals)?
   Linearity & Randomness: Approximate symmetry (around the line
   where Residual = 0) supports both assumptions.
   Homoscedasticity: Approximately constant vertical dispersion
   supports this assumption.

10. Is this 3-predictor multiple regression model significant at the 0.01
    level?
    Yes (the p-value on the Regression line of the ANOVA table is 0.000).

11. Which regression coefficients are significantly nonzero at the 0.05 level?
    Only the constant & the coefficients of Share_lastYR & Dallas.

12. After correcting for the effects of Dallas and Share_lastYR, does
    Share decrease by significantly more than 0.1% for every 100 point
    increase in the value of |Line|xOver/Under at the 0.05 level? (This is
    equivalent to asking: does Share decrease by significantly more than
    [0.1%]/100 = 0.001% per 1 point increase in |Line|xOver/Under after
    correcting for other predictors?)

    H0: β_|Line|xOver/Under = -0.001   versus   H1: β_|Line|xOver/Under < -0.001

    Test Statistic: t = [-0.00420 − (-0.001)]/0.00246 = -1.30
    Critical value: -t(44 df, α = 0.05) = -1.68
    Conclusion: Accept H0.  No!

13. Provide a 90% confidence interval for the amount by which Share
    changes per 1 point increase in |Line|xOver/Under (holding
    other predictors constant).

    -0.00420 ± t(44 df, α = 0.05) × 0.00246 = -0.00420 ± 1.68 × 0.00246
       = (-0.0083, -0.000067)%
Page 88 of 156

10

14. Ceteris paribus, does Share increase by significantly more than
    2% when Dallas plays, at the 0.1 level? (Equivalent question: Is the
    average difference in Share between those games when Dallas
    plays and all other games significantly greater than 2% at the 0.1
    level, holding other predictors constant?)

    H0: β_Dallas = 2   versus   H1: β_Dallas > 2

    Test Statistic: t = (4.57 − 2)/1.41 = 1.82
    Critical value: t(44 df, α = 0.1) = 1.3    Conclusion: Reject H0. Yes!

15. Do the following test (0.01 level).

    H0: β_Dallas = β_|Line|xOver/Under = 0   versus   H1: not H0

Analysis of Variance with Sequential SS from page 8

Source              DF  Seq SS  Seq MS  F-Value  P-Value
Regression           3  383.15  127.72     9.74    0.000
  Share_lastYR       1  204.67  204.67    15.61    0.000
  Dallas             1  140.21  140.21    10.70    0.002
  |Line|xOver/Under  1   38.27   38.27     2.92    0.095
Error               44  576.72   13.11
Total               47  959.87

We are testing whether there is useful information in this group of
two predictors. This is done by comparing two models:
a) the complete model with:
   Share_lastYR, Dallas, & |Line|xOver/Under
b) the reduced model with just Share_lastYR
(i.e., the reduced model is always the model without the group of
predictors being tested). The numerator of the test statistic is the
average increase in SS(Regression) per additional predictor that
occurs as we move from the reduced model to the complete model.
The denominator is the variance of error in the complete model.
For this test, we need the predictors being tested to come in last
sequentially (see test #5 in the outline of methods for regression).

    F = [140.21 + 38.27]/2 / 13.11 = [383.15 − 204.67]/2 / 13.11 = 6.8

    Critical value: F(2 & 44 df, α = 0.01) = 5.1
    Conclusion: Reject H0.
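
A sketch (an addition with simulated data, not the Super Bowl data; assumes pandas and statsmodels) of this partial F-test comparing the complete and reduced models:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 48
df = pd.DataFrame({
    "share_last": rng.normal(70, 5, n),
    "dallas": rng.integers(0, 2, n),
    "line_x_ou": rng.normal(300, 150, n),
})
df["share"] = (40 + 0.4 * df.share_last + 4 * df.dallas
               - 0.004 * df.line_x_ou + rng.normal(0, 4, n))

complete = smf.ols("share ~ share_last + dallas + line_x_ou", data=df).fit()
reduced = smf.ols("share ~ share_last", data=df).fit()

# compare_f_test returns (F statistic, p-value, numerator df) for H0: the extra
# coefficients (dallas, line_x_ou) are all zero.
f_stat, p_value, df_num = complete.compare_f_test(reduced)
print(round(f_stat, 2), round(p_value, 4), df_num)
```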


Bruce Cooil, 2016

Page 89 of 156

Lecture 9: More Multiple Regression Examples


Outline

Ref: Chapter 14: 14.5 (again)-14.9, End of 14.10 (Partial


F-test), 14.11, Summary, Glossary

More Examples of Multiple Regression


Checking Assumptions
Testing Hypotheses About the Model
Application 1
Modeling the Salary of NFL Football Coaches

This application illustrates a compensation model based on


performance and other factors.
MTB > info c8 c6 c15 c23 c25

Information on the Worksheet

Column  Count  Name             Description
C8         32  Salary           Compensation (millions)
C6         32  Age              Age (years)
C15        32  W_with_Teams     Wins with Team (as HC)
C23        32  AFC              Indicator for AFC Team
C25        32  HeadCoach_in_SB  Indicator for Head Coach in a previous Super Bowl

Menu Guide:
MINITAB 17: Stat > Regression > Regression > Fit Regression Model
In the Dialog Box:
   Response: c8
   Continuous Predictors: c6 c23 c25 c15
[When fitting this model, I actually listed the predictors in the order: c6 c23 c25 c15. The order is
inconsequential, unless we plan to use the sequential sums of squares to test the importance of a
specific subgroup of predictors. If we plan to test a subgroup, those predictors should be listed
last.]

Model 1
Regression Equation

Salary = 4.57 - 0.0016 Age - 0.669 AFC + 1.092 HeadCoach_in_SB


+ 0.01494 W_with_Team
Page 90 of 156

Model Summary
       S    R-sq  R-sq(adj)  R-sq(pred)
0.955828  60.99%     55.21%      47.31%

Coefficients
Term               Coef  SE Coef  T-Value  P-Value   VIF
Constant           4.57     1.70     2.68    0.012
Age             -0.0016   0.0323    -0.05    0.960   1.32
AFC              -0.669    0.360    -1.86    0.074   1.14
HeadCoach_in_SB   1.092    0.415     2.63    0.014   1.45
W_with_Team     0.01494  0.00441     3.39    0.002   1.35

[VIF = Variance Inflation Factor: the factor by which the variance of the estimated
coefficient of a predictor increases, due to correlation between that predictor and the
other predictors in the model, relative to what it would be if there were no
correlation with the other predictors.]

The simple correlation between age and salary is positive (r = 0.364, p-value = 0.040;
not shown in the output above), but the negative coefficient
for age in the model above suggests there is age discrimination! (Until
we realize the coefficient for age does not even approach significance.)
This model and the significance of the positive simple correlation
between age and salary indicate that age is a proxy for past success (the
last two predictors in the model above).
Discussion Question
1. What is wrong with this model? What would be a better fitting
   model?
   The model without Age would fit better, because |t_Age| < 1
   (see the discussion in the last lecture, bottom of p. 7).
   What if the constant has a t-ratio that is less than one in
   absolute value? If the primary purpose of the model is
   prediction, you should take the constant out of the model (as
   you would any other predictor); otherwise you might keep the
   constant. For example, you would not eliminate the constant in
   the Market Model. The purpose of the Market Model is not
   prediction, and the constant is actually used to estimate the
   security's return when the market return is zero.

Model 2
Here we drop Age
Regression Equation

Salary = 4.484 - 0.665 AFC + 1.086 HeadCoach_in_SB


+ 0.01490 W_with_Team
Page 91 of 156

Model 2 (Continued)

Analysis of Variance
Source         DF   Adj SS   Adj MS  F-Value  P-Value
Regression      3  38.5626  12.8542    14.59    0.000  <-- p-value for testing overall significance
Error          28  24.6698   0.8811
  Lack-of-Fit  25  23.7948   0.9518     3.26    0.180
  Pure Error    3   0.8750   0.2917
Total          31  63.2324

Model Summary

       S    R-sq  R-sq(adj)  R-sq(pred)
0.938650  60.99%     56.81%      49.58%

Here Is the Summary for Model 1:
       S    R-sq  R-sq(adj)  R-sq(pred)
0.955828  60.99%     55.21%      47.31%

Coefficients
Term               Coef  SE Coef  T-Value  P-Value   VIF
Constant          4.484    0.294    15.26    0.000
AFC              -0.665    0.343    -1.94    0.062   1.07
HeadCoach_in_SB   1.086    0.392     2.77    0.010   1.35
W_with_Team     0.01490  0.00425     3.50    0.002   1.30

2. Do the assumptions hold?

   Approximate symmetry around the line where residuals = 0
   supports the assumptions of linearity and randomness.
   No systematic violation of roughly constant vertical
   dispersion supports the assumption of homoscedasticity.
Page 92 of 156

3. Does the lack-of-fit test indicate there is a problem?
   The p-value (= 0.180) is only marginally suggestive of lack of fit,
   so I would use this model (until I find a better one).
4. Is the model significant at the 0.01 level?
   Yes! (P-value = 0.000; see the p-value in the
   Regression line of the Analysis of Variance Table.)
5. Why is Model 2 clearly better than Model 1?
   Model 2 is simpler and fits better!
   Model 2 has fewer predictors and a smaller standard deviation of
   residuals. Simplicity & fit are the two fundamental dimensions on
   which every model is judged. Usually we have to sacrifice fit for
   simplicity, or vice versa, but not in this case!
Application 2
Models for Home Price Based on the Home's Attributes

MTB > Retrieve 'C:\MTBWIN\DATA\PROMOD.MTW'.
MTB > info

Column  Name      Count  Description
C1      price        24  In thousands of U.S. dollars
C3      baths        24  Number of bathrooms (either 1 or 1.5)
C4      lotsize      24  Lot size in thousands of square feet
C5      space        24  Living space (thousands of square feet)
C7      nROOMS       24  Number of rooms
C11     two-car      24  Indicator for 2 garage stalls
C12     HALFbath     24  Indicator for extra half-bath

Model 1
Stat > Regression > Regression > Fit Regression Model
In the regression dialog box:
1) List as continuous predictors: c12 c11 c4 c7 c5
2) Open Options, and set Sum of Squares to Sequential.

Regression Equation:
price = 132.8 + 28.49 HALFbath + 22.17 two-car
        + 3.52 lotsize - 5.32 nROOMS + 12.0 space
Page 93 of 156

Analysis of Variance
Source      DF   Seq SS   Seq MS  F-Value  P-Value
Regression   5  12955.2  2591.04    12.17    0.000
  HALFbath   1   8530.4  8530.44    40.06    0.000
  two-car    1   3242.6  3242.62    15.23    0.001
  lotsize    1    969.3   969.25     4.55    0.047
  nROOMS     1    139.2   139.15     0.65    0.429
  space      1     73.7    73.72     0.35    0.564
Error       18   3832.6   212.92
Total       23  16787.7

Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
14.5918  77.17%     70.83%      56.25%

Coefficients
Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant  132.8     32.1     4.13    0.001
HALFbath  28.49     9.42     3.03    0.007   2.22
two-car   22.17     8.15     2.72    0.014   1.86
lotsize    3.52     1.96     1.79    0.090   1.61
nROOMS    -5.32     5.46    -0.97    0.343   2.52
space      12.0     20.4     0.59    0.564   3.42

6. What is wrong with this model? (I don't even bother to plot residuals.)
   |t| < 1 for two predictors! (Thus, we know we can find a
   better model.)

7. Is it best to drop both nROOMS and space, or should we hold on to
   at least one of them? (Consider the marginal value of both predictors.)
   {Total Marginal Value} = 139.2 + 73.7 = 212.9
   Since 212.9 < MS(Error), we should drop both!
   [It's just a coincidence that the total marginal value of
   both predictors is actually equal to MS(Error).]
Page 94 of 156

8. Do the following formal test at the 0.05 level. (The underlying question
   is: does the information in these two predictors have significant
   marginal value at the 0.05 level? We would deduce from the analysis
   for the last question that they do not!)

   H0: β_nROOMS = β_space = 0   versus   H1: not H0

   Test Statistic: F = [(139.2 + 73.7)/2] / 212.92 = 0.5
   Critical Value: F(2 & 18 df; α = 0.05) = 3.55
   Conclusion: Accept H0

Model 2
Here We Drop nROOMS and space

Analysis of Variance
Source         DF   Seq SS   Seq MS  F-Value  P-Value
Regression      3  12742.3  4247.44    21.00    0.000
  HALFbath      1   8530.4  8530.44    42.17    0.000
  two-car       1   3242.6  3242.62    16.03    0.001
  lotsize       1    969.3   969.25     4.79    0.041
Error          20   4045.4   202.27
  Lack-of-Fit  19   4045.4   212.92
  Pure Error    1      0.0     0.00
Total          23  16787.7

Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
14.2222  75.90%     72.29%      60.81%

Coefficients
Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant 113.98     9.83    11.60    0.000
HALFbath  28.73     6.78     4.24    0.000   1.21
two-car   18.59     6.49     2.87    0.010   1.24
lotsize    3.92     1.79     2.19    0.041   1.41

Regression Equation
price = 113.98 + 28.73 HALFbath + 18.59 two-car + 3.92 lotsize
Page 95 of 156

9. Does the plot of residuals indicate the assumptions hold,
   approximately?
   My view: Approximate symmetry around "Residual = 0" supports
   the linearity and randomness assumptions. Using a new
   dependent variable Y* = Log(Price) may help achieve homoscedasticity.
   (The Lack-of-Fit test does not provide any information.)

10. In what ways is Model 2 superior to Model 1 (if the three assumptions
    hold in #9)?
    Every way (simpler and better fit).

11. Is Model 2 significant at the 0.01 level?
    Yes (the 1st p-value in the Analysis of Variance refers to the
    overall Regression model, and it is 0.000).

12. Which coefficients in Model 2 are significant (i.e., significantly
    nonzero) at the 0.02 level?
    All except lotsize (p = 0.04). Be sure to refer to the
    Coefficient table and NOT the Analysis of Variance Table.
Page 96 of 156

13. Interpret the coefficient of HALFbath.
    The estimated average value of the extra half-bath
    is 28.73 thousand dollars (holding other predictors constant).
    The above is preferred to: "The average increase in price when
    HALFbath increases 1 unit (holding other predictors constant)."

14. Is the estimated average value of an extra half-bath greater than 10
    thousand dollars at the 0.05 level (holding other predictors constant)?
    H0: β_HALFbath = 10   versus   H1: β_HALFbath > 10
    Test Statistic: t = (28.73 − 10)/6.78 = 2.76
    Critical Value: t(20 df, α = 0.05) = 1.7
    Conclusion: Reject H0. Yes!

15. Interpret the coefficient of lotsize.
    On average, price increases 3.92 thousand dollars
    per 1000 square feet of lot size (ceteris paribus).

16. At the 0.1 level, does the average price of a home increase by
    significantly more than $500 per thousand square feet of lot size
    (holding other predictors constant)?
    H0: β_lotsize = 0.5   versus   H1: β_lotsize > 0.5
    Test Statistic: t = (3.92 − 0.5)/1.79 = 1.91
    Critical Value: t(20 df, α = 0.1) = 1.3
    Conclusion: Reject H0. Yes!
Page 97 of 156

17. Please give a 90% confidence interval for the difference between the
    average price of a home with a two-car garage and the average price
    without a two-car garage (holding other predictors constant).

    b_two-car ± t(20 df, α = 0.05) × s_b_two-car = 18.59 ± 1.725 × (6.49) = 18.6 ± 11.2,
    OR: (7.4, 29.8) thousand dollars.

18. For the model on page 5, please test the following hypothesis at the
    0.05 level.
    H0: β_lotsize = β_nROOMS = β_space = 0   versus   H1: not H0
    (Please find these numbers on page 5.)

    Test Statistic: F = [(969.3 + 139.2 + 73.7)/3] / 212.92 = 1.85
    Critical Value: F(3 & 18 df; α = 0.05) = 3.16
    Conclusion: Accept H0.

    Note that for these types of tests on groups of predictors, the
    predictors tested (e.g., the predictors lotsize, nROOMS and
    space, that are referred to in H0) must be added to the model
    last, so that we can use the Seq SS values to find their
    marginal value.

Page 98 of 156

10

Appendix: Here Are MINITAB Settings for Model 2 on Pages 2-3

In MINITAB 17, the Model subdialog box allows one to drop the constant. See the next page.

[Screenshot of the Fit Regression Model dialog box. The first two predictors are indicator
variables (a type of categorical variable), but they may still be listed here. The "Model"
button opens the Model subdialog box (see the next page). A note on the screenshot points
out that "Years" includes 2015.]


Page 99 of 156

11

In MINITAB 17, the Model subdialog box (below on the left) allows you to remove the constant term
(see the Include the constant option near the lower left corner).
The Model Subdialog Box

The buttons labelled "Add" in the top right corner of this dialog box allow one to add multiples or powers
of the predictors listed. For example, if we highlight AFC and Head_Coach_in_SB in the list at the top of
the dialog box and hit the first "Add" button, we add the predictor AFC x Head_Coach_in_SB. This is
referred to as an interaction of the two predictors and allows us to estimate the incremental value (relative
to compensation) of being an AFC coach who has also been a head coach in a previous Super Bowl.
Page 100 of 156

12

To Make a Prediction
After fitting a model,
predictions are made
from a separate part of
the Regression menu
(Stat Menu Regression
Regression Predict).
Here I am requesting
predicted compensation
for a head coach who
works in the AFC, has
not been a head coach for
a team in the Super Bowl,
and has 100 team wins.
That is, the numbers 1 0
100 refer to the values
of the three predictors in
the model:
AFC =1,
Head_Coach_in_SB=0,
W_with_Team=100.

Page 101 of 156

Lecture 10: Strategies for Finding the Best Model


Outline:(Reference: Ch. 14: 14.10, pp. 565-573)
Stepwise Approach: Add (or Remove) Predictors One at a Time
Best Subsets Approach: MS(Error) versus R-sq(pred) &
A Procedure for Finding the "BEST" Model
1. Stepwise Approach
a. Basic forward stepwise analysis: at each step, add the predictor (from
   among those predictors not currently in the model) that will have the
   smallest p-value once it is in the model. In this way, we are adding
   the predictor that will contribute the most to SS(Regression) at each
   step. Note that the squared t-ratio of an individual coefficient
   measures the predictor's marginal value:

      t² = [marginal contribution to SS(Regression)] / MS(Error),

   and the largest t² value corresponds to the smallest p-value.

b. Minitab's stepwise procedure allows you to set standards for removing
   and adding predictors by specifying "Alpha to Enter" and "Alpha to
   Remove." At each step it will remove the predictor with the largest p-value if

      {largest p-value among predictors} > Alpha to Remove,

   and add the best of the predictors not currently in the model if

      {p-value of best candidate predictor} < Alpha to Enter,

   where the best candidate predictor is the one that will have the largest
   t² value (and smallest p-value) after it has been added to the model.
Page 102 of 156

2. Best Subsets Analysis

a. One disadvantage of a stepwise procedure is that it may not find the
   best group of predictors, because the procedure only adds and removes
   predictors one step at a time.

b. Minitab's best subsets procedure provides a summary of the best
   models of each size (i.e., the best models with 1 predictor, the best
   with 2 predictors, etc.). How should one choose from among these
   best models? If we simply select the model that fits best in the sample
   (i.e., minimizes MS(Error), or equivalently, maximizes R²(adjusted)),
   we tend to get a model that over-fits idiosyncrasies in the specific
   sample, and this model may not be the best general model for
   predicting y.

   A model will tend to provide more accurate predictions if it
   maximizes Predicted R-squared,

      Predicted R-squared = 1 − Variance(Error*)/Variance(Y),

   where Variance(Error*) represents the variance of the errors that occur
   when each observation of Y is predicted from a model estimated without
   that observation. When Predicted R-squared is
   not available, another approach to finding the best predictive model is
   to minimize either a penalized error-variance criterion or the Mallows Cp
   criterion:

      Cp = SS(Error)/[MS(Error for Model with All Candidate Predictors)] − (n − 2p),

   where n = # of observations and p = # of parameters. These approaches all
   tend to penalize models for being overly complex. By choosing the model
   that optimizes the chosen criterion (i.e., minimizing Cp or the penalized
   variance criterion, or maximizing Predicted R-squared), we tend to get the
   best predictive model. Theoretically, the penalized variance and Predicted
   R-squared criteria are more reliable, and I would recommend maximization of
   Predicted R-squared as the preeminent criterion. Each criterion has some
   technical drawbacks (a small numerical sketch of Cp and Predicted R-squared
   follows this list).

   [1] Cp is not reliable when the smallest Cp value is negative, or when
       making comparisons between different best-subsets analyses.
   [2] Predicted R-squared is not calculated for models without constant
       terms. Otherwise, it is calculated by Minitab 17, but not by most
       statistical programs, and it is very difficult to calculate without a
       special program.
   [3] The penalized variance criterion is not calculated by most statistical
       programs (including Minitab) but is easy to calculate directly.
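
A sketch (an addition with simulated data; assumes numpy and statsmodels) of computing Mallows' Cp and a PRESS-based Predicted R-squared for one candidate model:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 40
X_all = rng.normal(size=(n, 4))                       # four candidate predictors
y = 5 + X_all[:, 0] + 0.5 * X_all[:, 1] + rng.normal(0, 1, n)

full = sm.OLS(y, sm.add_constant(X_all)).fit()        # model with all candidate predictors
sub = sm.OLS(y, sm.add_constant(X_all[:, :2])).fit()  # a candidate subset model

# Mallows' Cp = SS(Error of subset) / MS(Error of full model) - (n - 2p)
p = int(sub.df_model) + 1                             # parameters in the subset model
cp = sub.ssr / full.mse_resid - (n - 2 * p)

# Predicted R-squared from the PRESS residuals e_i / (1 - h_ii)
infl = sub.get_influence()
press = np.sum((sub.resid / (1 - infl.hat_matrix_diag)) ** 2)
r_sq_pred = 1 - press / np.sum((y - y.mean()) ** 2)

print(round(cp, 2), round(r_sq_pred, 3))
```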
3. Procedure for Finding the "BEST" Model
a. If necessary, transform Y to achieve homoscedasticity. Once you
have found the best transform for Y, consider transformations of the
predictors that strengthen their linear relationship with the chosen
transform of Y. Also try to think of NEW predictors to include.
b. Search for the best model by doing a best subsets (or stepwise)
analysis and optimizing the appropriate criterion.
[1]Sometimes there are too many possible predictors to consider in one
analysis. Also, it may not be possible to consider all predictors
because they are too highly correlated. In either case, one can run a
best subsets analysis on a large subset that includes the most
promising predictors, and then do additional best subsets (or stepwise)
analyses to see what else should be added to the model to minimize
the appropriate criterion. (See Example 2.)
[2]If the model that optimizes the chosen criterion does not satisfy the
three primary assumptions (linearity, homoscedasticity, random error),
then consider using one of the runner-up models.
Page 104 of 156

Example 1: Studying Successful Products


Here we have data on the television programs that were ranked among
the top 25 (in terms of estimated number of people viewing the program
live or on the same day) during the week they were shown. The goal is
to find the best predictive model for the number of viewers among these
successful shows.
Other similar applications include studies of how sales of any type of
product or service depends on the attributes of that product or service, or
how customer share-of-wallet or customer satisfaction depend on the
attributes of a firms services or products.
Source: http://tvbythenumbers.zap2it.com (These data come from four
weeks during April through June of 2014.)
MTB > retr 'TV-Spring-2004.mtw'
MTB > info c2-c11 c19
Information on the Worksheet
Information on the Worksheet

   Column   Count   Missing   Name                      Description
T  C2         100         0   Shows                     Name of Show
T  C3         100        64   Repeat_Special_Premier
T  C4         100         0   Network
   C5         100             Viewers                   Live+SD (Millions)
   C6         100             Comedy                    Indicator
   C7         100             Drama                     Indicator
   C8         100             Talent/Variety            Indicator
   C9         100             NewsMagazine              Indicator
   C10        100             Reality                   Indicator
   C11        100             Sports                    Indicator
   C19        100             Period                    Weeks 1-4

MTB > print c5 c3 c4 c6 c7 c10 c19

Row  Viewers  Repeat_Special_Premier  Network  Comedy  Drama  Reality  Period
  1   12.001  P                       NBC           0      0        0       1
  2    9.396  R                       CBS           0      1        0       1
  3    8.717  R                       CBS           1      0        0       1
  4    8.215  R                       CBS           0      0        0       1
  .      ...
(Rows 5 through 96 are omitted here; rows 97 and 98 are two ABC programs in
Period 4, with 8.395 and 8.327 million viewers, respectively.)

Page 105 of 156

All of the candidate predictors in this data set are categorical


variables. The simplest categorical variables are indicator variables
(which take on values 0 or 1 only) and these can be used in a
regression model the same way as continuous predictors are.
Nevertheless, there are also three categorical variables that are not
indicators: Repeat_Special_Premier, Network and Period. By
coincidence, each of these variables represents 4 possible categories,
and can be re-expressed using 3 indicator variables. (In general, for a
categorical variable with M values, we can re-express the information
using M-1 indicator variables, and then fit those indicators in a
regression analysis the same way we would fit continuous predictors.)
In Minitab, one can create these indicator variables using the Make
Indicator Variables option in the Calc Menu.
Below I create indicators for the 4 Networks, although I will only
need to use 3 of these variables in a regression analysis.
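In Python this step could be done with pandas; the following small sketch is
illustrative only (the sample Network values are made up, not the worksheet).

import pandas as pd

shows = pd.DataFrame({"Network": ["NBC", "CBS", "CBS", "CBS", "ABC", "FOX"]})
# One indicator per network; keep only 3 of the 4 in a regression.
indicators = pd.get_dummies(shows["Network"], prefix="", prefix_sep="")
shows = shows.join(indicators[["ABC", "CBS", "FOX"]])   # drop one category (NBC)
print(shows)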

(These pictures are of Minitab 17. The Student Version also includes this menu
item, but its dialog box does not actually name the indicator variables for you.)

I rename the first three indicators above as ABC, CBS, and Fox and use them
in the analysis that follows.

Similarly, I create and use indicators for Repeat, Special,


Premier, and for Period 1, Period 2, Period 3.
Page 106 of 156

MTB > corr Viewers Comedy-FOX 'Period_1'-'Period_3'

Correlations: Viewers, Comedy, Drama, Talent/Variety, NewsMagazine, Reality, ...

[Minitab prints the full lower-triangular correlation matrix for Viewers and the
candidate predictors. Only the column of correlations with Viewers is reproduced
legibly here; the original output also shows the correlations among the
predictors themselves (columns Comedy, Drama, Talent/Variety).]

                  Correlation with Viewers    P-Value
Comedy                        -0.030            0.767
Drama                          0.025            0.803
Talent/Variety                 0.315            0.001
NewsMagazine                  -0.271            0.006
Reality                       -0.166            0.099
Sports                        -0.124            0.219
Premier                       -0.152            0.130
Repeat                        -0.353            0.000
Special                       -0.248            0.013
ABC                            0.045            0.659
CBS                            0.063            0.531
FOX                           -0.203            0.043
Period_1                      -0.488            0.000
Period_2                      -0.088            0.384
Period_3                       0.226            0.024

Cell Contents:  Correlation
                P-Value

The p-value can be used for either of the following 2-tailed tests:
    H0: ρ = 0 vs. H1: ρ ≠ 0,    OR    H0: β1 = 0 vs. H1: β1 ≠ 0,
where ρ is the true correlation between the variables and β1 is the slope
coefficient of a simple regression of either variable on the other.
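An equivalent correlation table can be produced in Python with pandas. This
sketch is illustrative only and assumes the worksheet has been exported to a CSV
file with the column names used above (the file name is hypothetical).

import pandas as pd

tv = pd.read_csv("tv_spring_2014.csv")        # assumed export of the worksheet
# Correlation of every numeric column with Viewers, as in the table above
print(tv.select_dtypes("number").corr()["Viewers"].round(3))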

1. What would be the worst predictor for VIEWERS in a simple linear


regression? What would be the best predictor in a simple regression?
WORST: Drama
BEST: Period_1
(Next Best: Repeat)
2. What would be the R-Squared (Unadjusted) values for the two best
simple linear regressions?
Best with Period_1:    R² = (-0.488)² = 0.24, or 24%
2nd Best with Repeat:  R² = (-0.353)² = 0.12, or 12%

Page 107 of 156

7
MTB > Breg Viewers Comedy-Reality Premier-FOX 'Period_1'-'Period_3'

Best Subsets Regression: Viewers versus Comedy, Drama, ...
Response is Viewers

S represents the standard deviation of error (i.e., the square root of MS(Error)).
The first two models have the predictors and R²-values predicted on the last page!

                 R-Sq      R-Sq    Mallows
Vars    R-Sq    (adj)    (pred)         Cp         S
  1     23.9     23.1      21.3       60.6    2.4982
  1     12.5     11.6       9.9       84.0    2.6786
  2     30.9     29.5      26.7       48.1    2.3916
  2     29.6     28.1      25.1       50.8    2.4147
  3     40.1     38.2      34.5       31.2    2.2393
  3     38.3     36.3      33.9       35.0    2.2727
  4     46.4     44.1      40.9       20.3    2.1298
  4     43.9     41.6      37.9       25.3    2.1773
  5     48.4     45.7      42.2       18.1    2.0995
  5     47.8     45.0      42.0       19.4    2.1123
  6     50.2     47.0      43.2       16.4    2.0737
  6     50.1     46.8      43.3       16.7    2.0766
  7     52.2     48.6      45.4       14.2    2.0418
  7     52.2     48.5      44.9       14.4    2.0439
  8     54.2     50.2      46.5       12.2    2.0106
  8     54.1     50.1      46.8       12.4    2.0130
  9     55.7     51.3      47.5       11.0    1.9875
  9     55.1     50.6      47.0       12.4    2.0028
 10     56.4     51.5      47.4       11.6    1.9827
 10     56.3     51.4      46.8       11.8    1.9855
 11     57.2     51.9      46.7       12.0    1.9765
 11     56.9     51.5      46.5       12.7    1.9836
 12     57.9     52.1      46.6       12.7    1.9724
 12     57.3     51.4      45.7       13.8    1.9852
 13     58.4     52.1      45.6       13.5    1.9711
 13     58.1     51.7      45.9       14.2    1.9788
 14     58.7     51.9      45.0       15.0    1.9763

[Each row of the original output also marks, with X's under the column headings
Comedy, Drama, Talent/Variety, NewsMagazine, Reality, Premier, Repeat, Special,
ABC, CBS, FOX, Period_1, Period_2 and Period_3, which predictors are included in
that model; e.g., the best 1-predictor model uses Period_1 and the second-best
uses Repeat.]

3. Which model minimizes the variance of error (& maximizes R-sq(adj) )?


1st Model with 13 predictors (everything except Premier).
4. Which appears to be the best predictive and why?
1st Model with 9 predictors because it maximizes R-sq(pred).
(Note that it also minimizes Cp.)
5. Find Sp* for the two best predictive models. (These models include constants,
so p = number of predictors + 1.)
1st with 9 predictors:   Sp* = (1.9875)²·[1 + 10/(100 − 11)] = 3.950·[1 + 10/89] = 4.39
1st with 10 predictors:  Sp* = (1.9827)²·[1 + 11/(100 − 12)] = 3.931·[1 + 11/88] = 4.42

Page 108 of 156

Stepwise Options in Minitab 17


The Fit Regression Model dialog box below (accessed from the Stat menu:
Stat → Regression → Regression) includes Model and Stepwise subdialogs
that allow one to create additional predictors (including interactions)
and then use these in a stepwise analysis.

One can create interactions or powers of variables,


selected on the left, to be included in the model.

In the Student Version of Minitab, the Stepwise


dialog is a primary and separate option listed in the
Regression menu (Stat → Regression → Stepwise).

Page 109 of 156

Backward Elimination of Terms

Candidate terms: Comedy, Drama, Talent/Variety, NewsMagazine, Reality, Premier, Repeat,
Special, ABC, CBS, FOX, Period_1, Period_2, Period_3, Talent/Variety*CBS,
NewsMagazine*CBS, Reality*CBS

[At each step Minitab removes the term with the largest p-value and reports the
coefficients and p-values of the terms that remain. The full coefficient path is
too wide to reproduce legibly here; the model summary at each step was:]

Step                 1        2        3        4        5        6        7        8
S              1.99730  1.98616  1.97630  1.97108  1.97237  1.97649  1.98546  1.98752
R-sq            58.78%   58.75%   58.67%   58.40%   57.86%   57.20%   56.32%   55.74%
R-sq(adj)       50.83%   51.38%   51.86%   52.11%   52.05%   51.85%   51.41%   51.31%
R-sq(pred)      43.56%   44.60%   44.99%   45.63%   46.60%   46.70%   46.81%   47.48%

The model in Step 8 (with 9 predictors) is the best predictive model on page 8:
Viewers regressed on Talent/Variety, NewsMagazine, Reality, Repeat, Special,
ABC, CBS, FOX and Period_1 (see the coefficient table on the next page). The
analysis continues for two more steps, but R-sq(pred) decreases. Here is the
summary for those two steps:

Step                 9       10
S              2.01059  2.04180
R-sq            54.20%   52.25%
R-sq(adj)       50.18%   48.62%
R-sq(pred)      46.52%   45.41%

Page 110 of 156

10

The Best Predictive Model (pages 7 & 9)


Analysis of Variance
Source           DF   Adj SS   Adj MS   F-Value  P-Value
Regression        9   447.70   49.744     12.59    0.000
  [individual predictor rows omitted]
Error            90   355.52    3.950
  Lack-of-Fit    24    81.99    3.416      0.82    0.694
  Pure Error     66   273.53    4.144
Total            99   803.22

Model Summary
      S     R-sq   R-sq(adj)   R-sq(pred)
1.98752   55.74%      51.31%       47.48%

Coefficients
Term              Coef   SE Coef   T-Value   P-Value    VIF
Constant         8.237     0.628     13.12     0.000
Talent/Variety   3.495     0.656      5.32     0.000   1.68
NewsMagazine    -1.487     0.683     -2.18     0.032   1.16
Reality         -1.610     0.690     -2.33     0.022   1.18
Repeat          -2.407     0.691     -3.48     0.001   1.79
Special         -1.738     0.638     -2.73     0.008   1.16
ABC              1.267     0.693      1.83     0.071   2.08
CBS              2.460     0.681      3.61     0.000   2.93
FOX             -2.203     0.803     -2.74     0.007   1.47
Period_1        -1.090     0.617     -1.77     0.081   1.81

Regression Equation
Viewers = 8.237 + 3.495 Talent/Variety - 1.487 NewsMagazine - 1.610 Reality - 2.407 Repeat
- 1.738 Special + 1.267 ABC + 2.460 CBS - 2.203 FOX - 1.090 Period_1

6. Do the residuals support assumptions?


Linearity and randomness? Based on the symmetry of errors
around the line where residuals=0: Marginal support (lowest
and highest predictions are underestimates).

Homoscedasticity? The Lack-of-Fit test indicates that


there are no serious problems (p-value =0.694 > 0.2).
Note that the range of errors is greater at some fit levels just because of the
larger number of observations at that level. The IQR is proportional to the
standard deviation (not the range), so it's better to try to visualize whether
the IQR is approximately constant at different levels of fit.

Page 111 of 156
7. If there were a problem with linearity or randomness, what action should
we take?
Try the runner-up model: this is the 1st Model with 10
predictors on page 7. The stepwise procedure does not find
this model.
8. If there were a problem with homoscedasticity, what remedial action might
help?
Transform Y & rerun the best-subsets analysis. For
example, we could use Log(Y) as the new dependent variable
if we are convinced that the variance of the residuals
increases with fit level (I do not think this happens here).
9. Does the model achieve overall significance at the 0.01 level?
Yes (p-value on Regression line of the Analysis of Variance
Table is 0.000).
10. Which coefficients are significantly positive at the 0.01 level? (Except for
the constant, this is equivalent to asking: what attributes of a program
significantly increase viewership at the 0.01 level?)
Only:

the constant, Talent/Variety, CBS

(each is positive with [p-value/2] <0.01).


11. How many viewers are lost on average when a program is rebroadcast, holding
the other predictors constant? (In the data set a rebroadcast program is
designated as a Repeat.) Please give an 80% confidence interval for this average.
2.407 million viewers are lost on average.
-2.407 ± t(90 df, α = 0.10)·0.691 = -2.407 ± (1.3)(0.691) = (-3.3, -1.5) million.
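A quick check of this interval in Python (an illustration using scipy for the
t critical value; the numbers are taken from the coefficient table above):

from scipy import stats

coef, se, df = -2.407, 0.691, 90
t_crit = stats.t.ppf(0.90, df)                   # upper-tail area 0.10, about 1.29
print(coef - t_crit * se, coef + t_crit * se)    # roughly (-3.3, -1.5) million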

Page 112 of 156

12

Example 2: Same Example (with Less Prior Deliberation)


On page 5, we discussed how it was necessary to re-express three of the more
complex categorical variables (Repeat_Special_Premier, Network and
Period) as indicator variables. What would have happened if I then tried to
use too many of the resulting indicator variables in the subsequent analysis?
(Recall that I only need to use 3 indicators for each of the 4-category
variables.) Below I run Breg (best subsets regression) and include 3 of the
unnecessary indicators (see the 3 highlighted variables below).
MTB > BReg 'Viewers' 'Comedy'-'Reality' 'Premier' 'Repeat' 'Special' &
CONT> 'Not_Repeat_Special_Premier' 'ABC' 'CBS' 'FOX' 'NBC' &
CONT> 'Period_1'-'Period_4' .

* ERROR * Predictor columns are highly correlated. Use REGR command


to find correlated variables.
* ERROR * Completion of computation impossible.

The inclusion of any one of the 3 highlighted variables above will make it
impossible to do the best subsets analysis. If these variables had already been
created, I might not realize that they described mutually exclusive categories
(and, consequently, that the inclusion of the highlighted variables was
redundant information). Whenever I get the error message above, I

can determine how to proceed by trying to use all of the candidate


predictors in a regression analysis. When I do this regression analysis, I
get the following output:
Regression Analysis:
Viewers versus {Same Predictors Used Above with Breg Command}
The following terms cannot be estimated and were removed:
Not_Repeat_Special_Premier, NBC, Period_4 .

[This message is then followed by a summary of the regression model


that uses all predictors except the 3 predictors listed above.]
If we exclude these 3 candidate predictors and do the best subsets regression
on all other predictors, we have the analysis on page 7, which provides us with
the best predictive model.
Page 113 of 156

13

Other Criteria
(A Vocabulary for Conversations with Other Modelers)
1. Sometimes when we compare models, the sample size changes (perhaps
because of missing values). But if the sample size, n, is not changing, it is
easier to calculate and minimize Sp (rather than Sp*):

    Sp = SS(Error)/[(n − p)(n − p − 1)] = MS(Error)/[n − (p + 1)].

2. Another approach to finding the best predictive model is to select the
model that minimizes the Akaike Information Criterion (AIC),

    AIC = n·log(SS(Error)/n) + 2p.

When the sample size is sufficiently large, AIC is minimized whenever the Sp*
criterion is minimized (and R-sq(pred) is maximized), so that one will
generally select the same model using any one of these three criteria.
3. Sometimes the objective is not to find the best predictive model, but rather
to find the model that comes closest to describing the true relationship
between Y and a set of predictors. The criteria that are used in this case
incorporate a more severe penalty for model complexity. One approach to
finding the true scientific model is to minimize the Bayesian Information
Criterion (BIC),

    BIC = n·log(SS(Error)/n) + p·log(n).

For large sample sizes, this is equivalent to minimizing

    BIC* = MS(Error)·{1 + p·[log(n) − 1]/n}.

If you compare BIC* with Sp*, you can see that BIC* imposes a greater
penalty for model complexity!
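If you want to compute these criteria directly, here is a small Python sketch
(my own illustration, not part of the notes); AIC and BIC are computed up to the
additive constants used in the definitions above, which is all that matters for
comparing models.

import numpy as np

def information_criteria(sse, n, p):
    """AIC, BIC and the large-sample surrogates Sp* and BIC*.
    sse = SS(Error), n = # observations, p = # parameters."""
    mse = sse / (n - p)
    aic = n * np.log(sse / n) + 2 * p
    bic = n * np.log(sse / n) + p * np.log(n)
    sp_star = mse * (1 + p / (n - p - 1))
    bic_star = mse * (1 + p * (np.log(n) - 1) / n)
    return aic, bic, sp_star, bic_star

# Example: the 9-predictor model above (SS(Error) = 355.52, n = 100, p = 10);
# the Sp* value returned is about 4.39, as computed on page 8.
print(information_criteria(355.52, 100, 10))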
Bruce Cooil, 2016

Page 114 of 156

14

Partial List of Statisticians and Mathematicians Who Discovered


Some of the Tools Used Here and in Earlier Lectures
Cp: Colin Mallows
Sp: Leo Breiman and David Freedman
AIC: Hirotugu Akaike
BIC: Gideon Schwarz and Hirotugu Akaike (independently)
Predicted R-square (and the optimality of cross-validation): Seymour
Geisser, Mervyn Stone, David Allen (independently)
Central Limit Theorem: Pierre-Simon Laplace and Simeon Poisson made
seminal contributions. A good reference on origins is:
http://salserver.org.aalto.fi/vanhat_sivut/Opinnot/Mat-2.4108/pdffiles/emet03.pdf.
t-distribution: William Gosset
Stem and Leaf Display and Boxplot: John Tukey

For Those Who Are Reading the Text


One of the most relevant sections in chapter 14 runs from page 568 (from the
paragraph subtitled Comparing regression models) through page 571
(excluding the last paragraph of 571). Please note that the formula for Cp on page
570 assumes the model includes k predictors and a constant. In general:

    Cp = SS(Error)/MS(Error for Model with All Predictors) − (n − 2p),

where n = # observations, p = # parameters (i.e., the number of predictors, plus one
if there is a constant in the model), and SS(Error) is the sum of squared error for
the model being considered.
[Please note that the text refers to MS(Error for Model with All Predictors) with
its own symbol for a squared standard error. This is NOT the square of either Sp
or Sp*, the two model selection criteria defined in Lecture 10.]
Page 115 of 156

Lecture 11
1-Way Analysis of Variance (ANOVA)
As a Type of Multiple Regression Analysis
Outline
Reference: Ch. 14: 14.8(again); Ch. 11: 11.1-11.2 (for main ideas only)

Purpose of 1-Way ANOVA


Example: How Its Done as a Regression
Meaning of the Coefficients
Purpose of the Overall F-test and Coefficient T-tests for this Type
of Application.
Second Example

1. Purpose: To study how the average value of y differs across two or


more populations (or groups). (This is a generalization of the Case 5
test on 2 sample means.)
2. Example: An investor is studying the 10-year returns of the highest
ranked mutual funds in three different categories. [These funds are
chosen because they have high U.S. News Scores which are based
on a composite of ratings of some of the mutual fund industry's best-known
analysts. This type of information is available from
http://www.usnews.com/funds (http://money.usnews.com/funds).]
She looks at four funds in each of three categories and treats this data
set as though it were a random sample of the past returns of the funds
that represent the best opportunities in each category. Categories are
based on the size of the firms in which each fund invests. A question
of particular interest: among the best rated funds, does the average
trailing 10-year return vary by the size of firms that make up the
portfolio? (For example, one might expect portfolios of smaller firms
to get higher average returns, but with higher variance.)

Page 116 of 156

10-Year Returns (Annualized in %) of Funds By Capitalization Level


Fund Type:   Large-Cap Blend   Mid-Cap Blend   Small-Cap Blend
                   14.1             6.8             10.7
                    6.1            11.4              6.9
                    3.3            10.2              6.4
                    7.3            10.4              8.5
Mean               7.70            9.70             8.125
Std. Dev.          4.58            2.00             1.94
This analysis does not actually require the same number of observations
per category.
A simple (and elegant way) of summarizing the average differences in
return by group is to perform a multiple regression where:
Y is the return, and
Two indicator variables are used as predictor variables to denote
group membership. (Note: for k groups, only k-1 indicators are
needed.)
If we are primarily interested in contrasting large-cap funds with the
others, the above would be organized as follows for the multiple
regression analysis.

Fund   Return   X1 = Mid-Cap   X2 = Small-Cap
  1     14.1          0               0
  2      6.1          0               0
  3      3.3          0               0
  4      7.3          0               0
  5      6.8          1               0
  6     11.4          1               0
  7     10.2          1               0
  8     10.4          1               0
  9     10.7          0               1
 10      6.9          0               1
 11      6.4          0               1
 12      8.5          0               1
Page 117 of 156

Regression Analysis: Return versus Mid-Cap, Small-Cap


Analysis of Variance (skeletal version without predictors)
Source        DF    Adj SS   Adj MS   F-Value  P-Value
Regression     2    8.8817   4.4408      0.46    0.644
Error          9   86.3275   9.5919
Total         11   95.2092

Model Summary
      S    R-sq   R-sq(adj)   R-sq(pred)
3.09709   9.33%       0.00%        0.00%

Coefficients
Term        Coef   SE Coef   T-Value   P-Value    VIF
Constant    7.70      1.55      4.97     0.001
Mid-Cap     2.00      2.19      0.91     0.385   1.33
Small-Cap  0.425      2.19      0.19     0.850   1.33

Regression Equation
Return = 7.70 + 2.00 Mid-Cap + 0.425 Small-Cap

3. Meaning of the Coefficients

Recall that the three group means are:

    Means:   Large-Cap: 7.70     Mid-Cap: 9.70     Small-Cap: 8.125

From these means, how could we have deduced that the regression
equation would have been:
Return = 7.70 + 2.00 Mid-Cap + 0.425 Small-Cap ?
To answer this question, first note the Return values that this model
predicts for each of the 3 types of funds.
What's the predicted value of Return for "Large-Cap"?
(It will be the sample mean for "Large-Cap".)
Predictor values in this case are:  Mid-Cap = 0 , Small-Cap = 0
Predicted Return value:  Return = 7.70 + 2.00(0) + 0.425(0) = 7.70

What's the predicted return for "Mid-Cap" funds? (It will be the
sample mean return for Mid-Cap funds.)
Predictor values in this case are:  Mid-Cap = 1 , Small-Cap = 0
Predicted Return value:  Return = 7.70 + 2.00(1) + 0.425(0) = 9.70

What's the predicted return for "Small-Cap" funds? (It will be the
sample mean return for Small-Cap funds.)
Predictor values in this case are:  Mid-Cap = 0 , Small-Cap = 1
Predicted Return value:  Return = 7.70 + 2.00(0) + 0.425(1) = 8.125

Page 118 of 156


In this regression equation, note how the constant represents the mean
for Large-Cap funds, the group not represented by an indicator
variable, while the other coefficients represent the difference between
each of the other fund types and the mean for Large-Cap funds.
For example, the coefficient for Mid-Cap is:
    2.00 = {Mid-Cap Mean} − {Large-Cap Mean} = 9.70 − 7.70

And the coefficient for Small-Cap is:
    0.425 = {Small-Cap Mean} − {Large-Cap Mean} = 8.125 − 7.70
Page 119 of 156
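The same calculation can be reproduced with ordinary least squares in Python.
This short numpy sketch (my own illustration, not part of the notes) shows that
the fitted coefficients are exactly the Large-Cap mean and the two differences
in group means.

import numpy as np

returns = np.array([14.1, 6.1, 3.3, 7.3,      # Large-Cap
                    6.8, 11.4, 10.2, 10.4,    # Mid-Cap
                    10.7, 6.9, 6.4, 8.5])     # Small-Cap
mid   = np.repeat([0, 1, 0], 4)               # Mid-Cap indicator
small = np.repeat([0, 0, 1], 4)               # Small-Cap indicator
X = np.column_stack([np.ones(12), mid, small])
b = np.linalg.lstsq(X, returns, rcond=None)[0]
print(b)   # approximately [7.70, 2.00, 0.425]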

4.

Purpose of the Overall F-test and Coefficient T-tests in This


Type of Application

Before doing any testing we should check the residuals.


[Plot: "Versus Fits" (response is Return) — residuals (vertical axis, about
-5.0 to 7.5) plotted against the fitted values (horizontal axis, 8.0 to 10.0).]

a. The overall test for model significance (the F-test) is provided in the
skeletal ANOVA table (reproduced here for convenience).

Analysis of Variance
Source        DF    Adj SS   Adj MS   F-Value  P-Value
Regression     2    8.8817   4.4408      0.46    0.644
Error          9   86.3275   9.5919
Total         11   95.2092

This information is used to test overall model significance, which is
typically stated as follows.
    H0: βMid-Cap = βSmall-Cap = 0        H1: not H0.
But in this application, it is equivalent to testing whether the group
means (the average return for each type of mutual fund) are equal:
    H0: μLarge-Cap = μMid-Cap = μSmall-Cap        H1: not H0.

Page 120 of 156

Assume α = 0.01. For this test, please specify:
P-value:  0.644  (p-value of F)
Conclusion:  Accept H0  (Average returns do not differ significantly.)

If the p-value was not available, we would use:
Test Statistic:  F = 0.46
Critical Value:  F(2 & 9 df; α = 0.01) = 8.02

b.

The coefficient t-ratios allow us to determine whether there is a


significant difference between the average returns of each group,
by itself, relative to the "Large-Cap" group (because this is the
group represented by the constant). (The coefficient table is
reproduced here for convenience.)

Term        Coef   SE Coef   T-Value   P-Value    VIF
Constant    7.70      1.55      4.97     0.001
Mid-Cap     2.00      2.19      0.91     0.385   1.33
Small-Cap   0.43      2.19      0.19     0.850   1.33

Which funds have significantly higher average returns than "Large-Cap"


funds at the 0.2 level?

For Mid-Cap funds:

For Small-Cap funds:

H0: Mid-Cap = 0

H0: Small-Cap = 0

H1: Mid-Cap

H1: Small-Cap > 0

>0

P-value: 0.385/2=0.1925

P-value: 0.850/2 = 0.425

Conclusion:

Conclusion: Accept H0

Reject H0

Page 121 of 156

On the next few pages we will study another example. Here's a
summary of it.
a.

Y is network share of a Super Bowl broadcast.

b.

There are 4 groups being studied: Super Bowl games played in


Miami, games played in LA/Pasadena, games played in New
Orleans, and games played somewhere else.

c.

Y is regressed on 3 indicators: MIAMI, LApsdna, and Norleans.
The regression equation is of the form:

    Share-hat = b0 + bMIAMI·MIAMI + bLApsdna·LApsdna + bNorleans·Norleans.

The predictor variables simply tell us where the game was played: if it
was played somewhere besides Miami, LA/Pasadena, or New Orleans,
then the three indicators are each "0"; if it was played in Miami then
MIAMI = 1 and the other indicators are "0"; and similarly for games
played in LA/Pasadena or New Orleans. The prediction equation that
minimizes the sum of squared errors will simply predict the average
share for the group of games played at a given location as indicated by
the values of the three indicator variables. Thus:

    b0        =  average share for "location 4" games
                 (location 4 represents all locations other than Miami,
                 LA/Pasadena, or New Orleans)
    bMIAMI    =  average share for MIAMI games − average share for location 4
    bLApsdna  =  average share for LA/Pasadena games − average share for location 4
    bNorleans =  average share for New Orleans games − average share for location 4
Page 122 of 156

Example 2:
Analysis of Network Share By Location of the First 36 Super Bowls
(Imagine This is the Entire Sample)
Data Display

Row  year  share  site  MIAMI  LApsdna  Norleans  Loc. 4
  1  1967   79.0     2      0        1         0       0
  2  1968   68.0     1      1        0         0       0
  3  1969   71.0     1      1        0         0       0
  4  1970   69.0     3      0        0         1       0
  5  1971   75.0     1      1        0         0       0
  6  1972   74.0     3      0        0         1       0
  7  1973   72.0     2      0        1         0       0
  8  1974   73.0     4      0        0         0       1
  9  1975   72.0     3      0        0         1       0
 10  1976   78.0     1      1        0         0       0
 11  1977   73.0     2      0        1         0       0
 12  1978   67.0     3      0        0         1       0
 13  1979   74.0     1      1        0         0       0
 14  1980   67.0     2      0        1         0       0
 15  1981   63.0     3      0        0         1       0
 16  1982   73.0     4      0        0         0       1
 17  1983   69.0     2      0        1         0       0
 18  1984   71.0     4      0        0         0       1
 19  1985   63.0     4      0        0         0       1
 20  1986   70.0     3      0        0         1       0
 21  1987   66.0     2      0        1         0       0
 22  1988   62.0     4      0        0         0       1
 23  1989   68.0     1      1        0         0       0
 24  1990   63.0     3      0        0         1       0
 25  1991   63.0     4      0        0         0       1
 26  1992   61.0     4      0        0         0       1
 27  1993   66.0     2      0        1         0       0
 28  1994   66.0     4      0        0         0       1
 29  1995   63.0     1      1        0         0       0
 30  1996   72.0     4      0        0         0       1
 31  1997   65.0     3      0        0         1       0
 32  1998   67.0     4      0        0         0       1
 33  1999   61.0     4      0        0         0       1
 34  2000   68.5     4      0        0         0       1
 35  2001   60.0     4      0        0         0       1
 36  2002   61.0     3      0        0         1       0
Page 123 of 156

1-Way ANOVA as Regression

(Minitab 17: Stat → Regression → Regression → Fit Regression Model)

If you accidentally regressed Share on all four indicator variables, all
versions of Minitab will tell you that the fourth variable is redundant and
that it is removed from the model automatically!!

Analysis of Variance
Source        DF    Seq SS    Seq MS   F-Value  P-Value
Regression     3   148.323    49.441      2.15    0.113
  MIAMI        1    70.444    70.444      3.06    0.090
  LApsdna      1    73.389    73.389      3.19    0.084
  Norleans     1     4.490     4.490      0.20    0.662
Error         32   736.087    23.003
Total         35   884.410

To obtain the Sequential SS values above, click on Options within the
regression dialog box, and then choose Sequential (Type 1) on the 4th line
of the Options subdialog box.

Model Summary
      S     R-sq   R-sq(adj)   R-sq(pred)
4.79611   16.77%       8.97%        0.00%

Coefficients
Term       Coef   SE Coef   T-Value   P-Value    VIF
Constant  66.19      1.33     49.76     0.000
MIAMI      4.81      2.25      2.14     0.040   1.24
LApsdna    4.09      2.25      1.82     0.078   1.24
Norleans   0.92      2.08      0.44     0.662   1.27

Regression Equation
share = 66.2 + 4.81 MIAMI + 4.09 LApsdna + 0.92 Norleans

Assume the basic assumptions of regression hold.


1.
Are the mean share values at each location significantly
different at the 0.2 level?
H0: μLOCATION4 = μMIAMI = μLApsdna = μNorleans        H1: not H0.
This is equivalent to testing whether the coefficients are
significantly nonzero at the 0.2 level:
H0: βMIAMI = βLApsdna = βNorleans = 0        H1: not H0.
P-value: 0.113
Conclusion: Reject H0 (p-value < α)

Page 124 of 156

10

2.

Which location has a significantly higher average share than


"location 4" at the 0.03 level?
Note: In each case we are testing H0: β = 0 vs. H1: β > 0, so whenever
the estimated coefficient is positive, we just divide the 2-tailed p-value by 2.
Answer to #2: Only Miami, where the p-value for the 1-sided
test is 0.04/2 < 0.03 .

Standard Formulation of 1-Way ANOVA


(From the menus: Stat → ANOVA → One-Way; in the dialog box,
share is the response and site is the factor.)

One-way ANOVA: share versus site

Analysis of Variance
Source  DF   Adj SS   Adj MS   F-Value  P-Value
site     3    148.3    49.44      2.15    0.113
Error   32    736.1    23.00
Total   35    884.4

Note how "site" is referred to as "Regression" in the Analysis of Variance
table near the top of the last page.

Means
site    N    Mean   StDev           95% CI
1       7   71.00    5.10   (67.31, 74.69)
2       7   70.29    4.75   (66.59, 73.98)
3       9   67.11    4.46   (63.85, 70.37)
4      13   66.19    4.88   (63.48, 68.90)

Pooled StDev = 4.79611 = sqrt( MS(Error) )
  = sqrt{ [(7−1)(5.10)² + (7−1)(4.75)² + (9−1)(4.46)² + (13−1)(4.88)²]
          / [(7−1) + (7−1) + (9−1) + (13−1)] }

The pooled standard deviation (above) is calculated the same way as in
the Case 5 test for two means, and this is an illustration of how the
homoscedasticity assumption is used in regression models.
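The pooled standard deviation can be verified with a few lines of Python (an
illustration using the group sizes and standard deviations shown above):

import numpy as np

n = np.array([7, 7, 9, 13])
s = np.array([5.10, 4.75, 4.46, 4.88])
pooled_var = np.sum((n - 1) * s**2) / np.sum(n - 1)
print(np.sqrt(pooled_var))   # about 4.80, the square root of MS(Error)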

One problem with this analysis: SHARE tends to decline with time, and
games played at "Location 4" (i.e., other locations) tend to be more
common in later years, relative to games played at the first three
locations (especially MIAMI). To correct for the approximate linear
decline in SHARE over the years, I add YEARS as a predictor.
Page 125 of 156

11

Analysis of Covariance
Now we regress 'share' on 4 predictors:
year-1967, MIAMI, LApsdna, Norleans.
Analysis of Variance
Source        DF    Seq SS     Seq MS   F-Value  P-Value
Regression     4   447.891    111.973      7.95    0.000
  year-1967    1   422.879    422.879     30.03    0.000
  MIAMI        1     5.372      5.372      0.38    0.541
  LApsdna      1     8.260      8.260      0.59    0.450
  Norleans     1    11.382     11.382      0.81    0.376
Error         31   436.518     14.081
Total         35   884.410

(The sum of the last three sequential SS values = 447.891 − 422.879.)

Model Summary
      S     R-sq   R-sq(adj)   R-sq(pred)
3.75250   50.64%      44.27%       35.35%

Coefficients
Term          Coef   SE Coef   T-Value   P-Value    VIF
Constant     73.95      1.98     37.40     0.000
year-1967  -0.3221    0.0698     -4.61     0.000   1.35
MIAMI         0.64      1.98      0.32     0.748   1.57
LApsdna       0.53      1.92      0.27     0.786   1.48
Norleans     -1.54      1.71     -0.90     0.376   1.41

Regression Equation
share = 73.95 - 0.3221 year-1967 + 0.64 MIAMI
        + 0.53 LApsdna - 1.54 Norleans

1.

Now test for mean differences among locations after correcting


for YEAR. Are the adjusted means significantly different at
the 0.2 level?
This is equivalent to testing whether just the coefficients for
location are significantly nonzero at the 0.2 level, i.e., we would
write H0 and H1 as:
Page 126 of 156

12

H0: βMIAMI = βLApsdna = βNorleans = 0        H1: not H0.
(Note: This is a subset of the coefficients in the current model!!)

F-statistic:
    F = [(447.891 − 422.879)/3] / 14.081 = 0.59
    (Equivalent numerator: [5.372 + 8.260 + 11.382]/3.)
Critical Value:  F(3 & 31 df; α = 0.2) = 1.64
Conclusion:  Accept H0
Contrast this result with the one on the bottom of p. 9 !!

2. Which location has a significantly different share from that of
"location 4" (at any reasonable significance level, i.e., α < 0.2)?
For each coefficient test H0: β = 0 vs. H1: β ≠ 0.
In each case: Accept H0 (p > 0.2 for each).
No location is significantly different from location 4 at the 0.2 level.
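The partial F-test above can be reproduced numerically in Python (an
illustration using scipy; the numbers come from the sequential ANOVA table):

from scipy import stats

# numerator = (drop in SS(Regression) when the 3 location indicators are
# removed) / 3; denominator = MS(Error) of the full model.
f_stat = ((447.891 - 422.879) / 3) / 14.081
p_value = stats.f.sf(f_stat, 3, 31)
print(round(f_stat, 2), round(p_value, 3))   # 0.59 and a p-value well above 0.2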

The last regression model is sometimes called an analysis of


covariance because we are using YEAR as a covariate (or additional
predictor) in order to remove (or exclude) its effects. Recall that this
is one of the four main purposes of modeling (Lecture 6, page 1,
purpose #3). Of course there are many useful applications for this type
of analysis. Here are two examples:
1. Discrimination cases:
Y: Salary
Groups: Gender or Demographic Groups
Covariates: Experience, Education, Past performance;
2. Improving Customer Satisfaction or Sales
Y: Sales Performance or Customer Satisfaction
Groups: Brands or Sales Approach
Covariates: Customer Attributes (age, sex, region).
Page 127 of 156

13

Using MINITAB to Look Up F Critical Values: Here I'm Finding F(α=0.2; 3 & 31 df)

MTB > InvCDF 0.8;


SUBC> F 3 31.
Inverse Cumulative Distribution Function
F distribution with 3 DF in numerator and
31 DF in denominator
P( X <= x)
0.8000

x
1.6413
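The same critical value can be obtained in Python with scipy:

from scipy import stats
print(stats.f.ppf(0.80, 3, 31))   # about 1.64, matching the Minitab InvCDF output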
Bruce Cooil, 2016
Page 128 of 156

Lecture 12: Chi-Square Tests for Goodness-of-Fit and Independence


Summary (Reference: Ch 12: 12.1-12.2: pp. 440-447, 450-454)
Chi-square (χ²) tests can be used to test whether:
1. Data fit a theoretical distribution (the χ² goodness-of-fit test);
2. Two categorical variables are independent (the χ² test for independence).
1. Goodness-of-Fit Test (Ch. 12.1: pp. 440-447)
a. Purpose: This test provides a way of testing whether two or more proportions
take on specified values. (This is a generalization of the Case 3 test on a
single proportion.)
b. Example: Suppose we want to test whether the market shares of three
cigarette brands (A, B, C) have remained at previous levels after the
introduction of a new ad campaign by brand A. Then:
H0: pA = 0.2, pB=0.1, pC=0.3, pOther=0.4
H1: not H0. (Let = 0.1.)
We interview a random sample of 200 smokers and obtain the following
observed and expected counts for each brand.
                Brand A   Brand B   Brand C   Other Brands
Oi (Observed)        50        30        40             80
Ei (Expected)        40        20        60             80

Note that the expected counts, Ei, are calculated as 200·pi (from H0):
40 = 200(0.2);  20 = 200(0.1);  60 = 200(0.3);  80 = 200(0.4).


c. Mechanics of the Test: We reject H0 in favor of H1 if the observed are
sufficiently different from the expected counts. To gauge how different the
observed and expected counts are, we use the chi-square statistic:
    χ² = Σ over all categories of [(Observed Count) − (Expected Count)]² / [Expected Count].

Page 129 of 156

If H0 is true, χ² should behave approximately like a sum of k−1 squared
independent standard normal random variables, where k is the number of
categories (k = 4 in this example). Formally we reject H0 in favor of H1 if:
    χ² > χ²α(k−1 d.f.)  (i.e., the chi-square critical value for k−1 degrees of freedom).

d. In This Example:
Test Statistic:
    χ² = (50−40)²/40 + (30−20)²/20 + (40−60)²/60 + (80−80)²/80 = 14.2
Critical Value:  χ²0.1(3 d.f.) ≈ 6.25  (Table A.5, p. 610, Appendix A).
Conclusion: Reject H0.
There has been a change in market shares at the 0.1 level.
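The same goodness-of-fit test can be run in Python with scipy (an illustration
using the counts above):

from scipy import stats

observed = [50, 30, 40, 80]
expected = [40, 20, 60, 80]          # 200 * (0.2, 0.1, 0.3, 0.4)
chi2, p = stats.chisquare(observed, f_exp=expected)
print(round(chi2, 1), round(p, 4))   # 14.2 and a p-value below 0.01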

e. Assumption: As in the Case 3 and Case 6 tests on proportions, all expected


counts should be at least 5, to justify the normal approximation.

2.

Chi-Square Test for Independence (12.2, pp. 450-454)

a. Purpose: This test provides a way of determining whether two discrete


variables are independent. (When both of the variables have only 2 possible
values or categories, this is the two-sided version of the Case 6 test for the
equality of two proportions. When either variable consists of 3 or more
categories, this test generalizes the Case 6 test, just as 1-Way ANOVA
generalizes the Case 5 test for two means.)

b. Example: Suppose we want to test whether age (as measured by 3 age groups)
and soap brand preference are independent (=0.1):
H0: Age group and brand preference are independent
H1: Age group and brand preference are dependent.
Page 130 of 156

We obtain the following data after interviewing 100 people.

                          Brand Preferred
Age Group             A          B        Other     Row Total (Ri)
< 25               10 (6)      5 (9)     15 (15)          30
25 to 40            5 (6)     10 (9)     15 (15)          30
> 40                5 (8)     15 (12)    20 (20)          40
Column Total (Cj)      20         30         50          100

This table shows the number of people in each age and brand-preference
category. The numbers in parentheses represent the estimated expected cell
counts, Eij, that we would have if age and brand preference were independent:
    Eij = total·(prob. of being in age group)·(prob. of preferring that brand)
        = n·(Ri/n)·(Cj/n) = (Ri·Cj)/n.
For example:
    6 = (30·20)/100,  9 = (30·30)/100,  ...,  20 = (40·50)/100.
c. Mechanics of the Test: We reject H0 in favor of H1 if the observed and
expected counts are sufficiently different. As before, we use the chi-square
statistic:
    χ² = Σ over all cells of [(Observed Count) − (Expected Count)]² / [Expected Count].

If H0 is true, χ² should behave approximately like a sum of (r−1)(c−1) squared
independent standard normal random variables. Formally we reject H0 in favor of
H1 if:
    χ² > χ²α (with (r−1)(c−1) d.f.).

d. Test Statistic in this case:
    χ² = (10−6)²/6 + (5−9)²/9 + (5−6)²/6 + (10−9)²/9 + (5−8)²/8 + (15−12)²/12 = 6.6
    (the three "Other" cells contribute 0 because observed = expected there).
Critical Value:  χ²0.1(4 d.f.) ≈ 7.8  (Table A.5, p. 610, Appendix A).
Conclusion: Accept H0.
Age and Brand preference are not significantly dependent.
e. Assumption: All expected counts should be at least 5.
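For reference, the same test can be carried out in Python with scipy's
chi2_contingency (an illustration using the table above; no continuity
correction is applied for tables larger than 2x2):

from scipy import stats

table = [[10,  5, 15],
         [ 5, 10, 15],
         [ 5, 15, 20]]
chi2, p, df, expected = stats.chi2_contingency(table)
print(round(chi2, 3), df, round(p, 3))   # 6.597 with 4 df, p about 0.16
print(expected)                          # matches the counts in parentheses above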
Page 131 of 156

Using MINITAB in the 2nd Example

The same analysis can


be done using either of
the two menu options
below and each will
take either raw data or
a summary table.

This is just the table at the top of


the last page, which summarizes the
responses of the 100 subjects by
brand and age category.

Here the 100 subjects are listed by row;


c15 and c16 identify the brand group and
age group, respectively, of each subject.

Using either approach above, the analysis provided is:


Pearson Chi-Square = 6.597, DF = 4, P-Value = 0.159

So we accept independence (Ho) at the 0.1 level


because the p-value > 0.1 .
But we would reject Ho if we had tested at the
0.2 level, because the p-value < 0.2 .
Page 132 of 156

Lecture 13
Executive Summary of Managerial Statistics (Mgt. 6381)
& Notes for the Final Quiz
1.

Except for a possible question requiring a chi-square test, the final


will consist primarily of questions about regression modeling since
this provides a general conceptual framework for talking about
everything we've done (including probability distributions and
hypothesis testing).

2.

Problem Sets 4-5 & the final posted in Blackboard provide


examples of the types of questions to expect.

3.

Regarding regression, here are some BIG PICTURE reminders:


a.

T-tests are used for tests on individual coefficients, and F-tests are used when testing groups of coefficients.

b.

T-ratios simply measure the distance, in standard error


units (or SEs), between an estimate (e.g. a coefficient
estimate or mean) and some benchmark constant of interest.
The coefficient table gives the appropriate t-ratios, and
corresponding 2-tailed p-values, that allow us to determine
whether the coefficients are significantly different from zero
(which is a constant benchmark of particular interest).

Page 133 of 156

c.

The t-ratio formed by dividing a coefficient estimate by its
standard error, which measures that coefficient estimate's
distance from zero (in SEs), also measures the marginal
value of the predictor. When squared, it has the same
interpretation as an F-ratio. F-ratios used for tests on
coefficients always have the following interpretation:

    F = [Marginal Value of the Predictors Being Tested] / MS(Error),

where the marginal value in the numerator is measured in
terms of the average contribution of these predictors to
SS(Regression) when they are added last.
d.

The Predicted R-squared represents the estimated adjusted
R-squared that the model will achieve when used with new
data, and the model that maximizes Predicted R-squared is a
good choice as the model that will provide the most accurate
predictions outside the sample. An alternative approach is to
select the model that minimizes Sp* (or Cp or AIC). Sp*
represents an adjusted MS(Error):

    Sp* = MS(Error)·[1 + p/(n − p − 1)].

e.

One-Way analysis of variance can be represented as a


multiple regression model that reports group means as its
only predictions.

Page 134 of 156

Outline of Managerial Statistics


I.

Descriptive Statistics (Lecture 1)


A.

Measures of Location: mean, median, trimmed mean

B.

Measures of Dispersion: variance, standard deviation, interquartile


range

C.

Graphical Displays: stem and leaf display, boxplots

D.

Regression provides a descriptive summary of how two variables are


related

II.

The Central Limit Theorem (Lecture 2)


Means have approximately normal distributions (under very general
conditions)

III.

Seven Cases for Tests on Means and Proportions and Confidence Intervals
(Lectures 3-5 & The Outline)

IV. Regression (Lectures 6-11)


A.

Purposes
1.

Prediction of Y

2.

Relationship: To study the nature of the relationship between Y


and the predictors (e.g., the precise value of the coefficients)

3.

Exclusion: to study Y while controlling for the effects of other


variables (see 2nd example of Lecture 11 regarding Super Bowl
locations)

4.

Descriptive Summary

Page 135 of 156

IV. Regression
B.

(Continued)

Checking Assumptions
Using the plot of standardized residuals versus fit to check linearity,
homoscedasticity, and residual randomness.

C.

Interpretation of Coefficients

D.

Decomposition of SS(Total) and basic measures of fit


1.

SS(Total) = SS(Regression) + SS(Error)

2.

MS(Error) (This is the variance of the residuals.)

3.

R2(Adjusted) (The proportion of variance of Y explained by


the model.)

4.

The correlation coefficient (The square of the correlation is


the unadjusted R2 of a simple linear regression.)

E.

Significance Testing (see the Outline of Methods for Regression, and


examples in Lectures 8-11, &13)
1.

Lack-of-Fit test

2.

Overall test for significance (F-test)

3.

Tests and confidence intervals for individual coefficients (t-tests)

4.

Tests on subsets of coefficients (F-test)

5.

The t-ratio is a measure of marginal value:

    t² = [the coefficient's contribution to SS(Regression) when its predictor is added last] / MS(Error)

F.

Confidence Intervals (CIs) and Prediction Intervals (PIs) for Y (see


item #6 in Outline of Methods for Regression and Lecture 7, pp. 10-13)
1.

SE Fit is an ingredient of each interval and it is a function of


both the error in the coefficient estimates and the values of
predictors. The PI is always wider than the CI because the PI
4
Page 136 of 156

also depends on MS(Error).


2. Example of a confidence interval for the mean of y (at specific predictor values X):

    80% C.I.:  ŷ ± t(0.10; error d.f.)·[SE Fit]

3. Example of a prediction interval for y (at specific predictor values X):

    95% P.I.:  ŷ ± t(0.025; error d.f.)·sqrt[(SE Fit)² + MS(Error)]
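As an illustration (not part of the original outline), these two intervals can
be computed in Python once ŷ, SE Fit, MS(Error) and the error degrees of freedom
are known; the function below is a minimal sketch.

from scipy import stats
import numpy as np

def intervals(fit, se_fit, mse, df, conf=0.95):
    """CI for the mean of Y and PI for a new Y at given predictor values."""
    t = stats.t.ppf(1 - (1 - conf) / 2, df)
    ci = (fit - t * se_fit, fit + t * se_fit)
    pi_se = np.sqrt(se_fit**2 + mse)      # PI also reflects MS(Error)
    pi = (fit - t * pi_se, fit + t * pi_se)
    return ci, pi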
G.

Selecting the "Best" Model (Lecture 10)


1.

A good criterion for finding the best predictive model (i.e., the
model that will minimize mean squared error when used to
predict y using new data): maximize Predicted R-square,

    Predicted R-squared = 1 − Variance(Error*)/Variance(Y),

where Variance(Error*) represents the variance of error that
occurs when each observation of Y is predicted without
including that observation when estimating the model.
Alternatively, one could minimize the Sp* criterion or the
Mallows Cp criterion:

    Sp* = MS(Error)·[1 + p/(n − p − 1)],

    Cp = SS(Error)/MS(Error for Model with All Predictors) − (n − 2p).

The Akaike Information Criterion, AIC, is equivalent to Sp* in large
samples. Also there is a procedure for finding the model that best
represents the "true scientific relationship between Y and the
predictors": minimize the Bayesian Information Criterion (BIC).
5
Page 137 of 156

2.

Cp depends in part on the set of all candidate predictors


considered, so its value can only be used to compare models
within the same best subsets analysis. Predicted R-squared,
AIC and Sp* do not have this limitation. Also, it is not always a
reliable criterion when the minimum value of Cp is negative.

3.

When Predicted R-squared is not available, or when Cp does not


provide a reliable or definitive answer, the most appropriate
approach is to select the model that minimizes Sp*. This allows
one to compare models even when they are estimated from
different parts of the sample (perhaps because of missing
values), or estimated from entirely different samples.

4.

H.

Always check assumptions (linearity, homoscedasticity and


randomness) and use runner-up models if appropriate.

One-Way Analysis of Variance is a Multiple Regression on Indicator


Predictors for Mutually Exclusive Groups (Lecture 11).
1.

An indicator coefficient represents the difference between group


averages.

2.

The F-test for overall significance also tests whether group


means are significantly different.

3.

Analysis of covariance provides a way of testing for differences


among group means after adjusting for additional variables
(covariates) and this test is an F-test on the subset of indicator
variables that represent group membership.

6
Page 138 of 156

V. Chi-square tests for goodness-of-fit and independence (Lecture 12).


In both cases the test statistic is

    χ² = Σ [Observed − Expected]² / Expected,

where Observed and Expected refer to the observed and expected


counts, respectively.
A.

If it is a goodness-of-fit test for whether proportions for mutually


exclusive categories take on specified values, then the degrees of
freedom are: (number of categories)-1. The expected count for each
category is n*pi (where pi is the specified value of the proportion for
category i under H0).

B.

If its a test for independence, the degrees of freedom are {(r-1)*(c-1)}


where r & c are the number of rows and columns of the table,
respectively. The expected count for cell in row i and column j is
RiCj/n where Ri and Cj represent the corresponding row i and column
j totals, respectively.

C.

The goodness-of-fit test generalizes the Case 3 test on one proportion.


The test for independence generalizes the Case 6 test for equality of 2
proportions.

7
Page 139 of 156

1
Review Questions
(45)

1. Please refer to the analysis on page 1 of the Appendix. Here I am studying models
that predict the gross receipts from the opening release of a movie (represented by the
variable OPENING). Please use the best subsets analysis on the bottom of page 1 of
the Appendix to answer the following questions.

(6) a. Based on the best subsets analysis, which model minimizes the variance of the error (or
residuals)? (To identify the model, please specify the number of predictors and whether it
is the first or second model of that type.) Also, please specify the variance of the error.
Model Choice:

1st Model with 3 predictors

Estimated variance of error:

(17.076)2 = 291.6

(6) b. What would be the best predictive model of OPENING? Why?


Model Choice:
Justification:

1st Model with 3 predictors


Largest R-sq(pred)

(6) c. What proportion of the variance in OPENING is explained by a simple linear


regression on BUDGET?

19.3% (R-squared Adjusted)

(6) d. Assume that the correlation between OPENING and BUDGET is positive. Is there
enough information in the best-subsets analysis to deduce the actual value of the sample
correlation between these two variables? If so, please specify its value.
Please circle one: NO (there is not enough information)

YES (there is enough information)

If YES, give the value of the correlation between OPENING and BUDGET:
    √0.219 ≈ 0.47

(6) e. Assume that the correlation between OPENING and SUMMER is positive. Is there
enough information in the best-subsets analysis to deduce the actual value of the sample
correlation between these two variables? If so, please specify its value.
Please circle one: NO (there is not enough information)

YES (there is enough information)

If YES, give the value of the correlation between OPENING and SUMMER:
    √0.030 ≈ 0.17

(9) f. Use the Sp* criterion to choose the best predictive model from among the following two
alternatives on page 1 of the Appendix: ( 1) the best predictive model according to Cp and
(2) the best predictive model according to R-sq(pred). Which of these two models has the
best Sp* value?

Sp* (best Cp model) = (17.256)2 [ 1 + 3/(32-4)] = 330


Sp*(best R-sq(pred) model)= (17.076)2 [1+ 4/(32-5)] = 335

Model 1 (1st with 2 preds) is the better predictive model according to Sp*
Page 140 of 156

2
(6) g. In the best subsets analysis, consider the last model with all 4 predictors. Is it possible to
deduce whether the t-ratio of the coefficient of SUMMER is greater than 1 in absolute
value? (If it is not possible, please specify what additional analysis is needed.)
Please circle one:
For predictor SUMMER in 4-predictor model , |t-ratio| is:
>1
<1
Explain how you know or what additional information is needed:

1st model with 3 predictors has all the same predictors except for
SUMMER and it has a lower MS(Error) .
(45) 2. Now consider the simple linear regression model on page 2 of the Appendix.
(12)a. As a predictive model for OPENING, this is a dreadful model. Assuming that I knew
beforehand that this model would not have much predictive value, is there any rational
reason for doing the regression analysis on page 2? (Please explain very briefly.)

Yes, it's useful as a 1-way ANOVA to study average OPENING
with and without superstars. (See #2d below.)
(6) b. What is the average value of OPENING (i.e., the average opening gross receipts) for a
movie that does not have a superstar in it? (Be sure to specify the units in which your
answer is expressed!)

20.3 million dollars (the Constant)


(6) c. What is the difference between the average value of OPENING for movies with
superstars and the average value of OPENING for movies without superstars?

0.06 million dollars ($60,000)


(9) d. Is the difference in part (c) significantly nonzero at the 0.1 level? Please specify:
H0: βSTAR = 0                H1: βSTAR ≠ 0
P-value: 0.994
Conclusion: Accept H0        (The difference is not significant at the 0.1 level.)


Page 141 of 156

3
(12)e. Now briefly consider the two predictor model on page 3 of the Appendix. Here
OPENING is regressed on STAR and BUDGET. A superstars agent concedes that this
model shows that adding a superstar to a movie actually results in a decrease of 13.2
million in OPENING receipts, assuming we hold BUDGET fixed. Nevertheless, this
agent argues that this decrease in OPENING can be prevented by increasing the total
BUDGET sufficiently, and that this alone shows that movies with superstars can make
more money than movies without superstars. Is there anything wrong with this
argument? (Please explain briefly and assume that the regression model itself is valid.)
Is there anything wrong with this argument? YES
(Please circle your answer.)

NO

Brief explanation:

Model indicates that opening receipts are higher when budget


dollars are spent on things other than superstars.
(OR: ceteris paribus, superstars hurt revenue.)
(57) 3.

The following questions refer to the model on page 4 of the Appendix. Here the dependent
variable, LN(Opening), is the natural logarithm of OPENING.

(3) a. What specific problem does taking the log of the dependent variable usually help solve in a
regression analysis?

The heteroscedastic situation where the variance of the


residuals increases with fit level
(6) b. Does this model achieve overall significance at the 0.01 level?

YES (p-value of F is 0.001 < 0.01).


(6) c. What proportion of the variance of LN(Opening) is explained by this model?

42.45% (R-squared adjusted)


(6) d. Is the coefficient for STAR significantly less than zero at the 0.1 level? Please specify:
H0: βSTAR = 0                H1: βSTAR < 0
P-value: 0.187/2 < 0.1
Conclusion: Reject H0

Page 142 of 156

4
(9) e. On average, does the value of LN(Opening) increase significantly more than 0.01 units for
every 1 unit increase in BUDGET at the 0.05 level, assuming we hold the other predictors
constant? Please specify:
H0: βBudget = 0.01                H1: βBudget > 0.01
Test statistic:  t = (0.01937 − 0.01)/0.00426 ≈ 2.2
Critical Value:  t(27 df, α = 0.05) = 1.7
Conclusion:  Reject H0 (Yes)

(12) f. Please do the following test at the 0.1 level.


H0: βSTAR = βSUMMER = βSTARxSUMMER = 0                H1: not H0
Test statistic:  F = [(11.3039 − 9.7696)/3] / 0.42072 ≈ 1.2
Critical Value:  F(3 & 27 df, α = 0.1) = 2.3
Conclusion:  Accept H0

(9) g. From the analysis on page 4 of the Appendix, can we determine what the value of
R2(unadjusted) would be if we dropped BUDGET from this model and only did the
regression on the other three predictors (with a constant)? (If it is possible, please give the
value of R2(unadjusted) for this simpler model.)
R-squared (Unadjusted) = SS(Regression)/SS(Total)
    = [11.3039 − (4.55)²(0.42072)] / 22.6633 ≈ 0.11

(6) h. If I find a model for LN(Opening) that uses 5 predictors (and a constant) and reduces
MS(Error) to 0.400, is this new model of greater predictive value than the model on page 4
of the Appendix? (Assume that both models satisfy the assumptions of regression and
please justify your answer!)
Please circle I or II :
Best predictive model: I.
II.

Model on p.4 of Appendix


New Model with 5 predictors & constant

Justification:
Sp for the 5-predictor model:  0.4/(32−7) = 0.0160
Sp for the 4-predictor model:  0.42072/(32−6) = 0.0162
(Or use Sp*.)
Page 143 of 156

(20) 4. A company wants to test whether the proportions of defective items in 3 large
shipments are the same. A random sample of 100 items is drawn from each shipment,
and the results are summarized below. (Expected counts, shown in parentheses, are
computed as 10 = 30·100/300 and 90 = 270·100/300.)

                 Shipment 1   Shipment 2   Shipment 3    Total
Defective          10 (10)      15 (10)       5 (10)        30
Not Defective      90 (90)      85 (90)      95 (90)       270
Total                  100          100          100        300

Record the expected counts in each category above and do the following test.
H0: p1 = p2 = p3  (OR: Shipment is independent of whether or not an item is defective)
H1: not H0
Please specify:
Test statistic:
    χ² = (10−10)²/10 + (15−10)²/10 + (5−10)²/10
         + (90−90)²/90 + (85−90)²/90 + (95−90)²/90 ≈ 5.6
Critical value:  χ²(2 df, α = 0.1) = 4.6
Conclusion:  Reject H0
(The proportion of defects differs significantly by shipment.)

Page 144 of 156

Appendix (Review Questions)


These data were collected on 32 movies. The dependent variable of interest (Y) is the variable
Opening in C2, which gives the gross receipts (in millions of dollars) for the first weekend of
each movies release. As potential predictors we have data on each movies total budget,
whether or not a superstar was in it (Star), whether or not it was released during the summer
(Summer), and the product of the indicator variables Star and Summer (see below).

Information on the Worksheet

  Column   Count   Name          Description
T C1          32   Movie         Name of the Movie
  C2          32   Opening       Gross receipts for the weekend after the movie
                                 was released (in millions of dollars)
  C3          32   Budget        Total budget for the movie (millions $)
  C6          32   Star          1 if a superstar is in the movie, 0 otherwise
  C7          32   Summer        1 if the movie was released during the summer,
                                 0 otherwise
  C8          32   StarXSummer   1 if the movie was released during the summer
                                 and has a superstar, 0 otherwise

The following analysis is for problem 1


MTB > Breg c2 c3 c6-c8

Best Subsets Regression: Opening versus Budget, Star, ...
Response is Opening

                 R-Sq      R-Sq    Mallows
Vars    R-Sq    (adj)    (pred)         Cp         S
  1     21.9     19.3      12.3        3.8    17.900
  1      3.0      0.0       0.0       11.6    19.950
  2     29.9     25.0       9.6        2.6    17.256
  2     23.6     18.3       8.7        5.2    18.010
  3     33.7     26.6      13.4        3.0    17.076
  3     32.2     24.9       7.3        3.7    17.272
  4     33.8     24.0       4.1        5.0    17.374

[Each row of the original output also marks, with X's under the column headings
Budget, Star, Summer and StarXSummer, which of those predictors are included in
that model.]
Page 145 of 156

The following analysis is for problems


2(a) through 2(d)
Regression Analysis: Opening versus Star
Analysis of Variance
Source        DF    Adj SS    Adj MS   F-Value  P-Value
Regression     1       0.0     0.024      0.00    0.994
  Star         1       0.0     0.024      0.00    0.994
Error         30   12313.9   410.462
Total         31   12313.9

Model Summary
      S    R-sq   R-sq(adj)   R-sq(pred)
20.2599   0.00%       0.00%        0.00%

Coefficients
Term       Coef   SE Coef   T-Value   P-Value    VIF
Constant  20.30      4.65      4.37     0.000
Star       0.06      7.29      0.01     0.994   1.00

Regression Equation
Opening = 20.30 + 0.06 Star

Fits and Diagnostics for Unusual Observations
Obs   Opening     Fit   Resid   Std Resid
 21     92.73   20.30   72.43     3.67  R
 22     84.13   20.30   63.83     3.24  R
R  Large residual

Page 146 of 156

The following analysis is for problem 2(e)


Regression Analysis: Opening versus Star, Budget
Analysis of Variance
Source           DF    Adj SS   Adj MS   F-Value  P-Value
Regression        2    3679.0   1839.5      6.18    0.006
  Star            1     977.6    977.6      3.28    0.080
  Budget          1    3679.0   3679.0     12.36    0.001
Error            29    8634.8    297.8
  Lack-of-Fit    21    5053.2    240.6      0.54    0.879
  Pure Error      8    3581.6    447.7
Total            31   12313.9

Model Summary
      S     R-sq   R-sq(adj)   R-sq(pred)
17.2555   29.88%      25.04%        9.58%

Coefficients
Term        Coef   SE Coef   T-Value   P-Value    VIF
Constant    3.33      6.24      0.53     0.598
Star      -13.15      7.26     -1.81     0.080   1.37
Budget     0.398     0.113      3.52     0.001   1.37

Regression Equation
Opening = 3.33 - 13.15 Star + 0.398 Budget

Fits and Diagnostics for Unusual Observations
Obs   Opening     Fit   Resid   Std Resid
 21     92.73   32.35   60.38     3.67  R
 22     84.13   39.11   45.02     2.83  R
R  Large residual

Page 147 of 156

The following analysis is for problem 3

Regression Analysis: LN(Opening) versus Budget, Star, Summer, StarXSummer

Analysis of Variance

Source          DF   Seq SS   Seq MS  F-Value  P-Value
Regression       4  11.3039  2.82597     6.72    0.001
  Budget         1   9.7696  9.76959    23.22    0.000
  Star           1   0.3241  0.32415     0.77    0.388
  Summer         1   0.8433  0.84325     2.00    0.168
  StarXSummer    1   0.3669  0.36691     0.87    0.359
Error           27  11.3594  0.42072
  Lack-of-Fit   25  11.3225  0.45290    24.57    0.040
  Pure Error     2   0.0369  0.01843
Total           31  22.6633

Model Summary

       S    R-sq  R-sq(adj)  R-sq(pred)
0.648628  49.88%     42.45%      29.41%

Coefficients

Term            Coef  SE Coef  T-Value  P-Value   VIF
Constant       1.623    0.262     6.20    0.000
Budget       0.01937  0.00426     4.55    0.000  1.37
Star          -0.490    0.362    -1.35    0.187  2.40
Summer         0.147    0.302     0.49    0.631  1.73
StarXSummer    0.439    0.471     0.93    0.359  2.88

Regression Equation

LN(Opening) = 1.623 + 0.01937 Budget - 0.490 Star + 0.147 Summer
              + 0.439 StarXSummer

Fits and Diagnostics for Unusual Observations

Obs  LN(Opening)    Fit   Resid  Std Resid
  3        0.813  2.049  -1.236      -2.02  R
 21        4.530  3.037   1.493       2.48  R

R  Large residual
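For reference, a statsmodels sketch of the problem 3 model (the natural log of
Opening regressed on Budget, the two indicators, and their product), again assuming
the hypothetical movies.csv file described earlier:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

movies = pd.read_csv("movies.csv")               # hypothetical file, as before
movies["LN_Opening"] = np.log(movies["Opening"])

# Budget, the two dummies, and their product, exactly as in the Minitab model.
fit = smf.ols("LN_Opening ~ Budget + Star + Summer + StarXSummer", data=movies).fit()
print(fit.summary())    # coefficient table, R-sq, overall F-test, etc.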

Page 148 of 156

The Outlines

Tests Concerning Means and Proportions and Confidence Intervals
(outline of the ideas in Lectures 2-5)

Outline of Methods for Regression
(summary of inferential methods in Lectures 6-11)

Page 149 of 156

[This Page Is Deliberately Left Blank.]

Page 150 of 156

TESTS CONCERNING MEANS AND PROPORTIONS


Case 1.  H0: μ = μ0     (One Large Sample: Sec. 9.2, p. 341, with σ replaced by s)

  Test Statistic:  z = (x̄ - μ0) / (s/√n)        (σ unknown, n > 30)

  H1: μ < μ0    Reject H0 when  z < -z_α
  H1: μ > μ0    Reject H0 when  z > z_α
  H1: μ ≠ μ0    Reject H0 when  |z| > z_(α/2)

Case 2.  H0: μ = μ0     (One Small Sample: Sec. 9.3, p. 344)

  Test Statistic:  t = (x̄ - μ0) / (s/√n)        (σ unknown, n < 30, approx. normal obs.)

  H1: μ < μ0    Reject H0 when  t < -t_α^(n-1)
  H1: μ > μ0    Reject H0 when  t > t_α^(n-1)
  H1: μ ≠ μ0    Reject H0 when  |t| > t_(α/2)^(n-1)

Case 2 may always be used in place of Case 1, even when n > 30!
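A small scipy sketch of Case 2 (which, per the note above, can also be used in place
of Case 1); the sample values below are invented purely for illustration:

import numpy as np
from scipy import stats

x = np.array([12.1, 9.8, 11.4, 10.3, 12.7, 9.9, 11.0, 10.6])   # hypothetical sample
mu0 = 10.0                                                      # value under H0

t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
# Two-sided p-value from the t distribution with n-1 degrees of freedom.
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=len(x) - 1)

# The same test in one call:
t_check, p_check = stats.ttest_1samp(x, mu0)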


Case 3.  H0: p = p0     (One Large Sample: Sec. 9.4, p. 349)

  Test Statistic:  z = (p̂ - p0) / √[p0(1 - p0)/n]        (n·p0 ≥ 5 and n·(1 - p0) ≥ 5)

  H1: p < p0    Reject H0 when  z < -z_α
  H1: p > p0    Reject H0 when  z > z_α
  H1: p ≠ p0    Reject H0 when  |z| > z_(α/2)

Case 4.  H0: μ1 - μ2 = D0     (Two Large Samples: Sec. 10.1, p. 372, with σ1 and σ2
                               replaced by s1 and s2, & p. 376)

  Test Statistic:  z = [(x̄1 - x̄2) - D0] / √(s1²/n1 + s2²/n2)

  H1: μ1 - μ2 < D0    Reject H0 when  z < -z_α
  H1: μ1 - μ2 > D0    Reject H0 when  z > z_α
  H1: μ1 - μ2 ≠ D0    Reject H0 when  |z| > z_(α/2)

Alternative tests, especially appropriate when at least one sample is not large:
a) Welch's t-test [MINITAB's default approach, see shaded box on p. 376]: this uses
   the Case 4 test statistic as a t-statistic with a special formula for the degrees
   of freedom;
b) Mann-Whitney test, also known as the Wilcoxon Rank-Sum test [in MINITAB:
   Stat > Nonparametrics > Mann-Whitney].

D0 is often zero, but refers to whatever constant we choose to use on the right side
of the equation in H0.
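A sketch of the Case 3 and Case 4 statistics with numpy and scipy; all counts and
summary statistics below are hypothetical:

import numpy as np
from scipy import stats

# Case 3: one-sample test on a proportion (hypothetical counts).
n, successes, p0 = 200, 122, 0.55
p_hat = successes / n
z3 = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
p_value3 = 2 * stats.norm.sf(abs(z3))          # two-sided

# Case 4: two large independent samples (hypothetical summary statistics).
xbar1, s1, n1 = 52.3, 8.1, 64
xbar2, s2, n2 = 49.0, 7.4, 58
D0 = 0
z4 = ((xbar1 - xbar2) - D0) / np.sqrt(s1**2 / n1 + s2**2 / n2)
p_value4 = 2 * stats.norm.sf(abs(z4))          # two-sided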
Page 151 of 156

Case 5.  H0: μ1 - μ2 = D0     (Appropriate if σ1 = σ2; Sec. 10.1, p. 375)

  Test Statistic:  t = [(x̄1 - x̄2) - D0] / [sp · √(1/n1 + 1/n2)]
  (σ1 = σ2, independent, approximately normal samples)

  H1: μ1 - μ2 < D0    Reject H0 when  t < -t_α^(n1+n2-2)
  H1: μ1 - μ2 > D0    Reject H0 when  t > t_α^(n1+n2-2)
  H1: μ1 - μ2 ≠ D0    Reject H0 when  |t| > t_(α/2)^(n1+n2-2)

Case 6.  H0: p1 - p2 = D0     (Two Large Samples: Sec. 10.3, p. 390)

  Test Statistic:  z = [(p̂1 - p̂2) - D0] / √[p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2]

  The denominator is typically calculated with the pooled estimate of p when D0 = 0
  (see the 1st note, near the bottom of p. 390).
  (Large, independent samples: n1·p̂1, n1·(1 - p̂1), n2·p̂2, n2·(1 - p̂2) ≥ 5)

  H1: p1 - p2 < D0    Reject H0 when  z < -z_α
  H1: p1 - p2 > D0    Reject H0 when  z > z_α
  H1: p1 - p2 ≠ D0    Reject H0 when  |z| > z_(α/2)

Case 7.  H0: μ1 - μ2 = D0     (Paired Samples: Sec. 10.2, p. 383)

  Take differences and use Case 1 or 2 (depending on whether n > 30 or n < 30),
  or simply always use Case 2.

D0 is often zero, but refers to whatever constant we choose to use on the right side
of the equation in H0.
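A short scipy sketch of Cases 5 and 7 with invented data:

import numpy as np
from scipy import stats

# Case 5: pooled-variance t-test, assuming equal population variances.
sample1 = np.array([23.1, 25.4, 22.8, 26.0, 24.3])
sample2 = np.array([21.0, 22.7, 20.9, 23.5, 22.2, 21.8])
t5, p5 = stats.ttest_ind(sample1, sample2, equal_var=True)

# Case 7: paired samples -- take differences and test them as one sample (Case 2).
before = np.array([10.2, 11.5, 9.8, 12.0, 10.9])
after  = np.array([ 9.7, 11.1, 9.9, 11.2, 10.1])
t7, p7 = stats.ttest_1samp(before - after, 0.0)   # same as stats.ttest_rel(before, after)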

Page 152 of 156

CONFIDENCE INTERVALS
(To Accompany: "Tests Concerning Means and Proportions")

Case  Parameter   (1 - α) x 100% Confidence Interval

 1    μ           x̄ ± z_(α/2) · (s/√n)
 2    μ           x̄ ± t_(α/2)^(n-1) · (s/√n)        (small sample)
 3    p           p̂ ± z_(α/2) · √[p̂(1 - p̂)/n]
 4    μ1 - μ2     (x̄1 - x̄2) ± z_(α/2) · √(s1²/n1 + s2²/n2)
 5    μ1 - μ2     (x̄1 - x̄2) ± t_(α/2)^(n1+n2-2) · sp · √(1/n1 + 1/n2)
 6    p1 - p2     (p̂1 - p̂2) ± z_(α/2) · √[p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2]
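Two of these intervals computed from hypothetical summary statistics (the Case 2
t-interval for μ and the Case 3 z-interval for p):

import numpy as np
from scipy import stats

# t-interval for a mean (small-sample case), 95% confidence.
xbar, s, n = 48.2, 6.5, 18
t_crit = stats.t.ppf(0.975, df=n - 1)
mean_ci = (xbar - t_crit * s / np.sqrt(n), xbar + t_crit * s / np.sqrt(n))

# z-interval for a proportion, 95% confidence.
p_hat, n = 0.61, 250
z_crit = stats.norm.ppf(0.975)
prop_ci = (p_hat - z_crit * np.sqrt(p_hat * (1 - p_hat) / n),
           p_hat + z_crit * np.sqrt(p_hat * (1 - p_hat) / n))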

Page 153 of 156

Outline of Methods for Regression 2015

Inferential Methods for Regression Models:


Tests of Hypotheses, Confidence & Prediction Intervals
Only items #2 and #5 are specific to multiple regression. Items #1,
#3, #4, and #6 were introduced with simple linear regression models.
1. F-test for Lack-of-Fit (Frequently it is not possible to do this test.)
H0: The linear regression model is appropriate, versus H1: not H0.
Test Statistic: F = MS(Lack-of-Fit)/MS(Pure Error).    (1)
Rationale: MS(Pure Error) is a pooled estimate of the variance in Y that
is estimated within groups of observations that have the same predictor
values and it is a component of the variance of error from the regression
model (i.e., MS(Error)). MS(Lack-of-fit) is the other independent
component of MS(Error). If the error from the model really is
homoscedastic, then these two components of the error variance should be
equal in the population and the ratio of the estimates of the two variances
(i.e., the test statistic, F) should not be significantly greater than 1. This is
a diagnostic test and if we are trying to find a viable model, then we hope
to be able to accept H0.
Decision Rule: Reject H0 and conclude the model is not appropriate at
significance level α if F > F_α(df1, df2), where:
df1 = degrees of freedom of SS(Lack-of-Fit),
df2 = degrees of freedom of SS(Pure Error).
OR: Reject H0 if p-value < α.
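As a worked check, the Opening-versus-Star-and-Budget output earlier in this section
gives MS(Lack-of-Fit) = 240.6 on 21 d.f. and MS(Pure Error) = 447.7 on 8 d.f.; the
p-value Minitab reports can be reproduced with scipy:

from scipy import stats

F = 240.6 / 447.7                        # MS(Lack-of-Fit) / MS(Pure Error), about 0.54
p_value = stats.f.sf(F, dfn=21, dfd=8)   # right-tail area; the Minitab table shows 0.879
# F is well below 1, so we do not reject H0: the linear model appears adequate there.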
2. F-Test for Overall Model Significance
H0: β1 = β2 = . . . = βk = 0 versus H1: not H0.
Test Statistic: F = MS(Regression)/MS(Error).    (2)
Rationale: This F ratio will be significantly greater than 1 only if at least
one of the predictors has a coefficient that is significantly different from
zero.
Decision Rule: Reject H0 and conclude that the model is significant at
significance level α if F > F_α(df1, df2), where:
df1 = degrees of freedom of SS(Regression),
df2 = degrees of freedom of SS(Error).
OR: Reject H0 if p-value < α.
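Because SS(Regression)/SS(Total) = R², the same F-ratio can also be written as
F = [R²/k] / [(1 - R²)/(n - k - 1)], where k is the number of predictors. A quick
check against the Opening-versus-Star-and-Budget output (R² = 29.88%, k = 2, n = 32):

from scipy import stats

r_sq, k, n = 0.2988, 2, 32
F = (r_sq / k) / ((1 - r_sq) / (n - k - 1))    # about 6.18, matching the Minitab table
p_value = stats.f.sf(F, dfn=k, dfd=n - k - 1)  # about 0.006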

Page 154 of 156

Outline of Methods for Regression 2015

3. Confidence Intervals for the Value of a Coefficient


For example, a 95% confidence interval would be:

    b ± t_0.025^(d.f. of error) · [s_b],    (3)

where b and s_b are the coefficient estimate and the estimated standard error
of that coefficient, respectively. Both are provided in the coefficient table.
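For instance, the 95% interval for the Budget coefficient in the
Opening-versus-Star-and-Budget model above (b = 0.398, s_b = 0.113, 29 error d.f.)
can be computed as:

from scipy import stats

b, s_b, df_error = 0.398, 0.113, 29
t_crit = stats.t.ppf(0.975, df_error)          # about 2.045
ci = (b - t_crit * s_b, b + t_crit * s_b)      # roughly (0.17, 0.63)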

4. The t-test for Individual Coefficients


We can test null hypotheses of the form
H0: β = constant, versus one of three possible alternatives:
a. H1: β > constant;  b. H1: β < constant;  c. H1: β ≠ constant.
Test Statistic: t = (b - constant)/s_b.    (4)
This t-ratio has degrees of freedom equal to that of SS(Error). (With
degrees of freedom defined this way it is mechanically just like a Case 2
test.) Most statistical packages (including MINITAB) report the p-value
for the two-sided test with constant = 0, so the decision rule in that
case is simply to reject H0 and conclude that the coefficient is significantly
nonzero if the reported p-value < α.
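Continuing the same example, the t-ratio and two-sided p-value for the Budget
coefficient are reproduced directly:

from scipy import stats

b, s_b, df_error = 0.398, 0.113, 29
t_ratio = (b - 0) / s_b                                 # about 3.52, as in the coefficient table
p_two_sided = 2 * stats.t.sf(abs(t_ratio), df_error)    # about 0.001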

5. Joint Test on Groups of Coefficients


H0: All coefficients in a specific subgroup of m predictors are zero (i.e., the
subgroup of m predictors is not useful)
H1: not H0
Test Statistic:

  F = {[SS(Regression) - SS(Regression, Reduced Model)]/m} / MS(Error)
    = {[SS(Error, Reduced Model) - SS(Error)]/m} / MS(Error)
    = (Average marginal value of the subgroup) / MS(Error).    (5)

Rationale:
Here the Reduced Model refers to the model without the m predictors
that are being tested. This test evaluates the combined marginal value (or
incremental value) of the m predictors. SS(Regression, Reduced Model)
and SS(Error, Reduced Model) refer to the sum of squared regression
and the sum of squared error, respectively, of the model without the m
Page 155 of 156

Outline of Methods for Regression 2015

predictors that are being tested. We should reject H0 if the test statistic, F,
is significantly greater than 1.
Decision Rule:
If F > F_α(m, d.f. of error), reject H0 and conclude that there is useful
additional information in the subgroup of predictors (otherwise accept H0).
(Usually the p-value is not available for this test.)
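As an illustration with the LN(Opening) model earlier in this section, consider
testing whether Star, Summer, and StarXSummer (m = 3) add anything beyond Budget
alone. Because Budget is listed first in the sequential table, its Seq SS is the
regression sum of squares for the Budget-only model, so SS(Error, Reduced Model) =
SS(Total) - Seq SS(Budget):

from scipy import stats

ss_error_full, df_error_full = 11.3594, 27      # from the LN(Opening) ANOVA table
ss_error_reduced = 22.6633 - 9.7696             # SS(Total) - Seq SS(Budget) = 12.8937
m = 3                                           # Star, Summer, StarXSummer

ms_error_full = ss_error_full / df_error_full
F = ((ss_error_reduced - ss_error_full) / m) / ms_error_full   # about 1.2
F_crit = stats.f.ppf(0.95, dfn=m, dfd=df_error_full)           # about 2.96
# F < F_crit, so the three-indicator subgroup adds no significant information here.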

Final note on F- and t-Tests


Both the F-test for overall model significance (test #2, the second test
listed on the first page) and the two-sided t-test for whether a predictor's
coefficient is significantly nonzero (test #4(c) with constant = 0) are special
cases of test #5. In the latter case, the square of the t-ratio is an F-ratio and
it has the interpretation in expression (5) above.

6. Confidence Intervals for μ_Y and Prediction Intervals for Y


a. A 95% confidence interval for μ_Y (the true average of Y at specific
predictor values) is of the form:

    ŷ ± t_0.025^(d.f. of error) · [SE Fit],    (6)

where ŷ is calculated at the predictor values of interest and "SE Fit" stands
for the standard error of the fit; it represents the uncertainty due to the fact
that we only have estimates of the regression coefficients. Since these
estimated coefficients are multiplied by the corresponding predictor values,
this error is partly a function of the actual values of the predictor variables.
b. A 95% prediction interval for Y at specific predictor values is of the
form:

    ŷ ± t_0.025^(d.f. of error) · √[(SE Fit)² + MS(Error)].    (7)

This accommodates both the error due to the estimation of coefficients and
the intrinsic uncertainty in the model (the variance of the error term).
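A sketch of both intervals as a small helper, assuming the fitted value, its standard
error (SE Fit), MS(Error), and the error degrees of freedom are available (for example
from Minitab's prediction output); the function name and arguments here are just
illustrative:

import numpy as np
from scipy import stats

def intervals(y_hat, se_fit, ms_error, df_error, conf=0.95):
    """Confidence interval for the mean response and prediction interval for Y."""
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df_error)
    ci = (y_hat - t_crit * se_fit, y_hat + t_crit * se_fit)          # interval (6)
    pi_half = t_crit * np.sqrt(se_fit**2 + ms_error)                 # interval (7)
    pi = (y_hat - pi_half, y_hat + pi_half)
    return ci, pi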

Page 156 of 156
