SIMPLE LINEAR
REGRESSION ANALYSIS
Correlation Analysis (p. 579)
Given: bivariate data $\{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$

$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
Properties of ρ (p. 580)
A linear correlation coefficient ρ can only assume values in [-1, +1].
The sign of ρ describes the direction of the linear relationship between X and Y.
➢A positive value of ρ means that the line slopes upward to the right, so as X increases, the value of Y increases.
➢A negative value of ρ means that the line slopes downward to the right, so as X increases, the value of Y decreases.
If ρ = 0, then there is no linear correlation between X and Y. However, this does not mean a lack of association. It is possible to obtain a zero correlation even if the two variables are related, though their relationship is nonlinear, such as a quadratic relationship.
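The last point can be demonstrated with a short sketch (illustrative data, not from the notes): an exact quadratic relationship Y = X² over symmetric X values produces a Pearson correlation of exactly zero.

```python
# Illustrative sketch (data assumed, not from the notes): an exact
# quadratic relationship gives zero *linear* correlation.

def pearson_r(x, y):
    """Pearson product moment correlation via the computational formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5
    return num / den

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]      # Y is completely determined by X...
print(pearson_r(x, y))       # ...yet r = 0.0: no linear association
```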
Properties of ρ
When ρ is -1 or 1, there is a perfect linear relationship between X and Y, and all the points (x, y) fall on a line whose slope is not equal to 0. (ρ is undefined when the slope is 0, since Var(Y) = 0 in this case.)
A ρ that is close to 1 or -1 indicates a strong linear relationship.
A strong linear relationship does not necessarily imply that X causes Y or that Y causes X. It is possible that a third variable caused the change in both X and Y, producing the observed relationship.
➢This is an important point that we should always remember, not just when studying relationships, but also when comparing two populations, say by using a t-test.
Properties of ρ
➢Unless we collected our data using a well-designed experiment
where we were able to randomize the treatments and
substantially control the extraneous variables, we need to use
the more complex “causal” models to study causality.
➢Otherwise, we just describe the observed relationship or the
observed difference between means.
Pearson Product Moment Correlation (p. 581)
The Pearson product moment correlation coefficient between X
and Y, denoted by r, is defined as:
$$r = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{\sqrt{\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]\left[n\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2\right]}} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$
[Scatter plots of Y against X: one panel with r = 0, one panel with r = 0.87]
Remark
If r = 1, then all the data points lie on a line whose slope is positive.
If r = -1, then all the data points lie on a line whose slope is negative.
If r = 0, then we cannot conclude that all the data points lie on a line whose slope is 0 (a horizontal line).
Example 4
  X     Y     XY    X²    Y²
 -4     4    -16    16    16
 -2     2     -4     4     4
  0     0      0     0     0
  2     2      4     4     4
  4     4     16    16    16
Sum:    0    12      0    40    40

[Scatter plot of the five (X, Y) points]
$$r = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{\sqrt{\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]\left[n\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2\right]}} = \frac{(5)(0) - (0)(12)}{\sqrt{[(5)(40) - (0)^2][(5)(40) - (12)^2]}} = 0$$
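As a quick numerical check of Example 4, the column totals from the table can be plugged into the computational formula (a Python sketch, not part of the original notes):

```python
# Example 4 check: column totals taken from the table above.
n = 5
sum_x, sum_y = 0, 12
sum_xy, sum_x2, sum_y2 = 0, 40, 40

num = n * sum_xy - sum_x * sum_y                       # (5)(0) - (0)(12) = 0
den = ((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)) ** 0.5
r = num / den
print(r)   # 0.0: Y = |X| is an exact V-shaped relationship, yet r = 0
```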
Test of Hypothesis
H₀: ρ = 0 vs Hₐ: ρ ≠ 0

Test Statistic: $T = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
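The statistic is simple to compute directly. A sketch (Python, for illustration), using r = 0.8398 and n = 10 as in the calculus-grade example later in these notes:

```python
# Sketch: test statistic for H0: rho = 0 vs Ha: rho != 0.
import math

def t_stat_for_rho(r, n):
    """T = r / sqrt((1 - r^2) / (n - 2)); t with n - 2 df under H0."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# r = 0.8398 with n = 10, as in the calculus-grade example
t = t_stat_for_rho(0.8398, 10)
print(round(t, 3))   # t ≈ 4.375
```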
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
[Scatter plot: Calculus Grade (y-axis, 0–100) against X (x-axis, 0–80), with the fitted regression line]
If the model fits the data adequately, we can use this equation for
prediction purposes and for describing the nature of the
relationship between the variables.
where $S_{\hat{\beta}_0} = \sqrt{\dfrac{MSE \sum_{i=1}^{n} X_i^2}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}}$
Hypothesis Testing
To test if there is a significant linear relationship between Y and X:
H₀: β₁ = 0 vs Hₐ: β₁ ≠ 0
Test Statistic:
$T = \dfrac{\hat{\beta}_1}{S_{\hat{\beta}_1}}$

where $S_{\hat{\beta}_1} = \sqrt{\dfrac{MSE}{\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2 / n}}$
where $S_{\hat{\beta}_1} = \sqrt{\dfrac{SSE/(n-2)}{\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2 / n}} = \sqrt{\dfrac{606.025869/8}{23634 - (460)^2/10}} = 0.174985$

Critical region: $|t| > t_{\alpha/2}(v = n - 2)$; that is, t > 2.306 or t < -2.306
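The arithmetic above can be reproduced from the column totals of the worked example (n = 10, ΣX = 460, ΣXY = 36854, ΣX² = 23634 from the student-score table, with the stated SSE) — a sketch, not part of the original notes:

```python
# Slope estimate, its standard error, and the t statistic for
# H0: beta1 = 0 vs Ha: beta1 != 0 (sums and SSE from the worked example).
import math

n = 10
sum_x, sum_y = 460, 760
sum_xy, sum_x2 = 36854, 23634
sse = 606.025869

beta1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
s_beta1 = math.sqrt((sse / (n - 2)) / (sum_x2 - sum_x ** 2 / n))
t = beta1 / s_beta1
print(round(beta1, 4), round(s_beta1, 6), round(t, 3))
# 0.7656 0.174985 4.375 -> |t| > 2.306, so H0 is rejected at alpha = 0.05
```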
Coefficient of Determination (p. 593)
Definition 18.4
The coefficient of determination, denoted by R², is defined as the proportion of the variability in the observed values of the response variable that can be explained by the explanatory variable through their linear relationship.
Remarks
We can use the coefficient of determination to assess the
goodness-of-fit of the linear regression model.
The realized value of the coefficient of determination will be from
0 to 1. Usually, this value is expressed in percentage so that we
may interpret this as the percentage of the variation in the values
of Y that is explained by the explanatory variable X through the
model.
If the model has perfect predictability then R2=1. If the model has
no predictive capability then R2=0.
Relationship between r and $\hat{\beta}_1$

$$\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}$$

Note: (Verify!)

$$r = \hat{\beta}_1 \sqrt{\frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2}}$$
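The identity can be checked numerically (a Python sketch; the sums are the column totals from the student-score table that follows):

```python
# Verifying r = beta1_hat * sqrt([n*sum(X^2) - (sum X)^2] /
#                                [n*sum(Y^2) - (sum Y)^2])
# using the column totals of the student-score table.
import math

n = 10
sum_x, sum_y = 460, 760
sum_xy, sum_x2, sum_y2 = 36854, 23634, 59816

sxy = n * sum_xy - sum_x * sum_y        # 18940
sxx = n * sum_x2 - sum_x ** 2           # 24740
syy = n * sum_y2 - sum_y ** 2           # 20560

beta1 = sxy / sxx
r_direct = sxy / math.sqrt(sxx * syy)        # computational formula for r
r_from_beta1 = beta1 * math.sqrt(sxx / syy)  # via the identity
print(round(r_direct, 4), round(r_from_beta1, 4))   # 0.8398 0.8398
```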
Computing for R²
Student X Y X2 Y2 XY
1 39 65 1521 4225 2535
2 43 78 1849 6084 3354
3 21 52 441 2704 1092
4 64 82 4096 6724 5248
5 57 92 3249 8464 5244
6 47 89 2209 7921 4183
7 28 73 784 5329 2044
8 75 98 5625 9604 7350
9 34 56 1156 3136 1904
10 52 75 2704 5625 3900
Total 460 760 23634 59816 36854
Computing for R²
R² × 100% = (0.8398)² × 100% = 70.52%
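The same figure can be reproduced from the raw scores of the ten students in the table above, computing R² as the squared Pearson correlation (a sketch):

```python
# R^2 from the raw (X, Y) scores of the ten students in the table above.
import math

x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
n = len(x)

sxy = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
sxx = n * sum(a * a for a in x) - sum(x) ** 2
syy = n * sum(b * b for b in y) - sum(y) ** 2

r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2
print(round(r, 4), round(r_squared * 100, 2))   # 0.8398 70.52
```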
Software Output (R)
##
## Call:
## lm(formula = house.price ~ 1 + house.size)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.388 -27.388 -6.388 29.577 64.333
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 98.24833 58.03348 1.693 0.1289
## house.size 0.10977 0.03297 3.329 0.0104 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 41.33 on 8 degrees of freedom
## Multiple R-squared: 0.5808, Adjusted R-squared: 0.5284
## F-statistic: 11.08 on 1 and 8 DF, p-value: 0.01039
Software Output (SAS)
PROC REG DATA = stat115;
MODEL house_price = house_size;
RUN;
QUIT;
Software Output (SPSS)
Software Output (Stata)
. regress houseprice housesize