
REGRESSION ANALYSIS
CORRELATION VS REGRESSION
• In correlation, the two variables are treated as equals.

• From correlation we can only get an index describing the linear relationship between two variables.

• In regression we can model the relationship between two or more variables and use it to identify which variables X can predict the outcome variable Y.
CORRELATION VS REGRESSION
• In regression, one variable is considered the independent (= predictor) variable (X) and the other the dependent (= outcome) variable (Y).

• Regression analysis requires interval- or ratio-level data for the dependent variable; the independent variables can be continuous or categorical.

• To see if your data fits the regression model, it is wise to conduct a scatter plot analysis.
• The reason?
– Regression analysis assumes a linear relationship between the variables.
SCATTER PLOTS
[Figure: four scatter plots of Y against X, two showing linear relationships and two showing curvilinear relationships.]
SCATTER PLOT
•This is a linear relationship.
•It is a positive relationship.
•As the population with BA's increases, so does the personal income per capita.
PEARSON'S R
• To determine strength, look at how closely the dots are clustered around the line. The more tightly the cases are clustered, the stronger the relationship; the more distant, the weaker.

• Pearson's r ranges from -1 to +1, with 0 being no linear relationship at all.
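To make the definition concrete, here is a minimal Python sketch (illustrative, not from the slides) that computes Pearson's r directly from raw data:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A perfectly linear, positive relationship gives r = +1:
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```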
SCATTER PLOTS OF DATA WITH
VARIOUS CORRELATION COEFFICIENTS
[Figure: six scatter plots of Y against X illustrating r = -1, r = -.6, r = 0 (top row) and r = +1, r = +.3, r = 0 (bottom row).]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
REGRESSION LINE
•The regression line is the best straight-line description of the plotted points, and you can use it to describe the association between the variables.

•If all the data fall exactly on the line, you have a perfect relationship.
ASSUMPTIONS
• Linear regression assumes that…
• The relationship between X and Y is linear
• Y is distributed normally at each value of X
• The variance of Y at every value of X is the
same (homogeneity of variances) –
CONSTANT VARIANCE
• The observations are independent
WHAT IS "LINEAR"?
• Remember this: Y = mX + C?

A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.
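A tiny Python sketch of that slope interpretation (the slope 2 is from the text; the intercept 5 is an illustrative choice):

```python
def line(x):
    """Y = mX + C with slope m = 2 and an illustrative intercept C = 5."""
    return 2 * x + 5

# Every 1-unit change in X yields a 2-unit change in Y:
print(line(3) - line(2))  # 2
```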
SIMPLE LINEAR REGRESSION

• Ŷ = a + bX (calculation)
REGRESSION COEFFICIENT
• Ŷ is the predicted score on the dependent variable.

• Ŷ = a + b1X1 + b2X2 + … + bkXk (multiple linear regression)
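Simple linear regression fits Ŷ = a + bX; the intercept a and slope b can be estimated by ordinary least squares. A minimal Python sketch with illustrative data (not from the slides):

```python
def fit_simple_ols(xs, ys):
    """Least-squares estimates (a, b) for Y-hat = a + bX."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Data lying exactly on Y = 1 + 2X recovers those coefficients:
a, b = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```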


READING THE TABLES
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003
a. Predictors: (Constant), Percent of Population 25 years
and Over with Bachelor's Degree or More, March 2000
estimates

When you run a regression analysis in SPSS you get 3 tables. Each tells you something about the relationship.

•The first is the Model Summary.
–The R is the Pearson Product Moment Correlation Coefficient. In this case R is .736.
–R is the square root of R-Square and is the correlation between the observed and predicted values of the dependent variable.
TABLE 1: R-SQUARE
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003
a. Predictors: (Constant), Percent of Population 25 years
and Over with Bachelor's Degree or More, March 2000
estimates

• R-Square is the proportion of variance in the dependent variable (income per capita) which can be predicted from the independent variable (level of education). This value indicates that 54.2% of the variance in income can be predicted from the variable education.

• Note that this is an overall measure of the strength of association, and does not reflect the extent to which any particular independent variable is associated with the dependent variable.

•R-Square is also called the coefficient of determination.
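With a single predictor, R-Square is simply the square of Pearson's R reported in the same table; a quick check in Python:

```python
r = 0.736  # Pearson's R from the Model Summary
r_squared = r ** 2
print(round(r_squared, 3))  # 0.542
```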


TABLE 1: ADJUSTED R-SQUARE
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003
a. Predictors: (Constant), Percent of Population 25 years
and Over with Bachelor's Degree or More, March 2000
estimates

•As predictors are added to the model, each predictor will explain some of the variance in the dependent variable simply due to chance.

•The Adjusted R-Square attempts to yield a more honest value to estimate the R-Square for the population. The value of R-Square was .542, while the value of Adjusted R-Square was .532. There isn't much difference because we are dealing with only one variable.
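The adjustment uses the sample size and the number of predictors. The standard formula, sketched in Python, reproduces the table's value (n = 50 cases, as implied by the ANOVA degrees of freedom, and k = 1 predictor):

```python
def adjusted_r_square(r2, n, k):
    """Adjusted R-Square for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values from the Model Summary: R Square = .542, n = 50, k = 1.
print(round(adjusted_r_square(0.542, 50, 1), 3))  # 0.532
```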
TABLE 2: ANOVA
ANOVAb

Model          Sum of Squares   df   Mean Square    F        Sig.
1 Regression   4.32E+08         1    432493775.8    56.775   .000a
  Residual     3.66E+08         48   7617618.586
  Total        7.98E+08         49
a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's
Degree or More, March 2000 estimates
b. Dependent Variable: Personal Income Per Capita, current dollars, 1999

•The p-value associated with this F value is very small (0.000). These values are used to answer the question "Do the independent variables reliably predict the dependent variable?". (This tests the significance of the regression model as a whole, i.e. all the X's with Y for multiple regression.)

•The p-value is compared to your alpha level (typically 0.05) and, if smaller, you can conclude "Yes, the independent variables reliably predict the dependent variable".

Ho: β = 0 (no linear relationship between X and Y)
H1: β ≠ 0 (there is a linear relationship between X and Y)

For multiple regression:
Ho: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one β ≠ 0

•If the p-value were greater than 0.05, you would say that the group of independent variables does not show a statistically significant relationship with the dependent variable, or that the group of independent variables does not reliably predict the dependent variable.
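The F value in the ANOVA table is just the ratio of the two mean squares; a quick Python check against the values above:

```python
ms_regression = 432493775.8  # Mean Square (Regression), df = 1
ms_residual = 7617618.586    # Mean Square (Residual), df = 48
f_statistic = ms_regression / ms_residual
print(round(f_statistic, 3))  # ≈ 56.775, matching the table
```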
TABLE 3: COEFFICIENTS
Coefficientsa

                                   Unstandardized         Standardized
                                   Coefficients           Coefficients
Model                              B          Std. Error  Beta    t      Sig.
1 (Constant)                       10078.565  2312.771            4.358  .000
  Percent of Population 25 years
  and Over with Bachelor's         688.939    91.433      .736    7.535  .000
  Degree or More, March 2000
  estimates
a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

•B - These are the values for the regression equation for predicting the dependent variable from the independent variable. These are called unstandardized coefficients because they are measured in their natural units.

• As such, the coefficients cannot be compared with one another to determine which one is more influential in the model, because they can be measured on different scales.
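Using the unstandardized coefficients, the fitted equation can be used for prediction; a minimal sketch (the 25% input is an illustrative value, not from the slides):

```python
def predict_income(pct_bachelors):
    """Y-hat = B0 + B1 * X, with B0 and B1 from the coefficients table."""
    return 10078.565 + 688.939 * pct_bachelors

# e.g. a state where 25% of adults hold a bachelor's degree or more:
print(round(predict_income(25), 2))  # 27302.04
```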
TABLE 3: COEFFICIENTS
Coefficientsa

                                   Unstandardized         Standardized
                                   Coefficients           Coefficients
Model                              B          Std. Error  Beta    t      Sig.
1 (Constant)                       10078.565  2312.771            4.358  .000
  Percent of Population 25 years
  and Over with Bachelor's         688.939    91.433      .736    7.535  .000
  Degree or More, March 2000
  estimates
a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

•Beta - These are the standardized coefficients.

•These are the coefficients that you would obtain if you standardized all of the variables in the regression, including the dependent and all of the independent variables, and ran the regression.
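That procedure can be sketched in Python: z-score both variables, then fit the slope. The data below are made up for illustration; in simple regression the standardized slope equals Pearson's r:

```python
import math

def zscores(vals):
    """Standardize a list of values using the sample standard deviation."""
    n = len(vals)
    m = sum(vals) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in vals) / (n - 1))
    return [(v - m) / sd for v in vals]

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

x = [10, 20, 25, 30, 40]                # made-up illustrative data
y = [15000, 21000, 26000, 30000, 38000]
beta = slope(zscores(x), zscores(y))    # slope on standardized variables = Beta
print(round(beta, 3))  # 0.997
```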
TABLE 3: COEFFICIENTS (Multiple regression)
Coefficientsa

                                        Unstandardized         Standardized
                                        Coefficients           Coefficients
Model                                   B          Std. Error  Beta    t      Sig.
1 (Constant)                            13032.847  1902.700            6.850  .000
  Percent of Population 25 years
  and Over with Bachelor's (iv1)        517.628    78.613      .553    6.584  .000
  Degree or More, March 2000
  estimates
  Population Per Square Mile (iv2)      7.953      1.450       .461    5.486  .000
a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

•This table looks at two independent variables and shows how their different scales affect the B values. That is why you need to look at the standardized Beta to see the differences.
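A sketch of prediction with the multiple-regression coefficients (the input values are illustrative, not from the slides):

```python
def predict_income(pct_bachelors, pop_per_sq_mile):
    """Y-hat = B0 + B1*X1 + B2*X2, coefficients from the table above."""
    return 13032.847 + 517.628 * pct_bachelors + 7.953 * pop_per_sq_mile

# e.g. 25% with a bachelor's degree, 200 people per square mile:
print(round(predict_income(25, 200), 2))  # 27564.15
```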
PART OF THE REGRESSION
EQUATION
• b represents the slope of the line
– It is calculated by dividing the change in the dependent variable by the change in the independent variable.

– The difference between the actual value of Y and the calculated amount is called the residual.

– The residual represents how much error there is in the prediction of the regression equation for the Y value of any individual case as a function of X.
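The residual for a single case is just actual minus predicted; a minimal sketch using the simple-regression coefficients from Table 3 (the actual income value of $28,000 is illustrative):

```python
def residual(y_actual, x, a=10078.565, b=688.939):
    """Residual = actual Y minus predicted Y-hat = a + bX."""
    return y_actual - (a + b * x)

# A case with X = 25 and an actual per-capita income of $28,000:
print(round(residual(28000, 25), 2))  # 697.96
```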
COMPARING TWO VARIABLES
• Regression analysis is useful for comparing two variables to see whether controlling for another independent variable affects your model.

• For the first independent variable, education, the argument is that a more educated population will have higher-paying jobs, producing a higher level of per capita income in the state.

• The second independent variable is included because we expect to find better-paying jobs, and therefore more opportunity for state residents to obtain them, in urban rather than rural areas.
Report - Single Linear Regression
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003
a. Predictors: (Constant), Percent of Population 25 years
and Over with Bachelor's Degree or More, March 2000
estimates

ANOVAb

Model          Sum of Squares   df   Mean Square    F        Sig.
1 Regression   4.32E+08         1    432493775.8    56.775   .000a
  Residual     3.66E+08         48   7617618.586
  Total        7.98E+08         49
a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's
Degree or More, March 2000 estimates
b. Dependent Variable: Personal Income Per Capita, current dollars, 1999

Coefficientsa

                                   Unstandardized         Standardized
                                   Coefficients           Coefficients
Model                              B          Std. Error  Beta    t      Sig.
1 (Constant)                       10078.565  2312.771            4.358  .000
  Percent of Population 25 years
  and Over with Bachelor's         688.939    91.433      .736    7.535  .000
  Degree or More, March 2000
  estimates
a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

A study was conducted to examine the influence of the percentage of the population aged 25 and over with a bachelor's degree or higher (IV) on personal income per capita (DV). A simple linear regression test showed that the percentage of the population aged 25 and over with a bachelor's degree or higher (X) has a significant relationship with personal income per capita, F(1,48) = 56.77, p < .001, with R2 = .542. The predicted personal income per capita is 10078.565 + 688.939X dollars when X is measured in percent. Personal income per capita increases by $688.939 for each percentage point of the population aged 25 and over with a bachelor's degree or higher.

Y = personal income per capita
X = percentage of the population aged 25 and over with a bachelor's degree or higher
Report - Multiple Linear Regression
A study was conducted to examine the influence of the percentage of the population aged 25 and over with a bachelor's degree or higher (IV1) and population per square mile (IV2) on personal income per capita (DV). A multiple linear regression test showed that the overall regression model was significant, F(2,47) = 60.643, p < .001, with R2 = .721. The percentage of the population aged 25 and over with a bachelor's degree or higher (b = 517.628, t = 6.584, p < .001) and population per square mile (b = 7.953, t = 5.486, p < .001) were significant predictors of personal income per capita. The multiple regression equation for this model is Y = 13032.847 + 517.628X1 + 7.953X2.

Y = personal income per capita
X1 = percentage of the population aged 25 and over with a bachelor's degree or higher
X2 = population per square mile
EXERCISE
A study was conducted to examine the influence of study intensity (seriousness of study) on statistics performance. A simple regression analysis was performed (using SPSS) and the results are shown below. Answer the following questions based on the SPSS output.
a) What statistical analysis was conducted?
b) Referring to the Model Summary table, explain the function of the R Square value in this study.
c) Based on the SPSS printout given, write a detailed report of the study (refer to the lecture notes).
