
REGRESSION ANALYSIS
CORRELATION VS REGRESSION
• In correlation, the two variables are treated as equals.

• From correlation we can only get an index describing the linear relationship between two variables.

• In regression we can model the relationship between two or more variables and use it to identify which variables X can predict the outcome variable Y.
CORRELATION VS REGRESSION
• In regression, one variable is considered the independent (= predictor) variable (X) and the other the dependent (= outcome) variable (Y).

• Regression analysis requires interval- or ratio-level data for the dependent variable; the independent variables can be continuous or categorical.

• To see if your data fits the regression model, it is wise to conduct a scatter plot analysis.
• The reason?
– Regression analysis assumes a linear relationship between the variables.
SCATTER PLOTS
[Figure: four scatter plots of Y against X, two showing linear relationships and two showing curvilinear relationships.]
SCATTER PLOT
•This is a linear relationship.
•It is a positive relationship.
•As the population with BA's increases, so does the personal income per capita.
PEARSON'S R
• To determine strength, look at how closely the dots are clustered around the line. The more tightly the cases are clustered, the stronger the relationship; the more distant, the weaker.

• Pearson's r ranges from -1 to +1, with 0 being no linear relationship at all.
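To make the definition concrete, here is a minimal Python sketch (illustrative, not from the slides) that computes Pearson's r directly from raw data:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A perfectly linear, positive relationship gives r = +1:
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```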
SCATTER PLOTS OF DATA WITH
VARIOUS CORRELATION COEFFICIENTS
[Figure: six scatter plots of Y against X illustrating r = -1, r = -.6, r = 0 (top row) and r = +1, r = +.3, r = 0 (bottom row).]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
REGRESSION LINE
•The regression line is the best straight-line description of the plotted points, and you can use it to describe the association between the variables.

•If all the data fall exactly on the line, you have a perfect relationship.
ASSUMPTIONS
• Linear regression assumes that…
• The relationship between X and Y is linear
• Y is distributed normally at each value of X
• The variance of Y at every value of X is the
same (homogeneity of variances) –
CONSTANT VARIANCE
• The observations are independent
WHAT IS "LINEAR"?
• Remember this: Y = mX + C?

A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.
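A tiny Python sketch of that slope interpretation (the slope 2 is from the text; the intercept 5 is an illustrative choice):

```python
def line(x):
    """Y = mX + C with slope m = 2 and an illustrative intercept C = 5."""
    return 2 * x + 5

# Every 1-unit change in X yields a 2-unit change in Y:
print(line(3) - line(2))  # 2
```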
SIMPLE LINEAR REGRESSION

• Ŷ = a + bX (calculation)
REGRESSION COEFFICIENT
• Ŷ is the predicted score on the dependent variable.

• Ŷ = a + b1X1 + b2X2 + … + bkXk (multiple linear regression)
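Simple linear regression fits Ŷ = a + bX; the intercept a and slope b can be estimated by ordinary least squares. A minimal Python sketch with illustrative data (not from the slides):

```python
def fit_simple_ols(xs, ys):
    """Least-squares estimates (a, b) for Y-hat = a + bX."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Data lying exactly on Y = 1 + 2X recovers those coefficients:
a, b = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```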


READING THE TABLES
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003
a. Predictors: (Constant), Percent of Population 25 years
and Over with Bachelor's Degree or More, March 2000
estimates

When you run a regression analysis in SPSS you get 3 tables. Each tells you something about the relationship.

•The first is the Model Summary.
–The R is the Pearson Product Moment Correlation Coefficient. In this case R is .736.
–R is the square root of R-Square and is the correlation between the observed and predicted values of the dependent variable.
TABLE 1: R-SQUARE
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003
a. Predictors: (Constant), Percent of Population 25 years
and Over with Bachelor's Degree or More, March 2000
estimates

• R-Square is the proportion of variance in the dependent variable (income per capita) which can be predicted from the independent variable (level of education). This value indicates that 54.2% of the variance in income can be predicted from the variable education.

• Note that this is an overall measure of the strength of association, and does not reflect the extent to which any particular independent variable is associated with the dependent variable.

•R-Square is also called the coefficient of determination.
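With a single predictor, R-Square is simply the square of Pearson's R reported in the same table; a quick check in Python:

```python
r = 0.736  # Pearson's R from the Model Summary
r_squared = r ** 2
print(round(r_squared, 3))  # 0.542
```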


TABLE 1: ADJUSTED R-SQUARE
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003
a. Predictors: (Constant), Percent of Population 25 years
and Over with Bachelor's Degree or More, March 2000
estimates

•As predictors are added to the model, each predictor will explain some of the variance in the dependent variable simply due to chance.

•The Adjusted R-Square attempts to yield a more honest value to estimate the R-Square for the population. The value of R-Square was .542, while the value of Adjusted R-Square was .532. There isn't much difference because we are dealing with only one variable.
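The adjustment uses the sample size and the number of predictors. The standard formula, sketched in Python, reproduces the table's value (n = 50 cases, as implied by the ANOVA degrees of freedom, and k = 1 predictor):

```python
def adjusted_r_square(r2, n, k):
    """Adjusted R-Square for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values from the Model Summary: R Square = .542, n = 50, k = 1.
print(round(adjusted_r_square(0.542, 50, 1), 3))  # 0.532
```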
TABLE 2: ANOVA
ANOVAb

Model          Sum of Squares   df   Mean Square    F        Sig.
1 Regression   4.32E+08         1    432493775.8    56.775   .000a
  Residual     3.66E+08         48   7617618.586
  Total        7.98E+08         49
a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's
Degree or More, March 2000 estimates
b. Dependent Variable: Personal Income Per Capita, current dollars, 1999

•The p-value associated with this F value is very small (0.000). These values are used to answer the question "Do the independent variables reliably predict the dependent variable?". (This tests the significance of the regression model as a whole, i.e. all the X's with Y for multiple regression.)

•The p-value is compared to your alpha level (typically 0.05) and, if smaller, you can conclude "Yes, the independent variables reliably predict the dependent variable".

Ho: β = 0 (no linear relationship between X and Y)
H1: β ≠ 0 (there is a linear relationship between X and Y)

For multiple regression:
Ho: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one β ≠ 0

•If the p-value were greater than 0.05, you would say that the group of independent variables does not show a statistically significant relationship with the dependent variable, or that the group of independent variables does not reliably predict the dependent variable.
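The F value in the ANOVA table is just the ratio of the two mean squares; a quick Python check against the values above:

```python
ms_regression = 432493775.8  # Mean Square (Regression), df = 1
ms_residual = 7617618.586    # Mean Square (Residual), df = 48
f_statistic = ms_regression / ms_residual
print(round(f_statistic, 3))  # ≈ 56.775, matching the table
```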
TABLE 3: COEFFICIENTS
Coefficientsa

                                   Unstandardized         Standardized
                                   Coefficients           Coefficients
Model                              B          Std. Error  Beta    t      Sig.
1 (Constant)                       10078.565  2312.771            4.358  .000
  Percent of Population 25 years
  and Over with Bachelor's         688.939    91.433      .736    7.535  .000
  Degree or More, March 2000
  estimates
a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

•B - These are the values for the regression equation for predicting the dependent variable from the independent variable. These are called unstandardized coefficients because they are measured in their natural units.

• As such, the coefficients cannot be compared with one another to determine which one is more influential in the model, because they can be measured on different scales.
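Using the unstandardized coefficients, the fitted equation can be used for prediction; a minimal sketch (the 25% input is an illustrative value, not from the slides):

```python
def predict_income(pct_bachelors):
    """Y-hat = B0 + B1 * X, with B0 and B1 from the coefficients table."""
    return 10078.565 + 688.939 * pct_bachelors

# e.g. a state where 25% of adults hold a bachelor's degree or more:
print(round(predict_income(25), 2))  # 27302.04
```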
TABLE 3: COEFFICIENTS
Coefficientsa

                                   Unstandardized         Standardized
                                   Coefficients           Coefficients
Model                              B          Std. Error  Beta    t      Sig.
1 (Constant)                       10078.565  2312.771            4.358  .000
  Percent of Population 25 years
  and Over with Bachelor's         688.939    91.433      .736    7.535  .000
  Degree or More, March 2000
  estimates
a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

•Beta - These are the standardized coefficients.

•These are the coefficients that you would obtain if you standardized all of the variables in the regression, including the dependent and all of the independent variables, and ran the regression.
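That procedure can be sketched in Python: z-score both variables, then fit the slope. The data below are made up for illustration; in simple regression the standardized slope equals Pearson's r:

```python
import math

def zscores(vals):
    """Standardize a list of values using the sample standard deviation."""
    n = len(vals)
    m = sum(vals) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in vals) / (n - 1))
    return [(v - m) / sd for v in vals]

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

x = [10, 20, 25, 30, 40]                # made-up illustrative data
y = [15000, 21000, 26000, 30000, 38000]
beta = slope(zscores(x), zscores(y))    # slope on standardized variables = Beta
print(round(beta, 3))  # 0.997
```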
TABLE 3: COEFFICIENTS (Multiple regression)
Coefficientsa

                                        Unstandardized         Standardized
                                        Coefficients           Coefficients
Model                                   B          Std. Error  Beta    t      Sig.
1 (Constant)                            13032.847  1902.700            6.850  .000
  Percent of Population 25 years
  and Over with Bachelor's (iv1)        517.628    78.613      .553    6.584  .000
  Degree or More, March 2000
  estimates
  Population Per Square Mile (iv2)      7.953      1.450       .461    5.486  .000
a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

•This table looks at two independent variables and shows how their different scales affect the B values. That is why you need to look at the standardized Beta to see the differences.
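A sketch of prediction with the multiple-regression coefficients (the input values are illustrative, not from the slides):

```python
def predict_income(pct_bachelors, pop_per_sq_mile):
    """Y-hat = B0 + B1*X1 + B2*X2, coefficients from the table above."""
    return 13032.847 + 517.628 * pct_bachelors + 7.953 * pop_per_sq_mile

# e.g. 25% with a bachelor's degree, 200 people per square mile:
print(round(predict_income(25, 200), 2))  # 27564.15
```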
PART OF THE REGRESSION
EQUATION
• b represents the slope of the line
– It is calculated by dividing the change in the dependent variable by the change in the independent variable.

– The difference between the actual value of Y and the calculated amount is called the residual.

– The residual represents how much error there is in the prediction of the regression equation for the Y value of any individual case as a function of X.
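The residual for a single case is just actual minus predicted; a minimal sketch using the simple-regression coefficients from Table 3 (the actual income value of $28,000 is illustrative):

```python
def residual(y_actual, x, a=10078.565, b=688.939):
    """Residual = actual Y minus predicted Y-hat = a + bX."""
    return y_actual - (a + b * x)

# A case with X = 25 and an actual per-capita income of $28,000:
print(round(residual(28000, 25), 2))  # 697.96
```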
COMPARING TWO VARIABLES
• Regression analysis is useful for comparing two variables to see whether controlling for another independent variable affects your model.

• For the first independent variable, education, the argument is that a more educated population will have higher-paying jobs, producing a higher level of per capita income in the state.

• The second independent variable is included because we expect to find better-paying jobs, and therefore more opportunity for state residents to obtain them, in urban rather than rural areas.
Report - Single Linear Regression
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003
a. Predictors: (Constant), Percent of Population 25 years
and Over with Bachelor's Degree or More, March 2000
estimates

ANOVAb

Model          Sum of Squares   df   Mean Square    F        Sig.
1 Regression   4.32E+08         1    432493775.8    56.775   .000a
  Residual     3.66E+08         48   7617618.586
  Total        7.98E+08         49
a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's
Degree or More, March 2000 estimates
b. Dependent Variable: Personal Income Per Capita, current dollars, 1999

Coefficientsa

                                   Unstandardized         Standardized
                                   Coefficients           Coefficients
Model                              B          Std. Error  Beta    t      Sig.
1 (Constant)                       10078.565  2312.771            4.358  .000
  Percent of Population 25 years
  and Over with Bachelor's         688.939    91.433      .736    7.535  .000
  Degree or More, March 2000
  estimates
a. Dependent Variable: Personal Income Per Capita, current dollars, 1999

A study was conducted to examine the influence of the percentage of the population aged 25 and over with a bachelor's degree or higher (IV) on personal income per capita (DV). A simple linear regression test showed that the percentage of the population aged 25 and over with a bachelor's degree or higher (X) has a significant relationship with personal income per capita, F(1,48) = 56.77, p < .001, with R2 = .542. The predicted personal income per capita is 10078.565 + 688.939X dollars when X is measured in percent. Personal income per capita increases by $688.939 for each percentage point of the population aged 25 and over with a bachelor's degree or higher.

Y = personal income per capita
X = percentage of the population aged 25 and over with a bachelor's degree or higher
Report - Multiple Linear Regression
A study was conducted to examine the influence of the percentage of the population aged 25 and over with a bachelor's degree or higher (IV1) and population per square mile (IV2) on personal income per capita (DV). A multiple linear regression test showed that the overall regression model was significant, F(2,47) = 60.643, p < .001, with R2 = .721. The percentage of the population aged 25 and over with a bachelor's degree or higher (b = 517.628, t = 6.584, p < .001) and population per square mile (b = 7.953, t = 5.486, p < .001) were significant predictors of personal income per capita. The multiple regression equation for this model is Y = 13032.847 + 517.628X1 + 7.953X2.

Y = personal income per capita
X1 = percentage of the population aged 25 and over with a bachelor's degree or higher
X2 = population per square mile
EXERCISE
A study was conducted to examine the influence of study intensity (seriousness of study) on statistics performance. A simple regression analysis was performed (using SPSS) and the results are shown below. Answer the following questions based on the SPSS output.
a) What statistical analysis was conducted?
b) Referring to the Model Summary table, explain the function of the R Square value in this study.
c) Based on the SPSS printout given, write a detailed report of the study (refer to the lecture notes).
