
Introduction to regression analysis
NARMINA RUSTAMOVA
ADA

Simple Linear Regression Model

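y_i = β0 + β1 x_i + u_i,   i = 1, . . . , n

where
y_i - dependent variable or regressand
x_i - independent variable or regressor
β0 - intercept parameter
β1 - slope parameter
u_i - error term or disturbance
i - index of the observation (n is the number of observations)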

Assumptions
1. y = β0 + β1 x + u in the population (the model is linear in parameters).
2. {(x_i, y_i): i = 1, . . . , n} is a random sample from the model above, implying uncorrelated errors: Cov(u_i, u_j) = 0 for all i ≠ j.
3. {x_i: i = 1, . . . , n} are not all identical, implying that the sample variance of x is positive.
4. E[u|x] = 0 for all x (zero average error), implying E[u] = 0 and Cov(u, x) = 0.
5. Var[u|x] = σ² for all x, implying Var[u] = σ² (homoscedasticity).
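
The assumptions above can be illustrated with simulated data. The following is a minimal sketch, not part of the original example: the sample size, seed, and parameter values (intercept 1, slope 0.5, error standard deviation 3) are hypothetical choices made only for illustration.

```stata
* Simulate a sample that satisfies the assumptions, then estimate the model.
* All numeric values below are hypothetical.
clear
set seed 12345
set obs 200
* x varies across observations (assumption 3)
generate x = rnormal(10, 2)
* u is drawn independently of x with constant variance (assumptions 4 and 5)
generate u = rnormal(0, 3)
* The population model is linear in parameters (assumption 1)
generate y = 1 + 0.5*x + u
regress y x
```

With these assumptions satisfied, the estimated slope should land close to 0.5, and closer still as the number of observations grows.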

Example: Campaign expenditures and election outcomes
Model:
voteA = β0 + β1 × expendA + β2 × expendB + β3 × prtystrA + u,

where
voteA is the percent of the vote received by Candidate A,
expendA and expendB are campaign expenditures by Candidates A and B,
prtystrA is a measure of party strength for Candidate A (the percent of the recent presidential vote that went to A's party).

Stata Step 1: Import and use data
Stata > File > Import > Excel Spreadsheet > Browse & Import first row as variables
or use a command
import excel "C:\Users\Narmina\Documents\present\data.xlsx", sheet("Sheet1") firstrow
Data Editor > Save > Name & Location
or use a command
save "C:\Users\Narmina\Documents\present\data.dta"
File > Open > Name & Location
or use a command
use "C:\Users\Narmina\Documents\present\data.dta"
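
After loading the file, it can help to confirm that the import worked as intended. The sketch below assumes the dataset contains the variable names used in the regression step of this presentation (vote_A, expend_A, expend_B, prtystr_A); adjust them if the spreadsheet headers differ.

```stata
* Load the saved dataset, replacing whatever is currently in memory
use "C:\Users\Narmina\Documents\present\data.dta", clear
* List variable names, storage types, and labels
describe
* Summary statistics for the model variables (names assumed from the regress step)
summarize vote_A expend_A expend_B prtystr_A
```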

Stata Step 2: Estimate the model
Statistics > Linear regression > dependent variable & independent variable
or use a command
regress vote_A expend_A expend_B prtystr_A

Result:
Fitted vote_A = 33.27 + 0.035 × expend_A − 0.035 × expend_B + 0.343 × prtystr_A
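
Once the model has been estimated, Stata keeps the results in memory, so fitted values, residuals, and individual coefficients can be retrieved. This is a minimal sketch that assumes the dataset is already loaded; the new variable names vote_A_hat and vote_A_res are hypothetical.

```stata
regress vote_A expend_A expend_B prtystr_A
* Fitted values from the estimated equation (hypothetical variable name)
predict vote_A_hat, xb
* Residuals: actual minus fitted (hypothetical variable name)
predict vote_A_res, residuals
* Individual estimates stored by regress
display _b[expend_A]
display _b[_cons]
```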

Stata Step 3: Output


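      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  3,   169) =   74.27
       Model |  27555.6163     3  9185.20542           Prob > F      =  0.0000
    Residual |  20901.6323   169  123.678298           R-squared     =  0.5687
-------------+------------------------------           Adj R-squared =  0.5610
       Total |  48457.2486   172  281.728189           Root MSE      =  11.121

------------------------------------------------------------------------------
      vote_A |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    expend_A |   .0349245   .0033695    10.36   0.000     .0282728    .0415762
    expend_B |  -.0349236   .0030012   -11.64   0.000    -.0408482   -.0289989
   prtystr_A |    .342515   .0879518     3.89   0.000     .1688893    .5161407
       _cons |   33.26714   4.416778     7.53   0.000     24.54797     41.9863
------------------------------------------------------------------------------

Coefficients indicate how much the dependent variable changes with a one-unit change in an independent variable, when all other independent variables are held constant.
Example: An increase in candidate A's expenditures by $1 increases the percent of votes he/she receives by 0.035 percentage points.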
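
The t and confidence interval columns follow directly from the coefficient and its standard error. As a rough check, the sketch below recomputes them for expend_A from the reported numbers, using 169 residual degrees of freedom; small differences in the last digits are rounding.

```stata
* t statistic = coefficient / standard error
display .0349245/.0033695
* 95% confidence interval: coefficient -/+ critical value * standard error
display .0349245 - invttail(169, 0.025)*.0033695
display .0349245 + invttail(169, 0.025)*.0033695
```

The results should agree with the reported t of 10.36 and the interval from .0282728 to .0415762.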

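If p < 0.05, a coefficient is statistically significantly different from zero at the 5% significance level. Equivalently, its |t| exceeds the critical value, which with 169 degrees of freedom is about 1.97 for a two-sided test (about 1.65 for a one-sided test). Here all four p-values are reported as 0.000, so every coefficient is significant.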
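
The critical values and p-values can be looked up in Stata itself rather than in a table. A minimal sketch, using the 169 residual degrees of freedom from the output and the t statistic reported for prtystr_A:

```stata
* Two-sided 5% critical value for the t distribution with 169 df
display invttail(169, 0.025)
* One-sided 5% critical value
display invttail(169, 0.05)
* Two-sided p-value for the t statistic of prtystr_A (3.89)
display 2*ttail(169, 3.89)
```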

The "R-squared" row (also called the coefficient of determination) represents the proportion of variance in the dependent variable that can be explained by the independent variables: here the independent variables explain 56.9% of the variability of our dependent variable.
The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model: it increases only if the new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected by chance.
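
Both measures can be recomputed from the sums of squares in the upper-left block of the output. The sketch below uses only the reported numbers and is meant as a check on how the rows relate, not as a replacement for the regress output.

```stata
* R-squared = Model SS / Total SS
display 27555.6163/48457.2486
* Adjusted R-squared = 1 - (Residual SS / residual df) / (Total SS / total df)
display 1 - (20901.6323/169)/(48457.2486/172)
* Root MSE = square root of the residual mean square
display sqrt(20901.6323/169)
```

These should reproduce the reported 0.5687, 0.5610, and 11.121 up to rounding.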

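The F-statistic tells whether the explanatory variables as a group explain a statistically significant share of the variation in the dependent variable. Here F(3, 169) = 74.27 with Prob > F = 0.0000, so the three regressors are jointly significant.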
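
The F-statistic is the ratio of the model mean square to the residual mean square. The first two lines below recompute it and its p-value from the reported numbers; the last two lines assume the dataset is in memory and show one way to run the same joint test after estimation.

```stata
* F = Model MS / Residual MS
display 9185.20542/123.678298
* Upper-tail probability of F(3, 169) at the reported value
display Ftail(3, 169, 74.27)
* Joint test that all three slopes are zero (requires the data to be loaded)
regress vote_A expend_A expend_B prtystr_A
test expend_A expend_B prtystr_A
```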

Stata: Plotting Regression Line


Graphics > twoway graph > create > fit plots & line
Or use a command
twoway (lfit vote_A expend_A)

To combine scatter plot and fitted plot:
twoway (scatter vote_A expend_A) (lfit vote_A expend_A)

[Figure: scatter plot of vote_A against expend_A with the fitted values line; x-axis expend_A (ticks 500 to 1500), y-axis ticks 20 to 80]
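
The combined plot can also be labelled and saved from a do-file. This is a sketch under assumptions: the axis titles and the output file name vote_fit.png are hypothetical, and the `///` line continuation works in do-files rather than in the interactive command window.

```stata
* Scatter plot with the fitted regression line overlaid
twoway (scatter vote_A expend_A) (lfit vote_A expend_A), ///
    ytitle("vote_A") xtitle("expend_A") ///
    legend(order(1 "vote_A" 2 "Fitted values"))
* Save the graph to a file (hypothetical file name)
graph export "vote_fit.png", replace
```

Note that lfit draws the line from a simple regression of vote_A on expend_A alone, so its slope need not match the expend_A coefficient from the multiple regression.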
