Regression Diagnostics
Regression analysis assumes a random sample of independent observations on the same individuals (i.e. units). What are its other basic assumptions? They all concern the residuals (e):
(1) The mean of the probability distribution of e over all possible samples is 0: i.e. the mean of e does not vary with the levels of x.
(2) The variance of the probability distribution of e is constant for all levels of x: i.e. the variance of the residuals does not vary with the levels of x.
(3) The covariance of the errors associated with any two different y observations is 0: i.e. the errors are uncorrelated; the errors associated with one value of y have no effect on the errors associated with other y values.
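A minimal Python sketch of why assumption (1) has to be checked level by level (hypothetical data, not from the original Stata session): with an intercept in the model, OLS residuals average exactly zero over the whole sample by construction, so the real question is whether their mean drifts with x. Here a straight line is fitted to curved data, and the residual mean comes out positive at both ends of x and negative in the middle. The helper names `ols_simple` and `bin_means` are illustrative.

```python
def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def bin_means(x, e, cutoffs):
    """Mean residual within consecutive bins of x, split at the given cutoffs."""
    groups = [[] for _ in range(len(cutoffs) + 1)]
    for xi, ei in zip(x, e):
        k = sum(xi > c for c in cutoffs)  # index of the bin that xi falls into
        groups[k].append(ei)
    return [sum(g) / len(g) for g in groups]


# Hypothetical data: y is curved in x, so a straight line is the wrong model.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [xi ** 2 for xi in x]
a, b = ols_simple(x, y)
e = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(sum(e) / len(e))           # 0.0: zero overall, by construction
print(bin_means(x, e, [2, 6]))   # [4.0, -4.0, 4.0]: the mean varies with x
```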
[Figure: points labeled by observation id, plotted against Fitted values]
Problems of heteroscedasticity?
So, even though the model passed linktest & estat ovtest, at least one basic problem remains to be overcome.
Let's examine the other assumptions.
The model specification tests have to do with biased slope coefficients, & thus with Assumption 1.
. vif
Variable | VIF 1/VIF
-------------+-----------------------------
exper | 15.62 0.064013
exper2 | 14.65 0.068269
hsc | 1.88 0.532923
_Itencat_3 | 1.83 0.545275
ccol | 1.70 0.589431
scol | 1.69 0.591201
_Itencat_2 | 1.50 0.666210
_Itencat_1 | 1.38 0.726064
female | 1.07 0.934088
nonwhite | 1.03 0.971242
-------------+----------------------------
Mean VIF | 4.23
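vif reports VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing regressor j on all the other regressors; the 1/VIF column is 1 - R_j^2 (the tolerance). With exactly two regressors, R_j^2 is simply their squared correlation, so a short Python sketch can show why a variable and its square, like exper and exper2, come out with large VIFs. The experience values below are hypothetical; in the real model the two VIFs differ slightly because the other regressors also enter each R_j^2.

```python
import math


def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def vif_two_regressors(u, v):
    """VIF shared by both regressors when a model has exactly two of them."""
    r2 = corr(u, v) ** 2  # R^2 from regressing u on v (identical for v on u)
    return 1.0 / (1.0 - r2)


exper = list(range(1, 21))        # hypothetical years of experience
exper2 = [e * e for e in exper]   # its square, entered as a second regressor
vif = vif_two_regressors(exper, exper2)
print(round(vif, 2), round(1 / vif, 4))  # VIF and tolerance (1/VIF)
```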
[Figure: plot of the Studentized residuals]
chi2(1) = 15.76
Prob > chi2 = 0.0001
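The chi2(1) = 15.76 line above is presumably the default estat hettest output (the command itself isn't shown). As we understand Stata's default, it is the Breusch-Pagan/Cook-Weisberg score test on the fitted values: divide the squared residuals by RSS/N, regress them on the fitted values, and take half the explained sum of squares, which is chi-squared with 1 df when the variance is constant. A minimal Python sketch for a one-regressor model with hypothetical data; it illustrates the statistic and is not a reimplementation of Stata.

```python
import math


def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b


def hettest_fitted(x, y):
    """Breusch-Pagan/Cook-Weisberg statistic on the fitted values (normal-errors form)."""
    a, b = ols_simple(x, y)
    yhat = [a + b * xi for xi in x]
    e2 = [(yi - fi) ** 2 for yi, fi in zip(y, yhat)]
    sigma2 = sum(e2) / len(e2)          # ML variance estimate, RSS / N
    g = [v / sigma2 for v in e2]        # scaled squared residuals
    c, d = ols_simple(yhat, g)          # auxiliary regression of g on yhat
    gbar = sum(g) / len(g)
    ess = sum((c + d * f - gbar) ** 2 for f in yhat)  # explained sum of squares
    stat = ess / 2.0                    # chi-squared with 1 df under H0
    return stat, math.erfc(math.sqrt(stat / 2.0))     # (statistic, p-value)


# Hypothetical data: the error spread grows with x in y_het but not in y_hom.
x = list(range(1, 41))
sign = [(-1) ** i for i in x]
y_hom = [2 + 0.5 * xi + s for xi, s in zip(x, sign)]
y_het = [2 + 0.5 * xi + xi * s for xi, s in zip(x, sign)]
print(hettest_fitted(x, y_hom))
print(hettest_fitted(x, y_het))
```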
----------------------------------------------
Variable | chi2 df p
-------------+-------------------------------
hsc | 0.14 1 1.0000 #
scol | 0.17 1 1.0000 #
ccol | 2.47 1 0.7078 #
exper | 1.56 1 0.9077 #
exper2 | 0.10 1 1.0000 #
_Itencat_1 | 0.44 1 0.9992 #
_Itencat_2 | 0.00 1 1.0000 #
_Itencat_3 | 10.03 1 0.0153 #
female | 1.02 1 0.9762 #
nonwhite | 0.03 1 1.0000 #
-------------+--------------------------------
# Sidak adjusted p-values
. estat szroeter, rhs mt(sidak)
---------------------------------------
Variable | chi2 df p
-------------+-------------------------
hsc | 0.14 1 1.0000 #
scol | 0.17 1 1.0000 #
ccol | 2.47 1 0.7078 #
exper | 3.20 1 0.5341 #
exper2 | 3.20 1 0.5341 #
_Itencat_1 | 0.44 1 0.9992 #
_Itencat_2 | 0.00 1 1.0000 #
_Itencat_3 | 10.03 1 0.0153 #
female | 1.02 1 0.9762 #
nonwhite | 0.03 1 1.0000 #
---------------------------------------
# Sidak adjusted p-values
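Both tables report Sidak-adjusted p-values for m = 10 simultaneous tests: p_adj = 1 - (1 - p)^m. A quick Python check reproduces the _Itencat_3 row (chi2 = 10.03, adjusted p = 0.0153), using the fact that the upper tail of a chi-squared variable with 1 df equals erfc(sqrt(chi2/2)). The function names are illustrative.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def sidak(p, m):
    """Sidak-adjusted p-value for one of m simultaneous tests."""
    return 1.0 - (1.0 - p) ** m


raw = chi2_1df_pvalue(10.03)                   # _Itencat_3: chi2 = 10.03, df = 1
print(round(raw, 5), round(sidak(raw, 10), 4))  # 0.00154 0.0153
```

The same adjustment explains the many 1.0000 entries: a raw p-value of about 0.7 (hsc, chi2 = 0.14) becomes essentially 1 once it is one of ten tests.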
So, hettest & szroeter say that the high category of tenure is to blame.
What measures might be taken to correct or reduce the problem?
Let's examine the graphs: rvfplot & rvpplot.
. rvfplot, yline(0) ml(id)
[Figure: rvfplot, residuals versus Fitted values, points labeled by id]
. rvpplot _Itencat_3, yline(0) ml(id)
[Figure: rvpplot, residuals versus _Itencat_3 (x axis: tencat==3), points labeled by id]
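The rvpplot splits the residuals by the 0/1 indicator _Itencat_3. A numeric companion to that picture is the residual variance within each group; a minimal Python sketch with hypothetical residuals (in practice they would come from the fitted model):

```python
def group_variances(e, d):
    """Sample variance of the residuals e separately for d == 0 and d == 1."""
    out = {}
    for level in (0, 1):
        g = [ei for ei, di in zip(e, d) if di == level]
        m = sum(g) / len(g)
        out[level] = sum((v - m) ** 2 for v in g) / (len(g) - 1)
    return out


# Hypothetical residuals and a 0/1 indicator playing the role of _Itencat_3.
e = [0.1, -0.2, 0.15, -0.05, 0.6, -0.7, 0.5, -0.4]
d = [0, 0, 0, 0, 1, 1, 1, 1]
v = group_variances(e, d)
print(v, v[1] / v[0])  # a ratio far from 1 points to unequal spread
```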
----------------------------------------------
Variable | m1 m2_robust
-------------+--------------------------------
hsc | .22241643*** .22241643***
scol | .32030543*** .32030543***
ccol | .68798333*** .68798333***
exper | .02854957*** .02854957***
exper2 | -.00057702*** -.00057702***
_Itencat_1 | -.0027571 -.0027571
_Itencat_2 | .22096745*** .22096745***
_Itencat_3 | .28798112*** .28798112***
female | -.29395956*** -.29395956***
nonwhite | -.06409284 -.06409284
_cons | 1.1567164*** 1.1567164***
-------------+--------------------------------
N | 526 526
----------------------------------------------
legend: * p<0.05; ** p<0.01; *** p<0.001
There's no difference at all! The robust option changes only the standard errors, never the coefficients, and here it doesn't even change the significance levels (the stars).
See Allison, who points out that non-constant variance has to be pronounced in order to make a difference.
It's a good idea, in any case, to specify robust standard errors in a final model.
For now, we won't use robust standard errors so that we can explore additional diagnostics.
Our final model, however, will use robust standard errors.
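Why the two columns agree: robust standard errors reuse the OLS coefficients and replace only the variance formula. A minimal Python sketch for a one-regressor model, assuming the HC1 form (the HC0 sandwich scaled by n/(n - k)), which is what we understand regress, robust to use; the data are hypothetical.

```python
import math


def ols_se(x, y):
    """Slope of y = a + b*x with conventional and HC1 (robust) standard errors."""
    n, k = len(x), 2                       # k = coefficients estimated (a and b)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(ei ** 2 for ei in e) / (n - k)
    se_ols = math.sqrt(s2 / sxx)
    # Sandwich (HC0) variance of the slope, then the n / (n - k) HC1 scaling.
    hc0 = sum((xi - xbar) ** 2 * ei ** 2 for xi, ei in zip(x, e)) / sxx ** 2
    se_hc1 = math.sqrt(hc0 * n / (n - k))
    return b, se_ols, se_hc1


# Hypothetical data; the slope is the same whichever standard error is used.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(ols_se(x, y))
```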
Correlated Errors
In the case of these data there's no need to worry about correlated errors: the sample is not a cluster sample, a panel, or a time series.
In general there's no straightforward way to check for correlated errors.
If we suspect correlated errors, we compensate in one or more of the following three ways:
(1) by using robust standard errors;
(2) if it's a cluster sample, by using Stata's cluster option with the sample-cluster variable:
. xi:reg wage educ educ2 exper i.tenure, robust cluster(district)
But again, our data aren't based on a cluster sample.
(3) if it's time-series data, by using Stata's estat bgodfrey, the Breusch-Godfrey Lagrange Multiplier test for serial correlation.
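What the cluster option does to the variance: it adds up each observation's score contribution within a cluster before squaring, so errors that are correlated inside a district are no longer treated as independent. A minimal Python sketch for a one-regressor model; the small-sample factor G/(G - 1) * (N - 1)/(N - k) is our assumption about what Stata's regress applies, and the data and district labels are hypothetical.

```python
import math
from collections import defaultdict


def cluster_se(x, y, cluster):
    """Slope of y = a + b*x with a cluster-robust standard error."""
    n, k = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    # Sum the slope's score contributions (x - xbar) * e within each cluster.
    scores = defaultdict(float)
    for xi, yi, gi in zip(x, y, cluster):
        scores[gi] += (xi - xbar) * (yi - (a + b * xi))
    g = len(scores)
    meat = sum(s ** 2 for s in scores.values())
    qc = g / (g - 1) * (n - 1) / (n - k)  # assumed small-sample correction
    return b, math.sqrt(qc * meat / sxx ** 2)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
district = ["A", "A", "B", "B", "B"]  # hypothetical cluster identifier
print(cluster_se(x, y, district))
```

With only two clusters, as here, the result is purely illustrative; cluster-robust errors need many clusters to be reliable.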
This model seems to be satisfactory from the perspective of linear regression's assumptions, with the exception of a problem with non-constant variance that makes no practical difference.
[Figure: added-variable plots of e( lwage | X ) against e( x | X ) for each regressor x. Panel captions:
hsc: coef = .22241643, se = .04881795, t = 4.56
scol: coef = .32030543, se = .05467635, t = 5.86
ccol: coef = .68798333, se = .05753517, t = 11.96
exper: coef = .02854957, se = .00503299, t = 5.67
exper2: coef = -.00057702, se = .00010737, t = -5.37
_Itencat_1: coef = -.0027571, se = .0491805, t = -.06
_Itencat_2: coef = .22096745, se = .04916835, t = 4.49
_Itencat_3: coef = .28798112, se = .0557211, t = 5.17
female: coef = -.29395956, se = .03576118, t = -8.22
nonwhite: coef = -.06409284, se = .05772311, t = -1.11]
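Each added-variable plot's slope repeats the coefficient from the full regression (for example, .28798112 for _Itencat_3, the same value as in the m1 column of the earlier table). That is the Frisch-Waugh-Lovell result: purge y and the chosen regressor of all the other regressors, and the slope of one set of residuals on the other equals the multiple-regression coefficient. A minimal Python sketch with two hypothetical regressors:

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients for a design matrix X given as a list of rows."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def resid(X, y):
    """Residuals of y after an OLS fit on the columns of X."""
    beta = ols(X, y)
    return [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

full = ols([[1.0, a, b] for a, b in zip(x1, x2)], y)  # y on const, x1, x2
Z = [[1.0, b] for b in x2]                            # const and the other regressor
ey = resid(Z, y)    # e( y | X ): y purged of the other regressors
ex = resid(Z, x1)   # e( x1 | X ): x1 purged of the other regressors
avplot_slope = sum(u * v for u, v in zip(ex, ey)) / sum(u * u for u in ex)
print(full[1], avplot_slope)  # equal up to rounding (Frisch-Waugh-Lovell)
```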
. avplot _Itencat_3, ml(id)
[Figure: added-variable plot for _Itencat_3, e( lwage | X ) versus e( _Itencat_3 | X ), points labeled by id; coef = .28798112, se = .0557211, t = 5.17]
Note id=24.
. list rstu h d DF_Itencat_1 DF_Itencat_2 DF_Itencat_3 wage educ exper tenure female nonwhite if id==24
To repeat, there's no problem of influential outliers.
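The variables listed for id==24 are presumably diagnostics saved earlier with predict (rstu: studentized residuals; h: leverage; d: Cook's distance; DF_*: DFBETAs); the commands that created them aren't shown. A minimal Python sketch of the first three for a one-regressor model, using the textbook formulas (externally studentized residuals, as Stata's rstudent reports); the data are hypothetical.

```python
import math


def influence(x, y):
    """Leverage, Cook's distance and externally studentized residuals for y = a + b*x."""
    n, k = len(x), 2  # k = number of estimated coefficients (a and b)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(ei ** 2 for ei in e) / (n - k)
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]                 # leverage
    r = [ei / math.sqrt(s2 * (1 - hi)) for ei, hi in zip(e, h)]        # internally studentized
    d = [ri ** 2 * hi / (k * (1 - hi)) for ri, hi in zip(r, h)]        # Cook's distance
    t = [ri * math.sqrt((n - k - 1) / (n - k - ri ** 2)) for ri in r]  # externally studentized
    return h, d, t


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
h, d, t = influence(x, y)
print([round(v, 3) for v in h])
print([round(v, 3) for v in d])
print([round(v, 3) for v in t])
```

The leverages always sum to k, the number of coefficients, which is a handy check on the computation.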