MODELLING NON-NORMAL OUTCOMES: USING GLMS
PROF ANDREW JONES
HEALTH, ECONOMETRICS AND DATA GROUP (HEDG) &
DEPARTMENT OF ECONOMICS AND RELATED STUDIES
UNIVERSITY OF YORK

Andrew M Jones, 2010

Generalized linear models (GLM)


GLMs specify the conditional mean directly:

$E[y \mid x] = f(x'\beta)$

For example:

$E[y \mid x] = \exp(x'\beta)$

Advantages of GLM:
Predictions are made on the cost scale (no retransformation)
They allow for heteroskedasticity through the choice of
distributional family (albeit limited to functions of the mean)
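
The retransformation point can be illustrated with a minimal sketch on simulated data. Everything in it (the variable names, the seed, the parameter values 1 and 0.5, and the gamma shape of 2) is an assumption made for illustration and does not come from the lecture data.

```stata
* Minimal sketch on simulated data; every name below is illustrative
clear
set seed 12345
set obs 5000
generate x  = rnormal()
generate mu = exp(1 + 0.5*x)          // true conditional mean E[y|x]
generate y  = rgamma(2, mu/2)         // gamma outcome with mean mu

* (a) OLS on ln(y): exponentiated fitted values understate E[y|x]
generate lny = ln(y)
quietly regress lny x
predict double lnyhat, xb
generate yhat_ols = exp(lnyhat)

* (b) GLM with log link: predictions are already on the raw scale
quietly glm y x, link(log) family(gamma) vce(robust)
predict double yhat_glm, mu

* compare the sample mean of y with the two sets of predictions
summarize y yhat_ols yhat_glm
```

In this set-up the mean of yhat_glm should sit close to the mean of y, while the naive exp() of the log-scale OLS prediction should fall short of it, which is the gap a retransformation (smearing) factor would otherwise have to fill.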


The GLM framework


Requirements:
A link function g(.) that relates the conditional mean to the covariates:

$g(\mu) = x'\beta$

$\mu = g^{-1}(x'\beta) = f(x'\beta)$

A distribution (D), that belongs to the linear exponential family, is used to specify the relationship between the variance and the mean:

$y \sim D$

$\mathrm{Var}(y \mid x) \propto V(\mu)$

Link functions
Specifies the shape of the conditional mean function.
Most commonly used:
Identity: covariates act additively on the mean. With the identity link the interpretation of coefficients is the same as OLS, irrespective of the choice of distribution.
Log: covariates act multiplicatively on the mean. This changes the interpretation of the coefficients.
The link function characterises how the mean on the raw cost scale is related to the set of covariates.
Example:

$E[y \mid x] = \exp(x'\beta)$

And therefore:

$\ln E[y \mid x] = x'\beta$
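
To make the multiplicative interpretation concrete, here is a short derivation under the log link. The symbols $x_k$ (one covariate), $x_{-k}$ (the remaining covariates) and $\beta_k$ (its coefficient) are notation introduced for this illustration only.

```latex
\[
\frac{E[y \mid x_k + 1,\, x_{-k}]}{E[y \mid x_k,\, x_{-k}]}
= \frac{\exp\!\big(x'\beta + \beta_k\big)}{\exp\!\big(x'\beta\big)}
= \exp(\beta_k)
\]
```

A one-unit increase in $x_k$ therefore scales the conditional mean by the factor $\exp(\beta_k)$, whereas under the identity link it would add $\beta_k$ to the mean.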

Stata glm command:

linkname        description
identity        identity
log             log
logit           logit
probit          probit
cloglog         cloglog
power #         power
opower #        odds power
nbinomial       negative binomial
loglog          log-log
logc            log-complement

Distributions
Distributions:
Used to describe the relationship between the variance and the conditional mean:

$\mathrm{var}[y \mid x] \propto \left(E[y \mid x]\right)^{\nu}$

Distributional families:
Gaussian: constant variance; $\nu = 0$
Poisson: variance proportional to the mean; $\nu = 1$
Gamma: variance proportional to the square of the mean; $\nu = 2$
Inverse Gaussian: variance proportional to the cube of the mean; $\nu = 3$
Allows flexibility in modelling cost data
Gaussian with identity link function is comparable to OLS (ML)
Can apply distributions and link functions independently (although there are canonical links)
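
Because family and link are chosen independently, the same log-link mean can be fitted under different variance assumptions. The sketch below does this on simulated data. The variable names, seed and parameter values are hypothetical, and the outcome is drawn from a gamma distribution purely for illustration.

```stata
* Minimal sketch on simulated data; names and values are illustrative
clear
set seed 2010
set obs 5000
generate x = rnormal()
generate y = rgamma(2, exp(1 + 0.5*x)/2)   // true E[y|x] = exp(1 + 0.5*x)

* same log link, three different variance assumptions
foreach fam in gaussian poisson gamma {
    quietly glm y x, link(log) family(`fam') vce(robust)
    display "`fam'" _col(12) "b[x] = " %6.4f _b[x] "   robust se = " %6.4f _se[x]
}
```

With the mean correctly specified, the three slope estimates should be close to one another; what mainly differs between families is the precision of the estimate.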

Stata glm command:

familyname                  description
gaussian                    Gaussian (normal)
igaussian                   inverse Gaussian
binomial [varnameN|#N]      Bernoulli/binomial
poisson                     Poisson
nbinomial [#k]              negative binomial
gamma                       gamma

Estimation of GLMs
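Estimation is based on the classical estimating equations (score functions):

$$\sum_i \frac{\partial \mu_i(\beta)}{\partial \beta}\,\frac{y_i - \mu_i(\beta)}{V(\mu_i(\beta))} = \sum_i \frac{\partial \mu_i(\beta)}{\partial \beta}\,\frac{r_i}{\sqrt{V(\mu_i(\beta))}} = 0$$

where $r_i = (y_i - \mu_i(\beta))/\sqrt{V(\mu_i(\beta))}$ is the Pearson residual.

As GLMs are based on the linear exponential family they have the quasi-ML property (consistent so long as the mean is correctly specified).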
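
As a worked special case, take the log link together with the gamma-type variance function $V(\mu) = \mu^2$. The starting point assumed here is the standard quasi-likelihood score $\sum_i (\partial \mu_i / \partial \beta)\,(y_i - \mu_i)/V(\mu_i) = 0$; the subscript $i$ and the vector $x_i$ are notation for this sketch.

```latex
\[
\mu_i = \exp(x_i'\beta)
\;\Rightarrow\;
\frac{\partial \mu_i}{\partial \beta} = \mu_i x_i ,
\qquad V(\mu_i) = \mu_i^{2}
\]
\[
\sum_i \mu_i x_i \,\frac{y_i - \mu_i}{\mu_i^{2}}
= \sum_i x_i \left(\frac{y_i}{\mu_i} - 1\right) = 0
\]
```

Under these assumptions the estimator sets the covariate-weighted proportional residuals to zero. Only the mean function enters the equation, which is the intuition behind the quasi-ML property.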

Pros and cons of GLM


Advantages:
No need for retransformation
Gains in efficiency (precision) if the estimator matches the data generating process
Provides consistent estimates even if the distributional family is incorrect (quasi-ML), i.e. the choice of family only influences efficiency as long as the link function and covariates are correctly specified
Disadvantages:
Can suffer substantial efficiency losses:
If the data are heavy-tailed (e.g. the log-transformed data still have high kurtosis)
If the variance function, represented by the distributional family, is misspecified

. * log-link gamma-variance
. **************************
. glm y $xs [pweight=wt], link(log) family(gamma) vce(robust) eform nolog

Generalized linear models                          No. of obs      =       2955
Optimization     : ML                              Residual df     =       2947
                                                   Scale parameter =   2.341793
Deviance         =  4017.717969                    (1/df) Deviance =   1.363325
Pearson          =  6901.264188                    (1/df) Pearson  =   2.341793

Variance function: V(u) = u^2                      [Gamma]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =   19.48478
Log pseudolikelihood = -28780.76399                BIC             =  -19532.51

------------------------------------------------------------------------------
             |               Robust
           y |     exp(b)  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           d |   1.130757   .0678008     2.05   0.040     1.005382    1.271768
      phylim |   1.482513   .1020447     5.72   0.000     1.295414    1.696636
      actlim |   1.522993   .1135434     5.64   0.000     1.315947    1.762614
      totchr |   1.307327   .0286729    12.22   0.000      1.25232     1.36475
         age |   .9946331   .0046425    -1.15   0.249     .9855754    1.003774
      female |   .8264128   .0474845    -3.32   0.001     .7383941    .9249235
      income |   1.001129   .0012703     0.89   0.374     .9986427    1.003622
------------------------------------------------------------------------------

. predict yf if e(sample)
(option mu assumed; predicted mean y)
. predict glmgam_yf if e(sample)
(option mu assumed; predicted mean y)
. predict e if e(sample), deviance //deviance residuals

Normal plot of residuals

. pnorm e, title(normal pp plot: glm gamma) ytitle(residuals) xtitle(inverse normal)
. graph export pnormglmgam.wmf, replace

[Figure: normal P-P plot titled "normal pp plot: glm gamma"; vertical axis "residuals", horizontal axis "inverse normal", both running from 0.00 to 1.00]

. * average partial effects
. margeff

Average partial effects after glm
      y = log(y)
------------------------------------------------------------------------------
    variable |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           d |   901.7012   467.3564     1.93   0.054    -14.30056    1817.703
      phylim |   2834.505   603.9553     4.69   0.000     1650.774    4018.235
      actlim |   3228.229   706.5185     4.57   0.000     1843.478     4612.98
      totchr |   2015.555   188.1652    10.71   0.000     1646.758    2384.352
         age |  -39.99365   34.88792    -1.15   0.252    -108.3727    28.38542
      female |  -1441.191   398.1313    -3.62   0.000    -2221.514   -660.8677
      income |   8.388019    9.42908     0.89   0.374    -10.09264    26.86868
------------------------------------------------------------------------------
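
The margeff command used above is a user-written add-on. As a hedged alternative, and assuming Stata 11 or later (which the slides do not specify), average partial effects on the raw scale can also be obtained with the official margins command. The sketch below runs on simulated data with hypothetical variable names and parameter values.

```stata
* Minimal sketch on simulated data; names and values are illustrative
clear
set seed 7
set obs 5000
generate x = rnormal()
generate byte d = runiform() < 0.5
generate y = rgamma(2, exp(1 + 0.5*x + 0.2*d)/2)

* i.d marks d as a factor variable so margins reports a discrete change
glm y x i.d, link(log) family(gamma) vce(robust) nolog

* average partial effects of x and d on E[y|x], in the units of y
margins, dydx(x d)
```

For the continuous covariate the reported effect is the sample average of the derivative of the predicted mean; for the binary indicator it is the average difference in predicted means between d = 1 and d = 0.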

. * Counterfactual predictions
. summ y yf y0 attf attc if d_raw==1

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           y |      1748    7611.963    12358.83          3     125610
          yf |      1748    7834.489    5281.235   2590.564   50808.95
          y0 |      1748    6928.532    4670.528   2290.999   44933.55
        attf |      1748    .1156371    3.22e-08    .115637   .1156372
        attc |      1748    905.9573    610.7064   299.5652   5875.398

So the attributable fraction = 0.116

. lincom d, eform
------------------------------------------------------------------------------
           y |     exp(b)  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.130757   .0678008     2.05   0.040     1.005382    1.271768
------------------------------------------------------------------------------

. nlcom attf: 1-exp(-_b[d])
------------------------------------------------------------------------------
           y |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attf |   .1156371   .0530269     2.18   0.029     .0117063    .2195678
------------------------------------------------------------------------------
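
The link between the summary statistics and the nlcom result can be checked by hand. The sketch assumes, as the variable names suggest but the slides do not state, that y0 is the prediction with d switched off, attc = yf - y0 and attf = attc/yf. The hats and the subscript on $\hat\beta_d$ are notation introduced here.

```latex
\[
\hat{y}_0 = \hat{y}_f \exp(-\hat\beta_d), \qquad
\text{attf} = \frac{\hat{y}_f - \hat{y}_0}{\hat{y}_f}
= 1 - \exp(-\hat\beta_d)
= 1 - \frac{1}{1.130757} \approx 0.1156
\]
```

Under the log link the fraction does not depend on the other covariates, which is consistent with the near-zero standard deviation of attf in the table, while attc varies across individuals because it scales with the predicted cost.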

. * tests for link function
. linktest
Iteration 0:   log pseudolikelihood = -31763.279

Generalized linear models                          No. of obs      =       2955
Optimization     : ML                              Residual df     =       2952
                                                   Scale parameter =   1.27e+08
Deviance         =  3.75441e+11                    (1/df) Deviance =   1.27e+08
Pearson          =  3.75441e+11                    (1/df) Pearson  =   1.27e+08

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]

                                                   AIC             =   21.50002
Log pseudolikelihood = -31763.27921                BIC             =   3.75e+11

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |  -10496.15    11257.9    -0.93   0.351    -32561.22    11568.92
      _hatsq |   994.3803   645.0974     1.54   0.123    -269.9874    2258.748
       _cons |   22750.79   48961.42     0.46   0.642    -73211.83    118713.4
------------------------------------------------------------------------------

Park test
Idea: the GLM distribution should reflect the relationship between the variance and the mean:

$\mathrm{var}[y \mid x] \propto \left(E[y \mid x]\right)^{\nu}$

The Park test exploits this by regressing $\ln (y_i - \hat{y}_i)^2$ on $\ln \hat{y}_i$ and a constant.
The estimated coefficient on $\ln \hat{y}_i$ provides guidance on the appropriate distributional family:
Gaussian: $\nu = 0$
Poisson: $\nu = 1$
Gamma: $\nu = 2$
Inverse Gaussian: $\nu = 3$
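
To see the Park test recover a known variance power, the sketch below applies the GLM form of the test to simulated gamma data, where the variance is proportional to the square of the mean. All names, the seed and the parameter values are assumptions for illustration and are unrelated to the lecture data.

```stata
* Minimal sketch on simulated data; names and values are illustrative
clear
set seed 42
set obs 20000
generate x  = rnormal()
generate mu = exp(1 + 0.5*x)
generate y  = rgamma(2, mu/2)      // variance = mu^2/2, so the power is 2

* first stage: fit the mean and keep raw-scale predictions
quietly glm y x, link(log) family(gamma)
predict double yf, mu

* second stage: squared raw residuals on ln(fitted mean), log link
generate double e2   = (y - yf)^2
generate double lnyf = ln(yf)
glm e2 lnyf, link(log) family(gamma) vce(robust) nolog

* the coefficient on lnyf estimates the variance power
test lnyf == 2
```

In this design the coefficient on lnyf should be close to 2, so the test of lnyf == 2 would typically not reject while the tests against 0, 1 and 3 would.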

. * tests for variance
. * Park test
. gen e2=(y-yf)^2
. gen lnyf=log(yf)
. quietly glm e2 lnyf, link(log) family(gamma) nolog vce(robust)
. test lnyf==0
 ( 1)  [e2]lnyf = 0
           chi2(  1) =   95.33
         Prob > chi2 =    0.0000
. test lnyf==1
 ( 1)  [e2]lnyf = 1
           chi2(  1) =   14.76
         Prob > chi2 =    0.0001
. test lnyf==2
 ( 1)  [e2]lnyf = 2
           chi2(  1) =    4.32
         Prob > chi2 =    0.0376
. test lnyf==3
 ( 1)  [e2]lnyf = 3
           chi2(  1) =   64.02
         Prob > chi2 =    0.0000

. *Pearson correlation between raw scale residual and y-hat

             |       ey      yfc
-------------+------------------
          ey |   1.0000
             |
             |
         yfc |  -0.0963   1.0000
             |   0.0000
             |

. regress ey yfc

      Source |       SS       df       MS              Number of obs =    2955
-------------+------------------------------           F(  1,  2953) =   27.64
       Model |  3.5542e+09     1  3.5542e+09           Prob > F      =  0.0000
    Residual |  3.7976e+11  2953   128603092           R-squared     =  0.0093
-------------+------------------------------           Adj R-squared =  0.0089
       Total |  3.8332e+11  2954   129762744           Root MSE      =   11340

------------------------------------------------------------------------------
          ey |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         yfc |  -.2194556   .0417446    -5.26   0.000    -.3013071   -.1376041
       _cons |   1489.313   373.8587     3.98   0.000     756.2634    2222.364
------------------------------------------------------------------------------

. * Copas test
. regress y yfv [pw=wt]
(sum of wgt is   2.9550e+05)

Linear regression                                      Number of obs =    2955
                                                       F(  1,  2953) =  162.03
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1009
                                                       Root MSE      =   11372

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         yfv |   .7594373   .0596621    12.73   0.000     .6424537    .8764209
       _cons |   1642.142   385.8655     4.26   0.000     885.5489    2398.734
------------------------------------------------------------------------------

. test yfv==1
 ( 1)  yfv = 1
       F(  1,  2953) =   16.26
            Prob > F =    0.0001

. * log-link normal-variance
. **************************
. glm y $xs [pweight=wt], link(log) family(normal) eform vce(robust) nolog

Generalized linear models                          No. of obs      =       2955
Optimization     : ML                              Residual df     =       2947
                                                   Scale parameter =   1.28e+08
Deviance         =  3.77028e+11                    (1/df) Deviance =   1.28e+08
Pearson          =  3.77028e+11                    (1/df) Pearson  =   1.28e+08

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =   21.50762
Log pseudolikelihood = -31769.51164                BIC             =   3.77e+11

------------------------------------------------------------------------------
             |               Robust
           y |     exp(b)  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           d |    1.06403   .0745162     0.89   0.376     .9275611    1.220577
      phylim |   1.434588   .1463193     3.54   0.000     1.174653    1.752042
      actlim |   1.519728   .1369488     4.64   0.000     1.273681    1.813306
      totchr |    1.19708   .0308068     6.99   0.000     1.138197    1.259009
         age |    .989408   .0062263    -1.69   0.091     .9772796    1.001687
      female |   .8446423   .0605538    -2.36   0.019     .7339201    .9720685
      income |   1.001188   .0014936     0.80   0.426     .9982646     1.00412
------------------------------------------------------------------------------

. predict yf if e(sample)
(option mu assumed; predicted mean y)
. predict glmnorm_yf if e(sample)
(option mu assumed; predicted mean y)
. predict e if e(sample), deviance //deviance residuals

Normal plot of residuals

. pnorm e, title(normal pp plot: glm normal) ytitle(residuals) xtitle(inverse normal)
. graph export pnormglmn.wmf, replace

[Figure: normal P-P plot titled "normal pp plot: glm normal"; vertical axis "residuals", horizontal axis "inverse normal", both running from 0.00 to 1.00]

. * average partial effects
. margeff

Average partial effects after glm
      y = log(y)
------------------------------------------------------------------------------
    variable |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           d |   455.9004   531.3409     0.86   0.391    -585.5086    1497.309
      phylim |   2616.554   871.9372     3.00   0.003     907.5888     4325.52
      actlim |   3232.314   856.1917     3.78   0.000     1554.209    4910.419
      totchr |   1336.714   193.7161     6.90   0.000     957.0376    1716.391
         age |  -78.70441    46.9469    -1.68   0.094    -170.7186    13.30982
      female |  -1266.074   491.2121    -2.58   0.010    -2228.832   -303.3159
      income |   8.773708     11.015     0.80   0.426     -12.8153    30.36272
------------------------------------------------------------------------------

. * Counterfactual predictions
. summ y yf y0 attf attc if d_raw==1

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           y |      1748    7611.963    12358.83          3     125610
          yf |      1748    7610.402    3948.845   3033.538   33186.42
          y0 |      1748    7152.431    3711.216   2850.989   31189.37
        attf |      1748    .0601769    3.35e-08   .0601768    .060177
        attc |      1748    457.9704    237.6293   182.5491   1997.055

So the attributable fraction = 0.060

. lincom d, eform
 ( 1)  [y]d = 0
------------------------------------------------------------------------------
           y |     exp(b)  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    1.06403   .0745162     0.89   0.376     .9275611    1.220577
------------------------------------------------------------------------------

. nlcom attf: 1-exp(-_b[d])
------------------------------------------------------------------------------
           y |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attf |   .0601769   .0658177     0.91   0.361    -.0688235    .1891773
------------------------------------------------------------------------------

. * tests for link function
. linktest
Iteration 0:   log pseudolikelihood = -31762.389

Generalized linear models                          No. of obs      =       2955

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |  -27047.53   18262.47    -1.48   0.139    -62841.31    8746.254
      _hatsq |    2015.67   1039.161     1.94   0.052    -21.04761    4052.388
       _cons |   88823.64   80080.45     1.11   0.267    -68131.15    245778.4
------------------------------------------------------------------------------

. * tests for variance
. * Park test
. gen e2=(y-yf)^2
. gen lnyf=log(yf)
. quietly glm e2 lnyf, link(log) family(normal) nolog vce(robust)
. test lnyf==0
 ( 1)  [e2]lnyf = 0
           chi2(  1) =  159.13
         Prob > chi2 =    0.0000
. test lnyf==1
 ( 1)  [e2]lnyf = 1
           chi2(  1) =    5.84
         Prob > chi2 =    0.0157
. test lnyf==2
 ( 1)  [e2]lnyf = 2
           chi2(  1) =   60.57
         Prob > chi2 =    0.0000
. test lnyf==3
 ( 1)  [e2]lnyf = 3
           chi2(  1) =  323.33
         Prob > chi2 =    0.0000

. *Pearson correlation between raw scale residual and y-hat

             |       ey      yfc
-------------+------------------
          ey |   1.0000
             |
             |
         yfc |   0.0172   1.0000
             |   0.3495
             |

. regress ey yfc

      Source |       SS       df       MS              Number of obs =    2955
-------------+------------------------------           F(  1,  2953) =    0.88
       Model |   111725464     1   111725464           Prob > F      =  0.3495
    Residual |  3.7689e+11  2953   127628149           R-squared     =  0.0003
-------------+------------------------------           Adj R-squared = -0.0000
       Total |  3.7700e+11  2954   127622766           Root MSE      =   11297

------------------------------------------------------------------------------
          ey |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         yfc |   .0507794   .0542731     0.94   0.350    -.0556376    .1571964
       _cons |  -476.0856   451.7724    -1.05   0.292    -1361.906    409.7352
------------------------------------------------------------------------------

. * Copas test
. regress y yfv [pw=wt]
(sum of wgt is   2.9550e+05)

Linear regression                                      Number of obs =    2955
                                                       F(  1,  2953) =  190.04
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1033
                                                       Root MSE      =   11357

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         yfv |    1.00221   .0727006    13.79   0.000     .8596612    1.144759
       _cons |  -118.9316   457.5539    -0.26   0.795    -1016.089    778.2253
------------------------------------------------------------------------------

. test yfv==1
 ( 1)  yfv = 1
       F(  1,  2953) =    0.00
            Prob > F =    0.9757

. * log-link poisson-variance
. ***************************
. glm y $xs [pweight=wt], link(log) family(poisson) vce(robust) eform nolog

Generalized linear models                          No. of obs      =       2955
Optimization     : ML                              Residual df     =       2947
                                                   Scale parameter =          1
Deviance         =  26281040.05                    (1/df) Deviance =   8917.896
Pearson          =  44428684.38                    (1/df) Pearson  =    15075.9

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =   8903.656
Log pseudolikelihood = -13155144.23                BIC             =   2.63e+07

------------------------------------------------------------------------------
             |               Robust
           y |        IRR  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           d |   1.100106   .0660872     1.59   0.112     .9779118    1.237568
      phylim |   1.463529   .1123242     4.96   0.000     1.259135    1.701101
      actlim |   1.519216   .1180221     5.38   0.000     1.304647    1.769075
      totchr |   1.245825   .0265433    10.32   0.000     1.194872     1.29895
         age |   .9907766   .0049607    -1.85   0.064     .9811013    1.000547
      female |   .8263785    .048102    -3.28   0.001     .7372794    .9262451
      income |   1.001184   .0012994     0.91   0.362     .9986409    1.003734
------------------------------------------------------------------------------

. predict yf if e(sample)
(option mu assumed; predicted mean y)
. predict glmpois_yf if e(sample)
(option mu assumed; predicted mean y)
. predict e if e(sample), deviance //deviance residuals

Normal plot of residuals

. pnorm e, title(normal pp plot: glm poisson) ytitle(residuals) xtitle(inverse normal)
. graph export pnormglmp.wmf, replace

[Figure: normal P-P plot titled "normal pp plot: glm poisson"; vertical axis "residuals", horizontal axis "inverse normal", both running from 0.00 to 1.00]

. * average partial effects
. margeff

Average partial effects after glm
      y = log(y)
------------------------------------------------------------------------------
    variable |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           d |   688.7766    454.941     1.51   0.130    -202.8915    1580.445
      phylim |   2708.822   656.1635     4.13   0.000     1422.765    3994.878
      actlim |   3168.856   725.6211     4.37   0.000     1746.665    4591.047
      totchr |   1615.309   164.3813     9.83   0.000     1293.128    1937.491
         age |  -67.55386   36.94308    -1.83   0.067     -139.961     4.85324
      female |  -1413.186   392.2707    -3.60   0.000    -2182.022   -644.3492
      income |   8.629559    9.45929     0.91   0.362    -9.910309    27.16943
------------------------------------------------------------------------------

. * Counterfactual predictions
. summ y yf y0 attf attc if d_raw==1

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           y |      1748    7611.963    12358.83          3     125610
          yf |      1748    7611.963    4473.056   2740.152   40358.39
          y0 |      1748    6919.302    4066.025   2490.808   36685.93
        attf |      1748    .0909964    3.17e-08   .0909963   .0909965
        attc |      1748     692.661    407.0319    249.344   3672.465

So the attributable fraction = 0.091

. lincom d, eform
------------------------------------------------------------------------------
           y |     exp(b)  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.100106   .0660872     1.59   0.112     .9779118    1.237568
------------------------------------------------------------------------------

. nlcom attf: 1-exp(-_b[d])
------------------------------------------------------------------------------
           y |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        attf |   .0909964   .0546071     1.67   0.096    -.0160315    .1980242
------------------------------------------------------------------------------

. * tests for link function
. linktest
Iteration 0:   log pseudolikelihood = -31761.733

Generalized linear models                          No. of obs      =       2955

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |  -18262.86   14338.06    -1.27   0.203    -46364.95    9839.229
      _hatsq |   1479.764   820.1017     1.80   0.071    -127.6054    3087.134
       _cons |   53368.27   62515.35     0.85   0.393    -69159.58    175896.1
------------------------------------------------------------------------------

. * tests for variance
. * Park test
. gen e2=(y-yf)^2
. gen lnyf=log(yf)
. quietly glm e2 lnyf, link(log) family(poisson) nolog vce(robust)
. test lnyf==0
 ( 1)  [e2]lnyf = 0
           chi2(  1) =  151.80
         Prob > chi2 =    0.0000
. test lnyf==1
 ( 1)  [e2]lnyf = 1
           chi2(  1) =   12.70
         Prob > chi2 =    0.0004
. test lnyf==2
 ( 1)  [e2]lnyf = 2
           chi2(  1) =   26.97
         Prob > chi2 =    0.0000
. test lnyf==3
 ( 1)  [e2]lnyf = 3
           chi2(  1) =  194.61
         Prob > chi2 =    0.0000

. *Pearson correlation between raw scale residual and y-hat

             |       ey      yfc
-------------+------------------
          ey |   1.0000
             |
             |
         yfc |  -0.0260   1.0000
             |   0.1580
             |

. regress ey yfc

      Source |       SS       df       MS              Number of obs =    2955
-------------+------------------------------           F(  1,  2953) =    1.99
       Model |   255222942     1   255222942           Prob > F      =  0.1580
    Residual |  3.7786e+11  2953   127957349           R-squared     =  0.0007
-------------+------------------------------           Adj R-squared =  0.0003
       Total |  3.7811e+11  2954   128000432           Root MSE      =   11312

------------------------------------------------------------------------------
          ey |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         yfc |  -.0687222   .0486597    -1.41   0.158    -.1641326    .0266882
       _cons |   501.0008   411.2701     1.22   0.223    -305.4042    1307.406
------------------------------------------------------------------------------

. * Copas test
. regress y yfv [pw=wt]
(sum of wgt is   2.9550e+05)

Linear regression                                      Number of obs =    2955
                                                       F(  1,  2953) =  177.49
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1032
                                                       Root MSE      =   11357

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         yfv |   .8973181   .0673539    13.32   0.000     .7652527    1.029384
       _cons |   745.5639   420.8999     1.77   0.077    -79.72312    1570.851
------------------------------------------------------------------------------

. test yfv==1
 ( 1)  yfv = 1
       F(  1,  2953) =    2.32
            Prob > F =    0.1275

. * square root-link gamma-variance
. *********************************
. glm y $xs [pw=wt], link(power 0.5) family(gamma) vce(robust) eform nolog

Generalized linear models                          No. of obs      =       2955
Optimization     : ML                              Residual df     =       2947
                                                   Scale parameter =   231.4334
Deviance         =  399289.7291                    (1/df) Deviance =   135.4902
Pearson          =  682034.2564                    (1/df) Pearson  =   231.4334

Variance function: V(u) = u^2                      [Gamma]
Link function    : g(u) = u^(0.5)                  [Power]

                                                   AIC             =   1947.102
Log pseudolikelihood = -2876835.365                BIC             =   375739.5

------------------------------------------------------------------------------
             |               Robust
           y |     exp(b)  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           d |   99.74859   217.1764     2.11   0.035     1.398424    7114.998
      phylim |    2076879    5874551     5.14   0.000     8123.568    5.31e+08
      actlim |   3.16e+08   1.08e+09     5.73   0.000       389875    2.56e+11
      totchr |   80851.78   73762.22    12.39   0.000     13524.66    483340.1
         age |   .8488611   .1507883    -0.92   0.356     .5992839    1.202377
      female |   .0030096   .0066479    -2.63   0.009     .0000397    .2284105
      income |   1.039825   .0504463     0.80   0.421     .9455073    1.143551
------------------------------------------------------------------------------

. * tests for link function
. linktest
Iteration 0:   log pseudolikelihood = -3710552.7

Generalized linear models                          No. of obs      =       2955
Optimization     : ML                              Residual df     =       2952
                                                   Scale parameter =   1.27e+10
Deviance         =  3.75653e+13                    (1/df) Deviance =   1.27e+10
Pearson          =  3.75653e+13                    (1/df) Pearson  =   1.27e+10

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]

                                                   AIC             =   2511.374
Log pseudolikelihood = -3710552.718                BIC             =   3.76e+13

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |    89.4913    61.4918     1.46   0.146    -31.03041     210.013
      _hatsq |   .4608997   .3709282     1.24   0.214    -.2661063    1.187906
       _cons |  -3435.796   2368.464    -1.45   0.147    -8077.901    1206.309
------------------------------------------------------------------------------

. * tests for variance
. * Park test
. gen e2=(y-yf)^2
. gen lnyf=log(yf)
. quietly glm e2 lnyf [pw=wt], link(log) family(gamma) nolog vce(robust)
. test lnyf==0
 ( 1)  [e2]lnyf = 0
           chi2(  1) =   94.36
         Prob > chi2 =    0.0000
. test lnyf==1
 ( 1)  [e2]lnyf = 1
           chi2(  1) =   14.68
         Prob > chi2 =    0.0001
. test lnyf==2
 ( 1)  [e2]lnyf = 2
           chi2(  1) =    4.20
         Prob > chi2 =    0.0404
. test lnyf==3
 ( 1)  [e2]lnyf = 3
           chi2(  1) =   62.91
         Prob > chi2 =    0.0000

. *Pearson correlation between raw scale residual and y-hat
. pwcorr ey yfc [aw=wt], sig

             |       ey      yfc
-------------+------------------
          ey |   1.0000
             |
             |
         yfc |  -0.0175   1.0000
             |   0.3430
             |

. regress ey yfc [pw=wt]
(sum of wgt is   2.9550e+05)

Linear regression                                      Number of obs =    2955
                                                       F(  1,  2953) =    0.55
                                                       Prob > F      =  0.4592
                                                       R-squared     =  0.0003
                                                       Root MSE      =   11283

------------------------------------------------------------------------------
             |               Robust
          ey |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         yfc |  -.0462205   .0624408    -0.74   0.459    -.1686524    .0762115
       _cons |   312.2847   373.1673     0.84   0.403    -419.4096    1043.979
------------------------------------------------------------------------------

. * Copas test
. regress y yfv [pw=wt]
(sum of wgt is   2.9550e+05)

Linear regression                                      Number of obs =    2955
                                                       F(  1,  2953) =  228.42
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1104
                                                       Root MSE      =   11312

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         yfv |    .934187   .0618105    15.11   0.000     .8129911    1.055383
       _cons |   452.4477    373.343     1.21   0.226    -279.5912    1184.487
------------------------------------------------------------------------------

. test yfv==1
 ( 1)  yfv = 1
       F(  1,  2953) =    1.13
            Prob > F =    0.2871

Extended Estimating Equations (EEE)


Proposed by Basu & Rathouz (2005, Biostatistics)
Combines a Box-Cox transformation for the link function:

$x'\beta = \dfrac{\mu^{\lambda} - 1}{\lambda}$

With a general power function for the variance:

$\mathrm{Var}(y \mid x) = \theta_1 \mu^{\theta_2}$
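
The special cases nested by these two functions are worth spelling out, since they tie EEE back to the fixed link and family choices used earlier. The sketch below uses the parameter names $\lambda$, $\theta_1$ and $\theta_2$ as they appear in the pglm output; the labels "Poisson-type" and so on are shorthand introduced here.

```latex
\[
\lim_{\lambda \to 0} \frac{\mu^{\lambda} - 1}{\lambda} = \ln \mu
\quad \text{(log link)}, \qquad
\lambda = 0.5:\; 2\left(\sqrt{\mu} - 1\right)
\quad \text{(linear in the square-root link)}
\]
\[
\theta_1 \mu^{\theta_2}:\quad
\theta_2 = 0 \ \text{(constant variance)},\quad
\theta_2 = 1 \ \text{(Poisson-type)},\quad
\theta_2 = 2 \ \text{(gamma-type)},\quad
\theta_2 = 3 \ \text{(inverse Gaussian-type)}
\]
```

Read against the estimates reported in the output that follows (lambda of about 0.56 and theta2 of about 1.67), the data point towards a link close to the square root and a variance power between the Poisson and gamma cases.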

. * EXTENDED ESTIMATING EQUATIONS (EEE)
. summ y, meanonly
. gen scy = y/r(mean)
. global sc = r(mean)
. * EXTENDED ESTIMATING EQUATIONS (EEE)
. pglm scy $xs
Iter: 1  Max % Diff: 6.7972928  Rel Diff: 93.202707
Half step applied. Reset Max % Diff
Iter: 9  Max % Diff: .00028327  Rel Diff: .00062624
Iter: 10  Max % Diff: .00008658  Rel Diff: .00019668

Extended GEE with Power Variance Function          No of obs   =       2955
Optimization:  Fisher's Scoring                    Residual df =       2944
Variance:      (theta1*mu^theta2)
Link:          (mu^lambda - 1)/lambda
Std Errors:    Robust

------------------------------------------------------------------------------
         scy |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
scy          |
           d |     .10745    .051777     2.08   0.038      .005969    .2089311
      phylim |   .3380732    .068188     4.96   0.000     .2044273    .4717192
      actlim |   .4636564   .0848272     5.47   0.000     .2973982    .6299147
      totchr |    .258112   .0214944    12.01   0.000     .2159838    .3002403
         age |  -.0049493   .0040112    -1.23   0.217     -.012811    .0029124
      female |  -.1453583   .0495788    -2.93   0.003    -.2425309   -.0481857
      income |    .000872   .0009348     0.93   0.351    -.0009602    .0027043
       _cons |  -.4452356   .3137634    -1.42   0.156    -1.060201    .1697292
-------------+----------------------------------------------------------------
lambda       |
       _cons |   .5626705   .1428813     3.94   0.000     .2826282    .8427127
-------------+----------------------------------------------------------------
theta1       |
       _cons |   2.139191   .1272745    16.81   0.000     1.889738    2.388645
-------------+----------------------------------------------------------------
theta2       |
       _cons |   1.665464   .1001817    16.62   0.000     1.469112    1.861817
------------------------------------------------------------------------------

. ** Incremental Effect of d
. pglmpredict ie if e(sample), ie(d) scale($sc) var(vie)
. summ ie

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          ie |      2955    750.4544    187.4197   431.6049   1375.846

. global v2 = r(sd)^2/r(N)
. ** Analytical variance for ie
. summ vie, meanonly
. noi di "Std Err (IE(d)) = " sqrt(r(mean) + $v2)
Std Err (IE(d)) = 380.7772

. * predict fitted values, linear index & residuals
. pglmpredict yfc if e(sample), mu scale($sc)
. predict xb if e(sample), xb
. gen eee_yf=yfc if e(sample)
. gen ey = y - yfc

. * Counterfactual predictions - attributable fraction and costs
. * attributable fraction
. summ y eee_yf y0 attf attc if d_raw==1

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           y |      1748    7611.963    12358.83          3     125610
      eee_yf |      1748    7597.822    4237.654   2095.638   27122.07
          y0 |      1748    6847.585    4052.656   1663.242   25746.22
        attf |      1748    .1163505    .0350095   .0507279   .2063313
        attc |      1748    750.2375    187.6568   432.3958   1375.846

So the attributable fraction (mean attc divided by mean eee_yf) = 0.099

. *** Goodness of fit tests
. summ ey

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          ey |      2955   -3.782081    11275.93  -21874.07   112465.1

. pwcorr ey yfc, sig

             |       ey      yfc
-------------+------------------
          ey |   1.0000
             |
             |
         yfc |  -0.0073   1.0000
             |   0.6902
             |

. regress ey yfc

      Source |       SS       df       MS              Number of obs =    2955
-------------+------------------------------           F(  1,  2953) =    0.16
       Model |  20206723.1     1  20206723.1           Prob > F      =  0.6902
    Residual |  3.7557e+11  2953   127182876           R-squared     =  0.0001
-------------+------------------------------           Adj R-squared = -0.0003
       Total |  3.7559e+11  2954   127146662           Root MSE      =   11278

------------------------------------------------------------------------------
          ey |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         yfc |  -.0198721   .0498551    -0.40   0.690    -.1176263    .0778821
       _cons |   141.1651   418.6606     0.34   0.736     -679.731    962.0612
------------------------------------------------------------------------------

. * Copas test
. regress y yfv [pw=wt]
(sum of wgt is   2.9550e+05)

Linear regression                                      Number of obs =    2955
                                                       F(  1,  2953) =  233.42
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1102
                                                       Root MSE      =   11313

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         yfv |   .9552936   .0625276    15.28   0.000     .8326914    1.077896
       _cons |   319.1755   374.5653     0.85   0.394    -415.2601    1053.611
------------------------------------------------------------------------------

. test yfv==1
 ( 1)  yfv = 1
       F(  1,  2953) =    0.51
            Prob > F =    0.4747

. xtile xbtile=xb, nq(10)
. tab xbtile, sum(ey)

         10 |
  quantiles |           Summary of ey
      of xb |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   50.268219   3766.2822         296
          2 |   122.53738    5463.176         295
          3 |  -420.53641   6594.8907         296
          4 |   468.00157   8355.2167         295
          5 |  -148.62698   8390.4382         296
          6 |  -582.02662   10587.984         295
          7 |   550.84835   12333.474         296
          8 |   503.23263   14742.039         295
          9 |   241.84485   15763.372         296
         10 |  -824.35603   17773.034         295
------------+------------------------------------
      Total |  -3.7820814   11275.933        2955
