
Score Test of Proportionality Assumption for Cox Models

Xiao Chen, Statistical Consulting Group UCLA, Los Angeles, CA

ABSTRACT
Assessing the proportional hazards assumption is an important step in validating a Cox model for survival data. This paper provides a macro program, written with SAS PROC IML, that performs a score test based on scaled Schoenfeld residuals with several choices for the functional form of the time variable. An example demonstrates the use of the score test and of graphical tools in assessing the proportionality assumption.

INTRODUCTION
Cox proportional-hazards regression models are widely used for analyzing survival data, and a key assumption of the Cox model is that the effect of each predictor variable is constant over time. There are two types of tests for the proportionality assumption.

The first type uses the Wald test for individual predictors and the partial likelihood ratio test for the global test. These tests can be performed in PROC PHREG by creating time-varying covariates and using the TEST statement.

The second type, presented here, is based on the scaled Schoenfeld residuals. In this case, testing the time-dependent covariates is equivalent to testing for a non-zero slope in a generalized linear regression of the scaled Schoenfeld residuals on a function of time. A non-zero slope indicates a violation of the proportional hazards assumption. We can also perform an overall test on multiple predictor variables at once.

Common choices for the function of time include the log, the rank and the Kaplan-Meier estimate, together with the identity function. All of them are included in the macro program phreg_score_test presented here.

As with any regression, it is very helpful to graph the scaled Schoenfeld residuals against a time variable. This lets us inspect possible patterns visually, in addition to performing the tests of non-zero slopes. Certain types of non-proportionality will not be detected by the tests of non-zero slopes alone but might become obvious in graphs of the residuals. Examples are a nonlinear relationship (e.g., a quadratic trend) between the residuals and the function of time, or undue influence of outliers.

SCORE TEST BASED ON SCALED SCHOENFELD RESIDUALS


Schoenfeld residuals after a Cox model are defined for each predictor variable in the model. That is, the number of Schoenfeld residual variables equals the number of predictor variables. They are based on the contributions of each predictor variable to the log partial likelihood.

Grambsch and Therneau (1994) show that scaled Schoenfeld residuals can be of great use in the diagnostics of Cox regression models, especially in assessing the proportional hazards assumption. In theory, the scaled Schoenfeld residuals are Schoenfeld residuals adjusted by the inverse of the covariance matrix of the Schoenfeld residuals. Grambsch and Therneau (1994) suggest that this adjustment can be approximated when the distribution of the predictor variables is similar across the various risk sets. Under that assumption, the scaled residuals are obtained by multiplying the Schoenfeld residuals by the variance-covariance matrix of the parameter estimates and by the number of events in the sample.
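To make the scaling concrete, here is a minimal Python sketch (not part of the SAS macro) of the approximation as it is applied later in the PROC IML step. Each row of unscaled residuals is multiplied by the covariance matrix V of the parameter estimates and by the number of events. The function name scale_schoenfeld and the toy numbers are hypothetical, and only rows for observations with an event are assumed to be passed in.

```python
def scale_schoenfeld(resid, cov, n_events):
    """Approximate scaled Schoenfeld residuals (Grambsch and Therneau, 1994).

    resid    -- unscaled residuals, one row of p values per event
    cov      -- p x p variance-covariance matrix V of the parameter estimates
    n_events -- total number of events (Delta)
    Each scaled row is n_events * (row times V), as in ssr = (&total)*L*V.
    """
    p = len(cov)
    scaled = []
    for row in resid:
        if len(row) != p:
            raise ValueError("each residual row needs one value per predictor")
        # row vector times matrix, then multiply by the number of events
        scaled.append([n_events * sum(row[k] * cov[k][j] for k in range(p))
                       for j in range(p)])
    return scaled


# Toy example: two events, two predictors (made-up numbers).
resid = [[0.5, -1.0], [-0.5, 1.0]]
cov = [[0.04, 0.01], [0.01, 0.09]]
print(scale_schoenfeld(resid, cov, n_events=len(resid)))
# approximately [[0.02, -0.17], [-0.02, 0.17]]
```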

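The null hypothesis for the test of proportional hazards based on the scaled Schoenfeld residuals is that, for each predictor variable, the slope of the scaled Schoenfeld residuals against a function of time is zero. Once the scaled Schoenfeld residuals are created, we can perform this test with a generalized linear regression approach. More precisely, the test statistic for an individual predictor variable u is

$$T_u = \frac{\left[\sum_i \delta_i \left(g(t_i) - \bar{g}\right) r^{s}_{iu}\right]^2}{\Delta \, V_{uu} \sum_i \delta_i \left(g(t_i) - \bar{g}\right)^2}, \qquad \bar{g} = \frac{1}{\Delta} \sum_i \delta_i \, g(t_i).$$

The symbols in this formula are defined as follows:

- rs_iu is the scaled Schoenfeld residual of observation i for the predictor variable of interest.
- g(t) is the function of time chosen before the test.
- δi is the event indicator.
- Δ = Σ δi is the total number of events.
- ḡ is the mean of g(t) over the events.
- Vuu is the estimated variance of the parameter estimate of the predictor variable of interest.

The sums are taken over all the observations in the data; because of δi, only observations with an event contribute. The statistic is asymptotically distributed as χ² with 1 degree of freedom.

The test statistic for the overall test on p predictor variables is

$$T_{\text{global}} = \frac{\Delta \left[\sum_i \delta_i \left(g(t_i) - \bar{g}\right) \mathbf{r}_i\right]^{\top} V \left[\sum_i \delta_i \left(g(t_i) - \bar{g}\right) \mathbf{r}_i\right]}{\sum_i \delta_i \left(g(t_i) - \bar{g}\right)^2},$$

where r_i is the vector of the unscaled Schoenfeld residuals of the p predictor variables of interest for observation i, and V is the estimated variance-covariance matrix of their parameter estimates. It is asymptotically distributed as χ² with p degrees of freedom.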
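The following minimal Python sketch computes both statistics from the unscaled residuals of the event observations. It follows the same steps as the PROC IML code in the macro segment later in this paper. It is an illustration under stated assumptions, not the macro itself:

- The function name ph_score_tests is hypothetical.
- g holds g(t) for the event observations only, so every δi = 1 and Δ is the number of rows.
- V is passed as a nested list.

```python
def ph_score_tests(g, resid, cov):
    """Score tests of proportional hazards from unscaled Schoenfeld residuals.

    g     -- g(t_i) for each event observation
    resid -- unscaled Schoenfeld residuals, one row of p values per event
    cov   -- p x p variance-covariance matrix V of the parameter estimates
    Returns (per-predictor chi-squares with 1 df, global chi-square with p df).
    """
    n_events = len(g)                      # Delta: only event rows are supplied
    p = len(cov)
    g_bar = sum(g) / n_events
    gc = [x - g_bar for x in g]            # g(t_i) - g_bar
    ss_g = sum(x * x for x in gc)          # sum of (g(t_i) - g_bar)^2
    # u_j = sum_i (g(t_i) - g_bar) * r_ij with the unscaled residuals
    u = [sum(gc[i] * resid[i][j] for i in range(n_events)) for j in range(p)]
    # same weighted sum for the scaled residuals: Delta * (u V)_j
    u_scaled = [n_events * sum(u[k] * cov[k][j] for k in range(p)) for j in range(p)]
    individual = [u_scaled[j] ** 2 / (n_events * cov[j][j] * ss_g) for j in range(p)]
    # global test: Delta * u' V u / sum (g(t_i) - g_bar)^2
    quad = sum(u[j] * cov[j][k] * u[k] for j in range(p) for k in range(p))
    global_stat = n_events * quad / ss_g
    return individual, global_stat


# One predictor, three events (made-up numbers).
ind, glob = ph_score_tests([1.0, 2.0, 3.0], [[-1.0], [0.0], [1.0]], [[0.5]])
print(ind, glob)  # [3.0] 3.0
```

With a single predictor the global statistic reduces to the individual one. When V is diagonal, the global statistic equals the sum of the individual statistics. Both facts are quick sanity checks on an implementation.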

AN EXAMPLE

The data set used for this example is taken from Chapter 6 of Applied Survival Analysis: Regression Modeling of Time to Event Data (Hosmer, Lemeshow and May, 2008). The data set can be downloaded following the link. The time-to-event variable is lenfol and the censoring variable is fstat. The predictor variables used in the example are age, bmi, hr (heart rate) and gender. In this example, we show how to create the scaled Schoenfeld residuals manually and how to inspect possible deviations from the proportional hazards assumption graphically.

We first run the Cox model using PROC PHREG. This run creates two data sets:

- est, created by the OUTEST= option together with the COVOUT option, contains the variance-covariance matrix of the parameter estimates.
- res, created by the OUTPUT statement, contains the Schoenfeld residuals for each predictor variable.

proc phreg data = whas500 outest=est covout;
   model lenfol*fstat(0) = age bmi hr gender;
   id id;
   output out=res ressch = age_r bmi_r hr_r gender_r;
run;

To create the scaled Schoenfeld residuals, we need the total number of events. We use PROC SQL to sum the event indicator fstat and store the result in a macro variable called total.

proc sql noprint;
   select sum(fstat) into :total
   from whas500;
quit;

Now we have all the information we need to scale the Schoenfeld residuals using PROC IML.

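proc iml;
   use res;
   /* Schoenfeld residuals exist only for observations with an event */
   read all var {age_r bmi_r hr_r gender_r} into L where (fstat = 1);
   read all var {lenfol fstat} into X where (fstat = 1);
   close res;
   use est;
   /* covariance matrix of the parameter estimates (COVOUT rows) */
   read all var {age bmi hr gender} into V where (_type_ = "COV");
   close est;
   /* scaled residuals: (number of events) * residual row * V */
   ssr = (&total)*L*V;
   W = X || ssr;
   cname = {"lenfol" "fstat" "sage_r" "sbmi_r" "shr_r" "sgender_r"};
   create p from W[colname=cname];
   append from W;
   close p;
quit;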

At this point, a data set called p has been created. It contains the time variable, the censoring variable and all the scaled Schoenfeld residual variables. To inspect the trend visually, we can use a nonparametric smoothing technique such as the one provided by PROC LOESS. The code below applies it to the scaled Schoenfeld residuals for the predictor variable hr (heart rate). This process has to be repeated for the scaled Schoenfeld residuals of each predictor variable in the model; for illustration, we show just one.

proc loess data=p;
   model shr_r=lenfol / smooth=0.4;
   ods output OutputStatistics=myout;
run;
quit;
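For readers who want to see what the smoother does, here is a minimal Python sketch of a local linear loess fit with tricube weights. The span parameter plays the role of smooth=0.4, the fraction of observations in each local neighborhood. The sketch fits directly at every observed time and does not reproduce PROC LOESS's default interpolation or its exact neighborhood handling, so its values may differ somewhat from the SAS output. The function name loess_fit and the toy data are hypothetical.

```python
import math


def loess_fit(x, y, span=0.4):
    """Local linear loess with tricube weights, evaluated at each x[i]."""
    n = len(x)
    q = max(2, math.ceil(span * n))        # points in each neighborhood
    fitted = []
    for x0 in x:
        dist = [abs(xi - x0) for xi in x]
        h = sorted(dist)[min(q, n) - 1]    # distance to the q-th nearest point
        if h > 0:
            w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist]
        else:                              # every neighbor is tied with x0
            w = [1.0 if d == 0 else 0.0 for d in dist]
        sw = sum(w)                        # >= 1 because x0 itself has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))   # weighted least-squares line at x0
    return fitted


# Made-up event times and scaled residuals, for illustration only.
times = [10, 40, 90, 160, 250, 360, 490, 640, 810, 1000]
resid = [0.05, -0.02, 0.03, 0.00, -0.04, 0.01, -0.03, 0.02, -0.05, -0.01]
print([round(v, 3) for v in loess_fit(times, resid, span=0.4)])
```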

Now we have everything we need to display the trend of the scaled Schoenfeld residuals for heart rate against the original time variable, lenfol.

proc sort data = myout;
   by lenfol;
run;
symbol1 c = gray i = none v = circle h = .8;
symbol2 c = black i = join v = none w = 2.5;
axis1 order=(-.1 to .15 by .05) minor=none
      label=(a=90 'Scaled Schoenfeld Residuals');
axis2 order=(0 to 2400 by 400) label=('Time') minor=none;
/* residuals as circles (symbol1), loess fit as a line (symbol2) */
proc gplot data = myout;
   plot DepVar*lenfol=1 Pred*lenfol=2
        / vaxis = axis1 haxis = axis2 vref=0 overlay;
run;
quit;

The plot does not show a strong trend along the original time variable, although the loess estimate suggests a slightly negative slope.

So far we have shown how to use PROC IML to create the scaled Schoenfeld residuals from the Schoenfeld residuals that PROC PHREG provides. We can also use the macro program phreg_score_test to perform the tests, as shown below.

%phreg_score_test(lenfol, fstat, age bmi hr gender, data=whas500);

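The output has four columns:

- Rho is the correlation of the scaled Schoenfeld residuals with the time variable.
- Chi-Square is the test statistic defined previously.
- df gives the degrees of freedom.
- P-value gives the p-value of the test.

The global test simultaneously tests whether all the slopes are zero. All the p-values are fairly large, so there is no evidence against the null hypothesis of zero slopes, that is, no evidence of a violation of the proportional hazards assumption.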
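The Rho and P-value columns can be reproduced with a few lines of code. The minimal Python sketch below follows the macro: Rho is the Pearson correlation between g(t) and one column of scaled residuals over the event observations. The p-value is the upper tail of the χ² distribution, which the macro obtains as 1 - probchi(chi2, df). The function names rho and chi2_sf are hypothetical. chi2_sf uses the standard recurrence in the degrees of freedom, so it is valid for integer degrees of freedom only.

```python
import math


def rho(g, scaled):
    """Pearson correlation of g(t) with one column of scaled residuals."""
    n = len(g)
    g_bar = sum(g) / n
    s_bar = sum(scaled) / n
    gc = [x - g_bar for x in g]
    sc = [s - s_bar for s in scaled]
    num = sum(a * b for a, b in zip(gc, sc))
    return num / math.sqrt(sum(a * a for a in gc) * sum(b * b for b in sc))


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    # Start from df = 1 or df = 2, then step up by 2:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    k = 1 if df % 2 == 1 else 2
    q = math.erfc(math.sqrt(x / 2)) if k == 1 else math.exp(-x / 2)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


print(round(chi2_sf(3.841, 1), 3))                           # 0.05
print(round(rho([1, 2, 3, 4], [0.1, 0.0, -0.1, -0.2]), 3))   # -1.0
```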

REMARK

Several common transformations of the time variable are available: rank, log and the Kaplan-Meier estimate. The default transformation in the macro program phreg_score_test is the identity function. To specify another transformation of time, use the type= option as shown in the examples below.

Some simulations have shown that the log-transformed time variable works well. In other situations, however, the different time variables do behave differently. The choice of time variable has to be made case by case, depending largely on the theory and focus of the researchers.

%phreg_score_test(lenfol, fstat, bmifp1 bmifp2, data=whas500);
%phreg_score_test(lenfol, fstat, bmifp1 bmifp2, data=whas500, type="rank");
%phreg_score_test(lenfol, fstat, bmifp1 bmifp2, data=whas500, type="logtime");
%phreg_score_test(lenfol, fstat, bmifp1 bmifp2, data=whas500, type="km");
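To illustrate what the four choices of type= mean, the minimal Python sketch below computes g(t) for the event observations under each transformation. It is an illustration under assumptions, not the macro's code:

- The function names are hypothetical.
- Tied event times receive average ranks.
- The Kaplan-Meier estimate S(t) is computed from all observations and evaluated at each event time.

The score test is unchanged when g(t) is replaced by a + b·g(t) for constants a and b ≠ 0. Using 1 − S(t) instead of S(t) therefore gives the same chi-square; only the sign of Rho changes.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time (dict t -> S(t))."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, s, i = {}, 1.0, 0
    while i < len(data):
        t = data[i][0]
        j, deaths = i, 0
        while j < len(data) and data[j][0] == t:   # group tied times
            deaths += data[j][1]
            j += 1
        if deaths > 0:
            s *= 1 - deaths / at_risk
            surv[t] = s
        at_risk -= j - i                           # events and censorings leave the risk set
        i = j
    return surv


def transform_time(times, events, kind="time"):
    """g(t) at the sorted event times for kind in {"time", "logtime", "rank", "km"}."""
    event_times = sorted(t for t, e in zip(times, events) if e == 1)
    if kind == "time":
        return event_times
    if kind == "logtime":
        return [math.log(t) for t in event_times]  # requires positive times
    if kind == "rank":
        # average rank among the event times, so ties share a rank
        return [sum(u < t for u in event_times) + (sum(u == t for u in event_times) + 1) / 2
                for t in event_times]
    if kind == "km":
        s = kaplan_meier(times, events)
        return [s[t] for t in event_times]
    raise ValueError("unknown kind: " + kind)


times = [2, 3, 3, 5, 8]
events = [1, 0, 1, 1, 0]
print(transform_time(times, events, "rank"))   # [1.0, 2.0, 3.0]
```

The rows of g(t) come out sorted by event time, so the residual rows must be sorted the same way, as the macro does with PROC SORT.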

Below is a segment of the macro program that shows what is involved in the computation.

%macro phreg_score_test(time, event, xvars, strata,
                        weight=, data=_last_, type="time");

%let xvar_r =;
%let k = 1;
%let v = %scan(&xvars, 1);
%do %while ("&v"~="");
%let xvar_r = &xvar_r &v._r;
%let k = %eval(&k + 1);
%let v = %scan(&xvars, &k);
%end;
%let varnames = &time &xvars &xvar_r;

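ods listing close;
proc phreg data=&data covout outest=_est_ (drop=_LNLIKE_);
model &time*&event(0) = &xvars;
/* an empty STRATA statement is a syntax error, so add it only when needed */
%if %length(&strata) > 0 %then %do;
strata &strata;
%end;
output out = _res_ (where = (&event=1)) ressch = &xvar_r;
run;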

proc sort data = _res_;
by &time;
run;
/*counting the number of total events*/
proc sql noprint;
select sum(&event) into :delta
from &data;
quit;

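ods listing;
proc iml;
reset noname printadv = 1;
use _res_;
read all variables {&xvar_r} into S;
/* _tvars_ is created in a part of the macro not shown here; it holds
   the time variable and its transformations */
use _tvars_;
read all variables {&time _logtime _Rtime s} into T;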
c = ncol(S);
r = nrow(S);
use _est_;
read all var _num_ into V where (_TYPE_^="PARMS");
read all var {_name_} into N where (_TYPE_^="PARMS");

sv = J(r, c, 0);
sv = &delta*S*V;

%if (%upcase(&type)="TIME") %then %do;
gbar = sum(T[,1])/&delta;
top = J(c, 1, 0);
do i = 1 to c;
top[i] = sum((T[, 1]-gbar)#sv[, i])**2;
end;
bottom = J(c, 1, 1);
do i = 1 to c;
bottom[i] = &delta*t(T[,1]-gbar)*(T[,1]-gbar)*V[i,i];
end;
chi2 = top/bottom;
X = J(c+1, 4, 0);
print "Score test of proportional hazards assumption";
print "Time variable: &time";

ct = T[, 1] - sum(T[,1])/&delta;
norm_ct = sqrt(t(ct)*ct);
do i = 1 to c;
csv = sv[, i] - sum(sv[,i])/&delta;
n_csv = sqrt(t(csv)*csv);
X[i, 1] = t(ct)*csv/(norm_ct*n_csv); /*correlation*/
X[i, 2] = chi2[i]; /*cstat*/
X[i, 3] = 1;
X[i, 4] = 1- probchi(chi2[i], 1); /*probchi2*/
end; /* individual test*/
rname = N//"Global test";
cname={"Rho" "Chi-Square" "df" "P-value"};

rowmat = J(1,c,1);
a = (T[,1]-gbar)#S;
do i = 1 to c;
rowmat[i] =sum(a[, i]);
end;
global = &delta*(rowmat*V*t(rowmat))/(t(T[,1]-gbar)*(T[,1]-gbar));
probchi2=1-probchi(global,c);
X[c+1, 1] = .;
X[c+1, 2] = global;
X[c+1, 3] = c;
X[c+1, 4] = probchi2;
print x[rowname=rname colname=cname format=12.3];
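%end;
/* The branches for type="rank", "logtime" and "km" follow here in the
   full macro; they are omitted from this excerpt. */
quit;
%mend phreg_score_test;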

CONCLUSION

This paper offers an implementation of the test of proportional hazards based on scaled Schoenfeld residuals. The implementation uses PROC IML and is embedded in a macro program. It offers both tests on individual predictors and a global test on all the variables of interest at once, with four different transformations of the time variable. The macro program can be downloaded following the link. For more examples of using this macro program, visit the textbook examples page for Chapter 6 of "Applied Survival Analysis" created by the Statistical Consulting Group at UCLA.

REFERENCES
P. M. Grambsch and T. M. Therneau. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika, 81: 515-526, 1994.
D. W. Hosmer, Jr., S. Lemeshow and S. May. Applied Survival Analysis: Regression Modeling of Time to Event Data, 2nd Edition, 2008.
T. M. Therneau and P. M. Grambsch. Modeling Survival Data: Extending the Cox Model. Springer-Verlag, New York, 2000.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Xiao Chen
Statistical Consulting Group
UCLA Academic Technology Services
5308 Math Sciences Box 951557
Los Angeles, CA 90095
Work Phone: (310) 825-7431
Fax: (310) 206-7025
E-mail: xiao.chen@ucla.edu
Web: www.ats.ucla.edu/stat/

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
