Shlomo S. Sawilowsky
Editor
College of Education
Wayne State University
Harvey Keselman
Associate Editor
Department of Psychology
University of Manitoba
Bruno D. Zumbo
Associate Editor
Measurement, Evaluation, & Research Methodology
University of British Columbia
Vance W. Berger
Assistant Editor
Biometry Research Group
National Cancer Institute
John L. Cuzzocrea
Assistant Editor
Educational Research
University of Akron
Todd C. Headrick
Assistant Editor
Educational Psychology and Special Education
Southern Illinois University-Carbondale
Alan Klockars
Assistant Editor
Educational Psychology
University of Washington
בס״ד
Journal of Modern Applied Statistical Methods, November 2009, Vol. 8, No. 2, pp. 355-669. Copyright © 2009 JMASM, Inc. ISSN 1538-9472/09/$95.00.
Regular Articles
360-376  Suzanne R. Doyle  Examples of Computing Power for Zero-Inflated and Overdispersed Count Data
377-383  Daniel R. Thompson  Assessing Trends: Monte Carlo Trials with Four Different Regression Methods
384-395  Marie Ng, Rand R. Wilcox  Level Robust Methods Based on the Least Squares Regression Estimator
505-510  Paul Turner  Multiple Search Paths and the General-To-Specific Methodology
511-519  James F. Reed III  Closed Form Confidence Intervals for Small Sample Matched Proportions
566-570  Madhusudan Bhandary, Debasis Kundu  Test for the Equality of the Number of Signals
Brief Reports
592-593  Markus Neuhäuser  A Maximum Test for the Analysis of Ordered Categorical Data
Emerging Scholars
600-609  Lindsey J. Wolff Smith, S. Natasha Beretvas  Estimation of the Standardized Mean Difference for Repeated Measures Designs
Journal of Modern Applied Statistical Methods, November 2009, Vol. 8, No. 2, pp. 355-359. Copyright © 2009 JMASM, Inc. ISSN 1538-9472/09/$95.00.
INVITED ARTICLE
Analysis of Multifactor Experimental Designs
Phillip Good
Information Research
Huntington Beach, CA.
In the one-factor case, Good and Lunneborg (2006) showed that the permutation test is superior to the analysis of variance. In the multi-factor case, simulations reveal the reverse is true. The analysis of variance is remarkably robust against departures from normality, including instances in which data are drawn from mixtures of normal distributions or from Weibull distributions. The traditional permutation test based on all rearrangements of the data labels is not exact and is more powerful than the analysis of variance only for 2xC designs or when there is only a single significant effect. Permutation tests restricted to synchronized permutations are exact, but lack power.
Key words: analysis of variance, permutation tests, synchronized permutations, exact tests, robust tests,
two-way experimental designs.
powerful than the analysis of variance when samples are taken from non-normal distributions. For example, when the data in a four-sample, one-factor comparison are drawn from mixtures of normal distributions, 50% N(δ, 1) and 50% N(1+δ, 1), in an unbalanced design with 2, 3, 3, and 4 observations per cell, the permutation test was more powerful at the 10% level, with a power of 86% against a shift in means of two units compared to 65% for the analysis of variance.

Unfortunately, the permutation test for interaction in a two-factor experimental design based on the set of all possible rearrangements among the cells is not exact. The residual errors are not exchangeable, nor are the p-values of such permutation tests for main effects and interactions independent of one another. Here is why:

Suppose the observations satisfy a linear model, Xijm = μ + si + rj + (sr)ij + εijm, where the residual errors {εijm} are independent and identically distributed. To test the hypothesis of no interaction, first eliminate row and column effects by subtracting the row and column means from the original observations. That is, set

X′ijm = Xijm − X̄i.. − X̄.j. + X̄...

where adding the grand mean X̄... ensures the overall sum will be zero. Recall that each row mean X̄i.. equals μ + si plus the corresponding averages of the interaction and error terms (and similarly for X̄.j. and X̄...), or, in terms of the original linear model, that

X′ijm = [(sr)ij − (sr)i. − (sr).j + (sr)..] + [εijm − ε̄i.. − ε̄.j. + ε̄...]

where a dot denotes averaging over the corresponding index. However, this means that two residuals in the same row, such as X′i11 and X′i23, will be correlated, while residuals taken from different rows and columns will not be. Thus, the residuals are not exchangeable, a necessary requirement for tests based on a permutation distribution to be exact and independent of one another (see, for example, Good, 2002).

An alternative approach, first advanced by Salmaso and later published by Pesarin (2001) and Good (2002), is to restrict the permutation set to synchronized permutations in which, for example, an exchange between rows in one column is duplicated by exchanges between the same rows in all the other columns so as to preserve the exchangeability of the residuals.

The purpose of this article is to compare the power of ANOVA tests with those of permutation tests (synchronized and unsynchronized) when applied to two-factor experimental designs.

Methodology
Observations were drawn from one of the following three distributions:
1. Normal.
2. Weibull, because such distributions arise in reliability and survival analysis and cannot be readily transformed to normal distributions. A shape parameter of 1.5 was specified.
3. Contaminated normal, both because such mixtures of distributions are common in practice and because they cannot be readily transformed to normal distributions. In line with findings in an earlier article in this series, Good and Lunneborg (2006), we focused on the worst case distribution, a mixture of 70% N(0, 1) and 30% N(2, 2) observations.

Designs with the following effects were studied:

a)
   +δ  0
   +δ  0

b)
   +δ  0
   0   +δ

c)
   +δ  0  …  −δ
   +δ  0  …  −δ

d)
   +δ  0  …  0  −δ
   −δ  0  …  0  +δ

e)
   0  +δ  0
   0  +δ  0
   0  +δ  0

f)
   0   +δ  0
   0   0   0
   +δ  0   0

g)
   +δ  0   0
   0   +δ  0
   0   0   +δ

h)
   0  +δ  0  0  −δ
   0  +δ  0  0  −δ
   0  +δ  0  0  −δ
   0  +δ  0  0  −δ

i)
   1  δ  1  1  δ
   1  δ  1  1  δ
   1  δ  1  1  δ
   1  δ  1  1  δ

To compare the results of the three methodologies, 1,000 data sets were generated at random for each design and each alternative (δ = 0, 1, or 2). p-values for the permutation tests were obtained by Monte Carlo means using a minimum of 400 random (synchronized or unsynchronized) permutations per data set. The alpha level was set at 10%. (The exception being the 2x2 designs with 3 observations per cell, where the highly discrete nature of the synchronized permutation distribution forced adoption of an 11.2% level.) The simulations were programmed in R. Test results for the analysis of variance were derived using the anres() function. R code for the permutation tests and the data generators is posted at: http://statcourse.com/AnovPower.txt.

Results
Summary
In line with Jagers' (1980) theoretical results, the analysis of variance (ANOVA) applied to RxC experimental designs was found to yield almost exact tests even when data are drawn from mixtures of normal populations or from a Weibull distribution. This result holds whether the design is balanced or unbalanced. Of course, because the ANOVA tests for main effects and interaction share a common denominator - the within sum of squares - the resultant p-values are positively correlated. Thus a real non-zero main effect may be obscured by the presence of a spuriously significant interaction.

Although tests based on synchronized permutations are both exact and independent of one another, there are so few synchronized permutations with small samples that these tests lack power. For example, in a 2x2 design with 3 observations per cell, there are only 9 distinct values of each of the test statistics.

Fortunately, tests based on the entire set of permutations, unsynchronized as well as synchronized, prove to be almost exact. Moreover, these permutation tests for main effects and interaction are negatively correlated. The result is an increase in power if only one effect is present, but a loss in power if there are multiple effects. These permutation tests are more powerful than ANOVA tests when the data are drawn from mixtures of normal populations or from a Weibull distribution. They are as powerful, even with data drawn from normal distributions, with samples of n ≥ 5 per cell.

2xK Design
In a 2x2 design with 3 observations per cell, restricting the permutation distribution to synchronized permutations means there are only 9 distinct values of each of the test statistics. The resultant tests lack power, as do the tests based on synchronized permutations for 2x5 designs with as many as five observations per cell. For example, in a 2x4 design with four observations per cell, the synchronized permutation test had a power of 53% against a shift of two units when the data were drawn from a contaminated normal, while the power of the equivalent ANOVA test was 61%. As a result of these
Discussion
Apart from 2xC designs, there appears to be little advantage to performing alternatives to the standard analysis of variance. The permutation tests are more powerful if only a single effect is present, but how often can this be guaranteed? Even with 2xC designs, the results reported here will be of little practical value until and unless permutation methods are incorporated in standard commercial packages. Wheeler suggests in a personal communication that if a package possesses a macro-language, a vector permutation command and an ANOVA routine, a permutation test for the multi-factor design can be readily assembled as follows (an illustrative sketch of these steps appears after the references):

1. Use the ANOVA command applied to the original data set to generate the sums of squares used in the denominators of the tests of the various effects.
2. Set up a loop and perform the following steps repeatedly:
   a. Rearrange the data.
   b. Use the ANOVA command applied to the rearranged data set to generate the sums of squares used in the denominators of the tests of the various effects.
   c. Compare these sums with the sums for the original data set.
3. Record the p-values as the percentage of rearrangements in which the new sum equaled or exceeded the value of the original.

References
David, F. N., & Johnson, N. L. (1951). The effect of non-normality on the power function of the f-test in the analysis of variance. Biometrika, 38, 43-57.
Good, P. (2002). Extensions of the concept of exchangeability and their applications. Journal of Modern Applied Statistical Methods, 1, 243-247.
Good, P. (2005). Permutation, parametric, and bootstrap tests of hypotheses (3rd Ed.). NY: Springer.
Good, P., & Lunneborg, C. E. (2005). Limitations of the analysis of variance. I: The one-way design. Journal of Modern Applied Statistical Methods, 5(1), 41-43.
Jagers, P. (1980). Invariance in the linear model - an argument for chi-square and f tests in nonnormal situations. Mathematische Operationsforschung und Statistik, 11, 455-464.
Pesarin, F. (2001). Multivariate permutation tests. NY: Wiley.
Salmaso, L. (2003). Synchronized permutation tests in 2^k factorial designs. Communications in Statistics - Theory and Methods, 32, 1419-1438.
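Wheeler's recipe can be realized in a few lines of R. The sketch below is illustrative only (it is not the code posted at the URL above): it records the F statistic of each effect rather than the raw sums of squares, an equivalent way to rank rearrangements, and the example design, effect size and contaminated normal error are taken from the simulations described earlier.

    # Illustrative permutation ANOVA for a two-way design (not the author's
    # posted code). p-values for A, B and A:B are the proportion of random
    # rearrangements whose F statistic equals or exceeds the original one.
    perm_anova <- function(y, A, B, n_perm = 400) {
      fstats <- function(resp) summary(aov(resp ~ A * B))[[1]][["F value"]][1:3]
      f_obs <- fstats(y)
      exceed <- numeric(3)
      for (i in seq_len(n_perm)) {
        exceed <- exceed + (fstats(sample(y)) >= f_obs)   # rearrange the labels
      }
      setNames(exceed / n_perm, c("A", "B", "A:B"))
    }

    # Example: 2x2 design, 3 observations per cell, contaminated normal errors
    # (70% N(0,1), 30% N(2, 2)) and a single main effect of size delta = 1.
    set.seed(1)
    A <- gl(2, 6); B <- gl(2, 3, 12)
    e <- ifelse(runif(12) < 0.7, rnorm(12), rnorm(12, mean = 2, sd = sqrt(2)))
    y <- ifelse(A == "1", 1, 0) + e
    perm_anova(y, A, B)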
Journal of Modern Applied Statistical Methods, November 2009, Vol. 8, No. 2, pp. 360-376. Copyright © 2009 JMASM, Inc. ISSN 1538-9472/09/$95.00.
REGULAR ARTICLES
Examples of Computing Power for Zero-Inflated and Overdispersed Count Data
Suzanne R. Doyle
University of Washington
Examples of zero-inflated Poisson and negative binomial regression models were used to demonstrate
conditional power estimation, utilizing the method of an expanded data set derived from probability
weights based on assumed regression parameter values. SAS code is provided to calculate power for
models with a binary or continuous covariate associated with zero-inflation.
Key words: Conditional power, Wald statistic, zero-inflation, over-dispersion, Poisson, negative
binomial.
With the ZIP and ZIP(τ) models, the non-zero counts are assumed to demonstrate equi-dispersion. However, if there is zero-inflation and the non-zero counts are over-dispersed in relation to the Poisson distribution, parameter estimates will be biased and an alternative distribution, such as the zero-inflated negative binomial regression models, ZINB or ZINB(τ), is more appropriate (Greene, 1994). Similar to the zero-inflated Poisson models, ZINB allows for different covariates and ZINB(τ) permits the same covariates between the logistic portion for zero counts and the negative binomial distribution for non-zero counts.

In this study, the use of an expanded data set and the method of calculating conditional power as presented by Lyles, et al. (2007) is extended to include the ZIP, ZIP(τ), ZINB and ZINB(τ) models. Examples allow for the use of a binary or a normally-distributed continuous covariate associated with the zero-inflation. Simulations were conducted to assess the accuracy of calculated power estimates, and example SAS software programs (SAS Institute, 2004) are provided.

Methodology
Model and Hypothesis Testing
Following directly from Lyles, et al. (2007), hypotheses are framed as H0: Hβ = h0, with H a matrix of full row rank and h0 an (h x 1) constant vector. The Wald test statistic is

W = (Hβ̂ − h0)′[H vâr(β̂)H′]⁻¹(Hβ̂ − h0)   (3)

where β̂ contains the unrestricted maximum likelihood estimates of β. Under H0, (3) is asymptotically distributed as central chi-square with h degrees of freedom (χ²_h). For power calculations, under HA, the Wald test statistic is asymptotically distributed as non-central χ²_h(η), where the non-centrality parameter η is defined as

η = (Hβ − h0)′[H vâr(β̂)H′]⁻¹(Hβ − h0).   (4)

Creating an Expanded Data Set and Computing Conditional Power
To estimate the conditional power given assumed values of N, X and β, an expanded data set is first created by selecting a value of J for the number of possible values of Y with non-negligible probability for any specific xi, such that

Σ_{j=1}^{J} wij = Σ_{j=1}^{J} Pr(Yi = yj | Xi = xi) ≈ 1   (5)
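In R, the power computation implied by (3) and (4) is a one-liner once η is available: under HA the Wald statistic follows a non-central chi-square distribution. The following sketch assumes an illustrative value of η; in the article, the non-centrality parameter is obtained from the variance-covariance matrix of the model fitted to the expanded data set (the article's own programs are the SAS appendices):

    # Power of the Wald test from the non-centrality parameter eta in (4);
    # the value of eta used here is an assumed illustration.
    wald_power <- function(eta, h = 1, alpha = 0.05) {
      crit <- qchisq(1 - alpha, df = h)      # central chi-square critical value
      1 - pchisq(crit, df = h, ncp = eta)    # P(non-central chi-square > crit)
    }
    wald_power(eta = 13, h = 1)              # approximately .95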
defined in (10) beyond their 99% confidence boundaries. The selection decision to discard data sets from consideration did not depend on the values of the regression parameter of interest to be statistically tested.

Simulation values presented are the average regression coefficient and the average standard error (calculated as the square root of the average error variance) out of the 1,000 generated data sets for the parameter estimates of each model. Simulation-based power was calculated as the proportion of Wald tests found statistically significant at α = .05 out of 1,000 randomly generated data sets under each specific model considered. Appendices A through D provide SAS programming code to evaluate a large sample simulation for distributional characteristics, to construct an expanded data set, and to calculate power for models with a binary covariate or a normally-distributed continuous covariate related to the zero-inflation.

To calculate the expanded data set, it was first necessary to choose the initial value of J for each value of xi. This was done by generating a large simulated data set (N = 100,000 for each binary or continuous covariate in the model) based on the same parameter values of the model. To ensure that a reasonable threshold for the sum (e.g., > 0.9999) of the weights in (12) and (13) would be obtained, the initial value of J was increased in one-unit integer increments until the maximum likelihood estimates for the parameters from maximizing the weighted log-likelihood equaled the assumed parameter values of the regression model. The large simulated data set also provided approximations to the population distributional characteristics of each model (mean, variance, and frequencies of each value of the outcome variable yi) and estimates of the percents of zero-inflation.

ZIP(τ), ZIP, ZINB(τ) and ZINB Models with a Binary Variable for Zero-Inflation
Model A-τ and Model B-τ, where τ = 2 and 1, are ZIP(τ) and ZIP models, respectively. Model A-τ is defined as

logit(πi) = −τβ0 − τβ1x and log(λi) = β0 + β1x   (14)

where β0 = 0.6931, β1 = −0.3567, τ = 2 and 1, and x is a binary variable with an equal number of cases coded 0 and 1. The regression coefficients were based on the rate ratio. That is, for the binary covariate x, from the rates of the two groups (λ1 = 2 and λ2 = 1.4), the regression coefficients are β0 = log λ1 and β1 = log(λ2) − log(λ1). With this model, interest is in testing H0: β1 = 0 versus HA: β1 ≠ 0.

Model B-τ is defined as

logit(πi) = γ0 + γ1z and log(λi) = β0 + β1z + β2x   (15)

where β0 = 0.6931, β1 = −0.3567, β2 = −0.3567, γ0 = −τβ0, γ1 = −τβ1, τ = 2 and 1, and x and z are binary covariates with an equal number of cases coded 0 and 1. In this particular example the regression coefficients for the logistic portion of the model (γ0 and γ1) are both a constant multiple of τ, although this is not a necessary requirement for the ZIP model. With this model, interest is in assessing H0: β2 = 0 versus HA: β2 ≠ 0.

The ZINB(τ) and ZINB models consisted of the same parameter estimates as the ZIP(τ) and ZIP models (Model A-τ and Model B-τ described above), but included two values of an extra scale parameter, κ = 0.75 and κ = 1.50. Sample sizes were based on obtaining conditional power estimates of approximately .95 for the regression coefficient tested, with τ = 2 for the ZIP and ZIP(τ) models, and with τ = 2 and κ = 0.75 for the ZINB and ZINB(τ) models. SAS code to evaluate a large sample simulation for distributional characteristics, to construct an expanded data set and to calculate power for models with a binary covariate related to the zero-inflation is presented in Appendices A and B, for the Poisson and negative binomial regression models, respectively.
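The probability weights that define the expanded data set follow directly from the ZIP mixture form. The sketch below, written in R for brevity (the article's programs are in SAS), uses the Model A-τ parameterization above; the function name and the starting value J = 15 are illustrative assumptions:

    # ZIP weights w_ij = Pr(Y = y | x) under Model A-tau:
    # logit(pi) = -tau*(b0 + b1*x), log(lambda) = b0 + b1*x.
    zip_weights <- function(x, J, b0 = 0.6931, b1 = -0.3567, tau = 2) {
      lambda <- exp(b0 + b1 * x)
      pi0 <- plogis(-tau * (b0 + b1 * x))    # probability of an excess zero
      y <- 0:J
      w <- (1 - pi0) * dpois(y, lambda)
      w[1] <- w[1] + pi0                     # excess zeros enter at y = 0
      data.frame(y = y, w = w)
    }
    w <- zip_weights(x = 1, J = 15)
    sum(w$w)    # increase J until this sum exceeds the threshold, e.g. 0.9999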
The outcomes for the ZIP models presented at the bottom of Table 1 show that with a sample size of N = 488, when τ = 2, there is approximately .95 power and 27.0% zero-inflation for testing HA: β2 ≠ 0. Again, as τ decreases, the calculated power is reduced to approximately .90, with 37.5% estimated zero-inflation.

ZINB(τ) Models with a Binary Covariate Associated with the Zero-Inflation
The results of the ZINB(τ) models with a binary covariate associated with the zero-inflation, presented at the top of Table 2, indicate that with a sample size of N = 464, when κ = 0.75 and τ = 2, there is approximately .95 power and 27.0% zero-inflation for testing HA: β1 ≠ 0. As τ decreases to 1, the calculated power is reduced to approximately .80, with 37.5% estimated zero-inflation. When over-dispersion of the non-zero counts increases (κ = 1.50), power is reduced to approximately .80 when τ = 2, and .59 when τ = 1.

In most cases, the simulated (Sim.) values and power estimates closely match the calculated (Cal.) parameters, except for a slight tendency for the simulated data to result in an inflated average standard error (στ) associated with the parameter estimates for τ and for the logistic portion of the model involving zero-inflation (γ0 and γ1), and in lower than expected values for the scale or over-dispersion parameter κ.

ZIP(τ), ZIP, ZINB(τ) and ZINB Models with a Continuous Variable for Zero-Inflation
Model C-τ and Model D-τ, where τ = 2 and 1, are ZIP(τ) and ZIP models, respectively. Model C-τ is defined as

logit(πi) = −τβ0 − τβ1z and log(λi) = β0 + β1z   (16)

where β0 = 0.5000, β1 = −0.1500, τ = 2 and 1, and z is a continuous variable distributed as N(0,1). These are the same parameter estimates of β0 and β1 used by Lyles, et al. (2007) in their example of standard Poisson regression with one continuous covariate. With this ZIP(τ) model, interest is in assessing H0: β1 = 0 versus HA: β1 ≠ 0. Model D-τ is defined as

logit(πi) = γ0 + γ1z and log(λi) = β0 + β1z + β2x   (17)

where β0 = 0.5000, β1 = −0.1500, β2 = −0.3000, γ0 = −τβ0, γ1 = −τβ1, τ = 2 and 1, x is a binary variable with an equal number of cases coded 0 and 1, and z is a continuous variable distributed as N(0,1).
Table 1: Parameter Estimates with a Binary Covariate for Zero-Inflation and Poisson Regression, ZIP(τ) Models (N = 212)

            Model A-2                   Model A-1
            Calculated   Simulated      Calculated   Simulated
τ           2.0000       2.0622         1.0000       1.0390
στ          0.6169       0.7034         0.4286       0.4792
β0          0.6931       0.6962         0.6931       0.6944
σβ0         0.0891       0.0894         0.0990       0.0995
β1          -0.3567      -0.3565        -0.3567      -0.3535
σβ1         0.0989       0.1005         0.1256       0.1284
β1 Power    .9502        .9450          .8106        .8080

Estimated Zero-Inflation
x = 0       20.24%                      33.70%
x = 1       33.79%                      41.59%
Total       27.02%                      37.65%
With this ZIP model, interest is in testing H0: β2 = 0 versus HA: β2 ≠ 0.

The ZINB(τ) and ZINB models consisted of the same parameter estimates as the ZIP(τ) and ZIP models (Model C-τ and Model D-τ), but included an extra scale parameter, κ = 0.75 and κ = 1.50. SAS programming code to evaluate a large sample simulation for distributional characteristics, to construct an expanded data set, and to calculate power for models with a continuous covariate related to the zero-inflation is presented in Appendices C and D, for the Poisson and negative binomial regression models, respectively.

The results of the ZIP(τ) and ZIP models with a continuous covariate for zero-inflation are presented in Table 3. As before, when τ decreases, based on the same sample size and value of the regression coefficient tested, the calculated power is reduced, and there is also a slight tendency for the simulated data to result in inflated average parameter estimates for the standard error (στ) with the ZIP(τ) models, and in inflated average parameter estimates of the standard errors for the logistic portion involving zero-inflation (σγ0 and σγ1) with the ZIP models.

The results of the ZINB(τ) and ZINB models with a continuous covariate for zero-inflation are presented in Table 4. Similar to the results previously presented, based on the same sample size and value of the regression coefficient tested, when τ decreases and/or when overdispersion of the non-zero counts increases, the calculated power is reduced. There is a slight tendency for simulated data to result in inflated average standard errors (στ) for the parameter estimates of τ with the ZINB(τ) models, and in inflated average standard errors (σγ0 and σγ1) for the logistic portion involving zero-inflation (γ0 and γ1) with the ZINB models.

Conclusion
Examples of ZIP, ZIP(τ), ZINB and ZINB(τ) models were used to extend the method of estimating conditional power presented by Lyles, et al. (2007) to zero-inflated count data. Utilizing the variance-covariance matrix of the model fitted to an expanded data set, power was estimated for the Wald statistic. Although not presented here, this method can also be used to approximate power based on the likelihood ratio test. Overall, with the same sample size and parameter value of the estimate of interest to be tested with the Wald test statistic, results indicated a decrease in power as the percent of zero-inflation and/or over-dispersion increased. This trend was particularly noticeable for the ZIP(τ) and ZINB(τ) models. Calculated power estimates indicate that if the percent of zero-inflation or over-dispersion is underestimated, a loss of assumed power in the statistical test will result.

To estimate power for zero-inflated count data it is necessary to select a value of τ for the ZIP(τ) and ZINB(τ) models, or values of the regression coefficients associated with the logistic portion in the ZIP and ZINB models (i.e., γ0 and γ1), to produce the correct assumed proportion of zero-inflation. In practice, however, these parameter values may be unknown or difficult to estimate. Generating a large simulated data set iteratively until the expected percent of zero-inflation occurs can aid the researcher in obtaining approximations to the population distributional characteristics of the model, and the estimation of the parameter values associated with zero-inflation can be improved.

References
Cameron, A. C., & Trivedi, P. K. (2006). Regression analysis of count data. New York: Cambridge University Press.
Greene, W. H. (1994). Accounting for excess zeros and sample selection in Poisson and negative binomial regression models. Stern School of Business, New York University, Dept. of Economics Working Paper, No. EC-94-10.
Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1-14.
Lord, D. (2006). Modeling motor vehicle crashes using Poisson-gamma models: Examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter. Accident Analysis and Prevention, 38, 751-766.
Journal of Modern Applied Statistical Methods, November 2009, Vol. 8, No. 2, pp. 377-383. Copyright © 2009 JMASM, Inc. ISSN 1538-9472/09/$95.00.
Assessing Trends:
Monte Carlo Trials with Four Different Regression Methods
Daniel R. Thompson
Florida Department of Health
Ordinary Least Squares (OLS), Poisson, Negative Binomial, and Quasi-Poisson Regression methods were
assessed for testing the statistical significance of a trend by performing 10,000 simulations. The Poisson
method should be used when data follow a Poisson distribution. The other methods should be used when
data follow a normal distribution.
Key words: Monte Carlo, simulation, ordinary least squares regression, Poisson regression, negative binomial regression, quasi-Poisson regression.
Table 1: Means and Standard Deviations from Four Different Sets of Numerators and Denominators Taken from Florida Community Health Statistics 1996-2005
and, as described above, the four methods were used to test for significant trends.

Results
The tables below give the results of the Monte Carlo trials. In the simulations where the null hypothesis of no trend was true, the statistical tests using the OLS, quasi-Poisson and negative binomial regression methods performed well when the data were normally distributed and also when the data followed a Poisson distribution. As expected, with an alpha level of 0.05, approximately 2.5% of the trials reached statistically high significance and approximately 2.5% reached statistically low significance (Tables 2 through 5). In contrast, the Poisson regression method performed well only when the numerators and denominators followed the Poisson distribution. In simulations where the data followed a normal distribution, and the null hypothesis of no trend was true, the Poisson regression method indicated statistical significance in far more than 5% of the simulations (Tables 2 through 5). The results for the Poisson method were better for the smaller data sets.

For example, in the test data set with the largest numerators and denominators (Table 2) the Poisson method indicated a significant trend in almost 90% (45.57% significantly low plus 44.24% significantly high) of the simulations where the null hypothesis of no trend was true, while the other 3 methods indicated a significant trend in a proportion close to the alpha level of 5%. The Poisson method performed better as the size of the numerators and denominators in the test data sets became smaller. For the data set with the smallest numerators and denominators (Table 5) the Poisson method indicated significance in 8.24% of the simulations where the null hypothesis was true, which is much closer to the desired alpha level of 5%.

In the results for the simulations where the null hypothesis of no trend was false (Tables 2 through 5), three of the four methods performed about equally when the data were normally distributed. In contrast, the Poisson regression method detected the trend in larger proportions of the trials. For example, in Table 2, for the simulations with the normally distributed data where the null hypothesis was false, the Poisson method detected a significant trend in 68.66% of the simulations. The other 3 methods all detected a significant trend in about 8% of the simulations.

Based on these data, it appears the Poisson method is more likely to detect a trend when the null hypothesis of no trend is false, but, as shown in Tables 2 through 5, the Poisson method is also more likely to detect a trend when the null hypothesis of no trend is true. In short, with normally distributed data, the Poisson method is more likely to detect a trend when a trend is present and also when a trend is not present.

When the data followed a Poisson distribution, and the null hypothesis of no trend was false, the Poisson method was more likely to detect a significant trend compared to the other 3 methods. For example, in Table 3, in the simulations where the null hypothesis is false, the Poisson method detected a trend in 94.04% of the Poisson simulations, while the other 3
Table 2: Results of 10,000 Simulations of Florida Injury Mortality Rate Trends by Statistical Method and Distribution* Characteristics

Method | Test Data Distribution | Null Hypothesis: No Trend** | Percent Significantly Low | Percent Not Significant | Percent Significantly High

* Simulated 10 years of Florida injury mortality rates with randomly generated numerators at mean 10,293 and denominators at mean 16,275,762. For the random normal data, the standard deviations were 1,311 for the numerators and 1,119,822 for the denominators. For the random Poisson data, the standard deviations were the square roots of the means.
** Where Null Hypothesis of no trend = FALSE, average trend = 0.01 increase per year.
Table 3: Results of 10,000 Simulations of Florida Infant Mortality Rate Trends by Statistical Method and Distribution* Characteristics

Method | Test Data Distribution | Null Hypothesis: No Trend** | Percent Significantly Low | Percent Not Significant | Percent Significantly High

* Simulated 10 years of Florida infant death rates with randomly generated numerators at mean 1,483 and denominators at mean 204,609. For the random normal data, the standard deviations were 87.8 for the numerators and 11,707 for the denominators. For the random Poisson data, the standard deviations were the square roots of the means.
** Where Null Hypothesis of no trend = FALSE, average trend = 0.01 increase per year.
Table 4: Results of 10,000 Simulations of Low Birth Weight Rate Trends for a Florida County by Statistical Method and Distribution* Characteristics

Method | Test Data Distribution | Null Hypothesis: No Trend** | Percent Significantly Low | Percent Not Significant | Percent Significantly High

* Simulated 10 years of one Florida county's low birth weight rates with randomly generated numerators at mean 274.1 and denominators at mean 2983.6. For the random normal data, the standard deviations were 27.12 for the numerators and 117.04 for the denominators. For the random Poisson data, the standard deviations were the square roots of the means.
** Where Null Hypothesis of no trend = FALSE, average trend = 0.01 increase per year.
than the other three methods. When the test data followed a Poisson distribution and the null hypothesis of no trend was true, the Poisson regression method performed as well as the other three methods. However, in the simulations where the null hypothesis of no trend was true and the data followed a normal distribution, the Poisson regression method was far too likely to result in statistical significance, while the other three methods resulted in statistical significance in proportions close to the alpha level of 0.05. In summary, the Poisson method performed as well as or better than the other methods when the simulated data followed a Poisson distribution, but did not perform as well as the other methods when the simulated data followed a normal distribution.

One of the defining characteristics of the Poisson distribution is that the mean is equal to the variance. In situations where the variance exceeds the mean (referred to as over-dispersion), Poisson regression will tend to underestimate the variance and thereby increase the probability that random results are deemed statistically significant.

Based on the results of this analysis, one recommendation is that data should be examined to assess whether they follow a Poisson distribution, and the Poisson regression method should be used only when this condition is met. In practical terms, when using the Poisson regression method, the mean should be approximately equal to the variance. When this is not the case, it would probably be better to use
Table 5: Results of 10,000 Simulations of Infant Mortality Trends for a Florida County by Statistical Method and Distribution* Characteristics

Method | Test Data Distribution | Null Hypothesis: No Trend** | Percent Significantly Low | Percent Not Significant | Percent Significantly High

* Simulated 10 years of one Florida county's infant mortality rates with randomly generated numerators at mean 27.5 and denominators at mean 2983.6. For the random normal data, the standard deviations were 5.82 for the numerators and 117.04 for the denominators. For the random Poisson data, the standard deviations were the square roots of the means.
** Where Null Hypothesis of no trend = FALSE, average trend = 0.01 increase per year.
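For readers who want to reproduce a trial of this kind, the following R sketch generates one simulated series using the means from the Table 3 note and tests the trend with Poisson and quasi-Poisson regression. Encoding the rate through an offset is one natural choice and is assumed here; this is a sketch, not the author's code:

    # One simulated trial: 10 years of counts and denominators with
    # Poisson-distributed numerators and denominators (means per Table 3 note).
    set.seed(42)
    year <- 1:10
    num <- rpois(10, lambda = 1483)
    den <- rpois(10, lambda = 204609)
    pois  <- glm(num ~ year + offset(log(den)), family = poisson)
    quasi <- glm(num ~ year + offset(log(den)), family = quasipoisson)
    summary(pois)$coefficients["year", "Pr(>|z|)"]    # Poisson p-value
    summary(quasi)$coefficients["year", "Pr(>|t|)"]   # quasi-Poisson p-value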
Journal of Modern Applied Statistical Methods, November 2009, Vol. 8, No. 2, pp. 384-395. Copyright © 2009 JMASM, Inc. ISSN 1538-9472/09/$95.00.

Level Robust Methods Based on the Least Squares Regression Estimator
Marie Ng, Rand R. Wilcox
Heteroscedasticity consistent covariance matrix (HCCM) estimators provide ways for testing hypotheses
about regression coefficients under heteroscedasticity. Recent studies have found that methods combining
the HCCM-based test statistic with the wild bootstrap consistently perform better than non-bootstrap
HCCM-based methods (Davidson & Flachaire, 2008; Flachaire, 2005; Godfrey, 2006). This finding is
more closely examined by considering a broader range of situations which were not included in any of the
previous studies. In addition, the latest version of HCCM, HC5 (Cribari-Neto, et al., 2007), is evaluated.
and an unbiased estimate of the variance of coefficients even under heteroscedasticity (White, 1980; MacKinnon & White, 1985). Among all the HCCM estimators, HC4 was found to perform fairly well with small samples (Long & Ervin, 2000; Cribari-Neto, 2004). Recently, Cribari-Neto, et al. (2007) introduced a new version of HCCM (HC5), arguing that HC5 is better than HC4, particularly at handling high leverage points in X. However, in their simulations, they focused only on models with ε ∼ N(0, 1). Moreover, only a limited number of distributions of X and patterns of heteroscedasticity were considered. In this study, the performances of HC5-based and HC4-based quasi-t statistics were compared by looking at a broader range of situations.

Methods combining an HCCM-based test statistic with the wild bootstrap method perform considerably better than non-bootstrap asymptotic approaches (Davidson & Flachaire, 2008; Flachaire, 2005). In a recent study, Godfrey (2006) compared several non-bootstrap and wild bootstrap HCCM-based methods for testing multiple coefficients (H0: β1 = . . . = βp = 0). It was found that when testing at the α = 0.05 level, the wild bootstrap methods generally provided better control over the probability of a Type I error than the non-bootstrap asymptotic methods. However, in the studies mentioned, the wild bootstrap and non-bootstrap methods were evaluated in a limited set of simulation scenarios.

In Godfrey's study, data were drawn from a data set in Greene (2003) and only two heteroscedastic conditions were considered. Here, more extensive simulations were performed to investigate the performance of the various bootstrap and non-bootstrap HCCM-based methods. More patterns of heteroscedasticity were considered, as well as more types of distributions for both X and ε. Small sample performance of one non-bootstrap and two wild bootstrap versions of the HC5-based and HC4-based quasi-t methods was evaluated.

Finally, two variations of the wild bootstrap method were compared when generating bootstrap samples. One approach makes use of the lattice distribution. Another approach makes use of a standardized continuous uniform distribution: Uniform(−1, 1). The former approach has been widely considered (Liu, 1988; Davidson & Flachaire, 2000; Godfrey, 2006) and was found to work well in various multiple regression situations. Of interest is how these two approaches compare when testing (2) in simple regression models. Situations were identified where wild bootstrap methods were unsatisfactory.

Methodology
HC5-Based Quasi-T Test (HC5-T)
The HC5 quasi-t statistic is based on the standard error estimator HC5, which is given by

V = (X′X)⁻¹ X′ diag[ri² / √((1 − hii)^αi)] X (X′X)⁻¹,   (3)

where ri, i = 1, ..., n, are the usual residuals, X is the design matrix,

αi = min{hii/h̄, max(4, k·hmax/h̄)} = min{n·hii/Σ_{i=1}^{n} hii, max(4, n·k·hmax/Σ_{i=1}^{n} hii)}   (4)

and

hii = xi(X′X)⁻¹xi′,   (5)

where xi is the ith row of X, hmax = max{h11, . . . , hnn}, and k is set at 0.7 as suggested by Cribari-Neto, et al. (2007, 2008). The motivation behind HC5 is that when high leverage observations are present in X, the standard errors of the coefficients are often underestimated. HC5 attempts to correct such a bias by taking into account the maximal leverage.

For testing (2), the quasi-t test statistic is

T = (β̂1 − 0)/√V22   (6)

where V22 is the second entry along the diagonal of V. Reject (2) if |T| ≥ t(1−α/2), where t(1−α/2) is the 1 − α/2 quantile of the Student's t-distribution with n − 2 degrees of freedom.

HC4-Based Quasi-T Test (HC4-T)
The HC4 quasi-t statistic is similar to HC5-T, except the standard error is estimated using HC4, which is given by
V = (X′X)⁻¹ X′ diag[ri²/(1 − hii)^δi] X (X′X)⁻¹   (7)

where

δi = min{4, hii/h̄} = min{4, n·hii/Σ_{i=1}^{n} hii}.   (8)

HC5-Based Wild Bootstrap Quasi-T Test (HC5WB-D and HC5WB-C)
The test statistic for testing (2) is computed using the following steps:

1. Compute the HC5 quasi-t test statistic (T) given by (6).
2. Construct a bootstrap sample Yi* = β̂0 + β̂1Xi + airi, i = 1, ..., n, where ai is typically generated in one of two ways. The first generates ai from a two-point (lattice) distribution: ai = −1 with probability 0.5 and ai = 1 with probability 0.5. The other method uses ai = √12(Ui − 0.5), where Ui is generated from a uniform distribution on the unit interval. We denote the method based on the first approach HC5WB-D and the method based on the latter approach HC5WB-C.
3. Compute the quasi-t test statistic (T*) based on this bootstrap sample, yielding
   T* = (β̂1* − 0)/√V*22   (9)
4. Repeat Steps 2-3 B times, yielding T*b, b = 1, ..., B. In the current study, B = 599.
5. Finally, a p-value is computed:
   p = #{|T*b| ≥ |T|}/B   (10)
6. Reject H0 if p ≤ α.

HC4-Based Wild Bootstrap Quasi-T Test (HC4WB-D and HC4WB-C)
The procedure for testing (2) is the same as that of HC5WB-D and HC5WB-C except that HC4 is used to estimate the standard error.

Simulation Design
Data are generated from the model

Yi = Xiβ1 + τ(Xi)εi   (11)

where τ(Xi) is a function of X used to model heteroscedasticity. Data are generated from a g-and-h distribution. Let Z be a random variable generated from a standard normal distribution; then

X = [(exp(gZ) − 1)/g] exp(hZ²/2)   (12)

has a g-and-h distribution. When g = 0, this last equation is taken to be X = Z·exp(hZ²/2). When g = 0 and h = 0, X has a standard normal distribution. Skewness and heavy-tailedness of the g-and-h distributions are determined by the values of g and h, respectively. As the value of g increases, the distribution becomes more skewed. As the value of h increases, the distribution becomes more heavy-tailed. Four types of distributions are considered for X: standard normal (g = 0, h = 0), asymmetric light-tailed (g = 0.5, h = 0), symmetric heavy-tailed (g = 0, h = 0.5) and asymmetric heavy-tailed (g = 0.5, h = 0.5). The error term (ε) is also randomly generated based on one of these four g-and-h distributions. When g-and-h distributions are asymmetric (g = 0.5), the mean is not zero. Therefore, ε's generated from these distributions are re-centered to have a mean of zero.

Five choices for τ(Xi) are considered, including τ(Xi) = 1, τ(Xi) = |Xi|, τ(Xi) = 1 + 2/(|Xi| + 1), and τ(Xi) = Xi + 1. These functions are denoted as variance patterns (VP) VP1, VP2, VP3, VP4 and VP5, respectively. Homoscedasticity is represented by τ(Xi) = 1; the remaining choices represent particular patterns of variability in Y based upon the value of Xi. All possible pairs of X and ε distributions are considered, resulting in a total of 16 sets of distributions. All five variance patterns are used for each set of distributions. Hence, a total of 80 simulated conditions are considered.
The estimated probability of a Type I error is based on 1,000 replications with a sample size of n = 20, testing at α = 0.05 and α = 0.01. According to Robey and Barcikowski (1992), 1,000 replications are sufficient from a power point of view: if the hypothesis that the actual Type I error rate is 0.05 is tested, and power should be 0.9 when testing at the 0.05 level and the true α value differs from 0.05 by 0.025, then 976 replications are required. The actual Type I error probability is estimated with α̂, the proportion of p-values less than or equal to 0.05 and 0.01.

Results
First, when testing at both α = 0.05 and α = 0.01, the performances of HC5-T and HC4-T are extremely similar in terms of control over the probability of a Type I error (see Tables 1 and 2). When testing at α = 0.05, the average Type I error rate was 0.038 (SD = 0.022) for HC5-T and 0.040 (SD = 0.022) for HC4-T. When testing at α = 0.01, the average Type I error rate was 0.015 (SD = 0.013) for HC5-T and 0.016 (SD = 0.013) for HC4-T.

Theoretically, when leverage points are likely to occur (i.e., when X is generated from a distribution with h = 0.5), HC5-T should perform better than HC4-T; however, as shown in Table 1, this is not the case. On the other hand, when leverage points are relatively unlikely (i.e., when X is generated from a distribution with h = 0), HC5-T and HC4-T should yield the same outcomes. As indicated by the results of this study, when X is normally distributed (g = 0 and h = 0), the actual Type I error rates resulting from the two methods are identical. However, when X has a skewed light-tailed distribution (g = 0.5 and h = 0), HC5-T and HC4-T do not always yield the same results. Focus was placed on a few situations where HC4-T is unsatisfactory, and we considered the extent to which it improves as the sample size increases, using sample sizes of 30, 50 and 100. As shown in Table 3, control over the probability of a Type I error does not improve markedly with increased sample sizes.

Second, with respect to the non-bootstrap and bootstrap methods, results suggest that the bootstrap methods are not necessarily superior to the non-bootstrap ones. As shown in Figures 1 and 4, when testing at α = 0.05 under VP 1 and 4, the bootstrap methods outperform the non-bootstrap methods. Specifically, the non-bootstrap methods tended to be too conservative under those conditions. Nonetheless, under VP 3 and 5 (see Figures 3 and 5), the non-bootstrap methods, in general, performed better than the bootstrap methods. In particular, the actual Type I error rates yielded by the bootstrap methods in those situations tended to be noticeably higher than the nominal level. In one situation, the actual Type I error rate was as high as 0.196. When testing at α = 0.01, HC5WB-C and HC4WB-C offered the best performance in general; however, situations were found where the non-bootstrap methods outperform the bootstrap methods.

Finally, regarding the use of the continuous uniform distribution versus the lattice distribution for generating bootstrap samples, results suggest that the former has slight practical advantages. When testing at α = 0.05, the average Type I error rates yielded by the two approaches are 0.059 for HC5WB-C and HC4WB-C and 0.060 for HC5WB-D and HC4WB-D. When testing at α = 0.01, the average Type I error rates are 0.015 for HC5WB-C and HC4WB-C and 0.021 for HC5WB-D and HC4WB-D. Overall, the actual Type I error rates yielded by HC5WB-C and HC4WB-C appear to deviate from the nominal level in fewer cases.

Conclusion
This study expanded on extant simulations by considering ranges of non-normality and heteroscedasticity that had not been considered previously. The performance of the latest HCCM estimator (HC5) was also closely considered. The non-bootstrap HC5-based and HC4-based quasi-t methods (HC5-T and HC4-T) were compared, as well as their wild bootstrap counterparts (HC5WB-D, HC5WB-C, HC4WB-D and HC4WB-C). Furthermore, two wild bootstrap sampling schemes were evaluated: one based on the lattice distribution, the other based on the continuous standardized uniform distribution.

As opposed to the findings of Cribari-Neto, et al. (2007), results here suggest that HC5 does not offer striking advantages over HC4.
Both HC5-T and HC4-T perform similarly across all the situations considered. In many cases, HC5-T appears more conservative than HC4-T. One concern is that, for the situations at hand, setting k = 0.7 when calculating HC5 may not be ideal; thus, whether changing the value of k might improve the performance of HC5-T was examined. As suggested by Cribari-Neto, et al., values of k between 0.6 and 0.8 generally yielded desirable results; for this reason k = 0.6 and k = 0.8 were considered. However, as indicated in Tables 4 and 5, regardless of the value of k, no noticeable difference was identified between the methods.

Moreover, contrary to both Davidson and Flachaire's (2008) and Godfrey's (2006) findings, when testing the hypothesis H0: β1 = 0 in a simple regression model, the wild bootstrap methods (HC5WB-D, HC5WB-C, HC4WB-D and HC4WB-C) do not always outperform the non-bootstrap methods (HC5-T and HC4-T). By considering a wider range of situations, specific circumstances where the non-bootstrap methods outperform the wild bootstrap methods can be identified, and vice versa. In particular, the non-bootstrap and wild bootstrap approaches are each sensitive to different patterns of heteroscedasticity.

For example, the wild bootstrap methods generally performed better than the non-bootstrap methods under VP 1 and 4, whereas the non-bootstrap methods generally performed better than the wild bootstrap methods under VP 3 and 5. Situations also exist … (1988), Davidson and Flachaire (2008) and Godfrey (2006). The actual Type I error rates resulting from the methods HC5WB-C and HC4WB-C were generally less variable compared to those resulting from HC5WB-D and HC4WB-D. In many cases, the performances of the two approaches are similar, but in certain situations, such as in VP3, HC5WB-C and HC4WB-C notably outperformed HC5WB-D and HC4WB-D.

References
Cribari-Neto, F. (2004). Asymptotic inference under heteroscedasticity of unknown form. Computational Statistics & Data Analysis, 45, 215-233.
Cribari-Neto, F., Souza, T. C., & Vasconcellos, K. L. P. (2007). Inference under heteroskedasticity and leveraged data. Communications in Statistics - Theory and Methods, 36, 1877-1888.
Cribari-Neto, F., Souza, T. C., & Vasconcellos, K. L. P. (2008). Errata: Inference under heteroskedasticity and leveraged data, Communications in Statistics - Theory and Methods, 36, 1877-1888. Communications in Statistics - Theory and Methods, 37, 3329-3330.
Davidson, R., & Flachaire, E. (2008). The wild bootstrap, tamed at last. Journal of Econometrics, 146, 162-169.
Flachaire, E. (2005). Bootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap. Computational Statistics & Data Analysis, 49, 361-376.
Godfrey, L. G. (2006). Tests for regression models with heteroscedasticity of unknown form. Computational Statistics & Data Analysis, 50, 2715-2733.
Greene, W. H. (2003). Econometric analysis (5th Ed.). New Jersey: Prentice Hall.
Liu, R. Y. (1988). Bootstrap procedures under some non-i.i.d. models. The Annals of Statistics, 16, 1696-1708.
Long, J. S., & Ervin, L. H. (2000). Using heteroscedasticity consistent standard errors in the linear regression model. The American Statistician, 54, 217-224.
MacKinnon, J. G., & White, H. (1985). Some heteroscedasticity consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics, 29, 305-325.
Robey, R. R., & Barcikowski, R. S. (1992). Type I error and the number of iterations in Monte Carlo studies of robustness. British Journal of Mathematical and Statistical Psychology, 45, 283-288.
Weisberg, S. (1980). Applied linear regression. NY: Wiley.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817-838.
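As a concrete illustration of the procedures compared above, the following R sketch implements the HC4 quasi-t statistic of (6)-(8), a wild bootstrap p-value, and the g-and-h generator of (12). It is illustrative only: the function names are assumptions of this sketch, and the bootstrap samples are generated with the null imposed, a common variant of Step 2.

    # HC4 quasi-t test of H0: beta1 = 0 in simple regression, plus a wild
    # bootstrap p-value (lattice weights ~ HC4WB-D; uniform ~ HC4WB-C).
    hc4_t <- function(x, y) {
      X <- cbind(1, x)
      XtXi <- solve(crossprod(X))
      b <- drop(XtXi %*% crossprod(X, y))
      r <- drop(y - X %*% b)
      h <- rowSums((X %*% XtXi) * X)                 # leverages h_ii, cf. (5)
      d <- pmin(4, h / mean(h))                      # HC4 exponents, cf. (8)
      V <- XtXi %*% t(X) %*% (X * (r^2 / (1 - h)^d)) %*% XtXi   # cf. (7)
      b[2] / sqrt(V[2, 2])                           # quasi-t, cf. (6)
    }
    hc4_wild_pvalue <- function(x, y, B = 599, lattice = TRUE) {
      T_obs <- hc4_t(x, y)
      r0 <- y - mean(y)          # residuals with the null (beta1 = 0) imposed
      T_star <- replicate(B, {
        a <- if (lattice) {
          sample(c(-1, 1), length(y), replace = TRUE)    # lattice weights
        } else {
          sqrt(12) * (runif(length(y)) - 0.5)            # standardized uniform
        }
        hc4_t(x, mean(y) + a * r0)
      })
      mean(abs(T_star) >= abs(T_obs))                # p-value, cf. (10)
    }
    # g-and-h generator of (12); example under VP2, tau(X) = |X|, n = 20:
    ghrnd <- function(n, g, h) {
      Z <- rnorm(n)
      X <- if (g == 0) Z else (exp(g * Z) - 1) / g
      X * exp(h * Z^2 / 2)
    }
    set.seed(1)
    x <- ghrnd(20, g = 0.5, h = 0.5)                 # asymmetric heavy-tailed X
    y <- abs(x) * rnorm(20)                          # null model, heteroscedastic
    hc4_wild_pvalue(x, y)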
Table 3: Actual Type I Error Rates when Testing at α = 0.05 with Sample Sizes 30, 50 and 100 for HC4-T

X (g, h)  | ε (g, h)  | VP | n = 30 | n = 50 | n = 100
0, 0.5    | 0, 0.5    | 1  | 0.022  | 0.021  | 0.018
0.5, 0.5  | 0.5, 0.5  | 1  | 0.012  | 0.019  | 0.020
0, 0.5    | 0, 0.5    | 4  | 0.014  | 0.007  | 0.011
0, 0.5    | 0.5, 0.5  | 4  | 0.011  | 0.009  | 0.023
0, 0      | 0.5, 0.5  | 5  | 0.118  | 0.123  | 0.143
0.5, 0    | 0.5, 0    | 5  | 0.093  | 0.070  | 0.078
0.5, 0    | 0.5, 0.5  | 5  | 0.190  | 0.181  | 0.174

Figure 1: Actual Type I Error Rates for VP1 when Testing at α = 0.05. The solid horizontal line indicates α = 0.05; the dashed lines indicate the upper and lower confidence limits for α, (0.037, 0.065).
Figure 2: Actual Type I Error Rates Under VP2.
Figure 3: Actual Type I Error Rates Under VP3.
Figure 4: Actual Type I Error Rates Under VP4.
Figure 5: Actual Type I Error Rates Under VP5.
Table 4: Actual Type I Error Rates when Testing at α = 0.05, k = 0.6 for HC5-T

X (g, h)  | ε (g, h)  | VP | HC5-T | HC4-T
0, 0.5    | 0.5, 0.5  | 4  | 0.013 | 0.013
0.5, 0.5  | 0, 0.5    | 4  | 0.015 | 0.015
0, 0      | 0.5, 0.5  | 5  | 0.106 | 0.106
0.5, 0    | 0.5, 0.5  | 5  | 0.135 | 0.136
Table 5: Actual Type I Error Rates when Testing at α = 0.05, k = 0.8 for HC5-T

X (g, h)  | ε (g, h)  | VP | HC5-T | HC4-T
0, 0.5    | 0.5, 0.5  | 4  | 0.005 | 0.006
0.5, 0.5  | 0, 0.5    | 4  | 0.007 | 0.007
0, 0      | 0.5, 0.5  | 5  | 0.106 | 0.106
0.5, 0    | 0.5, 0.5  | 5  | 0.128 | 0.131
Journal of Modern Applied Statistical Methods, November 2009, Vol. 8, No. 2, pp. 396-408. Copyright © 2009 JMASM, Inc. ISSN 1538-9472/09/$95.00.

Least Error Sample Distribution Function
Vassili F. Pastushenko
Johannes Kepler University of Linz, Austria
The empirical distribution function (ecdf) is unbiased in the usual sense, but shows a certain order bias. Pyke suggested a discrete ecdf using expectations of order statistics. A piecewise constant optimal ecdf saves 200%/N of the sample size N. Results are compared with linear interpolation for U(0, 1), which requires up to sixfold shorter samples at the same accuracy.
sometimes called parameter, taking any value of the principally possible X-values

F*(x, X) = (1/N) Σ_{n=1}^{N} H(x − Xn)   (6)

H(t) is the Heaviside unit step function: H = 1 for t ≥ 0, otherwise H = 0. Function (6), as a piecewise constant approximation of F(x), takes N+1 values (levels) equal to (0:N)/N in the N+1 x-intervals between and outside of the N sample values. F* is continuous from the right, although Pugachev (1984) suggested that continuity from the left would be more reasonable. A centrally symmetrical version could be a compromise (H = 0.5 at t = 0). Middle points between adjacent F* levels are

m = (n − 0.5)/N, n = 1:N   (7)

For convenience, an example of F* is shown for N = 3 in Figure 1 A. Expected Fn-values (En, c.f. next section), shown by circles, are different from m.

Eq. (6) is constructed as an arithmetic mean of N ecdf, each corresponding to a 1-point sample,

F*(x, X) = (1/N) Σ_{n=1}^{N} F*(x, Xn)   (8)

This shows that E[F*(x, X)] = F(x) for any N, where E[…] denotes the mathematical expectation, because this expectation corresponds to F* for an infinitely long sample. In other words, for any fixed-in-advance x-value, F*(x, X) represents an unbiased estimation of F(x). The name empirical reflects the similarity between F*(x, X), which gives the proportion of sample elements r satisfying r ≤ x, and F(x) = Prob(r ≤ x), r being a single random number. However, this similarity contains an arbitrary assumption. Indeed, differentiation of (8) with respect to x gives the empirical p.d.f. f*(x, X), Feller (1971),

f*(x) = (1/N) Σ_{n=1}^{N} δ(x − Xn),   (9)

δ(x) being the Dirac delta. As can be seen from this expression, ecdf (8) attributes a probability measure of 1/N to each sample element. As a result, any sample is represented as a set of measure 1, whereas in reality it represents a set of measure zero, which is obvious for the main class of continuous distributions, and discontinuous distributions can be considered as a limit of continuous ones. This contradiction is especially strongly expressed in eq. (6), where measure 1 is attributed to every single-point sample on the right-hand side, which should mean that every sample element is a deterministic, not a stochastic item.

As indicated by Pyke (1959), a more reasonable approach should consider a sample as a set of measure zero, which delimits N+1 nonzero-measure intervals on the x-axis. This is consistent with the point of view that the sampling procedure represents a mapping of N random values of the parent F(x) to the x-axis. A single random F-value is uniformly distributed in (0, 1), i.e., F ∈ U(0, 1). Each of the F-values mapped into the sample values is selected independently. However, these values finally appear on the F-axis as an ordered sequence, so that the neighbouring elements of the sequence are no longer independent. Order statistics F1, …, FN have their own distributions. Therefore, an optimal ecdf must use this information. Probability densities for random u ∈ U(0,1), u = Fn, are, c.f. Gibbons & Chakraborti (2003), Durbin (1973), Pyke (1959):

fN,n(u) = [N!/((n − 1)!(N − n)!)] u^{n−1}(1 − u)^{N−n}, n = 1:N.   (10)

The first two moments of these distributions, or the expected values and variances of Fn, denoted En and Vn respectively, are (c.f. Gibbons & Chakraborti, 2003):

En = E[Fn] = ∫₀¹ x fN,n(x) dx = n/(N + 1), n = 1:N   (11)

and

Vn = ∫₀¹ (x − En)² fN,n(x) dx = n(N + 1 − n)/[(N + 1)²(N + 2)] = En(1 − En)/(N + 2), n = 1:N.   (12)
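The moments (11) and (12) are easy to verify numerically; a brief, purely illustrative R check for N = 3:

    # Check of (11) and (12) for the order statistics of N = 3 uniforms.
    N <- 3; n <- 1:N
    En <- n / (N + 1)                  # eq. (11): 0.25, 0.50, 0.75
    Vn <- En * (1 - En) / (N + 2)      # eq. (12)
    sims <- replicate(1e5, sort(runif(N)))
    rowMeans(sims)                     # approximately En
    apply(sims, 1, var)                # approximately Vn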
Figure 1: Piecewise constant ecdf variants for a sample of size N = 3, plotted over the extended sample X0, X1, …, X4. Panels A and B (top) show F*(x, X) and P(x, X); panels C and D (bottom) show C(x, X) and S(x, X).
As a research tool, F* is expected to optimally reproduce the parent F(x). However, there are some discrepancies with the predictions of order statistics (11-12). It follows from (6) that E[F*(Xn, X)] = E[n/N] = n/N, n = 1:N, whereas the correct expectation (11) is different, Pyke (1959). This discrepancy means a certain order bias.

Pyke (1959) considered order statistics Fn as zero-measure delimiters of probability intervals created by the sample. He also considered the statistic CN+ = En − Fn instead of the usual statistic DN+ = n/N − Fn, n = 1:N. This was interpreted by Brunk (1962), Durbin (1968) and Durbin and Knott (1972) as a discrete modification of ecdf. In particular, Brunk mentions Pyke's (1959) suggestion that the plotted points (Fn, n/(N+1)) in the Cartesian plane replace the empirical distribution function. In fact, as Hyndman and Fan (1996) mentioned, similar suggestions were made much earlier by Weibull (1939) and Gumbel (1939). These suggestions were partly considered for applications using ecdf values only at x = X, such as the two-sided Kolmogorov-Smirnov test, Durbin (1968). However, a generalization for arbitrary x-values was not presented, although Hyndman and Fan (1996) discuss similar ideas concerning distribution quantiles.

To find an alternative for ecdf, an optimality criterion must be selected. Because the aim of ecdf is to approximate F(x), the main criterion is the approximation accuracy. Additionally, convenience or simplicity may be discussed, but these aspects are almost the same within the class of piecewise constant approximations considered here.

The approximation accuracy needs a definition of distance between two compared distribution functions. The distance between two distributions, e.g., between the exact F(x) and its empirical approximation, is frequently
$$\int_{X_0+0}^{X_{N+1}-0} t(x)\,\frac{dC(x,X)}{dx}\,dx = \frac{1}{N}\sum_{n=1}^{N} t(X_n). \qquad (19)$$

Because extremal $x$-values acquire a probability measure, the first and last summands can be simplified in (18), which results in an equivalent of $C$ in the entire $x$-range:

$$C(x,X) = \frac{1}{N+1}\left(\frac{1}{2} + \sum_{n=1}^{N} H(x - X_n)\right), \quad X_0 < x < X_{N+1}. \qquad (20)$$
Table 1: Notations

[A, B, …] : concatenation of A, B, …, a set consisting of A, B, …
C(x, X) : centred ecdf; E-values are in the centres of the corresponding probability intervals
d : defect of levels, the sum of their squared deviations from the optimal (natural) levels
D(α) : total expected squared deviation of s(x, X, α) from F(x)
D*, DC, DS, … : total expected squared deviations for F*(x, X), C(x, X), S(x, X), …
E : vector of expected order statistics, En = n/(N+1), n = 1:N
En : nth element of E
Ē = [0, E1, …, EN, 1] : vector E extended by the extremal E-values
E[…] : mathematical expectation of an expression
F(x) : parent d.f.
f(x) : p.d.f., f = dF/dx
F*(x, X) : presently accepted ecdf
f*(x) : empirical p.d.f., f* = dF*(x)/dx
fN,n(u) : p.d.f. of the n-th order statistic u ∈ U(0,1), n = 1:N
gz : gain, relative total squared deviation (in units of the total deviation for F*), gz = Dz/D*, z = C, S, …
H(t) : Heaviside unit step, H = 1 if t ≥ 0, otherwise H = 0; in Matlab: H = t >= 0
M = mean(X) : average of the sample elements
N : sample size (length of the i.i.d. sample)
P(x, X) : Pyke function, N F*(x, X)/(N+1)
s(x, X, α) : family of ecdfs with parameter α, 0 ≤ α < 0.5
sn : levels of s(x, X, α), n = 1:N+1
S(x, X) : optimal member of the s-family, minimizing D(α)
u : uniform random variable, 0 ≤ u ≤ 1
U(0, 1) : standard uniform distribution, F(u) = u, 0 ≤ u ≤ 1
X = [X1, X2, …, XN] : i.i.d. sample with parent d.f. F(x)
X̄ = [X0, X1, …, XN, XN+1] : sample X extended by adding the extremal x-values, size(X̄) = N+2
x : a number in the set of possible X-values
α, β : parameters of the ecdf family s(x, X, α, β)
δ(x) : Dirac delta
δxX : Kronecker symbol; in Matlab: δxX = any(x == X)
Δ : deviation of an ecdf from the parent d.f.
Φ(x, X) : hybrid of S and P for both continual and discrete applications
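Using these notations, the piecewise-constant variants F*, P, C and S are straightforward to implement. The sketch below (in Python rather than the Matlab used in Table 1; function names are illustrative) evaluates each variant at an arbitrary x, following eqs. (8), (20) and (39):

```python
# Sketch of the four piecewise-constant ecdf variants for an i.i.d. sample X.
import numpy as np

def F_star(x, X):
    # Classical ecdf, eq. (8): proportion of sample elements <= x.
    return np.searchsorted(np.sort(X), x, side="right") / len(X)

def P_ecdf(x, X):
    # Pyke function: N * F*(x, X) / (N + 1).
    return len(X) * F_star(x, X) / (len(X) + 1)

def C_ecdf(x, X):
    # Centred ecdf, eq. (20): levels (k + 1/2)/(N+1) between order statistics.
    k = np.searchsorted(np.sort(X), x, side="right")
    return (k + 0.5) / (len(X) + 1)

def S_ecdf(x, X):
    # Least-error ecdf, eq. (39): levels (k + 1)/(N+2).
    k = np.searchsorted(np.sort(X), x, side="right")
    return (k + 1.0) / (len(X) + 2)

X = np.array([0.2, 0.5, 0.9])   # N = 3, as in Figure 1
for f in (F_star, P_ecdf, C_ecdf, S_ecdf):
    print(f.__name__, f(0.6, X))
```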
… values) appears to be greater than 2 at any $N$ (zeros happen at the median level 0.5 for even $N$):

$$\frac{(n-1)/N - n/(N+2)}{(n-0.5)/(N+1) - n/(N+2)} = 2 + \frac{2}{N}. \qquad (31)$$

Thus, the detailed comparison leads to a conclusion: both the levels of $F_*(x, X)$ and of $C(x, X)$ show a certain order bias, because on average these levels do not match the expected behaviour of integrals of $F$ between the order statistics $F_n$. They are insufficiently big below the sample median, and too big above it.

The defect $d$ of $s(x, X, \alpha)$ is introduced as the sum of squared deviations of $s_n$ from the natural levels (29),

$$d = \sum_{n=1}^{N+1} (s_n - S_n)^2. \qquad (32)$$

The defect of $F_*$ is

$$d_* = \sum_{n=1}^{N+1}\left(\frac{n-1}{N} - \frac{n}{N+2}\right)^2 = \frac{N+1}{3N(N+2)}, \qquad (33)$$

and the defect of $C$ is

$$d_C = \sum_{n=1}^{N+1}\left(\frac{n-0.5}{N+1} - \frac{n}{N+2}\right)^2 = \frac{N}{12(N+1)(N+2)}. \qquad (34)$$

In agreement with eq. (31), the ratio of these defects is:

$$\frac{d_*}{d_C} = 4\left(1 + \frac{1}{N}\right)^2. \qquad (35)$$

Two conclusions can be made. First, although near-interpolation seems attractive in the sense that it puts the expected values $E_n$ exactly in the middle between the $C$-levels, it is still not the optimal $S(x, X)$, based on the natural levels (29):

$$S(x, X) = s(x, X, 1/(N+2)). \qquad (36)$$

Thus, the optimum should occur at:

$$\alpha = \frac{1}{N+2}. \qquad (37)$$

Ecdf $S(x, X)$ formally ascribes to every element of the extended sample $\bar{X}$ a probability measure of $1/(N+2)$:

$$S(x, X) = \frac{1}{N+2}\sum_{n=0}^{N+1} H(x - \bar{X}_n). \qquad (38)$$

Ecdf $S(x, X)$ has zero defect $d$ by definition. Similar to $C(x, X)$, the expression for $S$ may be simplified as:

$$S(x, X) = \frac{1}{N+2}\left(1 + \sum_{n=1}^{N} H(x - X_n)\right), \quad X_0 < x < X_{N+1}. \qquad (39)$$

An illustration of $S(x, X)$ for $N = 3$ is given in Figure 1, D.

Results

Function S(x, X) Minimizes the Expected Total Error of the F(x) Approximation
It can be shown that $S(x, X)$ minimizes the error of the $F(x)$ approximation by calculating the total squared deviation $D$ of $s(x, X, \alpha)$ from $F(x)$ and finding an optimal $\alpha$ as $\mathrm{argmin}(D(\alpha))$, obtaining in this way again $\alpha = 1/(N+2)$ as the optimal value. The total expected approximation error, or expected squared deviation, is

$$D(\alpha) = E\left[\int_{X_0}^{X_{N+1}} \left(F(x) - s(x, X, \alpha)\right)^2 f(x)\,dx\right]. \qquad (40)$$

The optimality of $S$ is confirmed by the following theorem and proof.

Theorem
$S(x, X)$ represents the least-error approximation of $F(x)$ in the family $s(x, X, \alpha)$, because it minimizes the total squared approximation error (40).

Proof
Consider the deviation $\Delta$,

$$\Delta = F(x) - s(x, X, \alpha), \qquad (41)$$

as a random quantity at every fixed $x$, due to the randomness of $X$. The mathematical expectation of …
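The defect formulas (33)-(35) admit a quick numerical check; the following short sketch (independent of the article's derivation) verifies all three identities:

```python
# Numerical check of eqs. (33)-(35): levels of F* and C taken as
# (n-1)/N and (n-0.5)/(N+1), natural levels as n/(N+2), n = 1:N+1.
import numpy as np

for N in (3, 6, 9, 12):
    n = np.arange(1, N + 2)
    d_star = (((n - 1) / N - n / (N + 2)) ** 2).sum()        # eq. (33)
    d_C = (((n - 0.5) / (N + 1) - n / (N + 2)) ** 2).sum()   # eq. (34)
    print(N,
          np.isclose(d_star, (N + 1) / (3 * N * (N + 2))),
          np.isclose(d_C, N / (12 * (N + 1) * (N + 2))),
          np.isclose(d_star / d_C, 4 * (1 + 1 / N) ** 2))    # eq. (35)
```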
Figure 2: Total Squared Expected Error D(α) of the Family s(x, X, α) for Several N-values
The cases of F*(x, X), C(x, X) and S(x, X) as members of the s-family are shown by special symbols; note that the min(D(α))-values (circles) depend linearly on the optimal α-values.
[Plot: D(α) versus α ∈ (0, 0.3) for N = 3, 6, 9, 12.]
Eq. (49) is nonlinear with respect to the random numbers $X$. Correspondingly, the expectation $E[E_{lin}(x, X)]$ deviates from $x$. Expected squared deviations $E[(E_{lin}(x, X) - x)^2]$ were estimated numerically as an average over $10^5$ $X$-samples at $N = 5$. Figure 3 compares the result with $E(\Delta_*^2)$, $E(\Delta_C^2)$ and $E(\Delta_S^2)$. The left figure shows these expectations for all four compared versions, and the right figure shows their integrals in $(0, x)$, which give at $x = 1$ the corresponding total errors $D$. The gains, shown at the top of the right figure, represent the relative total errors, i.e., $D_C/D_*$, $D_S/D_*$ and $D_{lin}/D_*$ respectively.

The total approximation error is notably smaller for linear interpolation, as reflected by $g_C$ (1.31), $g_S$ (1.4) and $g_{lin}$ (1.68). As illustrated in Figure 3 (left), the total squared error is smaller for $E_{lin}$ than for $C$ at any $x$, and it is smaller than that for $F_*$ almost everywhere, with the exception of narrow intervals near $x = 0$ and $x = 1$. In addition, $E_{lin}$ loses to $S$ around $x = 0.5$, but wins in wide intervals near $x = 0$ and $x = 1$.

More interesting results follow if linear interpolation is made according to eq. (17). Now the interpolation target is $x$, i.e., ecdf-values are selected as the independent variable $e$. In this case the implicitly defined ecdf $e(x, X)$ is given by:

$$x(e, X) = \bar{X}_{n-1}(1-\lambda) + \bar{X}_n\lambda, \quad n = 1{:}N+1. \qquad (50)$$

Here, $\lambda$ is the interpolation variable,

$$\lambda = (e - E_{n-1})(N+1), \quad 0 \le \lambda \le 1 \ (E_{n-1} \le e \le E_n). \qquad (51)$$

Note that an equation similar to (50) was used by Hyndman & Fan (1996), eq. (1), who applied linear interpolation for calculating distribution quantiles. Due to the linearity of eq. (50) with respect to the random $X$-values, this equation represents an unbiased empirical estimation of the parent $U(0, 1)$, that is, $E[x(e, X)] = e$, which
Figure 3: Expected Squared Errors of Different Versions of ecdf for Samples from U(0, 1), N = 5
Left: E[Δ²]; right: integrals of the left curves in (0, x), which define at x = 1 the total expected errors.
[Plot: two panels over x ∈ (0, 1); curves for F*(x, X), C(x, X), S(x, X) and the linear version.]
immediately follows from $E[\bar{X}] = \bar{E}$. This is interesting, because it shows that $F_*(x, X)$ is not the only possible unbiased estimation of $F(x)$. The squared error of $x_{lin}$ defined by (50) is:

$$E[\Delta_{lin}^2] = E[(x(e,X) - e)^2] = E\left[\left((\bar{X}_{n-1} - E_{n-1})(1-\lambda) + (\bar{X}_n - E_n)\lambda\right)^2\right] = \bar{V}_{n-1}(1-\lambda)^2 + \bar{V}_n\lambda^2 + 2c(n, n+1)\lambda(1-\lambda), \quad n = 1{:}N+1. \qquad (52)$$

Here $c = \mathrm{cov}(\bar{X})$ is the covariance matrix of the extended sample $\bar{X}$, and $\bar{V} = [0\ V\ 0]$ is the variance (12), extended by the values $V_0 = V_{N+1} = 0$. As can be seen from eq. (52), the expected squared approximation error in every interval $E_{n-1} \le e \le E_n$ is given by a parabola connecting the adjacent points $(E_n, V_n)$. This is illustrated in Figure 4. The integral of (52) in $(0, e)$ is then represented by piecewise cubic parabolas.

The gain of linear interpolation is now the same as in Figure 3; that is, the linear gain is invariant with respect to the interpolation target. The value of the linear gain for $N > 1$ is well approximated by $g = 1 + 6/(2N-1)$, which means about $300\%/N$ savings on sample size in comparison with $F_*$. This raises the question of how such gain correlates with the quality of predictions based on linear interpolation.

Eq. (8) can be directly applied to linear interpolation, which gives an unbiased estimation, and therefore eq. (8) should be valid. Given $M = \mathrm{mean}(X)$, eq. (8) suggests representing $x(e, X)$ as $x(e, M(X))$:

$$x = 2eM, \quad e \le 0.5,$$
and
$$x = 2(1-e)M + 2e - 1, \quad e > 0.5. \qquad (53)$$
Figure 4: Expected Squared Error of the Linear Approximation (50), E[(x−e)²], and its Integral in (0, e)
N = 5; gain = 1.68.
[Plot over e ∈ (0, 1), with marks at E1, ..., E5: theoretical and numerical E[(x−e)²], the values Vn, the integral of E[(x−e)²] in (0, e), and the parabolic "convex hull" of Vn.]
Because $E[M] = 0.5$ (uniform data), (53) is indeed an unbiased estimation of $x = e$. The expected squared deviation of $x$ from $e$ is given by

$$\mathrm{var}(x) = 4e^2 V_M, \quad e \le 0.5,$$
and
$$\mathrm{var}(x) = 4(1-e)^2 V_M, \quad e > 0.5, \qquad (54)$$

where

$$V_M = \frac{1}{12N} \qquad (55)$$

is the variance of $M$. Integrating (54) over $e$ in $(0, 1)$, the total mean squared deviation $D_M$ is obtained as

$$D_M = \frac{1}{36N}. \qquad (56)$$

This result seems extraordinary, because it means a gain of ecdf (53) equal to 6, that is, 6 times shorter samples in comparison with $F_*$ at the same approximation error, and this happens at any $N$ value! Is it possible to get some practical advantage out of such a precise approximation?

One such possibility is suggested by distribution parameter fitting. Thus, unknown parameter(s) $q$ can be found as $\mathrm{argmin}\left((\mathrm{mean}(F(X, q)) - 0.5)^2\right)$. This method works indeed, and it should be compared with others. However, parameter fitting is a special topic, which should be discussed separately.

The optimal ecdf $S$ is constructed to minimize the expected total error for continual applications. Discrete applications only need ecdf values at $x = X$, and then $P(x, X)$ should be used. How is it possible to combine $P(x, X)$ and $S(x, X)$ into a universal ecdf, valid for both continual and discrete applications? This can be done by redefining $S$ at $x = X$, e.g., by introducing the function $\Phi(x, X) = S(x, X)$ if $x \ne X_n$, otherwise $\Phi(X_n, X) = E_n$, $n = 1{:}N$. Such switching between $P(x, X)$ and $S(x, X)$ can be expressed as a formal mixture of both functions, using the Kronecker symbol $\delta_{xX}$:

$$\Phi(x, X) = \delta_{xX}\, P(x, X) + (1 - \delta_{xX})\, S(x, X), \quad \delta_{xX} = 1 \text{ if } \mathrm{any}(x == X), \text{ otherwise } \delta_{xX} = 0. \qquad (57)$$

Function $\Phi(x, X)$ is discontinuous at $x = X$ both from the left and from the right, which is physically and functionally more reasonable than in the case of $F_*(x, X)$, which is continuous from the right only.
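A direct implementation of the hybrid (57) is short. The sketch below (illustrative names, in Python rather than the Matlab notation of Table 1) returns E_n on sample points and the S-level elsewhere:

```python
# Sketch of eq. (57): S(x, X) off the sample points, E_n = n/(N+1) at them.
import numpy as np

def phi(x, X):
    Xs = np.sort(X)
    N = len(Xs)
    hits = np.nonzero(np.isclose(x, Xs))[0]
    if hits.size:                      # x coincides with X_(n): return E_n
        return (hits[0] + 1) / (N + 1)
    k = np.searchsorted(Xs, x)         # otherwise the S(x, X) level
    return (k + 1.0) / (N + 2)

X = [0.2, 0.5, 0.9]
print(phi(0.5, X), phi(0.6, X))        # E_2 = 0.5 vs. the S-level 0.6
```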
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 409-422 1538 – 9472/09/$95.00
Truncated Skew Laplace Probability Distribution Application
Aryal & Tsokos

A random variable X is said to have the skew-Laplace probability distribution if its pdf is given by f(x) = 2g(x)G(λx), where g(·) and G(·), respectively, denote the pdf and the cdf of the Laplace distribution. When the skew-Laplace distribution is truncated on the left at 0, it is called the truncated skew-Laplace (TSL) distribution. This article provides a comparison of the TSL distribution with the two-parameter gamma model and the hypoexponential model, and studies an application of the subject model in a maintenance system.
… (1.5), where

$$\Phi(\lambda) = 1 - \frac{1}{2(1+|\lambda|)}\exp\!\left(-\frac{\lambda x}{\varphi}\right),$$

and the mean residual life time is

$$m(t) = \frac{2(1+\lambda)\varphi - \dfrac{\varphi}{1+\lambda}\exp\!\left(-\dfrac{\lambda t}{\varphi}\right)}{2(1+\lambda) - \exp\!\left(-\dfrac{\lambda t}{\varphi}\right)}. \qquad (1.10)$$

Aryal et al. (2005a) proposed a reliability model that can be derived from the skew-Laplace distribution by truncating it at 0 on the left. This is called the truncated skew-Laplace (TSL) probability distribution. The cdf of this reliability model for λ > 0 is given by

$$F^*(x) = 1 - \frac{2(1+\lambda)\exp(-x/\varphi) - \exp(-(1+\lambda)x/\varphi)}{2\lambda + 1}. \qquad (1.6)$$

This article provides a comparison of this reliability model with other competing models, namely the two-parameter gamma and the hypoexponential distribution. We also study an application of the TSL probability model in preventive maintenance and cost optimization.

TSL vs. Gamma Distribution
A random variable X is said to have a gamma probability distribution with parameters α and β, denoted by G(α, β), if it has a probability density function given by …
Figure 1: P-P Plots of Vessel Data Using TSL and Gamma Distribution
sequential phases. If the time the process spends in each phase is independent and exponentially distributed, then it can be shown that the overall time is hypoexponentially distributed. It has been empirically observed that service times for input-output operations in a computer system often possess this distribution (see Trivedi, 1982), which will have n parameters, one for each of its distinct phases. Interest here lies in a two-stage hypoexponential process; that is, if X is a random variable with parameters λ1 and λ2 (λ1 ≠ λ2), then its pdf is given by

$$f(x) = \frac{\lambda_1\lambda_2}{\lambda_2 - \lambda_1}\left\{\exp(-\lambda_1 x) - \exp(-\lambda_2 x)\right\}, \quad x > 0. \qquad (3.1)$$

The notation Hypo(λ1, λ2) denotes a hypoexponential random variable with parameters λ1 and λ2. The corresponding cdf is given by

$$F(x) = 1 - \frac{\lambda_2}{\lambda_2 - \lambda_1}\exp(-\lambda_1 x) + \frac{\lambda_1}{\lambda_2 - \lambda_1}\exp(-\lambda_2 x), \quad x \ge 0. \qquad (3.2)$$

The reliability function R(t) of a Hypo(λ1, λ2) random variable is given by

$$R(t) = \frac{\lambda_2}{\lambda_2 - \lambda_1}\exp(-\lambda_1 t) - \frac{\lambda_1}{\lambda_2 - \lambda_1}\exp(-\lambda_2 t), \qquad (3.3)$$

and the hazard rate function h(t) of a Hypo(λ1, λ2) random variable is given by

$$h(t) = \frac{\lambda_1\lambda_2\left[\exp(-\lambda_1 t) - \exp(-\lambda_2 t)\right]}{\lambda_2\exp(-\lambda_1 t) - \lambda_1\exp(-\lambda_2 t)}. \qquad (3.4)$$

It is clear that h(t) is an increasing function of t; it increases from 0 to min{λ1, λ2}. Note that the mean residual life time (MRLT) at time t for Hypo(λ1, λ2) is given by

$$m_{Hypo}(t) = \frac{1}{\lambda_1\lambda_2}\cdot\frac{\lambda_2^2\exp(-\lambda_1 t) - \lambda_1^2\exp(-\lambda_2 t)}{\lambda_2\exp(-\lambda_1 t) - \lambda_1\exp(-\lambda_2 t)}. \qquad (3.5)$$

To compare the TSL and hypoexponential pdfs in terms of reliability and mean residual life times, random samples of size 50, 100 and 500 were generated from a hypoexponential pdf with parameters λ1 = 1 and λ2 = 2, 5, 10 and 20 for each sample size. A numerical iterative procedure, the Newton-Raphson algorithm, was used to obtain the maximum likelihood estimates of λ1 and λ2. To compare these results, the parameters φ and λ of a TSL distribution were estimated (see Table 2). In addition, the mean residual life times were computed for both models at t = t_{n/2}.

In Table 2, M_TSL and M_HYPO denote the MRLT of the TSL and hypoexponential models respectively. Table 2 shows that if the sample size is large and the difference between the two parameters λ1 and λ2 is large, both the TSL and the hypoexponential model produce the same result. However, for a small sample size and a small difference between λ1 and λ2, a significant difference is observed between the two models. Figures 2-5 illustrate the plotted reliability graphs and provide support for these findings.

TSL Distribution and Preventive Maintenance
In many situations, failure of a system or unit during actual operation can be very costly or, in some cases, dangerous, so it may be better to repair or replace the system before it fails. However, it is not typically feasible to make frequent replacements of a system. Thus, developing a replacement policy that balances the cost of failures against the cost of planned replacement or maintenance is necessary.
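The hypoexponential quantities (3.3)-(3.5) are simple to compute directly. The following is a minimal sketch (not from the article) that evaluates them for λ1 = 1, λ2 = 2 at a few time points:

```python
# Sketch of eqs. (3.3)-(3.5) for Hypo(lambda1, lambda2).
import math

def hypo_reliability(t, l1, l2):
    return (l2 * math.exp(-l1 * t) - l1 * math.exp(-l2 * t)) / (l2 - l1)

def hypo_hazard(t, l1, l2):
    num = l1 * l2 * (math.exp(-l1 * t) - math.exp(-l2 * t))
    den = l2 * math.exp(-l1 * t) - l1 * math.exp(-l2 * t)
    return num / den

def hypo_mrlt(t, l1, l2):
    # Mean residual life time, eq. (3.5).
    num = l2 ** 2 * math.exp(-l1 * t) - l1 ** 2 * math.exp(-l2 * t)
    den = l2 * math.exp(-l1 * t) - l1 * math.exp(-l2 * t)
    return num / (l1 * l2 * den)

for t in (0.5, 1.0, 2.0):
    print(t, hypo_reliability(t, 1, 2), hypo_hazard(t, 1, 2), hypo_mrlt(t, 1, 2))
```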
Figure 2: Reliability of TSL and Hypoexponential Distributions for n=50, λ1=1, λ2=2
Figure 3: Reliability of TSL and Hypoexponential Distributions for n=50, λ1=1, λ2=5
Figure 4: Reliability of TSL and Hypoexponential Distributions for n=50, λ1=1, λ2=10
Figure 5: Reliability of TSL and Hypoexponential Distributions for n=50, λ1=1, λ2=20
Suppose a unit that is to operate over time 0 to time t, [0, t], is replaced upon failure (with failure probability distribution F). Assume that the failures are easily detected and instantly replaced, that cost c1 includes the cost resulting from planned replacement, and that cost c2 includes all costs resulting from failure; then the expected cost during the period [0, t] is given by

$$C(t) = c_1 E(N_1(t)) + c_2 E(N_2(t)), \qquad (4.1)$$

where E(N1(t)) and E(N2(t)) denote the expected number of planned replacements and the expected number of failures, respectively. The goal is to determine the policy minimizing C(t) for a finite time span, or minimizing lim_{t→∞} C(t)/t for an infinite time span. Because the TSL probability distribution has an increasing failure rate, it is expected that this model would be useful in a maintenance system.

Age Replacement Policy and TSL Probability Distribution
Consider the so-called age replacement policy: an item is always replaced exactly at the time of failure, or at time t* after installation, whichever occurs first. This age replacement policy for infinite time spans seems to have received the most attention in the literature. Morse (1958) showed how to determine the replacement interval minimizing the cost per unit time, while Barlow and Proschan (1962) proved that if the failure distribution, F, is continuous, then a minimum-cost age replacement exists for any infinite time span.

In this article, the goal is to determine the optimal time t* at which preventive replacement should be performed. The model should determine the time t* that minimizes the total expected cost of preventive and failure maintenance per unit time. The total cost per cycle consists of the cost of preventive maintenance in addition to the cost of failure
maintenance over the cycle (0, t*). Hence, the expected cost per unit time is equal to:

$$ECU(t^*) = \frac{c_1 R(t^*) + c_2\left(1 - R(t^*)\right)}{t^* R(t^*) + M(t^*)\left(1 - R(t^*)\right)}. \qquad (4.1.4)$$

Methodology
Assume that a system has a time-to-failure distribution of the truncated skew-Laplace pdf; the goal is to compute the optimal time t* of preventive replacement. Because the reliability function of a TSL random variable is given by

$$R(t^*) = \frac{2(1+\lambda)\exp\!\left(-\dfrac{t^*}{\varphi}\right) - \exp\!\left(-\dfrac{(1+\lambda)t^*}{\varphi}\right)}{2\lambda + 1}$$

and

$$M(t^*) = \frac{1}{1 - R(t^*)}\int_0^{t^*} t f(t)\,dt,$$

thus

$$M(t^*) = \frac{\varphi}{(2\lambda+1)\left(1 - R(t^*)\right)}\left[\frac{1 + 4\lambda + 2\lambda^2}{1+\lambda} - 2(1+\lambda)\left(1 + \frac{t^*}{\varphi}\right)\exp\!\left(-\frac{t^*}{\varphi}\right) + \frac{1}{1+\lambda}\left(1 + \frac{(1+\lambda)t^*}{\varphi}\right)\exp\!\left(-\frac{(1+\lambda)t^*}{\varphi}\right)\right]. \qquad (4.1.3)$$

In order to minimize a function g(t) subject to a ≤ t ≤ b, the Golden Section Method, which employs the following steps to calculate the optimum value, may be used.

Step 1:
Select an allowable final tolerance level δ and assume the initial interval where the minimum lies is [a1, b1] = [a, b]; let
λ1 = a1 + (1 − α)(b1 − a1) and μ1 = a1 + α(b1 − a1).
Take α = 0.618, which is the positive root of c² + c − 1 = 0; evaluate g(λ1) and g(μ1), and let k = 1. Go to Step 2.
Step 3:
Let a_{k+1} = a_k, b_{k+1} = b_k, λ_{k+1} = μ_k and μ_{k+1} = a_{k+1} + α(b_{k+1} − a_{k+1}). Evaluate g(μ_{k+1}) and go to Step 5.

Step 4:
Let a_{k+1} = a_k, b_{k+1} = μ_k, μ_{k+1} = λ_k and λ_{k+1} = a_{k+1} + (1 − α)(b_{k+1} − a_{k+1}). Evaluate g(λ_{k+1}) and go to Step 5.

Step 5:
Replace k by k + 1 and go to Step 1.

Example
To implement this method to find the time t*, subject to the condition that c1 = 1 and c2 = 10, proceed as follows. … ECU(λ3) = 8.516 and ECU(μ3) = 8.533. Because ECU(λ3) < ECU(μ3), the next interval where the optimal solution lies is [0, 2.36].

Iteration 4:
[a4, b4] = [0, 2.36], λ4 = 0.901 and μ4 = 1.459; ECU(λ4) = 8.613 and ECU(μ4) = 8.516. Because ECU(λ4) > ECU(μ4), the next interval where the optimal solution lies is [0.901, 2.36].

Iteration 5:
[a5, b5] = [0.901, 2.36], λ5 = 1.459 and μ5 = 1.803; ECU(λ5) = 8.516 and ECU(μ5) = 8.517.
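The golden-section iterations above are easy to mechanize. The sketch below (not from the article; function names are illustrative) minimizes the expected cost per unit time of eq. (4.1.4) under a TSL(1, 1) failure model with c1 = 1 and c2 = 10, computing M(t)(1 − R(t)) as the partial mean by numerical quadrature rather than from the closed form (4.1.3):

```python
import math
from scipy.integrate import quad

lam, phi = 1.0, 1.0
c1, c2 = 1.0, 10.0

def R(t):
    # TSL reliability: [2(1+lam)e^{-t/phi} - e^{-(1+lam)t/phi}]/(2lam+1)
    return (2*(1 + lam)*math.exp(-t/phi)
            - math.exp(-(1 + lam)*t/phi)) / (2*lam + 1)

def f(t):
    # TSL pdf, f(t) = -dR/dt
    return (2*(1 + lam)/phi*math.exp(-t/phi)
            - (1 + lam)/phi*math.exp(-(1 + lam)*t/phi)) / (2*lam + 1)

def ecu(t):
    # Expected cost per unit time, eq. (4.1.4).
    partial_mean, _ = quad(lambda u: u*f(u), 0.0, t)
    return (c1*R(t) + c2*(1 - R(t))) / (t*R(t) + partial_mean)

def golden_section(g, a, b, delta=0.5):
    alpha = 0.618                       # positive root of c^2 + c - 1 = 0
    lo, hi = a + (1 - alpha)*(b - a), a + alpha*(b - a)
    g_lo, g_hi = g(lo), g(hi)
    while b - a > delta:
        if g_lo > g_hi:                 # minimum lies in [lo, b]
            a, lo, g_lo = lo, hi, g_hi
            hi = a + alpha*(b - a)
            g_hi = g(hi)
        else:                           # minimum lies in [a, hi]
            b, hi, g_hi = hi, lo, g_lo
            lo = a + (1 - alpha)*(b - a)
            g_lo = g(lo)
    return 0.5*(a + b)

print(golden_section(ecu, 0.01, 10.0))
```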
Because ECU(λ7) > ECU(μ7), the next interval where the optimal solution lies is [1.459, 1.803]. If the δ level is fixed at 0.5, it can be concluded that the optimum value lies in the interval [1.459, 1.803].

It is known that the expected number of failures in the interval (0, t*) is the integral of the failure rate function, that is,

$$M(t^*) = E\left[N(t^*)\right] = H(t^*) = \int_0^{t^*} h(t)\,dt.$$
Iteration 2:
[a2, b2] = [0, 6.18], λ2 = 2.36, μ2 = 3.82; ECU(λ2) = 9.30 and ECU(μ2) = 9.523. Because ECU(λ2) < ECU(μ2), the next interval where the optimal solution lies is [0, 3.82].

Iteration 3:
[a3, b3] = [0, 3.82], λ3 = 1.459, μ3 = 2.36; ECU(λ3) = 9.124 and ECU(μ3) = 9.30. Because ECU(λ3) < ECU(μ3), the next interval where the optimal solution lies is [0, 2.36].

Iteration 4:
[a4, b4] = [0, 2.36], λ4 = 0.901, μ4 = 1.459; ECU(λ4) = 9.102 and ECU(μ4) = 9.124. Because ECU(λ4) < ECU(μ4), the next interval where the optimal solution lies is [0, 1.459].

Iteration 5:
[a5, b5] = [0, 1.459], λ5 = 0.557, μ5 = 0.901; ECU(λ5) = 9.405 and ECU(μ5) = 9.124. Because ECU(λ5) > ECU(μ5), the next interval where the optimal solution lies is [0.557, 1.459].

Iteration 6:
[a6, b6] = [0.557, 1.459], λ6 = 0.9015, μ6 = 1.114; ECU(λ6) = 9.102 and ECU(μ6) = 9.08.

Iteration 7:
[a7, b7] = [0.901, 1.459], λ7 = 1.114, μ7 = 1.245; ECU(λ7) = 9.08 and ECU(μ7) = 9.09. Because ECU(λ7) < ECU(μ7), the next interval where the optimal solution lies is [0.901, 1.245].

If the δ level is fixed at 0.5, it can be concluded that the optimum value lies in the interval [0.901, 1.245], and it is given by 1.07. As in the age replacement case, in this numerical example it was assumed that the failure data follow the TSL(1, 1) model. Observe that, in order to optimize the cost, maintenance must be scheduled every 1.07 units of time.

Maintenance Over a Finite Time Span
The problem of preventive maintenance over a finite time span is of great importance in industry. It can be viewed from two different perspectives, based on whether the total number of replacement (failure + planned) times is known or unknown. The first case is straightforward and has been addressed in the literature for a long time. Barlow et al. (1962) derived the expression for this case. Let T* represent the total time span, meaning minimization of the cost due to forced replacement or planned replacement until T = T*. Let Cn(T*, T) represent the expected cost in the time span 0 to T*, [0, T*], considering only the first n replacements and following a policy of replacement at interval T. It is clear that the case T* ≤ T is equivalent to zero planned replacements, so that

$$C_1(T^*, T) = \begin{cases} c_2 F(T^*), & \text{if } T^* \le T, \\ c_2 F(T) + c_1\left(1 - F(T)\right), & \text{if } T^* \ge T. \end{cases}$$
$$C_{n+1}(T^*, T) = \begin{cases} \displaystyle\int_0^{T^*} \left[c_2 + C_n(T^* - y, T)\right] dF(y), & \text{if } T^* \le T, \\[2ex] \displaystyle\int_0^{T} \left[c_2 + C_n(T^* - y, T)\right] dF(y) + \left[c_1 + C_n(T^* - T, T)\right]\times\left[1 - F(T)\right], & \text{otherwise.} \end{cases} \qquad (5.2)$$

A statistical model may now be developed that can be used to predict the total cost of maintenance before an item is actually used. Let T equal the predetermined replacement time, and assume that an item is always replaced exactly at the time of failure T* or T hours after its installation, whichever occurs first. Let τ denote the failure time; then there are two cases to consider.

Case 1: T < T*
In this case the preventive maintenance (PM) interval is less than the finite planning horizon. If the component fails at time, say, τ (for τ < T), then the cost due to failure is incurred and the planning horizon is reduced to [T* − τ]. But if the component works until the preventive replacement time T, then the cost due to preventive maintenance is incurred and the planning horizon is reduced to [T* − T]. The total cost incurred in these two cases is

$$C(T^*, T) = \int_0^{T} \left[c_2 + C(T^* - \tau, T)\right] f(\tau)\,d\tau + \left[1 - F(T)\right]\times\left[c_1 + C(T^* - T, T)\right], \qquad (5.3)$$

where c1 is the cost of preventive maintenance and c2 (> c1) is the cost of failure maintenance.

Case 2: T* < T
In this case the PM interval is greater than the planning horizon, so there is no preventive maintenance, but there is a chance of failure maintenance. Hence the total cost incurred will be

$$C(T^*, T) = \int_0^{T^*} \left[c_2 + C(T^* - \tau, T)\right] f(\tau)\,d\tau. \qquad (5.4)$$

The interest here lies in finding the preventive maintenance time T that minimizes the cost of the system. Consider a numerical example to determine whether the minimum exists if the failure model is assumed to be TSL(1, 1). A random sample of size 100 was generated from TSL(1, 1), and a time T to perform preventive maintenance was fixed. The preventive maintenance cost c1 = 1 was set along with the failure replacement cost c2 = 1, 2 and 10. The process was repeated several times, and the total cost for the first 40 failures was computed. All necessary calculations were performed using the statistical language R. In Table 3, Ci, for i = 1, 2 and 10, represents the total cost due to the preventive maintenance cost c1 = 1 and the failure replacement cost c2 = i. It can be observed from Table 3 that the minimum Ci occurs at around T = 1.1 units of time. A preliminary study of the application of the TSL distribution in such an environment can be found in Aryal, et al. (2008).

Table 3: Expected Costs at Different Maintenance Times

T     C10     C2     C1
1.00  340.55  88.95  57.50
1.01  347.25  89.65  57.45
1.02  336.95  87.75  56.60
1.03  342.95  88.15  56.30
1.04  339.15  87.15  55.65
1.05  341.25  87.25  55.50
1.06  334.40  86.40  55.40
1.07  343.75  87.35  55.30
1.08  332.15  84.95  54.05
1.09  338.55  85.81  54.22
1.10  318.48  82.67  53.19
1.11  327.68  84.04  52.59
1.12  344.76  86.48  54.19
1.13  333.70  84.50  53.35
1.14  340.40  85.20  53.30
1.15  338.86  84.68  53.90
1.16  331.28  82.90  53.86
1.17  338.27  84.09  54.31
1.18  335.24  83.05  53.52
1.19  341.90  84.00  54.76
1.20  363.90  87.50  56.95
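The experiment behind Table 3 can be mirrored with a short Monte Carlo sketch. The following (not the authors' R code; names and the sampling method are illustrative) draws TSL(1, 1) failure times by inverse transform and accumulates the total cost of the first 40 failures for a fixed PM interval T:

```python
import math
import random
from scipy.optimize import brentq

lam, phi = 1.0, 1.0
c1 = 1.0                      # preventive replacement cost

def tsl_cdf(x):
    return 1.0 - (2*(1 + lam)*math.exp(-x/phi)
                  - math.exp(-(1 + lam)*x/phi)) / (2*lam + 1)

def tsl_sample():
    u = random.random()
    return brentq(lambda x: tsl_cdf(x) - u, 0.0, 100.0)   # inverse transform

def total_cost(T, c2, n_failures=40):
    cost, failures = 0.0, 0
    while failures < n_failures:
        x = tsl_sample()           # time to failure of the current unit
        if x < T:
            cost += c2             # unit failed before planned replacement
            failures += 1
        else:
            cost += c1             # replaced preventively at age T
    return cost

random.seed(1)
for T in (1.00, 1.10, 1.20):
    print(T, total_cost(T, c2=10.0))
```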
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 423-447 1538 – 9472/09/$95.00
On Some Discrete Distributions and their Applications with Real Life Data
Banik & Kibria

This article reviews some useful discrete models and compares their performance in terms of handling the high frequency of zeroes observed in many discrete data sets (e.g., motor crash, earthquake, and strike data). A simulation study is conducted to determine how commonly used discrete models (such as the binomial, Poisson, negative binomial, zero-inflated and zero-truncated models) behave if excess zeroes are present in the data. Results indicate that the negative binomial model and the ZIP model are better able to capture the effect of excess zeroes. Some real-life environmental data are used to illustrate the performance of the proposed models.

Key words: Binomial distribution; Poisson distribution; negative binomial; ZIP; ZINB.
greater than expected. Reasons may include failing to observe an event during the observational period and an inability to ever experience an event. Some researchers (Warton, 2005; Shankar, et al., 2003; Kibria, 2006) have applied zero-inflated models to model this type of process (known as a dual-state process: one zero-count state and one normal-count state). These models generally capture apparent excess zeroes that commonly arise in some discrete processes, such as road crash data, and improve statistical fit when compared to the Poisson and the negative binomial model. The reason is that data obtained from a dual-state process often suffer from over-dispersion because the number of zeroes is inflated by the zero-count state. A zero-inflated model (introduced by Rider, 1961) is defined by

$$P(X = k) = \begin{cases} 1 - \theta, & \text{if } X = 0, \\ \theta\, P(X; \mu_i), & \text{if } X > 0, \end{cases}$$

where θ is the proportion of non-zero values of X and P(X; μi) is a zero-truncated probability model fitted to the normal-count state. To address phenomena with zero-inflated counting processes, the zero-inflated Poisson (ZIP) model and the zero-inflated negative binomial (ZINB) model have been developed. A ZIP model is a mix of a distribution that is degenerate at zero and a variant of the Poisson model. Conversely, the ZINB model is a mix of zero and a variant of the negative binomial model.

Opposite situations from the zero-inflated models are also encountered; this article also examines processes that have no zeroes: the zero-truncated models. If the Poisson or the negative binomial model is used with these processes, the procedure tries to fit the model by including probabilities for zero values; more accurate models that do not include zero values should be produced. If the value of zero cannot be observed in any random experiment, then these models may be used. Two cases are considered: (1) the zero-truncated Poisson model, and (2) the zero-truncated negative binomial model.

Given a range of possible models, it is difficult to fit an appropriate discrete model. The main purpose of this article is to provide guidelines to fit a discrete process appropriately. First, a simulation study was conducted to determine the performance of the considered models when excess zeroes are present in a dataset. Second, the following real-life data (for details, see Table 4.1) were analyzed, the numbers of:

1. Road accidents per month in the Dhaka district,
2. People visiting the Dhaka medical hospital (BMSSU) per day,
3. Earthquakes in Bangladesh per year, and
4. Strikes (hartals) per month in Dhaka.

Statistical Distribution: The Binomial Distribution
If X ~ B(n, p), then the probability mass function (pmf) of X is defined by

$$P(X = k) = P(X = k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, 2, \ldots, n, \qquad (2.1)$$

where n is the total number of trials and p is the probability of success on each trial. The moment generating function (mgf) of (2.1) is

$$M_X(t) = (q + pe^t)^n,$$

thus E(X), V(X) and the skewness (Sk) of (2.1) are np, npq and (1 − 2p)²/(npq), respectively.

Statistical Distribution: The Poisson Distribution
If in (2.1) n → ∞ and p → 0, then X follows the Poisson distribution with parameter λ (> 0), denoted X ~ P(λ). The pmf is defined as

$$P(X = k; \lambda) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots, \qquad (2.2)$$

where λ denotes the expected number of occurrences. The mgf of (2.2) is

$$M_X(t) = e^{\lambda(e^t - 1)},$$

thus E(X) and V(X) of (2.2) are the same, namely λ, and Sk is equal to 1/λ.
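The zero-inflated construction above is easy to simulate: a draw is 0 with probability 1 − θ, otherwise a zero-truncated variate. Here is a minimal sketch (not from the article) for the ZIP case of eq. (2.4), checked against its mean θλ/(1 − e^{−λ}):

```python
import numpy as np

rng = np.random.default_rng(0)

def zip_sample(theta, lam, size):
    out = np.zeros(size, dtype=int)
    nonzero = rng.random(size) < theta
    k = nonzero.sum()
    draws = rng.poisson(lam, k)
    while (zeros := draws == 0).any():      # reject zeros -> zero-truncated
        draws[zeros] = rng.poisson(lam, zeros.sum())
    out[nonzero] = draws
    return out

x = zip_sample(theta=0.6, lam=2.0, size=10_000)
mean_theory = 0.6 * 2.0 / (1 - np.exp(-2.0))  # E(X) for ZIP(theta, lam)
print(x.mean(), mean_theory)
```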
Statistical Distribution: The Negative Binomial Distribution

$$P(X; k, p) = \frac{\Gamma(k + X)}{X!\,\Gamma k}\, p^X q^{-(k+X)}, \quad p > 0,\ k > 0,\ X = 0, 1, 2, \ldots, \qquad (2.3)$$

where p is the chance of success in a single trial and k is the number of failures in repeated identical trials. If k → ∞, then X ~ P(λ), where λ = kp. The mgf of (2.3) is

$$M_X(t) = (q - pe^t)^{-k},$$

thus E(X), V(X) and Sk of (2.3) are kp, kpq and (1 + 2p)²/(kpq), respectively.

Statistical Distribution: The Zero-Inflated Poisson (ZIP) Distribution
If X ~ ZIP(θ, λ) with parameters θ and λ, then the pmf is defined by

$$P(X; \theta, \lambda) = \begin{cases} 1 - \theta, & \text{if } X = 0, \\ \theta\,\dfrac{P(X; \lambda)}{1 - e^{-\lambda}}, & \text{if } X > 0, \end{cases} \qquad (2.4)$$

where P(X; λ) is defined in (2.2) and θ is the proportion of non-zero values of X. The mgf of (2.4) is

$$M_X(t) = (1 - \theta) + \frac{\theta e^{-\lambda}}{1 - e^{-\lambda}}\left(e^{\lambda e^t} - 1\right),$$

thus E(X), V(X) and Sk of (2.4) are

$$\frac{\theta\lambda}{1 - e^{-\lambda}}, \qquad \frac{\theta\lambda}{1 - e^{-\lambda}}\left[\lambda + 1 - \frac{\theta\lambda}{1 - e^{-\lambda}}\right]$$

and

$$\frac{\left\{\dfrac{\theta\lambda}{1 - e^{-\lambda}}\left[\lambda^2 + 3\lambda + 1 - \dfrac{\theta\lambda}{1 - e^{-\lambda}}\left(3\lambda + 3 - \dfrac{\theta\lambda}{1 - e^{-\lambda}}\right)\right]\right\}^{2}}{\left\{\dfrac{\theta\lambda}{1 - e^{-\lambda}}\left[\lambda + 1 - \dfrac{\theta\lambda}{1 - e^{-\lambda}}\right]\right\}^{3}},$$

respectively.

Statistical Distribution: The Zero-Inflated Negative Binomial (ZINB) Distribution
If X ~ ZINB(θ, k, p), the pmf is

$$P(X; \theta, k, p) = \begin{cases} 1 - \theta, & \text{if } X = 0, \\ \theta\,\dfrac{P(X; k, p)}{1 - q^{-k}}, & \text{if } X > 0, \end{cases} \qquad (2.5)$$

where P(X; k, p) is defined in (2.3) and θ is the proportion of non-zero values of X. The mgf of (2.5) is

$$M_X(t) = (1 - \theta) + \frac{\theta}{1 - q^{-k}}\left[(q - pe^t)^{-k} - 1\right],$$

thus E(X), V(X) and Sk of (2.5) are

$$\frac{\theta kp}{1 - q^{-k}}, \qquad \frac{\theta kp}{1 - q^{-k}}\left[p(k+1) + 1 - \frac{\theta kp}{1 - q^{-k}}\right]$$

and

$$\frac{\left\{\dfrac{\theta kp}{1 - q^{-k}}\left[(kp)^2 + 3kp + 1 - \dfrac{\theta kp}{1 - q^{-k}}\left(3kp + 3 - \dfrac{\theta kp}{1 - q^{-k}}\right)\right]\right\}^{2}}{\left\{\dfrac{\theta kp}{1 - q^{-k}}\left[kp + 1 - \dfrac{\theta kp}{1 - q^{-k}}\right]\right\}^{3}},$$

respectively. As k → ∞, ZINB(θ, k, p) ~ ZIP(θ, λ), where λ = kp.

Statistical Distribution: The Zero-Truncated Poisson (ZTP) Distribution
If X ~ ZTP(λ), then the pmf of X is given by

$$P(X = x \mid X > 0) = \frac{P(X = x)}{P(X > 0)} = \frac{P(X; \lambda)}{1 - P(X = 0)} = \frac{P(X; \lambda)}{1 - e^{-\lambda}},$$
thus, E(X) and V(X) are λ(1 − e^{−λ})^{−1} and Sk is

$$(1 - e^{-\lambda})(\lambda + 3 + \lambda^{-1}) - 3(\lambda + 1) - (1 - e^{-\lambda})^{-1}.$$

Statistical Distribution: The Zero-Truncated Negative Binomial (ZTNB) Distribution
If X ~ ZTNB(k, p), then the pmf of X is given by

$$P(X = x \mid X > 0) = \frac{P(X; k, p)}{1 - P(X = 0)} = \frac{P(X; k, p)}{1 - q^{-k}}.$$

Parameter Estimation
To estimate the parameters of the considered models, the most common methods, the method of moments and maximum likelihood, were used. The sample moments are

$$M_1 = E(X) = \frac{1}{n}\sum_{i=1}^n X_i, \qquad M_2 = \frac{1}{n}\sum_{i=1}^n X_i^2, \qquad M_3 = \frac{1}{n}\sum_{i=1}^n X_i^3.$$

The Maximum Likelihood Estimation Method (MLE)
Find the log-likelihood function for a given distribution, take the partial derivative of this function with respect to each parameter, set it equal to 0, and solve to find the parameter estimates. For the binomial distribution,

$$\frac{\partial \log L(X; n, p)}{\partial p} = \frac{\sum_{i=1}^n X_i}{p} - \frac{\sum_{i=1}^n (n - X_i)}{1 - p}.$$

Simplifying results in: …
Estimator of λ
The log-likelihood expression of (2.2) is

$$\log L(X; \lambda) = -n\lambda + \sum_{i=1}^n X_i \log\lambda - \sum_{i=1}^n \log(X_i!).$$

Differentiating the above expression with respect to λ results in

$$\frac{\partial \log L(X; \lambda)}{\partial \lambda} = -n + \frac{\sum_{i=1}^n X_i}{\lambda}$$

and, after simplification, $P\hat{\lambda}(ml) = M_1$, thus $P\hat{\lambda}(mom) = P\hat{\lambda}(ml) = M_1$.

Negative Binomial Distribution: Moment Estimators of p and k
Based on (2.3), it is known that E(X) = kp and V(X) = kpq; equating these to the corresponding sample moments yields the moment estimators of p and k.

Negative Binomial Distribution: Maximum Likelihood Estimators of p and k
Differentiating the log-likelihood expression of (2.3) with respect to p and k, the following equations result:

$$\frac{\partial \log L(X; k, p)}{\partial p} = \frac{\sum_{i=1}^n X_i}{p} - \frac{\sum_{i=1}^n (k + X_i)}{1 + p} \qquad (2.8)$$

and

$$\frac{\partial \log L(X; k, p)}{\partial k} = \frac{1}{n}\sum_{i=1}^n \sum_{j=0}^{X_i - 1} \frac{j}{1 + jk^{-1}} + k^2\ln(1 + p) - \frac{kp\left(E(X) + k\right)}{1 + p}. \qquad (2.9)$$

Solving (2.8) results in $NB\hat{p}(ml) = M_1/\hat{k}$. It was observed that $NB\hat{k}(ml)$ does not exist in closed form; thus $NB\hat{k}(ml)$ was obtained by numerically optimizing (2.9) using the Newton-Raphson optimization technique with $p = \hat{p}$.
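The negative binomial fit just described can be sketched in a few lines. The code below (illustrative, not the authors' implementation) uses p = M1/k from (2.8) and maximizes the resulting profile log-likelihood over k by a generic bounded optimizer rather than the article's Newton-Raphson step:

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

def nb_profile_loglik(k, x):
    # Log-likelihood of pmf (2.3) with p = M1/k substituted in; q = 1 + p.
    p = x.mean() / k
    q = 1.0 + p
    return (gammaln(k + x).sum() - gammaln(x + 1).sum() - len(x) * gammaln(k)
            + x.sum() * np.log(p) - (k + x).sum() * np.log(q))

rng = np.random.default_rng(2)
x = rng.negative_binomial(5, 0.5, 1000).astype(float)  # NumPy's NB parametrization
res = minimize_scalar(lambda k: -nb_profile_loglik(k, x),
                      bounds=(0.1, 100.0), method="bounded")
print(res.x, x.mean() / res.x)        # k-hat and the corresponding p-hat
```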
For the ZIP distribution, the moment estimators are

$$ZIP\hat{\lambda}(mom) = \frac{M_2}{M_1} - 1 \qquad \text{and} \qquad ZIP\hat{\theta}(mom) = \frac{M_1\left(1 - e^{-\hat{\lambda}}\right)}{\hat{\lambda}}.$$

ZIP Distribution: Maximum Likelihood Estimators of θ and λ
The log-likelihood expression of (2.4) is

$$\log L(X; \theta, \lambda) = \sum_{i=1}^n I(X_i = 0)\log(1 - \theta) + \sum_{i=1}^n I(X_i > 0)\left\{\log\theta + X_i\log\lambda - \log(e^{\lambda} - 1) - \log X_i!\right\},$$

so that

$$\frac{\partial \log L(X; \theta, \lambda)}{\partial \theta} = -\frac{\sum_{i=1}^n I(X_i = 0)}{1 - \theta} + \frac{\sum_{i=1}^n I(X_i > 0)}{\theta}$$

and

$$\frac{\partial \log L(X; \theta, \lambda)}{\partial \lambda} = \frac{\sum_{i=1}^n X_i}{\lambda} - \sum_{i=1}^n I(X_i > 0) - \frac{\sum_{i=1}^n I(X_i > 0)}{e^{\lambda} - 1}.$$

After the above equations are simplified for θ and λ, the following is obtained:

$$ZIP\hat{\theta}(ml) = \frac{1}{n}\sum_{i=1}^n I(X_i > 0).$$

ZINB Distribution: Maximum Likelihood Estimators of θ, k and p
The log-likelihood expression of (2.5) is

$$\log L(X; \theta, k, p) = \sum_{i=1}^n I(X_i = 0)\log(1 - \theta) + \sum_{i=1}^n I(X_i > 0)\log\theta + \sum_{i=1}^n \log\frac{\Gamma(X_i + k)}{X_i!\,\Gamma k} - \sum_{i=1}^n (k + X_i)\log q + \sum_{i=1}^n X_i\log p - \sum_{i=1}^n \log(1 - q^{-k}).$$

Differentiating the above with respect to each of the parameters results in estimators for θ, k and p. The estimates p̂ and k̂ were found iteratively:

$$ZINB\hat{p}(ml) = E(X)\left\{1 - (1 + kp)^{-k}\right\}^{-1}, \qquad (2.10)$$

thus the solution for p̂ has the same properties as described above. Because the score equation for k̂ does not have a simple form, k was estimated numerically given the current estimate of p̂ from (2.10) (for details, see Warton, 2005).
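The ZIP estimators derived above are one-liners in practice. A small sketch (not from the article) of the moment estimates and the ML estimate of θ:

```python
import numpy as np

def zip_moment_estimates(x):
    M1, M2 = x.mean(), (x ** 2).mean()
    lam = M2 / M1 - 1.0                      # ZIP lambda-hat (mom)
    theta = M1 * (1.0 - np.exp(-lam)) / lam  # ZIP theta-hat (mom)
    return theta, lam

def zip_theta_ml(x):
    return (x > 0).mean()                    # ZIP theta-hat (ml)

x = np.array([0, 0, 0, 1, 2, 2, 3, 0, 1, 4], dtype=float)
print(zip_moment_estimates(x), zip_theta_ml(x))
```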
The log-likelihood as a function of k, given the current estimate p̂, is

$$\sum_{i=1}^n \log\frac{\Gamma(X_i + k)}{X_i!\,\Gamma k} + \sum_{i=1}^n X_i\log p - \sum_{i=1}^n (k + X_i)\log q - \sum_{i=1}^n \log(1 - q^{-k}).$$

Methods for Comparison of the Distributions: Goodness of Fit (GOF) Test
The GOF test determines whether a hypothesized distribution can be used as a model for a particular population of interest. Common tests include the χ², the Kolmogorov-Smirnov and the Anderson-Darling tests. The χ² test can be applied to discrete models; the other tests tend to be restricted to continuous models. The test procedure for the χ² GOF is simple: divide the data into a number of bins, and compare the number of points that fall into each bin to the expected number of points for those bins (if the data were obtained from the hypothesized distribution). More formally, suppose:

H0: The data follow the specified population distribution.
H1: The data do not follow the specified population distribution.

If the data are divided into s bins, then the test statistic is:

$$\chi^2_{cal} = \sum_{i=1}^{s} \frac{(O_i - E_i)^2}{E_i}, \qquad (2.11)$$

where Oi and Ei are the observed and expected frequencies for bin i. The null, H0, is rejected if χ²_cal > χ²_{df,α}, where the degrees of freedom (df) are calculated as (s − 1 − the number of parameters estimated) and α is the significance level.

Methodology
Simulation Study
The outcome of interest in many fields is discrete in nature and generally follows the binomial, Poisson or negative binomial distribution. It is evident from the literature that these types of variables often contain a high proportion of zeroes. These zeroes may be due to either the presence of a population with only zero counts and/or over-dispersion. Hence, it may be stated that, to capture the effect of excess zeroes, it is necessary to investigate which model would best fit a discrete process. Thus, a series of simulation experiments was conducted to determine the effect of excess zeroes on the selected models. These simulation studies reflect how commonly used discrete models behave if excess zeroes are present in a set of data.

Simulation Experiment Design
A sample, X = {X1, X2, …, Xn}, was obtained where data were generated from a Poisson model with:
λ: 1.0, 1.5, 2.0 and 2.5;
n: 10, 20, 30, 50, 100, 150, 200; and
10%, 20%, …, 80% zeroes.
Different data sets for different sample sizes and λs were generated to determine which model performs best if zeroes (10% to 80%) are present in a dataset. To select the best possible model, the chi-square GOF statistic defined in (2.11) was calculated for all models. If the test was not statistically significant, then the data follow the specified (e.g., binomial, Poisson, or other) population distribution. Tables 3.1-3.8 show the GOF statistic values for all proposed distributions. Both small- and large-sample behaviors were investigated for all models, and all calculations were carried out using MATLAB (Version 7.0).

Results
Tables 3.1 to 3.8 show that the performance of the models depends on the sample size (n), λ and the percentage of zeroes included in the sample. It was observed that, as the percentage of zeroes increases in the sample, the proportion of over-dispersion decreases and the performance of the binomial and Poisson distributions decreases. For small sample sizes, most of the models fit well; for large sample sizes, however, both the binomial and the Poisson performed poorly compared to the others. For samples containing a moderate to high percentage of zeroes, the negative binomial performed best, followed by ZIP and ZINB. Based on the simulations, therefore, in the presence of excess zeroes the negative binomial model and the ZIP model (with moment estimators of the parameters) are recommended to approximate a real discrete process.
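The simulation design and the χ² statistic of (2.11) can be reproduced compactly. The sketch below (in Python rather than the study's MATLAB; names are illustrative) injects excess zeroes into a Poisson sample, fits a Poisson by MLE, and computes the GOF statistic:

```python
import numpy as np
from scipy.stats import poisson, chi2

rng = np.random.default_rng(0)
lam_true, n, zero_frac = 2.0, 200, 0.4
x = rng.poisson(lam_true, n)
x[rng.random(n) < zero_frac] = 0          # inject excess zeroes

lam_hat = x.mean()                         # Poisson MLE of lambda
bins = np.arange(x.max() + 1)
observed = np.bincount(x, minlength=len(bins))
expected = n * poisson.pmf(bins, lam_hat)
expected[-1] += n * poisson.sf(bins[-1], lam_hat)   # fold the upper tail in

chi2_cal = ((observed - expected) ** 2 / expected).sum()   # eq. (2.11)
df = len(bins) - 1 - 1                     # s - 1 - (# parameters estimated)
print(chi2_cal, chi2.sf(chi2_cal, df))     # reject H0 for a small p-value
```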
[Tables 3.1-3.8: GOF statistic values for all fitted models under the simulation design. A starred value indicates over-dispersion; mom denotes the moment estimator, and ml denotes the MLE of the model parameters.]
Applications to Real Data Sets
The selected processes were fitted using theoretical principles and by understanding the simulation outcomes. Theoretical explanations of a discrete process are reviewed as follows. Generally, a process with two outcomes (see Lord, et al., 2005, for details) follows a Bernoulli distribution. To be more specific, consider a random variable which is NOA: each time a vehicle enters any type of entity (a trial) on a given transportation network, it will either be involved in an accident or it will not. Thus, X ~ B(1, p), where p is the probability of an accident when a vehicle enters any transportation network. In general, if n vehicles pass through the transportation network (n trials), they are considered records of NOA in n trials; thus, X ~ B(n, p). However, it was observed that the chance that a typical vehicle will cause an accident is very small when considering the millions of vehicles that enter a transportation network (a large number n of trials). Therefore, a B(n, p) model for X is approximated by a P(λ) model, where λ represents the expected number of accidents. This approximation works well when the λs are constant, but it is not reasonable to assume that λ is constant across drivers and road segments; in reality, it varies with each driver-vehicle combination. Considering NOA from different roads with different probabilities of accidents for drivers, the distribution of accidents has often been observed to be over-dispersed; if this occurs, P(λ) is unlikely to show a good fit. In these cases, the negative binomial model can improve the statistical fit to the process. In the literature, it has been suggested that over-dispersed processes may also be characterized by excess zeroes (more zeroes than expected under the P(λ) process), and zero-inflated models can be a statistical solution to fit these types of processes.

Traffic Accident Data
The first data set analyzed was the number of accidents (NOA) causing death in the Dhaka district per month; NOA were counted for each of 64 months for the period of January 2003 to April 2008, and the data are presented in Table 4.1.1 and in Figure 4.1.1.

Table 4.1.1: Probability Distribution of NOA

NOA   Observed (proportion of months)
0     0.12
1     0.24
2     0.17
3     0.22
4     0.17
5     0.05
6     0
7     0.02
Total: 64 months
A total of 141 accidents occurred during the considered periods (see Figure 4.1.1), and the rate of accidents per month was 2.2 (see Table 4.1.2). Figure 4.1.1 also shows that the rates have decreased since 2004. Main causes of road accidents identified according to Haque, 2003, include: a rapid increase in the number of vehicles, more paved roads leading to higher speeds, poor driving and road use knowledge, skill and awareness, and poor traffic management. The observed and expected frequencies with GOF statistic values are tabulated and shown in Table 4.1.2. The sample mean and variance indicate that the data show over-dispersion; about 16% zeroes are present in the NOA data set. According to the GOF statistic, the binomial model fits poorly, whereas the Poisson and the negative binomial appear to fit. The fits of the different models are illustrated in Figure 4.1.2; based on the figure and the GOF statistic, the negative binomial model was shown to best fit NOA.

[Figure 4.1.1: NOA per year, 2003 to April 2008.]
[Figure 4.1.2: Observed vs. expected number of months for each NOA value under the fitted models.]
Number of Patient Visits (NOPV) at Hospital
The number of patient visits (NOPV) data were collected from the medical unit of the Dhaka BMSSU medical hospital for the period 26 April 2007 to 23 July 2007, where the variable of interest is the total number of patients visiting BMSSU per day. The frequency distribution for NOPV is reported in Table 4.2.1, which shows that the patient visiting rate per day is 142.36; this equates to a rate of 14.23 per working hour (see Table 4.2.2). Expected frequencies and GOF statistic values were tabulated and are shown in Table 4.2.2, and Figure 4.2.2 shows a bar chart of observed vs. expected frequencies. The tabulated results and the chart show that the negative binomial model and the ZTNB model fit the NOPV data well compared to the other models, with the ZTNB model having the best fit. Based on this analysis, the ZTNB model is recommended to accurately fit NOPV per day.

Table 4.2.1: Frequency Distribution of NOPV

NOPV     Observed days
51-83    1
84-116   12
117-149  32
150-182  23
183-215  6

[Figure 4.2.1: Trend of visits to BMSSU per day, 26.04.07 to 12.07.07.]
[Figure 4.2.2: Observed vs. expected number of days for each NOPV class.]
[Figure 4.3.1: Number of earthquakes (NEQ) per year in Bangladesh, 1973-2008, and the magnitudes of the recorded events.]
[Table row, GOF statistics for the NEQ data across the fitted models: 1.9e+004*, 7.7e+003*, 97.72*, 16.23, 15.25, 77.09*, 83.67*, 2.28e+005.]
[Figure 4.3.2: Observed frequency distribution of NEQ per year.]
Hartal (Strike) Data
The fourth variable is the number of hartals (NOH) per month observed in Dhaka city from 1972 to 2007. Data from 1972 to 2000 were collected from Dasgupta (2001), and data from 2001-2007 were collected from the daily newspaper, the Daily Star. Historically, the hartal phenomenon has respectable roots in Gandhi's civil disobedience against British colonialism (the word hartal, derived from Gujarati, means closing down shops or locking doors). In Bangladesh today, hartals are usually associated with the stoppage of vehicular traffic and the closure of markets, shops, educational institutions and offices for a specific period of time to articulate agitation (Huq, 1992). When collecting monthly NOH data, care was taken to include all events consistent with the above definition of hartal (e.g., a hartal lasting 4 to 8 hours was treated as a half-day hartal, and 9 to 12 hours as a full-day hartal; for longer hartals, each 12-hour period was treated as a full-day hartal).

Historical patterns of hartals in Dhaka city, NOH with respect to time, are plotted in Figure 4.4.1, and the frequency distribution of NOH is shown in Table 4.4.1. Between 1972 and 2007, 413 hartals were observed, and the monthly hartal rate is 0.96 per month (see Table 4.4.2). Figure 4.4.1 shows the NOH for two periods: 1972-1990 (post-independence) and 1991-2007 (parliamentary democracy). It has been observed that the NOH have not decreased since Independence in 1971. Although there were relatively few hartals in the early years following independence, the NOH began to rise sharply after 1981, with 101 hartals between 1982 and 1990. Since 1991 (during the parliamentary democracy), the NOH have continued to rise, with 125 hartals occurring from 1991-1996. Thus, the democratic periods (1991-1996 and 2003-2007) have experienced by far the largest number of hartals. Lack of political stability was found to be the main cause of this higher frequency of hartals (for details, see Beyond Hartals, 2005, p. 11). From Table 4.4.1, it may be observed that the hartal data contain about 60% zeroes. Table 4.4.2 indicates that the NOH process displays over-dispersion, with a variance-to-mean ratio > 1. According to the data in this study (Table 4.4.2), the negative binomial distribution, with a 31% chance of a hartal per month, is recommended to model NOH.
Figure 4.4.1: Total Hartals in Dhaka City: 1972-2007
[Bar chart, NOH by period: 1972-75: 9; 1976-81: 13; 1982-90: 101 (post-independence); 1991-96: 125; 1997-02: 54; 2003-07: 111 (parliamentary democracy).]

Table 4.4.1: Frequency Distribution of NOH

NOH        Number of months
0          257
1          86
2          41
3          14
4          9
5          8
6          10
7 or more  7
[Figure 4.4.2: Observed vs. expected number of months for each NOH value.]
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 448-462 1538 – 9472/09/$95.00
Normalizing Transformations and Score Accuracy
Solomon & Sawilowsky

The purpose of this article is to provide an empirical comparison of rank-based normalization methods for standardized test scores. A series of Monte Carlo simulations was performed to compare the Blom, Tukey, Van der Waerden and Rankit approximations in terms of achieving the T score's specified mean and standard deviation and unit normal skewness and kurtosis. All four normalization methods were accurate on the mean but were variably inaccurate on the standard deviation. Overall, deviation from the target moments was pronounced for the even moments but slight for the odd moments. Rankit emerged as the most accurate method across all sample sizes and distributions; thus it should be the default selection for score normalization in the social and behavioral sciences. However, small samples and skewed distributions degrade the performance of all methods, and practitioners should take these conditions into account when making decisions based on standardized test scores.
Key words: Normalization; normalizing transformations; T scores; test scoring; ranking methods; Rankit; Blom;
Tukey; Van der Waerden; Monte Carlo.
more generally. Nor do the sample sizes represent every possible increment. However, both the sample size increments and the range of distributional types are assumed to be sufficient for the purpose of outlining the absolute and comparative accuracy of these normalizing transformations in real settings. Although the interpretation of results need not be restricted to educational and psychological data, similar distributional types may be most often found in these domains.

For normally distributed variables, the standardization process begins with the Z score transformation, which produces a mean of 0 and a standard deviation of 1 (Walker & Lev, 1969; Mehrens & Lehmann, 1980; Hinkle, Wiersma, & Jurs, 2003). Z scores are produced by dividing the deviation score (the difference between raw scores and the mean of their distribution) by the standard deviation: Z = (X − μ)/σ. However, Z scores can be difficult to interpret due to decimals and negative numbers. Because 95% of the scores fall between -3 and +3, small changes in decimals may imply large changes in performance. Also, because half the scores are negative, it may appear to the uninitiated that half of the examinees obtained an extremely poor outcome.

Linear versus Area Transformations
Linear scaling remedies these problems by multiplying standard scores by a number large enough to render decimal places trivial, then adding a number large enough to eliminate negative numbers. Although standard scores may be assigned any mean and standard deviation through linear scaling, the T score scale (ST = 10Z + 50) has dominated the scoring systems of social and behavioral science tests for a century (Cronbach, 1976; Kline, 2000; McCall, 1939). In the case of a normally distributed variable, the resulting T-scaled standard scores would have a mean of 50 and a standard deviation of 10. In the context of standardized testing, however, T scores refer not to T-scaled standard scores but to T-scaled normal scores. In the T score formula, Z refers to a score's location on a unit normal distribution (its normal deviate), not its place within the testing population.

Scaling standard scores of achievement and psychometric tests has limited value. Most educational and psychological measurements are ordinal (Lester & Bishop, 2000), but standard scores can only be obtained for continuous data because they require computation of the mean. Furthermore, linear transformations retain the shape of the original distribution. If a variable's original distribution is Gaussian, its transformed distribution will also be normal. If an observed distribution manifests substantial skew, excessive or too little kurtosis, or multimodality, these non-Gaussian features will be maintained in the transformed distribution.

This is problematic for a wide range of practitioners because it is common practice for educators to compare or combine scores on separate tests and for testing companies to reference new versions of their tests to earlier versions. Standard scores such as Z will not suffice for these purposes because they do not account for differing score distributions between tests. Comparing scores from a symmetric distribution with those from a negatively skewed distribution, for example, will give more weight to the scores at the lower range of the skewed curve than to those at the lower range of the symmetric curve (Horst, 1931). Normalizing transformations are used to avoid biasing test score interpretation due to heteroscedastic and asymmetrical score distributions.

Non-normality Observed
According to Nunnally (1978), "test scores are seldom normally distributed" (p. 160). Micceri (1989) demonstrated the extent of this phenomenon in the social and behavioral sciences by evaluating the distributional characteristics of 440 real data sets collected from the fields of education and psychology. Standardized scores from national, statewide, and districtwide test scores accounted for 40% of them. Sources included the Comprehensive Test of Basic Skills (CTBS), the California Achievement Tests, the Comprehensive Assessment Program, the Stanford Reading tests, the Scholastic Aptitude Tests (SATs), the College Board subject area tests, the American College Tests (ACTs), the Graduate Record Examinations (GREs), Florida Teacher Certification Examinations for adults, and
Florida State Assessment Program test scores for The Importance of T Scores for the
3rd, 5th, 8th, 10th, and 11th grades. Interpretation of Standardized Tests
Micceri summarized the tail weights, Standardized test scores are notoriously
asymmetry, modality, and digit preferences for difficult to interpret (Chang, 2006; Kolen and
the ability measures, psychometric measures, Brennan, 2004; Micceri, 1990; Petersen, Kolen,
criterion/mastery measures, and gain scores. and Hoover, 1989). Most test-takers, parents,
Over the 440 data sets, Micceri found that only and even many educators, would be at a loss to
19 (4.3%) approximated the normal distribution. explain exactly what a score of 39, 73, or 428
No achievement measure’s scores exhibited means in conventional terms, such as pass/fail,
symmetry, smoothness, unimodality, or tail percentage of questions answered correctly, or
weights that were similar to the Gaussian performance relative to other test-takers. Despite
distribution. Underscoring the conclusion that the opaqueness of T scores relative to these
normality is virtually nonexistent in educational conventional criteria, they have the advantage of
and psychological data, none of the 440 data sets being the most familiar normal score scale, thus
passed the Kolmogorov-Smirnov test of facilitating score interpretation. Most normal
normality at alpha = .01, including the 19 that score systems are assigned means and standard
were relatively symmetric with light tails. The deviations that correspond with the T score. For
data collected from this study highlight the example, the College Entrance Board’s
prevalence of non-normality in real social and Scholastic Aptitude Test (SAT) Verbal and
behavioral science data sets. Mathematical sections are scaled to a mean of
Furthermore, it is unlikely that the 500 and a standard deviation of 100. T scores
central limit theorem will rehabilitate the fall between 20 and 80 and SAT scores fall
demonstrated prevalence of non-normal data sets between 200 and 800. The T score scale
in applied settings. Although sample means may facilitates the interpretation of test scores from
increasingly approximate the normal distribution any number of different metrics, few of which
as sample sizes increase (Student, 1908), it is would be familiar to a test taker, teacher, or
wrong to assume that the original population of administrator, and gives them a common
scores is normally distributed. According to framework.
Friedman (1937), “this is especially apt to be the The importance of transforming normal
case with social and economic data, where the scores to a scale that preserves a mean of 50 and
normal distribution is likely to be the exception a standard deviation of 10 calls for an empirical
rather than the rule” (p.675). comparison of normalizing transformations. This
There has been considerable empirical study experimentally demonstrates the relative
evidence that raw and standardized test scores accuracy of the Blom, Tukey, Van der Waerden,
are non-normally distributed in the social and and Rankit approximations for the purpose of
behavioral sciences. In addition to Micceri normalizing test scores. It compares their
(1989), numerous authors have raised concerns accuracy in terms of achieving the T score’s
regarding the assumption of normally distributed specified mean and standard deviation and unit
data (Pearson, 1895; Wilson & Hilferty, 1929; normal skewness and kurtosis, among small and
Allport, 1934; Simon, 1955; Tukey & large sample sizes in an array of real, non-
McLaughlin, 1963; Andrews et al., 1972; normal distributions.
Pearson & Please, 1975; Stigler, 1977; Bradley,
1978; Tapia & Thompson, 1978; Tan, 1982; Methodology
Sawilowsky & Blair, 1992). The prevalence of A Fortran program was written to compute
non-normal distributions in education, normal scores using the four rank-based
psychology, and related disciplines calls for a normalization formulas under investigation.
closer look at transformation procedures in the Fortran was chosen for its large processing
domain of achievement and psychometric test capacity and speed of execution. This is
scoring. important for Monte Carlo simulations, which
typically require from thousands to millions of
iterations.
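The four approximations differ only in the plotting position used to convert each score's rank into a proportion before applying the inverse normal transformation. The following Python sketch re-implements the standard forms of the Blom, Tukey, Van der Waerden, and Rankit formulas and the T scaling; it is an illustration, not the study's Fortran program, and the function and sample data are invented for the example.

import numpy as np
from scipy.stats import norm, rankdata

def t_scores(x, method="rankit"):
    # Convert raw scores to T-scaled normal scores via a rank-based formula.
    n = len(x)
    r = rankdata(x)                          # ranks 1..n, ties averaged
    p = {"blom":          (r - 0.375) / (n + 0.25),
         "tukey":         (r - 1.0/3) / (n + 1.0/3),
         "vanderwaerden": r / (n + 1.0),
         "rankit":        (r - 0.5) / n}[method]
    z = norm.ppf(p)                          # normal deviate for each proportion
    return 50 + 10 * z                       # T score scale: mean 50, SD 10

x = np.array([2.0, 3, 3, 4, 7, 9, 15, 40])  # small, skewed illustrative sample
for m in ("blom", "tukey", "vanderwaerden", "rankit"):
    t = t_scores(x, m)
    print(m, t.mean().round(3), t.std().round(3))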
Normal scores were computed for each successive iteration of randomly sampled raw scores drawn from various real data sets. The resulting normal scores were then scaled to the T. The first four moments of the distribution were calculated from these T scores for each of the 14 sample sizes in each of the eight populations. Absolute values were computed by subtracting T score means from 50, standard deviations from 10, skewness values from 0, and kurtosis values from 3. These absolute values were sorted into like bins and ranked in order of proximity to the target moments. The values and ranks were averaged over the results from 10,000 simulations and reported in complete tables (Solomon, 2008). Average root mean square (RMS) values and ranks were also computed and reported for the target moments. This paper summarizes the values and ranks for absolute deviation values and RMS, or magnitude of deviation. Together, deviation values and magnitude of deviation describe the accuracy and stability of the Blom, Tukey, Van der Waerden, and Rankit approximations in attaining the first four moments of the normal distribution.

Sample Sizes and Iterations
Simulations were conducted on samples of size n = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100, 200, 500, and 1,000 that were randomly selected from each of the eight Micceri (1989) data sets. Ten-thousand (10,000) iterations were performed to break any ties up to three decimal places.

Achievement and Psychometric Distributions
Micceri (1989) computed three indices of symmetry/asymmetry and two indices of tail weight for each of the 440 large data sets he examined (70% of which had n ≥ 1,000), grouped by data type: achievement/ability (accounting for 231 of the measures), psychometric (125), criterion/mastery (35), and gain scores (49). Eight distributions were identified based on symmetry, tail weight contamination, propensity scores, and modality. Sawilowsky, Blair, and Micceri (1990) translated these results into a Fortran subroutine using achievement and psychometric measures that best represented the distributional characteristics described by Micceri (1989).

The following five distributions were drawn from achievement measures: Smooth Symmetric, Discrete Mass at Zero, Extreme Asymmetric – Growth, Digit Preference, and Multimodal Lumpy. Mass at Zero with Gap, Extreme Asymmetric – Decay, and Extreme Bimodal were drawn from psychometric measures. All eight achievement and psychometric distributions are nonnormal. These distributions are described in Table 2 and graphically depicted in Figure 1.

Results
The purpose of this study was to compare the accuracy of the Blom, Tukey, Van der Waerden, and Rankit approximations in attaining the target moments of the normal distribution. Tables 3, 4, and 5 present these results. Table 3 summarizes the major findings according to moment, sample size, and distribution. It presents values and simplified ranks for the accuracy of the four normalizing methods on the first measure, deviation from target moment. For example, the T score's target standard deviation is 10. Therefore, two methods that produce a standard deviation of 9.8 or 10.2 would have the same absolute deviation value: 0.2. The highest ranked method for each condition is the most accurate, having the smallest absolute deviation value over 10,000 Monte Carlo repetitions. It is possible to assign ranks on the mean despite the accuracy of all four normalization methods because differences begin to appear at six decimal places. However, all numbers are rounded to the third decimal place in the tables.

Table 3 shows that rank-based normalizing methods are less accurate on the standard deviation than on the mean, skewness, or kurtosis. Furthermore, the standard deviation has more immediate relevance to the interpretation of test scores than the higher moments. For these reasons, Tables 4 and 5 and Figures 2 and 3 restrict their focus to the methods' performance on the standard deviation. Table 4 summarizes the methods' proximity to the target standard deviation by distribution type. Table 5 reports proximity for all eight distributions.
Table 2: Achievement and Psychometric Distributions (descriptions of the eight Micceri distributions; only one row is recoverable: 3. Extreme Asymmetric – Growth, 4 ≤ x ≤ 30, 24.50, 27.00, 33.52, −1.33, 4.11)
Table 3: Deviation from Target, Summarized by Moment, Sample Size and Distribution
Moment
Blom Tukey Van der W. Rankit
Rank Value Rank Value Rank Value Rank Value
Mean 4 0.000 1 0.000 2 0.000 3 0.000
Standard Dev 2 1.142 3 1.186 4 1.603 1 1.119
Skewness 2 0.192 2 0.192 1 0.191 2 0.192
Kurtosis 2 0.947 3 0.941 4 0.952 1 0.930
Sample Size
Blom Tukey Van der W. Rankit
Rank Value Rank Value Rank Value Rank Value
5 ≤ 50 2 0.609 3 0.628 4 0.769 1 0.603
100 ≤ 1000 2 0.435 3 0.423 4 0.447 1 0.416
Distribution
Blom Tukey Van der W. Rankit
Rank Value Rank Value Rank Value Rank Value
Smooth Sym 2 0.393 3 0.411 4 0.531 1 0.391
Discr Mass Zero 2 0.404 3 0.421 4 0.539 1 0.403
Asym – Growth 2 0.453 3 0.470 4 0.583 1 0.452
Digit Preference 2 0.390 3 0.408 4 0.527 1 0.370
MM Lumpy 2 0.412 3 0.396 4 0.510 1 0.376
MZ w/Gap 2 1.129 3 1.126 4 1.204 1 1.113
Asym – Decay 2 0.726 3 0.739 4 0.835 1 0.725
Extr Bimodal 2 0.655 3 0.669 4 0.765 1 0.654
Proximity to target includes deviation values, at the top of Tables 4 and 5, and RMS values, at the bottom. RMS is an important second measure of accuracy because it indicates how consistently the methods perform. By standardizing the linear distance of each observed moment from its target, RMS denotes within-method magnitude of deviation. Respectively, the two accuracy measures, deviation value and magnitude of deviation, describe each method's average distance from the target value and how much its performance varies over the course of 10,000 random events.

Predictive Patterns of the Deviation Range
Figure 2 plots the range of deviation values for each distribution against a power curve among small samples. Curve fitting is only possible for the deviation range on the second and fourth moments, standard deviation and kurtosis. The first and third moments, mean and skewness, either contain zeros, which make transformations impossible, or lack sufficient variability to make curve fitting worthwhile.

To evaluate trends at larger sample sizes, the small-sample regression models are fitted a second time with the addition of four sample sizes: n = 100, n = 200, n = 500, and n = 1000. To whatever extent predictive patterns are established when n ≤ 50, those regression slopes either improve in fit or continue to hold when sample sizes increase. Figure 3 shows that inclusion of larger sample sizes causes the Smooth Symmetric power curve to remain intact and the Digit Preference power curve to improve in fit.
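Stated explicitly (a standard definition assumed here, since the text describes RMS only verbally): for a target moment $m^{*}$ (50, 10, 0, or 3) and observed moments $\hat{m}_i$ over the 10,000 iterations,

$$\mathrm{RMS} = \sqrt{\frac{1}{10{,}000}\sum_{i=1}^{10{,}000}\left(\hat{m}_i - m^{*}\right)^{2}},$$

so an RMS near zero indicates a method that is both on target and stable across random samples.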
Table 4: Proximity to Target Standard Deviation for Achievement and Psychometric Distributions (deviation values and RMS for Blom, Tukey, Van der Waerden, and Rankit, at 5 ≤ n ≤ 50 and 100 ≤ n ≤ 1000)
Table 5: Proximity to Target Standard Deviation for Small and Large Samples (Deviation Value; "small" = 5 ≤ n ≤ 50, "large" = 100 ≤ n ≤ 1000)

                Blom            Tukey           Van der Waerden  Rankit
                small   large   small   large   small   large    small   large
Smooth Sym      0.720   0.077   0.808   0.089   1.401   0.202    0.719   0.047
Discr MZ        0.736   0.082   0.823   0.094   1.414   0.208    0.734   0.073
Asym – Gro      0.829   0.247   0.914   0.260   1.489   0.356    0.827   0.237
Digit Pref      0.702   0.072   0.790   0.084   1.385   0.195    0.700   0.043
MM Lumpy        0.696   0.547   0.785   0.085   1.378   0.196    0.695   0.044
MZ w/Gap        3.651   2.804   3.711   2.815   4.117   2.896    3.647   2.795
Asym – Dec      1.668   0.420   1.743   0.425   2.244   0.458    1.666   0.417
Extr Bimod      1.469   0.921   1.543   0.931   2.045   1.011    1.467   0.912
Figure 2: Power Curves Fitted to the Deviation Range of the Standard Deviation at 10 Small Sample Sizes
Figure 3: Power Curves Fitted to the Deviation Range of the Standard Deviation with Inclusion of Four
Large Sample Sizes
Table 6: Increase of True Score Range over Expected Score Range by Standard Error (% Increase)

Standard Error   Rankit   Blom   Tukey   Van der Waerden
± 0.5            548%     557%   574%    741%
± 1.0            324%     328%   337%    421%
± 1.5            249%     252%   258%    314%
± 2.0            212%     214%   219%    260%
± 2.5            190%     191%   195%    228%
± 3.0            175%     176%   179%    207%

Van der Waerden's worst deviation value — again, among psychometric distributions at small sample sizes — is 12 times higher than its best. Normalization performance is so heavily influenced by sample size and distribution type that Van der Waerden, which is the worst overall performer, produces much more accurate standard deviations under the best sample size and distributional conditions than Rankit does under the worst distributional conditions. Under these circumstances, Rankit's worst deviation value is 10 times higher than Van der Waerden's best deviation value.

Table 5 illustrates this phenomenon even more starkly. The overall best method, Rankit, has its least accurate deviation value, 3.647, among small samples of the psychometric distribution Mass at Zero with Gap. Van der Waerden attains its most accurate deviation value, 0.195, among large samples of the Digit Preference achievement distribution. The best method's worst deviation value on any distribution is 19 times higher than the worst method's best value. This pattern holds independently for sample size and distribution. Van der Waerden's best deviation values are superior to Rankit's worst among small and large samples. Sample size exerts a strong enough influence to reverse the standing of the best- and worst-performing methods on every distribution. All four methods perform best with Digit Preference and Multimodal Lumpy and worst with Mass at Zero with Gap.

Separately, the influence of sample size and distribution can make the worst normalization method outperform the best one. Together, their influence can distort the standard deviation enough to render the T score distribution, and the test results, meaningless. In the best case scenario, Rankit would be used among large samples of the Digit Preference distribution, where it is off target by 0.043 (Table 5). With a Z score of 2 and a standard error of ± 2, this leads to a true score range of 4.172, only 4% greater than the expected score range. In the worst case scenario, Van der Waerden could be used among small samples of the Mass at Zero with Gap distribution, where it is off target by 4.117. With the same Z score and standard error, this combination produces a true score range of 20.468, or 512% greater than the expected score range. Clearly, a true score range of 20 is psychometrically unacceptable. Telling a parent that her child scored somewhere between a 60 and an 80 is equally pointless.

Magnitude of Deviation on the Standard Deviation
Returning to the second accuracy measure, magnitude of deviation, Table 4 shows how consistently the methods perform on the standard deviation. Among achievement distributions, they exhibit virtually no variability with large samples (RMS = 0.001) and slight variability with small samples (average RMS = 0.015). Among psychometric distributions, the pattern is the same but the magnitude of deviation is greater for both large and small samples (average RMS = 0.094 and 0.529, respectively). As expected, small samples and psychometric distributions aggravate the instability of each method's performance and exacerbate the differences between them. Average magnitude of deviation for small samples is nearly six times greater than for larger samples. Average magnitude of deviation for psychometric distributions is 39 times greater than for achievement distributions. Table 5 provides RMS values for all eight distributions. It is notable that Extreme Asymmetric – Growth, which is highly skewed, presents the highest RMS value among achievement distributions, although it is still lower than the psychometric distributions.
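The two true score ranges quoted above are consistent with widening the expected range by twice the standard error times the method's deviation from the target standard deviation. Under that reading (an inference from the quoted figures, not a formula stated in the text), with a standard error of ± 2 the expected range is $2 \times 2 = 4$ T-score points, and

$$4 + 2(2)(0.043) = 4.172 \qquad\text{versus}\qquad 4 + 2(2)(4.117) = 20.468,$$

where $20.468 / 4 \approx 512\%$ of the expected range.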
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 463-468 1538 – 9472/09/$95.00

Internal Consistency and Goodness of Fit ML Factor Analysis

Kanyongo & Schreiber
This study investigates how reliability (internal consistency) affects model-fitting in maximum likelihood exploratory factor analysis (EFA). This was accomplished through an examination of goodness of fit indices between the population and the sample matrices. Monte Carlo simulations were performed to create pseudo-populations with known parameters. Results indicated that the higher the internal consistency, the worse the fit. It is postulated that these observations parallel structural equation modeling, where a good fit can be observed with low correlations, and the reverse with higher item correlations.
use of these estimates with speeded tests is inappropriate due to the confounding of construct measurement with testing speed. Furthermore, for tests that consist of scales measuring different constructs, internal consistency should be assessed separately for each scale (Henson, 2001).

Exploratory Factor Analysis (EFA)
The primary purpose of EFA is to arrive at a more parsimonious conceptual understanding of a set of measured variables by determining the number and nature of common factors needed to account for the pattern of correlations among the measured variables (Fabrigar, et al., 1999). EFA is based on the common factor model (Thurstone, 1947). The model postulates that each measured variable in a battery of measured variables is a linear function of one or more common factors and one unique factor.

Fabrigar, et al. (1999) defined common factors as unobservable latent variables that influence more than one measured variable in the battery and account for the correlations among the measured variables, and unique factors as latent variables that influence only one measured variable in a battery and do not account for correlations among measured variables. Unique factors are assumed to have two components: a specific component and an error of measurement component, the unreliability in the measured variable. The common factor model seeks to understand the structure of correlations among measured variables by estimating the pattern of relations between the common factor(s) and each of the measured variables (Fabrigar, et al., 1999).

Previous Work
Kanyongo (2006) investigated the influence of internal consistency on the number of components extracted by various procedures in principal components analysis. Internal consistency reliability coefficients are not direct measures of reliability, but are theoretical estimates based on classical test theory. IC addresses reliability in terms of consistency of scores across a given set of items. In other words, it is a measure of the correlation between subsets of items within an instrument.

That study employed Monte Carlo simulations to generate scores at different levels of reliability. The number of components extracted by each of the four procedures (scree plot, Kaiser Rule, Horn's parallel analysis procedure, and modified Horn's parallel analysis procedure) was determined at each reliability level. In his study, Kanyongo (2006) found mixed results on the influence of reliability on the number of components extracted. However, generally, when component loading was high, an improvement in reliability resulted in improvement of the accuracy of the procedures, especially for a variable-to-component ratio of 4:1.

The Kaiser procedure showed the greatest improvement in performance, although it still had the worst performance at any given reliability level. When the variable-to-component ratio was 8:1, reliability did not impact the performance of the scree plot, Horn's parallel analysis (HPA), or modified Horn's parallel analysis (MHPA), since they were 100% accurate at all reliability levels. When component loading was low, it was not clear what the impact of reliability was on the performance of the procedures.

The work of Fabrigar, et al. (1999) involved an examination of the use of exploratory factor analysis (EFA) in psychological research. They noted that a clear conceptual distinction exists between principal components analysis (PCA) and EFA. When the goal of the analysis is to identify latent constructs underlying measured variables, it is more sensible to use EFA than PCA. Also, in situations in which a researcher has relatively little theoretical or empirical basis to make strong assumptions about how many common factors exist or what specific measured variables these common factors are likely to influence, EFA is probably a more sensible approach than confirmatory factor analysis (CFA).

Fabrigar, et al. (1999) pointed out that in EFA, sound selection of measured variables requires consideration of the psychometric properties of the measures. When EFA is conducted on measured variables with low communalities, substantial distortion in results occurs. One of the reasons why variables may have low communalities is low reliability. Variance due to
random error cannot be explained by common factors; because of this, variables with low reliability will have low communality and should be avoided.

Fabrigar, et al. (1999) also noted that although there are several procedures available for model-fitting in EFA, the maximum likelihood (ML) method of factor extraction is becoming increasingly popular. ML is a procedure used to fit the common factor model to the data in EFA. ML allows for the computation of a wide range of indices of the goodness of fit of the model. ML also permits statistical significance testing of factor loadings and correlations, and the computation of confidence intervals for these parameters (Cudeck & O'Dell, 1994). Fabrigar, et al. (1999) pointed out that the ML method has a more formal statistical foundation than the principal factors methods and thus provides more capabilities for statistical inference, such as significance testing and determination of confidence intervals.

In their work, Fabrigar, et al. (1999) further stated that, ideally, the preferred model should not just fit the data substantially better than simpler models and as well as more complex models; the preferred model should fit the data reasonably well in an absolute sense. A statistic used for assessing the fit of a model in ML factor analysis solutions is called a goodness of fit index.

There are several fit indices used in ML factor analysis, and one of them is the likelihood ratio statistic (Lawley, 1940). If sample size is sufficiently large and the distributional assumptions underlying ML estimation are adequately satisfied, the likelihood ratio statistic approximately follows a Chi-square distribution if the specified number of factors is correct in the population (Fabrigar, et al., 1999). They noted that, if this is not the case, a researcher should exercise caution in interpreting the results, because a preferred model that fits the data poorly might do so because the data do not correspond to assumptions of the common factor model. Alternatively, it might suggest the existence of numerous minor common factors.

Fabrigar, et al. also suggested that "with respect to selecting one of the major methods of fitting the common factor model in EFA (i.e., principal factors, iterated principal factors, maximum likelihood), all three are reasonable approaches with certain advantages and disadvantages. Nonetheless, the wide range of fit indexes available for ML EFA provides some basis for preferring this method" (p. 283). Since ML EFA has the potential to provide misleading results when assumptions of multivariate normality are severely violated, the recommendation is that the distribution of the measured variables should be examined prior to using the procedure. If non-normality is severe (skew > 2; kurtosis > 7), measured variables should be transformed to normalize their distributions (Curran, West & Finch, 1996).

Fabrigar, et al. (1999) noted that the root mean square error of approximation (RMSEA) fit index and the expected cross-validation index (ECVI) provide a promising approach for assessing fit of a model in determining the number of factors in EFA. They recommended that "In ML factor analysis, we encourage the use of descriptive fit indices such as RMSEA and ECVI along with more traditional approaches such as the scree plot and parallel analysis" (p. 283). Based on this recommendation, this study uses these fit indices along with the scree plot and parallel analysis to assess the accuracy of determining the number of factors at a given level of reliability.

Research Question
The main research question that this study intends to answer is: As the internal consistency of a set of items increases, does the fit of the data to the exploratory factor analysis improve? To answer this question, a Monte Carlo simulation study was conducted which involved the manipulation of component reliability (ρxx'), component loading (aij), and variable-to-component ratio (p:m). The number of variables (p) was made constant at 24 to represent a moderately large data set.
Methodology
The underlying population correlation matrix was generated for each possible p, p:m, and aij combination, and the factors upon which this population correlation matrix was based were independent of each other. The RANCORR program by Hong (1999) was used to generate the population matrix as follows.

The component pattern matrix was specified with a component loading of .80 and a variable-to-component ratio of 8:1. After the component pattern matrix was specified and the program was executed, a population correlation matrix was produced. After the population correlation matrix was generated as described above, the MNDG program (Brooks, 2002) was then used to generate samples from the population correlation matrix. Three data sets for reliability of .60, .80, and 1.00, each consisting of 24 variables and 300 cases, were generated. Based on the variable-to-component ratio of 8:1, each dataset had 3 factors built in.

Analysis
An exploratory factor analysis (maximum likelihood extraction, varimax rotation, and a three-factor specification) was used for each of the three data sets (coefficient alpha = .6, .8, and 1.0). Two goodness-of-fit indices were chosen for this analysis: RMSEA and ECVI. RMSEA was chosen because it is based on the predicted versus observed covariances, which is appropriate given that nested models are not being compared. Hu and Bentler (1999) suggested RMSEA ≤ .06 as the cutoff for a good model fit. RMSEA is a commonly utilized measure of fit, partly because it does not require comparison with a null model. ECVI was chosen because it is based on information theory, namely the discrepancy between the model-implied and observed covariance matrices: the lower the ECVI, the better the fit. Finally, the Chi-square and degrees of freedom are provided for each analysis. The data were also submitted to a PCA using the scree plot and parallel analysis to assess the accuracy of determining the number of common factors underlying the data sets.

Results
The results in Table 1 show that the two measures of goodness-of-fit used in this study (RMSEA and ECVI) both display the same pattern: the smaller the alpha, the better the model fit. The best fit was obtained for alpha of 0.6, with RMSEA (0.025) and ECVI (1.44). As alpha increased from 0.6 to 1.0, both indices increased, an indication that the model fit became poor. Based on Hu and Bentler's (1999) recommendation that the cutoff for a good fit be RMSEA ≤ 0.06, results here show that only alpha of 0.6 had a good fit. The goodness-of-fit indices therefore suggest that the three-factor model is acceptable at an alpha value of 0.6.

Table 1: Goodness-of-Fit Indices

Alpha   Chi-Square (df)   RMSEA   ECVI
.6      247.153 (207)     .025    1.44
.8      436.535 (207)     .061    2.07
1.0     736.385 (207)     .092    3.07

Along with the goodness-of-fit indices, the dataset with the best fit was submitted to principal components analysis through the scree plot and parallel analysis. Results of the scree plot analysis are displayed in Figure 1, while parallel analysis results are shown in Table 2. The scree plot shows a sharp drop between the third and fourth eigenvalues, an indication that there were three distinct factors in the data sets. These results confirm the three-factor model as the best model for these data.

Figure 1: Scree Plot (Eigenvalue by Component Number, Components 1-24)

To interpret results of parallel analysis, real data eigenvalues must be larger than random data eigenvalues to be considered meaningful. Table 2 shows that the first three eigenvalues expected for random data (1.55, 1.46 and 1.39) fall below the observed eigenvalues for all three values of alpha. However, the fourth eigenvalue of the random data (1.34) is greater than the observed eigenvalues for all three alpha values. Again, the results further confirm the three-factor model as the best model.
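For readers unfamiliar with the parallel analysis step used above, the following is a minimal Python sketch of Horn's procedure; it is an illustration only, not the software used in the study, and the random-data eigenvalues quoted above came from that study's own implementation.

import numpy as np

def parallel_analysis(data, n_sims=1000, seed=0):
    # Keep observed eigenvalues while they exceed the mean eigenvalues of
    # random normal data of the same shape (Horn's parallel analysis).
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rand = np.zeros(p)
    for _ in range(n_sims):
        x = rng.standard_normal((n, p))
        rand += np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]
    rand /= n_sims
    exceeds = obs > rand
    k = int(np.argmax(~exceeds)) if not exceeds.all() else p
    return k, obs, rand

# Example with 300 cases and 24 variables, matching the study's design;
# pure noise is used here, so few (if any) components should be retained.
data = np.random.default_rng(1).standard_normal((300, 24))
k, obs, rand = parallel_analysis(data)
print("components retained:", k)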
Conclusion
Results in this study were inconsistent with our original ideas of the pattern of goodness of fit and internal consistency. It was anticipated that high internal consistency would yield a better fit. However, the findings seem logical because, based on one author's experience with structural equation modeling, a perfect fit can exist when all variables in the model are completely uncorrelated, if the variances are not constrained. Also, the lower the correlations stipulated in the model, the easier it is to find good fit. The stronger the correlations, the more power there is within structural equation modeling to detect an incorrect model. In essence, the higher the correlations, the more likely it is to incorrectly specify the model and observe a poor fit based on the indices. Second, if correlations are low, the researcher may lack the power to reject the model at hand.

Also, results seem to confirm what other researchers have argued in the literature. For example, Cudeck and Hulin (2001) noted that if a group of items has been identified as one-dimensional, the internal consistency of the
collection of items need not be high for factor analysis to be able to identify homogeneous sets of items in a measuring scale. Test reliability is a function of items. Therefore, if only a few items have been identified as homogeneous by factor analysis, their reliability may not be high.

If ML exploratory factor analysis, including its goodness-of-fit analyses, is to be used more extensively in the future, a great deal of work must be done to help researchers make good decisions. This assertion is supported by Fabrigar, et al. (1999), who noted that "although these guidelines for RMSEA are generally accepted, it is of course possible that subsequent research might suggest modifications" (p. 280).

Limitations of Current Research
Since the study involved simulations, its major limitation, like that of any other simulation study, is that the results might not be generalizable to other situations. This is especially true considering that the manipulation of the parameters for this study yielded strong internal validity, thereby compromising external validity. However, despite this limitation, the importance of the findings cannot be neglected, because they help inform researchers of the need to move away from relying entirely on internal consistency as a measure of the dimensionality of data, toward an approach where other analyses are considered as well. This point was reiterated by Cudeck and Hulin (2001), who stated that a reliable test need not conform to a one-factor model and, conversely, items that fit a single common factor may have low reliability.

References
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
Brooks, G. P. (2002). MNDG. [http://oak.cats.ohiou.edu/~brooksg/mndg.htm]. (Computer Software and Manual).
Cudeck, R., & Hulin, C. (2001). Measurement. Journal of Consumer Psychology, 10, 55-69.
Cudeck, R., & O'Dell, L. L. (1994). Applications of standard error estimates in unrestricted factor analysis: Significance tests for factor loadings and correlations. Psychological Bulletin, 115, 475-487.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16-29.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.
Henson, R. K. (2001). Understanding internal consistency reliability estimates: A conceptual primer on coefficient alpha. Measurement and Evaluation in Counseling and Development, 34, 177-189.
Hong, S. (1999). Generating correlation matrices with model error for simulation studies in factor analysis: A combination of the Tucker-Koopman-Linn model and Wijsman's algorithm. Behavior Research Methods, Instruments & Computers, 31, 727-730.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
Kanyongo, G. Y. (2006). The influence of reliability of four rules for determining the number of components to retain. Journal of Modern Applied Statistical Methods, 2, 332-343.
Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 60A, 64-72.
Thurstone, L. L. (1947). Multiple factor analysis. Chicago, IL: University of Chicago Press.
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 469-477 1538 – 9472/09/$95.00

Lag-One Autocorrelation Detection

Riviello & Beretvas
The power and type I error rates of eight indices for lag-one autocorrelation detection were assessed for interrupted time series experiments (ITSEs) with small numbers of data points. Huitema and McKean's (2000) zHM statistic was modified, and its performance was compared with that of the zHM, five information criteria, and the Durbin-Watson statistic.
Autocorrelation
For a series $y_1, \ldots, y_N$, the lag-one autocorrelation is conventionally estimated as

$$r_1 = \frac{\sum_{t=1}^{N-1}\left(y_t - \bar{y}\right)\left(y_{t+1} - \bar{y}\right)}{\sum_{t=1}^{N}\left(y_t - \bar{y}\right)^2}.$$
approximately normally distributed and does not have an inconclusive region. The test statistic was evaluated for its use in testing residuals from ITSE models that have one to four phases. Huitema and McKean's test statistic is defined as:

An evaluation that compares the two models' fit can then be conducted using an information criterion such as Akaike's Information Criterion (AIC):

$$\mathrm{AIC} = -2\log(L) + 2k \tag{14}$$
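As a concrete illustration of Equation 14, the following Python sketch compares an OLS fit against a lag-one autoregressive error fit by AIC. It is a hedged sketch, not the study's simulation code: the data, design, and the crude AR(1) estimate are invented for illustration.

import numpy as np

def aic(loglik, k):
    # Equation 14: AIC = -2 log(L) + 2k
    return -2.0 * loglik + 2 * k

rng = np.random.default_rng(0)
n = 30
t = np.arange(n, dtype=float)
e = np.zeros(n)
for i in range(1, n):                        # AR(1) errors with rho = 0.5
    e[i] = 0.5 * e[i - 1] + rng.standard_normal()
y = 1.0 + 0.2 * t + e

X = np.column_stack([np.ones(n), t])
b = np.linalg.lstsq(X, y, rcond=None)[0]
r = y - X @ b
s2 = r @ r / n
ll_ols = -0.5 * n * (np.log(2 * np.pi * s2) + 1)

rho = (r[:-1] @ r[1:]) / (r[:-1] @ r[:-1])   # crude AR(1) estimate from residuals
u = r[1:] - rho * r[:-1]                     # innovations
s2u = u @ u / (n - 1)
ll_ar1 = -0.5 * (n - 1) * (np.log(2 * np.pi * s2u) + 1)  # conditional likelihood

print("AIC OLS :", aic(ll_ols, k=3))         # intercept, slope, sigma^2
print("AIC AR1 :", aic(ll_ar1, k=4))         # plus rho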
statistical significance of these coefficients. The current evaluation could be extended further by comparing estimation of the OLS versus autoregressive model coefficients and their SEs for different levels of autocorrelation. This could help inform the current study's type I error and power results by indicating the magnitude of the effect of incorrect modeling of autocorrelation. For example, if only a small degree of accuracy and precision is gained by modeling the autocorrelation for a certain value of ρ1, then it may not matter that the model selection criteria have low power at that value. Similarly, if an insubstantial degree of accuracy and precision results from false identification of autocorrelation, then the type I error inflation noted in this study might be of minimal importance.

As with most simulation studies, results are limited by the conditions investigated: the values of the β2 and β3 coefficients (see Equation 20) do not seem to have much effect on identification of ρ1, but it should be investigated whether this is really the case or whether it just appears that way from the limited range of values of β2 and β3 that were chosen in this study. One of the main limitations of this study is that it considers only two-phase ITSE data and only investigated first-order autocorrelation. Another important limitation is that performance was evaluated only for a small subset of possible data trends. All conditions included a slight positive linear trend in
baseline. In addition, the only model misspecification assessed was whether the residuals were autocorrelated.

Future research should investigate use of the zHM and zHM+ for further misspecified models, including when a true non-linear trend is ignored to mimic asymptotic trends resulting from ceiling or floor effects. The performance of these statistics could also be assessed for ITSEs with more than two phases (e.g., for ABAB designs) as investigated by Huitema and McKean (2000). This study also only investigated conditions in which the treatment and baseline phases had equal numbers of data points (nB = nA). Single-subject studies frequently entail unequal sample sizes per phase, and the effect of uneven n should be investigated.

Based on the results of this study, researchers interested in modeling linear growth in ITSE data with a small number of data points should use zHM+ or zHM to test for the presence of lag-one autocorrelation. Researchers are cautioned against using the Durbin-Watson test statistic and the various information criteria evaluated here (the AIC, HQIC, SBC, AICC, and CAIC) for two-phase ITSEs with Ns less than 50.

References
Bartlett, M. S. (1946). On the theoretical specification and sampling properties of autocorrelated time-series. Journal of the Royal Statistical Society, 8, 27-41.
Bozdogan, H. (1987). Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52, 345-370.
Busk, P. L., & Marascuilo, L. A. (1988). Autocorrelation in single-subject research: A counterargument to the myth of no autocorrelation. Behavioral Assessment, 10, 229-242.
Fan, X., Felsovalyi, A., Keenan, S. C., & Sivo, S. (2001). SAS for Monte Carlo studies: A guide for quantitative researchers. Cary, NC: SAS Institute, Inc.
Hannan, E. J., & Quinn, B. G. (1979). The determination of the order of an autoregression. Journal of the Royal Statistical Society, Series B, 41, 190-195.
Huitema, B. E. (1985). Autocorrelation in applied behavior analysis: A myth. Behavioral Assessment, 7, 107-118.
Huitema, B. E., & McKean, J. W. (1991). Autocorrelation estimation and inference with small samples. Psychological Bulletin, 110, 291-304.
Huitema, B. E., & McKean, J. W. (2000). A simple and powerful test for autocorrelated errors in OLS intervention models. Psychological Reports, 87, 3-20.
Huitema, B. E., McKean, J. W., & McKnight, S. (1999). Autocorrelation effects on least-squares intervention analysis of short time series. Educational and Psychological Measurement, 59, 767-786.
Hurvich, C. M., & Tsai, C. L. (1989). Regression and time series model selection in small samples. Biometrika, 76, 297-307.
Kendall, M. G. (1954). Note on bias in the estimation of autocorrelation. Biometrika, 41, 403-404.
Marriott, F. H. C., & Pope, J. A. (1954). Bias in the estimation of autocorrelations. Biometrika, 41, 390-402.
SAS Institute, Inc. (2003). SAS (Version 9.1) [Computer Software]. Cary, NC: SAS Institute, Inc.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
Tawney, L., & Gast, D. (1984). Single-subject research in special education. Columbus, OH: Merrill.
White, J. S. (1961). Asymptotic expansions for the mean and variance of the serial correlation coefficient. Biometrika, 48, 85-94.
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 478-487 1538 – 9472/09/$95.00

Rayleigh Model Parameter Estimation in Step-Stress Testing

Al-Haj Ebrahem & Al-Masri
It is assumed that the life distribution of a test unit at any stress follows a Rayleigh distribution with scale parameter θ, and that Ln(θ) is a linear function of the stress level. Maximum likelihood estimators of
the parameters under a cumulative exposure model are obtained. The approximate variance estimates
obtained from the asymptotic normal distribution of the maximum likelihood estimators are used to
construct confidence intervals for the model parameters. A simulation study was conducted to study the
performance of the estimators. Simulation results showed that in terms of bias, mean squared error,
attainment of the nominal confidence level, symmetry of lower and upper error rates and the expected
interval width, the estimators are very accurate and have a high level of precision.
Key words: Accelerated life test, Cumulative exposure model, Rayleigh distribution, Maximum
likelihood estimation, Step-stress.
$$G_1(\tau) = G_2(s).$$

Most of the available literature on step-stress accelerated life testing deals with the exponential exposure model. Khamis and Higgins (1996, 1998) proposed a new model, known as the KH model, for step-stress accelerated life tests as an alternative to the Weibull cumulative exposure model.

Miller and Nelson (1983) obtained the optimum simple step-stress accelerated life test plans for the case where the test units have exponentially distributed lifetimes. Bai, Kim and Lee (1989) extended the results of Miller and Nelson (1983) to the case of censoring. Khamis and Higgins (1996) obtained the optimum 3-step step-stress using the exponential distribution. Alhadeed and Yang (2005) obtained the optimum design for the lognormal step-stress model. Al-Haj Ebrahem and Al Masri (2007(a)) obtained the optimum simple step-stress plans for the log-logistic cumulative exposure model, by minimizing the asymptotic variance of the maximum likelihood estimate of a given 100P-th percentile of the distribution at the design stress.

Al-Haj Ebrahem and Al Masri (2007(b)) obtained the optimum simple step-stress plans for the log-logistic distribution under time censoring. Xiong (1998) presented the inferences of parameters in the simple step-stress model in accelerated life testing with type two censoring. Xiong and Milliken (2002) studied statistical models in step-stress accelerated life testing when stress change times are random and obtained the marginal life distribution for test units. Nonparametric approaches for step-stress testing have been proposed by Shaked and Singurwalla (1983) and Schmoyer (1991). For additional details, see Chung and Bai (1998) and Gouno (2001). This article considers point and interval estimation of Rayleigh cumulative exposure model parameters.

Model and Assumptions
The probability density function and the cumulative distribution function of the Rayleigh distribution are given respectively by:

$$f(y, \theta) = \begin{cases} \dfrac{2y}{\theta}\, e^{-y^2/\theta}, & y > 0,\ \theta > 0 \\[4pt] 0, & \text{otherwise} \end{cases} \tag{1}$$

$$F(y, \theta) = \begin{cases} 0, & y < 0 \\[4pt] 1 - e^{-y^2/\theta}, & y > 0 \end{cases} \tag{2}$$

The following assumptions are understood:
1. Under any stress, the lifetime of a test unit follows a Rayleigh distribution.
2. Testing is conducted at stresses X1 and X2, where X1 < X2.
3. The relationship between the parameter θi and the stress Xi is given by Ln(θi) = β0 + β1 Xi, where β0 and β1 are unknown parameters to be determined from the test data.
4. The lifetimes of test units are independent and identically distributed.
5. All n units are initially placed on low stress X1 and run until time τ, when the stress is changed to high stress X2. At X2, testing continues until all remaining units fail.

The Rayleigh cumulative exposure model for step-stress is given by:

$$G(y) = \begin{cases} 1 - e^{-y^2/\theta_1}, & 0 < y < \tau \\[6pt] 1 - \exp\!\left\{-\dfrac{\left(y - \tau + \tau\sqrt{\theta_2/\theta_1}\right)^2}{\theta_2}\right\}, & \tau \le y < \infty \end{cases} \tag{3}$$

If $T = Y^2$, then the cumulative exposure model of T is given by:

$$G(t) = \begin{cases} 1 - e^{-t/\theta_1}, & 0 < t < \tau^2 \\[6pt] 1 - \exp\!\left\{-\dfrac{\left(\sqrt{t} - \tau + \tau\sqrt{\theta_2/\theta_1}\right)^2}{\theta_2}\right\}, & \tau^2 \le t < \infty \end{cases} \tag{4}$$

The corresponding density of T (with $\tau_1 = \tau^2$) is

$$g(t) = \begin{cases} \dfrac{1}{\theta_1}\, e^{-t/\theta_1}, & 0 < t < \tau_1 \\[8pt] \dfrac{\sqrt{t} - \sqrt{\tau_1} + \sqrt{\tau_1\theta_2/\theta_1}}{\theta_2\sqrt{t}}\; \exp\!\left\{-\dfrac{\left(\sqrt{t} - \sqrt{\tau_1} + \sqrt{\tau_1\theta_2/\theta_1}\right)^2}{\theta_2}\right\}, & \tau_1 < t < \infty \end{cases} \tag{6}$$

Methodology
Model Parameters Estimation
Let $t_{ij}$, $j = 1, 2, \ldots, n_i$, $i = 1, 2$, be the observed lifetimes under low and high stress, where $n_1$ denotes the number of units failed at the low stress X1 and $n_2$ denotes the number of units failed at the high stress X2. The likelihood function is given by:

$$L(t_{ij}, \beta_0, \beta_1) = \theta_1^{-n_1}\theta_2^{-n_2} \exp\!\left\{-\sum_{j=1}^{n_1}\frac{t_{1j}}{\theta_1} - \sum_{j=1}^{n_2}\frac{\left(\sqrt{t_{2j}} - \sqrt{\tau_1} + \sqrt{\tau_1\theta_2/\theta_1}\right)^2}{\theta_2}\right\} \times \prod_{j=1}^{n_2}\left(1 - \sqrt{\frac{\tau_1}{t_{2j}}} + \sqrt{\frac{\tau_1\theta_2}{t_{2j}\theta_1}}\right) \tag{7}$$

The maximum likelihood estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are obtained by solving numerically the following two equations:

$$\frac{\partial \ln L(t_{ij}, \beta_0, \beta_1)}{\partial \beta_0} = 0 \tag{9}$$

$$\frac{\partial \ln L(t_{ij}, \beta_0, \beta_1)}{\partial \beta_1} = 0 \tag{10}$$

In order to construct confidence intervals for the model parameters, the asymptotic normality of the maximum likelihood estimates is used. It is known that:

$$(\hat{\beta}_0, \hat{\beta}_1) \sim N\!\left((\beta_0, \beta_1),\ \hat{F}^{-1}\right)$$

where $\hat{F}^{-1}$ denotes the inverse of the observed Fisher information matrix $\hat{F}$. The observed Fisher information matrix $\hat{F}$ is obtained by evaluating the second and mixed partial derivatives of $\ln L(t_{ij}, \beta_0, \beta_1)$ at the maximum likelihood estimates $\hat{\beta}_0$ and $\hat{\beta}_1$, that is:

$$\hat{F} = \begin{pmatrix} \hat{F}_{11} & \hat{F}_{12} \\ \hat{F}_{21} & \hat{F}_{22} \end{pmatrix}$$

where $\hat{F}_{rs} = -\partial^2 \ln L / \partial\beta_r\, \partial\beta_s$ evaluated at $(\hat{\beta}_0, \hat{\beta}_1)$.
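To make the estimation procedure concrete, the following is a minimal Python sketch, not the authors' implementation, that simulates step-stress data from the Y-scale model of Equation (3) and maximizes the log-likelihood numerically (equivalent to Equation (7) after the change of variable). The stress levels, change time, and true parameter values are invented for the illustration.

import numpy as np
from scipy.optimize import minimize

x1, x2, tau = 1.0, 2.0, 1.5          # hypothetical low stress, high stress, change time
beta0_true, beta1_true = 2.0, -0.5   # so theta_i = exp(beta0 + beta1 * x_i)

rng = np.random.default_rng(42)
n = 100
theta1 = np.exp(beta0_true + beta1_true * x1)
theta2 = np.exp(beta0_true + beta1_true * x2)

# Simulate lifetimes: Y^2/theta ~ Exp(1) gives a Rayleigh draw; units surviving
# past tau are re-drawn from the shifted stress-2 branch of Equation (3).
y = np.sqrt(theta1 * rng.exponential(size=n))
late = y > tau
s = tau * np.sqrt(theta2 / theta1)                 # equivalent start time at stress 2
u = np.sqrt(s**2 + theta2 * rng.exponential(size=late.sum()))
y[late] = u - s + tau

def neg_log_lik(beta):
    b0, b1 = beta
    th1, th2 = np.exp(b0 + b1 * x1), np.exp(b0 + b1 * x2)
    t1 = y[y <= tau]                                # failures under low stress
    t2 = y[y > tau]                                 # failures under high stress
    v = t2 - tau + tau * np.sqrt(th2 / th1)         # shifted time from Equation (3)
    ll = np.sum(np.log(2 * t1 / th1) - t1**2 / th1) # Rayleigh density, stress 1
    ll += np.sum(np.log(2 * v / th2) - v**2 / th2)  # shifted Rayleigh, stress 2
    return -ll

fit = minimize(neg_log_lik, x0=[1.0, 0.0], method="Nelder-Mead")
print("MLE (beta0, beta1):", fit.x)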
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 488-504 1538 – 9472/09/$95.00

Estimating Model Complexity of Feed-Forward Neural Networks
Douglas Landsittel
University of Pittsburgh
In a previous simulation study, the complexity of neural networks was characterized for limited cases of binary and normally-distributed variables, based on the null distribution of the likelihood ratio statistic and the corresponding chi-square distribution. This study expands on those results and presents a more general formulation for calculating degrees of freedom.
parameter values, or eliminating variables with a small effect on predicted values (i.e., sensitivity methods). Bayesian approaches have also been proposed (Ripley, 1996; Paige & Butler, 2001) for model selection with neural networks. Despite the noted advances, implementation of such methods has been limited by either computational issues, dependence on the specified test set, or lack of distributional theory. As a result, there are no established procedures for variable selection or determination of the optimal network structure (e.g., the number of hidden units). The previously cited literature primarily focuses on continuous outcomes. In cases where the likelihood ratio can be adequately approximated by a chi-square distribution, the degrees of freedom can be used to quantify neural network model complexity under the null. Derivation of the test statistic null distribution is pursued through simulation approaches, rather than theoretical derivations, because of the complexity of the network response function and the lack of maximum likelihood or other globally optimal estimation.

The two main objectives of this simulation study are to: (1) verify that the chi-square distribution provides an adequate approximation to the empirical test statistic distribution in a limited number of simulated cases, and (2) quantify how the distribution, number of covariates, and number of hidden units affect model degrees of freedom. Adequacy of the chi-square approximation will be judged by how close the α-level based on the simulation distribution (i.e., the percent of the test statistic distribution greater than the corresponding chi-square quantile) is to various percentiles of the chi-square distribution. The variance, which should be approximately twice the mean under a chi-square distribution, is also displayed for each simulation condition.

Methodology
The Feed-Forward Neural Network Model
This study focuses strictly on a single Bernoulli outcome, such as presence or absence of disease. All neural network models utilized a feed-forward structure (Ripley, 1996) with a single hidden layer, so that

$$\hat{p}(x) = g\!\left(v_0 + \sum_{j=1}^{H} v_j\, h_j\right), \qquad h_j = g\!\left(w_{j0} + \sum_{k=1}^{p} w_{jk}\, x_k\right), \qquad g(u) = \frac{1}{1 + e^{-u}},$$

where $h_j$ denotes the value of the jth hidden unit. The response function of the neural network model can thus be viewed as a logistic of these hidden unit values. In terms of further terminology, the parameters $v_0, v_1, \ldots, v_H$ are referred to as the connections between the hidden and output layer, and each set of other parameters, $w_{j1}, w_{j2}, \ldots, w_{jp}$, are referred to as the connections between the inputs and hidden units, where there are p covariate values specific to each of the H hidden units. This described model structure often leads to categorization of neural networks as a black box technique. None of the parameter values directly correspond to any specific main effect or interaction. Further, the degree of non-linearity cannot be explicitly determined from the number of hidden units or any easily characterized aspect of the model.

The optimal model coefficients were calculated via back-propagation (Rumelhart, et al., 1995) and the nnet routine in S-Plus (Venables & Ripley, 1997), which iteratively updates weights using a gradient descent-based algorithm. For a Bernoulli outcome, optimization is based on the minimization of the deviance (D),

$$D = -2\sum_{i=1}^{n}\left[y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i)\right].$$
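As a concrete, hypothetical illustration of this response function and deviance, the following Python sketch uses randomly chosen weights standing in for the back-propagation estimates (the shapes follow the parameter count h(k + 1) + (h + 1) used later in the article):

import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def response(x, w, v):
    # x: (n, p) inputs; w: (H, p + 1) input-to-hidden weights (bias first);
    # v: (H + 1,) hidden-to-output weights (bias first)
    h = logistic(w[:, 0] + x @ w[:, 1:].T)       # (n, H) hidden unit values
    return logistic(v[0] + h @ v[1:])            # logistic of hidden unit values

def deviance(y, p_hat):
    # D = -2 * sum[ y*log(p) + (1 - y)*log(1 - p) ]
    return -2.0 * np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

rng = np.random.default_rng(0)
n, p, H = 200, 7, 10                             # e.g., 7 inputs and 10 hidden units
x = rng.integers(0, 2, size=(n, p)).astype(float)
y = rng.integers(0, 2, size=n).astype(float)
w = rng.normal(size=(H, p + 1))
v = rng.normal(size=H + 1)
print("deviance:", deviance(y, response(x, w, v)))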
Table 1 shows the simulated distribution of the likelihood ratio test for independence is closely followed by a chi-square distribution. In a large percentage of the cases, all of the simulated α-levels were within 1-3% of the expected percentiles. No systematic differences were evident in the results. Figures 1a and 1b illustrate two examples where: (1) the simulated distribution varied a few percent from the expected percentiles (2 inputs and 2 hidden units), and (2) the simulated distribution fell extremely close to the corresponding chi-square distribution (7 inputs and 10 hidden units). Both figures show noticeable variability at the upper end of the distribution; however, it should be noted that these few points are primarily within only the top 1% of the distribution, and thus have little effect on most of the resulting significance levels.
Figure 1a: Example Q-Q Plot (with 2 Binary Inputs, 2 HUs and 0.01 WD) Illustrating Greater Variability from the Expected Chi-square Distribution (x-axis: Likelihood Ratio, 0-15; y-axis: Quantiles, 0-14)
Figure 1b: Example Q-Q Plot (with 7 Binary Inputs, 10 HUs and 0.01 WD) Illustrating a Close Correspondence with the Expected Chi-square Distribution (x-axis: Likelihood Ratio; y-axis: Quantiles, 80-160)
Table 2 shows additional simulation results for other cases (with at most 10 inputs) where the number of model parameters (p) is less than the maximum number of terms (m). Results again showed that the simulated α-levels were very close to the expected percentiles. Q-Q plots for these cases (not shown here) resulted in similar findings as displayed in Figures 1a and 1b. No additional simulations are shown here for two, three or four inputs because these cases all corresponded to models where p > m, and therefore produce a known degrees of freedom (3, 7, and 15 for 2, 3 and 4 inputs, respectively). Simulations were conducted, but not shown here, to verify that such results would hold for 4 hidden units (since p was only slightly greater than m in the case of 3 hidden units); results verified the expected finding (of 15 degrees of freedom). Other combinations leading to a known degrees of freedom (with p > m) were also excluded from the table (including 5 inputs with 6-9 hidden units and 6 inputs with 8-9 hidden units).

Estimating Degrees of Freedom for Binary Input Variables
The above results indicate that the model degrees of freedom for strictly binary inputs appear to intuitively depend on two factors: (1) the maximum number of possible main effects and interactions, m = 2^k − 1, and (2) the number of model parameters, p = h(k + 1) + (h + 1). In cases where the number of model parameters is sufficient to fit all main effects and interactions, the degrees of freedom equal that maximum number of terms. For example, regardless of the number of hidden units, the degrees of freedom (df) are approximately 3.0 for two binary inputs and approximately 7.0 for three binary inputs. For four binary inputs, two hidden units (and subsequently 13 parameters) are insufficient to fit all 15 terms and result in approximately 12 df.

In such cases, where the number of model parameters is less than the maximum number of terms, the df is generally in between (or at least very close to) the number of model parameters (p) and the maximum number of terms (m). Exactly where the df falls depends on how close the number of model parameters is to the maximum number of terms. In general, the ratio of degrees of freedom to number of model parameters may be expressed as a function of m − p. To produce a linear relationship, it is more convenient (with binary inputs) to express df/p as a function of log2(m − p). The simulated degrees of freedom from Table 1 were used to derive a relationship, and Figure 2 shows a plot of the simulated data from Table 1 (with 2, 5 or 10 hidden units) overlaid with the linear regression line

$$\frac{df}{p} = 0.6643 + 0.1429 \times \log_2(m - p). \tag{5}$$

Figure 2 shows a general trend between the difference m − p and the degrees of freedom (divided by the number of parameters), but also illustrates some variability between the simulated values and the subsequent estimates. To evaluate the significance of these discrepancies, the estimated df were compared to the simulated distribution of the likelihood ratio statistic (for model independence). Results are shown in Table 3.

Results indicate that the estimated df usually approximates the simulated value within an absolute error of a few percent. For example, most of the conditions (11 of 16) result in a 5% significance level between 0.03 and 0.07; the largest discrepancy is an absolute difference of 0.04 from the true 5% level. The 10% significance level corresponds to somewhat larger errors, with the estimated p-values as high as 0.17 and as low as 0.02. The 75th, 50th and 25th percentiles showed similar findings with occasionally substantial discrepancies.

The above rule for estimating the df, based on the previously fit linear regression of df/p as a function of log2(m − p), was also evaluated with respect to its adequacy to predict model complexity for new cases (with 3, 4, or 6-9 hidden units). Figure 3 shows a plot of these additional simulated data overlaid with the same linear regression line df/p = 0.6643 + 0.1429·log2(m − p). Figure 3 shows the trend between the difference m − p and the df (divided by the number of parameters), but again illustrates variability between the simulated values and the subsequent estimates.
Table 2: Additional Simulated Likelihood Ratio Statistics with All Binary Inputs

Inputs      Hidden   #Parameters   LR Mean   LR Variance   Simulated α-levels
(Max Terms) Units                                          0.75   0.50   0.25   0.10   0.05
5 (31)      3        22            23.84     47.66         0.73   0.50   0.26   0.10   0.06
5 (31)      4        29            28.69     54.80         0.74   0.50   0.26   0.12   0.06
6 (63)      3        25            34.57     67.28         0.74   0.49   0.25   0.10   0.06
6 (63)      4        33            42.81     96.51         0.77   0.51   0.24   0.08   0.03
6 (63)      6        49            55.86     109.8         0.74   0.50   0.26   0.09   0.05
6 (63)      7        57            59.60     110.0         0.77   0.49   0.26   0.11   0.05
7 (127)     3        28            45.93     92.58         0.75   0.50   0.23   0.10   0.05
7 (127)     4        37            58.06     111.6         0.75   0.50   0.24   0.10   0.05
7 (127)     6        55            82.27     148.3         0.75   0.49   0.26   0.11   0.06
7 (127)     7        64            92.84     175.6         0.74   0.51   0.27   0.10   0.05
7 (127)     8        73            102.5     189.5         0.75   0.51   0.25   0.09   0.06
7 (127)     9        82            111.7     224.0         0.76   0.51   0.24   0.10   0.06
8 (255)     3        31            54.90     101.0         0.75   0.50   0.26   0.11   0.07
8 (255)     4        41            73.02     148.2         0.75   0.52   0.23   0.08   0.04
8 (255)     6        61            107.8     223.0         0.75   0.49   0.24   0.09   0.05
8 (255)     7        71            124.8     258.3         0.76   0.50   0.25   0.10   0.03
8 (255)     8        81            139.7     238.2         0.71   0.52   0.28   0.12   0.06
8 (255)     9        91            155.0     268.0         0.73   0.52   0.24   0.13   0.08
9 (511)     3        34            65.13     135.0         0.77   0.50   0.24   0.10   0.05
9 (511)     4        45            87.02     179.6         0.76   0.51   0.25   0.09   0.04
9 (511)     6        67            131.4     228.8         0.73   0.51   0.27   0.10   0.06
9 (511)     7        78            152.3     286.6         0.74   0.50   0.25   0.10   0.06
9 (511)     8        89            171.8     338.5         0.74   0.51   0.26   0.11   0.05
9 (511)     9        100           194.7     303.6         0.72   0.50   0.27   0.14   0.08
10 (1023)   3        37            75.5      163.5         0.76   0.51   0.25   0.08   0.03
10 (1023)   4        49            100.9     190.8         0.73   0.52   0.26   0.10   0.05
10 (1023)   6        73            152.7     297.1         0.77   0.50   0.24   0.10   0.05
10 (1023)   7        85            178.5     341.9         0.74   0.51   0.24   0.09   0.06
10 (1023)   8        97            204.8     430.0         0.77   0.51   0.24   0.08   0.04
10 (1023)   9        109           230.0     425.7         0.74   0.52   0.25   0.10   0.06
Mean Simulated α-levels                                    0.75   0.51   0.25   0.10   0.05
Figure 2: Plot of the Degrees of Freedom for Binary Inputs (2, 5, and 10 Hidden Units) as a Function of the Difference between Maximum Number of Terms and Number of Parameters (x-axis: log base 2 of m − p, 2-10; y-axis: degrees of freedom / p, 1.0-2.0)
Table 3: Comparison of Estimated to Simulated Degrees of Freedom with All Binary Inputs

Max Terms   #Parameters   Hidden   Simulated   Estimated   Simulated α-levels using the Estimated df
(Inputs)                  Units    df          df          0.75   0.50   0.25   0.10   0.05
15 (4)      13            2        11.94       10.49       0.66   0.41   0.19   0.07   0.05
31 (5)      15            2        18.36       18.54       0.74   0.49   0.25   0.12   0.07
63 (6)      17            2        25.07       24.71       0.73   0.51   0.30   0.15   0.08
63 (6)      41            5        50.63       53.36       0.67   0.40   0.16   0.05   0.02
127 (7)     19            2        30.92       30.96       0.74   0.50   0.26   0.10   0.05
127 (7)     46            5        69.93       72.23       0.69   0.46   0.18   0.07   0.03
127 (7)     91            10       117.3       127.7       0.50   0.25   0.09   0.02   0.01
255 (8)     21            2        38.75       37.57       0.78   0.56   0.30   0.11   0.05
255 (8)     51            5        88.95       89.80       0.71   0.47   0.25   0.11   0.06
255 (8)     101           10       168.3       172.0       0.67   0.42   0.21   0.07   0.03
511 (9)     23            2        45.76       44.63       0.82   0.56   0.24   0.08   0.03
511 (9)     56            5        107.7       107.9       0.75   0.53   0.24   0.10   0.05
511 (9)     111           10       214.4       210.8       0.80   0.57   0.30   0.15   0.08
1023 (10)   25            2        51.76       52.21       0.76   0.49   0.21   0.07   0.03
1023 (10)   61            5        126.1       126.9       0.72   0.49   0.23   0.09   0.05
1023 (10)   121           10       257.5       250.2       0.84   0.61   0.36   0.17   0.09
Mean Estimated α-levels                                    0.72   0.57   0.24   0.10   0.05
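Equation 5 and the two counting rules above combine into a single estimation rule. The following small Python sketch (illustrative naming, not code from the study) reproduces the Estimated df column of Table 3:

import numpy as np

def estimated_df(k, h):
    # m = 2^k - 1 possible terms; p = h(k + 1) + (h + 1) parameters
    # for k binary inputs and h hidden units.
    m = 2**k - 1
    p = h * (k + 1) + (h + 1)
    if p >= m:
        return float(m)            # saturated case: df equals the number of terms
    return p * (0.6643 + 0.1429 * np.log2(m - p))   # Equation 5

# Example: 5 binary inputs with 2 hidden units (m = 31, p = 15)
print(round(estimated_df(5, 2), 2))   # 18.54, matching Table 3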
Figure 3: Plot of the Degrees of Freedom for Binary Inputs (3, 4, and 6-9 Hidden Units) as a Function of the Difference between Maximum Number of Terms and Number of Parameters (x-axis: log base 2 of m − p, 2-10; y-axis: degrees of freedom / p, 1.0-2.0)
number of continuous and binary variables and the additional df gained (or lost) when combining a given number of (continuous and binary) input variables.

At this point, the observed trends could be used to derive estimates of model complexity for the cases in Table 8 and for other cases with larger numbers of hidden units and other combinations of continuous and binary inputs (as done previously when separately considering continuous or binary inputs). However, the lack of correspondence with the chi-square distribution, and the subsequent need for improved model fitting (e.g., more global optimization procedures), would invalidate any subsequent findings. Therefore, modifications of the S-Plus procedures need to be pursued for these cases before any specific rules can be effectively formulated for the case of both continuous and binary inputs.
Table 5: Likelihood Ratio Statistic for Model Independence with Continuous Inputs

Inputs   Hidden   #Parameters   LR Mean   LR Variance   Simulated α-levels
         Units                                          0.75   0.50   0.25   0.10   0.05
2        2        9             10.8      25.0          0.74   0.51   0.25   0.08   0.03
2        5        21            26.2      60.5          0.75   0.51   0.23   0.08   0.04
2        10       41            49.8      99.4          0.75   0.50   0.25   0.09   0.05
3        2        11            16.6      33.6          0.74   0.51   0.26   0.10   0.04
3        5        26            41.0      71.2          0.74   0.52   0.26   0.11   0.06
3        10       51            82.5      171.2         0.76   0.50   0.24   0.10   0.05
4        2        13            22.2      40.0          0.77   0.53   0.25   0.09   0.04
4        5        31            55.4      111.0         0.75   0.51   0.27   0.10   0.04
4        10       61            115.3     216.9         0.75   0.49   0.26   0.11   0.06
5        2        15            27.7      57.9          0.77   0.51   0.25   0.07   0.03
5        5        36            69.3      131.4         0.75   0.51   0.26   0.09   0.04
5        10       71            144.8     293.4         0.75   0.50   0.24   0.10   0.05
6        2        17            33.4      65.4          0.76   0.54   0.27   0.08   0.04
6        5        41            83.0      164.3         0.74   0.50   0.25   0.10   0.05
6        10       81            176.7     341.1         0.74   0.53   0.24   0.09   0.05
7        2        19            38.7      100.1         0.77   0.51   0.21   0.07   0.02
7        5        46            98.3      202.0         0.75   0.50   0.27   0.08   0.04
7        10       91            205.2     375.8         0.75   0.51   0.25   0.11   0.05
8        2        21            44.8      101.1         0.78   0.52   0.23   0.08   0.04
8        5        51            112.5     220.8         0.74   0.49   0.24   0.11   0.06
8        10       101           239.0     476.9         0.75   0.51   0.25   0.10   0.04
9        2        23            49.9      142.2         0.79   0.53   0.19   0.05   0.02
9        5        56            127.4     239.9         0.74   0.48   0.26   0.11   0.06
9        10       111           269.0     487.6         0.73   0.50   0.27   0.11   0.05
10       2        25            54.6      166.1         0.80   0.49   0.19   0.05   0.03
10       5        61            140.8     280.2         0.76   0.49   0.24   0.12   0.06
10       10       121           299.5     546.4         0.75   0.51   0.26   0.10   0.04
Mean Simulated α-levels                                 0.75   0.51   0.25   0.09   0.04
Table 8: Likelihood Ratios with Continuous and Binary Inputs and 2 Hidden Units

Continuous Inputs (df)   Binary Inputs   df     Mean Likelihood Ratio   Additional df   Simulated α-levels: 0.75 / 0.50 / 0.25 / 0.10 / 0.05
2 (10)      2     3.0     19.7     6.7    0.73  0.52  0.27  0.10  0.05
            3     7.0     24.8     7.8    0.76  0.51  0.26  0.08  0.04
            4    10.5     31.0    10.5    0.75  0.54  0.24  0.08  0.03
            5    18.5     36.7     8.2    0.78  0.53  0.25  0.06  0.03
            6    24.7     42.4     7.7    0.78  0.54  0.22  0.06  0.02
            7    31.0     47.7     6.7    0.77  0.51  0.22  0.06  0.02
            8    37.6     54.0     6.8    0.78  0.49  0.22  0.07  0.02
            9    44.6     58.4     3.8    0.81  0.49  0.19  0.06  0.02
           10    52.2     64.1     1.9    0.80  0.47  0.18  0.04  0.02
5 (28)      2     3.0     38.0     7.0    0.76  0.51  0.23  0.06  0.02
            3     7.0     43.6     8.6    0.79  0.53  0.20  0.05  0.02
            4    10.5     47.9     9.4    0.80  0.47  0.20  0.05  0.02
            5    18.5     52.9     6.4    0.80  0.50  0.18  0.06  0.02
            6    24.7     59.5     6.8    0.82  0.48  0.17  0.06  0.02
            7    31.0     64.3     5.3    0.84  0.44  0.18  0.04  0.02
            8    37.6     69.5     3.9    0.79  0.47  0.19  0.05  0.01
            9    44.6     73.6     1.0    0.80  0.46  0.19  0.05  0.02
           10    52.2     79.1    -1.1    0.82  0.45  0.17  0.04  0.01
10 (58)     2     3.0     64.4     3.4    0.81  0.46  0.18  0.05  0.02
            3     7.0     69.4     4.4    0.82  0.47  0.18  0.04  0.01
            4    10.5     72.2     3.7    0.80  0.50  0.18  0.05  0.02
            5    18.5     78.1     1.5    0.78  0.47  0.19  0.06  0.03
            6    24.7     83.0     0.3    0.79  0.46  0.16  0.06  0.02
            7    31.0     89.1     0.1    0.78  0.48  0.17  0.04  0.02
            8    37.6     94.7    -1.1    0.78  0.46  0.17  0.05  0.01
            9    44.6     98.6    -4.0    0.79  0.46  0.20  0.05  0.01
           10    52.2    103.3    -6.9    0.79  0.45  0.15  0.04  0.01
Mean Simulated α-levels:                  0.79  0.49  0.20  0.06  0.02
Categorical Input Variables
Additional simulations were conducted for categorical variables. In theory, a categorical variable with 3 levels should produce fewer degrees of freedom than 2 binary inputs, since the 3 levels would be coded as 2 binary inputs, but would not have an interaction between the 2 levels. Simulations (not shown here) provided evidence of this type of relationship, but simulation results differed substantially from the expected chi-square distribution. Therefore, as in the cases of both binary and continuous inputs, further work on these types of data is being delayed until a better optimization routine can be implemented with S-Plus or with another programming language.

Conclusions
One issue that was not addressed was the correlation of the input variables. All simulations were run with independently generated data. Comparing the current findings to previous analyses with some overlap (Landsittel, et al., 2003) indicates that the degrees of freedom (df) may be somewhat lower with moderately correlated data, which is somewhat intuitive since correlated variables, to some degree, add less information than independently distributed variables. Rather than consider this further complication, it was decided that all simulations should use independent data as a starting point, and that the effects of correlation should be addressed separately in a future manuscript.

Another limitation encountered here was the failure of the S-Plus routine to achieve acceptably optimal results in minimizing the deviance. Because the simulations with only binary or only continuous inputs led to close correspondence with the chi-square distribution (which allows the use of the mean as the df), it would be expected that this would hold for models with both binary and continuous inputs. The failure to achieve this result is most likely a function of the (only locally optimal) routines. Future work will address this point by investigating other optimization routines (e.g., genetic algorithms) and incorporating those routines into the current approaches and methodology.

To the best of our knowledge, these studies are the first to use df under the null as a measure of model complexity. Unlike generalized linear models or other standard regression methods, the model complexity may vary substantially for different data sets. In terms of the general applicability of this approach, the complexity under the null may provide a more appropriate penalty for subsequent use in model selection in many scenarios, as higher complexity may be desirable if the true underlying association is highly non-linear. In contrast to a measure such as the generalized df, where the complexity tends to increase substantially when fit to data with some observed association, the complexity under the null penalizes the model for incorrectly fitting non-linearity only when none exists. Using an AIC-type statistic with generalized or effective df, for example, would highly penalize the neural network model for accurately fitting a highly non-linear association, and would likely make it very difficult to select an adequately complex model.

Despite these limitations, the results contribute significantly to our understanding of neural network model complexity by providing explicit equations to quantify complexity under a range of scenarios. Once improved methods are implemented to better optimize more complex models (where there was significant variability from the expected chi-square distribution), the derived equations for df can be tested across a much wider range of models. Assuming results hold for other scenarios (to be tested after achieving more global optimization), the estimated df can be implemented in practice for model selection via AIC or BIC statistics. Such approaches would serve as a favorable alternative to any of the ad-hoc approaches currently being utilized in practice.

Acknowledgements
The author would like to acknowledge Mr. Dustin Ferris, who, as an undergraduate student in Mathematics at Duquesne University, conducted many of the initial simulations used to generate some of the results in this manuscript, and Duquesne University, for the Faculty Development Fund award that included funding of Mr. Ferris's stipend.

References
Amari, S., & Murata, N. (1993). Statistical theory of learning curves under entropic loss criterion. Neural Computation, 5, 140-153.
Hastie, T., & Tibshirani, R. (1990). Generalized additive models. NY: Chapman and Hall.
Hodges, J. S., & Sargent, D. J. (2001). Counting degrees of freedom in hierarchical and other richly-parameterised models. Biometrika, 88(2), 367-379.
Landsittel, D., Singh, H., & Arena, V. C. (2003). Null distribution of the likelihood ratio statistic with feed-forward neural networks. Journal of Modern Applied Statistical Methods, 1(2), 333-342.
Liu, Y. (1995). Unbiased estimate of generalization error and model selection in neural networks. Neural Networks, 8(2), 215-219.
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 505-510 1538 – 9472/09/$95.00
Paul Turner
Loughborough University
United Kingdom
Increased interest in computer automation of the general-to-specific methodology has resulted from
research by Hoover and Perez (1999) and Krolzig and Hendry (2001). This article presents simulation
results for a multiple search path algorithm that has better properties than those generated by a single
search path. The most noticeable improvements occur when the data contain unit roots.
Introduction
may become large even in a relatively small model. Suppose, for example, that the true model requires a total of n restrictions on the most general model. It follows that there are n! separate search paths which involve the elimination of one variable at each stage and which will succeed in getting from the general model to the final, correct specification. If the elimination of several variables at once is allowed, then the number of search paths increases still further.

Another problem arising within the general-to-specific methodology is that there is always the chance of making a Type II error during the process of the specification search, whereby a variable which should be present in the final specification is eliminated at some stage. The result of this is typically that other variables, which ideally would have been excluded from the final specification, are retained as proxies for the missing variable. The resulting final specification is therefore over-parameterized. It is difficult to identify cases such as this from real-world data, where the investigator does not have the luxury of knowledge of the data generation process. However, it is straightforward to demonstrate this phenomenon using Monte Carlo analysis of artificial data sets.

In the early years of general-to-specific analysis it was argued that the only solution to the problems discussed above was to rely on the skill and knowledge of the investigator. For example, Gilbert (1986) argued the following:

    How should the econometrician set about discovering congruent simplifications of the general representation of the DGP …. Scientific discovery is necessarily an innovative and imaginative process, and cannot be automated. (p. 295)

However, more recent research by Hoover and Perez (1999), Hendry and Krolzig (2001) and Krolzig and Hendry (2001) has suggested that automatic computer search algorithms can be effective in detecting a well specified econometric model using the now established 'general-to-specific' methodology. This has been facilitated by the introduction of the PC-GETS computer package, which will automatically conduct a specification search to obtain the best data congruent model based on a given data set.

The purpose of this paper is to investigate the properties of a simple automatic search algorithm in uncovering a correctly specified parsimonious model from an initially overparameterized model. The algorithm works by estimating multiple search paths and choosing the final specification which minimizes the Schwarz criterion. This is compared with a naïve search algorithm in which the least significant variable in the regression is successively eliminated until all remaining variables are significant at a pre-determined level.

Methodology
The main problem encountered in conducting multiple search paths is the number of possible search paths that might be legitimately investigated. For example, consider a model in which the final specification involves twelve exclusion restrictions relative to the original model (not an unusual situation when working with quarterly data). In this case there are 12! = 479,001,600 possible search paths involving the progressive elimination of one variable at each stage. Therefore, even with the power of modern computing, consideration of every possible search path is simply not an option. However, the situation is not as impossible as it may first appear. Many search paths will eventually converge on the same final specification, and the problem is simply to ensure that enough are tried so as to maximize the chance of obtaining the correct specification. The pseudo-code below sets out the algorithm used in this research to achieve this.

FOR j = 1 to R, where R is a predetermined number of iterations.
    REPEAT UNTIL |t_β̂i| > t_c for all i = 1, ..., N, where t_c is a predetermined critical value and N is the number of variables included in the equation.
        Estimate equation.
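The remainder of the pseudo-code is lost to the page break, but its logic, namely estimating the equation, eliminating variables until every |t|-ratio exceeds t_c, repeating over R randomized search paths, and keeping the specification minimizing the Schwarz criterion, can be sketched as follows. This is our reading and our code, not the author's program; the random tie-breaking rule among insignificant variables and the use of statsmodels are assumptions.

import numpy as np
import statsmodels.api as sm

def multiple_search_paths(y, X, t_c=2.58, R=100, seed=0):
    # R randomized general-to-specific search paths; return the surviving
    # column set with the smallest Schwarz criterion (BIC in statsmodels).
    rng = np.random.default_rng(seed)
    best_bic, best_cols = np.inf, None
    for _ in range(R):
        cols = list(range(X.shape[1]))
        while cols:
            fit = sm.OLS(y, X[:, cols]).fit()
            tvals = np.abs(np.asarray(fit.tvalues))
            if tvals.min() > t_c:            # REPEAT UNTIL all |t| > t_c
                break
            weak = np.flatnonzero(tvals <= t_c)
            cols.pop(int(rng.choice(weak)))  # drop one insignificant variable
        if cols:
            fit = sm.OLS(y, X[:, cols]).fit()
            if fit.bic < best_bic:
                best_bic, best_cols = fit.bic, cols
    return best_cols, best_bic

Because different passes delete insignificant variables in different orders, distinct search paths are explored, and selecting on the Schwarz criterion arbitrates among the final specifications they reach.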
model more often. Moreover, as the sample size gets large, the frequency with which the multiple search path algorithm identifies the true model appears to be converging towards 100% with t_c = t_c(1%). This is not the case for the naïve algorithm, in which, with the same specification, the true model was identified in only 67.6% of the simulations.

Next, the effects of working with non-stationary data were considered. Here the x variable is generated as a random walk series, with the implication that the y variable also contains a unit root. However, the specification of an equilibrium relationship between the variables ensures that they are co-integrated. This means that it is still reasonable to conduct a specification search in levels of the series even though individually each series contains a unit root. The results for the multiple search path algorithm are given in Table 3.

The results from Table 3 are very similar to those for non-stationary data shown in Table 1. The actual percentages differ slightly but the general pattern remains the same. If the sample size is small then t_c = t_c(5%) performs better than t_c = t_c(1%). However, as the sample size gets larger, this situation is reversed, with case A converging towards 100% (when t_c = t_c(1%)) as the sample size becomes large.
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 511-519 1538 – 9472/09/$95.00
The behavior of the Wald-z, Wald-c, Quesenberry-Hurst, Wald-m and Agresti-Min methods was
investigated for matched proportions confidence intervals. It was concluded that given the widespread use
of the repeated-measure design, pretest-posttest design, matched-pairs design, and cross-over design, the
textbook Wald-z method should be abandoned in favor of the Agresti-Min alternative.
Figure 1: Coverage Probabilities (n = 50) for a Single Proportion Wald Confidence Interval Method (Wald-z; y-axis: coverage probability, 0.8000 to 1.0000; x-axis: p, 0.100 to 0.900)

Figure 2: Coverage probabilities for the difference in nominal 95% Wald-z as a function of p1 when p2 = 0.3 with n1 = n2 = 20 (Two Independent Proportions) (y-axis: coverage probability, 0.800 to 1.000; x-axis: p1, 0.000 to 1.000)
LB = (p_b − p_c) − {χ²[(p_b + p_c + 1/n) − (p_b − p_c)²]/n}^(1/2)

and

UB = (p_b − p_c) + {χ²[(p_b + p_c + 1/n) − (p_b − p_c)²]/n}^(1/2)

with χ² = χ²(α, 1).

Quesenberry-Hurst (QH):

LB = [ n(p_b − p_c) − χ{(χ² + n)(p_b + p_c) − n(p_b − p_c)²}^(1/2) ] / (χ² + n)

UB = [ n(p_b − p_c) + χ{(χ² + n)(p_b + p_c) − n(p_b − p_c)²}^(1/2) ] / (χ² + n)

where χ² = χ²(α, 1).

Agresti-Min (AM):

LB = (c* − b*)/n* − z_(α/2){(b* + c*) − (c* − b*)²/n*}^(1/2)/n*

and

UB = (c* − b*)/n* + z_(α/2){(b* + c*) − (c* − b*)²/n*}^(1/2)/n*

with b* = b + 1/2, c* = c + 1/2, n* = n + 2.

The joint probability mass function of (Y_b, Y_c) is expressed as a function of Δ [Δ = (π_a + π_b) − (π_a + π_c) = π_b − π_c] and is given by:

f(b, c | Δ, π_c) = Pr(Y_b = b, Y_c = c | Δ, π_c) = n!/[b!c!(n−b−c)!] (π_c + Δ)^b π_c^c (1 − 2π_c − Δ)^(n−b−c),

where Δ and π_c satisfy the following inequalities: π_c ∈ [0, (1−Δ)/2] if 0 ≤ Δ < 1, and π_c ∈ [−Δ, (1−Δ)/2] if −1 < Δ < 0.

Coverage probability (CP) is generally used to evaluate (1 − α) confidence intervals. The coverage probability function CP(Δ) for matched proportions for any Δ is defined as:

CP(Δ) = Σ_b Σ_c I_T(b, c | Δ, π_c) f(b, c | Δ, π_c),

where I_T(b, c | Δ, π_c) = 1 if Δ ∈ [Δ_l, Δ_u], and 0 otherwise.

Results
The 95% CP(Δ) for p_c = 0.1, n = 20 and p_b = 0.001, ..., 0.999 for the Wald-z, Wald-c, AM, Wald-m and Quesenberry-Hurst methods are shown in Figure 3. CP(Δ) probabilities are 0.9125, 0.9545, 0.9401, 0.9435 and 0.9541 respectively. The 95% CP(Δ) for p_c = 0.25, n = 30 and p_b = 0.001, ..., 0.999 for the Wald-z, Wald-c, AM, Wald-m and Quesenberry-Hurst methods are shown in Figure 4. CP(Δ) probabilities are 0.9334, 0.9611, 0.9425, 0.9484 and 0.9448 respectively. And the CP(Δ) for p_c = 0.40, n = 40 and p_b = 0.001, ..., 0.999 for the Wald-z, Wald-c, AM, Wald-m and Quesenberry-Hurst methods are shown in Figure 5. CP(Δ) probabilities are 0.9390, 0.9607, 0.9444, 0.9485 and 0.9451 respectively.

The CP(Δ) plots in Figures 3-5 demonstrate that the Wald-z method is suboptimal over the range of p, the Wald-c and Wald-m methods are conservative, and the Quesenberry-Hurst and Agresti-Min methods are slightly less than nominal.

Conclusion
A number of closed form methods for constructing confidence intervals for paired binary data were proposed. Newcombe (1998) conducted an empirical study to compare the CP(Δ) of ten confidence interval estimators for the difference between binomial proportions based on paired data. He concluded that the profile likelihood estimator and the score test based confidence interval proposed by Tango (1998) performed well in large-sample situations. May
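As an illustration of the coverage computation defined above, the following sketch (ours, not the author's code) evaluates CP(Δ) for the Agresti-Min interval by enumerating the trinomial distribution of the discordant counts. Orienting the point estimate as (b* − c*)/n* so that it estimates Δ = π_b − π_c, and the 95% value z = 1.96, are our assumptions; the inputs must satisfy the (Δ, π_c) constraints given above.

from math import comb, sqrt

def am_interval(b, c, n, z=1.96):
    # Agresti-Min: add 1/2 to each discordant count and 2 to n
    bs, cs, ns = b + 0.5, c + 0.5, n + 2.0
    d = (bs - cs) / ns
    half = z * sqrt((bs + cs) - (bs - cs) ** 2 / ns) / ns
    return d - half, d + half

def coverage(delta, pi_c, n, z=1.96):
    # CP(delta): total trinomial probability of the (b, c) outcomes whose
    # interval contains the true difference delta = pi_b - pi_c
    pi_b = pi_c + delta
    cp = 0.0
    for b in range(n + 1):
        for c in range(n + 1 - b):
            pmf = (comb(n, b) * comb(n - b, c) * pi_b**b * pi_c**c
                   * (1.0 - pi_b - pi_c) ** (n - b - c))
            lo, hi = am_interval(b, c, n, z)
            if lo <= delta <= hi:
                cp += pmf
    return cp

# example: coverage(delta=0.1, pi_c=0.1, n=20)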
Figure 3: Coverage probabilities CP(Δ) for p_c = 0.1, n = 20; panels: Wald-z, Wald-c, Wald-m, Quesenberry-Hurst, Agresti-Min (y-axis: coverage probability, 0.75 to 1.00; x-axis: P)

Figure 4: Coverage probabilities CP(Δ) for p_c = 0.25, n = 30; panels: Wald-z, Wald-c, Wald-m, Quesenberry-Hurst, Agresti-Min (y-axis: coverage probability, 0.75 to 1.00; x-axis: P)

Figure 5: Coverage probabilities CP(Δ) for p_c = 0.40, n = 40; panels: Wald-z, Wald-c, Wald-m, Quesenberry-Hurst, Agresti-Min (y-axis: coverage probability, 0.75 to 1.00; x-axis: P)
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 520-525 1538 – 9472/09/$95.00
Confidence intervals (based on the χ²-distribution and the standard normal (Z) distribution) for the intraclass correlation coefficient under unequal family sizes, based on a single multinormal sample, have been proposed. It has been found that the confidence interval based on the χ²-distribution consistently and reliably produces better results, in terms of shorter average interval length, than the confidence interval based on the standard normal distribution, especially for larger sample sizes and various intraclass correlation coefficient values. The coverage probability of the interval based on the χ²-distribution is competitive with the coverage probability of the interval based on the standard normal distribution. An example with real data is presented.
the equality of two intraclass correlation coefficients based on two independent multinormal samples. For several populations and unequal family sizes, Bhandary and Alam (2000) proposed the likelihood ratio and large sample ANOVA tests for the equality of several intraclass correlation coefficients based on several independent multinormal samples. Donner and Zou (2002) proposed an asymptotic test for the equality of dependent intraclass correlation coefficients under unequal family sizes.

None of the above authors, however, derived any confidence interval estimator for intraclass correlation coefficients under unequal family sizes. In this article, confidence interval estimators for intraclass correlation coefficients are considered based on a single multinormal sample under unequal family sizes, using conditional analyses that assume family sizes are fixed, though unequal.

It could be of interest to estimate the blood pressure or cholesterol or lung capacity for families in American races. Therefore, an interval estimator for the intraclass correlation coefficient under unequal family sizes must be developed. To address this need, this paper proposes two confidence interval estimators for the intraclass correlation coefficient under unequal family sizes, and these interval

vector and the covariance matrix for the familial data is given by the following (Rao, 1973):

μ_i = μ 1_i

and

Σ_i = σ² [ 1  ρ  …  ρ
           ρ  1  …  ρ
           …  …  …  …
           ρ  ρ  …  1 ]  (p_i × p_i),    (2.1)

where 1_i is a p_i × 1 vector of 1's, μ (−∞ < μ < ∞) is the common mean, σ² (σ² > 0) is the common variance of members of the family, and ρ, which is called the intraclass correlation coefficient, is the coefficient of correlation among the members of the family, with max_(1≤i≤k){−1/(p_i − 1)} ≤ ρ ≤ 1. It is assumed that x_i ~ N_(p_i)(μ_i, Σ_i), i = 1, ..., k, where N_(p_i) represents a p_i-variate normal distribution and the μ_i, Σ_i's are defined in (2.1). Let
p̄ = k⁻¹ Σ_(i=1)^k p_i,

c₂ = 1 − 2(1 − ρ)² k⁻¹ Σ_(i=1)^k a_i + (1 − ρ)² [ k⁻¹ Σ_(i=1)^k a_i + (p̄ − 1)⁻¹ (k⁻¹ Σ_(i=1)^k a_i)² ],

and

Σ*_i = σ² diag(η_i, 1 − ρ, ..., 1 − ρ),

where a_i = 1 − p_i⁻¹.

Under the above setup, it is observed (using Srivastava & Katapa, 1986) that:

σ̂² = (k − 1)⁻¹ Σ_(i=1)^k (u_(i1) − μ̂)² + k⁻¹ γ̂² Σ_(i=1)^k a_i,

γ̂² = Σ_(i=1)^k Σ_(r=2)^(p_i) u_(ir)² / Σ_(i=1)^k (p_i − 1),

and

μ̂ = k⁻¹ Σ_(i=1)^k u_(i1).    (2.3)

Srivastava and Katapa (1986) derived the asymptotic distribution of ρ̂; they showed that ρ̂ ~ N(ρ, Var_k) asymptotically.

Interval Based on the χ² Distribution
It can be shown, by using the distribution of u_i given by (2.2), that

Σ_(i=1)^k Σ_(r=2)^(p_i) u_(ir)² / [σ²(1 − ρ)] ~ χ²_n,  where n = Σ_(i=1)^k (p_i − 1),    (2.7)

and χ²_n denotes the chi-square distribution with n degrees of freedom. Using (2.7), a 100(1 − α)% confidence interval for ρ can be found as follows:

1 − [Σ_(i=1)^k Σ_(r=2)^(p_i) u_(ir)²] / [χ²_(1−α/2; n) · σ̂²]  <  ρ  <  1 − [Σ_(i=1)^k Σ_(r=2)^(p_i) u_(ir)²] / [χ²_(α/2; n) · σ̂²].    (2.8)
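For readers who wish to apply (2.8), the following sketch (ours, not the authors' code) computes the χ²-based interval from the transformed family data u_i of Srivastava and Katapa (1986); the function name and the use of scipy are our choices, and u is assumed to hold one already-transformed vector per family.

import numpy as np
from scipy.stats import chi2

def rho_chi2_interval(u, alpha=0.05):
    # u[i] = transformed observations (u_i1, ..., u_ip_i) for family i;
    # returns the 100(1 - alpha)% interval (2.8) for rho.
    k = len(u)
    a = np.array([1.0 - 1.0 / len(ui) for ui in u])          # a_i = 1 - 1/p_i
    n = sum(len(ui) - 1 for ui in u)                         # n = sum(p_i - 1)
    u1 = np.array([ui[0] for ui in u], dtype=float)          # u_i1
    ss = sum(float(np.sum(np.asarray(ui, dtype=float)[1:] ** 2)) for ui in u)
    gamma2 = ss / n                                          # gamma-hat squared
    mu_hat = u1.mean()                                       # equation (2.3)
    sigma2 = np.sum((u1 - mu_hat) ** 2) / (k - 1) + gamma2 * a.sum() / k
    lower = 1.0 - ss / (chi2.ppf(1.0 - alpha / 2.0, n) * sigma2)
    upper = 1.0 - ss / (chi2.ppf(alpha / 2.0, n) * sigma2)
    return lower, upper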
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 526-533 1538 – 9472/09/$95.00
Vincent A. R. Camara
University of South Florida
This study obtained and compared confidence intervals for the mean of a Gaussian distribution.
Considering the square error and the Higgins-Tsokos loss functions, approximate Bayesian
confidence intervals for the mean of a normal population are derived. Using normal data and
SAS software, the obtained approximate Bayesian confidence intervals were compared to a
published Bayesian model. Whereas the published Bayesian method is sensitive to the choice of
the hyper-parameters and does not always yield the best confidence intervals, it is shown that the
proposed approximate Bayesian approach relies only on the observations and often performs
better.
Employing the square error loss function along with a normal prior, Fogel (1991) obtained the following Bayesian confidence interval for the mean of the normal probability density function:

LB = (μ₁σ²/n + x̄τ²)/(τ² + σ²/n) − Z_(α/2) (τσ/√n)/(τ² + σ²/n)^(1/2)    (2)

UB = (μ₁σ²/n + x̄τ²)/(τ² + σ²/n) + Z_(α/2) (τσ/√n)/(τ² + σ²/n)^(1/2)    (3)

where the mean and variance of the selected normal prior are denoted by μ₁ and τ², respectively.

This study employs the square error and the Higgins-Tsokos loss functions (the latter with parameters f₁, f₂ > 0 (4)), as previously used to derive confidence bounds for the variance of a normal population (Camara, 2003). For the square error loss function:

L_(σ²)(SE) = Σ_(i=1)^n (x_i − x̄)² / [n − 2 − 2 ln(α/2)]

U_(σ²)(SE) = Σ_(i=1)^n (x_i − x̄)² / [n − 2 − 2 ln(1 − α/2)]    (5)

For the Higgins-Tsokos loss function:

U_(σ²)(HT) = 1 / { [n − 1 − 2 Ln(1 − α/2)] / Σ_(i=1)^n (x_i − x̄)² − G(x̄) }
L_μ(SE) = { Σ_(i=1)^n (x_i − x̄)²/(n − 1) + x̄² − Σ_(i=1)^n (x_i − x̄)²/[n − 2 − 2 ln(1 − α/2)] }^(0.5)

U_μ(SE) = { Σ_(i=1)^n (x_i − x̄)²/(n − 1) + x̄² − Σ_(i=1)^n (x_i − x̄)²/[n − 2 − 2 ln(α/2)] }^(0.5)    (9)

The approximate Bayesian confidence interval for the normal population mean corresponding to the Higgins-Tsokos loss function is:

U_μ(HT) = { Σ_(i=1)^n (x_i − x̄)²/(n − 1) + x̄² − 1/H₂(x̄) }^(0.5)    (10)

where

H₁(x̄) = [n − 1 − 2 Ln(1 − α/2)] / Σ_(i=1)^n (x_i − x̄)² − G(x̄)    (11)

H₂(x̄) = [n − 1 − 2 Ln(α/2)] / Σ_(i=1)^n (x_i − x̄)² − G(x̄)    (12)

L_μ(HT) = { Σ_(i=1)^n (x_i − x̄)²/(n − 1) + x̄² − 1/H₁(x̄) }^(0.5)

With (9), (10), (11), (12) and a change of variable, approximate Bayesian confidence intervals are easily obtained when μ ≤ 0.

Results
To compare the Bayesian model (3) with the approximate Bayesian models (9 & 10), samples obtained from normally distributed populations (Examples 1, 2, 3, 4, 7) as well as approximately normal populations (Examples 5, 6) were considered. SAS software was employed to obtain the normal population parameters corresponding to each sample data set. For the Higgins-Tsokos loss function, f₁ = 1 and f₂ = 1 were considered.

Example 1
Data Set: 24, 28, 22, 25, 24, 22, 29, 26, 25, 28, 19, 29 (Mann, 1998, p. 504).
Normal population distribution obtained with SAS: N(μ = 25.083, σ = 3.1176), x̄ = 25.08333, s² = 9.719696.

Table 1a: Approximate Bayesian Confidence Intervals for the Population Mean Corresponding to Data Set 1

C.L. %   Approximate Bayesian Bounds (SE)   Approximate Bayesian Bounds (HT)
80       25.0683 - 25.1311                  25.0730 - 25.1158
90       25.0661 - 25.1437                  25.0683 - 25.1311
95       25.0650 - 25.1543                  25.0661 - 25.1437
99       25.0641 - 25.1734                  25.0643 - 25.1660

Table 1b: Bayesian Confidence Intervals for the Population Mean Corresponding to Data Set 1

C.L. %   Bayesian C.I. I Bounds (μ₁ = 2, τ = 1)   Bayesian C.I. II Bounds (μ₁ = 25, τ = 10)
80       13.8971 - 15.6097                        23.9353 - 26.2300
90       13.6496 - 15.8572                        23.6037 - 26.5617
95       13.4422 - 16.0646                        23.3258 - 26.8395
99       13.0275 - 16.4793                        22.7701 - 27.3953
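Because (9) depends only on the observations, it is easy to compute directly. The following sketch (ours, not the author's SAS program; the function name is illustrative) implements the square error bounds in (9); as a check, it reproduces the 95% SE bounds reported for Data Set 1 in Table 1a.

import numpy as np

def approx_bayes_mean_se(x, alpha=0.05):
    # Approximate Bayesian bounds (9) for a normal mean under the
    # square error loss function, computed from the sample alone.
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    s = np.sum((x - xbar) ** 2)              # sum of squared deviations
    lb = np.sqrt(s / (n - 1) + xbar**2 - s / (n - 2 - 2 * np.log(1 - alpha / 2)))
    ub = np.sqrt(s / (n - 1) + xbar**2 - s / (n - 2 - 2 * np.log(alpha / 2)))
    return lb, ub

# Data Set 1 (Mann, 1998):
data = [24, 28, 22, 25, 24, 22, 29, 26, 25, 28, 19, 29]
print(approx_bayes_mean_se(data))            # approximately (25.0650, 25.1543)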
Table 4a: Approximate Bayesian Confidence Intervals for the Population Mean Corresponding to Data Set 4

C.L. %   Approximate Bayesian Bounds (SE)   Approximate Bayesian Bounds (HT)
80       28.8270 - 28.9087                  28.8330 - 28.8884
90       28.8242 - 28.9256                  28.8270 - 28.9087
95       28.8228 - 28.9400                  28.8242 - 28.9256
99       28.8217 - 28.9663                  28.8220 - 28.9560

Table 4b: Bayesian Confidence Intervals for the Population Mean Corresponding to Data Set 4

C.L. %   Bayesian C.I. I Bounds (μ₁ = 2, τ = 1)   Bayesian C.I. II Bounds (μ₁ = 25, τ = 10)
80       13.2394 - 15.1312                        27.4048 - 30.1961
90       12.9659 - 15.4047                        27.0014 - 30.5995
95       12.7369 - 15.6337                        26.6634 - 30.9375
99       12.2787 - 16.0919                        25.9873 - 31.6135

Table 5a: Approximate Bayesian Confidence Intervals for the Population Mean Corresponding to Data Set 5

C.L. %   Approximate Bayesian Bounds (SE)   Approximate Bayesian Bounds (HT)
80       43.5794 - 43.6703                  43.5858 - 43.6169
90       43.5764 - 43.6902                  43.5794 - 43.6703
95       43.5749 - 43.7074                  43.5764 - 43.6902
99       43.5738 - 43.7395                  43.5741 - 43.7268

Table 5b: Bayesian Confidence Intervals for the Population Mean Corresponding to Data Set 5

C.L. %   Bayesian C.I. I Bounds (μ₁ = 2, τ = 1)   Bayesian C.I. II Bounds (μ₁ = 25, τ = 10)
80       14.8305 - 16.9204                        41.4441 - 45.0272
90       14.5285 - 17.2225                        40.9263 - 45.5450
95       14.2754 - 17.4756                        40.4924 - 45.9789
99       13.7692 - 17.9817                        39.6246 - 46.8467

Example 5
Data Set: 52, 33, 42, 44, 41, 50, 44, 51, 45, 38, 37, 40, 44, 50, 43 (McClave & Sincich, p. 301).
Normal population distribution obtained with SAS:

Example 6
Data Set: 52, 43, 47, 56, 62, 53, 61, 50, 56, 52, 53, 60, 50, 48, 60, 55, 43 (McClave & Sincich, p. 301).
Normal population distribution obtained with SAS:
Table 6a: Approximate Bayesian Confidence Intervals for the Population Mean Corresponding to Data Set 6

C.L. %   Approximate Bayesian Bounds (SE)   Approximate Bayesian Bounds (HT)
80       53.6098 - 53.6779                  53.6145 - 53.6602
90       53.6076 - 53.6932                  53.6098 - 53.6779
95       53.6065 - 53.7064                  53.6076 - 53.6932
99       53.6056 - 53.7315                  53.6058 - 53.7216

Table 6b: Bayesian Confidence Intervals for the Population Mean Corresponding to Data Set 6

C.L. %   Bayesian C.I. I Bounds (μ₁ = 2, τ = 1)   Bayesian C.I. II Bounds (μ₁ = 25, τ = 10)
80       19.1978 - 21.2568                        51.3930 - 54.8269
90       18.9002 - 21.5544                        50.8967 - 55.3232
95       18.6508 - 21.8038                        50.4808 - 55.7391
99       18.1521 - 22.3024                        49.6492 - 56.5707

Table 7a: Approximate Bayesian Confidence Intervals for the Population Mean Corresponding to Data Set 7

C.L. %   Approximate Bayesian Bounds (SE)   Approximate Bayesian Bounds (HT)
80       82.7072 - 83.4808                  82.7539 - 83.2572
90       82.6856 - 83.6884                  82.7072 - 83.4808
95       82.6751 - 83.8815                  82.6856 - 83.6884
99       82.6669 - 84.2823                  82.6690 - 83.7173

Table 7b: Bayesian Confidence Intervals for the Population Mean Corresponding to Data Set 7

C.L. %   Bayesian C.I. I Bounds (μ₁ = 2, τ = 1)   Bayesian C.I. II Bounds (μ₁ = 25, τ = 10)
80       3.2940 - 5.8132                          63.0810 - 75.4828
90       2.9299 - 6.1774                          61.2886 - 77.2752
95       2.6248 - 6.4824                          59.7868 - 78.7770
99       2.0147 - 7.0926                          56.7833 - 81.7806
loss functions. The following conclusions are based on the results obtained:

1. The Bayesian model (3) used to construct confidence intervals for the mean of a normal population does not always yield the best coverage accuracy. Each of the obtained approximate Bayesian confidence intervals contains the population mean and performs better than its Bayesian counterparts.
2. Bayesian models are generally sensitive to the choice of hyper-parameters. Some values arbitrarily assigned to the hyper-parameters may lead to a very poor estimation of the parameter(s) under study. In this study, some values assigned to the hyper-parameters led to confidence intervals that do not contain the normal population mean.
3. Contrary to the Bayesian model (3), which uses the Z-table, both the approach employed in this study and our approximate Bayesian models rely only on observations.
4. With the proposed approach, approximate Bayesian confidence intervals for a normal population mean are easily obtained for any level of significance.
5. The approximate Bayesian approach under the popular square error loss function does not always yield the best approximate Bayesian results: the Higgins-Tsokos loss function performs better in the examples presented.
References
Higgins, J. J., & Tsokos, C. P. (1976). On the behavior of some quantities used in Bayesian reliability demonstration tests. IEEE Trans. Reliability, R-25(4), 261-264.
Higgins, J. J., & Tsokos, C. P. (1980). A study of the effect of the loss function on Bayes estimates of failure intensity, MTBF, and reliability. Applied Mathematics and Computation, 6, 145-166.
Mann, P. S. (1998). Introductory statistics (3rd Ed.). New York: John Wiley & Sons.
McClave, J. T., & Sincich, T. A. (1997). First course in statistics (6th Ed.). NY: Prentice Hall.
Schafer, R. E., et al. (1970). Bayesian reliability demonstration, phase I: Data for the a priori distribution. Rome Air Development Center, Griffiss AFB, NY: RADC-TR-69-389.
Schafer, R. E., et al. (1971). Bayesian reliability, phase II: Development of a priori distribution. Rome Air Development Center, Griffiss AFB, NY: RADC-TR-71-209.
Schafer, R. E., et al. (1973). Bayesian reliability demonstration, phase III: Development of test plans. Rome Air Development Center, Griffiss AFB, NY: RADC-TR-73-39.
Shafer, R. E., & Feduccia, A. J. (1972). Prior distribution fitted to observed reliability data. IEEE Trans. Reliability, R-21(3), 148-154.
Tsokos, C. P., & Shimi, S. (Eds.). (1976). Proceedings of the conference on the theory and applications of reliability with emphasis on Bayesian and nonparametric methods, Vols. I, II. NY: Academic Press.
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 534-546 1538 – 9472/09/$95.00
The progressive Type-II censoring scheme has become quite popular. A drawback of a progressive
censoring scheme is that the length of the experiment can be very large if the items are highly reliable.
Recently, Kundu and Joarder (2006) introduced the Type-II progressively hybrid censored scheme and
analyzed the data assuming that the lifetimes of the items are exponentially distributed. This article
presents the analysis of Type-II progressively hybrid censored data when the lifetime distributions of the
items follow Weibull distributions. Maximum likelihood estimators and approximate maximum
likelihood estimators are developed for estimating the unknown parameters. Asymptotic confidence
intervals based on maximum likelihood estimators and approximate maximum likelihood estimators are
proposed. Different methods are compared using Monte Carlo simulations and one real data set is
analyzed.
Key words: Maximum likelihood estimators; approximate maximum likelihood estimators; Type-I
censoring; Type-II censoring; Monte Carlo simulation.
be obtained by solving a non-linear equation, and a simple iterative scheme is proposed to solve the non-linear equation. Approximate maximum likelihood estimators (AMLEs), which have explicit expressions, are also suggested. It is not possible to compute the exact distributions of the MLEs, so the asymptotic distribution is used to construct confidence intervals. Monte Carlo simulations are used to compare the different methods, and one data analysis is performed for illustrative purposes.

Type-II Progressively Hybrid Censoring Scheme Models
If it is assumed that the lifetime random variable Y has a Weibull distribution with shape and scale parameters α and λ respectively, then the probability density function (PDF) of Y is

f_Y(y; α, λ) = (α/λ)(y/λ)^(α−1) e^(−(y/λ)^α);  y > 0,    (1)

where α > 0, λ > 0 are the natural parameter space. If the random variable Y has the density function (1), then X = ln Y has the extreme value distribution with the PDF

f_X(x; μ, σ) = (1/σ) e^((x−μ)/σ − e^((x−μ)/σ));  −∞ < x < ∞,    (2)

where μ = ln λ and σ = 1/α. The density function described by (2) is known as the density function of an extreme value distribution with location and scale parameters μ and σ respectively. Models (1) and (2) are equivalent in the sense that the procedure developed under one model can easily be used for the other. Although they are equivalent models, (2) can be the easier to work with compared to model (1), because in model (2) the two parameters μ and σ appear as location and scale parameters. For μ = 0 and σ = 1, model (2) is known as the standard extreme value distribution and has the PDF

f_Z(z; 0, 1) = e^(z − e^z);  −∞ < z < ∞.

Type-II Progressively Hybrid Censoring Scheme Data
Under the Type-II progressively hybrid censoring scheme, it is assumed that n identical items are put on a test and the lifetimes of the n items are denoted by Y₁, ..., Yₙ. The integer m < n is pre-fixed, R₁, ..., Rₘ are m pre-fixed integers satisfying R₁ + ... + Rₘ + m = n, and T is a pre-fixed time point. At the time of the first failure Y_(1:m:n), R₁ of the remaining units are randomly removed. Similarly, at the time of the second failure Y_(2:m:n), R₂ of the remaining units are removed, and so on. If the m-th failure Y_(m:m:n) occurs before time T, the experiment stops at time point Y_(m:m:n). If, however, the m-th failure does not occur before time point T and only J failures occur before T (where 0 ≤ J < m), then at time T all remaining R*_J units are removed and the experiment terminates. Note that R*_J = n − (R₁ + ... + R_J) − J. The two cases are denoted as Case I and Case II respectively, and this censoring scheme is called the Type-II progressively hybrid censoring scheme (Kundu and Joarder, 2006).

In the presence of the Type-II progressively hybrid censoring scheme, one of the following is observed:

Case I: {Y_(1:m:n), ..., Y_(m:m:n)}, if Y_(m:m:n) < T,    (4)

or

Case II: {Y_(1:m:n), ..., Y_(J:m:n)}, if Y_(J:m:n) < T < Y_(J+1:m:n).    (5)

For Case II, although Y_(J+1:m:n) is not observed, Y_(J:m:n) < T < Y_(J+1:m:n) means that the J-th failure took place before T and no failure took place between Y_(J:m:n) and T (i.e., Y_(J+1:m:n), ..., Y_(m:m:n) are not observed).

The conventional Type-I progressive censoring scheme needs the pre-specification of
For Case I, the likelihood function is

l(μ, σ) = (1/σ^m) ∏_(i=1)^m [ n − Σ_(k=1)^(i−1) (1 + R_k) ] g(z_(i:m:n)) [G(z_(i:m:n))]^(R_i),    (14)

and for Case II it is

l(μ, σ) = (1/σ^J) ∏_(i=1)^J [ n − Σ_(k=1)^(i−1) (1 + R_k) ] g(z_(i:m:n)) [G(z_(i:m:n))]^(R_i) [G(V)]^(R*_J),    (15)

where z_(i:m:n) = (x_(i:m:n) − μ)/σ, V = (S − μ)/σ, g(x) = e^(x − e^x), G(x) = e^(−e^x), μ = ln λ and σ = 1/α.

Ignoring the constant term, the following log-likelihood results from (14):

L(μ, σ) = ln l(μ, σ) = −m ln σ + Σ_(i=1)^m ln g(z_(i:m:n)) + Σ_(i=1)^m R_i ln G(z_(i:m:n)).    (16)

From (16) the following approximate MLEs of μ and σ are obtained (see Appendix 1):

μ̃ = [(c₁ − c₂ − m)σ̃ + d₁]/c₁,   σ̃ = [−B + √(B² − 4AC)]/(2A),    (17)

where

c₁ = Σ_(i=1)^m D_i e^(μ_i),  c₂ = Σ_(i=1)^m D_i μ_i e^(μ_i),  d₁ = Σ_(i=1)^m D_i X_(i:m:n) e^(μ_i),
d₂ = Σ_(i=1)^m D_i X²_(i:m:n) e^(μ_i),  d₃ = Σ_(i=1)^m D_i μ_i X_(i:m:n) e^(μ_i),
A = m c₁,  B = c₁(d₃ + m X̄) − d₁(c₂ + m),  C = d₁² − c₁ d₂,
μ_i = G⁻¹(p_i) = ln(−ln q_i),  p_i = i/(n + 1),  q_i = 1 − p_i, and D_i = 1 + R_i, for i = 1, ..., m.

For Case II, ignoring the constant term, the log-likelihood is obtained as

L(μ, σ) = ln l(μ, σ) = −J ln σ + Σ_(i=1)^J ln g(z_(i:m:n)) + Σ_(i=1)^J R_i ln G(z_(i:m:n)) + R*_J ln G(V).

The corresponding quantities are

c′₁ = Σ_(i=1)^J D_i e^(μ_i) + R*_J e^(μ*_J),
c′₂ = Σ_(i=1)^J D_i μ_i e^(μ_i) + R*_J μ*_J e^(μ*_J),
d′₁ = Σ_(i=1)^J D_i X_(i:m:n) e^(μ_i) + R*_J S e^(μ*_J),
d′₂ = Σ_(i=1)^J D_i X²_(i:m:n) e^(μ_i) + R*_J S² e^(μ*_J),
d′₃ = Σ_(i=1)^J D_i μ_i X_(i:m:n) e^(μ_i) + R*_J μ*_J S e^(μ*_J),
A′ = J c′₁,  B′ = c′₁(d′₃ + J X̄) − d′₁(c′₂ + J),  C′ = d′₁² − c′₁ d′₂.

Here μ_i and D_i are the same as above for i = 1, ..., J; μ*_J = G⁻¹(p*_J) = ln(−ln q*_J), p*_J = (p_J + p_(J+1))/2, and q*_J = 1 − p*_J.

Results
Because the performance of the different methods cannot be compared theoretically, Monte Carlo simulations are used to compare the performances of the different methods proposed, for different parameter values and different sampling schemes. The term "different sampling schemes" means different sets of R_i's and different T values. The performances of the MLE and AMLE estimators of the unknown parameters are compared in terms of their biases and mean squared errors (MSEs) for different censoring schemes. The average lengths of the asymptotic confidence intervals and their coverage percentages are also compared. All computations were performed using a Pentium IV processor and a FORTRAN-77 program. In all cases the random deviate generator RAN2 was used as proposed in Press, et al. (1991). Because λ is the scale parameter, λ = 1 has been taken in all cases without loss of generality. For simulation purposes, the results
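As an illustration of (17), the following sketch (ours, not the authors' FORTRAN-77 program) computes the Case I AMLEs from the observed order statistics; treating X̄ as the mean of the observed log-lifetimes and the function name are our assumptions.

import numpy as np

def amle_case1(x, R, n):
    # x: ordered log-lifetimes X_{i:m:n} = ln Y_{i:m:n} (length m);
    # R: progressive censoring numbers R_1, ..., R_m; n: items on test.
    x = np.asarray(x, dtype=float)
    m = len(x)
    p = np.arange(1, m + 1) / (n + 1.0)
    mu_i = np.log(-np.log(1.0 - p))          # mu_i = G^{-1}(p_i) = ln(-ln q_i)
    D = 1.0 + np.asarray(R, dtype=float)     # D_i = 1 + R_i
    w = D * np.exp(mu_i)
    c1, c2 = w.sum(), (w * mu_i).sum()
    d1, d2, d3 = (w * x).sum(), (w * x**2).sum(), (w * mu_i * x).sum()
    A = m * c1
    B = c1 * (d3 + m * x.mean()) - d1 * (c2 + m)
    C = d1**2 - c1 * d2
    sigma = (-B + np.sqrt(B**2 - 4.0 * A * C)) / (2.0 * A)   # positive root
    mu = ((c1 - c2 - m) * sigma + d1) / c1
    return np.exp(mu), 1.0 / sigma           # (lambda-tilde, alpha-tilde)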
sample is: 11, 35, 49, 170, 329, 958, 1,925, 2,223, 2,400, 2,568. From the above sample data, D = m = 10 is obtained, and the estimates of α and λ based on the MLEs and AMLEs are (α̂ = 6.29773×10⁻¹, λ̂ = 8113.80176) and (α̃ = 6.33116×10⁻¹, λ̃ = 6511.83036) respectively. Using the above estimates, the 95% asymptotic confidence intervals for α and λ based on the MLEs and AMLEs are (6.29773×10⁻¹, 6.29882×10⁻¹), (8113.40869, 8114.19482) and (6.33116×10⁻¹, 6.33176×10⁻¹), (6511.4344, 6512.2264) respectively.

Data Set 2
Consider m = 10, T = 2000, and R_i's the same as in Data Set 1. In this case the progressively hybrid censored sample is obtained as: 11, 35, 49, 170, 329, 958, 1,925, and D = J = 7. The MLEs and AMLEs of α and λ are (α̂ = 4.77441×10⁻¹, λ̂ = 25148.8613) and (α̃ = 4.77589×10⁻¹, λ̃ = 23092.3759) respectively. From the above estimates, the 95% asymptotic confidence intervals for α and λ based on the MLEs and AMLEs are (4.77383×10⁻¹, 4.77499×10⁻¹), (25148.5078, 25149.2148) and (4.77529×10⁻¹, 4.77649×10⁻¹), (23092.0219, 23092.7299) respectively.

In both cases it is clear that the hypothesis H₀: α = 1 would be rejected; this implies that the Weibull distribution should be used here rather than the exponential.

Conclusion
This article discussed Type-II progressively hybrid censored data for the two-parameter Weibull distribution. It was observed that the maximum likelihood estimator of the shape parameter could be obtained by using an iterative procedure. The proposed approximate maximum likelihood estimators of the shape and scale parameters could be obtained in explicit forms. Although the exact confidence intervals could not be constructed, it was observed that the asymptotic confidence intervals work reasonably well for the MLEs. Although the frequentist approach was used here, Bayes estimates and credible intervals can also be obtained under suitable priors along the same lines as Kundu (2007).

References
Arnold, B. C., & Balakrishnan, N. (1989). Relations, bounds and approximations for order statistics. Lecture Notes in Statistics (53). New York: Springer-Verlag.
Balakrishnan, N. (2007). Progressive censoring: An appraisal (with discussions). TEST, in press.
Balakrishnan, N., & Aggarwala, R. (2000). Progressive censoring: Theory, methods and applications. Boston, MA: Birkhauser.
Balakrishnan, N., & Varadan, J. (1991). Approximate MLEs for the location and scale parameters of the extreme value distribution with censoring. IEEE Transactions on Reliability, 40, 146-151.
Childs, A., Chandrasekhar, B., & Balakrishnan, N. (2007). Exact likelihood inference for an exponential parameter under progressively hybrid schemes. In F. Vonta, M. Nikulin, N. Limnios, & C. Huber-Carol (Eds.), Statistical models and methods for biomedical and technical systems. Boston, MA: Birkhauser.
Cohen, A. C. (1963). Progressively censored samples in life testing. Technometrics, 5, 327-329.
Cohen, A. C. (1966). Life-testing and early failure. Technometrics, 8, 539-549.
David, H. A. (1981). Order statistics (2nd Ed.). New York: John Wiley and Sons.
Kundu, D., & Joarder, A. (2006). Analysis of Type-II progressively hybrid censored data. Computational Statistics and Data Analysis, 50, 2509-2528.
Kundu, D. (2007). On hybrid censored Weibull distribution. Journal of Statistical Planning and Inference, 137, 2127-2142.
Appendix 1
For Case I, taking derivatives of L(μ, σ) as defined in (16) with respect to μ and σ results in

∂L(μ, σ)/∂μ = (1/σ)[ Σ_(i=1)^m R_i g(z_(i:m:n))/G(z_(i:m:n)) − Σ_(i=1)^m g′(z_(i:m:n))/g(z_(i:m:n)) ] = 0,    (22)

∂L(μ, σ)/∂σ = (1/σ)[ Σ_(i=1)^m R_i z_(i:m:n) g(z_(i:m:n))/G(z_(i:m:n)) − Σ_(i=1)^m z_(i:m:n) g′(z_(i:m:n))/g(z_(i:m:n)) − m ] = 0.    (23)

Clearly, (22) and (23) do not have explicit analytical solutions. Consider first-order Taylor approximations to g′(z_(i:m:n))/g(z_(i:m:n)) and g(z_(i:m:n))/G(z_(i:m:n)), expanding around the actual mean μ_i of the standardized order statistic Z_(i:m:n), where μ_i = G⁻¹(p_i) = ln(−ln q_i), p_i = i/(n + 1) and q_i = 1 − p_i for i = 1, ..., m, similar to Balakrishnan and Varadan (1991), David (1981) or Arnold and Balakrishnan (1989):

g′(z_(i:m:n))/g(z_(i:m:n)) ≈ α_i − β_i z_(i:m:n)    (24)

g(z_(i:m:n))/G(z_(i:m:n)) ≈ 1 − α_i + β_i z_(i:m:n)    (25)

Substituting (24) and (25) into (22) and (23) gives

( Σ_(i=1)^m D_i e^(μ_i) − Σ_(i=1)^m D_i μ_i e^(μ_i) − m )σ + Σ_(i=1)^m D_i X_(i:m:n) e^(μ_i) − μ Σ_(i=1)^m D_i e^(μ_i) = 0    (26)

and

m( Σ_(i=1)^m D_i e^(μ_i) )σ² + [ Σ_(i=1)^m D_i e^(μ_i) ( Σ_(i=1)^m D_i μ_i X_(i:m:n) e^(μ_i) + m X̄ ) − Σ_(i=1)^m D_i X_(i:m:n) e^(μ_i) ( Σ_(i=1)^m D_i μ_i e^(μ_i) + m ) ]σ + ( Σ_(i=1)^m D_i X_(i:m:n) e^(μ_i) )² − Σ_(i=1)^m D_i e^(μ_i) Σ_(i=1)^m D_i X²_(i:m:n) e^(μ_i) = 0.    (27)

The above two equations (26) and (27) can be written as

(c₁ − c₂ − m)σ + d₁ − μ c₁ = 0    (28)

Aσ² + Bσ + C = 0    (29)

where c₁ = Σ_(i=1)^m D_i e^(μ_i), c₂ = Σ_(i=1)^m D_i μ_i e^(μ_i), d₁ = Σ_(i=1)^m D_i X_(i:m:n) e^(μ_i), and C = d₁² − c₁ d₂.
σ̃ = [−B + √(B² − 4AC)] / (2A)    (31)

β*_J = −g′(μ*_J)/g(μ*_J) + g(μ*_J)/G(μ*_J) = −ln q*_J

Consider only the positive root of σ̃; these approximate estimators are equivalent but not unbiased. Unfortunately, it is not possible to compute the exact biases of μ̃ and σ̃ theoretically because of the intractability encountered in finding the expectation of √(B² − 4AC).

Appendix 2
For Case II, taking derivatives with respect to μ and σ (similar to Case I) gives

∂L(μ, σ)/∂μ = (1/σ)[ Σ_(i=1)^J R_i g(z_(i:m:n))/G(z_(i:m:n)) − Σ_(i=1)^J g′(z_(i:m:n))/g(z_(i:m:n)) + R*_J g(V)/G(V) ] = 0,    (32)

∂L(μ, σ)/∂σ = (1/σ)[ Σ_(i=1)^J R_i z_(i:m:n) g(z_(i:m:n))/G(z_(i:m:n)) − Σ_(i=1)^J z_(i:m:n) g′(z_(i:m:n))/g(z_(i:m:n)) + R*_J V g(V)/G(V) − J ] = 0.    (33)

Using the approximations (24), (25), (34) and (35) in (32) and (33) gives

( Σ_(i=1)^J D_i e^(μ_i) + R*_J e^(μ*_J) − Σ_(i=1)^J D_i μ_i e^(μ_i) − R*_J μ*_J e^(μ*_J) − J )σ + Σ_(i=1)^J D_i X_(i:m:n) e^(μ_i) + R*_J S e^(μ*_J) − μ( Σ_(i=1)^J D_i e^(μ_i) + R*_J e^(μ*_J) ) = 0    (36)

and, in the notation defined earlier,

J c′₁ σ² + [ c′₁ (d′₃ + J X̄) − d′₁ (c′₂ + J) ]σ + (d′₁² − c′₁ d′₂) = 0.    (37)

The above two equations (36) and (37) can be written as

(c′₁ − c′₂ − J)σ + d′₁ − μ c′₁ = 0    (38)
The solution of the preceding equations yields the approximate MLEs:

μ̃ = [ (c′₁ − c′₂ − J)σ̃ + d′₁ ] / c′₁    (40)

σ̃ = [ −B′ + √(B′² − 4A′C′) ] / (2A′)    (41)

Consider only the positive root of σ̃; these approximate estimators are equivalent but not unbiased. Unfortunately, it is not possible to compute the exact biases of μ̃ and σ̃ theoretically because of the intractability encountered in finding the expectation of √(B′² − 4A′C′).
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 547-559 1538 – 9472/09/$95.00
The sample selection model has been studied in the context of semi-parametric methods. Given the deficiencies of the parametric model, such as inconsistent estimators, semi-parametric estimation methods provide better alternatives. This article focuses on fuzzy concepts as a hybrid to the semi-parametric sample selection model. A better approach when confronted with uncertainty and ambiguity is to use the tools provided by the theory of fuzzy sets, which are appropriate for modeling vague concepts. A fuzzy membership function for handling uncertain data in a semi-parametric sample selection model is introduced as a solution to the problem.
Key words: Uncertainty, semi-parametric sample selection model, crisp data, fuzzy sets, membership
function.
similar to Heckman (1976) for the bivariate normal case, where the first stage consisted of semi-parametric estimation of the binary selection model and the second stage consisted of estimating the regression equation. Ichimura and Lee (1990) proposed an extension of the applicability of the semi-parametric approach. It was shown that all models can be represented in the context of multiple index frameworks (Stoker, 1986) and can be estimated by the semi-parametric least squares method if identification conditions are met. Andrews (1991) proposed the establishment of asymptotic series estimators, for instance polynomial series, trigonometric series and Gallant's Fourier flexible form estimators, for nonparametric regression models, and applied a variety of estimands in the regression model under consideration, including derivatives and integrals of the regression function (see also Klein & Spady, 1993; Gerfin, 1996; Vella, 1998; Martin, 2001; Khan & Powell, 2001; Lee & Vella, 2006).

Previous studies in this area concentrated on the sample selection model and used parametric, semi-parametric or nonparametric approaches. None of the studies conducted analyzed semi-parametric sample selection models in the context of a fuzzy environment, such as fuzzy sets, fuzzy logic, or fuzzy sets and systems (L. M. Safiih, 2007).

This article introduces a membership function for a sample selection model that can be used to deal with sample selection model problems in which the historical data contain some uncertainty. An ideal framework does not currently exist to address problems in which there is no definite criterion for discovering which elements belong or do not belong to a given set (Miceli, 1998). A fuzzy set, defined in a universe of discourse (U), is characterized by a membership function μA, which maps all elements of U to values in the interval [0, 1], that is, A : X → [0, 1] (Zadeh, 1965). The concept of fuzzy sets by Zadeh extends the crisp sets, that is, the two-valued evaluation of 0 or 1, {0, 1}, to the infinite number of values from 0 to 1, [0, 1]. Braces { } are used to indicate crisp sets, whereas square brackets [ ] and parentheses ( ) are used in fuzzy sets to denote real-number closed intervals and open intervals, respectively (see Terano, et al., 1994).

Semi-Parametric Estimation Model
The study of the semi-parametric estimation model centers on the two-step estimation approach. The semi-parametric context is a frequently employed method for sample selection models (Vella, 1998) and is a hybrid (i.e., it combines some advantages of both the fully parametric and the completely nonparametric approaches). Thus, parts of the model are parametrically specified, while nonparametric estimation is used for the remaining part. As a hybrid, the semi-parametric approach shares the advantages and disadvantages of each, in terms that allow a more general specification of the nuisance parameters. In semi-parametric models, the estimators of the parameters of interest are consistent under a broader range of conditions than for parametric models, but more precise (converging to the true values at the square root of the sample size) than their nonparametric counterparts.

For a correctly-specified parametric model, estimators for semi-parametric models are generally less efficient than maximum likelihood estimators, yet they remain sensitive to misspecification of the structural function or other parametric components of the model. In the semi-parametric approach, the differences arise from the weaker assumption on the error term in contrast to the parametric approach. In this study a two-step semi-parametric approach is considered, which generalizes Heckman's two-step procedure. According to Härdle, et al. (1999), Powell (1987) considered a semi-parametric self-selection model that combined the two-equation structure of (2.1) with a weak assumption about the joint distribution of the error terms. For example, the participation equation of the first step is estimated semi-parametrically by the DWADE estimator (Powell, et al., 1989), while the Powell (1987) estimator is applied for the second step, the structural equation.

Representation of Uncertainty
Towards representing uncertainty, various approaches can be considered. In this
study, the representation of uncertainty identifies variables by the commonly used approach, that is, putting a range and a preference function on the desirability of using a particular value within the range. In other words, it is similar to the notion of a fuzzy number and membership function, which is the function μ_A that takes values in the interval [0, 1], that is, A : X → [0, 1]. For more detail about the representation of uncertainty, this article concentrates on using fuzzy numbers and membership functions.

Generally, a fuzzy number represents an approximation of some value in interval terms [c^(l), d^(l)], c^(l) ≤ d^(l) for l = 0, 1, ..., n, and is given by the α-cuts at the α-levels μ_l with μ_l = μ_(l−1) + Δμ, μ_0 = 0 and μ_n = 1. A fuzzy number usually does a better job than the corresponding crisp values. As widely used in practice, each α-cut ᵅA of a fuzzy set A is closed and associated with an interval of real numbers for all α ∈ (0, 1], based on the value A(x): if A(x) ≥ α then ᵅA(x) = 1, and if A(x) < α then ᵅA(x) = 0, so that the crisp set ᵅA depends on α.

Closely related to a fuzzy number is the concept of a membership function. In this concept, elements of a real continuous number in the interval [0, 1], or a number representing partial belonging or degree of membership, are used. Referring to the definition of the membership function, setting the membership grades is open either subjectively to the researcher, depending on his/her intuition, experience and expertise, or objectively based on the analysis of a set of rules and conditions associated with the input data variables. Here, choosing the membership grades is done subjectively, i.e., reflecting a quantitative phenomenon that can only be described in terms of approximate numbers or intervals such as "around 60," "close to 80," "about 10," "approximately 15," or "nearly 50." However, because of its popularity and the ease of representing a fuzzy set by the expert - especially when it comes to theory and applications - the triangular membership function is chosen. It is called a triangular fuzzy number based on a special type of fuzzy number containing three parameters: the grade starts at zero, rises to a maximum and then declines to zero again as the domain increases; that is, the membership function increases towards the peak and decreases away from it, and can be represented in the special form:

μ_A(x) =
    (x − c)/(n − c)   if x ∈ [c, n]
    1                 if x = n
    (d − x)/(d − n)   if x ∈ [n, d]
    0                 otherwise

The graph of a typical membership function is illustrated in Figure 1.

Figure 1: A Triangular Fuzzy Number (μ_A(x) rises linearly from 0 at x = c to 1 at x = n and falls linearly back to 0 at x = d)

From that function, the α-cuts of a triangular fuzzy number can be defined as a set of closed intervals:

[(n − c)α + c, (n − d)α + d],  ∀α ∈ (0, 1]

For the membership function μ_A(x), the assumptions are as follows:

(i) μ_A(x) is monotonically increasing for x ≤ n, with μ_A(c) = 0 and μ_A(n) = 1.
(ii) μ_A(x) is monotonically decreasing for x ≥ n, with μ_A(n) = 1 and μ_A(d) = 0.

The α-cuts and LR Representation of a Fuzzy Number
Prior to delving into fuzzy modeling of PSSM, an overview and some definitions used in this article are presented (Yen, et al., 1999; Chen & Wang, 1999); the definitions and their properties relate to fuzzy set theory as introduced by Zadeh (1965).

Definition: the fuzzy function is defined by f̃ : X × Ã → Ỹ; Ỹ = f̃(x, Ã), where l ≤ m ≤ u, x is a model value, and l and u are the lower and upper bounds of the support of Ã respectively. Thus, the triangular fuzzy number is denoted by (l, m, u). The support of Ã is the set of elements {x ∈ ℜ | l < x < u}. By convention, a non-fuzzy number occurs when l = m = u.

Theorem 1: The values of the estimator coefficients of the participation and structural equations for fuzzy data converge to the values of the estimator coefficients of the participation and structural equations for non-fuzzy data, respectively, whenever the value of the α-cut tends to 1 from below.
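To make the triangular fuzzy number, its α-cuts, and the centroid defuzzification used later in the Methodology concrete, here is a minimal sketch (ours; the function names are illustrative, and the closed form (l + m + u)/3 is the centroid formula that appears below):

def mu_A(x, c, n, d):
    # triangular membership: 0 outside [c, d], rising linearly to 1 at x = n
    if c < x <= n:
        return (x - c) / (n - c)
    if n < x < d:
        return (d - x) / (d - n)
    return 1.0 if x == n else 0.0

def alpha_cut(alpha, c, n, d):
    # closed interval {x : mu_A(x) >= alpha}, for alpha in (0, 1]
    return ((n - c) * alpha + c, (n - d) * alpha + d)

def centroid(l, m, u):
    # center-of-gravity defuzzification of a triangular number (l, m, u)
    return (l + m + u) / 3.0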
and L(Y_C) = R(Y_C) = 1(α_max). Thus, Theorem 2 is proven.
Methodology
Development of Fuzzy Semi-Parametric Sample Selection Models

assumptions. Thus, the sample selection model is now called a semi-parametric sample selection model (SPSSM).

In the development of SPSSM modeling using the fuzzy concept, as a development of fuzzy PSSM, the basic configuration of fuzzy modeling is still considered as previously mentioned (i.e., involving fuzzification, a fuzzy environment and defuzzification). For the fuzzy observations, the following triangular membership functions are used:

μ_w̃isp(w) =
    (w − w_il)/(w_im − w_il)   if w ∈ [w_il, w_im]
    1                          if w = w_im
    (w_iu − w)/(w_iu − w_im)   if w ∈ [w_im, w_iu]
    0                          otherwise

and

μ_z̃isp(z) =
    (z − z_il)/(z_im − z_il)   if z ∈ [z_il, z_im]
    1                          if z = z_im
    (z_iu − z)/(z_iu − z_im)   if z ∈ [z_im, z_iu]
    0                          otherwise
~ ~ ' γ + ε~
zi*sp = w i = 1,..., N ∞
zμ
i sp i sp
z i ( z )dz
1
di = 1 if d = ~
x β + u~i sp > 0,
∗
* ' Z icsp = −∞
∞
= ( Z il + Z im + Z iu ).
i i sp
3
di = 0 otherwise i = 1,..., N
μ
−∞
z i ( z )dz
fuzzy numbers with the membership functions β sp with the SPSSM approach, applying the
μW~ , μ X~ , μZ~ , με~i and μu~i , procedure as in Powell, then the parameter is
isp isp isp sp sp
estimated for the fuzzy semi-parametric sample
respectively. Because the distributional selection model (fuzzy SPSSM). Before
assumption for the SPSSM is weak, for the obtaining a real value for the fuzzy SPSSM
analysis of the fuzzy SPSSM it is also assumed coefficient estimate, first the coefficient
that the distributional assumption is weak. estimated values of γ and β are used as a
To determine an estimate for γ and β shadow of reflection to the real one. The values
of the fuzzy parametric of a sample selection
of γˆ and β̂ are then applied to the parameters
model, one option is to defuzzify the fuzzy
~ ~ ~ of the parametric model to obtain a real value for
observations Wisp' , X i'sp and Z i∗sp . This means the fuzzy SPSSM coefficient estimates of
converting the triangular fuzzy membership real- γ sp , β sp , σ εi , uisp . The Powell (1987) SPSSM
value into a single (crisp) value (or a vector of sp
values) that, in the same sense, is the best procedure is then executed using the XploRe
representative of the fuzzy sets that will actually software.
be applied. The centroid method or the center of The Powell SPSSM procedure combines
gravity method is used to compute the outputs of the two-equation structure as shown above but
the crisp value as the center of the area under the has a weaker assumption about the joint
distribution of the error terms:
curve. Let Wic sp , X icsp , and Z ic∗ sp
be the
~ ~ f (ε isp , uisp | wisp ) = f (ε isp , uisp | wi'sp γ ).
defuzzified values of Wi sp , X isp , and Z i∗sp
respectively. The calculation of the centroid
Wic sp , X ic , and Z ic∗ respectively For this reason, it is assumed that the joint
method for sp sp
densities of ε i sp , ui sp (conditional on wi sp ) are
is via the following formulas:
smooth but unknown functions f (⋅) that
∞
depend on wi sp only through the linear model
wμ i
w ( w)dw
1 wi'sp γ . Based on this assumption, the regression
Wicsp = −∞
∞
= (Wil + Wim + Wiu ),
3 function for the observed outcome zi takes the
μ
−∞
i
w ( w)dw
following form:
∞
E ( zi | xisp ) = E ( zi*sp | wisp , d i*sp > 0)
xμ x i ( x)dx
1
X icsp = −∞
= ( X il + X im + X iu ), = wi'sp γ + E (uisp | wisp , xi'sp β > −ε isp )
∞
3
−∞
μ x i ( x)dx = wi'sp γ + λ ( xi'sp β )
and
553
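The centroid defuzzification used above can be checked numerically; the following is a minimal sketch, assuming a triangular membership with $l < m < u$, and it is not the authors' XploRe code.

# Minimal sketch: centroid (center-of-gravity) defuzzification of a
# triangular fuzzy observation (l, m, u); for a triangular membership
# the centroid integral reduces to (l + m + u) / 3, as displayed above.

import numpy as np

def centroid_triangular(l, m, u, grid=10001):
    """Evaluate  (int z mu(z) dz) / (int mu(z) dz)  on [l, u], l < m < u."""
    z = np.linspace(l, u, grid)
    mu = np.clip(np.minimum((z - l) / (m - l), (u - z) / (u - m)), 0, 1)
    return np.trapz(z * mu, z) / np.trapz(mu, z)

l, m, u = 2.0, 3.0, 5.0
print(centroid_triangular(l, m, u))   # ~3.3333
print((l + m + u) / 3)                # closed form agrees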
The Powell idea of SPSSM is based upon two observations, $i$ and $j$, with conditions $w_{i,sp} \ne w_{j,sp}$ but $w_{i,sp}'\gamma = w_{j,sp}'\gamma$. Under this condition, the unknown function $\lambda(\cdot)$ can be differenced out by subtracting the regression functions for $i$ and $j$:

$$E(z_{i,sp}^{*} \mid w = w_{i,sp}) - E(z_{j,sp}^{*} \mid w = w_{j,sp}) = (w_{i,sp} - w_{j,sp})'\gamma + \lambda(x_{i,sp}'\beta) - \lambda(x_{j,sp}'\beta) = (w_{i,sp} - w_{j,sp})'\gamma.$$

This is the basic idea underlying the γ estimator proposed by Powell (1987). Powell's procedure is, from the differences, to regress $z_i$ on the differences in $w_{i,sp}$, using the concept of closeness between two estimated indices $x_{i,sp}'\hat\beta$ and $x_{j,sp}'\hat\beta$ (hence $\lambda(x_{i,sp}'\hat\beta) - \lambda(x_{j,sp}'\hat\beta) \approx 0$). Thus, γ can be estimated by a weighted least squares estimator:

$$\hat\gamma_{Powell} = \left[\binom{N}{2}^{-1}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\hat\varpi_{ijN}(w_{i,sp} - w_{j,sp})(w_{i,sp} - w_{j,sp})'\right]^{-1} \binom{N}{2}^{-1}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\hat\varpi_{ijN}(w_{i,sp} - w_{j,sp})(z_{i,sp} - z_{j,sp}),$$

where the weights $\hat\varpi_{ijN}$ are calculated from the difference of the estimated indices, $x_{i,sp}'\hat\beta - x_{j,sp}'\hat\beta$, scaled by the bandwidth $h$ through a function $g(\cdot)$, where $g(\cdot)$ is an unknown, smooth function. Using this and given $\hat\beta$, the second step consists of estimating γ. Executing the Powell procedure in the XploRe software takes as input the data from the outcome equation ($x$ and $y$, where $x$ may not contain a vector of ones), the vector id containing the estimate of the first-step index $x_{i,sp}'\hat\beta$, and the bandwidth vector $h$. The first element of $h$ is the threshold parameter $k$ that is used for estimating the intercept coefficient; the bandwidth from the second element (not covered in this study) is used for estimating the slope coefficients. For fuzzy PSSM, the above procedure is followed, and another set of crisp values $W_{ic,sp}$, $X_{ic,sp}$ and $Z_{ic,sp}$ is obtained. Applying the α-cut values to the triangular membership functions of the fuzzy observations $\tilde{W}_{i,sp}$, $\tilde{X}_{i,sp}$ and $\tilde{Z}_{i,sp}$ gives, together with the original observations, fuzzy data without α-cut and fuzzy data with α-cut for estimating the parameters of the fuzzy SPSSM; the same procedure as above is applied and the parameters of the fuzzy SPSSM are estimated. From the various fuzzy data, comparisons will be made of the effect of the fuzzy data and α-cut, relative to the original data, on the estimation of the SPSSM.
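The pairwise-difference estimator above can be sketched as follows; this is a minimal illustration, not the XploRe routine used by the authors, and the standard normal kernel weight is an assumption on our part (the original gives the weight only through the unspecified function g).

# Minimal sketch: Powell-type pairwise-difference weighted least squares.
# index_hat holds the first-step indices x_i' beta_hat; the weight kernel
# below (standard normal density over bandwidth h) is an assumption.

import numpy as np

def powell_gamma(w, z, index_hat, h=0.2):
    """w: (N, k) outcome-equation regressors; z: (N,) outcomes."""
    N, k = w.shape
    A, b = np.zeros((k, k)), np.zeros(k)
    for i in range(N - 1):
        for j in range(i + 1, N):
            u = (index_hat[i] - index_hat[j]) / h
            wt = np.exp(-0.5 * u ** 2) / (h * np.sqrt(2 * np.pi))
            dw = w[i] - w[j]
            A += wt * np.outer(dw, dw)     # weighted (dw)(dw)' sum
            b += wt * dw * (z[i] - z[j])   # weighted (dw)(dz) sum
    return np.linalg.solve(A, b)           # the binomial factors cancel

Note that the two $\binom{N}{2}^{-1}$ factors in the displayed estimator cancel, which is why they do not appear in the sketch.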
Data Description
The data set used for this study is from the 1994 Malaysian Population and Family Survey (MPFS-94). This survey was conducted by the National Population and Family Development Board, Malaysia. Observations with incomplete records (for example, no recorded family income in 1994, etc.) were removed from the sample.

The resulting sample data consisted of 1,100 married women who were employed, accounting for 39.4%; the remaining 1,692 (60.6%) were considered non-participants. In total, the data set used in this study consisted of 2,792 married women. Selection rules (Martins, 2001) were applied to create the sample criteria for selecting participant and non-participant married women on the basis of the MPFS-94 data set, which are as follows:

a) Married and aged below 60;
b) Not in school or retired;
c) Husband present in 1994; and
d) Husband reported positive earnings for 1994.

Study Variables
Following the literature (see Gerfin, 1996; Martins, 2001; Christofides, et al., 2003), the model employed in this study consists of two equations or parts. The first equation, the probability that a married woman participates in the labor market, is the so-called participation equation. The independent variables involved are: AGE (age in years divided by 10), AGE2 (age squared divided by 100), EDU (years of education), CHILD (the number of children under age 18 living in the family), and HW (log of husband's monthly wage). The standard human capital approach was followed for the determination of wages, with the exception of potential experience. The potential experience variable in the data set was calculated as age - edu - 6 rather than actual work experience. In order to manage these problems a method advanced by Buchinsky (1998) was used. Instead of considering the term $Q_w = \xi_1 EXP + \xi_2 EXP^2$ in the wage equation, where EXP is the unobserved actual experience, the alternative, that women's time is devoted to child rearing and the home activities related to child rearing, is used, so that the specification of $Q_z$ is given by:

$$\tilde{Q}_z = \gamma_1 PEXP + \gamma_2 PEXP2 + \gamma_3 PEXPCHD + \gamma_4 PEXPCHD2.$$

The second equation is called the wage equation. The dependent variable used for the analysis was the log hourly wage $(z)$, while the independent variables were EDU, PEXP (potential work experience divided by 10), PEXP2 (potential experience squared divided by 100), PEXPCHD (PEXP interacted with the total number of children) and PEXPCHD2 (PEXP2 interacted with the total number of children). The participation and wage equations were considered as specifications I and II respectively, that is, the most basic form of SSM.

According to Kao and Chin (2002), the regression parameters $(\beta, \gamma)$ should be estimated from the sample data and, if some of the observations in the equation $X_{ij}$ and $Y_i$ are fuzzy, then they fall into the category of fuzzy regression analysis. For the data used in this study, it was assumed that uncertainty was present; therefore, instead of crisp data, fuzzy data are more appropriate. In the participation equation, fuzzy data were used for the independent variables $(x)$: AGE (age in years divided by 10), AGE2 (age squared divided by 100) and HW (log of husband's monthly wage). For the wage equation, fuzzy data were used for the dependent variable, the log hourly wage $(z)$, while the independent variables $(x)$ involving fuzzy data were PEXP, PEXP2, PEXPCHD and PEXPCHD2. In this study, the observations in the fuzzy participation and fuzzy wage equations involved fuzzy and non-fuzzy data, i.e., a mixture between fuzzy and non-fuzzy data; thus the variables fall into the category of fuzzy data (Kao and Chyu, 2002). For instance, the exogenous variables AGE, AGE2 and HW in the participation equation and the variables PEXP, PEXP2, PEXPCHD and PEXPCHD2 in the wage equation are in the form of fuzzy data. These fuzzy exogenous variables are denoted as $\widetilde{AGE}$, $\widetilde{AGE2}$, $\widetilde{HW}$ and $\widetilde{PEXP}$, $\widetilde{PEXP2}$, $\widetilde{PEXPCHD}$, $\widetilde{PEXPCHD2}$, respectively. In accord with the general sample selection model, the exogenous variables EDU and CHILD in the participation
equation and the exogenous variable EDU in the fuzzy wage equation are considered as non-fuzzy data. However, EDU and CHILD are considered as fuzzy data.

Results
A semi-parametric estimate can be difficult to obtain due to the so-called curse of dimensionality, and its asymptotic distribution is unknown. Here the results for the most basic estimators are presented; that is, the participation and wage equations using the DWADE estimator and the Powell estimator, respectively. Both estimators are $\sqrt{n}$-consistent and asymptotically normal.

Participation Equation in the Wage Sector
The participation equation using the DWADE estimator is presented in Table 1 along with FSPSSM results for comparison purposes. The first column used the DWADE estimator with bandwidth value h = 0.2 without the constant terms. The DWADE estimator shares the structure of the ADE estimator for the semi-parametric sample selection model (SPSSM). This is followed by the fuzzy semi-parametric sample selection model (FSPSSM) with α-cuts 0.0, 0.2, 0.4, 0.6 and 0.8 respectively. At first the coefficient estimates suggest that all variables except AGE are significant (significant and negatively estimated coefficients on AGE2 and CHILD, while positive and significant coefficients were estimated for EDU and HW). However, only CHILD shows a statistically significant effect at the 5% level, an unexpected and important result. Although in the conventional parametric model it appears together with EDU, in the context of SPSSM only the estimate of the CHILD effect appears to be significantly relevant, which is more aligned with economic theory.

For comparison purposes the FSPSSM was then applied. The estimated coefficients give a similar trend to the SPSSM (i.e., significant for the variables AGE2, EDU, CHILD and HW). The results show significant and positive coefficient estimates for EDU and HW, and significant but negative estimated coefficients on AGE2 and CHILD. In the FSPSSM context, the CHILD coefficient appears to be statistically significant at the 5% level. This is an interesting finding, and it should be pointed out that using this approach the standard errors for the parameters were much smaller when compared to those in the conventional SPSSM. This provides evidence that this approach is better at estimating coefficients and provides a considerable efficiency gain compared to the conventional semi-parametric model. In addition, the coefficients estimated from FSPSSM were considerably close to the coefficients estimated with the conventional SPSSM. Hence, the coefficient estimated from FSPSSM is consistent even though it involves uncertain data.

The Wage Equation in the Wage Sector
The wage equation using the Powell estimator of SPSSM is presented in Table 2, with FSPSSM results for comparison purposes. The first column used the Powell estimator with bandwidth value h = 0.2 without the constant terms. The other columns show results given by the fuzzy semi-parametric sample selection model (FSPSSM) with α-cuts 0.0, 0.2, 0.4, 0.6 and 0.8 respectively.

At first the coefficient estimates suggested that all variables were significant (significant and negatively estimated coefficients on EDU, PEXP2 and PEXPCHD, while positive and significant coefficients were estimated for PEXP and PEXPCHD2). All of the estimated coefficients were statistically significant at the 5% level. The results reveal significant differences between the SPSSM and the PSSM method of correcting sample selectivity bias. This improved on the results obtained in PSSM, where not all variables contributed significantly regarding married women involved in the wage sector.

For comparison purposes, the FSPSSM was used. The estimated coefficients were significant for all variables. The results show significant and positive coefficient estimates for PEXP and PEXPCHD2, and significant but negative estimated coefficients on EDU, PEXP2 and PEXPCHD. The coefficients for all variables appear to be relevant, with statistical
significance at the 5% level. It should be noted that the standard errors for the parameters EDU, PEXP and PEXP2 were much smaller when compared to those in the conventional SPSSM. This provides evidence that this method is considerably more efficient than the conventional semi-parametric model. The coefficient estimates obtained from FSPSSM are also considerably close to the coefficient estimates via the conventional SPSSM. In other words, applying FSPSSM, the coefficient estimates are consistent even though the data may contain uncertainties.

Conclusion
For comparison purposes of the participation equation, the estimated coefficients and significance factors give a similar trend as the SPSSM. However, an interesting finding and the most significant result appears by applying the
Table 1: Semi-Parametric and Fuzzy Semi-Parametric Estimates for the Participation Equation

Participation                          Fuzzy Selection Model
Equation      DWADE          α = 0.8        α = 0.6        α = 0.4        α = 0.2        α = 0.0
AGE           −0.002048      −0.0015393     −0.0043978     −0.0015934     −0.0016184     −0.001642
              (1.233)        (1.150)        (1.151)        (1.234)        (1.232)        (1.232)
AGE2          −0.00016099    −0.00016584    −0.00020722    −0.00016629    −0.00016651    −0.00016673
              (0.1754)       (0.1624)       (0.1627)       (0.1763)       (0.1765)       (0.1767)
EDU           0.00034766     0.00023044     0.00011323     0.00023044     0.00023044     0.00023044
              (0.02116)      (0.02015)      (0.02015)      (0.02115)      (0.02062)      (0.02062)
CHILD         −0.0039216*    −0.0044301*    −0.0048986*    −0.0044301*    −0.0044301*    −0.0044301*
              (0.06573)      (0.06341)      (0.0634)       (0.06571)      (0.06485)      (0.06484)
HW            0.044008       0.050262       0.05597        0.049549       0.049189       0.048832
              (0.1632)       (0.1402)       (0.1396)       (0.1485)       (0.1437)       (0.1432)
*5% level of significance

Table 2: Semi-Parametric and Fuzzy Semi-Parametric Estimates for the Wage Equation

Wage                                   Fuzzy Selection Model
Equation      Powell         α = 0.8        α = 0.6        α = 0.4        α = 0.2        α = 0.0
EDU           −0.0112792     −0.0109003     −0.010939      −0.011346      −0.011385      −0.0114256
              (0.005262)     (0.005258)     (0.005258)     (0.005259)     (0.005259)     (0.005258)
PEXP          0.544083*      0.540864*      0.538776*      0.534385*      0.532247*      0.530069*
              (0.1099)       (0.1096)       (0.1094)       (0.1093)       (0.1092)       (0.109)
PEXP2         −0.160272*     −0.159762*     −0.159524*     −0.158781*     −0.158525*     −0.158259*
              (0.02633)      (0.0263)       (0.0263)       (0.02632)      (0.02632)      (0.02632)
PEXPCHD       −0.161205*     −0.159863*     −0.159583*     −0.15889*      −0.158584*     −0.158262*
              (0.02453)      (0.02453)      (0.02455)      (0.02459)      (0.02461)      (0.02463)
PEXPCHD2      0.046591*      0.0463242*     0.0462221*     0.0458118*     0.0457004*     0.0455835*
              (0.008485)     (0.008485)     (0.008493)     (0.008508)     (0.008511)     (0.008517)
*5% level of significance
FSPSSM; that is, the FSPSSM is a better estimator when compared to the SPSSM in terms of the standard error of the coefficient estimates. The standard errors of the coefficient estimates for the FSPSSM are smaller when compared to the conventional SPSSM. This is evidence that the FSPSSM approach is much better at estimating coefficients and results in a considerable efficiency gain over the conventional semi-parametric model. The coefficient estimates obtained were also considerably close to the coefficient estimates of the conventional SPSSM, hence providing evidence that the coefficient estimates are consistent even when the data involve uncertainties.

The wage equation is similar to the PSSM in terms of the coefficient estimates and significance factors. However, applying the FSPSSM resulted in the most significant results when compared to the PSSM: the coefficient estimates of most variables had smaller standard errors, and the rest were considerably close to the standard errors of the SPSSM. As a whole, the FSPSSM gave better estimates compared to the SPSSM. In terms of consistency, the coefficient estimates for all variables of the FSPSSM were not much different from the coefficient estimates of the SPSSM even as the values of the α-cuts increased (from 0.0 to 0.8). In other words, judging by the coefficient estimates and their consistency, the fuzzy model (FSPSSM) performs much better than the model without fuzziness (PSSM) for the wage equation.

References
Andrews, D. W. K. (1991). Asymptotic normality of series estimation for nonparametric and semiparametric regression models. Econometrica, 59, 307-345.
Buchinsky, M. (1998). The dynamics of changes in the female wage distribution in the USA: A quantile regression approach. Journal of Applied Econometrics, 13, 1-30.
Chen, T., & Wang, M. J. J. (1999). Forecasting methods using fuzzy concepts. Fuzzy Sets and Systems, 105, 339-352.
Christofides, L. N., Li, Q., Liu, Z., & Min, I. (2003). Recent two-stage sample selection procedure with an application to the gender wage gap. Journal of Business and Economic Statistics, 21(3), 396-405.
Cosslett, S. (1991). Semiparametric estimation of a regression model with sample selectivity. In Barnett, W. A., Powell, J., & Tauchen, G. E. (Eds.), Nonparametric and semiparametric estimation methods in econometrics and statistics, 175-198. Cambridge, MA: Cambridge University Press.
Gallant, R., & Nychka, D. (1987). Semi-nonparametric maximum likelihood estimation. Econometrica, 55, 363-390.
Gerfin, M. (1996). Parametric and semiparametric estimation of the binary response model of labor market participation. Journal of Applied Econometrics, 11, 321-339.
Härdle, W., Klinke, S., & Müller, M. (1999). XploRe learning guide. Berlin: Springer-Verlag.
Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection, and limited dependent variables, and a simple estimation for such models. Annals of Economic and Social Measurement, 5, 475-492.
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47, 153-161.
Ichimura, H., & Lee, L. F. (1990). Semiparametric least squares estimation of multiple index models: Single equation estimation. In Barnett, W. A., Powell, J., & Tauchen, G. E. (Eds.), Nonparametric and semiparametric estimation methods in econometrics and statistics, 175-198. Cambridge, MA: Cambridge University Press.
Kao, C., & Chin, C. L. (2002). A fuzzy linear regression model with better explanatory power. Fuzzy Sets and Systems, 126, 401-409.
Khan, S., & Powell, J. L. (2001). Two-step estimation of semiparametric censored regression models. Journal of Econometrics, 103, 73-110.
Klein, R., & Spady, R. (1993). An efficient semiparametric estimator of the binary response model. Econometrica, 61(2), 387-423.
Lee, M-J., & Vella, F. (2006). A semi-parametric estimator for censored selection models with endogeneity. Journal of Econometrics, 130, 235-252.
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 560-565 1538 – 9472/09/$95.00
ABE Performance in GARCH Model Estimates for Non-Normality

Daniel Eni
Federal University of Petroleum Resources, Effurun, Nigeria
The performance of an autocovariance base estimator (ABE) for GARCH models against that of the
maximum likelihood estimator (MLE) if a distribution assumption is wrongly specified as normal was
studied. This was accomplished by simulating time series data that fits a GARCH model using the Log
normal and t-distributions with degrees of freedom of 5, 10 and 15. The simulated time series was
considered as the true probability distribution, but normality was assumed in the process of parameter
estimations. To track consistency, sample sizes of 200, 500, 1,000 and 1,200 were employed. The two
methods were then used to analyze the series under the normality assumption. The results show that the
ABE method appears to be competitive in the situations considered.
conditions. These were again shown by Ling and McAleer (2003) under second moment conditions. If the assumption of normality is satisfied by the data, then the method will produce efficient estimates; otherwise, inefficient estimates will be produced. Engle and Gonzalez-Rivera (1991) studied the loss of estimation efficiency inherent in QMLE and concluded it may be severe if the distribution density is heavy tailed.

The QMLE estimator requires the use of a numerical optimization procedure which depends on different optimization techniques for implementation. This potentially leads to different estimates, as shown by Brooks, Burke and Perssand (2001) and McCullough and Renfro (1999). Both studies reported different QMLE estimates across various packages using different optimization routines. These techniques estimate time-varying variances in different ways and may result in different interpretations and predictions, with varying implications for the economy. To resolve these problems, Eni and Etuk (2006) developed an Autocovariance Base Estimator (ABE) for estimating the parameters of GARCH models through an ARMA transformation of the GARCH model equation. The purpose of this article is to rate the performance of the ABE when the normality assumption is violated.

The Autocovariance Base Estimator (ABE)
Consider the GARCH(p, q) equation

$$h_t = w_0 + \sum_{i=1}^{p}\alpha_i e_{t-i}^2 + \sum_{j=1}^{q}B_j h_{t-j}, \qquad (1)$$

or its ARMA(max(p, q), q) transform

$$\varepsilon_t^2 = w_0 + \sum_{i=1}^{\max(p,q)}(\alpha_i + B_i)\varepsilon_{t-i}^2 - \sum_{j=1}^{q}B_j a_{t-j} + B_0 a_t, \qquad (2)$$

with $a_t \sim N(0, \sigma_a^2)$. For lags beyond $q$, the autocovariance equations of (2) do not involve the moving average parameter $B_i$; hence, $i = q+1, \ldots, q+p$ is used to obtain the estimator:

$$\begin{pmatrix} V_{q+1} \\ V_{q+2} \\ \vdots \\ V_{q+\max(p,q)} \end{pmatrix} = \begin{pmatrix} V_q & V_{q-1} & \cdots & V_{q-(p-1)} \\ V_{q+1} & V_q & \cdots & V_{q-(p-2)} \\ \vdots & & & \vdots \\ V_{q+\max(p,q)-1} & \cdots & & V_q \end{pmatrix}\begin{pmatrix} \alpha_1 + B_1 \\ \alpha_2 + B_2 \\ \vdots \\ \alpha_p + B_q \end{pmatrix} \qquad (3)$$

where $V_i$ is the set of autocovariances associated with equation (2). The autoregressive parameters $\alpha_i + B_i$ are obtained by solving (3).

Eni and Etuk (2006) have shown that the moving average parameters $B_i$ can be obtained from the system written compactly as

$$\sum_{i=0}^{p} f(\Phi_i)\, V\, \Phi = \sigma_a^2\, B\, b, \qquad (4)$$

where $f(\Phi_i) = -\Phi_i$, $f(\Phi_0) = \Phi_0$, $\Phi_i = (\alpha_i + B_i)$, $V$ collects the autocovariance matrices with rows $(V_i, V_{i-1}, \ldots, V_{i-p})$ through $(V_{i+q}, V_{i+q-1}, \ldots, V_{i+q-p})$, $\Phi = (\Phi_0, -\Phi_1, \ldots, -\Phi_p)'$, $b = (B_0, -B_1, \ldots, -B_q)'$, and $B$ is the banded matrix with first row $(B_0, -B_1, \ldots, -B_{q-1}, -B_q)$ and last row $(-B_q, 0, \ldots, 0)$.
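A minimal sketch of solving the system (3) from sample autocovariances is given below; the function names are illustrative and this is not the authors' implementation. Negative lags use the symmetry $V_{-k} = V_k$ of the autocovariance function.

# Minimal sketch: estimate the ARMA autoregressive parameters
# phi_i = alpha_i + B_i of the squared process from the autocovariance
# equations (3), using sample autocovariances of eps_t^2 at lags > q.

import numpy as np

def autocov(x, max_lag):
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    return np.array([x[: n - k] @ x[k:] / n for k in range(max_lag + 1)])

def ar_params(eps2, p, q):
    V = autocov(eps2, q + p)
    # row k: V_{q+k} = sum_i phi_i V_{q+k-i};  V_{-k} = V_k by symmetry
    M = np.array([[V[abs(q + k - i)] for i in range(1, p + 1)]
                  for k in range(1, p + 1)])
    rhs = V[q + 1 : q + p + 1]
    return np.linalg.solve(M, rhs)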
The moving average parameters are then obtained from (4) iteratively, via a Newton-Raphson scheme of the form $B_{r+1} = B_r - [f'(B_r)]^{-1} f(B_r)$, where $f(B_r)$ and $f'(B_r)$ represent the vector function (5) and its derivative evaluated at $B = B_r$, and the constant term is recovered from

$$W = E(\varepsilon_t^2)\left(1 - \sum_{i=1}^{p}\alpha_i - \sum_{i=1}^{q}B_i\right). \qquad (9)$$

Note that

$$f'(B) = \sigma_a^2\begin{pmatrix} B_0 & B_1 & \cdots & B_{q-1} & B_q \\ -B_1 & B_2 & \cdots & B_q & 0 \\ \vdots & & & & \vdots \\ -B_{q-1} & B_q & \cdots & 0 & 0 \\ -B_q & 0 & \cdots & 0 & 0 \end{pmatrix} + \sigma_a^2\begin{pmatrix} B_0 & B_1 & \cdots & B_{q-1} & B_q \\ 0 & -B_0 & \cdots & B_{q-2} & B_{q-1} \\ \vdots & & & & \vdots \\ 0 & 0 & \cdots & -B_0 & B_1 \\ 0 & 0 & \cdots & 0 & -B_0 \end{pmatrix} = \sigma_a^2(D_1 + D_2).$$

Methodology
The data generating process (DGP) in this study involved the simulation of 1,500 data points with 10 replications using the random number generator in MATLAB 5. The random number generator in MATLAB 5 is able to generate all floating point numbers in the interval $[2^{-53}, 1 - 2^{-53}]$; thus, it can generate $2^{1492}$ values before repeating itself. Note that 1,500 data points are equivalent to $2^{10.55}$ and, with 10 replications, result in only $2^{13.87}$ data points. Hence 1,500 data points with 10 replications were obtained without repetitions. Also, a program implementation for ARMA was used to find the QMLE (McLeod and Sales, 1983). Although normality would typically be assumed, the data points were simulated using the log normal and the t-distribution with 5, 10 and 15 degrees of freedom.

Of the 1,500 data points generated for each process, the first 200 observations were discarded to avoid initialization effects, yielding a sample size of 12,000 observations, with results reported for sample sizes of 200, 500, 1,000 and 1,200. These sample presentations enable tracking of the consistency and efficiency of the estimators. The relative efficiency of the autocovariance based estimator (ABE) and the quasi-maximum likelihood (QML) estimator were studied under this misspecification of the
distribution function. The selection criterion used was the Akaike information criterion (AIC).

For simulating the data points, the conditional variance equation for low persistence due to Engle and Ng (1993) was adopted:

$$h_t = 0.2 + 0.05\,\varepsilon_{t-1}^2 + 0.75\,h_{t-1},$$

with $\varepsilon_t^2 = h_t Z_t^2$, where $Z_t$ is any of $Z \sim t_5$, $Z \sim t_{10}$, $Z \sim t_{15}$ or $Z \sim LN(0,1)$, and where $N$ denotes normality, $t_V$ the t-distribution with $V$ degrees of freedom, and $LN$ the log normal distribution.

Results
Apart from the parameter setting in the DGP, selected studies of the parameter settings $(W, \alpha, B) = (0.1, 0.15, 0.85)$ and $(W, \alpha, B) = (0.1, 0.25, 0.65)$ (Lumsdaine, 1995), and $(W, \alpha, B) = (1, 0.3, 0.6)$ and $(W, \alpha, B) = (1, 0.05, 0.9)$ (Chen, 2002), were also conducted. The results obtained agree with the results obtained from detailed studies of the DGP.

Table 1 shows the results for a sample size of 200 data points. The table reveals that the estimates are poor for both QMLE and ABE. On the basis of the Akaike information criterion (AIC), however, the QMLE performed better than the ABE except under the log normal distribution, where ABE performed better than QMLE.

Table 2 shows that the estimates using a sample size of 500 are better, although still poor. The performance gap between QMLE and ABE appears to be closing. This is observed from the AIC of QMLE and ABE under the different probability distribution functions, with one exception in the case of log normality. Surprisingly, the QMLE method failed to show consistency, but it is notable that the performance of both methods was enhanced under the t-distribution as the degrees of freedom increase.

Table 3 shows that both estimation methods, QMLE and ABE, had equal performance ratings and gave consistent estimates in general. However, the ABE had an edge in its performance under t(5) and LN(0, 1), while QMLE had an edge under t(10) and t(15). The estimates under t(15) and t(10) were close to their true values for both estimation methods. Finally, the results shown in Table 3 are further confirmed by examining Table 4, where the two methods have nearly equal ratings based on the values of their AIC.

Conclusion
It is shown in this study that the ABE method is adequate for estimating GARCH model parameters and can perform as well as the maximum likelihood estimator for reasonably large numbers of data points when the distribution assumption is misspecified.
Table 1: Performance Rating of QMLE and ABE for Sample Size n = 200

                         QMLE                                ABE
Estimates    W      α      B      AIC          W      α      B      AIC
t(5)         0.16   0.01   0.77   −70.90       1.14   0.016  0.74   −65.312

Table 2: Performance Rating of QMLE and ABE for Sample Size n = 500

                         QMLE                                ABE
Estimates    W      α      B      AIC          W      α      B      AIC
t(5)         0.115  0.02   0.739  −132.341     0.15   0.025  0.73   −151.24

Table 3: Performance Rating of QMLE and ABE for Sample Size n = 1,000

                         QMLE                                ABE
Estimates    W      α      B      AIC          W      α      B      AIC
t(5)         0.18   0.02   0.74   −137.12      0.19   0.03   0.73   −140.12

Table 4: Performance Rating of QMLE and ABE for Sample Size n = 1,200

                         QMLE                                ABE
Estimates    W      α      B      AIC          W      α      B      AIC
t(5)         0.15   0.03   0.76   −162.11      0.18   0.039  0.73   −173.70
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 566-570 1538 – 9472/09/$95.00
Test for the Equality of the Number of Signals

A likelihood ratio test for testing the equality of the ranks of two non-negative definite covariance matrices arising in the area of signal processing is derived. The asymptotic distribution of the test statistic follows a Chi-square distribution from the general theory of likelihood ratio tests.
a p×1 random noise vector for the ith population. The following are assumed about $N_i(t)$ and $S_i(t)$:

$$N_i(t) \sim N_p(0, \Sigma_i), \qquad S_i(t) \sim N_p(0, \Psi_i),$$

and $N_i(t)$ and $S_i(t)$ are independently distributed. The null hypothesis to test is $H_{0k}: q_1 = q_2 = k$, and the alternative hypothesis is $H_1: q_1 \ne q_2$, $k = 0, 1, \ldots, p-1$. The likelihood ratio test is derived next and the asymptotic distribution of the test statistic is used to obtain the critical value.

Likelihood Ratio Test: Case 1
Consider $\Sigma_i = \sigma^2 I_p$, $i = 1, 2$. The test hypotheses are:

$$H_{0k}: q_1 = q_2 = k$$
and
$$H_1: q_1 \ne q_2, \qquad k = 0, 1, \ldots, p-1.$$

The observations from the two populations are as follows: $X_1(t_1), \ldots, X_1(t_{n_1})$ are i.i.d. $\sim N_p(0, A_1\Psi_1A_1' + \sigma^2 I_p)$ and $X_2(t_{n_1+1}), \ldots, X_2(t_{n_1+n_2})$ are i.i.d. $\sim N_p(0, A_2\Psi_2A_2' + \sigma^2 I_p)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k > \sigma^2$ are the eigenvalues of $R_1^{(k)}$ and $U_1, \ldots, U_k$ are the corresponding orthonormal eigenvectors of $A_1\Psi_1A_1'$ and, similarly, $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_k > \sigma^2$ are the eigenvalues of $R_2^{(k)}$ and $V_1, \ldots, V_k$ are the corresponding orthonormal eigenvectors of $A_2\Psi_2A_2'$. Thus, under $H_{0k}$ the log-likelihood (apart from a constant term) is

$$\log L = -\frac{n_1}{2}\log|R_1^{(k)}| - \frac{n_1}{2}\operatorname{tr}\!\big(\hat{R}_1 (R_1^{(k)})^{-1}\big) - \frac{n_2}{2}\log|R_2^{(k)}| - \frac{n_2}{2}\operatorname{tr}\!\big(\hat{R}_2 (R_2^{(k)})^{-1}\big)$$
$$= -\frac{n_1}{2}\operatorname{tr}\!\big(\hat{R}_1 (R_1^{(k)})^{-1}\big) - \frac{n_2}{2}\operatorname{tr}\!\big(\hat{R}_2 (R_2^{(k)})^{-1}\big) - \frac{n_1}{2}\sum_{i=1}^{k}\log\lambda_i - \frac{n_2}{2}\sum_{i=1}^{k}\log\mu_i - \frac{n}{2}(p-k)\log\sigma^2 \qquad (2)$$

where

$$\hat{R}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1}X_1(t_i)X_1'(t_i), \qquad \hat{R}_2 = \frac{1}{n_2}\sum_{i=n_1+1}^{n_1+n_2}X_2(t_i)X_2'(t_i).$$
Similarly, $V_2 = I_p$ is obtained. Hence, given the $\lambda_i$'s, $\mu_i$'s and $\sigma^2$,

$$\sup_{H_{0k}}\log L = -np - n_1\sum_{i=1}^{k}\log l_i - n_2\sum_{i=1}^{k}\log\xi_i - n(p-k)\log\left(\frac{n_1\sum_{i=k+1}^{p} l_i + n_2\sum_{i=k+1}^{p}\xi_i}{n(p-k)}\right) = L_1 \text{ (say)}.$$
(ii) $\displaystyle\lim_{n\to\infty}\frac{c_n}{\log\log n} = \infty$.

For practical purposes, choose $c_n = \log n$, which satisfies conditions (i) and (ii). Hence,

$$L_1^* = -np - n_1\sum_{i=1}^{\hat{k}}\log l_i - n_2\sum_{i=1}^{\hat{k}}\log\xi_i - n(p-\hat{k})\log\left(\frac{n_1\sum_{i=\hat{k}+1}^{p} l_i + n_2\sum_{i=\hat{k}+1}^{p}\xi_i}{n(p-\hat{k})}\right) \qquad (6)$$

Similarly,

$$L_2^* = \sup_{H_1}\log L = -np - n_1\sum_{i=1}^{\hat{q}_1}\log l_i - n_2\sum_{i=1}^{\hat{q}_2}\log\xi_i - \left[n_1(p-\hat{q}_1) + n_2(p-\hat{q}_2)\right]\log\left(\frac{n_1\sum_{i=\hat{q}_1+1}^{p} l_i + n_2\sum_{i=\hat{q}_2+1}^{p}\xi_i}{n_1(p-\hat{q}_1) + n_2(p-\hat{q}_2)}\right) \qquad (7)$$

where $\hat{q}_1$ and $\hat{q}_2$ are obtained from the corresponding criterion and $c_n$ is defined the same as previously. Hence the log of the likelihood ratio statistic is $L_1^* - L_2^*$, where $L_1^*$ and $L_2^*$ are given by (6) and (7) respectively. The critical value for this test can be approximated from the fact that, asymptotically, $-2(L_1^* - L_2^*) \sim \chi^2_{\gamma(\hat{q}_1, \hat{q}_2, \hat{k})}$ under $H_0$, where

$$\gamma(\hat{q}_1, \hat{q}_2, \hat{k}) = (\hat{q}_1 + \hat{q}_2 - 2\hat{k}) + (p-1)(\hat{q}_1 + \hat{q}_2 - 2\hat{k}) + \hat{k}(\hat{k}-1) - \frac{\hat{q}_1(\hat{q}_1-1)}{2} - \frac{\hat{q}_2(\hat{q}_2-1)}{2}.$$

Likelihood Ratio Test: Case 2
Consider $\Sigma_i = \sigma_i^2 I_p$, $i = 1, 2$. For case 2, the problem can be solved similarly and is easier than that in case 1.

Likelihood Ratio Test: Case 3
For case 3, a general $\Sigma_i$ is assumed and, in that case, the log likelihood must be maximized with respect to the eigenvalues subject to the condition that the eigenvalues are greater than 1, in which case the technique presented by Zhao, Krishnaiah and Bai (1986a, b) can be used.
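A minimal computational sketch of the case 1 statistic follows, assuming the sample eigenvalues $l_i$ of $\hat{R}_1$ and $\xi_i$ of $\hat{R}_2$ are already available in descending order; the function names are illustrative, not from the original article.

# Minimal sketch: L1* (eq. 6), L2* (eq. 7) and the chi-square reference
# distribution with gamma(q1_hat, q2_hat, k_hat) degrees of freedom.
# l, xi: descending numpy arrays of sample eigenvalues.

import numpy as np
from scipy.stats import chi2

def L1_star(l, xi, n1, n2, k):
    n, p = n1 + n2, len(l)
    tail = (n1 * l[k:].sum() + n2 * xi[k:].sum()) / (n * (p - k))
    return (-n * p - n1 * np.log(l[:k]).sum()
            - n2 * np.log(xi[:k]).sum() - n * (p - k) * np.log(tail))

def L2_star(l, xi, n1, n2, q1, q2):
    p = len(l)
    denom = n1 * (p - q1) + n2 * (p - q2)
    tail = (n1 * l[q1:].sum() + n2 * xi[q2:].sum()) / denom
    return (-(n1 + n2) * p - n1 * np.log(l[:q1]).sum()
            - n2 * np.log(xi[:q2]).sum() - denom * np.log(tail))

def gamma_df(q1, q2, k, p):
    return ((q1 + q2 - 2 * k) + (p - 1) * (q1 + q2 - 2 * k)
            + k * (k - 1) - q1 * (q1 - 1) // 2 - q2 * (q2 - 1) // 2)

# Reject H_0k at the 5% level when -2 * (L1_star - L2_star) exceeds
# chi2.ppf(0.95, gamma_df(q1_hat, q2_hat, k_hat, p)).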
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 571-582 1538 – 9472/09/$95.00
Spline Model with Dose-Specific Response Variation

A linear B-spline function was modified to model dose-specific response variation in developmental
toxicity studies. In this new model, response variation is assumed to differ across dose groups. The model
was applied to a developmental toxicity study and proved to be significant over the previous model of
singular response variation.
Key words: Developmental toxicity study, dose-response, interior knot, linear B-spline, response
variation, threshold.
Hunt and Rai (2003) first modeled the observed variation by including a parameter that equated to the interlitter response variation. Subsequently, they modified their model to include additional parameters to account for the noticeable multiple response variation across dose groups (Hunt & Rai, 2007). They reported significance with the model of dose-specific response variation compared to a model with uniform variation across dose levels. They also conducted simulations and found that the dose-specific model with multiple variation parameters led to unbiased estimation of all model parameters, whether the true variance structure was that of single or multiple parameters, whereas the single-parameter model was less robust. These results are similar to those of Kupper, et al. (1986), who assumed the beta-binomial distribution for the response number for each litter and a logistic dose-response model and found that the model with the multiple-intralitter correlation structure produced less biased results than the model that assumed a single-intralitter correlation.

Similar to Hunt and Rai (2007), the model used in this study includes dose-specific parameters that estimate the response variation in addition to using polynomial regression splines to fit the dose-response pattern (Ramsay, 1988). The regression spline approach was used formerly in a model with one parameter for response variation (Li & Hunt, 2004). As the polynomial, degree one (linear) was used, and a set of B-splines was constructed recursively to help fit the model. The theory of B-splines is described in de Boor (2001). Integral to this theory is the incorporation of interior knots as change points in the direction of the plotted curve. The ability to incorporate these knots is desirable for data from developmental studies, as the threshold is inherently assumed.

Methodology
In a developmental toxicity study, there are g dose groups, each of which has a certain level of a toxic substance. The ith dose group contains $m_i$ animals, and therefore $m_i$ litters (i = 1, ..., g). For $n_{ij}$ implantations of the jth animal (j = 1, ..., $m_i$) in the ith dose group, let $x_{ij}$ be the number of fetuses that experience at least one adverse effect. Adverse effects include early and late fetal death and any kind of malformation (morphological, visceral or skeletal); an adverse effect such as death supersedes malformation. If $P_j(d_i)$ is the probability of a fetus in the jth litter indirectly exposed to the ith dose level, $d_i$, experiencing an adverse event, then the proposed dose-response model is the following:

$$P_j(d_i) = \frac{\exp\left(\sum_{k=1}^{3}\theta_k B_{k,2}(d_i, \xi) + \sigma_i z_{ij}\right)}{1 + \exp\left(\sum_{k=1}^{3}\theta_k B_{k,2}(d_i, \xi) + \sigma_i z_{ij}\right)} = \frac{\exp(\mathbf{B}_2(d_i, \xi)\theta + \sigma_i z_{ij})}{1 + \exp(\mathbf{B}_2(d_i, \xi)\theta + \sigma_i z_{ij})}. \qquad (1)$$

Here, $\mathbf{B}_2(d_i, \xi) = \left(B_{1,2}(d_i, \xi), B_{2,2}(d_i, \xi), B_{3,2}(d_i, \xi)\right)$ is the set of (order 2, degree 1) linear B-splines, with one interior knot, ξ, defined on the dose interval $[d_1, d_g)$, and derived recursively from the order 1 (degree 0) B-splines $B_{k,1}(d_i, \xi)$. If $\xi_1 = \xi_2 = d_1$, $\xi_3 = \xi$, and $\xi_4 = \xi_5 = d_g$, then the order 1 B-splines are given by:

$$B_{k,1}(d_i, \xi) = \begin{cases} 1, & d_i \in [\xi_k, \xi_{k+1}) \\ 0, & \text{otherwise,} \end{cases} \qquad k = 1, 2, 3, \qquad (2)$$

and the order 2 B-splines formed recursively from (2) are given by:

$$B_{k,2}(d_i, \xi) = \frac{d_i - \xi_k}{\xi_{k+1} - \xi_k}B_{k,1}(d_i, \xi) + \frac{\xi_{k+2} - d_i}{\xi_{k+2} - \xi_{k+1}}B_{k+1,1}(d_i, \xi). \qquad (3)$$
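A minimal Python sketch of the basis in equations (2) and (3) follows; the 0/0 convention for the repeated boundary knots is handled explicitly, and the function name is an illustrative choice.

# Minimal sketch: order-1 and order-2 (linear) B-splines with knot vector
# (d1, d1, xi, dg, dg), following the recursion in equations (2)-(3).

import numpy as np

def B2_basis(d, xi, d1, dg):
    """Return (B_{1,2}, B_{2,2}, B_{3,2}) evaluated at dose d."""
    t = [d1, d1, xi, dg, dg]
    B1 = [1.0 if t[k] <= d < t[k + 1] else 0.0 for k in range(4)]
    r = lambda num, den: num / den if den != 0 else 0.0   # 0/0 -> 0
    return np.array([r(d - t[k], t[k + 1] - t[k]) * B1[k]
                     + r(t[k + 2] - d, t[k + 2] - t[k + 1]) * B1[k + 1]
                     for k in range(3)])

print(B2_basis(0.05, 0.036, 0.0, 0.15))   # the basis values sum to 1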
The likelihood function is

$$L(\theta, \sigma, \xi) = \prod_{i=1}^{g}\prod_{j=1}^{m_i}\binom{n_{ij}}{x_{ij}}\int_{-\infty}^{\infty}P_j(d_i)^{x_{ij}}\left[1 - P_j(d_i)\right]^{n_{ij}-x_{ij}}\frac{\exp(-z_{ij}^2/2)}{\sqrt{2\pi}}\,dz_{ij} \qquad (4)$$

where $\sigma = (\sigma_1, \ldots, \sigma_g)$. Also note, the interior knot ξ is regarded as a parameter of the model that must be estimated in addition to the parameters θ and σ. The likelihood function in equation (4) integrates out the random effect $z_{ij}$ from the joint distribution, thereby leaving a marginal function for the number of fetal responses $x_{ij}$ (see Collett, 1991, p. 208).

Because (4) cannot be solved directly, an approximation is used via the Gauss-Hermite formula for numeric integration, given by:

$$\int_{-\infty}^{\infty}f(u)\exp(-u^2)\,du = \sum_{l=1}^{q}a_l f(b_l). \qquad (5)$$

Here $q$, on which the values of $a_l$ and $b_l$ depend, is chosen to approximate (5). The standardized tables from which the values of $q$, $a_l$ and $b_l$ may be found are in Abramowitz and Stegun (1972). A value of q = 20 was chosen for the approximation; this value has been deemed acceptable in many settings (Collett, 1991).

To approximate, first let $z_{ij} = u\sqrt{2}$, based on equation (5); take the log of the likelihood function in equation (4) and approximate the log-likelihood function by:

$$\ell(\theta, \sigma, \xi) = \sum_{i=1}^{g}\sum_{j=1}^{m_i}\log\left(\sum_{l=1}^{q}a_l R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}}\right), \qquad (7)$$

where $R_{il} = \exp(\eta_i + \sigma_i b_l\sqrt{2})/\left(1 + \exp(\eta_i + \sigma_i b_l\sqrt{2})\right)$, $\eta_i = \sum_{k=1}^{3}\theta_k B_{k,2}(d_i, \xi)$, $\bar{R}_{il} = 1 - R_{il}$ and $\bar{x}_{ij} = n_{ij} - x_{ij}$ (the notation is detailed in the Appendix).

A profile-likelihood approach was used to maximize $\ell(\theta, \sigma, \xi)$ in equation (7) and to estimate the parameters of the model. The approach begins with a search over the dose interval $(0, d_g)$, the domain of ξ. A large number of G grid points $\{\xi_t^*: t = 1, \ldots, G\}$ were chosen using the formula $\xi_t^* = d_g \times t/(G+1)$. Once a fixed grid point $\xi_t^*$ was selected, the order 1 and 2 B-splines in equations (2) and (3) were calculated by regarding the fixed grid point $\xi_t^*$ as the interior knot for that part of the search. As a result, the maximizer of the profile-likelihood $\ell(\theta, \sigma, \xi_t^*)$, denoted by $(\hat\theta_t, \hat\sigma_t)$, can be found, and it yields $\ell(\hat\theta_t, \hat\sigma_t, \xi_t^*)$, $t = 1, 2, \ldots, G$. Hence, the maximum likelihood estimates are given by:

$$(\hat\theta, \hat\sigma, \hat\xi) = \arg\max_{\{(\hat\theta_t, \hat\sigma_t, \xi_t^*):\, t = 1, 2, \ldots, G\}}\ell(\hat\theta_t, \hat\sigma_t, \xi_t^*). \qquad (8)$$

To maximize the log-likelihood function in equation (7), the Olsson (1974) version of the Nelder-Mead simplex algorithm (1965) was used. The approach minimizes a function by comparing function values at the vertices of a simplex, successively reflecting, expanding or contracting the simplex until convergence.
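A minimal sketch of the Gauss-Hermite approximation in (5) applied to (4), producing the approximate log-likelihood of equation (7), is given below; it reuses the B2_basis sketch above, omits the constant binomial coefficients, and is not the authors' implementation.

# Minimal sketch: approximate log-likelihood (7) by q = 20 point
# Gauss-Hermite quadrature; nodes b_l and weights a_l come from numpy.

import numpy as np
from numpy.polynomial.hermite import hermgauss

def approx_loglik(theta, sigma, xi, doses, x, n, d1, dg, q=20):
    """x[i][j], n[i][j]: responses and implants, litter j of dose group i."""
    b, a = hermgauss(q)                      # nodes b_l, weights a_l
    ll = 0.0
    for i, d in enumerate(doses):
        eta = B2_basis(d, xi, d1, dg) @ np.asarray(theta)
        R = 1.0 / (1.0 + np.exp(-(eta + sigma[i] * b * np.sqrt(2))))
        for xij, nij in zip(x[i], n[i]):
            H = np.sum(a * R ** xij * (1 - R) ** (nij - xij))   # H_ij
            ll += np.log(H / np.sqrt(np.pi))   # 1/sqrt(pi) from z = u*sqrt(2)
    return ll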
Dose     Litters   Implants   Responses   Proportion
0        30        396        75          0.189
0.025    26        320        37          0.116
0.05     26        319        80          0.251
0.10     24        276        192         0.696
0.15     25        308        302         0.981
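The grid search of equation (8) combined with the Nelder-Mead maximization can be sketched as below; the log-parameterization of σ is an assumption on our part (to keep the variance parameters positive), and approx_loglik is the sketch given above.

# Minimal sketch: profile likelihood over the interior knot xi (eq. 8),
# maximizing over (theta, sigma) at each grid point with Nelder-Mead.

import numpy as np
from scipy.optimize import minimize

def profile_fit(doses, x, n, d1, dg, G=50):
    g, best = len(doses), (-np.inf, None)
    for t in range(1, G + 1):
        xi_t = dg * t / (G + 1)                   # grid point xi_t*
        neg = lambda p: -approx_loglik(p[:3], np.exp(p[3:]), xi_t,
                                       doses, x, n, d1, dg)
        res = minimize(neg, np.zeros(3 + g), method="Nelder-Mead")
        if -res.fun > best[0]:
            best = (-res.fun, (res.x[:3], np.exp(res.x[3:]), xi_t))
    return best[1]        # (theta_hat, sigma_hat, xi_hat), as in (8)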
LI & HUNT
above the knot. Figure 2 shows the plot of the new model, it was found that the addition of
logistically transformed estimated dose-response multiple parameters affected the estimates of
curve based on equation (1) without the random non-variation model parameters and led to
effects. This plot is indicative of two separate statistical significance of the new model over the
linear functions below and above the interior prior one. The spline approach also is more
knot. robust than typical approaches that use standard
Figure 3 shows the plots of the two regression functions which restrict the dose-
estimated curves from the multiple- σ and response relationship to one pattern.
single- σ models, respectively. For the multiple- Another desirable feature of using
σ model, the interior knot 0.036 is very close to splines for fitting is that it includes models
being the threshold value as the below-knot which inherently assume a threshold. The
pattern follows closely to a horizontal line. For approach of using regression splines to account
the single- σ model, it is more of a linear for threshold effects has been used recently in
pattern of decreasing slope below the knot. This other fields. For example, Molinari, et al. (2001)
degree of difference can be important as used spline functions in place of the standard
threshold estimation is crucial aspect of this linear functions in a Cox regression analysis of
analysis. Although estimate itself is relatively survival data from several clinical trials and
close, the general dose-response relationship is Bessaoud, et al. (2005) used spline functions in
different and is indicative that the multiple- σ the logistic regression setting for analysis of
model is more appropriate in this situation. clinical data. The spline approach used in this
study is similar to these. Both other studies also
Conclusion extended their model to handle several
A linear B-spline threshold dose-response model covariates and indicated the practicality of using
was modified to include multiple parameters for linear splines to estimate an interior knot as a
modeling dose-specific response variation in threshold value. As they were dealing with
developmental toxicity study data. The previous larger data sets, both groups looked at cases of
model showed only singular response variation multiple knots and higher-order spline functions,
across dose groups. Upon application of this although neither went past cubic splines and 3
575
SPLINE MODEL WITH DOSE-SPECIFIC RESPONSE VARIATION
Figure 1: The Linear B-spline Basis on the Dose Interval [0, 0.15], with Estimated Interior Knot $\hat\xi = 0.036$
Figure 2: The Fitted Curve (solid line) and 95% Point-Wise Confidence Interval
(dashed and dotted lines) for Logit(P)
Figure 3: Estimated Dose-Response Curves for the Multiple-σ Model (left) and the Single-σ Model (right)

knots. Bessaoud, et al. (2005) only used up to quadratic splines, and both indicated that cubic splines resulted in overfitting and that linear and quadratic seemed appropriate.

For developmental toxicity studies, the existence of a threshold is inherent in current guidelines and is accounted for in some manner during the risk assessment process, albeit indirectly (USEPA, 1991). The pure threshold model is a start in the direction of more adequately modeling these effects, yet the threshold itself has proved to be difficult to ascertain, and any threshold estimated from such a model may be specific to that data set, rather than being a universal value (Cox, 1987). The use of splines to model behavior that is common to threshold models may help address this issue. Although the interior knot estimated in this study is not specifically the threshold dose level, the model is useful in that it identifies a change point in the direction of the dose-response pattern. It also inherently assumes threshold existence and robustly models several other possible dose-response patterns (Hunt & Li, 2006).

Another advantage of the spline model is the addition of parameters to more adequately model the different degrees of response variation observed to occur across dose groups in a developmental toxicity study. This more accurately specified model improves the estimation of important parameters such as the threshold or, in the case of the spline model, the change point. As illustrated in Kupper, et al. (1986) and in Hunt and Rai (2007), the model assuming multiple parameters for response variation leads to negligible bias, whereas under conditions of major differences in dose-specific variation the model with a single parameter may lead to extremely biased estimates. Thus, the model that has multiple variation parameters is the general model that should be used. However, Hunt and Rai (2007) also showed that in cases of relatively similar variation across dose groups, the single parameter model may suffice.
Potential future applications in this area include the fitting of higher degree polynomials; for example, the quadratic B-spline might be a reasonable extension to the current linear B-spline approach. The most immediate advantage is the relative smoothness of the quadratic spline over the linear; however, disadvantages include overfitting. Also, the number of dose levels becomes a factor when adding additional knots into the estimation. The combination of higher order and multiple knots could result in an overly complex model for this type of data. Due to the observed dose-response pattern of the data set under investigation in this article, the linear spline model with one knot appears to provide a reasonable fit.

The polynomial regression splines approach is a generally advantageous way to model data from developmental toxicity studies. Rather than requiring a direct estimation of a threshold level, it is able to fit several dose-response curves to the data and implicitly can still indicate the existence of effects such as a threshold. It is more robust than threshold models previously applied to such data (Cox, 1987; Schwartz, et al., 1995; Hunt & Rai, 2003, 2007). Additionally, the modification of having dose-specific variation allows for an even more robust model with less biased estimates.

Acknowledgements
This work was supported by Grant Number UL1 RR024146 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH), and NIH Roadmap for Medical Research (C. S. Li). This work was partially supported by Cancer Center Support CA21765 from the National Institutes of Health, USA, and the American Lebanese Syrian Associated Charities (ALSAC) (D. L. Hunt).

References
Abramowitz, M., & Stegun, I. A. (1972). Handbook of mathematical functions with formulas, graphs and mathematical tables. Washington, DC: U.S. Government Printing Office.
Bessaoud, F., Daures, J. P., & Molinari, N. (2005). Free knot splines for logistic models and threshold selection. Computer Methods and Programs in Biomedicine, 77, 1-9.
Collett, D. (1991). Modelling binary data. Boca Raton, FL: CRC Press.
Cox, C. (1987). Threshold dose-response models in toxicology. Biometrics, 43, 511-523.
Crump, K. S. (1984). A new method for determining allowable daily intakes. Fundamental and Applied Toxicology, 4, 854-871.
De Boor, C. (2001). A practical guide to splines (Revised Edition). NY: Springer.
Hunt, D. L., & Li, C. S. (2006). A regression spline model for developmental toxicity data. Toxicological Sciences, 92, 329-334.
Hunt, D. L., & Rai, S. N. (2003). A threshold dose-response model with random effects in teratological experiments. Communications in Statistics-Theory and Methods, 32, 1439-1457.
Hunt, D. L., & Rai, S. N. (2007). Interlitter response variability in a threshold dose-response model. Submitted to Communications in Statistics-Theory and Methods.
Kupper, L. L., Portier, C., Hogan, M. D., & Yamamoto, E. (1986). The impact of litter effects on dose-response modeling in teratology. Biometrics, 42, 85-98.
Li, C. S., & Hunt, D. (2004). Regression splines for threshold selection with application to a random-effects logistic dose-response model. Computational Statistics and Data Analysis, 46, 1-9.
Molinari, N., Daures, J. P., & Durand, J. F. (2001). Regression splines for threshold selection in survival data analysis. Statistics in Medicine, 20, 237-247.
Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7, 308-313.
Olsson, D. M. (1974). A sequential simplex program for solving minimization problems. Statistical Computer Program, 6, 53-57.
Appendix

To obtain the observed information matrix $I(\tilde\theta, \tilde\sigma, \tilde\xi)$, the second-order partial derivatives of $\ell(\theta, \sigma, \xi)$ in equation (7) must be calculated with respect to $(\theta, \sigma, \xi)$ and evaluated at $(\tilde\theta, \tilde\sigma, \tilde\xi)$, where

$$\ell(\theta, \sigma, \xi) = \sum_{i=1}^{g}\sum_{j=1}^{m_i}\log(H_{ij}).$$

Here, $H_{ij} = \sum_{l=1}^{q}a_l R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}}$, $R_{il} = \dfrac{\exp(\eta_i + \sigma_i b_l\sqrt{2})}{1 + \exp(\eta_i + \sigma_i b_l\sqrt{2})}$, $\eta_i = \sum_{k=1}^{3}\theta_k B_{k,2}(d_i, \xi) = \mathbf{B}_2(d_i, \xi)\theta$, $\bar{R}_{il} = 1 - R_{il}$, and $\bar{x}_{ij} = n_{ij} - x_{ij}$. The first-order partial derivatives of $\ell(\theta, \sigma, \xi)$ with respect to $(\theta, \sigma, \xi)$ are:

$$\frac{\partial\ell}{\partial\theta} = \sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-1}\frac{\partial H_{ij}}{\partial\theta} = \sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-1}\sum_{l=1}^{q}a_l\left(x_{ij}R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}+1} - \bar{x}_{ij}R_{il}^{x_{ij}+1}\bar{R}_{il}^{\bar{x}_{ij}}\right)\frac{\partial\eta_i}{\partial\theta},$$

$$\frac{\partial\ell}{\partial\sigma_r} = \sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-1}\frac{\partial H_{ij}}{\partial\sigma_r}I_{(r=i)} = \sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-1}\sum_{l=1}^{q}a_l\left(x_{ij}R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}+1} - \bar{x}_{ij}R_{il}^{x_{ij}+1}\bar{R}_{il}^{\bar{x}_{ij}}\right)b_l\sqrt{2}\,I_{(r=i)}, \qquad r = 1, \ldots, g,$$
Appendix (continued)

and

$$\frac{\partial\ell}{\partial\xi} = \sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-1}\frac{\partial H_{ij}}{\partial\xi} = \sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-1}\sum_{l=1}^{q}a_l\left(x_{ij}R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}+1} - \bar{x}_{ij}R_{il}^{x_{ij}+1}\bar{R}_{il}^{\bar{x}_{ij}}\right)\frac{\partial\eta_i}{\partial\xi},$$

where

$$\frac{\partial\eta_i}{\partial\theta^T} = \left(B_{1,2}(d_i, \xi),\, B_{2,2}(d_i, \xi),\, B_{3,2}(d_i, \xi)\right) = \mathbf{B}_2(d_i, \xi)$$

and

$$\frac{\partial\eta_i}{\partial\xi} = \theta_1\frac{d_i - d_1}{(\xi - d_1)^2}I_{(d_i<\xi)} + \theta_2\left[\frac{d_1 - d_i}{(\xi - d_1)^2}I_{(d_i<\xi)} + \frac{d_g - d_i}{(d_g - \xi)^2}I_{(d_i>\xi)}\right] + \theta_3\frac{d_i - d_g}{(d_g - \xi)^2}I_{(d_i>\xi)}.$$

The second-order partial derivatives are:

$$-\frac{\partial^2\ell}{\partial\theta\partial\theta^T} = -\sum_{i=1}^{g}\sum_{j=1}^{m_i}\left[-H_{ij}^{-2}\frac{\partial H_{ij}}{\partial\theta}\frac{\partial H_{ij}}{\partial\theta^T} + H_{ij}^{-1}\frac{\partial^2 H_{ij}}{\partial\theta\partial\theta^T}\right]$$
$$= -\sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-1}\sum_{l=1}^{q}a_l\left(x_{ij}^2R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}+2} - (x_{ij} + 2x_{ij}\bar{x}_{ij} + \bar{x}_{ij})R_{il}^{x_{ij}+1}\bar{R}_{il}^{\bar{x}_{ij}+1} + \bar{x}_{ij}^2R_{il}^{x_{ij}+2}\bar{R}_{il}^{\bar{x}_{ij}}\right)\frac{\partial\eta_i}{\partial\theta}\frac{\partial\eta_i}{\partial\theta^T}$$
$$+ \sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-2}\left[\sum_{l=1}^{q}a_l\left(x_{ij}R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}+1} - \bar{x}_{ij}R_{il}^{x_{ij}+1}\bar{R}_{il}^{\bar{x}_{ij}}\right)\frac{\partial\eta_i}{\partial\theta}\right]\left[\sum_{l=1}^{q}a_l\left(x_{ij}R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}+1} - \bar{x}_{ij}R_{il}^{x_{ij}+1}\bar{R}_{il}^{\bar{x}_{ij}}\right)\frac{\partial\eta_i}{\partial\theta^T}\right],$$

and

$$-\frac{\partial^2\ell}{\partial\xi\partial\theta^T} = -\sum_{i=1}^{g}\sum_{j=1}^{m_i}\left[-H_{ij}^{-2}\frac{\partial H_{ij}}{\partial\xi}\frac{\partial H_{ij}}{\partial\theta^T} + H_{ij}^{-1}\frac{\partial^2 H_{ij}}{\partial\xi\partial\theta^T}\right]$$
$$= -\sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-1}\sum_{l=1}^{q}a_l\left(x_{ij}^2R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}+2} - (x_{ij} + 2x_{ij}\bar{x}_{ij} + \bar{x}_{ij})R_{il}^{x_{ij}+1}\bar{R}_{il}^{\bar{x}_{ij}+1} + \bar{x}_{ij}^2R_{il}^{x_{ij}+2}\bar{R}_{il}^{\bar{x}_{ij}}\right)\frac{\partial\eta_i}{\partial\xi}\frac{\partial\eta_i}{\partial\theta^T}$$
$$- \sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-1}\sum_{l=1}^{q}a_l\left(x_{ij}R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}+1} - \bar{x}_{ij}R_{il}^{x_{ij}+1}\bar{R}_{il}^{\bar{x}_{ij}}\right)\frac{\partial^2\eta_i}{\partial\xi\partial\theta^T}$$
$$+ \sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-2}\left[\sum_{l=1}^{q}a_l\left(x_{ij}R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}+1} - \bar{x}_{ij}R_{il}^{x_{ij}+1}\bar{R}_{il}^{\bar{x}_{ij}}\right)\frac{\partial\eta_i}{\partial\xi}\right]\left[\sum_{l=1}^{q}a_l\left(x_{ij}R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}+1} - \bar{x}_{ij}R_{il}^{x_{ij}+1}\bar{R}_{il}^{\bar{x}_{ij}}\right)\frac{\partial\eta_i}{\partial\theta^T}\right],$$
Appendix (continued)

$$-\frac{\partial^2\ell}{\partial\sigma_r\partial\sigma_s} = -\sum_{i=1}^{g}\sum_{j=1}^{m_i}\left[-H_{ij}^{-2}\frac{\partial H_{ij}}{\partial\sigma_r}\frac{\partial H_{ij}}{\partial\sigma_s} + H_{ij}^{-1}\frac{\partial^2 H_{ij}}{\partial\sigma_r\partial\sigma_s}\right]$$
$$= \begin{cases}\displaystyle\sum_{j=1}^{m_r}\left\{-H_{rj}^{-1}\sum_{l=1}^{q}a_l\left(x_{rj}^2R_{rl}^{x_{rj}}\bar{R}_{rl}^{\bar{x}_{rj}+2} - (x_{rj} + 2x_{rj}\bar{x}_{rj} + \bar{x}_{rj})R_{rl}^{x_{rj}+1}\bar{R}_{rl}^{\bar{x}_{rj}+1} + \bar{x}_{rj}^2R_{rl}^{x_{rj}+2}\bar{R}_{rl}^{\bar{x}_{rj}}\right)2b_l^2\right.\\[4pt] \displaystyle\left.\qquad +\; H_{rj}^{-2}\left[\sum_{l=1}^{q}a_l\left(x_{rj}R_{rl}^{x_{rj}}\bar{R}_{rl}^{\bar{x}_{rj}+1} - \bar{x}_{rj}R_{rl}^{x_{rj}+1}\bar{R}_{rl}^{\bar{x}_{rj}}\right)b_l\sqrt{2}\right]^2\right\}, & s = r = i\\[6pt] 0, & s \ne r,\end{cases}$$

and

$$-\frac{\partial^2\ell}{\partial\xi\partial\sigma_r} = -\sum_{i=1}^{g}\sum_{j=1}^{m_i}\left[-H_{ij}^{-2}\frac{\partial H_{ij}}{\partial\xi}\frac{\partial H_{ij}}{\partial\sigma_r} + H_{ij}^{-1}\frac{\partial^2 H_{ij}}{\partial\xi\partial\sigma_r}\right]$$
$$= \begin{cases}\displaystyle\sum_{j=1}^{m_r}\left\{-H_{rj}^{-1}\sum_{l=1}^{q}a_l\left(x_{rj}^2R_{rl}^{x_{rj}}\bar{R}_{rl}^{\bar{x}_{rj}+2} - (x_{rj} + 2x_{rj}\bar{x}_{rj} + \bar{x}_{rj})R_{rl}^{x_{rj}+1}\bar{R}_{rl}^{\bar{x}_{rj}+1} + \bar{x}_{rj}^2R_{rl}^{x_{rj}+2}\bar{R}_{rl}^{\bar{x}_{rj}}\right)\sqrt{2}\,b_l\,\frac{\partial\eta_r}{\partial\xi}\right.\\[4pt] \displaystyle\left.\qquad +\; H_{rj}^{-2}\left[\sum_{l=1}^{q}a_l\left(x_{rj}R_{rl}^{x_{rj}}\bar{R}_{rl}^{\bar{x}_{rj}+1} - \bar{x}_{rj}R_{rl}^{x_{rj}+1}\bar{R}_{rl}^{\bar{x}_{rj}}\right)\frac{\partial\eta_r}{\partial\xi}\right]\left[\sum_{l=1}^{q}a_l\left(x_{rj}R_{rl}^{x_{rj}}\bar{R}_{rl}^{\bar{x}_{rj}+1} - \bar{x}_{rj}R_{rl}^{x_{rj}+1}\bar{R}_{rl}^{\bar{x}_{rj}}\right)\sqrt{2}\,b_l\right]\right\}, & r = i\\[6pt] 0, & r \ne i,\end{cases}$$
Appendix (continued)

$$-\frac{\partial^2\ell}{\partial\xi^2} = -\sum_{i=1}^{g}\sum_{j=1}^{m_i}\left[-H_{ij}^{-2}\left(\frac{\partial H_{ij}}{\partial\xi}\right)^2 + H_{ij}^{-1}\frac{\partial^2 H_{ij}}{\partial\xi^2}\right]$$
$$= -\sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-1}\sum_{l=1}^{q}a_l\left(x_{ij}^2R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}+2} - (x_{ij} + 2x_{ij}\bar{x}_{ij} + \bar{x}_{ij})R_{il}^{x_{ij}+1}\bar{R}_{il}^{\bar{x}_{ij}+1} + \bar{x}_{ij}^2R_{il}^{x_{ij}+2}\bar{R}_{il}^{\bar{x}_{ij}}\right)\left(\frac{\partial\eta_i}{\partial\xi}\right)^2$$
$$- \sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-1}\sum_{l=1}^{q}a_l\left(x_{ij}R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}+1} - \bar{x}_{ij}R_{il}^{x_{ij}+1}\bar{R}_{il}^{\bar{x}_{ij}}\right)\frac{\partial^2\eta_i}{\partial\xi^2}$$
$$+ \sum_{i=1}^{g}\sum_{j=1}^{m_i}H_{ij}^{-2}\left[\sum_{l=1}^{q}a_l\left(x_{ij}R_{il}^{x_{ij}}\bar{R}_{il}^{\bar{x}_{ij}+1} - \bar{x}_{ij}R_{il}^{x_{ij}+1}\bar{R}_{il}^{\bar{x}_{ij}}\right)\frac{\partial\eta_i}{\partial\xi}\right]^2,$$

where

$$\frac{\partial^2\eta_i}{\partial\xi^2} = 2\theta_1\frac{d_1 - d_i}{(\xi - d_1)^3}I_{(d_i<\xi)} + 2\theta_2\left[\frac{d_i - d_1}{(\xi - d_1)^3}I_{(d_i<\xi)} + \frac{d_g - d_i}{(d_g - \xi)^3}I_{(d_i>\xi)}\right] + 2\theta_3\frac{d_i - d_g}{(d_g - \xi)^3}I_{(d_i>\xi)}$$

and

$$\frac{\partial^2\eta_i}{\partial\xi\partial\theta^T} = \left(\frac{d_i - d_1}{(\xi - d_1)^2}I_{(d_i<\xi)},\;\; \frac{d_1 - d_i}{(\xi - d_1)^2}I_{(d_i<\xi)} + \frac{d_g - d_i}{(d_g - \xi)^2}I_{(d_i>\xi)},\;\; \frac{d_i - d_g}{(d_g - \xi)^2}I_{(d_i>\xi)}\right).$$
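When implementing the expressions above, a central finite-difference check against the Gauss-Hermite log-likelihood sketched earlier is a simple safeguard; the following generic sketch is ours, not part of the original article.

# Minimal sketch: central finite differences for checking coded analytic
# derivatives of the log-likelihood against numerical ones.

import numpy as np

def fd_gradient(f, par, eps=1e-6):
    par = np.asarray(par, float)
    g = np.zeros_like(par)
    for k in range(par.size):
        e = np.zeros_like(par)
        e[k] = eps
        g[k] = (f(par + e) - f(par - e)) / (2 * eps)
    return g   # compare entrywise with the analytic first derivatives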
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 583-591 1538 – 9472/09/$95.00
Bayesian Analysis of Warfarin v Aspirin for Intracranial Stenosis

Bayesian analyses of symptomatic intracranial stenosis studies were conducted to compare the benefits of
long-term therapy with warfarin to aspirin. The synthesis of evidence of effect from previous non-
randomized studies in monitoring a randomized clinical trial was of particular interest. Sequential
Bayesian learning analysis was conducted and Bayesian hierarchical random effects models were used to
incorporate variability between studies. The posterior point estimates for the risk rate ratio (RRR) were
similar between analyses, although the interval estimates resulting from the hierarchical analyses are
larger than the corresponding Bayesian learning analyses. This demonstrated the difference between these
methods in accounting for between-study variability. This study suggests that Bayesian synthesis can be a
useful supplement to futility analysis in the process of monitoring randomized clinical trials.
Key words: Bayesian analysis, Bayesian hierarchical model, Bayesian learning, randomized clinical trial,
epidemiology, stroke.
Study                    Year   Endpoint                                              Warfarin                    Aspirin                     HRR (95% CI)        log HRR (SE)   Caveat
(1) Millikan, et al.     1954   Death                                                 3 / 21 patients             10 / 23 patients            4.62 (2.18, 9.79)   1.53 (0.75)    A
(2) Chimowitz, et al.    1995   Stroke, MI, sudden death                              26 / 143 patient-years      14 / 166 patient-years      2.17 (1.16, 4.35)   0.63 (0.33)    B
(3) Thijs and Albers     2000   Cerebral ischemic events                              Not given                   Not given                   4.9 (1.7, 13.9)     0.77 (0.33)    C
(4) Qureshi, et al.      2003   Stroke or death                                       10 / 619 patient-months     8 / 787 patient-months      0.63 (0.25, 1.59)   −0.46 (0.47)   D
(5) Chimowitz, et al.    2005   Ischemic stroke, brain hemorrhage, vascular death     62 / 504.4 patient-years    63 / 541.7 patient-years    1.04 (0.73, 1.48)   0.06 (0.18)

Caveats:
A: The treatment received by patients not receiving warfarin is unclear, as are the inclusion criteria
B: Retrospective study possibly subject to selection bias
C: HRR is adjusted for age, presence of anterior circulation disease, Caucasian race, and hyperlipidemia
D: Unpublished result from data supporting the paper; Qureshi & Suri, personal communication, December 22, 2005
HR of 55.6 (95% CI, 9.1, 333), comparing stroke-free survival for patients receiving either warfarin or aspirin to patients receiving neither, after adjustment for sex, race, hypertension, diabetes mellitus, cigarette smoking, hyperlipidemia, and lesion location. Additional data provided by the authors (Table 2) allowed calculation of the aspirin-to-warfarin HR as 0.63 (95% CI, 0.25, 1.59).

Study 5: Chimowitz, et al. (2005) was the only RCT comparing warfarin to aspirin in patients with this disease; 569 patients were followed for an average of 1.8 years. The aspirin-to-warfarin HR is 1.04 (95% CI, 0.74, 1.49).

Table 2: Endpoints (Stroke or Death) in Qureshi, et al. (2003)*

                             Warfarin (n=46)   Aspirin (n=40)
Number of Patients           46                40
Stroke or Death              10                8
Person-Months to Endpoint    619               787

*Qureshi & Suri, personal communication, December 22, 2005
program run again until the SRF ≤ 1.1. Note that such increases were only necessary for analysis of the hierarchical random effects Poisson models.

Results

Bayesian Learning Analyses
The results of the sequential Bayesian learning analysis with log(RRR) as a normal variate using studies 1-4 are shown in Table 3. Note that the Bayesian results that were available at the time that WASID began (studies 1 and 2) were mixed in their support for an effect of warfarin as hypothesized for the WASID clinical trial, i.e., RRR = .33 / .22 = 1.5, versus the null hypothesis RRR = 1. Specifically, although the 95% Bayes intervals based on the initial informative prior or the initial enthusiastic prior include 1.5 but exclude 1, the interval based on the initial skeptical prior includes both values. With the subsequent addition of study 3's results, the evidence favoring warfarin grew stronger. The 95% Bayes intervals stemming from both initial skeptical and initial enthusiastic priors now include 1.5 but exclude 1. Moreover, the interval stemming from the initial non-informative prior excludes both 1 and 1.5 to the left. Addition of study 4 has little effect on point and interval estimates. Further, point estimates stemming from the non-informative prior tend to be much higher than corresponding point estimates stemming from skeptical and enthusiastic priors at each point. This disparity is due to the difference in variance between the non-informative prior versus the skeptical and enthusiastic priors.

A hypothetical future study of warfarin and aspirin would incorporate the results of study 5. With this addition, note that interval estimates stemming from initial non-informative and skeptical priors now include 1. Indeed, the Bayes interval from the initial skeptical prior now excludes 1.5 to the right. The Bayes interval stemming from the enthusiastic prior excludes 1 but covers 1.5 (the alternative hypothesis for WASID) and 2.

Sensitivity analyses including only studies 2, 4 and 5 result in posterior point and interval estimates that are not much different after adding study 4 into the analysis, especially with the skeptical and enthusiastic priors (Table 4). The results after introduction of only study 2 are like the results after inclusion of both studies 1 and 2, suggesting that the optimistic estimates from study 1 do not contribute substantially to the overall conclusion. Additional sensitivity analysis including study 4 produced posterior point and interval estimates that were virtually identical, suggesting that study 4 does not have a substantial impact on the analysis.
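Numerically, each update in Table 3 is a conjugate normal-normal step on θ* = log(RRR). The following is a minimal sketch of this sequential learning (Python; every study summary and the prior standard deviation below are illustrative placeholders, not the article's actual inputs):

```python
import numpy as np

# Hypothetical per-study summaries: (estimated log(RRR), SE) -- placeholders only
studies = [(0.92, 0.55), (0.18, 0.35), (0.60, 0.30), (-0.05, 0.45), (0.04, 0.18)]

# Skeptical prior: centered at log(1) = 0 with small variance;
# an enthusiastic prior would instead center at log(1.5).
mean, var = 0.0, 0.20**2

for k, (y, se) in enumerate(studies, start=1):
    prec = 1 / var + 1 / se**2               # posterior precision
    mean = (mean / var + y / se**2) / prec   # precision-weighted mean
    var = 1 / prec
    lo, hi = mean - 1.96 * var**0.5, mean + 1.96 * var**0.5
    print(f"after study {k}: RRR = {np.exp(mean):.2f} "
          f"(95% interval {np.exp(lo):.2f}, {np.exp(hi):.2f})")
```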
Table 3: Posterior Point and Interval Estimates for θ = RRR from Bayesian Learning Analysis Using All Studies

Study Added | Interval Type | Non-Informative Prior | Skeptical Prior | Enthusiastic Prior
1 | Mean θ (95% Bayes interval) | 4.48 (1.14, 17.64) | 1.11 (0.75, 1.63) | 1.82 (1.23, 2.69)
1 | Median θ (50% Bayes interval) | 4.48 (2.72, 7.39) | 1.22 (1.0, 1.35) | 1.82 (1.49, 2.23)
2 | Mean θ (95% Bayes interval) | 2.46 (1.36, 4.44) | 1.35 (0.91, 1.99) | 1.82 (1.23, 2.69)
2 | Median θ (50% Bayes interval) | 2.46 (2.01, 3.00) | 1.35 (1.22, 1.49) | 1.82 (1.65, 2.01)
3 | Mean θ (95% Bayes interval) | 3.00 (1.67, 5.42) | 1.65 (1.12, 2.44) | 2.01 (1.36, 2.97)
3 | Median θ (50% Bayes interval) | 3.00 (2.46, 3.67) | 1.65 (1.49, 1.82) | 2.01 (1.82, 2.46)
4 | Mean θ (95% Bayes interval) | 2.72 (1.65, 4.95) | 1.65 (1.11, 2.46) | 2.01 (1.35, 3.00)
4 | Median θ (50% Bayes interval) | 2.72 (2.23, 3.32) | 1.65 (1.49, 1.82) | 2.01 (1.82, 2.23)
5 | Mean θ (95% Bayes interval) | 1.35 (1.00, 2.01) | 1.35 (1.00, 1.43) | 1.49 (1.11, 2.01)
5 | Median θ (50% Bayes interval) | 1.49 (1.22, 1.65) | 1.35 (1.22, 1.49) | 1.49 (1.35, 1.65)
Table 4: Posterior Point and Interval Estimates for θ = RRR from Bayesian Learning Analysis Using Restricted Set of Studies

Studies | Interval Type | Non-Informative Prior | Skeptical Prior | Enthusiastic Prior
2 | Mean θ (95% Bayes interval) | 2.23 (1.23, 4.01) | 1.35 (0.91, 1.99) | 1.82 (1.23, 2.69)
2 | Median θ (50% Bayes interval) | 2.23 (1.82, 2.72) | 1.35 (1.22, 1.65) | 1.82 (1.65, 2.01)
4 | Mean θ (95% Bayes interval) | 2.01 (1.22, 3.67) | 1.35 (0.90, 2.01) | 1.82 (1.22, 2.72)
4 | Median θ (50% Bayes interval) | 2.23 (1.82, 2.46) | 1.35 (1.22, 1.49) | 1.82 (1.65, 2.01)
5 | Mean θ (95% Bayes interval) | 1.35 (0.90, 1.82) | 1.22 (0.90, 1.65) | 1.35 (1.11, 1.82)
5 | Median θ (50% Bayes interval) | 1.35 (1.11, 1.49) | 1.22 (1.11, 1.35) | 1.35 (1.22, 1.49)
1, 2, 3, 4 | Mean θ (95% Bayes interval) | 2.72 (0.27, 14.88) | 1.11 (0.67, 1.82) | 1.82 (1.22, 3.00)
1, 2, 3, 4 | Median θ (50% Bayes interval) | 3.00 (2.01, 4.06) | 1.11 (1.00, 1.35) | 1.82 (1.65, 2.23)
1, 2, 3, 4, 5 | Mean θ (95% Bayes interval) | 2.01 (0.55, 6.69) | 1.22 (0.74, 1.82) | 1.82 (1.22, 2.72)
1, 2, 3, 4, 5 | Median θ (50% Bayes interval) | 2.01 (1.49, 2.72) | 1.22 (1.00, 1.35) | 1.82 (1.49, 2.01)
2, 4 | Mean θ (95% Bayes interval) | 1.49 (0, 4.9 × 10⁵) | 1.00 (0.67, 1.65) | 1.65 (1.00, 2.72)
2, 4 | Median θ (50% Bayes interval) | 1.82 (0.27, 7.39) | 1.00 (0.90, 1.22) | 1.65 (1.35, 2.01)
2, 4, 5 | Mean θ (95% Bayes interval) | 1.22 (0.07, 20.1) | 1.11 (0.67, 1.65) | 1.49 (1.11, 2.46)
2, 4, 5 | Median θ (50% Bayes interval) | 1.35 (0.90, 2.01) | 1.11 (1.00, 1.22) | 1.49 (1.35, 1.82)
Moreover, the hierarchical analyses limited to published results as of July 2003 would be no different than before. However, the inclusion of the rates from study 4, if they had been published at that time, leads to hierarchical model results that lend support for both RRR = 1.5 and RRR = 1 regardless of prior belief. The lack of strict correspondence between conclusions from Bayesian learning and those from Bayesian hierarchical random effects models results from differences between the methods in incorporating between-study variability. The studies do have differences in design (sample size, endpoint definitions and inclusion criteria) warranting allowances in the modeling process. Although none of studies 1-4 were randomized clinical trials, hierarchical models can be extended to adjust for different classes of study (such as RCTs versus non-randomized studies) when 2 or more studies of each class are present. Unfortunately, only one RCT was available to include.

It is particularly interesting to note the change in conclusions wrought by the unpublished, negative result of Study 4. This finding reinforces the importance of finding all results, even negative ones, in compiling evidence.

The ability to generate interval estimates and use differing priors deepens understanding of the current evidence in light of previous studies. These results point to the utility of Bayesian analyses of prior studies as an additional tool for monitoring clinical trials. The concordance of frequentist and Bayesian efficacy analyses would provide robust confirmation of the appropriateness of a futility analysis when decisions regarding the continuation or stopping of a clinical trial are made.

Table 6: Posterior Point and Interval Estimates for φ = RRR Using Hierarchical Random Effects Model with Poisson Distribution

Studies Included | Interval Type | Non-Informative Prior | Skeptical Prior | Enthusiastic Prior
2* | Mean φ (95% Bayes interval) | 2.23 (1.22, 4.48) | 1.65 (1.0, 3.0) | 2.01 (1.22, 3.32)
2* | Median φ (50% Bayes interval) | 2.23 (1.82, 2.72) | 1.82 (1.49, 2.01) | 2.01 (1.65, 2.46)
2, 4 | Mean φ (95% Bayes interval) | 0.41 (0, 54200) | 1.0 (0.41, 2.72) | 1.65 (0.67, 4.48)
2, 4 | Median φ (50% Bayes interval) | 0.41 (0.01, 22.20) | 1.0 (0.74, 1.35) | 1.65 (1.22, 2.23)
2, 4, 5 | Mean φ (95% Bayes interval) | 1.11 (0, 24300) | 1.0 (0.41, 2.72) | 1.65 (0.55, 4.48)
2, 4, 5 | Median φ (50% Bayes interval) | 1.11 (0.03, 54.6) | 1.0 (0.74, 1.35) | 1.65 (1.11, 2.23)
*Uses simple Poisson model for two groups

Acknowledgements
This study was funded by a research grant (1R01 NS33643) from the US Public Health Service National Institute of Neurological Disorders and Stroke (NINDS). We thank Marc Chimowitz, Mike Lynn, Scott Janis, and Bill Powers for their careful review and comments.

References
Berry, D. (2005). Clinical trials: Is the Bayesian approach ready for prime time? Yes! Stroke, 36, 1621-1623.
Berry, D., Berry, S., McKellar, J., & Pearson, T. (2003). Comparison of the dose-response relationships of 2 lipid-lowering agents: A Bayesian meta-analysis. American Heart Journal, 145, 1036-1045.
Brophy, J., & Lawrence, J. (1995). Placing trials in context using Bayesian analysis: GUSTO revisited by Reverend Bayes. JAMA, 273, 871-875.
Casella, G., & George, E. (1992). Explaining the Gibbs sampler. The American Statistician, 46, 167-174.
Chimowitz, M., et al. (1995). The warfarin-aspirin symptomatic intracranial disease study. Neurology, 45, 1488-1493.
Chimowitz, M., et al. (2005). Comparison of warfarin and aspirin for symptomatic intracranial arterial stenosis. New England Journal of Medicine, 352, 1305-1316.
Diamond, G., & Kaul, S. (2004). Prior convictions: Bayesian approaches to the analysis and interpretation of clinical megatrials. Journal of the American College of Cardiology, 43, 1929-1939.
Donnan, G., Davis, S., & Ludbrook, J. (2005). The Bayesian principle: Can we adapt? Stroke, 36, 1623-1624.
Gelman, A., Carlin, J., Stern, H., & Rubin, D. (2004). Bayesian data analysis (2nd Ed.). NY: Chapman & Hall.
Howard, G., Coffey, C., & Cutter, G. (2005). Is Bayesian analysis ready for use in phase III randomized clinical trials? Beware the sound of the sirens. Stroke, 36, 1622-1623.
Koroshetz, W. (2005). Warfarin, aspirin, and intracranial vascular disease. New England Journal of Medicine, 352, 1368-1370.
Krams, M., Lees, K., & Berry, D. (2005). The past is the future: Innovative designs in acute stroke therapy trials. Stroke, 36, 1341-1347.
Marzewski, D., et al. (1982). Intracranial internal carotid artery stenosis: Long-term prognosis. Stroke, 13, 821-824.
Millikan, C., Siekert, R., & Shick, R. (1954). Studies in cerebrovascular disease. III. The use of anticoagulant drugs in the treatment of insufficiency or thrombosis within the basilar artery system. Staff Meetings of the Mayo Clinic, 30, 116-126.
Moufarrij, N., Little, J., Furlan, A., Williams, G., & Marzewski, D. (1984). Vertebral artery stenosis: Long-term follow-up. Stroke, 15, 260-263.
Qureshi, A., et al. (2003). Stroke-free survival and its determinants in patients with symptomatic vertebrobasilar stenosis: A multicenter study. Neurosurgery, 52, 1033-1040.
Spiegelhalter, D., Abrams, K., & Myles, J. (2004). Bayesian approaches to clinical trials and health-care evaluation. Chichester, England: John Wiley & Sons.
Spiegelhalter, D., Thomas, A., Best, N., Gilks, W., & Lunn, D. (2003). BUGS: Bayesian inference using Gibbs sampling (technical report). MRC Biostatistics Unit.
Thijs, V., & Albers, G. (2000). Symptomatic intracranial atherosclerosis: Outcome of patients who fail antithrombotic therapy. Neurology, 55, 490-498.
The Warfarin-Aspirin Symptomatic Intracranial Disease (WASID) Trial Investigators. (2003). Design, progress, and challenges of a double-blind trial of warfarin versus aspirin for symptomatic intracranial arterial stenosis. Neuroepidemiology, 22, 106-117.
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 592-593 1538 – 9472/09/$95.00
BRIEF REPORTS
A Maximum Test for the Analysis of Ordered Categorical Data
Markus Neuhäuser
Koblenz University of Applied Sciences
RheinAhrCampus, Remagen, Germany
Different scoring schemes are possible when performing exact tests using scores on ordered categorical
data. The standard scheme is based on integer scores, but non-integer scores were proposed to increase
power (Ivanova & Berger, 2001). However, different non-integer scores exist and the question arises as to
which of the non-integer schemes should be chosen. To solve this problem, a maximum test is proposed.
To be precise, the maximum of the competing statistics is used as the new test statistic, rather than
arbitrarily choosing one single test statistic.
Key words: Exact test, maximum test, ordered categorical data, scores.
= (0, 0.49, 1), and v3 = (0, 0.51, 1), the test is performed with the statistic

T_{max} = \max_{i=1,2,3} S(C, v_i)

where

S(C, v_i) = \frac{\sum_{j=1}^{3} v_{ij} C_{1j}}{n_1} - \frac{\sum_{j=1}^{3} v_{ij} C_{2j}}{n_2}

are the individual test statistics with scores v_i = (v_{i1}, v_{i2}, v_{i3}), C_{ij} are the frequencies for ordered category j (j = 1, 2, 3) in group i (i = 1, 2), and n_i is the sample size of group i.

This maximum test has the advantage of a less discrete null distribution and, with it, increased power relative to the single tests based on the non-integer scores. The following example was considered by Ivanova & Berger (2001) and discussed by Senn (2007): C11 = 7, C12 = 3, C13 = 2, C21 = 18, C22 = 4, C23 = 14.

In the case of this example, the maximum test gives a significant result for the table (C11, C12) = (9, 1), as the scheme v3 does, in contrast to v1. For all 76 possible tables with the margins of this example, the maximum test's p-value is at least as small as the p-value of the test with the standard scheme. To be precise, the two p-values are identical for 25 tables, but for 51 tables the maximum test's p-value is smaller. Moreover, the maximum test's p-value is always smaller than or equal to the bigger one of the two p-values of the non-integer scoring tests; note that in this example \max_{i=2,3} S(C, v_i) results in a test identical to T_{max}.

Conclusion
The maximum test is a compromise that avoids the arbitrary choice of just one scheme and maintains the advantage of the non-integer scores. Note that the maximum test is not complicated. Because the exact permutation null distribution of T_{max} is used for inference, one does not need to know the correlation between the different S(C, v_i). Thus, when a researcher selects a test based on the trade-off between power and simplicity – as suggested by Ivanova & Berger (2001) – the maximum test is a reasonable choice. In addition, the approach may have some appeal to logic: there is more than one possible test statistic, so combine the competing statistics. Recently, it was shown that a maximum test can be regarded as an adaptive test with the test statistics themselves as selectors (Neuhäuser & Hothorn, 2006). Thus, in a maximum test the data decide and the statistician does not need to arbitrarily choose between different tests.

Note that a multitude of alternative tests applicable to ordered categorical data exist (Liu & Agresti, 2005). However, the discussed score tests as well as the proposed maximum tests have – at a minimum – the advantage that they are easy to apply.

References
Berger, V. W. (2007). [Reply to Senn (2007).] Biometrics, 63, 298-299.
Gregoire, T. G., & Driver, B. L. (1987). Analysis of ordinal data to detect population differences. Psychological Bulletin, 101, 159-165.
Ivanova, A., & Berger, V. W. (2001). Drawbacks to integer scoring for ordered categorical data. Biometrics, 57, 567-570.
Liu, I., & Agresti, A. (2005). The analysis of ordered categorical data: An overview and a survey of recent developments. Test, 14, 1-73.
Neuhäuser, M., & Hothorn, L. A. (2006). Maximum tests are adaptive permutation tests. Journal of Modern Applied Statistical Methods, 5, 317-322.
Rabbee, N., Coull, B. A., Mehta, C., Patel, N., & Senchaudhuri, P. (2003). Power and sample size for ordered categorical data. Statistical Methods in Medical Research, 12, 73-84.
Rasmussen, J. L. (1989). Analysis of Likert-scale data: A reinterpretation of Gregoire and Driver. Psychological Bulletin, 105, 167-170.
Senn, S. (2007). Drawbacks to noninteger scoring for ordered categorical data. Biometrics, 63, 296-298.
Sheu, C.-F. (2002). Fitting mixed-effects models for repeated ordinal outcomes with the NLMIXED procedure. Behavior Research Methods, Instruments, & Computers, 34, 151-157.
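For readers who wish to replicate the example above, the following sketch (Python/NumPy) approximates the permutation null distribution of T_max by Monte Carlo rather than by exact enumeration, and uses a two-sided (absolute-value) version of the score statistics; both simplifications are choices of this illustration, not the article's procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

# Observed frequencies from the example: C1j (group 1) and C2j (group 2)
C1 = np.array([7, 3, 2])
C2 = np.array([18, 4, 14])

# Competing scoring schemes: v1 (integer) and the non-integer v2, v3
scores = [np.array([0.0, 1.0, 2.0]),
          np.array([0.0, 0.49, 1.0]),
          np.array([0.0, 0.51, 1.0])]

def t_max(c1, c2):
    """Maximum over scoring schemes of |mean score group 1 - mean score group 2|."""
    n1, n2 = c1.sum(), c2.sum()
    return max(abs(v @ c1 / n1 - v @ c2 / n2) for v in scores)

t_obs = t_max(C1, C2)

# Pool the individual category labels and re-split at random, which samples
# tables with the observed margins fixed (the conditional null distribution).
pooled = np.repeat([0, 1, 2], C1 + C2)
n1, B, hits = C1.sum(), 100_000, 0
for _ in range(B):
    rng.shuffle(pooled)
    c1 = np.bincount(pooled[:n1], minlength=3)
    hits += t_max(c1, (C1 + C2) - c1) >= t_obs
print("approximate permutation p-value:", (hits + 1) / (B + 1))
```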
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 594-596 1538 – 9472/09/$95.00
Inductive MLE Calculation for the Double Exponential Distribution

W. J. Hurley
Royal Military College of Canada
Norton (1984) presented a calculation of the MLE for the parameter of the double exponential distribution
based on the calculus. An inductive approach is presented here.
Maximizing this function with respect to θ is equivalent to minimizing

g_n(\theta) = \sum_i |x_i - \theta|.

To obtain the MLE for general n, begin with the case n = 1, where g_1(\theta) = |x_1 - \theta|. This function has a minimum at θ = x_1; hence, for n = 1, the MLE is

\theta_{MLE} = x_1.

As an illustration of such functions, consider

g(x) = |1.8 - x|.

Its graph is shown in Figure 1. Note that it has a V-shape with a minimum at x = 1.8. Now consider a sum of two linear absolute value terms:

h(x) = |1.8 - x| + |3.2 - x|.

W. J. Hurley is a Professor in the Department of Business Administration. Email: hurley-w@rmc.ca.
Now, consider the case n = 2. For the purposes herein it is useful to order the observations; thus, suppose that the sample is {x_(1), x_(2)} where x_(1) < x_(2). The value of θ which minimizes

g_2(\theta) = |x_{(1)} - \theta| + |x_{(2)} - \theta|

must now be found. Based on the above, this function takes the form

g_2(\theta) = \begin{cases} -2\theta + x_{(1)} + x_{(2)}, & \theta \le x_{(1)} \\ x_{(2)} - x_{(1)}, & x_{(1)} \le \theta \le x_{(2)} \\ 2\theta - x_{(1)} - x_{(2)}, & \theta \ge x_{(2)} \end{cases}

and has a minimum at any point θ in the interval x_(1) ≤ θ ≤ x_(2). Hence the MLE for n = 2 is

\theta_{MLE} = \lambda x_{(1)} + (1 - \lambda) x_{(2)}, \quad 0 \le \lambda \le 1.

For this case, the median is defined as (x_(1) + x_(2))/2 and is a solution, but it is not the only solution.

Next, consider the case n = 3 with an ordered sample x_(1) ≤ x_(2) ≤ x_(3). Using the same graphical analysis, it can be shown that

g_3(\theta) = |x_{(1)} - \theta| + |x_{(2)} - \theta| + |x_{(3)} - \theta|

has a unique minimum at θ = x_(2), the median. In the case n = 4, the solution is

\theta_{MLE} = \lambda x_{(2)} + (1 - \lambda) x_{(3)}, \quad 0 \le \lambda \le 1.

Thus, the median is a solution, but not the only solution.

Conclusion
Extending the argument to general n is straightforward. The MLE is the median, x_((n+1)/2), if n is odd and the generalized median, \lambda x_{(n/2)} + (1 - \lambda) x_{(n/2+1)} with 0 ≤ λ ≤ 1, when n is even.

References
Hogg, R. V., & Craig, A. T. (1970). Introduction to mathematical statistics (3rd Ed.). New York, NY: MacMillan Publishing Company.
Norton, R. M. (1984). The double exponential distribution: Using calculus to find a maximum likelihood estimator. The American Statistician, 38(2), 135-136.
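A quick numerical check of this conclusion (a sketch assuming NumPy; the brute-force grid search is a choice of this illustration, not part of the note):

```python
import numpy as np

# Check that g_n(theta) = sum_i |x_i - theta| is minimized at the median for
# odd n, and anywhere between the two middle order statistics for even n.
rng = np.random.default_rng(0)
for n in (5, 6):
    x = np.sort(rng.standard_normal(n))
    grid = np.linspace(x[0] - 1, x[-1] + 1, 20001)
    g = np.abs(x[:, None] - grid[None, :]).sum(axis=0)  # g_n over the grid
    # For even n the minimum is attained on the whole interval
    # [x_(n/2), x_(n/2+1)], so the two printed values need not coincide.
    print(n, grid[np.argmin(g)], np.median(x))
```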
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 597-599 1538 – 9472/09/$95.00
New Effect Size Rules of Thumb

Shlomo S. Sawilowsky
Wayne State University
Recommendations to expand Cohen’s (1988) rules of thumb for interpreting effect sizes are given to
include very small, very large, and huge effect sizes. The reasons for the expansion, and implications for
designing Monte Carlo studies, are discussed.
N=2 study participants as it is that there would be N=100 per cell. Those study parameters were chosen because they represented the minimum and the maximum sample sizes that could be handled given the constraints of the time-share mainframe computing resources available at that time. Prudence dictated sample sizes also be chosen between the two extremes to ensure there were no anomalies in the middle of the robustness rates or power spectrum.

Another important study parameter that must be considered in designing Monte Carlo simulations, which thanks to Cohen (e.g., 1962, 1969, 1977, 1988) has come to be the sine qua non of research design, is the effect size (for an overview, see Sawilowsky, Sawilowsky, & Grissom, in press). Previously, I discussed my conversations with Cohen on developing an encyclopedia of effect sizes:

    I had a series of written and telephone conversations with, and initiated by, Jacob Cohen. He recognized the weaknesses in educated guessing (Cohen, 1988, p. 12) or using his rules of thumb for small, medium, and large effect sizes (p. 532). I suggested cataloging and cross-referencing effect size information for sample size estimation and power analysis as a more deliberate alternative.

    Cohen expressed keen interest in this project. His support led to me delivering a paper at the annual meeting of the AERA on the topic of a possible encyclopedia of effect sizes for education and psychology (Sawilowsky, 1996). The idea was to create something like the "physician's desk reference", but instead of medicines, the publication would be based on effect sizes. (Sawilowsky, 2003b, p. 131)

In the context of the two independent sample layout, Cohen (1988) defined small, medium, and large effect sizes as d = .2, .5, and .8, respectively. Cohen (1988) warned that researchers should remain flexible with these values, and cautioned against them becoming de facto standards for research. (See also Lenth, 2001.) Nevertheless, both warnings are summarily ignored today. That issue cannot be resolved here, but an important lesson that can be addressed is redressing the assumption in designing Monte Carlo studies that the effect size parameters need only conform to the minimum and maximum values of .2 and .8.

For example, when advising a former doctoral student on how to deconstruct the comparative power of the independent t test vs. the Wilcoxon test (Bridge, 2007), it was necessary to model very small effect sizes (e.g., .001, .01). This led to disproving the notion that when the former test fails to reject and the latter test rejects, it is because the latter is actually detecting a shift in scale instead of a shift in location. It would not have been possible to demonstrate this had the Monte Carlo study begun by modeling effect sizes at .2.

Similarly, in the Monte Carlo study in 1985 mentioned above, I modeled what I called a very large effect size equivalent to d = 1.2. This was done because Walberg's (1984) collection of effect sizes pertaining to student learning outcomes included a magnitude of about 1.2 for the use of positive reinforcement as the intervention. Subsequently, in Monte Carlo studies I have conducted, and those conducted by my doctoral students that I supervised, the effect size parameters were extended to 1.2.

As the pursuit of quantifying effect sizes continued, even larger effect sizes were obtained by researchers. For example, the use of cues as instructional strategies (d = 1.25, Walberg & Lai, 1999), the student variable of prior knowledge (d = 1.43, Marzano, 2000, p. 69), and identifying similarities and differences (d = 1.6, Marzano, 2000, p. 63) exceeded what I defined as very large.

Incredibly, effect sizes on the use of mentoring as an instructional strategy to improve academic achievement have been reported in various studies and research textbooks to be as large as 2.0! The existence of such values, well beyond any rule of thumb heretofore published, has led to researchers presuming the studies yielding such results were flawed.

For example, when DuBois, et al. (2002) were confronted with study findings of huge effect sizes in their meta-analysis of mentoring, they resorted to attributing them as outliers and
deleting them from their study. This was just the first step to ignore the obvious. They then resorted to Winsorizing remaining "large effect sizes [as a] safeguard against these extreme values having undue influence" (p. 167). I have long railed against excommunicating raw data with a large percentage of extreme values as outliers, preferring to re-conceptualize the population as a mixed normal instead of a contaminated normal (assuming the underlying distribution is presumed to be Gaussian; the principle holds regardless of the parent population).

Recently, Hattie (2009) collected 800 meta-analyses that "encompassed 52,637 studies, and provided 146,142 effect sizes" (p. 15) pertaining to academic achievement. Figure 2.2 in Hattie (2009, p. 16) indicated about 75 studies with effect sizes greater than 1. Most fall in the bins of 1.05 to 1.09 and 1.15 to 1.19, but a few also fall in the 2.0+ bin.

Conclusion
Based on current research findings in the applied literature, it seems appropriate to revise the rules of thumb for effect sizes to now define d (.01) = very small, d (.2) = small, d (.5) = medium, d (.8) = large, d (1.2) = very large, and d (2.0) = huge. Hence, the list of conditions of an appropriate Monte Carlo study or simulation (Sawilowsky, 2003) should be expanded to incorporate these new minimum and maximum effect sizes, as well as appropriate values between the two end points.

References
Bridge, T. J. (2007). Deconstructing the comparative power of the independent samples t test vs the Wilcoxon Mann-Whitney test for shift in location. Unpublished doctoral dissertation, Detroit, MI: Wayne State University.
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145-153.
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. San Diego: Academic Press.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Rev. Ed.). San Diego: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). Hillsdale, NJ: Erlbaum.
DuBois, D. L., Holloway, B. E., Valentine, J. C., & Cooper, H. (2002). Effectiveness of mentoring programs for youth: A meta-analytic review. American Journal of Community Psychology, 30(2), 157-197.
Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Park Square, OX: Routledge.
Lenth, R. V. (2001). Some practical guidelines for effective sample size determination. The American Statistician, 55, 187-193.
Marzano, R. J. (2000). A new era of school reform: Going where the research takes us. Aurora, CO: Mid-Continent Research for Education and Learning.
Sawilowsky, S. (2003a). You think you've got trivials? Journal of Modern Applied Statistical Methods, 2(1), 218-225.
Sawilowsky, S. (2003b). A different future for social and behavioral science research. Journal of Modern Applied Statistical Methods, 2(1), 128-132.
Sawilowsky, S. S., Sawilowsky, J., & Grissom, R. J. (in press). Effect size. International Encyclopedia of Statistical Science. NY: Springer.
Walberg, H. J. (1984). Improving the productivity of America's schools. Educational Leadership, 41(8), 19-27.
Walberg, H. J., & Lai, J.-S. (1999). In G. J. Cizek (Ed.), Handbook of educational policy (pp. 419-453). San Diego: Academic Press.
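For Monte Carlo design work, the expanded rule of thumb is easy to encode. A small sketch (Python; the function name and the handling of values below .01 are choices of this illustration, not part of the article):

```python
def effect_size_label(d: float) -> str:
    """Map Cohen's d to the expanded rules of thumb: d(.01) very small,
    d(.2) small, d(.5) medium, d(.8) large, d(1.2) very large, d(2.0) huge."""
    cutoffs = [(2.0, "huge"), (1.2, "very large"), (0.8, "large"),
               (0.5, "medium"), (0.2, "small"), (0.01, "very small")]
    for cut, label in cutoffs:
        if abs(d) >= cut:
            return label
    return "below very small"  # handling of d < .01 is an assumption here

print([effect_size_label(d) for d in (0.01, 0.5, 1.43, 2.0)])
# ['very small', 'medium', 'very large', 'huge']
```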
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 600-609 1538 – 9472/09/$95.00
Lindsey J. Wolff Smith
S. Natasha Beretvas
The University of Texas at Austin

This simulation study modified the repeated measures mean difference effect size, d_RM^=, for scenarios with unequal pre- and post-test score variances. Relative parameter and SE bias were calculated for d_RM^≠ versus d_RM^=. Results consistently favored d_RM^≠ over d_RM^=, with worse positive parameter and negative SE bias identified for d_RM^= for increasingly heterogeneous variance conditions.
Lindsey J. Wolff Smith is a doctoral student in Quantitative Methods in the Department of Educational Psychology. Her interests are in the evaluation of statistical models using simulation research. Email her at: ljwolff@hotmail.com. S. Natasha Beretvas is an Associate Professor and chair of Quantitative Methods in the Department of Educational Psychology. Her interests are in multilevel and meta-analytic modeling techniques. Email her at: tasha.beretvas@mail.utexas.edu.

where μ_pre and μ_post are the population means of the pre- and post-test scores, respectively, μ_D is the population mean difference in the pre- and post-test scores, and σ_D is the standard deviation of change scores (Gibbons, Hedeker, & Davis, 1993). The associated sample estimate is calculated as follows:

d_{RM} = \frac{\bar{X}_{post} - \bar{X}_{pre}}{s_D},   (2)
where \bar{X}_{pre} and \bar{X}_{post} are the sample means of the pre- and post-test scores, respectively, and s_D is the sample standard deviation of difference scores.

The sampling variance formula for δ_RM is:

\sigma^2_{\delta_{RM}} = \frac{1}{n}\left(\frac{n-1}{n-3}\right)(1 + n\delta_{RM}^2) - \frac{\delta_{RM}^2}{[c(n-1)]^2}   (3)

where n is the number of paired observations in the RM design study (Morris & DeShon, 2002), with a corresponding formula used for sample estimates:

s^2_{d_{RM}} = \frac{1}{n}\left(\frac{n-1}{n-3}\right)(1 + n d_{RM}^2) - \frac{d_{RM}^2}{[c(n-1)]^2}.   (4)

Equations 3 and 4 also contain the bias correction factor, c(n − 1), that is approximated by

c(n-1) = 1 - \frac{3}{4(n-1) - 1}   (5)

(Hedges, 1982).

Calculation of σ_D is necessary to obtain δ_RM (see Equation 1). Morris and DeShon (2002) presented the following relationship between the standard deviation of difference scores, σ_D, and the standard deviation of the original scores, σ:

\sigma_D^{=} = \sigma\sqrt{2(1 - \rho)}   (6)

where ρ is the correlation between the pre- and post-test scores. The corresponding sample estimate is:

s_D^{=} = s\sqrt{2(1 - r)}   (7)

with r representing the sample correlation. Both formulas (Equations 6 and 7) are founded on the assumption that the population standard deviations for the pre- and post-test scores are equal (i.e., σ_pre = σ_post = σ). Thus, the notation of including a superscript with = was adopted to distinguish the relevant formula when σ_pre = σ_post is assumed from scenarios in which σ_pre ≠ σ_post is assumed.

If σ_pre ≠ σ_post, another formula for σ_D must be employed that does not assume equal variances, namely:

\sigma_D^{\neq} = \sqrt{\sigma_{pre}^2 + \sigma_{post}^2 - 2\sigma_{pre,post}}   (8)

where σ²_pre and σ²_post are the population variances of the pre- and post-test groups, respectively, and σ_pre,post is the covariance between the pre- and post-test scores such that:

\sigma_{pre,post} = \rho\,\sigma_{pre}\,\sigma_{post}.   (9)

Therefore, the equation for σ_D^≠ (see Equation 8) becomes:

\sigma_D^{\neq} = \sqrt{\sigma_{pre}^2 + \sigma_{post}^2 - 2\rho\,\sigma_{pre}\,\sigma_{post}}.   (10)

The corresponding sample estimate is then:

s_D^{\neq} = \sqrt{s_{pre}^2 + s_{post}^2 - 2 r\, s_{pre}\, s_{post}}.   (11)

Note that when σ_pre = σ_post, Equations 10 and 11 reduce to the corresponding (population and sample) homogeneous variances formulas for σ_D (and s_D) (see Equations 6 and 7, respectively).

This leads to the two primary foci of this study. First, empirical research has not been conducted to assess how well the formulas for δ_RM and for σ²_δRM work in terms of parameter and standard error (SE) bias when pre- and post-test scores are and are not homogeneous. Second, applied meta-analysts assume the homogeneity of the pre- and post-test scores and use the s_D^= formula (Equation 7) as opposed to s_D^≠ (Equation 11) when calculating the estimate of δ_RM (Equation 2). Thus, this study also investigated the effect of using the conventional formula for s_D^= (Equation 7) when the homogeneity of variance assumption is violated
and the modified formula for s_D (i.e., s_D^≠ in Equation 11) should be used.

In the current simulation study, four design factors were manipulated, including: the true value of δ_RM, the correlation between pre- and post-test scores, sample size, and values for the pre- and post-test score standard deviations, to assess the effect of these factors on parameter and SE estimates of δ_RM. Results were compared when the pre- and post-test scores were assumed to have equal variances (σ²_pre = σ²_post), thus s_D^= was used to calculate d_RM (i.e., providing d_RM^=), with the results based on the assumption that σ²_pre ≠ σ²_post, for which s_D^≠ was calculated and used to obtain the associated d_RM (i.e., d_RM^≠).

used were the same for each of the pre- and post-test groups.

Ratio of the Pre- and Post-Test Scores' Standard Deviations
Five different values of the ratio of the pre- and post-test scores' standard deviations were investigated. The following patterns were evaluated: σ_pre = σ_post, σ_pre < σ_post, and σ_pre > σ_post. For the two unequal standard deviations conditions, the degree of the difference was also manipulated, with the following four unequal combinations of values for σ_pre:σ_post investigated: 0.8:1.2, 0.5:1.5, 1.2:0.8, and 1.5:0.5. For the σ_pre = σ_post conditions, both pre- and post-test true standard deviations were generated to be one (i.e., σ_pre = σ_post = 1).
value for ρ) were generated to provide the pre- and post-test scores for that condition's replication. Two values of d_RM (d_RM^= and d_RM^≠) were calculated using each dataset as described above. Ten thousand replication datasets were generated for each combination of conditions.

Bias Assessment
Relative parameter and SE estimation bias of each d_RM (d_RM^= and d_RM^≠) was summarized and assessed using Hoogland and Boomsma's (1998) formulas and criteria. More specifically, relative parameter bias was calculated using the following formula:

B(\hat{\theta}_j) = \frac{\bar{\hat{\theta}}_j - \theta_j}{\theta_j}   (13)

where θ_j represents the jth parameter's true value and \bar{\hat{\theta}}_j is the mean estimate of parameter j averaged across the 10,000 replications per condition. Hoogland and Boomsma recommended considering a parameter's estimate as substantially biased if its relative parameter bias exceeds 0.05 in magnitude. This cutoff means that estimates that differ from their parameter's true value by more than five percent should be considered substantially biased.

Hoogland and Boomsma's (1998) commonly used formulation of relative standard error bias is as follows:

B(s_{\hat{\theta}_j}) = \frac{\bar{s}_{\hat{\theta}_j} - \sigma_{\hat{\theta}_j}}{\sigma_{\hat{\theta}_j}}   (14)

where \bar{s}_{\hat{\theta}_j} is the mean of the SE estimates associated with parameter estimates of θ_j and \sigma_{\hat{\theta}_j} is the empirically true standard error of the distribution of the \hat{\theta}_j s, calculated by computing the standard deviation of each condition's 10,000 \hat{\theta}_j s. Hoogland and Boomsma recommended using a cutoff of magnitude 0.10 indicating substantial relative SE bias. Note that, for conditions in which the true parameter, δ_RM, was zero, simple parameter estimation bias was calculated.

Results
Results are presented in three sections, one for each of the three sample size conditions. Note that relative parameter bias is not calculable if the true parameter value is zero (see Hoogland & Boomsma, 1998); thus, simple bias rather than relative bias is calculated for conditions in which the true δ_RM is zero.

Sample Size = 10: Relative Parameter Bias
Substantial positive relative parameter bias was identified for all non-zero values of δ_RM and ρ. No substantial bias was found in the ρ = 0 conditions. In all cases, the positive bias identified was greater when d_RM^= was used rather than d_RM^≠ (see Table 1). No criterion exists to indicate whether simple bias is substantial or not; however, the simple bias values seem small for the δ_RM = 0 conditions. When d_RM^= was used, the more the ratio of σ_pre:σ_post values diverged from 1:1, the worse the bias. Similarly, the stronger the ρ, the worse the bias for the d_RM^= estimate.

The d_RM^≠ estimator was unaffected by the σ_pre:σ_post and ρ values. However, substantial bias was detected for both d_RM^≠ and d_RM^= even when σ_pre:σ_post was 1:1. Patterns of bias identified for a given σ_pre:σ_post ratio closely mimicked patterns identified for the inverse ratio. Thus, across conditions, results found for the 1.5:0.5 ratio matched those for the 0.5:1.5 ratio. Similarly, results for the 0.8:1.2 ratio conditions matched those for the 1.2:0.8 ratio. This result held across all conditions including the three sample sizes and thus will not be mentioned further. Parameter estimation performance of both the d_RM^≠ and d_RM^= estimators was unaffected by the true δ_RM value (see Table 1). The positive parameter estimation bias of the d_RM^≠ estimator was fairly consistently close to 10% across the n = 10 conditions.
Table 1: Summary of Relative Parameter Estimation Bias by Generating Condition for n = 10 Conditions
(columns are σ_pre:σ_post ratio values; each cell shows d_RM^≠, d_RM^=)

Condition | 1:1 | 0.8:1.2 | 0.5:1.5 | 1.2:0.8 | 1.5:0.5
ρ = 0 | 0.002, 0.003 | -0.003, -0.003 | 0.002, 0.004 | 0.002, 0.002 | 0.002, 0.004
ρ = 0.2 | 0.090, 0.106 | 0.097, 0.130 | 0.096, 0.199 | 0.093, 0.125 | 0.087, 0.190
ρ = 0.5 | 0.092, 0.126 | 0.099, 0.182 | 0.087, 0.355 | 0.086, 0.168 | 0.092, 0.358
ρ = 0.8 | 0.105, 0.165 | 0.097, 0.325 | 0.100, 0.896 | 0.088, 0.311 | 0.089, 0.866
δ_RM = 0 (a) | 0.002, 0.003 | -0.003, -0.003 | 0.002, 0.004 | 0.002, 0.002 | 0.002, 0.004
δ_RM = 0.2 | 0.103, 0.133 | 0.104, 0.194 | 0.091, 0.397 | 0.090, 0.178 | 0.072, 0.360
δ_RM = 0.5 | 0.089, 0.118 | 0.101, 0.190 | 0.094, 0.392 | 0.088, 0.175 | 0.092, 0.390
δ_RM = 0.8 | 0.092, 0.121 | 0.093, 0.181 | 0.093, 0.394 | 0.088, 0.176 | 0.096, 0.399
Overall (b) | 0.095, 0.124 | 0.099, 0.188 | 0.093, 0.394 | 0.089, 0.177 | 0.087, 0.383

Notes: Substantial relative parameter bias values are highlighted in the original table; (a) Mean simple bias is presented for δ_RM = 0 conditions; (b) Overall = mean relative parameter bias across all δ_RM conditions excluding δ_RM = 0 conditions.
Table 2: Summary of Relative Standard Error Estimation Bias by Generating Condition for n = 10 Conditions
(columns are σ_pre:σ_post ratio values; each cell shows d_RM^≠, d_RM^=)

Condition | 1:1 | 0.8:1.2 | 0.5:1.5 | 1.2:0.8 | 1.5:0.5
ρ = 0 | 0.048, 0.032 | 0.046, 0.023 | 0.051, -0.012 | 0.051, 0.027 | 0.049, -0.011
ρ = 0.2 | 0.048, 0.062 | 0.046, 0.039 | 0.051, 0.048 | 0.051, 0.053 | 0.049, 0.047
ρ = 0.5 | 0.051, 0.012 | 0.041, -0.034 | 0.046, -0.147 | 0.046, -0.028 | 0.039, -0.152
ρ = 0.8 | 0.043, -0.013 | 0.048, -0.112 | 0.056, -0.308 | 0.047, -0.110 | 0.042, -0.317
δ_RM = 0 | 0.046, 0.016 | 0.044, -0.031 | 0.043, -0.143 | 0.042, -0.032 | 0.038, -0.145
δ_RM = 0.2 | 0.041, 0.011 | 0.036, -0.039 | 0.049, -0.135 | 0.042, -0.030 | 0.042, -0.142
δ_RM = 0.5 | 0.057, 0.022 | 0.047, -0.025 | 0.049, -0.134 | 0.051, -0.023 | 0.046, -0.129
δ_RM = 0.8 | 0.060, 0.020 | 0.046, -0.027 | 0.060, -0.111 | 0.061, -0.014 | 0.052, -0.120
Overall (a) | 0.051, 0.017 | 0.043, -0.031 | 0.050, -0.131 | 0.049, -0.025 | 0.044, -0.134

Notes: Substantial relative SE bias values are highlighted in the original table; (a) Overall = mean relative SE bias across all δ_RM conditions excluding δ_RM = 0 conditions.
Sample Size = 10: Relative SE Bias
No relative SE bias was found for d_RM^≠ for the n = 10 conditions (see Table 2). For d_RM^=, however, substantial negative bias was identified in certain conditions. Substantial negative bias (i.e., |B(s_θ̂_j)| > 0.10; see Equation 14) was found at the most extreme σ_pre:σ_post values (i.e., when σ_pre:σ_post = 1.5:0.5 and σ_pre:σ_post = 0.5:1.5). This bias occurred for conditions in which ρ = 0.5 or larger, and the magnitude of the bias seemed to be slightly larger for smaller δ_RM (see Table 2). Substantial negative SE estimation bias was also detected for d_RM^= for σ_pre:σ_post = 0.8:1.2 and for σ_pre:σ_post = 1.2:0.8 for the largest ρ condition (i.e., when ρ = 0.8).

Sample Size = 20: Relative Parameter Bias
No substantial parameter bias was identified when d_RM^≠ was used to estimate δ_RM across the n = 20 conditions (see Table 3). Substantial positive relative parameter bias was found when d_RM^= was used to estimate δ_RM; however, the degree of parameter bias was lower for the n = 20 conditions (see Table 3) than was observed for the n = 10 conditions (in Table 1).

No substantial relative parameter bias was found in the ρ = 0 conditions for d_RM^=. With the slightly larger sample size, no substantial bias was detected when the σ_pre:σ_post ratio was 1:1. Otherwise, the pattern of the bias found matched that noted for the n = 10 conditions. The more the value of the σ_pre:σ_post ratio diverged from 1:1 (and for larger ρ values), the more the degree of substantial parameter bias increased. Values of δ_RM did not seem to affect the degree of bias (see Table 3).

Sample Size = 20: Relative SE Bias
The relative SE bias results for the n = 20 conditions (see Table 4) very closely matched those described for the n = 10 conditions (see Table 2). No substantial relative SE bias was found when using d_RM^≠ to estimate δ_RM. For d_RM^=, however, in the most extreme σ_pre:σ_post
Table 3: Summary of Relative Parameter Estimation Bias by Generating Condition for n = 20 Conditions
(columns are σ_pre:σ_post ratio values; each cell shows d_RM^≠, d_RM^=)

Condition | 1:1 | 0.8:1.2 | 0.5:1.5 | 1.2:0.8 | 1.5:0.5
ρ = 0 | -0.001, -0.001 | -0.001, -0.001 | 0.001, 0.002 | 0.002, 0.002 | <0.001, <0.001
ρ = 0.2 | 0.048, 0.053 | 0.038, 0.056 | 0.049, 0.120 | 0.049, 0.067 | 0.038, 0.107
ρ = 0.5 | 0.034, 0.046 | 0.043, 0.097 | 0.036, 0.253 | 0.043, 0.097 | 0.042, 0.258
ρ = 0.8 | 0.038, 0.060 | 0.042, 0.219 | 0.043, 0.724 | 0.039, 0.216 | 0.041, 0.722
δ_RM = 0 (a) | -0.001, -0.001 | -0.001, -0.001 | 0.001, 0.002 | 0.002, 0.002 | <0.001, <0.001
δ_RM = 0.2 | 0.037, 0.047 | 0.031, 0.094 | 0.040, 0.285 | 0.040, 0.103 | 0.037, 0.283
δ_RM = 0.5 | 0.043, 0.053 | 0.045, 0.110 | 0.043, 0.290 | 0.048, 0.112 | 0.044, 0.288
δ_RM = 0.8 | 0.045, 0.055 | 0.040, 0.103 | 0.041, 0.286 | 0.042, 0.105 | 0.042, 0.288
Overall (b) | 0.041, 0.052 | 0.039, 0.102 | 0.041, 0.287 | 0.043, 0.106 | 0.041, 0.287

Notes: Substantial relative parameter bias values are highlighted in the original table; (a) Mean simple bias is presented for δ_RM = 0 conditions; (b) Overall = mean relative parameter bias across all δ_RM conditions excluding δ_RM = 0 conditions.
Table 4: Summary of Relative Standard Error Estimation Bias by Generating Condition for n = 20 Conditions
(columns are σ_pre:σ_post ratio values; each cell shows d_RM^≠, d_RM^=)

Condition | 1:1 | 0.8:1.2 | 0.5:1.5 | 1.2:0.8 | 1.5:0.5
ρ = 0 | 0.020, 0.016 | 0.017, 0.007 | 0.027, -0.006 | 0.030, 0.019 | 0.010, -0.024
ρ = 0.2 | 0.020, 0.018 | 0.017, 0.023 | 0.027, 0.010 | 0.030, 0.016 | 0.010, 0.018
ρ = 0.5 | 0.017, 0.003 | 0.017, -0.034 | 0.018, -0.151 | 0.019, -0.031 | 0.016, -0.150
ρ = 0.8 | 0.023, 0.001 | 0.011, -0.118 | 0.013, -0.330 | 0.018, -0.115 | 0.018, -0.329
δ_RM = 0 | 0.018, 0.008 | 0.020, -0.034 | 0.017, -0.144 | 0.021, -0.034 | 0.010, -0.152
δ_RM = 0.2 | 0.020, 0.010 | 0.010, -0.044 | 0.016, -0.144 | 0.014, -0.039 | 0.012, -0.145
δ_RM = 0.5 | 0.015, 0.003 | 0.013, -0.041 | 0.011, -0.142 | 0.021, -0.033 | 0.019, -0.135
δ_RM = 0.8 | 0.025, 0.011 | 0.025, -0.025 | 0.024, -0.120 | 0.028, -0.026 | 0.021, -0.127
Overall (a) | 0.019, 0.008 | 0.017, -0.036 | 0.017, -0.138 | 0.021, -0.033 | 0.016, -0.140

Notes: Substantial relative SE bias values are highlighted in the original table; (a) Overall = average relative SE estimation bias across δ_RM and ρ conditions.
Table 5: Summary of Relative Parameter Estimation Bias by Generating Condition for n = 60 Conditions
(columns are σ_pre:σ_post ratio values; each cell shows d_RM^≠, d_RM^=)

Condition | 1:1 | 0.8:1.2 | 0.5:1.5 | 1.2:0.8 | 1.5:0.5
ρ = 0 | <0.001, <0.001 | -0.001, -0.001 | <0.001, <0.001 | -0.001, -0.001 | -0.001, -0.001
ρ = 0.2 | 0.015, 0.017 | 0.018, 0.030 | 0.007, 0.061 | 0.010, 0.022 | 0.010, 0.065
ρ = 0.5 | 0.012, 0.016 | 0.010, 0.053 | 0.011, 0.204 | 0.013, 0.055 | 0.007, 0.200
ρ = 0.8 | 0.015, 0.021 | 0.012, 0.165 | 0.010, 0.643 | 0.009, 0.162 | 0.014, 0.647
δ_RM = 0 (a) | <0.001, <0.001 | -0.001, -0.001 | <0.001, <0.001 | -0.001, -0.001 | -0.001, -0.001
δ_RM = 0.2 | 0.014, 0.017 | 0.010, 0.062 | 0.007, 0.229 | 0.009, 0.061 | 0.013, 0.235
δ_RM = 0.5 | 0.016, 0.019 | 0.014, 0.066 | 0.009, 0.229 | 0.012, 0.065 | 0.012, 0.232
δ_RM = 0.8 | 0.013, 0.016 | 0.014, 0.067 | 0.013, 0.235 | 0.010, 0.062 | 0.012, 0.233
Overall (b) | 0.014, 0.017 | 0.013, 0.065 | 0.010, 0.231 | 0.011, 0.063 | 0.012, 0.233

Notes: Substantial relative parameter bias values are highlighted in the original table; (a) Mean simple bias is presented for δ_RM = 0 conditions; (b) Overall = mean relative parameter bias across all δ_RM conditions except for δ_RM = 0 conditions.
Table 6: Summary of Relative Standard Error Estimation Bias by Generating Condition for n = 60 Conditions
(columns are σ_pre:σ_post ratio values; each cell shows d_RM^≠, d_RM^=)

Condition | 1:1 | 0.8:1.2 | 0.5:1.5 | 1.2:0.8 | 1.5:0.5
ρ = 0 | 0.001, 0.001 | 0.009, 0.004 | 0.004, -0.016 | 0.015, 0.009 | -0.003, -0.023
ρ = 0.2 | 0.001, 0.009 | 0.009, 0.003 | 0.004, 0.004 | 0.015, 0.005 | -0.003, 0.010
ρ = 0.5 | 0.001, -0.003 | 0.010, -0.030 | 0.007, -0.148 | 0.001, -0.039 | 0.009, -0.146
ρ = 0.8 | 0.014, 0.007 | 0.006, -0.114 | 0.005, -0.338 | 0.013, -0.109 | 0.005, -0.336
δ_RM = 0 | 0.004, 0.001 | 0.011, -0.036 | 0.004, -0.148 | 0.008, -0.038 | 0.008, -0.142
δ_RM = 0.2 | 0.006, 0.003 | 0.005, -0.041 | 0.005, -0.144 | 0.005, -0.042 | 0.003, -0.147
δ_RM = 0.5 | 0.006, 0.003 | 0.010, -0.035 | 0.012, -0.132 | 0.009, -0.035 | 0.004, -0.139
δ_RM = 0.8 | 0.009, 0.005 | 0.002, -0.040 | <0.001, -0.134 | 0.010, -0.033 | 0.006, -0.129
Overall (a) | 0.006, 0.003 | 0.007, -0.038 | 0.005, -0.140 | 0.008, -0.037 | 0.005, -0.139

Notes: Substantial relative SE bias values are highlighted in the original table; (a) Overall = average relative SE estimation bias across δ_RM and ρ conditions.
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 610-612 1538 – 9472/09/$95.00
Intermediate r Values for Use in the Fleishman Power Method

Julie M. Smith
Wayne State University
Several intermediate r values are calculated at three different correlations for use in the Fleishman Power
Method for generating correlated data from normal and non-normal populations.
Example
where a, b and d are constants and rxy is the To create correlated data pairs (X, Y) at
correlation to which all data will be set. The 0.70 from an exponential distribution (Chi-
formula, when solved, results in the graph of a
square, df = 2) with γ1 = 2 and γ2 = 6, using the
constants from Table 1, equation (1) would be as
follows:
Julie M. Smith is a Ph.D. Candidate in the rxy = r2[(.8263)2+(6)(.8263)(.02271)+
College of Education, Department of (9)(.02271)2+(2)(−.3137)2r2+(6)(.02271)r4]
Educational Evaluation and Research. Email:
ax7955@wayne.edu. rxy = r2[(.68278)+(.11259)+(.004642)+(.19682)r2
+(.00309)r4]
where z1, z2 and z3 are randomly selected standard normal z scores (generated using a random number generator). The data resulting from these equations are not the final correlates, but represent intermediate standard normal variates that will be used to generate the desired correlated data. Thus, x_i and y_i and the constants appropriate to the distribution are next substituted into the Fleishman equation to produce the final correlates as follows:

and for the Y correlate,

χ²_y = (2)(Y) + 2   (7)

The last step is optional, because computed values are accurate for the distribution. It is only necessary to perform this step if it is desirable to have values commonly found in the tables for the distribution of interest, such as χ² (df = 2) in the example.
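Putting the pieces together, the following sketch (Python/NumPy) solves for the intermediate r and generates the final correlates. It assumes the standard cubic relationship between the intermediate and target correlations that equation (1) instantiates, the constants from the example above with a = -c, and a simpler two-variate generation route than the article's three-z-score construction:

```python
import numpy as np

# Fleishman constants for chi-square (df = 2) from the example;
# Y = a + b*Z + c*Z**2 + d*Z**3, with a = -c assumed here.
b, c, d = 0.8263, 0.3137, 0.02271
a = -c

def intermediate_r(target):
    """Solve target = r*(b^2 + 6bd + 9d^2) + 2c^2 r^2 + 6d^2 r^3 for r."""
    coef = [6 * d**2, 2 * c**2, b**2 + 6 * b * d + 9 * d**2, -target]
    roots = np.roots(coef)
    real = roots[np.isreal(roots)].real
    return real[np.abs(real) <= 1][0]

rng = np.random.default_rng(0)
r = intermediate_r(0.70)

n = 200_000
z1 = rng.standard_normal(n)                           # intermediate normals
z2 = r * z1 + np.sqrt(1 - r**2) * rng.standard_normal(n)
x = a + b * z1 + c * z1**2 + d * z1**3                # final correlates
y = a + b * z2 + c * z2**2 + d * z2**3
print(round(r, 4), round(float(np.corrcoef(x, y)[0, 1]), 3))  # r ~ 0.74, corr ~ 0.70
```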
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 613-625 1538 – 9472/09/$95.00
Aggregate Variables for Datasets in Multilevel Analysis

This article examines the creation of contextual aggregate variables from one dataset for use with another dataset in multilevel analysis. The process of generating aggregate variables and methods of assessing the validity of the constructed aggregates are presented, together with the difficulties this approach entails.
administered for six consecutive years, with a new wave starting every three years since the survey's 1993 initiation. The SLID contains variables that may be used to construct numerous interesting and relevant Economic Region (ER) level variables; thus, researchers could use the following procedures to construct ER level variables of their own choosing. For this example, ten aggregate variables on employment and education were constructed; these variables were selected for their potential value to researchers for use in conjunction with other datasets. Table 1 contains the variable names, definitions and the original SLID variables from which they were constructed.

Creation of an Analytic SLID Dataset
First, the variables that were used to create the aggregates (shown in Table 1) were extracted from the SLID together with the appropriate cross-sectional weights and individual and geographical identifiers for each survey year using the SLIDret program. SLID identifies ERs by two separate variables: erres25 and xerres25. The explanation for the presence of two identifiers instead of one is that Statistics Canada amended their ER identification codes in 1999; thus, the SLID contains two sets of ER identification codes. One code refers to the 1991 Census boundaries for all survey years of the SLID (xerres25), and the other refers to the 1991 Census boundaries up to 1999 and to the amended 1999 Census boundaries in subsequent years (erres25). Researchers must decide upon the most appropriate variable to use in any particular research scenario. This will often be determined by the geographical code used in the dataset in which the constructed aggregate variables will be used.
calibrated on Census population totals, it is expected that estimates will match well. The following is a step-by-step guide to comparing aggregate variables.

Choose a Method of Comparison
Two methods of comparison were used in this example. The first involved simply calculating and comparing provincial and national population totals for both SLID datasets for 1996 and 2001. If no similarity existed at this level, it would not be sensible to continue with the comparison and the validity of the aggregate variables would be questionable. The second method of comparison used confidence intervals as a means of statistically assessing how closely the estimates match. This requires similarly defined variables to be created using Census profile data so that aggregates are created from the SLID and the Census profile data at the ER level. Confidence intervals (assuming a Normal distribution) can be created around the SLID estimates and observations made as to whether the population estimates from the Census fall within these confidence intervals for each ER. The confidence level chosen for this example is 95%, but researchers can choose any level they think is suitable. A high number of matches shows the SLID estimates are a good match to true population parameters.

Choose and Generate Demographic Variables and Confidence Intervals
For the provincial and national population totals, weighted sums were calculated in STATA broken out by province. At the ER level, two characteristics were chosen for comparison: gender and age. Twenty-one age and gender breakouts by ER were calculated using the SLID data for 1996 and 2001. In addition to the proportion of females, age breakouts for the whole population and for females only are generated using different age intervals. Using STATA, 95% confidence intervals were created for each SLID estimate.

Recreate Aggregate Variables Using Census Profile Data
Because the Census profile data does not contain ER identification codes, it is first necessary to merge the Census data with the Postal Code Conversion File (PCCF), matching the data by Enumeration Area (EA) for 1996 and Dissemination Area (DA) for 2001. Enumeration areas in the 1996 Census and dissemination areas in the 2001 Census are smaller geographical areas making up various larger Statistics Canada geographical areas, including ERs. The 1999 change in Census boundaries led to a name and definition change from EA to DA (for more information on using the PCCF and the change from EA to DA, see Gonthier, et al., 2006).

For 1996 the EA code is an eight-digit code constructed from provincial, federal and EA identifiers. The provincial code composes the first two digits, the federal code the following three, and the EA code the final three. To construct the eight-digit EA code from its composite parts, the provincial code is multiplied by 1,000,000 and the federal code is multiplied by 1,000, and these numbers are added to the EA code. The Census data is then merged with the 1996 PCCF file using this eight-digit EA identifier.

For 2001 the DA code is an eight-digit code constructed from provincial, census division and DA identifiers. The provincial code composes the first two digits, the census division code the following two, and the DA code the final four. The eight-digit DA code for 2001 is created the same way as the EA code for 1996. Merging results in each record being assigned an ER identification code. Once again, to ensure the production of accurate estimates, data is aggregated to the ER level by first creating a sum of all individuals within ERs with the characteristics of interest. This ensures accurately weighted estimates reflecting the numbers of individuals in ERs. After these sums are created for each ER, proportions are then calculated that correspond to the ten aggregate variables created in the SLID. Table 2 shows the variable names and the 1996 and 2001 Census variables from which they were constructed. A sketch of the key-code construction is given below.
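The merge keys described above reduce to simple arithmetic; a small sketch (Python; the function names and example digits are hypothetical):

```python
def ea_code_1996(province: int, federal: int, ea: int) -> int:
    """Eight-digit 1996 key: 2-digit province, 3-digit federal code,
    3-digit EA, e.g. (35, 123, 456) -> 35123456."""
    return province * 1_000_000 + federal * 1_000 + ea

def da_code_2001(province: int, census_division: int, da: int) -> int:
    """Eight-digit 2001 key: 2-digit province, 2-digit census division,
    4-digit DA."""
    return province * 1_000_000 + census_division * 10_000 + da

assert ea_code_1996(35, 123, 456) == 35123456
assert da_code_2001(35, 12, 3456) == 35123456
```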
Compare SLID and Census Profile Data Estimates
With the SLID education and employment aggregates, population totals, demographic variables, and confidence intervals for these estimates for 1996 and 2001, and similar variables recreated using Census profile data from 1996 and 2001, the comparison was carried out. First, weighted provincial and national population totals were compared by year and by province; results are shown in Tables 3a and 3b.

It is important to note that some variation in the totals is to be expected due to rounding error in the Census. In both tables it was expected that columns 1 and 2 add up to column 3. In 1996 there was a difference of 685, and in 2001 there is a difference of 235. These differences are likely due to rounding error. It would also be expected that column 4 and column 6 would be similar and that column 5 would be less than both of these. In 1996 the total population in the SLID is 271,963 more than the Census total population and in 2001 the total population in the SLID is 1,828,145 below the Census total population; no obvious reason exists to explain this. Even with the minor disparity, population totals in the SLID are close enough to the Census to conclude that the data matches reasonably well.

Second, basic demographics were compared by year and by ER in order to determine the number of Census profile estimates that would fall within the 95% confidence intervals generated around the SLID estimates. Each Census profile estimate falling within the confidence interval was called a match. Table 4 shows the percentage of matches across 66 ERs in 1996 and 73 ERs in 2001.

The proportion of females in the population variable matched perfectly and the age breakouts had a high, but not perfect, percentage of matches. The only variables with suspiciously low numbers of matches were the percentage of individuals aged 15 to 19 and the percentage of females aged 15 to 19. The age breakouts for individuals and females aged 15 to 25 and 20 to 24 showed much better matching. This suggests that the discrepancy is occurring at
the lower end of the age spectrum in the 15 to 19 age range.

Based on observations of similarities in the population totals, gender and age characteristics across the SLID and Census profile data samples, it may be suggested that the SLID and the Census profile data will also be similar across other characteristics, in this case education and employment. To test this, the constructed aggregates were checked for validity in a similar manner. Again, 95% confidence intervals (assuming a Normal distribution) were created around the SLID estimates and observations were made as to whether the population estimates from the Census fell within these confidence intervals for each ER. Table 5 shows the percentage of matches across 66 ERs in 1996 and 73 in 2001.

Given the excellent age and gender match of the data, the low number of matches for the constructed aggregate variables is surprising. Without a clear explanation as to why the variables do not match, the constructed aggregates cannot be trusted as representative and should not be used. However, if explanations can be found for the low matching, then the aggregates may be of some use. An investigation of the data and variable definitions was carried out to identify possible causes for the low number of matches.

Investigation of the data and examination of the documentation highlighted several limitations with the variables chosen for use in both the Census profile data and the SLID. These limitations are very likely the cause of the low number of matches across the aggregate variables. First, the internal consistency of the constructed estimates was investigated. In particular, confirmation was required that the total populations being used in the SLID and in the Census Profile data as the denominator in the proportions calculations were in fact the sum of their composite parts. In both the SLID and the Census Profile data, age and education populations were verified. A check was made of the proportions of individuals aged under 25, 25 to 49, 50 to 74, and 75 and over; these proportions should total 1 as this range of ages encompasses all possible ages in the population.
Table 3a: Provincial and National Totals for SLID and Census Profile Data, 1996
(columns 1-5: 1996 Census; column 6: SLID)

Province | 1. Male Subtotal | 2. Female Subtotal | 3. Total Population | 4. Total Population 15+ | 5. Total Labor Force 15+ | 6. Total Population 15+
10 | 271,740 | 278,575 | 550,420 | 435,985 | 245,165 | 423,747
11 | 65,990 | 68,450 | 134,440 | 103,580 | 70,695 | 100,100
12 | 441,490 | 466,175 | 907,635 | 718,015 | 438,010 | 669,414
13 | 362,490 | 374,665 | 737,255 | 583,550 | 363,055 | 556,031
24 | 3,318,800 | 3,462,665 | 6,781,570 | 5,382,325 | 3,357,080 | 5,394,101
35 | 4,794,345 | 5,011,300 | 9,805,685 | 7,669,850 | 5,084,190 | 7,848,826
46 | 492,640 | 509,980 | 1,002,730 | 769,900 | 511,145 | 782,124
47 | 450,690 | 461,550 | 912,085 | 689,015 | 463,360 | 687,939
48 | 1,225,800 | 1,227,510 | 2,453,330 | 1,864,640 | 1,348,880 | 1,895,376
59 | 1,706,985 | 1,751,340 | 3,458,715 | 2,743,105 | 1,819,185 | 2,874,269
Total | 13,130,970 | 13,612,210 | 26,743,865 | 20,959,965 | 13,700,765 | 21,231,928
Table 3b: Provincial and National Totals for SLID and Census Profile Data, 2001
(columns 1-5: 2001 Census; column 6: SLID)

Province | 1. Male Subtotal | 2. Female Subtotal | 3. Total Population | 4. Total Population 15+ | 5. Total Labor Force 15+ | 6. Total Population 15+
10 | 249,805 | 260,815 | 510,545 | 422,170 | 240,600 | 404,336
11 | 65,450 | 69,145 | 134,530 | 107,940 | 73,570 | 98,323
12 | 437,335 | 466,330 | 903,505 | 739,060 | 450,075 | 681,910
13 | 355,380 | 371,485 | 726,990 | 597,500 | 370,920 | 548,849
24 | 3,521,985 | 3,689,680 | 7,212,255 | 5,923,010 | 3,734,615 | 5,270,975
35 | 5,458,005 | 5,701,920 | 11,159,880 | 8,972,500 | 5,950,800 | 8,426,920
46 | 547,455 | 567,110 | 1,114,400 | 881,395 | 582,590 | 796,246
47 | 478,785 | 494,380 | 973,075 | 766,390 | 509,670 | 691,965
48 | 1,470,895 | 1,473,690 | 2,944,620 | 2,334,465 | 1,678,965 | 2,193,306
59 | 1,908,975 | 1,978,245 | 3,887,305 | 3,183,715 | 2,046,190 | 2,994,323
Total | 14,494,070 | 15,072,800 | 29,567,105 | 23,928,145 | 15,637,995 | 22,100,000
Table 4: Comparison of SLID and Census Profile Aggregate Estimates for Gender and Age Variables

Variable Name | Variable Definition | % of Matches 1996 | % of Matches 2001
female | % population that is female | 100 | 97
pct_15to25 | % population aged 15 to 25 | 71 | 70
pct_25to49 | % population aged 25 to 49 | 79 | 78
pct_50to74 | % population aged 50 to 74 | 70 | 84
pct_75over | % population aged 75 & over | 62 | 78
pct_50to64 | % population aged 50 to 64 | 68 | 79
pct_65over | % population aged 65 & over | 70 | 74
pct_15to19 | % population aged 15 to 19 | 44 | 52
pct_20to24 | % population aged 20 to 24 | 80 | 86
pct_40to44 | % population aged 40 to 44 | 90 | 88
pct_75to79 | % population aged 75 to 79 | 74 | 89
pct_15to25_f | % female population aged 15 to 25 | 68 | 74
pct_25to49_f | % female population aged 25 to 49 | 85 | 88
pct_50to74_f | % female population aged 50 to 74 | 76 | 86
pct_75over_f | % female population aged 75 & over | 73 | 88
pct_50to64_f | % female population aged 50 to 64 | 79 | 86
pct_65over_f | % female population aged 65 & over | 77 | 82
pct_15to19_f | % female population aged 15 to 19 | 70 | 66
pct_20to24_f | % female population aged 20 to 24 | 80 | 92
pct_40to44_f | % female population aged 40 to 44 | 85 | 92
pct_75to79_f | % female population aged 75 to 79 | 83 | 92
Table 5: Comparison of SLID and Census Profile Aggregate Estimates for Employment and Education Variables (variable name, variable description, and % of matches in 1996 and 2001)
It was found that the 15 to 19 age category produced low numbers of matches. Verification was made that the difference between the proportion of individuals aged under 25 and the proportion of individuals aged 15 to 19 added to the proportion of individuals aged 20 to 24 equals 0. The same verification was made for the female proportions. The results were either extremely close to or exactly 0 or 1 (see Appendix 1 for details).

Additional checks were carried out for the education variables in the Census Profile data for both 1996 and 2001. The difference between the proportion of individuals with postsecondary certificates and the proportion of individuals with university certificates added to the proportion of individuals with non-university certificates was checked, with the expectation that, if accurate, it would equal 0. Results were either 0 or less than 0.0001 above or below 0. The proportion of individuals with less than a high school education was added to the proportion of individuals with at least a high school education, with the expectation that the totals would equal 1. This was not the case: most totals ranged from 0.4 to 0.6. Referring to the documentation and exploring the data illuminated the reason: the 'Total population 15 years and over by highest level of schooling' is a poorly defined population. The Census Profile data contains numerous population totals broken out by different characteristics. For example, the education variables include 'Total population 15 years and over by highest level of schooling', the marital status variables include 'Total population 15 years and over by marital status', and the labor force variables include 'Total population 15 years and over by labor force activity'. It was expected that summing the number of individuals aged 15 years and over using the age breakouts in the Census Profile data would yield the same population as these 'Total population 15 years and over by…' variables. This was checked for the 'Total population 15 years and over by highest level of schooling' variable.
The difference between these two totals was more than can be explained by rounding error in the Census Profile data. Checking this variable against other variables that call themselves 'Total population 15 years and over by…', a sizeable and apparently unexplainable difference was found. Another drawback with the 2001 education aggregate variables is that in the 2001 Census Profile data, education data is supplied for individuals aged 20 and over (Statistics Canada, 1999). By contrast, SLID education data was available for individuals aged 15 and over. This, added to the other problems described, explains why the education aggregate variables do not match well.

Having identified an explanation for the low matching across the education variables, similar explanations were sought for the employment variables. Three main limitations in both the Census Profile and SLID documentation, regarding ambiguous definitions of populations and variables, were found that could explain the low number of matches across the employment variables. First, there was some ambiguity over the definition of the labor force. SLID defines the labor force as persons aged 16 to 69 who were employed during the survey reference period. Therefore, the employment variables used in the construction of the aggregate variables refer only to these individuals. The Census Profile data, on the other hand, defines the labor force as employed individuals aged 15 and over. Although this may cause some disparity, it is unlikely to be the only cause of the low number of matches. Further investigation revealed a more severe limitation regarding the classification of individual labor force status (Statistics Canada, 1997; Statistics Canada, 1999).

One of the strengths of the SLID as a longitudinal survey is that it asks for information on every job an individual has held during the reference year, rather than focusing on the job at the time of the survey. Regarding class of worker (paid worker, employee, self-employed, etc.), individuals can therefore hold several statuses; in addition, they are asked to report their status for each month, so that they have 12 statuses over the year. By contrast, the Census Profile data only reports the class of worker for individuals at the time the Census is carried out. For the construction of the non_employee aggregate variable, it was necessary that individuals fall into only one class of worker category. However, the SLID uses the main job concept to categorize individuals into the class of worker variable clwrkr1. Main job is typically the job with the longest duration, greatest number of hours worked over the year, and most usual hours worked in a given month (Statistics Canada, 1997; Statistics Canada, 2007). Thus, the difference in the reference periods of the samples and the SLID's focus on main job is a possible explanation for the lower matching rates.

The Census Profile data has its own ambiguities around the class of worker variable. In the Census Profile data on class of worker there is a category defined as Class of worker - Not Applicable (Statistics Canada, 1999). The documentation does not explain who this group consists of or what characteristics of individuals in this category make class of worker not applicable to them. In an effort to take this into account, the non_employee aggregate variable was constructed using an all-classes-of-worker variable as the denominator (a variable that does not include the class of worker - not applicable individuals). This was used in place of the variable total labor force 15 years and over by class of worker, which did include those individuals. Although this avoids using unclear population definitions as a denominator, it does not help explain where that category of individuals should most accurately be included in the class of worker categorization. These problems may explain why the non_employee variables are not matching well.

Finally, there was a difference in the definitions of the Census Profile data and SLID managerial occupation variables that may render them incomparable. In the Census Profile data, individuals are asked to explain the type of job they have and their main responsibilities, and from this information they are coded into occupation classifications. This classification includes a section on management occupations that was used to produce the proportion of individuals with occupations perceived to be managerial variable (Statistics Canada, 1999). In the SLID, on the other hand, individuals are asked if they perceive their job to be managerial
(Statistics Canada, 1997). The self-identification involved here suggests that this variable is likely to be largely inconsistent: what defines a job as managerial is not clearly established. Individuals may identify themselves as having a managerial job when in fact they do not. This could explain why the pct_mgt variable does not show a high number of matches.

Conclusion
The investigations and results are outlined here as a precaution to researchers wishing to create aggregate variables or use the SLID aggregate variables created in this study. The construction and comparison of aggregate variables should not be undertaken without caution. Despite their limitations, it is hoped that the constructed SLID aggregates could be of some use to researchers. The following points serve as a set of cautions to those wishing to use this approach, based on difficulties that might be encountered in aggregate variable construction and comparison.

Internal Consistency
When constructing aggregate variables it is important that the variables are internally coherent. For example, imagine creating two aggregate education variables at the EA level: one is the proportion of individuals with less than a high school education, the other is the proportion of individuals with at least a high school education. If a researcher added the two proportions together across all EAs, all totals should equal 1; if they do not, further investigation would be required to uncover the reasons why.

Target Population
When creating and comparing aggregate variables it is important to know the target population of the variable. Some variables apply to individuals over a certain age, some apply to individuals who only answered positively to survey questions, and some apply to all respondents. This becomes more important if researchers wish to check the validity of their constructed aggregates by comparing the sample characteristics with the Census profile data characteristics. If constructing proportions of individuals with certain characteristics, it is important that the denominator be the same in both variables. The Census profile data documentation clearly defines its target population for each variable, but it can be unclear how individuals were included. For example, in the 1996 Census profile data the 'Total population 15 years and over by highest level of schooling' was not the same as 'All individuals aged 15 years and over'. In some cases, the target populations for the same variable in the 1996 and 2001 Census profile data were different. For example, in the 1996 Census profile data, the education data was available for individuals aged 15 and over; in the 2001 Census profile data, it was supplied for individuals aged 20 and over (Statistics Canada, 1999).

Definitions
It is important to understand how variables are defined in order to construct useful aggregate variables that are as accurate as possible. What a researcher may consider to be a standard classification may in fact differ across datasets. In the example used in this article, it was found that the definition of labor force was not clear. In SLID, labor force is defined as persons aged 16 to 69 who were employed during the survey reference period. In the Census profile data, the labor force is defined as employed individuals aged 15 and over. This difference made comparing the Census profile and SLID aggregate employment variables inappropriate. Variable definitions may also have unexplained ambiguities that must be taken into account. For example, in the Census profile data there were ambiguities with the class of worker variable, which made comparison difficult.

Survey Design
The way in which surveys are designed can make constructing and comparing aggregate variables problematic. One of the strengths of the SLID as a longitudinal survey is that it asks for information on every job an individual has held during the reference year rather than focusing on the job at the time the survey is carried out. The result is that many records may exist for one individual. By contrast, the Census profile data holds one record per individual. Researchers must be clear on the survey design and the number of records per individual. If several records exist for each individual, some rationale must be used to select the most suitable record.
Classification
It is important to understand how variables have been coded into categories and whether individuals self-identify for certain classifications. This has implications for category definitions and how comparable they are across datasets. In the example provided, a difference was identified between the Census profile data and SLID in how managerial occupations are defined. The difference between coding by self-identification and coding by an external classifier is an important one that could lead to inconsistent definitions.

This article outlined the importance of aggregate level variables for use in multilevel analysis and introduced the idea of generating aggregate level variables in one dataset for use across other datasets. Generating and comparing aggregate variables was described using an example generating employment and education aggregate variables in the SLID for the 1996 and 2001 cross-sectional samples at the ER level and comparing them to similar estimates constructed using the 1996 and 2001 Census profile data. The difficulties encountered resulted in a set of cautions for researchers wishing to use this approach. As a whole, this article may serve as a guide to researchers in the generation and comparison of these or similar aggregate variables and also emphasizes the precautions that must be taken when using this approach.

References
Gonthier, D., Hotton, T., Cook, C., & Wilkins, R. (2006). Merging area-level census data with survey data in Statistics Canada research data centres. The Research Data Centres Information and Technical Bulletin, 3(1), 21-39.
Statistics Canada. (1997). Survey of labour and income dynamics: Microdata user's guide. Dynamics of Labour and Income. Catalogue No. 75M0001GPE. Ottawa: Ministry of Industry.
Statistics Canada. (1999). 1996 Census dictionary, final edition reference. Catalogue No. 92-351-UIE. Ottawa: Ministry of Industry.
Statistics Canada. (2007). Guide to the Labour Force Survey. Dynamics of Labour and Income. Labour Statistics Division. Catalogue No. 71-543-GIE. Ottawa: Ministry of Industry.
Appendix
Tables A1 and A2 show the age and education verifications by ER for both the Census and the SLID for
1996 and 2001. The variable definitions are as follows:
SLIDage1: ‘pct_15to25’ + ‘pct_25to49’ + ‘pct_50to74’ + ‘pct_75over’
SLIDage2: ‘pct_15to25’ + ‘pct_25to49’ + ‘pct_50to64’ + ‘pct_65over’
SLIDage3: Difference between ‘pct_15to25’ and (‘pct_15to19’ + ‘pct_20to24’)
SLIDage4: SLIDage1 for females
SLIDage5: SLIDage2 for females
SLIDage6: SLIDage3 for females
Censage1: ‘pct_15to25’ + ‘pct_25to49’ + ‘pct_50to74’ + ‘pct_75over’
Censage2: ‘pct_15to25’ + ‘pct_25to49’ + ‘pct_50to64’ + ‘pct_65over’
Censage3: Difference between ‘pct_15to25’ and (‘pct_15to19’ + ‘pct_20to24’)
Censage4: Censage1 for females
Censage5: Censage2 for females
Censage6: Censage3 for females
Censedu1: 'Less than high school' + 'at least high school'
Censedu2: Difference between 'postsecondary certificate' and ('university certificate' + 'non-university certificate')
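The verification variables above can be computed directly from the constructed aggregates. A minimal sketch (Python with pandas; the single illustrative row below is made up, not taken from the study's data):

import pandas as pd

df = pd.DataFrame({
    "pct_15to25": [0.18], "pct_25to49": [0.47],
    "pct_50to74": [0.28], "pct_75over": [0.07],
    "pct_50to64": [0.21], "pct_65over": [0.14],
    "pct_15to19": [0.08], "pct_20to24": [0.10],
})

df["SLIDage1"] = df[["pct_15to25", "pct_25to49",
                     "pct_50to74", "pct_75over"]].sum(axis=1)  # expect 1
df["SLIDage2"] = df[["pct_15to25", "pct_25to49",
                     "pct_50to64", "pct_65over"]].sum(axis=1)  # expect 1
df["SLIDage3"] = df["pct_15to25"] - (df["pct_15to19"]
                                     + df["pct_20to24"])       # expect 0
print(df[["SLIDage1", "SLIDage2", "SLIDage3"]])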
Table A1: 1996 Age and Education Verifications by ER for the Census and SLID
Table A2: 2001 Age and Education Verifications by ER for the Census and SLID
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 626-631 1538 – 9472/09/$95.00

MARKOV MODELING OF BREAST CANCER
CONG & TSOKOS
Previous work with respect to the treatments and relapse time for breast cancer patients is extended by
applying a Markov chain to model three different types of breast cancer patients: alive without ever
having relapse, alive with relapse, and deceased. It is shown that combined treatment of tamoxifen and
radiation is more effective than single treatment of tamoxifen in preventing the recurrence of breast
cancer. However, if the patient has already relapsed from breast cancer, single treatment of tamoxifen
would be more appropriate with respect to survival time after relapse. Transition probabilities between
three stages during different time periods (2-year, 4-year, 5-year, and 10-year) are also calculated to provide information on how likely one stage is to move to another within a specific time period.
Key words: Markov chain, breast cancer, relapse time, tamoxifen and radiation.
remaining 383 received tamoxifen (Tam) only. The last follow-up was conducted in the summer of 2002. As shown in Figure 2, only those 641 patients enrolled at the Princess Margaret Hospital are included: 320 and 321 in the RT+Tam and Tam treatment groups, respectively.

Figure 2: Breast Cancer Data (641 total patients: 320 RT+Tam, 321 Tam)

The transitions between the stages at different times satisfy the Markov property: the conditional probability of the future one-step event given the entire past of the process depends only on the present stage of the process. In other words, the one-step future stage depends only on the present stage:

$$P(X_{t+1} = x_{t+1} \mid X_t = x_t, X_{t-1} = x_{t-1}, \ldots, X_1 = x_1) = P(X_{t+1} = x_{t+1} \mid X_t = x_t). \tag{1}$$
where

$$Q = \begin{pmatrix} q_{11} & q_{12} & \cdots & q_{1s} \\ q_{21} & q_{22} & \cdots & q_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ q_{s1} & q_{s2} & \cdots & q_{ss} \end{pmatrix}, \tag{6}$$

and $q_{ij}$ denotes the transition intensity from stage $i$ to stage $j$.

The exponential of a matrix $A$ is defined by

$$\operatorname{Exp}(A) = I + A + A^2/2! + A^3/3! + \cdots, \tag{7}$$

where each summand in the series is a matrix product. In this manner, once the intensity matrix is given, the transition probabilities can be calculated as shown above.

Next, the intensity matrix and the transition probabilities matrix can be obtained by maximizing the likelihood $L(Q)$, which is a function of $Q$. Consider an individual consisting of a series of times $(t_1, t_2, \ldots, t_n)$ and corresponding stages $(x_1, x_2, \ldots, x_n)$. More specifically, consider a pair of successive stages observed to be $i$ and $j$ at times $t_i$ and $t_j$. Three scenarios are proposed and considered here.

Scenario 1
If the information for the individual is obtained at arbitrary observation times (the exact time of the transition of stages is unknown), the contribution to the likelihood from this pair of stages is:

$$L_{ij} = p_{ij}(t_j - t_i). \tag{8}$$

Scenario 2
If the exact times of transitions between different stages are recorded and there is no transition between the observation times, the contribution to the likelihood from this pair of stages is:

$$L_{ij} = p_{ij}(t_j - t_i)\, q_{ij}. \tag{9}$$

Scenario 3
If the time of death is known, so that $j = \text{death}$, but the stage at the instant before death is unknown, denoted by $k$ ($k$ could be any possible stage between stage $i$ and death), the contribution to the likelihood function from this pair of stages is:

$$L_{ij} = \sum_{k \neq j} p_{ik}(t_j - t_i)\, q_{kj}. \tag{10}$$

Results
The breast cancer patients were divided into two groups, RT+Tam and Tam, based on the different treatments they received. For those patients who received the combined treatment, 26 patients experienced relapse, 13 patients died without recurrence of breast cancer during the entire period of the study, and 14 died after recurrence of breast cancer. For the patients in the Tam group, 51 patients experienced relapse, 10 died without recurrence of breast cancer, and 13 died after recurrence of breast cancer.

As can be observed from the transition intensity matrices for both groups, RT+Tam and Tam, as shown in Tables 1 and 2, patients who received the single treatment have a higher transition intensity from Stage 1 to Stage 2; thus, they are more likely to have breast cancer recurrence, and the probability of that happening in the Tam group is higher than in the RT+Tam group. For those patients who died without relapse, there is no significant difference between the two treatments, as illustrated by the intensity from Stage 1 to Stage 3.

Combined treatment is also more effective than a single treatment with respect to the possibility of death without relapse, as can be observed from the transition intensity from Stage 1 to Stage 3. However, for those who already experienced relapse of breast cancer, patients who received the combined treatment are more likely to die than those who received a single treatment. Therefore, the combined treatment should be chosen over the single treatment to avoid recurrence, but for those patients who already had a breast cancer relapse, it would be advisable to choose a single treatment to extend the time from recurrence to death.
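To make the link between the estimated intensity matrix and the reported transition matrices concrete, the following is a minimal sketch (Python with NumPy/SciPy; the Q entries are hypothetical, not the fitted intensities from Tables 1 and 2) of the calculation P(t) = Exp(Qt) that produces t-year transition matrices such as those in Tables 5a through 8b:

import numpy as np
from scipy.linalg import expm

# Illustrative intensity matrix for the three stages (1 = alive without
# relapse, 2 = alive with relapse, 3 = deceased). Rows sum to zero and
# stage 3 is absorbing. Values are made up for demonstration only.
Q = np.array([[-0.05,  0.03,  0.02],
              [ 0.00, -0.30,  0.30],
              [ 0.00,  0.00,  0.00]])

for t in (2, 4, 5, 10):
    P = expm(Q * t)   # t-year transition probability matrix
    print(f"P({t}):")
    print(P.round(4))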
Figure: Fitted survival probability versus time (0 to 8), from state 1 and from state 2.
Figure: Fitted survival probability versus time (0 to 10), from state 1 and from state 2.
Table 5a: 2-year transition matrix for RT+Tam Table 5b: 2-year transition matrix for Tam
Stage 1 Stage 2 Stage 3 Stage 1 Stage 2 Stage 3
Stage 1 0.9550 0.0285 0.0165 Stage 1 0.9247 0.0623 0.0130
Stage 2 0 0.5408 0.4592 Stage 2 0 0.8431 0.1569
Stage 3 0 0 0 Stage 3 0 0 0
Table 6a: 4-year transition matrix for RT+Tam Table 6b: 4-year transition matrix for Tam
Stage 1 Stage 2 Stage 3 Stage 1 Stage 2 Stage 3
Stage 1 0.9121 0.0426 0.0453 Stage 1 0.8550 0.1102 0.0348
Stage 2 0 0.2925 0.7075 Stage 2 0 0.7108 0.2892
Stage 3 0 0 0 Stage 3 0 0 0
Table 7a: 5-year transition matrix for RT+Tam Table 7b: 5-year transition matrix for Tam
Stage 1 Stage 2 Stage 3 Stage 1 Stage 2 Stage 3
Stage 1 0.8913 0.0466 0.0621 Stage 1 0.8221 0.1295 0.0484
Stage 2 0 0.2151 0.7849 Stage 2 0 0.6527 0.3473
Stage 3 0 0 0 Stage 3 0 0 0
Table 8a: 10-year transition matrix for RT+Tam Table 8b: 10-year transition matrix for Tam
Stage 1 Stage 2 Stage 3 Stage 1 Stage 2 Stage 3
Stage 1 0.7945 0.0515 0.1540 Stage 1 0.6759 0.1910 0.1331
Stage 2 0 0.0463 0.9537 Stage 2 0 0.4260 0.5740
Stage 3 0 0 0 Stage 3 0 0 0
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 632-645 1538 – 9472/09/$95.00
FITTING PO MODELS USING STATA, SAS & SPSS

Xing Liu
Eastern Connecticut State University
Researchers have a variety of options when choosing statistical software packages that can perform
ordinal logistic regression analyses. However, statistical software, such as Stata, SAS, and SPSS, may use
different techniques to estimate the parameters. The purpose of this article is to (1) illustrate the use of
Stata, SAS and SPSS to fit proportional odds models using educational data; and (2) compare the features
and results for fitting the proportional odds model using Stata OLOGIT, SAS PROC LOGISTIC
(ascending and descending), and SPSS PLUM. The assumption of proportional odds was tested, and the results of the fitted models were interpreted.
Key words: Proportional Odds Models, Ordinal logistic regression, Stata, SAS, SPSS, Comparison.
$$\ln\!\left[\frac{\pi_j(\mathbf{x})}{1-\pi_j(\mathbf{x})}\right] = \alpha_j + (-\beta_1 X_1 - \beta_2 X_2 - \cdots - \beta_p X_p), \tag{3}$$

where πj(x) = π(Y ≤ j | x1, x2, …, xp), which is the probability of being at or below category j given a set of predictors, j = 1, 2, …, J−1; the αj are the cut points, and β1, β2, …, βp are logit coefficients. This is the form of a Proportional Odds (PO) model because the odds ratio of any predictor is assumed to be constant across all categories. Similar to logistic regression, in the proportional odds model we work with the logit, or the natural log of the odds. To estimate the ln(odds) of being at or below the jth category, the PO model can be rewritten as

$$\operatorname{logit}[\pi(Y \le j \mid x_1, x_2, \ldots, x_p)] = \alpha_j - (\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p). \tag{4}$$

Thus, this model predicts cumulative logits across the J−1 response categories. By transforming the cumulative logits, we can obtain the estimated cumulative odds as well as the cumulative probabilities of being at or below the jth category.

SAS uses a different ordinal logit model from Stata for estimating the parameters. For SAS PROC LOGISTIC (the ascending option), the ordinal logit model has the following form:

$$\operatorname{logit}[\pi(Y \le j \mid x_1, x_2, \ldots, x_p)] = \alpha_j + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p; \tag{5}$$

for PROC LOGISTIC with the descending option, the model is

$$\ln\!\left[\frac{\pi(Y \ge j \mid x_1, x_2, \ldots, x_p)}{\pi(Y < j \mid x_1, x_2, \ldots, x_p)}\right] = \alpha_j + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p, \tag{6}$$

where in both equations the αj are the intercepts and β1, β2, …, βp are logit coefficients.

SPSS PLUM (Polytomous Universal Model) is an extension of the generalized linear model for ordinal response data. It can provide five types of link functions, including logit, probit, complementary log-log, cauchit and negative log-log. As in Stata, the ordinal logit model for SPSS PLUM is also based on the latent continuous outcome variable, and it takes the same form:

$$\operatorname{logit}[\pi(Y \le j \mid x_1, x_2, \ldots, x_p)] = \alpha_j - (\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p), \tag{7}$$

where the αj are the thresholds and β1, β2, …, βp are logit coefficients; j = 1, 2, …, J−1.

Compared to both Stata and SPSS, SAS (ascending and descending) does not negate the signs before the logit coefficients in the equations, because the SAS Logistic procedure (PROC LOGISTIC) is used to model both dichotomous and ordinal categorical dependent variables, and the signs before the coefficients in the ordinal logit model are kept consistent with those in the binary logistic regression model.
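These sign conventions can be checked numerically. The sketch below (Python/NumPy; the intercepts and coefficient are illustrative values, not estimates from this study) confirms that the Stata/SPSS form αj − βx and the SAS ascending form αj + bx with b = −β imply identical cumulative probabilities P(Y ≤ j):

import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

alpha = np.array([-1.0, 0.5])   # cut points / intercepts (illustrative)
beta = 0.8                      # logit coefficient (illustrative)
x = 2.0                         # a predictor value

p_stata = logistic(alpha - beta * x)    # Stata/SPSS: alpha_j - beta * x
p_sas = logistic(alpha + (-beta) * x)   # SAS ascending reports b = -beta
assert np.allclose(p_stata, p_sas)
print(p_stata)   # the same cumulative probabilities under both forms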
Although the signs in the equations are positive, SAS internally changes the signs of the estimated intercepts and coefficients according to the ordering of the dependent variable (with the ascending or descending option).

Methodology
Sample
The data were collected from teachers at three middle schools and a teacher's training school in Taizhou City, Jiangsu Province, China, using a survey instrument named Teachers' Perceptions of Grading Practices (TPGP) (Liu, 2004; Liu, O'Connell & McCoach, 2006). A total of 147 teachers responded to the survey, with a response rate of 73.5%. The outcome variable of interest is teachers' teaching experience, which is an ordinal categorical variable with 1 = low, 2 = medium and 3 = high. Explanatory variables included gender (female = 1; male = 2) and a set of scale scores from the TPGP survey instrument. The instrument included five scales measuring the importance of grading, the usefulness of grading, student effort influencing grading, student ability influencing grading, and teachers' grading habits. Composite scale scores were created by taking a mean of all the items for each scale. Table 1 displays the descriptive statistics for these independent variables.

The proportional odds model was first fitted with a single explanatory variable using Stata (V. 9.2) OLOGIT. Afterwards, the full model was fitted with all six explanatory variables. The assumption of proportional odds for both models was examined using the Brant test. Additional Stata subcommands demonstrated here included FITSTAT and LISTCOEF of Stata SPost (Long & Freese, 2006), used for the analysis of post-estimation for the models. The results of fit statistics, cut points, logit coefficients and cumulative odds of the independent variables for both models were interpreted and discussed. The same model was fit using SAS (V. 9.1.3) (ascending and descending) and SPSS (V. 13.0), and the similarities and differences of the results using all three programs were compared.
Table 1: Descriptive Statistics
                      Teaching Experience
                      1         2         3         Total
n                     70        45        32        147
%                     47.6%     30.6%     21.8%     100%
% Gender (Female)     74.3%     66.7%     50%       66.7%
------------------------------------------------------------------------------
teaching | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
gender | .7587563 .3310069 2.29 0.022 .1099947 1.407518
-------------+----------------------------------------------------------------
/cut1 | .9043487 .4678928 -.0127044 1.821402
/cut2 | 2.320024 .5037074 1.332775 3.307272
------------------------------------------------------------------------------
y>1 y>2
gender .66621777 .91021169
_cons -.78882009 -2.5443422
The Proportional Odds Model can also estimate the ln(odds) of being at or beyond category j, given a set of predictors. Again, these ln(odds) can be transformed into the cumulative odds and cumulative probabilities as well. For example, the cumulative probability of a teacher's teaching experience can be estimated at or beyond category 3, P(Y ≥ 3), which is the complementary probability when Y ≤ 2; at or beyond category 2, P(Y ≥ 2); and P(Y ≥ 1), which equals 1.

In Stata, when estimating the odds of being beyond category j (or at or beyond j+1), the sign of the cut points needs to be reversed while their magnitude remains unchanged, because the cut points were estimated from the right to the left of the latent variable, Y*, that is, from the direction in which Y = 3 approaches Y = 1. Therefore, the two cut points from right to left become −2.32 and −.904. When the predictor is dichotomous, a positive sign of the logit coefficient indicates that it is more likely for the group (x = 1) to be at or beyond a particular category than for the reference group (x = 0). When the predictor is continuous, a positive coefficient indicates that when the value of the predictor variable increases, the probability of being at or beyond a particular category increases.

Using the Stata syntax listcoef, the odds of being at or beyond a particular category, 2.136, can be obtained; this value was constant across all cumulative categories. It also indicated that male teachers had 2.136 times the odds of female teachers of being at or beyond any category, i.e., male teachers were more likely than female teachers to be at or beyond a particular category. Figure 4 displays the results of Stata listcoef. Adding the option percent after listcoef, the percentage change in the odds of being at or beyond a particular category can be obtained when the predictor, gender, goes from males (x = 2) to females (x = 1).

Proportional Odds Model with Six Explanatory Variables
Next, a proportional odds model was fit with six explanatory variables, which is referred to as the Full Model. Figure 5 displays the results of fitting the full model with six explanatory variables.

Before interpreting the results of the full model, the assumption of proportional odds was first examined. The Stata brant command provides the results of the Brant test of the parallel regression (proportional odds) assumption for the full model with six predictors, as well as tests for each independent variable. It also provides the estimated coefficients from the J−1 (here, two) separate binary logistic regression models. The data are dichotomized according to the cumulative probability pattern so that each logistic
----------------------------------------------------------------------
teaching | b z P>|z| e^b e^bStdX SDofX
-------------+--------------------------------------------------------
gender | 0.75876 2.292 0.022 2.1356 1.4318 0.4730
----------------------------------------------------------------------
b = raw coefficient
z = z-score for test of b=0
P>|z| = p-value for z-test
e^b = exp(b) = factor change in odds for unit increase in X
e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
SDofX = standard deviation of X
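The reversal described above can be illustrated with a short sketch (Python/NumPy), using the cut points (.904 and 2.320) and the gender coefficient (β = .759) estimated earlier; reversing the sign of the cut points converts logits of being at or below a category into logits of being beyond it:

import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

cuts = np.array([0.904, 2.320])   # /cut1 and /cut2 from the Stata output
beta = 0.759                      # gender coefficient

for x, label in ((1, "female"), (2, "male")):
    p_at_or_below = logistic(cuts - beta * x)   # P(Y <= j | x)
    p_beyond = logistic(-cuts + beta * x)       # P(Y > j | x)
    print(label, p_at_or_below.round(3), p_beyond.round(3))

print(np.exp(beta))   # ~2.136, constant across cumulative categories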
regression model estimates the probability of being at or beyond teaching experience level j. For the omnibus Brant test, χ²(6) = 8.10, p = .230, indicating that the proportional odds assumption for the full model was upheld. Examining the Brant tests for each individual independent variable indicated that the assumption of parallel regression (proportional odds) was upheld for gender, importance, effort, ability and habits. For usefulness, the Brant test gave χ²(1) = 4.03, p = .045, which is very close to .05; therefore, it may also be concluded that the PO assumption for this variable is nearly upheld. Checking the estimated coefficients for each independent variable across the two binary logistic regression models shows that the logit coefficients for all the variables were similar across the two binary logistic models, supporting the results of the Brant test of the proportional odds assumption.

The log likelihood ratio Chi-Square test, LR χ²(6) = 13.738, p = .033, indicated that the full model with six predictors provided a better fit than the null model with no independent variables in predicting the cumulative probability for teaching experience. The likelihood ratio R²_L = .045 was much larger than that of the gender-only model, but still small, suggesting that the relationship between the response variable, teaching experience, and the six predictors was still weak. Compared with the gender-only model, all R² statistics of the full model show improvement (see Figure 7).
y>1 y>2
gender .74115294 .86086025
importance .64416122 .46874536
usefulness -.94566294 -.19259753
effort .09533898 -.03621639
ability .26862373 .68349765
habits .48959286 -.02795948
_cons -2.7097459 -5.7522624
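A minimal sketch of the dichotomization underlying the Brant test (Python with statsmodels; the simulated data frame and variable names are hypothetical stand-ins for the TPGP data): fit separate binary logits for Y > 1 and Y > 2 and compare the slopes, which the Brant test compares formally.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 147
df = pd.DataFrame({"gender": rng.integers(1, 3, n)})   # hypothetical data
df["teaching"] = rng.integers(1, 4, n)                 # ordinal: 1, 2, 3

X = sm.add_constant(df[["gender"]].astype(float))
for j in (1, 2):
    y = (df["teaching"] > j).astype(int)   # dichotomize at category j
    fit = sm.Logit(y, X).fit(disp=0)
    print(f"y>{j}: slope = {fit.params['gender']:.3f}")
# Under proportional odds the slopes should be similar across j,
# which is what the Brant test evaluates formally.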
The Stata listcoef command (Figure 8) produced more detailed results of logit coefficients and cumulative odds (exponentiated coefficients). For the proportional odds model, interpretation of the cumulative odds is independent of the ancillary parameters (cut points) because they are constant across all levels of the response variable.

The effects of the independent variables can be interpreted in several ways, including how they contribute to the odds and the probabilities of being at or beyond a particular category. They can also be interpreted as how these variables contribute to the odds of being at or below a particular category, if the sign is reversed before the estimated logit coefficients and corresponding cumulative odds are computed. In terms of odds ratios, male teachers had 2.241 times the odds of female teachers of being at or beyond a particular category (OR = 2.241), after controlling for the effects of the other predictors in the model. The usefulness of grading, with a corresponding OR significantly less than 1.0, had significant negative effects in the model. These cumulative odds are associated with a teacher being in lower teaching experience categories rather than in higher categories. For a one unit increase in the usefulness of grading, the odds of being in higher teaching experience categories versus lower categories were multiplied by .53 (i.e., roughly halved), after controlling for the effects of the other variables. However, variables whose corresponding ORs are significantly greater than 1.0 have significant positive effects on the response variable in the model. For example, the importance of grading (OR = 1.778) had a positive effect on teachers being in higher teaching experience categories. For a one unit increase in the importance of grading, the odds of being in higher teaching experience categories versus lower categories were 1.778 times greater, given that the effects of the other predictors are held constant. Variables such as student ability and teachers' grading habits, whose corresponding ORs were greater than 1.0 but not statistically significant, had positive effects on the response variable, but these effects may be due to chance and need further investigation. Independent variables with ORs close to 1.0 have no effect on the response variable. For example, student effort influencing grading was not associated with teaching experience in this model (OR = 1.0266, p = .946).

Comparison of Results of a Single-Variable PO Model Using Stata, SAS, and SPSS
Table 2 shows a comparison of the results for Stata OLOGIT with results from SAS PROC LOGISTIC with the ascending and descending options, and SPSS PLUM. The similarities and differences between these results should be noted; otherwise, it could be misleading to interpret the results in the same way, disregarding their different parameterizations. In estimating proportional odds models, Stata sets the intercept to 0 and estimates the cut points, while SAS ascending
Figure 8: Results of Logit Coefficient, Cumulative Odds, and Percentage Change in Odds
. listcoef, help
----------------------------------------------------------------------
teaching | b z P>|z| e^b e^bStdX SDofX
-------------+--------------------------------------------------------
gender | 0.80695 2.318 0.020 2.2411 1.4648 0.4730
importance | 0.57547 1.897 0.058 1.7780 1.4601 0.6578
usefulness | -0.63454 -2.322 0.020 0.5302 0.6402 0.7029
effort | 0.02625 0.068 0.946 1.0266 1.0140 0.5283
ability | 0.34300 0.825 0.409 1.4092 1.1752 0.4707
habits | 0.31787 1.088 0.277 1.3742 1.2282 0.6466
----------------------------------------------------------------------
b = raw coefficient
z = z-score for test of b=0
P>|z| = p-value for z-test
e^b = exp(b) = factor change in odds for unit increase in X
e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
SDofX = standard deviation of X
estimates the intercepts and sets the cut points to 0. Comparing Stata with SAS (ascending), the different choice of parameterization does not influence the magnitude of the cut points (or intercepts) and coefficients. However, it does determine the sign before these estimates.

When estimating the odds of being at or below a response category, the estimates for the cut points using Stata are the same as the intercepts using SAS ascending in both sign and magnitude. The first cut point, α1, in the Stata estimation is the same as the first intercept, α1, in the SAS ascending estimation, because Stata estimates no separate intercept. Using Stata and SAS (the ascending option), the estimated logit coefficients are the same in magnitude but opposite in sign. Using Stata, the estimated logit coefficient is β = .759. Substituting it into the logit form (4), we get logit[π(Y ≤ j | gender)] = αj − (.759)*(gender) = αj − .759*(gender). OR = e^(−.759) = .468, indicating that male teachers had .468 times the odds of female teachers of being at or below any category; that is, female teachers were more likely than male teachers to be at or below a particular teaching experience level. Using SAS ascending, the estimated logit coefficient is β = −.759. Substituting it into its corresponding logit form (5) results in the same equation: logit[π(Y ≤ j | gender)] = αj − .759*(gender). Therefore, the same estimated cumulative odds and cumulative probabilities were obtained using Stata and SAS ascending.

Comparing the results of the proportional odds model using Stata and SAS with the descending option, it was found that the estimated cut points for Stata and the estimated intercepts for SAS descending are the same in magnitude but opposite in sign. Using Stata and SAS descending, the estimated logit coefficients are the same in both magnitude and sign. To estimate the odds of being at or beyond a particular teaching experience level using Stata, it is only necessary to reverse the sign of the estimated cut points. The estimated logit coefficient is β = .759. Exponentiating this results in e^(.759) = 2.136, indicating that male teachers had 2.136 times the odds of female teachers of being at or beyond a particular category. In other words, female teachers are less likely than male teachers to be at or beyond a certain category.
Table 2: Results of Proportional Odds Model with a Single Variable Using Stata, SAS
(Ascending and Descending) and SPSS: A Comparison

                        STATA    SAS (Ascending)    SAS (Descending)    SPSS
Gender (Male = 2)       .759     -.759              .759                0
Gender (Female = 1)                                                     -.759
Using Stata and SPSS, when estimating the effects of the predictors on the log odds of being at or below a certain category of the outcome variable, the signs before the coefficients are both minus rather than plus. In other words, the effects of the predictors are subtracted from the cut points or thresholds. SPSS PLUM labels the estimated logits for the predictor variables LOCATION. When the predictor variable is continuous, the estimated logit coefficients are the same as those estimated by Stata OLOGIT in both magnitude and sign. However, SPSS PLUM differs from Stata OLOGIT in this respect: when the predictor variable is categorical, for example gender, with 1 = female and 2 = male, the estimated coefficient is only displayed for the category with the smaller value, i.e., when gender = 1. The category with the larger value, gender = 2, is the reference category and has an estimate of 0. If gender is coded with 1 = female and 0 = male, the estimated coefficient is displayed for the case when gender = 0, and the estimated coefficient for female (gender = 1) is 0. Using SPSS PLUM, the estimated logit coefficient is β = −.759 for the case when female = 1, and β = 0 for the case when male = 2. Substituting it into the logit form (7) results in logit[π(Y ≤ j | gender)] = αj − (−.759)*(gender) = αj + .759*(gender). By exponentiating, OR = e^(.759) = 2.136, indicating that female teachers had 2.136 times the odds of male teachers of being at or below a particular category.
Table 3: Feature Comparisons of the Ordinal Logistic Regression Analysis Using Stata, SAS and SPSS
STATA SAS SPSS
Model Specification
Cutpoints/Thresholds
Intercept
Test Hypotheses of Logit Coefficients
Maximum Likelihood Estimates
Odds Ratio
z-statistic or Wald Test for Parameter Estimate
Chi-square Statistic for Parameter Estimate
Confidence Interval for Parameter Estimate
Fit Statistics
Loglikelihood
Goodness-of-Fit Test
Pseudo R-Square
Test of PO Assumption
Omnibus Test of Assumption of Proportional Odds
Test of Assumption of Proportional Odds for Individual Variables
Association of Predicted Probabilities and Observed Responses
and the test of the proportional odds assumption. The estimated coefficients and cut points (thresholds) were the same in magnitude but may be reversed in sign. Comparing the results using Stata and SPSS, it was found that, although the ordinal logit models are based on latent continuous response variables for both packages, SPSS PLUM estimated the logit coefficient for the category with the smaller value when the predictor variable was categorical, and thus the estimated thresholds were different from those estimated by Stata. Researchers should understand the differences in parameterization of ordinal logistic models between Stata and other statistical packages. Researchers should pay attention to the signs before the estimated logit coefficients and the cut points in the model, and exercise caution in interpreting the results.

In educational research, ordinal categorical data are frequently used, and researchers need to understand and be familiar with the ordinal logistic regression models that deal with inherently ordinal outcome variables. In some situations, Ordinary Least Squares (OLS) techniques may be used for preliminary analysis of such data by treating the ordinal scale variable as continuous. However, ignoring the discrete ordinal nature of the variable would cause the analysis to lose some useful information and could lead to misleading results. Therefore, it is crucial for researchers to use the most appropriate models to analyze ordinal categorical dependent variables. In addition, the role of any statistical software
package is that of a tool for researchers. The choice of software is the preference of the researcher; it is therefore not the purpose of this study to suggest which one is best for ordinal logistic regression analysis. This demonstration clarifies some of the issues that researchers must consider in using different statistical packages when analyzing ordinal data.

References
Agresti, A. (1996). An introduction to categorical data analysis. NY: John Wiley & Sons.
Agresti, A. (2002). Categorical data analysis (2nd Ed.). NY: John Wiley & Sons.
Allison, P. D. (1999). Logistic regression using the SAS system: Theory and application. Cary, NC: SAS Institute, Inc.
Ananth, C. V., & Kleinbaum, D. G. (1997). Regression models for ordinal responses: A review of methods and applications. International Journal of Epidemiology, 26, 1323-1333.
Armstrong, B. B., & Sloan, M. (1989). Ordinal regression models for epidemiological data. American Journal of Epidemiology, 129(1), 191-204.
Bender, R., & Benner, A. (2000). Calculating ordinal regression models in SAS and S-Plus. Biometrical Journal, 42(6), 677-699.
Brant, R. (1990). Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics, 46, 1171-1178.
Clogg, C. C., & Shihadeh, E. S. (1994). Statistical models for ordinal variables. Thousand Oaks, CA: Sage.
Greene, W. H. (2003). Econometric analysis (5th Ed.). Upper Saddle River, NJ: Prentice Hall.
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression (2nd Ed.). NY: John Wiley & Sons.
Liu, X. (2004). A validation of teachers' perceptions of grading practices. Paper presented at the October Annual Conference of the Northeastern Educational Research Association (NERA), Kerhonkson, NY.
Liu, X., O'Connell, A. A., & McCoach, D. B. (2006). The initial validation of teachers' perceptions of grading practices. Paper presented at the April 2006 Annual Conference of the American Educational Research Association (AERA), San Francisco, CA.
Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks, CA: Sage.
Long, J. S., & Freese, J. (2006). Regression models for categorical dependent variables using Stata (2nd Ed.). College Station, TX: Stata Press.
McCullagh, P. (1980). Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, B, 42, 109-142.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd Ed.). London: Chapman and Hall.
Menard, S. (1995). Applied logistic regression analysis. Thousand Oaks, CA: Sage.
O'Connell, A. A. (2000). Methods for modeling ordinal outcome variables. Measurement and Evaluation in Counseling and Development, 33(3), 170-193.
O'Connell, A. A. (2006). Logistic regression models for ordinal response variables. Thousand Oaks, CA: Sage.
O'Connell, A. A., Liu, X., Zhao, J., & Goldstein, J. (2006). Model diagnostics for proportional and partial proportional odds models. Paper presented at the April 2006 Annual Conference of the American Educational Research Association (AERA), San Francisco, CA.
Powers, D. A., & Xie, Y. (2000). Statistical models for categorical data analysis. San Diego, CA: Academic Press.
Wooldridge, J. M. (2001). Econometric analysis of cross section and panel data. Cambridge, MA: The MIT Press.
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 646-658 1538 – 9472/09/$95.00

GIBBS SAMPLING FOR 2PNO MULTI-UNIDIMENSIONAL ITEM RESPONSE MODELS
SHENG & HEADRICK
A Fortran 77 subroutine is provided for implementing the Gibbs sampling procedure to a multi-
unidimensional IRT model for binary item response data with the choice of uniform and normal prior
distributions for item parameters. In addition to posterior estimates of the model parameters and their
Monte Carlo standard errors, the algorithm also estimates the correlations between distinct latent traits.
The subroutine requires the user to have access to the IMSL library. The source code is available at
http://www.siuc.edu/~epse1/sheng/Fortran/MUIRT/GSMU2.FOR. An executable file is also provided for
download at http://www.siuc.edu/~epse1/sheng/Fortran/MUIRT/EXAMPLE.zip to demonstrate the
implementation of the algorithm on simulated data.
Key words: multi-unidimensional IRT model, two-parameter normal ogive model, MCMC, Gibbs
sampling, Fortran.
chain Monte Carlo (MCMC; e.g., Chib & Greenberg, 1995) techniques are used to summarize the posterior distributions that arise in the context of the Bayesian prior-posterior framework (Carlin & Louis, 2000; Chib & Greenberg, 1995; Gelfand & Smith, 1990; Gelman, Carlin, Stern, & Rubin, 2003; Tanner & Wong, 1987). Lee (1995) applied Gibbs sampling (Casella & George, 1992; Gelfand & Smith, 1990; Geman & Geman, 1984), an MCMC algorithm, to the 2PNO multi-unidimensional model and illustrated the model parameterization by adopting non-informative priors for item parameters.

Because informative priors are desirable in some applications of the Bayesian framework, and because MCMC is computationally demanding (see Sheng & Headrick, 2007, for a description of the problems), this study focuses on using Fortran, the fastest programming language for numerical computing (Brainerd, 2003), to implement the procedure. In particular, the paper provides a Fortran subroutine that obtains the posterior estimates and Monte Carlo standard errors of estimates for the item and person parameters in the 2PNO multi-unidimensional IRT model, as well as the posterior estimates of the correlations between the distinct latent traits. The subroutine allows the user to specify non-informative and informative priors for item parameters.

Methodology
The Gibbs Sampling Procedure
To implement Gibbs sampling for the 2PNO multi-unidimensional IRT model defined in (1), a latent continuous random variable $Z$ is introduced so that $Z_{vij} \sim N(\alpha_{vj}\theta_{vi} - \gamma_{vj}, 1)$ (Albert, 1992; Lee, 1995; Tanner & Wong, 1987). Next, denote each person's latent traits for all items as $\theta_i = (\theta_{1i}, \ldots, \theta_{mi})'$, which is assumed to have a multivariate normal (MVN) distribution, $\theta_i \sim N_m(\mathbf{0}, \Sigma)$, where $\Sigma$ is a correlation matrix and $\rho_{st}$ is the correlation between $\theta_{si}$ and $\theta_{ti}$, $s \neq t$, on the off diagonals. It may be noted that the unidimensional IRT model is a special case of the multi-unidimensional model where $\rho_{st} = 1$ for all $s, t$. Then, an unconstrained covariance matrix $\Sigma^*$ is introduced (Lee, 1995), where $\Sigma^* = [\sigma_{ij}]_{m \times m}$, so that the correlation matrix $\Sigma$ can easily be transformed from $\Sigma^*$ using $\rho_{st} = \sigma_{st}/(\sigma_s \sigma_t)$ ($s \neq t$). A non-informative prior is assumed for $\Sigma^*$ so that $p(\Sigma^*) \propto |\Sigma^*|^{-(m+1)/2}$. Hence, with prior distributions assumed for $\xi_{vj}$, where $\xi_{vj} = (\alpha_{vj}, \gamma_{vj})'$, the joint posterior distribution for $(\theta, \xi, Z, \Sigma^*)$ is

$$p(\theta, \xi, Z, \Sigma^* \mid y) \propto f(y \mid Z)\, p(Z \mid \theta, \xi)\, p(\xi)\, p(\theta \mid \Sigma)\, p(\Sigma^*), \tag{2}$$

where $f(y \mid Z)$ is the likelihood function.

With non-informative priors for $\alpha_{vj}$ and $\gamma_{vj}$ so that $\alpha_{vj} > 0$ and $p(\gamma_{vj}) \propto 1$, the full conditional distributions of $Z_{vij}$, $\theta_i$, $\xi_{vj}$ and $\Sigma^*$ can be derived in closed form as follows:

$$Z_{vij} \mid \bullet \sim \begin{cases} N_{(0,\infty)}(\alpha_{vj}\theta_{vi} - \gamma_{vj}, 1), & \text{if } y_{vij} = 1 \\ N_{(-\infty,0)}(\alpha_{vj}\theta_{vi} - \gamma_{vj}, 1), & \text{if } y_{vij} = 0 \end{cases}; \tag{3}$$

$$\theta_i \mid \bullet \sim N_m\big((A'A + \Sigma^{-1})^{-1} A'B,\; (A'A + \Sigma^{-1})^{-1}\big), \tag{4}$$

where

$$A_{(K \times m)} = \begin{pmatrix} \alpha_1 & 0 & \cdots & 0 \\ 0 & \alpha_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \alpha_m \end{pmatrix} \quad \text{and} \quad B_{(K \times 1)} = \begin{pmatrix} Z_{1i} + \gamma_1 \\ Z_{2i} + \gamma_2 \\ \vdots \\ Z_{mi} + \gamma_m \end{pmatrix},$$

with $\alpha_v = (\alpha_{v1}, \ldots, \alpha_{v,k_v})'$, $Z_{vi} = (Z_{vi1}, \ldots, Z_{vik_v})'$, and $\gamma_v = (\gamma_{v1}, \ldots, \gamma_{v,k_v})'$;

$$\xi_{vj} \mid \bullet \sim N_2\big((x_v'x_v)^{-1} x_v' Z_{vj},\; (x_v'x_v)^{-1}\big)\, I(\alpha_{vj} > 0), \tag{5}$$
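To illustrate the full conditionals in (3) and (4), the following is a minimal Python/NumPy sketch of one Gibbs update for a single examinee, mirroring what the Fortran subroutine does internally (the dimensions, parameter values, and responses are made up; this is an illustration of the sampling steps, not the authors' code):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
m, kv = 2, 4                    # 2 latent dimensions, 4 items each
K = m * kv
alpha = np.full(K, 1.0)         # discrimination parameters (illustrative)
gamma = np.zeros(K)             # intercept parameters (illustrative)
y = rng.integers(0, 2, K)       # one person's binary responses
theta = np.zeros(m)
Sigma = np.eye(m)               # latent trait correlation matrix

# A is block diagonal: item j loads only on its own dimension v.
A = np.zeros((K, m))
for v in range(m):
    A[v * kv:(v + 1) * kv, v] = alpha[v * kv:(v + 1) * kv]

# (3): draw Z from normals truncated to (0, inf) if y = 1 and to
# (-inf, 0) if y = 0, using the inverse-CDF method.
mu = A @ theta - gamma
BB = norm.cdf(-mu)              # normal mass below zero
u = rng.uniform(1e-12, 1.0, K)
p = np.where(y == 1, BB + u * (1.0 - BB), u * BB)
Z = mu + norm.ppf(p)

# (4): draw theta from N_m((A'A + Sigma^-1)^-1 A'B, (A'A + Sigma^-1)^-1).
B = Z + gamma
V = np.linalg.inv(A.T @ A + np.linalg.inv(Sigma))
theta = rng.multivariate_normal(V @ (A.T @ B), V)
print(theta)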
Table 1: Posterior Estimates and Monte Carlo Standard Errors of Estimates (MCSEs) for
α_v with Uniform and Normal Priors (each cell is Estimate (MCSE))

Parameter   ρ=.2 Uniform     ρ=.2 Normal      ρ=.5 Uniform     ρ=.5 Normal      ρ=.8 Uniform     ρ=.8 Normal

α1
0.0966      0.0838 (.0013)   0.0828 (.0011)   0.0869 (.0007)   0.0846 (.0012)   0.0830 (.0013)   0.0847 (.0007)
0.0971      0.0675 (.0010)   0.0660 (.0013)   0.0731 (.0008)   0.0740 (.0010)   0.0657 (.0014)   0.0689 (.0012)
0.4589      0.4698 (.0035)   0.4704 (.0026)   0.4748 (.0028)   0.4707 (.0021)   0.4829 (.0021)   0.4797 (.0021)
0.9532      0.8556 (.0039)   0.8531 (.0069)   0.8804 (.0054)   0.8753 (.0058)   0.8937 (.0063)   0.8928 (.0045)
0.0771      0.0510 (.0009)   0.0502 (.0005)   0.0552 (.0013)   0.0550 (.0008)   0.0589 (.0007)   0.0577 (.0008)
0.4891      0.4900 (.0020)   0.4895 (.0024)   0.4855 (.0029)   0.4864 (.0012)   0.4659 (.0017)   0.4649 (.0017)
0.8599      1.0401 (.0185)   1.0348 (.0114)   1.0180 (.0080)   1.0120 (.0069)   0.9983 (.0057)   0.9930 (.0061)
0.9427      0.9381 (.0075)   0.9327 (.0024)   0.9477 (.0085)   0.9408 (.0088)   0.9628 (.0033)   0.9479 (.0075)

α2
0.2727      0.3013 (.0010)   0.2973 (.0026)   0.2654 (.0006)   0.2685 (.0014)   0.2348 (.0016)   0.2358 (.0013)
0.6532      0.7279 (.0051)   0.7251 (.0061)   0.6354 (.0028)   0.6346 (.0020)   0.7188 (.0042)   0.7142 (.0028)
0.1002      0.1231 (.0010)   0.1226 (.0014)   0.1528 (.0008)   0.1527 (.0012)   0.1088 (.0012)   0.1108 (.0018)
0.2339      0.0945 (.0014)   0.0965 (.0026)   0.1557 (.0021)   0.1535 (.0015)   0.1683 (.0020)   0.1670 (.0013)
0.9291      0.8554 (.0155)   0.8552 (.0131)   0.8145 (.0042)   0.8184 (.0071)   0.9208 (.0039)   0.9149 (.0061)
0.8618      0.8730 (.0128)   0.8575 (.0095)   0.9107 (.0060)   0.9001 (.0069)   0.9067 (.0034)   0.9055 (.0050)
0.0908      0.0543 (.0006)   0.0518 (.0016)   0.0556 (.0005)   0.0570 (.0007)   0.0463 (.0010)   0.0464 (.0007)
0.2083      0.2003 (.0006)   0.1967 (.0021)   0.2045 (.0016)   0.2035 (.0010)   0.2339 (.0013)   0.2351 (.0007)
Table 2: Posterior Estimates and Monte Carlo Standard Errors of Estimates (MCSEs) for
γ_v and ρ with Uniform and Normal Priors (each cell is Estimate (MCSE))

Parameter   ρ=.2 Uniform      ρ=.2 Normal       ρ=.5 Uniform      ρ=.5 Normal       ρ=.8 Uniform      ρ=.8 Normal

γ1
0.3629      0.3457 (.0007)    0.3447 (.0003)    0.3467 (.0010)    0.3448 (.0005)    0.3450 (.0004)    0.3452 (.0005)
-0.9010     -0.8881 (.0003)   -0.8875 (.0002)   -0.8891 (.0006)   -0.8885 (.0006)   -0.8875 (.0005)   -0.8865 (.0003)
-0.9339     -0.9288 (.0017)   -0.9277 (.0017)   -0.9288 (.0018)   -0.9270 (.0012)   -0.9317 (.0015)   -0.9310 (.0011)
-0.3978     -0.3976 (.0023)   -0.3983 (.0017)   -0.4035 (.0018)   -0.4012 (.0016)   -0.4059 (.0020)   -0.4062 (.0018)
0.3987      0.4077 (.0003)    0.4076 (.0008)    0.4085 (.0006)    0.4072 (.0006)    0.4073 (.0002)    0.4066 (.0007)
0.1654      0.1679 (.0003)    0.1681 (.0005)    0.1675 (.0009)    0.1666 (.0007)    0.1665 (.0010)    0.1669 (.0008)
-0.8108     -0.8302 (.0082)   -0.8294 (.0062)   -0.8232 (.0032)   -0.8186 (.0039)   -0.8122 (.0030)   -0.8091 (.0030)
-0.8012     -0.7091 (.0025)   -0.7064 (.0019)   -0.7145 (.0043)   -0.7102 (.0043)   -0.7186 (.0012)   -0.7140 (.0048)

γ2
0.2452      0.2902 (.0008)    0.2896 (.0007)    0.3122 (.0005)    0.3109 (.0002)    0.3037 (.0006)    0.3047 (.0005)
0.9792      1.0954 (.0031)    1.0941 (.0032)    1.0476 (.0015)    1.0461 (.0024)    1.1095 (.0021)    1.1045 (.0016)
-0.0190     -0.0216 (.0006)   -0.0212 (.0005)   -0.0058 (.0006)   -0.0068 (.0002)   -0.0200 (.0005)   -0.0196 (.0009)
0.8749      0.9549 (.0005)    0.9536 (.0006)    0.9624 (.0008)    0.9616 (.0009)    0.9568 (.0014)    0.9538 (.0005)
-0.3119     -0.2139 (.0026)   -0.2143 (.0013)   -0.2049 (.0019)   -0.2068 (.0011)   -0.2250 (.0005)   -0.2256 (.0011)
0.2005      0.2902 (.0025)    0.2888 (.0024)    0.2781 (.0021)    0.2735 (.0019)    0.2777 (.0012)    0.2750 (.0022)
0.4626      0.4658 (.0011)    0.4638 (.0004)    0.4514 (.0004)    0.4501 (.0002)    0.4550 (.0005)    0.4545 (.0012)
0.7184      0.7528 (.0008)    0.7510 (.0007)    0.7485 (.0007)    0.7462 (.0007)    0.7738 (.0003)    0.7723 (.0013)
References
Chib, S., & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49, 327-335.
Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.
Gelfand, A. E., Hills, S. E., Racine-Poon, A., & Smith, A. F. M. (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association, 85, 315-331.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2003). Bayesian data analysis. Boca Raton: Chapman & Hall/CRC.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.
Hoijtink, H., & Molenaar, I. W. (1997). A multidimensional item response model: Constrained latent class analysis using posterior predictive checks. Psychometrika, 62, 171-189.
Lee, H. (1995). Markov chain Monte Carlo methods for estimating multidimensional ability in item response analysis. Ph.D. dissertation, University of Missouri, Columbia, MO.
Patz, R. J., & Junker, B. W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24, 146-178.
Ripley, B. D. (1987). Stochastic simulation. NY: Wiley.
Sheng, Y., & Headrick, T. C. (2007). An algorithm for implementing Gibbs sampling for 2PNO IRT models. Journal of Modern Applied Statistical Methods, 6, 341-349.
Sheng, Y., & Wikle, C. K. (2007). Comparing multi-unidimensional and unidimensional IRT models. Educational & Psychological Measurement, 67, 899-919.
Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82, 528-550.
C*************************************************************************
C Update samples for Z from its normal posterior distributions
C*************************************************************************
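C The loop below draws each Z(I,J) from its truncated normal full
C conditional, given in (3), by the inverse-CDF method: LP is the
C mean of Z(I,J), BB = Phi(-LP) is the normal mass below zero, the
C uniform draw U is rescaled into (BB,1) when Y(I,J) = 1 and into
C (0,BB) when Y(I,J) = 0, and DNORIN maps the result back to the
C normal scale.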
ZLP = MATMUL(TH, TRANSPOSE(AA))
DO 60 I = 1, N
DO 60 J = 1, K
LP = ZLP(I, J) - G(J)
BB = ANORDF(0.0 - LP)
CALL RNUN(1, U)
TMP = (BB*(1-Y(I, J)) + (1-BB)*Y(I, J))*U + BB*Y(I, J)
Z(I, J) = DNORIN(TMP) + LP
60 CONTINUE
C*************************************************************************
C Update samples for theta from their MVN posterior distributions
C*************************************************************************
C*************************************************************************
C Call the matrix inversion routine.
C Invert matrix SIGMA with the inverse stored in RSIG
C*************************************************************************
CALL MIGS(SIGMA, M, RSIG, INDX)
PVAR1 = RSIG + MATMUL(TRANSPOSE(AA), AA)
C*************************************************************************
C Call the matrix inversion routine to invert matrix PVAR1 with the
C inverse stored in PVAR
C*************************************************************************
CALL MIGS(PVAR1, M, PVAR, INDX)
DO 70 I = 1, N
DO 80 J = 1, K
BA(J) = Z(I, J) + G(J)
80 CONTINUE
PMEAN1 = MATMUL(TRANSPOSE(AA), BA)
PMEAN = MATMUL(PVAR, PMEAN1)
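C Here PVAR = (SIGMA**(-1) + AA'*AA)**(-1) is the posterior covariance
C of theta(i), with RSIG = SIGMA**(-1) from MIGS above, and
C PMEAN = PVAR*AA'*(Z(i,:) + G) is the corresponding posterior mean.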
C****************************************************************************
C Call the Cholesky factorization routine. Compute the Cholesky factorization
C of the symmetric definite matrix PVAR and store the result in RSIG
C****************************************************************************
CALL CHFAC (M, PVAR, M, 0.00001, IRANK, RSIG, M)
C*************************************************************************
C Generate a random sample of theta(v) from MVN dist by calling RNMVN
C*************************************************************************
CALL RNMVN (1, M, RSIG, M, RTH, 1)
DO 90 J = 1, M
TH(I, J) = RTH(J) + PMEAN(J)
THV(I, J, COUNT) = TH(I, J)
90 CONTINUE
70 CONTINUE
C*************************************************************************
C Update samples for item parameters, a(v) and g(v) from their MVN
C posterior distributions
C*************************************************************************
JJ = 0
DO 100 J = 1, M
DO 110 I = 1, N
X(I, 1) = TH(I, J)
X(I, 2) = -1
110 CONTINUE
IF (UNIF == 0) THEN
C*************************************************************************
C Specify the prior means and variances for a(v) and g(v)
C*************************************************************************
AMU = 0.0
GMU = 0.0
AVAR = 1.0
GVAR = 1.0
C*************************************************************************
C Put them in vector or matrix format
C*************************************************************************
AGMU(1, 1) = AMU
AGMU(2, 1) = GMU
AGVAR(1, 1) = AVAR
AGVAR(2, 2) = GVAR
C*************************************************************************
C Call the matrix inversion routine to invert matrix AGVAR with the
C inverse stored in AGSIG
C*************************************************************************
CALL MIGS(AGVAR, 2, AGSIG, INDX)
XX = MATMUL(TRANSPOSE(X), X) + AGSIG
ELSE IF (UNIF == 1) THEN
XX = MATMUL(TRANSPOSE(X), X)
END IF
C*************************************************************************
C Call the matrix inversion routine to invert matrix XX with the
C inverse stored in IX
C*************************************************************************
CALL MIGS(XX, 2, IX, INDX)
C*************************************************************************
C Call the Cholesky factorization routine. Compute the Cholesky
C factorization of the symmetric definite matrix IX and store the
C result in AMAT
C*************************************************************************
CALL CHFAC (2, IX, 2, 0.00001, IRANK, AMAT, 2)
JM = 0
PRODA = 1.0
130 JM = JM + 1
JJ = JJ + 1
DO 120 I = 1, N
ZV(I, 1) = Z(I, JJ)
120 CONTINUE
IF (UNIF == 0) THEN
XZ = MATMUL(AGSIG, AGMU) + MATMUL(TRANSPOSE(X), ZV)
ELSE IF (UNIF == 1) THEN
XZ = MATMUL(TRANSPOSE(X), ZV)
END IF
BZ = MATMUL(IX, XZ)
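C IX is the posterior covariance of (a(JJ), g(JJ)) and BZ its posterior
C mean; the DO WHILE loop below redraws from this bivariate normal
C until the slope a(JJ) is positive.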
A(JJ) = 0
DO WHILE (A(JJ) .LE. 0)
CALL RNMVN(1, 2, AMAT, 2, BETA, 1)
A(JJ) = BETA(1) + BZ(1, 1)
G(JJ) = BETA(2) + BZ(2, 1)
END DO
AV(COUNT, JJ) = A(JJ)
GV(COUNT, JJ) = G(JJ)
PRODA = PRODA*A(JJ)
IF (JM .LT. MN(J)) THEN
GOTO 130
END IF
DO 135 I = 1, M
C(I, J) = 0.0
135 CONTINUE
C(J, J) = PRODA ** (1/RMN(J))
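C C(J,J) is the geometric mean of the MN(J) slopes loading on
C dimension J; each dimension of theta is rescaled by this factor
C (CTH = C*TH' below) before the hyperparameter SIGMA is updated.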
100 CONTINUE
C*************************************************************************
C Update samples for the hyperparameter, SIGMA
C*************************************************************************
CTH = MATMUL(C, TRANSPOSE(TH))
D = MATMUL(CTH, TRANSPOSE(CTH))
C*************************************************************************
C Call the subroutine to generate the unconstrained covariance matrix
C VAR from the inverse Wishart distribution
C*************************************************************************
CALL INVWISHRND(D, M, N, VAR)
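C The loop below rescales the unconstrained draw VAR to a correlation
C matrix, whose elements are stored in RHO as the sampled correlations
C among the latent dimensions.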
DO 140 I = 1, M
DO 140 J = 1, M
SIGMA(I, J) = VAR(I, J)/SQRT(VAR(I, I))/SQRT(VAR(J, J))
RHO(I, J, COUNT) = SIGMA(I, J)
140 CONTINUE
30 CONTINUE
C*************************************************************************
C Calculate the batch means and se's for a(v), g(v), theta(v) and
C their correlations, and store them in ITEM, PER, and RPER
C*************************************************************************
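C The post burn-in chain of length L - BURNIN is divided into BN
C batches of size BSIZE; each point estimate is the mean of the BN
C batch means, and its Monte Carlo se is the standard deviation of
C the batch means divided by SQRT(BN).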
BSIZE = (L - BURNIN)/BN
DO 150 J = 1, K
COUNT = BURNIN
TOT1 = 0.0
TOT2 = 0.0
SS1 = 0.0
SS2 = 0.0
DO 160 JJ = 1, BN
SUM1 = 0.0
SUM2 = 0.0
DO 170 I = 1, BSIZE
COUNT = COUNT + 1
SUM1 = SUM1 + AV(COUNT, J)
SUM2 = SUM2 + GV(COUNT, J)
170 CONTINUE
M1 = SUM1/FLOAT(BSIZE)
M2 = SUM2/FLOAT(BSIZE)
TOT1 = TOT1 + M1
TOT2 = TOT2 + M2
SS1 = SS1 + M1*M1
SS2 = SS2 + M2*M2
160 CONTINUE
ITEM(J, 1) = TOT1/FLOAT(BN)
ITEM(J, 2) = SQRT((SS1-(TOT1*TOT1/BN))/(BN-1))/SQRT(FLOAT(BN))
ITEM(J, 3) = TOT2/BN
ITEM(J, 4) = SQRT((SS2-(TOT2*TOT2/BN))/(BN-1))/SQRT(FLOAT(BN))
150 CONTINUE
JJ = 0
JK = 0
DO 180 IM = 1, M
JJ = JK + 1
JK = JJ + 1
DO 190 J = 1, N
COUNT = BURNIN
TOT3 = 0.0
SS3 = 0.0
DO 200 IB = 1, BN
SUM3 = 0.0
DO 210 I = 1, BSIZE
COUNT = COUNT + 1
SUM3 = SUM3 + THV(J, IM, COUNT)
210 CONTINUE
M3 = SUM3/FLOAT(BSIZE)
TOT3 = TOT3 + M3
SS3 = SS3 + M3*M3
200 CONTINUE
PER(J, JJ) = TOT3/FLOAT(BN)
PER(J, JK) = SQRT((SS3-(TOT3*TOT3/BN))/(BN-1))/SQRT(FLOAT(BN))
190 CONTINUE
180 CONTINUE
JK = 0
DO 220 J = 1, M
DO 220 IM = J + 1, M
JK = JK + 1
COUNT=BURNIN
TOT = 0.0
SS = 0.0
DO 230 JJ = 1, BN
SUM0 = 0.0
DO 240 I = 1, BSIZE
COUNT = COUNT + 1
SUM0 = SUM0 + RHO(J, IM, COUNT)
240 CONTINUE
M0 = SUM0/FLOAT(BSIZE)
TOT = TOT + M0
SS = SS + M0*M0
230 CONTINUE
RPER(JK, 1) = TOT/FLOAT(BN)
RPER(JK, 2) = SQRT((SS-(TOT*TOT/BN))/(BN-1))/SQRT(FLOAT(BN))
220 CONTINUE
RETURN
END
C*************************************************************************
C Subroutine for generating a random matrix from the inverse Wishart
C distribution with scale matrix S (P x P) and V degrees of freedom;
C the draw is returned in IW. (Header line as implied by the call
C CALL INVWISHRND(D, M, N, VAR) in the main program.)
C*************************************************************************
SUBROUTINE INVWISHRND(S, P, V, IW)
DO 10 I = 1, V
DO 10 J = 1, P
CALL RNNOR (1, Z(I, J))
10 CONTINUE
ZZ = MATMUL(TRANSPOSE(Z), Z)
CALL MIGS(S, P, IS, INDX)
CALL CHFAC (P, IS, P, 0.00001, IRANK, A, P)
AZ = MATMUL(TRANSPOSE(A), ZZ)
W = MATMUL (AZ, A)
CALL MIGS(W, P, IW, INDX)
RETURN
END
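The construction used by the subroutine is the standard one: if the V rows of Z are independent N_P(0, I_P) vectors, then Z'Z is Wishart distributed, and with S^{-1} = A'A the factorization computed by CHFAC,

$$Z^{\mathsf{T}}Z \sim W_P(I_P,\, V), \qquad W = A^{\mathsf{T}}\,(Z^{\mathsf{T}}Z)\,A \sim W_P(A^{\mathsf{T}}A,\, V) = W_P(S^{-1},\, V),$$

so that, under the usual parameterization, the inverse W^{-1} returned in IW is an inverse-Wishart draw with scale matrix S.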
Journal of Modern Applied Statistical Methods Copyright © 2009 JMASM, Inc.
November 2009, Vol. 8, No. 2, 659-669 1538 – 9472/09/$95.00
A FORTRAN PROGRAM FOR DOMINANCE ANALYSIS OF INDEPENDENT DATA
The dominance variable d_{ij} is defined as d_{ij} = sign(x_i - x_j), where x_i represents any observation in the first group and x_j any observation in the second. The d_{ij} simply represent the direction of the differences between the x_i scores and the x_j scores: a score of +1 is assigned if x_i > x_j, a score of -1 is assigned if x_i < x_j, and a score of 0 is assigned if x_i = x_j. Then d is an unbiased estimate of δ:

$$d = \frac{\sum_i \sum_j d_{ij}}{n_1 n_2} \qquad (1)$$

whereas s_d^2, the unbiased sample estimate of \sigma_d^2, is obtained from

$$s_d^2 = \frac{n_2^2 \sum_i (d_{i\cdot} - d)^2 \;+\; n_1^2 \sum_j (d_{\cdot j} - d)^2 \;-\; \sum_i \sum_j (d_{ij} - d)^2}{n_1 n_2 (n_1 - 1)(n_2 - 1)} \qquad (2)$$

where d_{i\cdot} is the mean dominance of the i-th observation over the n_2 observations of the second group,

$$d_{i\cdot} = \frac{\#(x_i > x_j) - \#(x_i < x_j)}{n_2} \qquad (3)$$

and similarly for d_{\cdot j}. (A small worked example of these quantities is given at the end of the article.) To eliminate a possible negative estimate of the variance, (1 - d^2)/(n_1 n_2 - 1) is used as the minimum allowable value for s_d^2. An asymmetric CI for δ was shown to improve the performance of d (Cliff, 1993; Feng & Cliff, 2004):

$$\delta = \frac{d - d^3 \pm Z_{\alpha/2}\, s_d \left(1 - 2d^2 + d^4 + Z_{\alpha/2}^2 s_d^2\right)^{1/2}}{1 - d^2 + Z_{\alpha/2}^2 s_d^2} \qquad (4)$$

where Z_{\alpha/2} is the 1 - \alpha/2 normal deviate. When d is 1.0, s_d reduces to zero; the upper bound of the CI for δ is then 1.0, and the lower bound is calculated from

$$\delta = \frac{n_{\min} - Z_{\alpha/2}^2}{n_{\min} + Z_{\alpha/2}^2} \qquad (5)$$

where n_{\min} is the smaller of the two sample sizes. When d equals -1.0, the solution is the negative of (5).

The Fortran Program
The Fortran program for the independent-groups d analysis applies the algorithm of Equations (1) through (5). The program is interactive, supplying prompts at several points. Data can be either read from a file or input from the keyboard; if input from the keyboard, the data are also stored in a file. In both cases, any number of experimental variables is possible, but an analysis is conducted on only one variable at a time. After input, data are sorted within each group.

The program calculates the statistical inferences about δ, generating d and its variance as well as the components of the variance of d. The outputs include a CI for δ and the significance of d (a z-score) for testing the null hypothesis. The program also calculates the dominance variables d_{ij}, and when the data comprise no more than 75 cases a dominance matrix for the variable analyzed is included in the outputs; otherwise, only the statistics and their components are reported. In order to compare the d method with the classical test methods, the program also computes the classical t statistic for independent groups with Welch's adjustment of the degrees of freedom. Table 1 shows an example of the output file the program generated when the sample size is 25 for both groups.

Conclusion
The ordinal method d does not involve excessive elaboration or complicated statistical analysis, and its concepts can be easily understood by non-statisticians. However, popular statistical software packages such as SAS and SPSS do not provide ordinal dominance analyses. This Fortran program for the independent-groups d analysis (see the appendix for the source code) is easy to implement. Its outputs provide descriptive information: not only is the null hypothesis tested, but a CI is also provided, and a dominance matrix is produced as a useful visual aid to the test. A comparison of d with Welch's t is also provided. Furthermore, users who have access to the IMSL library can easily adapt the source code for use in Monte Carlo studies evaluating the performance of d in terms of Type I error rate, power, and CI coverage.
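As a small numerical illustration of Equations (1) and (3) (hypothetical scores, not output from the program), suppose the first group has scores {1, 2, 3} and the second has {2, 3, 4}. The dominance matrix of d_{ij} = sign(x_i - x_j) is

$$\begin{pmatrix} -1 & -1 & -1 \\ 0 & -1 & -1 \\ +1 & 0 & -1 \end{pmatrix}$$

so \sum\sum d_{ij} = -5 and, by Equation (1), d = -5/9 \approx -.56, with row means d_{1\cdot} = -1, d_{2\cdot} = -2/3, and d_{3\cdot} = 0 from Equation (3): the first group tends to fall below the second.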
Appendix
INTEGER I,J,NV,NP,JQ,JC,JPLU,JG(2),NPER(2),GAP,IG,
& JORDER(2000,2),NDCOL(2000),NDROW(2000)
REAL YY,DB,SSROW,SSCOL,MINI,NUM,VARD,DEN,M1,M2,
& VARDROW,VARDCOL,VARDIJ,SD,UPPER,LOWER,SUM1,SUM2,MINN,
& SUMSQ1,SUMSQ2,VARDIFF,MDIFF,TEE,Y(2000,2),Z(2000,50)
REAL DEL,SQIJ,Q1,Q2,Q12
CHARACTER*1 ANS, PLUS(3),DFILE*18,SPLU(70),SSPLU*70,
& STR*45, OUTFILE*8
DATA PLUS(1),PLUS(2),PLUS(3)/'-','0','+'/
C****************************************************************************C
C Read data from a file, or input from the keyboard. C
C****************************************************************************C
WRITE(*,101)
101 FORMAT('This is inddelta.f for computing d statistics.',
& 3X,'It is copyright 1992, Norman Cliff. Comments and',
& 1X,'suggestions are solicited.')
WRITE(*,102)
102 FORMAT('Type b to bypass instructions, any other letter to',
& 1X,'see them.')
READ(*,'(A1)') ANS
IF((ANS.EQ.'B').OR.(ANS.EQ.'b')) GOTO 80
WRITE(*,103)
103 FORMAT('Data can be either read from a file or input',
& 1X,'from the keyboard. If it is in a file, it must be cases',
& 1X,'by variables, i.e., all the scores for the first case')
WRITE(*,104)
104 FORMAT(' or subject, all for the second, etc. If it is not',
& 1X,'arranged that way, type E for exit and go arrange it.')
READ (*,'(A1)') ANS
IF ((ANS.EQ.'E').OR.(ANS.EQ.'e')) GOTO 1500
80 WRITE(*,105)
105 FORMAT('Type f if it is in a file or k if you will enter',
& 1X,'it from the keyboard.')
READ(*,'(A1)')ANS
IF((ANS.EQ.'K').OR.(ANS.EQ.'k')) THEN
WRITE(*,111)
111 FORMAT('Data will be stored in a file. Give its full',
& 1X,'name and extension.')
READ(*,'(A18)') DFILE
WRITE(*,112)
112 FORMAT('Data must be entered casewise, that is, all the',
& 1X,'scores for one case or person, then all for the next,',1X,
& 'etc. And we need to know the number of cases and variables.')
WRITE(*,113)
113 FORMAT('Group membership should be entered as a',
& 1X,'variable.')
WRITE(*,114)
114 FORMAT('Scores, or variables, within each case must be',
& 1X,'separated by a comma.')
WRITE(*,115)
115 FORMAT('No. of cases:')
READ(*,'(I3)') NP
WRITE(*,116)
116 FORMAT('No. of variables:')
READ(*,'(I3)') NV
OPEN(3,FILE=DFILE,STATUS='NEW')
WRITE(*,117)
117 FORMAT(1X,'Enter the scores for each case, including',
& 1X,'the grouping variable.')
DO 1 I=1,NP
WRITE(*,*) I
DO 2 J=1,NV
READ(*,*) Z(I,J)
2 CONTINUE
1 CONTINUE
WRITE(*,118)
118 FORMAT(1X,'The scores will be printed out on the screen',
& 1X,'for checking.')
DO 3 I=1,NP
WRITE(*,*) I
WRITE(*,*) (Z(I,J),J=1,NV)
WRITE(*,*)
3 CONTINUE
WRITE(*,119)
119 FORMAT('If there are any corrections, type the row,',
& 1X,'column, and the correct value. If not, type 0,0,0.')
276 READ(*,*) I,J,P
IF(I.EQ.0) GOTO 281
Z(I,J)=P
WRITE(*,120)
120 FORMAT(1X,'More? Type 0,0,0 , if not.')
GOTO 276
281 DO 29 I=1,NP
DO 30 J=1,NV
WRITE(3,*) Z(I,J)
30 CONTINUE
29 CONTINUE
CLOSE (3,STATUS='KEEP')
ELSE
IF((ANS.NE.'F').AND.(ANS.NE.'f')) THEN
GOTO 80
ELSE
WRITE(*,106)
106 FORMAT('Type name of file, including extension,',
& 1X,'also path if not in this directory.')
WRITE(*,107)
107 FORMAT('filename')
READ(*,'(A18)') DFILE
WRITE(*,108)
108 FORMAT('How many variables per case?')
READ(*,'(I2)') NV
WRITE(*,109)
109 FORMAT('How many cases?')
READ(*,'(I3)') NP
OPEN(4,FILE=DFILE,STATUS='OLD')
DO 31 I=1,NP
READ(4,*) (Z(I,J), J=1,NV)
31 CONTINUE
CLOSE(4,STATUS='KEEP')
ENDIF
ENDIF
282 WRITE(*,122)
122 FORMAT('Which variable no. is the grouping variable?')
READ(*,'(I1)') JC
WRITE(*,123)
123 FORMAT('Which variable no. is the experimental?')
READ(*,'(I1)') JQ
WRITE(*,124)
124 FORMAT('Which two values of the grouping variable',1X,
& 'designate the groups to be compared? (e.g., 1 and 2)')
WRITE(*,125)
125 FORMAT(1X,' First group: ')
READ(*,'(I2)') JG(1)
WRITE(*,126)
126 FORMAT(1X,' Second group: ')
READ(*,'(I2)') JG(2)
NPER(1) = 1
NPER(2) = 1
WRITE(*,226)
226 FORMAT(1X,' Name of the output file is: ')
READ(*,'(A8)') OUTFILE
OPEN(8,FILE=OUTFILE)
C******************************************************************************C
C Sort data. C
C******************************************************************************C
DO 4 I=1,NP
IF(Z(I,JC).EQ.JG(1)) THEN
Y(NPER(1),1) = Z(I,JQ)
JORDER(NPER(1),1) = NPER(1)
NPER(1) = NPER(1)+1
ELSE IF (Z(I,JC).EQ.JG(2)) THEN
Y(NPER(2),2) = Z(I,JQ)
JORDER(NPER(2),2) = NPER(2)
NPER(2) = NPER(2)+1
ELSE
ENDIF
4 CONTINUE
NPER(1)=NPER(1)-1
NPER(2)=NPER(2)-1
WRITE(*,127) NPER(1),NPER(2)
127 FORMAT(1X,2I4)
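C Shell sort (gap sequence 2**K - K, K = 4,3,2,1) applied within each
C group; JORDER carries the original case numbers along so the ordered
C listing can be traced back to the raw data.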
DO 5 IG=1,2
DO 6 K=4,1,-1
GAP=2**K-K
DO 7 I=GAP,NPER(IG)
XX=Y(I,IG)
YY=JORDER(I,IG)
J=I-GAP
C Guard J before indexing Y (avoids an out-of-range reference at J = 0).
430 IF(J.LE.0) GOTO 450
IF(XX.GE.Y(J,IG)) GOTO 450
Y(J+GAP,IG)=Y(J,IG)
JORDER(J+GAP,IG)=JORDER(J,IG)
J=J-GAP
GOTO 430
450 Y(J+GAP,IG)=XX
JORDER(J+GAP,IG)=YY
7 CONTINUE
6 CONTINUE
5 CONTINUE
C******************************************************************************C
C Calculate dominance matrix (and print the matrix for small data set). C
C******************************************************************************C
SQIJ = 0.0
DEL= 0.0
WRITE(8,131)
131 FORMAT(1X,'This is an independent data analysis using',1X,
& 'inddelta.f.')
WRITE(8,*)
WRITE(8,132) DFILE
132 FORMAT(1X,'The data are from ',A18)
WRITE(8,*)
WRITE(8,133) NV-1
133 FORMAT(1X,'There are ',I3,' experimental variable(s).')
WRITE(8,*)
WRITE(8,134) JC
134 FORMAT(1X,'The grouping variable is ',I3)
WRITE(8,135) JQ
135 FORMAT(1X,'The experimental variable is ',I3)
WRITE(8,*)
DO 999 I = 1,NPER(1)
NDROW(I) = 0
999 CONTINUE
DO 998 I = 1,NPER(2)
NDCOL(I) = 0
998 CONTINUE
IF(NP.LE.75) THEN
WRITE(8,137) JG(1),JG(2)
137 FORMAT(1X,'Dominance matrix for group',I3,' vs. group',I3)
WRITE(8,*)
WRITE(8,138) JG(1),JG(2)
138 FORMAT(1X,'A + INDICATES ',I3,' HIGHER THAN',I3)
WRITE(8,*)
DO 9 I=1,NPER(1)
SSPLU = ' '
DO 10 J=1,NPER(2)
IF(Y(I,1).GT.Y(J,2)) THEN
IWON=1
ELSE IF(Y(I,1).LT.Y(J,2)) THEN
IWON=-1
ELSE
IWON=0
ENDIF
DEL = DEL +IWON
SQIJ = SQIJ+IWON*IWON
NDROW(I) = NDROW(I)+IWON
NDCOL(J) = NDCOL(J)+IWON
JPLU = IWON + 2
SPLU(J) = PLUS(JPLU)
SSPLU = SSPLU(1:J)//SPLU(J)
10 CONTINUE
WRITE(8,139) SSPLU
139 FORMAT(1X,A72)
9 CONTINUE
WRITE(8,*)
WRITE(8,*)
WRITE(8,*)
ELSE
DO 11 I=1,NPER(1)
DO 12 J=1,NPER(2)
IF(Y(I,1).GT.Y(J,2)) THEN
IWON=1
ELSE IF(Y(I,1).LT.Y(J,2)) THEN
IWON=-1
ELSE
IWON=0
ENDIF
DEL = DEL +IWON
SQIJ = SQIJ+IWON*IWON
NDROW(I) = NDROW(I)+IWON
NDCOL(J) = NDCOL(J)+IWON
12 CONTINUE
11 CONTINUE
ENDIF
C***************************************************************************C
C Calculate d and variance of d. C
C***************************************************************************C
DB = DEL/(NPER(1)*NPER(2))
WRITE(8,*)
WRITE(8,140)
140 FORMAT(1X,'***',2X,'d and its variance',2X,'***')
WRITE(8,141) JG(1),JG(2),DB
141 FORMAT(1X,'d for ',I3,' vs. ',I3,27X,' = ',F6.3)
C*************************************************************************C
C This part is for calculations of variance of d. C
C*************************************************************************C
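C The accumulations below use the identities
C   SSROW - NPER(2)*DEL*DB = NPER(2)**2 * SUM of (di. - d)**2,
C   SSCOL - NPER(1)*DEL*DB = NPER(1)**2 * SUM of (d.j - d)**2,
C   SQIJ  - DEL*DB         = SUM of (dij - d)**2,
C so that VARD = NUM/DEN is exactly Equation (2).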
SSROW=0.0
SSCOL=0.0
DO 14 I=1,NPER(1)
SSROW = SSROW + NDROW(I)**2
14 CONTINUE
DO 15 I=1,NPER(2)
SSCOL = SSCOL + NDCOL(I)**2
15 CONTINUE
MINI=(SQIJ/(NPER(1)*NPER(2))-DB**2)
& /(NPER(1)*NPER(2)-1)
NUM=SSROW-NPER(2)*DEL*DB + SSCOL - NPER(1)*DEL*DB
& -SQIJ + DEL*DB
DEN = NPER(1)*NPER(2)*(NPER(1) - 1)*(NPER(2)-1)
VARD = NUM/DEN
IF (VARD.LE. MINI) THEN
VARD = MINI
WRITE(8,142)
142 FORMAT(1X,'variance = minimum. Interpret with caution.')
ELSE
ENDIF
STR='variance for d'
WRITE(8,143) STR,VARD
143 FORMAT(1X,A45,' = ',F7.4)
VARDROW = (SSROW - NPER(2)*DEL*DB)
& /(NPER(2)**2*(NPER(1) - 1))
VARDCOL = (SSCOL - NPER(1)*DEL*DB)
& /(NPER(1)**2*(NPER(2) - 1))
VARDIJ = (SQIJ - DEL*DB)/(NPER(1)*NPER(2) - 1)
WRITE(8,*)
WRITE(8,144)
144 FORMAT(10X,'*** Components of the variance of d : ***')
STR='row di variance '
WRITE(8,145) STR,VARDROW
145 FORMAT(1X,A45,' = ',F7.4)
STR='column di variance '
WRITE(8,146) STR,VARDCOL
146 FORMAT(1X,A45,' = ',F7.4)
STR='variance of dij'
WRITE(8,147) STR,VARDIJ
147 FORMAT(1X,A45,' = ',F7.4)
SD = SQRT(VARD)
C*************************************************************************C
C Calculate the asymmetric 95% confidence interval for delta, C
C with further adjustment on C.I. when d = 1.0 or d = -1.0. C
C*************************************************************************C
IF (NPER(1).LE.NPER(2)) THEN
MINN = NPER(1)
ELSE
MINN = NPER(2)
ENDIF
IF (DB.EQ.1.0) THEN
UPPER = 1.0
LOWER = (MINN - 1.96**2)
& /(MINN + 1.96**2)
ELSE IF (DB.EQ.(-1.0)) THEN
LOWER = -1.0
UPPER = -(MINN - 1.96**2)
& /(MINN + 1.96**2)
ELSE
UPPER = (DB-DB**3 + 1.96*SD*SQRT(DB**4 - 2*DB**2 + 1
&+ 1.96*1.96*VARD)) / (1-DB**2 + 1.96*1.96*VARD)
IF (UPPER.GT.1) UPPER = 1
LOWER = (DB-DB**3 - 1.96*SD*SQRT(DB**4 - 2*DB**2 + 1
&+ 1.96*1.96*VARD)) / (1-DB**2 + 1.96*1.96*VARD)
IF (LOWER.LT.-1) LOWER = -1
ENDIF
WRITE(8,148)
148 FORMAT(10X,'** Inference : **')
STR='approximate .95 Confidence limits for d '
WRITE(8,149) STR
149 FORMAT(1X,A40)
WRITE(8,*)
IF(UPPER.GT.1) UPPER = 1
IF(LOWER.LT.-1) LOWER = -1
WRITE(8,150) LOWER,UPPER
150 FORMAT(20X,F6.3,' to ',F6.3)
WRITE(8,*)
STR='significance of d :'
WRITE(8,151) STR,DB/SD
151 FORMAT(1X,A45,' z = ',F7.4)
WRITE(8,*)
C******************************************************************************C
C This short section computes the ordinary unpooled t-test C
C with Welch’s adjusted df. C
C******************************************************************************C
SUM1 = 0.0
SUM2 = 0.0
SUMSQ1 = 0.0
SUMSQ2 = 0.0
DO 20 I = 1,NPER(1)
SUM1 = SUM1 + Y(I,1)
20 CONTINUE
M1 = SUM1/NPER(1)
DO 21 I =1,NPER(1)
SUMSQ1 = SUMSQ1 + (Y(I,1) - M1)**2
21 CONTINUE
DO 22 I =1,NPER(2)
SUM2 = SUM2 + Y(I,2)
22 CONTINUE
M2 = SUM2/NPER(2)
DO 23 I =1,NPER(2)
SUMSQ2 = SUMSQ2 + (Y(I,2)-M2)**2
23 CONTINUE
Q1 = SUMSQ1/(NPER(1)*(NPER(1)-1))
Q2 = SUMSQ2/(NPER(2)*(NPER(2)-1))
VARDIFF = Q1 + Q2
MDIFF = M1 - M2
TEE = MDIFF/SQRT(VARDIFF)
WRITE(8,152)
152 FORMAT(6X,'*** Independent t-test with unpooled variance :',
& 1X,' *** ')
STR='mean difference'
WRITE(8,153) STR,MDIFF
153 FORMAT(1X, A45,' = ',F7.4)
STR='standard deviations:'
WRITE(8,154) STR,SQRT((SUMSQ1/(NPER(1) - 1))),
& SQRT((SUMSQ2/(NPER(2)-1)))
154 FORMAT(1X,A47,'(1) ',F7.4,' (2) ',F7.4)
STR='standard error of mean difference'
WRITE(8,155) STR,SQRT(VARDIFF)
155 FORMAT(1X,A45,' = ',F7.4)
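C Welch-Satterthwaite adjusted degrees of freedom:
C   df = (Q1 + Q2)**2 / (Q1**2/(N1-1) + Q2**2/(N2-1)),
C where QK = (sample variance of group K)/NK is the squared standard
C error contribution of group K.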
Q12=(Q1+Q2)**2/(Q1**2/(NPER(1)-1)+Q2**2/(NPER(2)-1))
WRITE(8,156) TEE,Q12
156 FORMAT(9X,'t = ',F8.4,9X,'adjusted df = ',F8.4)
WRITE(8,*)
WRITE(*,157)
157 FORMAT('Do you want the data to be printed on the',1X,
& 'printer, y/n?')
READ(*,'(A1)') ANS
IF((ANS.EQ.'Y').OR.(ANS.EQ.'y')) THEN
WRITE(8,*)
WRITE(8,158)
158 FORMAT(10X,'*** Ordered data for this variable : ***')
WRITE(8,159)
159 FORMAT(1X,'ORDER',5X,'SUBJ.',5X,'SCORE',5X,'ROWDOM')
WRITE(8,160) JG(1)
160 FORMAT(1X,'Group ',I3)
DO 25 I=1,NPER(1)
WRITE(8,161) I,JORDER(I,1),Y(I,1),NDROW(I)
161 FORMAT(1X,I5,5X,I5,5X,F6.3,5X,I3)
25 CONTINUE
WRITE(8,162) JG(2)
162 FORMAT(1X,'Group ',I3)
DO 26 I=1,NPER(2)
WRITE(8,163) I,JORDER(I,2),Y(I,2),NDCOL(I)
163 FORMAT(1X,I5,5X,I5,5X,F6.3,5X,I3)
26 CONTINUE
ELSE
ENDIF
C*****************************************************************************C
WRITE(8,*)
WRITE(8,*)
WRITE(8,*)
WRITE(*,164)
164 FORMAT('Do you want to do another analysis, y or n?')
READ(*,'(A1)') ANS
IF (ANS.EQ.'Y'.OR.ANS.EQ.'y') GOTO 282
1500 CLOSE(8,STATUS='KEEP')
END