
PSYCHOMETRIKA--VOL. 57, NO. 1, 89-105
MARCH 1992

STRUCTURAL EQUATION MODELS WITH CONTINUOUS AND POLYTOMOUS VARIABLES

SIK-YUM LEE
WAI-YIN POON
THE CHINESE UNIVERSITY OF HONG KONG

P. M. BENTLER
UNIVERSITY OF CALIFORNIA, LOS ANGELES

A two-stage procedure is developed for analyzing structural equation models with contin-
uous and polytomous variables. At the first stage, the maximum likelihood estimates of the
thresholds, polychoric covariances and variances, and polyserial covariances are simulta-
neously obtained with the help of an appropriate transformation that significantly simplifies the
computation. An asymptotic covariance matrix of the estimates is also computed. At the second
stage, the parameters in the structural covariance model are obtained via the generalized least
squares approach. Basic statistical properties of the estimates are derived and some illustrative
examples and a small simulation study are reported.
Key words: structural equation models, polychoric correlations, polyserial correlations, max-
imum likelihood, generalized least squares, asymptotic properties.

1. Introduction

Structural equation modelling is a widely used multivariate technique in analyzing


behavioral data. Traditionally, it has been carried out in practice under the assumption
that the observed variables are continuous with a multivariate normal distribution.
Recently, asymptotic distribution-free methods have been developed (Bentler, 1983;
Browne, 1982, 1984), and robustness of the normal theory methods under violation of
the distributional assumption has been studied (Anderson, 1988; Browne, 1987; Satorra
& Bentler, 1990). Although the work cited above mainly concentrated on continuous
variables, some attention has been focused on polytomous variables because most
variables in practice are only observable in dichotomous or polytomous form. Bock and
Liberman (1970) considered the maximum likelihood method for a dichotomous factor
analysis with only one factor. This model was extended to a multiple factors model by
Christoffersson (1975) and Muthén (1978). A review of recent developments in factor
analysis of categorical variables was given by Mislevy (1986). Recently, Lee, Poon, and
Bentler (1990b) developed a full maximum likelihood method to analyze general struc-
tural equation models for polytomous variables. In their approach, all parameters in the
model, including the thresholds and the covariance structure parameters, are estimated
simultaneously based on the full maximum likelihood approach. Hence, the final esti-
mates have optimal statistical properties, such as consistency, asymptotic efficiency
and normality. A three-stage estimation procedure that requires less computational

This research was supported in part by a research grant DA01070 from the U.S. Public Health Service.
We are indebted to several referees and the editor for very valuable comments and suggestions for improve-
ment of this paper. The computing assistance of King-Hong Leung and Man-Lai Tang is also gratefully
acknowledged.
Requests for reprints should be sent to Sik-Yum Lee, Department of Statistics, The Chinese University
of Hong Kong, Shatin, N.T., HONG KONG.

© 1992 The Psychometric Society

effort was proposed by Lee, Poon, and Bentler (1990a), which used the partition
method (Poon & Lee, 1987) at the first two stages to obtain the pseudo maximum
likelihood estimates of the thresholds, polychoric variances and covariances. The as-
ymptotic properties of the pseudo maximum likelihood estimator were studied. In
particular, they proved that the asymptotic distribution of the estimator is normal, and
the expressions for computing the covariance matrix were also provided. The third
stage of the procedure was devoted to estimating the covariance structure parameters
by a generalized least squares approach with a correctly specified weight matrix. The
statistical properties of this three-stage procedure, such as the asymptotic distribution
of the estimates and the goodness-of-fit test statistic, were also provided. Therefore,
structural equation models with only polytomous variables can be analyzed rigorously
by results in Lee, Poon, and Bentler (1990a, 1990b).
More general situations that involve both continuous and polytomous variables are
more complicated. Olsson, Drasgow, and Dorans (1982) studied the estimation of a
bivariate polyserial correlation by means of maximum likelihood and a two-step ap-
proach. Poon and Lee (1987) extended the bivariate model to a multivariate polyserial
and polychoric model, where a vector of continuous variables and a vector of polyto-
mous variables were involved. Optimal maximum likelihood estimates of the thresh-
olds, polychoric and polyserial correlations were obtained. The contribution of these
papers is only in the estimation of correlations, since no structural model for the
covariance matrix is considered and estimated.
The main purpose of this paper is to develop a procedure to analyze the general
structural equation model for continuous and polytomous variables. A two-stage ap-
proach based on the rationale of maximum likelihood and generalized least squares is
developed. Basic asymptotic properties for statistical inference, such as the asymptotic
distribution of the estimator and the goodness-of-fit statistic are established rigorously.
The present development is a generalization of the results in Lee, Poon, and Bentler
(1990a, 1990b) to continuous and polytomous variables. It is also a generalization of
Olsson, Drasgow, and Dorans (1982), and Poon and Lee (1987) since general covariance
structures can be analyzed. More specifically, the following contributions are new in
this paper: (i) maximum likelihood estimation of the basic covariance matrix that is
based on new, more convenient identification conditions, (ii) the asymptotic covariance
matrix of this maximum likelihood estimate; and (iii) analysis of the structural model for
the covariance matrix. Property (ii) is essential to constructing the appropriate weight
matrix in the generalized least squares function at the second stage, and in achieving the
asymptotic statistical properties of the final generalized least squares analysis of the
structural model, such as the asymptotic chi-square test.
PRELIS and LISREL (Jöreskog & Sörbom, 1988a, 1988b) are probably the most
widely used programs in analyzing structural equation models. It is claimed that struc-
tural equation models of continuous and polytomous variables can also be analyzed by
PRELIS and LISREL. The procedure is to first obtain the polychoric and polyserial
correlation estimates and a weight matrix by PRELIS; the structural parameters in the
correlation matrix are then estimated by the weighted least squares procedure in
LISREL VII (Jöreskog & Sörbom, 1988b, chap. 7). Unfortunately, no statistical de-
velopment of the underlying procedure has been given, and hence, it is impossible to
provide a theoretical comparison of our procedure with theirs, except that our proce-
dure is not confined to the analysis of correlation structures. Practically, PRELIS and
LISREL require sample sizes larger than 200. This requirement can be relaxed by an
option, but as they pointed out, results produced in small samples may not be at all
reliable (Jöreskog & Sörbom, 1988a, pp. 2-8). Our approach does not have this limi-
tation.

Muthén (1984) proposed a three-stage estimation method that has become an im-
portant part of the program LISCOMP (Muthén, 1987). His underlying model is more
general than ours in the sense that it allows nonnormal exogenous variables that are
measured without error. And as we will see, our second stage, which includes the
generalized least squares estimation of the structural parameters based on the estimates
of polychoric, polyserial variances and covariances, and an appropriate weight matrix,
is quite similar to his third stage. However, the methods for obtaining the estimates at
the earlier stage(s) are quite different. We use the optimal joint maximum likelihood
approach to estimate simultaneously all the parameters in the multivariate polychoric
and polyserial model. Thus, our estimator is efficient and its asymptotic properties
follow from the standard maximum likelihood theory. For instance, the asymptotic
distribution of the estimates is jointly asymptotic normal with the covariance matrix
obtained from the inverse of the information matrix. It should be emphasized that this
property is extremely important to achieve the statistical properties of the generalized
least squares estimation at the final stage. In Muthén (1984), a nonstandard partition
two-step maximum likelihood approach was used. Hence, his estimates are not neces-
sarily jointly multivariate normal and efficient, although his approach is less computa-
tionally intensive.
The model and the main steps of our procedure are developed in section 2. The first
stage estimates are obtained by the iterative Fletcher-Powell algorithm, while the sec-
ond stage estimates of the structural parameters are obtained by the iterative Gauss-
Newton algorithm. Some illustrative examples and a simulation study are reported in
section 3, and a concluding discussion is presented in section 4. Certain technical
details are given in the Appendix.

2. Analysis of the Model

Let X and Y be continuous random vectors of dimensionalities r and s, re-
spectively. It is assumed that the joint distribution of (X', Y')' is multivariate normal
with zero mean vector and covariance matrix Σ = Σ(θ), where

    Σ = [ Σ_xx  Σ_xy ]
        [ Σ_yx  Σ_yy ]                                                  (1)

with Σ_xx, Σ_xy, and Σ_yy being the covariance matrices of X, (X, Y), and Y, respectively.
The covariance structure Σ is a matrix function of the q by 1 parameter vector θ.
Examples of Σ include the factor analysis model, the EQS model (Bentler, 1989), and
the LISREL model (Jöreskog & Sörbom, 1988b). If the continuous measurements
defined by X and Y are observable, the estimation of θ can be carried out via standard
theory and computer programs, such as EQS and LISREL. In this paper, we are
concerned with the situation where exact measurements of Y are not available, and the
information it contains is given only by an observable polytomous vector Z whose
relationship with Y is specified as

    Z_i = k(i)  if  α_{i,k(i)} ≤ Y_i < α_{i,k(i)+1},                    (2)

for i = 1, 2, ..., s, k(i) = 1, 2, ..., m(i), where α_{i,k(i)} is the threshold
parameter with α_{i,1} = -∞ and α_{i,m(i)+1} = ∞. Thus, the basic set of unknown param-
eters in the model consists of the covariance structural parameters θ and threshold
parameters α_{i,k(i)}, i = 1, ..., s, k(i) = 2, ..., m(i).
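The categorization rule (2) is easy to mirror in code. The sketch below (Python with hypothetical threshold values, not part of the paper's FORTRAN implementation) maps a continuous score to its observed category:

```python
import numpy as np

def discretize(y, thresholds):
    """Map a continuous score y to a category k via rule (2):
    Z = k when alpha_k <= y < alpha_{k+1},
    with alpha_1 = -inf and alpha_{m+1} = +inf implicit."""
    # Interior thresholds only; the two infinite ones are prepended/appended.
    alphas = np.concatenate(([-np.inf], thresholds, [np.inf]))
    # searchsorted(..., side="right") returns the interval index containing y,
    # so a score exactly at a threshold falls in the upper category (alpha <= y).
    return int(np.searchsorted(alphas, y, side="right"))  # categories 1..m

# Three categories (m = 3) need two interior thresholds.
thresholds = np.array([-0.8, 0.6])
assert discretize(-1.5, thresholds) == 1
assert discretize(0.0, thresholds) == 2
assert discretize(0.7, thresholds) == 3
assert discretize(-0.8, thresholds) == 2  # boundary: alpha <= y puts it above
```

With two interior thresholds the rule yields the three-category items used in the examples of section 3.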
Suppose that we have a random sample from (X', Z')' with size N, consisting of
N (r + s) by 1 observed vectors of the form (x'_{z,i(z)}, z')', where the i-th component of z,
z_i, takes a value from 1, 2, ..., m(i) and i(z) denotes the index of the particular
observation with Z = z. Thus, i(z) takes a value from the sequence 1, 2, ..., f(z),
with f(z) the total number of observations with Z = z, and

    Σ_{k(1)=1}^{m(1)} ··· Σ_{k(s)=1}^{m(s)} f(k(1), ..., k(s)) = N.

A two-stage procedure for estimating the parameters will be developed. The first stage
is devoted to obtaining the maximum likelihood (ML) estimates of Σ and the vector
of thresholds α with no covariance structure in Σ, giving Σ̂ and α̂. The second stage
estimates the structural parameter θ via the generalized least squares (GLS) approach
based on Σ̂ and an appropriate weight matrix that converges to the inverse of the
covariance matrix of Σ̂ in probability.
At the first stage, let β* be the vector of parameters consisting of the thresholds
and the elements in Σ. Let σ'_{yx.j} be the j-th row of Σ_yx, σ²_{yj} be the variance of Y_j, and
R_{yy.x} be the correlation matrix of the conditional distribution of Y given X. It follows
from similar reasoning as in Poon and Lee (1987) that the negative logarithm of the
likelihood function is equal to

    L(β*) = -Σ_{k(1)=1}^{m(1)} ··· Σ_{k(s)=1}^{m(s)} Σ_{i(z)=1}^{f(z)} {log p(x_{z,i(z)}) + log p(z|x_{z,i(z)})},      (3)

in which p(x_{z,i(z)}) is the r-dimensional multivariate normal density function, and
p(z|x_{z,i(z)}) is given by

    p(z|x) = (-1)^s Σ_{i(1)=0}^{1} ··· Σ_{i(s)=0}^{1} (-1)^{i(1)+···+i(s)} Φ_s(α*_1, ..., α*_s; R_{yy.x}),      (4)

where

    α*_j = (α_{j,v(j)} - σ'_{yx.j} Σ_xx^{-1} x)(σ²_{yj} - σ'_{yx.j} Σ_xx^{-1} σ_{yx.j})^{-1/2},      (5)

with v(j) = k(j) + i(j), and Φ_s(α*_1, ..., α*_s; R_{yy.x}) is equal to

    ∫_{-∞}^{α*_s} ··· ∫_{-∞}^{α*_1} (2π)^{-s/2} |R_{yy.x}|^{-1/2} exp(-½ y*' R_{yy.x}^{-1} y*) dy*_1 ··· dy*_s.      (6)
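The alternating sum in (4) is the standard inclusion-exclusion evaluation of a rectangle probability from the distribution function (6). A small sketch, using SciPy's multivariate normal CDF as a stand-in for the Schervish (1984) subroutine used in the paper:

```python
import numpy as np
from itertools import product
from scipy.stats import multivariate_normal, norm

def rectangle_prob(lower, upper, R):
    """P(lower_j <= Y_j < upper_j for all j) for Y ~ N(0, R), computed as
    the inclusion-exclusion sum of s-variate normal CDFs as in (4):
    each term evaluates the lower (i(j) = 0) or upper (i(j) = 1) limit."""
    s = len(lower)
    mvn = multivariate_normal(mean=np.zeros(s), cov=R)
    total = 0.0
    for signs in product((0, 1), repeat=s):          # (i(1), ..., i(s))
        point = [upper[j] if signs[j] else lower[j] for j in range(s)]
        total += (-1) ** (s + sum(signs)) * mvn.cdf(np.asarray(point))
    return total

# Check: with R = I the rectangle probability factors into univariate differences.
lower = np.array([-0.8, -0.7])
upper = np.array([0.6, 0.5])
p = rectangle_prob(lower, upper, np.eye(2))
expected = np.prod(norm.cdf(upper) - norm.cdf(lower))
assert abs(p - expected) < 1e-4
```

The 2^s CDF evaluations per cell are the reason the computational burden grows quickly with s, a point returned to in section 4.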

The ML estimate of β* is defined as the vector β̂* that minimizes L(β*). From (5),

    α*_j = (α_{j,v(j)}/σ_{yj} - η'_{yx.j} Σ_xx^{-1} x)(1 - η'_{yx.j} Σ_xx^{-1} η_{yx.j})^{-1/2},      (7)

where η_{yx.j} = σ_{yx.j}/σ_{yj}; hence, any scalar multiple of both α_{j,v(j)} and σ_{yj} will give the
same α*_j, and the same value for Φ_s(α*_1, ..., α*_s; R_{yy.x}). Thus, the parameters and the
model are not identified without further restrictions. In Poon and Lee (1987), σ_{yj} is
restricted to be 1.0. If we use this method to solve the identification problem, then in
the analysis of the covariance structures at the second stage, we have restrictions
σ²_{yj}(θ) = 1, j = 1, ..., s. In general, since σ_{yj}(θ) may be a nonlinear function of θ,
some nonlinear constraints have to be imposed in estimating θ. If these nonlinear
constraints cannot be handled by reparametrization, a more complicated sequential
minimization procedure is required to attain the constrained estimates; see, for exam-
ple, Lee and Bentler (1980), and Lee (1981). Therefore, we consider another kind of
restriction here (see the discussion below (11)) to keep σ_{yj} as a parameter and avoid
nonlinear constraints. Now, it should be pointed out that minimizing L(β*) is extremely
complicated since it is a very involved function of many parameters. Hence, the fol-
lowing one-to-one transformations are considered to simplify the computation. Con-
sider α_{i,1} = -∞, α_{i,m(i)+1} = ∞,
    a_{i,v(i)} = α_{i,v(i)}(1 - η'_{yx.i} Σ_xx^{-1} η_{yx.i})^{-1/2},      (8)

    b_i = -Σ_xx^{-1} η_{yx.i}(1 - η'_{yx.i} Σ_xx^{-1} η_{yx.i})^{-1/2},      (9)

    r_ij = (ρ_ij - η'_{yx.i} Σ_xx^{-1} η_{yx.j}) / [(1 - η'_{yx.i} Σ_xx^{-1} η_{yx.i})(1 - η'_{yx.j} Σ_xx^{-1} η_{yx.j})]^{1/2},      (10)

where ρ_ij is the correlation of Y_i and Y_j, for i, j = 1, ..., s, i < j, and v(i) = 2, ...,
m(i). Then, clearly from (5) and (7),

    Φ_s(α*_1, ..., α*_s; R_{yy.x}) = Φ_s(a_{1,v(1)}/σ_{y1} + b'_1 x, ..., a_{s,v(s)}/σ_{ys} + b'_s x; R),      (11)

where R is the correlation matrix with off-diagonal elements r_ij. By similar reasoning as
before, a_{i,v(i)}/σ_{yi} is invariant up to a scalar multiplication of a_{i,v(i)} and σ_{yi}, and hence,
the parameters and the model are not identified. To solve this problem, we fix a_{i,2}, i =
1, ..., s, to some preassigned values. It should be noted that the number of restric-
tions in this method of identification is s, which is equal to the number of restrictions
for σ_{yi} = 1, for i = 1, ..., s. Our numerical examples suggest that the choice of these
preassigned values does not greatly affect the estimation. At this stage, the new param-
eter vector is equal to (σ_x, β), where

    β = (b', σ'_y, r', a')',      (12)

with σ_x, b, σ_y, r, and a being the parameter vectors that define the unknown distinct
parameters in Σ_xx, b_i, the diagonal elements of Σ_yy, r_ij, and a_{i,v(i)} for i, j = 1, ...,
s; j > i, and v(i) = 2, ..., m(i). From (4) and (11), the likelihood function (3) in terms
of this new set of parameters is expressed as

    L(σ_x, β) = L_1(σ_x) + L_2(b, σ_y, r, a),      (13)


where

    L_1(σ_x) = 2^{-1}{rN log (2π) + N log |Σ_xx| + Σ_{i=1}^{N} x'_i Σ_xx^{-1} x_i},      (14)

    L_2(b, σ_y, r, a) = -Σ_{k(1)=1}^{m(1)} ··· Σ_{k(s)=1}^{m(s)} Σ_{i(z)=1}^{f(z)} log{(-1)^s Σ_{i(1)=0}^{1} ··· Σ_{i(s)=0}^{1}
        (-1)^{i(1)+···+i(s)} Φ_s(a_{1,v(1)}/σ_{y1} + b'_1 x, ..., a_{s,v(s)}/σ_{ys} + b'_s x; R)},      (15)

are functions that depend on a smaller number of independent parameters. The ML esti-
mate, Σ̂_xx, of Σ_xx can be easily obtained by minimizing (14). The ML estimates b̂, σ̂_y,
r̂, and â are obtained by minimizing (15) via the iterative Fletcher-Powell algorithm.
Starting values for the unknown parameters can be chosen by the method given in Poon
and Lee (1987), using the values of z and cell frequencies. The Fletcher-Powell algo-
rithm requires the first derivatives of the likelihood function with respect to the param-
eters, and these are presented in the Appendix.
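The two pieces of (13) behave very differently: (14) is the ordinary normal negative log-likelihood of the continuous part and has a closed-form minimizer, while (15) requires iteration. A minimal sketch of the closed-form first-stage step (illustrative Python with simulated zero-mean data, not the paper's FORTRAN code):

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 500, 3
X = rng.standard_normal((N, r))   # zero-mean continuous observations x_1, ..., x_N

# Minimizing L1 in (14) over Sigma_xx has the closed-form solution
# Sigma_hat = N^{-1} sum_i x_i x_i'  (the mean is fixed at zero by the model).
Sigma_hat = X.T @ X / N

def L1(S, X):
    """The continuous-part negative log-likelihood of (14)."""
    N, r = X.shape
    sign, logdet = np.linalg.slogdet(S)
    quad = np.sum((X @ np.linalg.inv(S)) * X)   # sum_i x_i' S^{-1} x_i
    return 0.5 * (r * N * np.log(2 * np.pi) + N * logdet + quad)

# Sanity check: L1 is larger at perturbed matrices than at the ML estimate.
assert L1(Sigma_hat, X) < L1(Sigma_hat + 0.1 * np.eye(r), X)
assert L1(Sigma_hat, X) < L1(0.8 * Sigma_hat, X)
```

The polytomous part L2 admits no such closed form, which is why the Fletcher-Powell iterations are needed for b, σ_y, r, and a.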
It is well known that under mild regularity conditions, the ML estimates σ̂_x, b̂, σ̂_y,
r̂, and â are consistent and asymptotically efficient. The asymptotic distribution of σ̂_x
is multivariate normal with mean σ_x and covariance matrix 2N^{-1}K'_r(Σ_xx ⊗ Σ_xx)K_r,
where K_r is of order r² by r(r + 1)/2 with typical elements

    K_r(ij, gh) = 2^{-1}(δ_ig δ_jh + δ_ih δ_jg),  i ≤ r, j ≤ r, g ≤ h ≤ r,

and δ_jg represents the Kronecker delta; see, for example, Browne (1974). Moreover, the
joint asymptotic distribution of (b̂', σ̂'_y, r̂', â')' is also multivariate normal with mean
(b', σ'_y, r', a')' and covariance matrix I^{-1}, where

    I = E(∂L_2/∂β · ∂L_2/∂β').

Let I* be the matrix with the typical (i, j)-th entry of I* given by

    I*(i, j) = Σ_{k(1)=1}^{m(1)} ··· Σ_{k(s)=1}^{m(s)} Σ_{i(z)=1}^{f(z)} [∂ log p(z|x_{i(z),z})/∂β_i][∂ log p(z|x_{i(z),z})/∂β_j].      (16)

It can be shown (proof available from the authors) from standard asymptotic theory that I*
converges to I in probability, and thus, in practice, we use I* instead of I. The expres-
sions for ∂ log p(z|x_{i(z),z})/∂β_j are derived and presented in the Appendix. From (13),
(14), and (15), the joint asymptotic distribution of (σ̂_x, β̂) is normal with mean (σ_x, β)
and covariance matrix

    Ω = diag{2N^{-1}K'_r(Σ_xx ⊗ Σ_xx)K_r, I^{-1}}.      (17)

The ML estimates (η̂, σ̂_y, ρ̂, α̂) of (η, σ_y, ρ, α) can be obtained by the inverse
transformations of (8) to (10). Furthermore, by a one-to-one transformation that con-
verts correlations to covariances and vice versa, the ML estimates Σ̂_yx and Σ̂_yy of Σ_yx
and Σ_yy can be obtained from (η̂, σ̂_y, ρ̂). Thus, the ML estimate Σ̂ of Σ, composed of
Σ̂_xx, Σ̂_yx, and Σ̂_yy, can be obtained. By the delta theorem, the asymptotic covariance
matrix of Σ̂ is given by

    H = JΩJ',      (18)

where J is the appropriate Jacobian matrix corresponding to the transformation of (σ̂_x,
b̂, σ̂_y, r̂, â) to (Σ̂_xx, Σ̂_yx, Σ̂_yy). Expressions for the components of J can be derived via
some matrix calculus methods (see, for example, Bentler & Lee, 1978, or McDonald &
Swaminathan, 1973). To save space, these expressions are not presented. It should be
pointed out that H is no longer a diagonal block matrix since Σ_xx is involved in the
transformation defined by (8) to (10).
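The delta-method step behind (18) can be sketched in miniature. The transformation below (a correlation and two standard deviations mapped to a covariance) is a hypothetical stand-in for the paper's J, with the Jacobian obtained by finite differences as one numerical route:

```python
import numpy as np

def delta_cov(g, theta_hat, Omega, eps=1e-6):
    """Asymptotic covariance of g(theta_hat) by the delta method,
    H = J Omega J', with the Jacobian J of g approximated by
    central finite differences."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    cols = []
    for j in range(theta_hat.size):
        e = np.zeros(theta_hat.size)
        e[j] = eps
        cols.append((np.asarray(g(theta_hat + e))
                     - np.asarray(g(theta_hat - e))) / (2 * eps))
    J = np.column_stack(cols)
    return J @ Omega @ J.T

# Toy transformation: (rho, s1, s2) -> covariance sigma12 = rho * s1 * s2.
g = lambda t: np.array([t[0] * t[1] * t[2]])
theta = np.array([0.5, 1.2, 0.8])
Omega = np.diag([0.01, 0.02, 0.02])   # illustrative covariance of theta_hat
H = delta_cov(g, theta, Omega)

# Analytic check: J = [s1*s2, rho*s2, rho*s1] at theta.
J = np.array([[1.2 * 0.8, 0.5 * 0.8, 0.5 * 1.2]])
assert np.allclose(H, J @ Omega @ J.T, atol=1e-6)
```

In the paper J is obtained analytically by matrix calculus; the finite-difference Jacobian is used here only to keep the sketch short.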
At the second stage, the parameter vector θ in the covariance structure Σ(θ) is
estimated by minimizing the GLS function

    Q(θ) = ½(σ̂ - σ(θ))'W^{-1}(σ̂ - σ(θ)),      (19)

where σ̂ is the vector defined by the lower symmetric part of Σ̂ and W is an appropriate
weight matrix that converges to H in probability. Let θ̂ be the vector that minimizes
Q(θ). It can be shown (proof available from the authors) by similar arguments as in Fer-
guson (1958) that (i) θ̂ is consistent, (ii) the asymptotic distribution of θ̂ is normal with
mean vector θ and covariance matrix {(∂σ/∂θ)'W^{-1}(∂σ/∂θ)}^{-1}, and (iii) the asymptotic
distribution of 2Q(θ̂) is chi-squared with degrees of freedom (s + r)(s + r + 1)/2 -
q, where q is the dimensionality of θ. Basic statistical inference for structural equation
models, such as goodness-of-fit tests of the models, tests of null hypotheses con-
cerning θ, and so forth, can be performed via (ii) and (iii). Computationally, θ̂ can be
obtained efficiently by the Gauss-Newton algorithm (see, for example, Lee & Jennrich,
1979). This iterative algorithm has nice features, such as (a) being robust to starting
values, (b) requiring only the first derivative of the objective function, and (c) producing
the asymptotic covariance matrix of θ̂ at the last iteration.
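A compact sketch of the second-stage Gauss-Newton iteration for (19), using a toy one-factor structure in place of the paper's model (the Jacobian Δ = ∂σ/∂θ is approximated numerically here for brevity):

```python
import numpy as np

def gauss_newton(sigma_hat, sigma_model, theta0, W, tol=1e-10, max_iter=100):
    """Minimize Q(theta) = 0.5 (sigma_hat - sigma(theta))' W^{-1} (.)
    by Gauss-Newton steps theta += (D'W^{-1}D)^{-1} D'W^{-1} residual."""
    theta = np.asarray(theta0, dtype=float)
    Winv = np.linalg.inv(W)
    eps = 1e-6
    for _ in range(max_iter):
        resid = sigma_hat - sigma_model(theta)
        cols = []                       # central-difference Jacobian D
        for j in range(theta.size):
            e = np.zeros(theta.size)
            e[j] = eps
            cols.append((sigma_model(theta + e) - sigma_model(theta - e)) / (2 * eps))
        D = np.column_stack(cols)
        step = np.linalg.solve(D.T @ Winv @ D, D.T @ Winv @ resid)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Toy one-factor structure: sigma(theta) = (l1^2 + psi, l1*l2, l2^2 + psi).
def sigma_model(t):
    l1, l2, psi = t
    return np.array([l1 ** 2 + psi, l1 * l2, l2 ** 2 + psi])

true = np.array([0.8, 0.6, 0.36])
sigma_hat = sigma_model(true)                       # noise-free for the check
theta = gauss_newton(sigma_hat, sigma_model, [0.5, 0.5, 0.5], np.eye(3))
assert np.allclose(theta, true, atol=1e-4)
```

At convergence, (D'W^{-1}D)^{-1} is available as a by-product, matching feature (c) above; in a real run W would be the estimate of H from the first stage rather than the identity used in this toy check.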

3. Examples

Based on the theory developed in section 2, computer programs written in
FORTRAN IV with double precision have been implemented to obtain the two-stage
estimates. One of the main computations is the evaluation of the multivariate normal distri-
bution function, and this is done via the subroutine developed by Schervish (1984). In
our illustrative examples, the confirmatory factor analysis model

    Σ = ΛΠΛ' + Ψ

was used, where Λ is the factor loading matrix, and Π and Ψ are the covariance matrices of the
factors and error measurements, respectively.
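Assembling the model-implied covariance matrix is a one-liner; the sketch below uses arbitrary placeholder values for Λ, Π, and Ψ, not the estimates of Table 1:

```python
import numpy as np

# Confirmatory factor model Sigma = Lambda Pi Lambda' + Psi, in the
# two-factor pattern of the examples (values here are illustrative only).
Lambda = np.array([[0.9, 0.0],
                   [0.8, 0.0],
                   [0.0, 0.7],
                   [0.0, 0.6]])
Pi = np.array([[1.0, -0.5],
               [-0.5, 1.0]])                 # factor correlation matrix
Psi = np.diag([0.19, 0.36, 0.51, 0.64])      # diagonal error variances

Sigma = Lambda @ Pi @ Lambda.T + Psi

assert np.allclose(Sigma, Sigma.T)             # symmetric
assert np.all(np.linalg.eigvalsh(Sigma) > 0)   # positive definite here
assert np.isclose(Sigma[0, 0], 0.9 ** 2 + 0.19)          # diagonal entry
assert np.isclose(Sigma[0, 1], 0.9 * 1.0 * 0.8)          # same-factor pair
assert np.isclose(Sigma[0, 2], 0.9 * (-0.5) * 0.7)       # cross-factor pair
```

The variances were chosen so that the diagonal of Σ is 1, mirroring the standardized setup of the artificial example below.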
To demonstrate the theory with a real example, two continuous variables (speed of
right and left lateral runs of 10 meters) and two polytomous variables (competition
attitude items I and II, each with three categories) were selected from a study of the relationship
between physical abilities and some psychological traits of university students in Hong Kong
(see Fu & Chan, 1987). The attitude items were adapted from the 14-item Physical
Activity Questionnaire of Corbin and Lindsey (1985). The continuous scores are stan-
dardized, and the attitude items are scored such that lower values represent a more serious
attitude. The sample size is 65. A confirmatory factor analysis model with two
factors (speed and attitude) was used to analyze the data, with the objective of
assessing the correlation between speed achievement and attitude. Based on the procedure
and theory developed in the previous sections, estimates of the parameters in the model
were obtained. These estimates and their standard error estimates are presented in
Table 1. Note that the standard errors are quite large due to the small sample size. The
final function value 2Q(θ̂) is equal to 0.452, and based on a chi-square distribution with
one degree of freedom, the asymptotic goodness-of-fit test indicates acceptance of the
structural hypothesis. Hence, it may be concluded that the proposed confirmatory
factor analysis model fits the sample data. The estimated correlation between the fac-
tors is equal to -0.466, indicating that runners with a serious attitude have better speed.
The data set has been reanalyzed using LISCOMP (Muthén, 1987), and the results
are also reported in Table 1. Since only the diagonal elements of Ψ that correspond to
the continuous observed variables are treated as parameters to be estimated by the
LISCOMP program, Ψ33 and Ψ44 cannot be estimated directly. In this and the next
example, the estimates of these parameters are obtained using the formula Ψ_ii = 1 -
λ²_{i2}, i = 3, 4. From Table 1, it is clear that the parameter estimates and the standard
error estimates obtained from our approach and LISCOMP are quite different. The
value of its chi-squared goodness-of-fit statistic is 3.51 with one degree of freedom. This

TABLE 1
A Factor Analytic Solution of the Speed-Attitude Data

                           Λ                        Π                 Diag Ψ

Estimates      0.95(0.49)   0.0*                                      0.10(1.16)
from our       0.80(0.47)   0.0*          1.0*                        0.20(0.84)
approach       0.0*         0.63(0.30)   -0.54(0.34)   1.0*           0.54(0.34)
               0.0*         0.42(0.16)                                0.73(0.52)

Estimates      0.77(0.16)   0.0*                                      0.44(0.26)
from           1.12(0.22)   0.0*          1.0*                       -0.10(0.58)
LISCOMP        0.0*         0.99(0.55)   -0.27(0.21)   1.0*           0.02(-)
               0.0*         0.29(0.19)                                0.92(-)

Note: * indicates that the parameter was fixed; standard error estimates are in parentheses.

value is also quite different from ours. These differences in parameter estimates and
goodness-of-fit statistic may be due to the small sample size of the data and/or the
different approaches in the analyses.
Since PRELIS and LISREL (Jöreskog & Sörbom, 1988a, 1988b) recommend a
sample size larger than 200, we do not analyze our real data set with these programs. To
compare our approach with LISCOMP, PRELIS, and LISREL in a practical setting, the
following artificial example is used.
A sample of 200 random observations (x'_i, y'_i)' was simulated from a multivariate
normal distribution N[0, Σ]. The dimensions of x_i and y_i are both equal to two. The
covariance matrix is taken to be

    Σ = ΛΠΛ' + Ψ,

where the true values of Λ, Π, and Ψ are given by

    Λ' = [ 0.620  0.688  0*     0*    ]        Π = [ 1.0*  0.5  ]
         [ 0*     0*     0.620  0.688 ],           [ 0.5   1.0* ],

    Diag Ψ = (0.615, 0.526, 0.615, 0.526).

The off-diagonal elements of Ψ and parameters with an asterisk are treated as fixed
parameters. Since LISREL only estimates correlation structures for this kind of prob-
lem, we purposely chose these population values so that Σ is a correlation matrix. The
simulated random vector y_i was transformed to z_i according to the following preas-
signed thresholds:

    α₁ = (-0.8, 0.6),  α₂ = (-0.7, 0.5),

which represent a fairly symmetric distribution of the polytomous variables. The free
parameters in the model are estimated by our method, LISCOMP, PRELIS, and
LISREL. Moreover, for the sake of comparison, ML estimates based on the original
continuous observations (x'_i, y'_i)' were also computed. The first-stage estimates σ̂ of Σ
obtained by our method, LISCOMP, and PRELIS are given by

    σ̂ = (0.947, 0.351, 0.846, 0.181, 0.228, 0.926, 0.126, 0.211, 0.429, 0.940),

    σ̂ = (0.947, 0.351, 0.846, 0.188, 0.235, 1.000, 0.127, 0.217, 0.460, 1.000),

and

    σ̂ = (1.000, 0.392, 1.000, 0.191, 0.253, 1.000, 0.129, 0.232, 0.460, 1.000),

respectively. The final standardized solutions of the structural parameter θ are reported
in Table 2. We observe that most of our estimates, especially those corresponding to
the polytomous variables, are closer to the ML estimates and the true population values
than the estimates obtained from LISREL or LISCOMP. However, since the sample
size is fairly large, the differences are not large. The value of the chi-squared goodness-
of-fit statistic obtained from our method is equal to 0.074, with one degree of freedom,
implying the expected result that the proposed model fits the sample data. The corre-
sponding values obtained from LISREL and LISCOMP are 0.130 and 0.247, respec-
tively. Therefore, these three approaches may give similar results for data sets with
large sample sizes.
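The data-generation scheme of the artificial example can be sketched as follows (the off-diagonal of Π is read here as 0.5, and the thresholds are those stated above; the sketch is illustrative, not the paper's simulation code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Population structure of the artificial example: Sigma = Lambda Pi Lambda' + Psi.
Lambda = np.array([[0.620, 0.0],
                   [0.688, 0.0],
                   [0.0, 0.620],
                   [0.0, 0.688]])
Pi = np.array([[1.0, 0.5],
               [0.5, 1.0]])
Psi = np.diag([0.615, 0.526, 0.615, 0.526])
Sigma = Lambda @ Pi @ Lambda.T + Psi
# With these values Sigma is (up to rounding) a correlation matrix.
assert np.allclose(np.diag(Sigma), 1.0, atol=1e-3)

# Draw (x', y')' and discretize the last two components into z
# using the preassigned thresholds alpha_1 and alpha_2.
data = rng.multivariate_normal(np.zeros(4), Sigma, size=200)
alpha = [np.array([-0.8, 0.6]), np.array([-0.7, 0.5])]
x = data[:, :2]
z = np.column_stack([
    np.searchsorted(np.concatenate(([-np.inf], alpha[i], [np.inf])),
                    data[:, 2 + i], side="right")
    for i in range(2)
])  # categories 1, 2, 3

assert x.shape == (200, 2)
assert set(np.unique(z)).issubset({1, 2, 3})
```

Only (x, z) would then be passed to the two-stage procedure; the underlying y is discarded, exactly as in the comparison with the continuous-data ML estimates.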
Following the suggestion of a reviewer, a small simulation study was conducted to
compare the performance of our estimates with those from LISCOMP. Three sample
sizes, N = 100, 200, and 500, were considered. The dimensions of x_i and y_i were six and
two, respectively, and their covariance structure was taken to be of the same form as before.
The true values of Λ, Π, and Ψ, and the preassigned thresholds for transforming y_i to
z_i, are given by

    Λ' = [ 0.8  0.8  0.8  0.8  0*   0*   0*   0*  ]        Π = [ 1.0*  0.6  ]
         [ 0*   0*   0*   0*   0.8  0.8  0.8  0.8 ],           [ 0.6   1.0* ],

    Diag Ψ = (0.36, 0.36, 0.36, 0.36, 0.36, 0.36, 0.36, 0.36),

    α₁ = α₂ = (-1.0, 0.0),

respectively. As before, the LISCOMP estimates of Ψ77 and Ψ88 corresponding to the
polytomous variables were obtained from 1 - λ²_{i2}, i = 7, 8, respectively. For each
sample size, 50 replications were completed (we had to discard a total of five divergent
or Heywood cases from LISCOMP and our program at N = 100). The means and the
root mean square errors (RMS) between the estimates and the true population values
are reported in Table 3. It seems that, in general, all estimates from our method and
LISCOMP are accurate, though LISCOMP produces slightly better estimates at N =
100 and 200. For each sample size, the 50 goodness-of-fit test statistic values were
analyzed via the SPSS (1988) program to see whether they deviate from the expected
chi-squared distribution. The p-values of the Kolmogorov test (see, e.g., Afifi &
Azen, 1972, p. 50) based on our test statistic values for N = 100, 200, and 500 are
0.91, 0.47, and 0.28, respectively, while those based on LISCOMP's values are
0.00, 0.08, and 0.28, respectively. These statistics verify that at sample sizes of 200 or
more, both methods yield goodness-of-fit statistics that are approximately chi-squared.
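The distributional check used above can be reproduced in outline: collect replicated fit statistics, apply the Kolmogorov(-Smirnov) test against the reference chi-squared law, and compare empirical with theoretical percentiles as a P-P plot would. In this sketch the "statistics" are simulated directly from χ²(1) as a stand-in for refitting the model 50 times:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Stand-in replicated goodness-of-fit statistics (a real study would refit
# the model on each replication and collect the 2Q values).
df = 1
fit_stats = stats.chi2.rvs(df, size=50, random_state=rng)

# Kolmogorov(-Smirnov) test against the reference chi-squared distribution.
ks = stats.kstest(fit_stats, stats.chi2(df).cdf)
assert 0.0 <= ks.pvalue <= 1.0

# P-P plot coordinates: theoretical percentiles of the sorted statistics
# against evenly spaced empirical percentiles; a correctly specified
# reference law puts the points near the 45-degree line.
theo = stats.chi2(df).cdf(np.sort(fit_stats))
emp = (np.arange(1, 51) - 0.5) / 50
assert np.max(np.abs(theo - emp)) < 0.35   # loose bound for n = 50
```

A large KS p-value, as reported for our method at all three sample sizes, is evidence that the 2Q(θ̂) values are compatible with their asymptotic chi-squared law.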

TABLE 2
Comparison of Estimates with LISCOMP, PRELIS and LISREL

                          Λ                   Π               Diag Ψ

Estimates from    0.50   0.0*                                 0.75
ML approach       0.78   0.0*        1.0*                     0.39
                  0.0*   0.67        0.40   1.0*              0.55
                  0.0*   0.68                                 0.55

Estimates from    0.49   0.0*                                 0.76
our approach      0.80   0.0*        1.0*                     0.36
                  0.0*   0.70        0.45   1.0*              0.51
                  0.0*   0.66                                 0.56

Estimates from    0.51   0.0*                                 0.74
PRELIS and        0.77   0.0*        1.0*                     0.41
LISREL            0.0*   0.74        0.46   1.0*              0.45
                  0.0*   0.62                                 0.62

Estimates from    0.49   0.0*                                 0.70
LISCOMP           0.76   0.0*        1.0*                     0.35
                  0.0*   0.73        0.47   1.0*              0.47
                  0.0*   0.63                                 0.60

This remains true at N = 100 for our method. On the other hand, LISCOMP's good-
ness-of-fit statistic was clearly not χ² distributed at N = 100. The relative performance
of the statistics is visualized in the chi-squared P-P plots (Wilk & Gnanadesikan, 1968)
shown in Figures 1 and 2. The approximate straight line shown for our method is
consistent with a χ² distribution, but at N = 100, the corresponding plot for LISCOMP

TABLE 3
Results of the Simulation Study

         True              Our Estimates           LISCOMP's Estimates
         Parameters     N=100  N=200  N=500       N=100  N=200  N=500

Mean     Λ11 = 0.80      0.75   0.79   0.78        0.84   0.83   0.80
         Λ21 = 0.80      0.75   0.78   0.79        0.84   0.82   0.80
         Λ31 = 0.80      0.76   0.79   0.80        0.83   0.84   0.81
         Λ41 = 0.80      0.75   0.79   0.78        0.82   0.83   0.80
         Λ52 = 0.80      0.81   0.80   0.80        0.81   0.81   0.81
         Λ62 = 0.80      0.81   0.78   0.81        0.82   0.80   0.81
         Λ72 = 0.80      0.76   0.78   0.80        0.85   0.82   0.81
         Λ82 = 0.80      0.75   0.80   0.80        0.83   0.82   0.82
         Π21 = 0.60      0.65   0.62   0.61        0.66   0.62   0.61
         Ψ11 = 0.36      0.37   0.36   0.36        0.40   0.36   0.36
         Ψ22 = 0.36      0.36   0.36   0.36        0.39   0.37   0.36
         Ψ33 = 0.36      0.36   0.36   0.36        0.38   0.37   0.37
         Ψ44 = 0.36      0.37   0.36   0.36        0.40   0.38   0.37
         Ψ55 = 0.36      0.36   0.37   0.36        0.40   0.38   0.36
         Ψ66 = 0.36      0.37   0.36   0.36        0.39   0.36   0.37
         Ψ77 = 0.36      0.41   0.39   0.36        0.27   0.32   0.34
         Ψ88 = 0.36      0.43   0.36   0.36        0.31   0.32   0.33

RMS      Λ11 = 0.80      0.13   0.09   0.05        0.09   0.07   0.04
         Λ21 = 0.80      0.11   0.08   0.05        0.08   0.06   0.04
         Λ31 = 0.80      0.13   0.09   0.05        0.09   0.07   0.04
         Λ41 = 0.80      0.12   0.09   0.06        0.08   0.07   0.04
         Λ52 = 0.80      0.09   0.07   0.05        0.08   0.07   0.05
         Λ62 = 0.80      0.10   0.06   0.04        0.09   0.05   0.04
         Λ72 = 0.80      0.09   0.06   0.03        0.08   0.04   0.03
         Λ82 = 0.80      0.09   0.05   0.03        0.08   0.04   0.03
         Π21 = 0.60      0.09   0.06   0.04        0.12   0.06   0.05
         Ψ11 = 0.36      0.08   0.05   0.03        0.09   0.05   0.03
         Ψ22 = 0.36      0.08   0.05   0.03        0.10   0.05   0.03
         Ψ33 = 0.36      0.08   0.05   0.03        0.09   0.06   0.03
         Ψ44 = 0.36      0.08   0.06   0.03        0.10   0.05   0.04
         Ψ55 = 0.36      0.08   0.06   0.04        0.09   0.05   0.04
         Ψ66 = 0.36      0.11   0.08   0.04        0.10   0.05   0.04
         Ψ77 = 0.36      0.13   0.09   0.05        0.14   0.07   0.06
         Ψ88 = 0.36      0.14   0.09   0.05        0.13   0.07   0.05

[Figure 1: three chi-square P-P plots of sample percentiles against theoretical chi-square percentiles, for N = 100, N = 200, and N = 500.]

FIGURE 1.
Chi-square P-P plots of the goodness-of-fit statistics values from our method. Frequencies and symbols used:
'.' represents 1 or 2 points, and 'x' represents 4 points.

[Figure 2: three chi-square P-P plots of sample percentiles against theoretical chi-square percentiles, for N = 100, N = 200, and N = 500.]

FIGURE 2.
Chi-square P-P plots of the goodness-of-fit statistics values from LISCOMP. Frequencies and symbols used:
'.' represents 1 or 2 points, and 'x' represents 4 points.

deviates markedly from a straight line, indicating behavior substantially discrepant from that expected under a theoretical χ² distribution. The analogous analyses for normality have also been applied to the parameter estimates obtained by both methods. The two smallest and largest p-values of the Kolmogorov test from all our parameter estimates are 0.22, 0.37, 0.99, 0.99 (N = 100); 0.61, 0.63, 0.99, 0.99 (N = 200); and 0.30, 0.35, 0.94, 0.96 (N = 500), respectively, while the corresponding p-values from LISCOMP's estimates are 0.16, 0.23, 0.99, 0.99 (N = 100); 0.36, 0.51, 0.97, 0.99 (N = 200); and 0.06, 0.66, 0.99, 0.99 (N = 500), respectively. Hence, we may conclude that the asymptotic behavior of the estimates is reasonable. To save space, the normal probability plots for these estimates are not presented.
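The χ² diagnostics above can be reproduced in outline. The sketch below is not the authors' code but a minimal stdlib-only illustration, which for convenience assumes a fit statistic with 2 degrees of freedom (whose cdf, 1 − e^(−x/2), has a closed form); it pairs empirical percentiles of replicated fit statistics with theoretical χ² percentiles, as in the P-P plots, and summarizes the discrepancy by a Kolmogorov distance:

```python
import math
import random

def chi2_cdf_df2(x):
    """cdf of a chi-square variate with 2 degrees of freedom."""
    return 1.0 - math.exp(-0.5 * x) if x > 0.0 else 0.0

def kolmogorov_distance(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)| for a one-sample Kolmogorov test."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d

random.seed(7)
# If U ~ Uniform(0,1), then -2 log U is exactly chi-square with 2 df,
# mimicking replicated goodness-of-fit statistics under a correct model.
fit_stats = [-2.0 * math.log(random.random()) for _ in range(1000)]

# P-P plot pairs: theoretical percentile F(x_(i)) against sample percentile i/n.
xs = sorted(fit_stats)
pp_pairs = [(chi2_cdf_df2(x), (i + 1) / len(xs)) for i, x in enumerate(xs)]

d_n = kolmogorov_distance(fit_stats, chi2_cdf_df2)
print("Kolmogorov distance:", round(d_n, 4))
```

A well-calibrated statistic keeps the P-P pairs close to the 45-degree line and the Kolmogorov distance small; a statistic that is not χ² distributed pulls the plotted points away from the diagonal, as in Figure 2.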
The values used in our simulation study were chosen to minimize divergent and improper solutions (Heywood cases). A more thorough Monte Carlo analysis that takes into account the effects of smaller and larger sample sizes, different kinds of models, and various aspects of the Anderson and Gerbing (1984) and Boomsma (1985) studies is needed to study the empirical behavior of the two methods carefully.

4. Discussion

A two-stage estimation procedure for analyzing structural equation models with continuous and polytomous variables has been developed in this paper. Basic asymptotic statistical properties of the estimates are derived that enable one to perform various statistical inferences. The estimates are computed via the Fletcher-Powell algorithm at the first stage and the Gauss-Newton algorithm at the second. Computationally, these are efficient iterative algorithms for locating the minima of the objective functions. The basic computational burden is due to the evaluation of the distribution function Φ_s(a_1, ..., a_s; R) at the first stage. Practically, if s ≤ 4, the method is still feasible even if the dimension of X is relatively large.
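To make this cost concrete: each cell probability requires the multinormal distribution function at 2^s sign-alternated corners, so the work doubles with every additional polytomous variable. The sketch below is not the paper's implementation; it is a hedged stdlib-only illustration for s = 2, with Φ₂ obtained by one-dimensional quadrature (the function names and grid size are our own choices):

```python
import math
from itertools import product

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi2(a, b, rho, n=8000):
    """Bivariate normal cdf P(X <= a, Y <= b) with correlation rho, via
    Phi2(a, b) = int_{-inf}^{a} phi(x) Phi((b - rho x)/sqrt(1 - rho^2)) dx,
    approximated by the trapezoid rule on [-8, a]."""
    c = math.sqrt(1.0 - rho * rho)
    f = lambda x: phi(x) * Phi((b - rho * x) / c)
    lo = -8.0
    if a <= lo:
        return 0.0
    h = (a - lo) / n
    total = 0.5 * (f(lo) + f(a))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

def cell_prob(lower, upper, rho):
    """P(lower_i < Y_i <= upper_i, i = 1, 2) as a signed sum over the
    2^s corners of the cell -- the alternating-sum structure of (A1)."""
    total = 0.0
    for i1, i2 in product((0, 1), repeat=2):
        corner = (lower[0] if i1 else upper[0],
                  lower[1] if i2 else upper[1])
        total += (-1) ** (i1 + i2) * Phi2(corner[0], corner[1], rho)
    return total

print(cell_prob((-0.5, -1.0), (1.0, 0.5), 0.4))
```

For s polytomous variables the loop runs over 2^s corners, and each corner needs one s-dimensional normal distribution function, which is why the method remains practical only for moderate s.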
The small simulations with 50 replications verified that our method produces acceptable results at various sample sizes, including N = 100, indicating that our asymptotic statistical theory can in practice be applied in reasonably sized samples. Our estimates performed quite well, though we were surprised to note that LISCOMP's estimates were slightly better than ours at sample sizes of N = 100 and 200. On the other hand, this advantage at N = 100 was offset by the fact that LISCOMP yielded goodness-of-fit statistics that were not χ² distributed. The plot of Figure 2 shows that LISCOMP tended to reject models more frequently than expected by theory. The results of this simulation study agree with the fact that, theoretically, both estimates are consistent; hence in practice, with moderate sample sizes, they are close to the true value and hence close to each other. But consistency and the asymptotic distributions of the estimates and goodness-of-fit statistics are different concepts. For some estimation methods, the estimates can be consistent but not asymptotically normal, and the goodness-of-fit statistic may not be chi-squared. In our two-stage procedure, the first stage is based on the optimal full ML approach, in which the thresholds and the polychoric and polyserial correlations are estimated simultaneously. Hence, standard ML theory can be applied to show that the joint asymptotic distribution of the estimator is multivariate normal. This property is crucial in achieving the statistical properties for inference of the model at the second stage GLS estimation; for example, see Ferguson (1958).
Appendix
We first derive the expression for $\partial \log p(z|x)/\partial \beta_j$ required in computing $I^*(i, j)$ and the gradient vector of the likelihood function (15). In terms of the parameters in $\beta$,

$$p(z|x) = (-1)^s \sum_{i(1)=0}^{1} \cdots \sum_{i(s)=0}^{1} (-1)^{\sum_u i(u)} \Phi_s(a_1^*, \ldots, a_s^*; R), \tag{A1}$$

where $a_i^* = a_{i,v(i)-1+i(i)} \sigma_{y_i}^{-1} + b_i' x$, the threshold subscript shifting with the summation index $i(i)$. From Johnson and Kotz (1972),

$$\frac{\partial \Phi_s(a_1^*, \ldots, a_s^*; R)}{\partial r_{ij}} = \phi_2(a_i^*, a_j^*, r_{ij}) \, \Phi_{s-2}(\ldots, a_k^* - c_k, \ldots; R_{\cdot ij}), \tag{A2}$$

where $\phi_2$ is the standard bivariate normal density with correlation $r_{ij}$, $k \neq i \neq j$, $R_{\cdot ij}$ is the partial correlation matrix given $Y_i$ and $Y_j$, and

$$c_k = \frac{(r_{ik} - r_{jk} r_{ij}) a_i^* + (r_{jk} - r_{ik} r_{ij}) a_j^*}{(1 - r_{ij}^2)^{1/2}}. \tag{A3}$$
Hence,

$$\frac{\partial \log p(z|x)}{\partial r_{ij}} = \frac{\displaystyle\sum_{i(1)=0}^{1} \cdots \sum_{i(s)=0}^{1} (-1)^{\sum_u i(u)} \phi_2(a_i^*, a_j^*, r_{ij}) \, \Phi_{s-2}(\ldots, a_k^* - c_k, \ldots; R_{\cdot ij})}{\displaystyle\sum_{i(1)=0}^{1} \cdots \sum_{i(s)=0}^{1} (-1)^{\sum_u i(u)} \Phi_s(a_1^*, \ldots, a_s^*; R)} \tag{A4}$$

with $k \neq i \neq j$. To obtain the other derivatives, we first note that
$$\frac{\partial \Phi_s(a_1^*, \ldots, a_s^*; R)}{\partial a_i^*} = \phi(a_i^*) \, \Phi_{s-1}(\ldots, a_j^{**}, \ldots; R_{\cdot i}), \tag{A5}$$

where $a_j^{**} = (a_j^* - r_{ij} a_i^*)(1 - r_{ij}^2)^{-1/2}$ with $j \neq i$, $\phi$ is the univariate standard normal density function, and $R_{\cdot i}$ is the partial correlation matrix given $Y_i$. By the chain rule of calculus and the fact that $b_i$ is not involved in $a_j^*$ for $j \neq i$,

$$\frac{\partial \Phi_s(a_1^*, \ldots, a_s^*; R)}{\partial b_i} = x \, \phi(a_i^*) \, \Phi_{s-1}(\ldots, a_j^{**}, \ldots; R_{\cdot i}). \tag{A6}$$
Hence,

$$\frac{\partial \log p(z|x)}{\partial b_i} = \frac{\displaystyle\sum_{i(1)=0}^{1} \cdots \sum_{i(s)=0}^{1} (-1)^{\sum_u i(u)} x \, \phi(a_i^*) \, \Phi_{s-1}(\ldots, a_j^{**}, \ldots; R_{\cdot i})}{\displaystyle\sum_{i(1)=0}^{1} \cdots \sum_{i(s)=0}^{1} (-1)^{\sum_u i(u)} \Phi_s(a_1^*, \ldots, a_s^*; R)}. \tag{A7}$$
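For s = 2, the identities behind (A2) and (A5) reduce to the classical results $\partial \Phi_2(a, b; \rho)/\partial \rho = \phi_2(a, b; \rho)$ and $\partial \Phi_2(a, b; \rho)/\partial a = \phi(a)\,\Phi\big((b - \rho a)(1 - \rho^2)^{-1/2}\big)$, which can be checked numerically. The sketch below is our own stdlib-only illustration, not the paper's code, with $\Phi_2$ computed by simple quadrature and the derivatives by central finite differences:

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi2(a, b, rho):
    """Standard bivariate normal density with correlation rho."""
    q = (a * a - 2.0 * rho * a * b + b * b) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))

def Phi2(a, b, rho, n=40000):
    """Bivariate normal cdf by 1-D quadrature (trapezoid rule on [-8, a])."""
    c = math.sqrt(1.0 - rho * rho)
    f = lambda x: phi(x) * Phi((b - rho * x) / c)
    lo = -8.0
    h = (a - lo) / n
    total = 0.5 * (f(lo) + f(a))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

a, b, r, h = 0.3, -0.2, 0.4, 1e-3
c = math.sqrt(1.0 - r * r)

# (A2) specialized to s = 2: d Phi2 / d rho = phi2(a, b; rho).
fd_rho = (Phi2(a, b, r + h) - Phi2(a, b, r - h)) / (2.0 * h)
print(fd_rho, phi2(a, b, r))

# (A5) specialized to s = 2: d Phi2 / d a = phi(a) Phi((b - r a)/sqrt(1 - r^2)).
fd_a = (Phi2(a + h, b, r) - Phi2(a - h, b, r)) / (2.0 * h)
print(fd_a, phi(a) * Phi((b - r * a) / c))
```

Both finite-difference estimates agree with the closed-form right-hand sides to roughly four decimal places, which is the kind of spot check one can use when implementing the gradient expressions above.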

Similarly, it can be shown that


0 log p(zlx)
00"yi

--ai,v(i)O'y~2 { ~ "'" (-1) "~' i(u) dP(a *)tiPs - 1 ( . . . , a~*, . . . , R.i)}


i(1) = 0 i(s) = 0

1 s. }
(-1)'~"'(u) ~Ps(a~, ..., a*; R)
{/0)~-- 0 " " " i(s) = 0
(A8)
and that
1
oo* Z (-1) "~' i(U)o'~lq~(a*)dPs-l(..., a~7, . . . ; Ri)}
0 log p(zlx) i(1) = 0 i(s) = 0

Oai,d(i) 1 }
• .. ~] (-1),-' dPs(a~, . . . , a*; R) ,
i(1) = 0 i(s) = 0
(A9)
if d(i) - v(i); otherwise, the derivative is equal to zero. Since
re(l) re(s) f(z)
L2(b, O'y, r, a ) = - ~ "'" ~ ~ log p(zlXz,i(z) ),
k(1) = 1 k(s) = 1 i(z) = 1
we have
OL2(b, Cry, r, a) re(l) m!~ I f(z) O log ptZ!Xz,i(z) )
--- Z" 2 Z (A10)
k(1) = 1 k(s) = 1 i(z) = 1
with expressions a logp(zlXz,i(z))/Ol$ given by (A4), (A7), (A8), and (A9).

References
Afifi, A. A., & Azen, S. P. (1972). Statistical analysis: A computer oriented approach. New York: Academic
Press.
Anderson, J. C., & Gerbing, D. W. (1984). The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, 155-173.
Anderson, T. W. (1988). Multivariate linear relations. In T. Pukkila & S. Puntanen (Eds.), Proceedings of the
second international Tampere conference in statistics (pp. 9-30). Tampere, Finland: University of Tam-
pere.
Bentler, P. M. (1983). Some contributions to efficient statistics in structural models: Specification and esti-
mation of moment structures. Psychometrika, 48, 493-517.
Bentler, P. M. (1989). EQS: Structural equation program manual. Los Angeles: BMDP Statistical Software.
Bentler, P. M., & Lee, S.-Y. (1978). Matrix derivatives with chain rule and rules for simple, Hadamard, and
Kronecker products. Journal of Mathematical Psychology, 17, 255-262.
Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psy-
chometrika, 35, 179-197.
Boomsma, A. (1985). Nonconvergence, improper solutions and starting values in LISREL maximum likeli-
hood estimation. Psychometrika, 50, 229-242.
SIK-YUM LEE, WAI-YIN POON, AND P. M. BENTLER 105

Browne, M. W. (1974). Generalized least squares estimators in the analysis of covariance structures. South African Statistical Journal, 8, 1-24.
Browne, M. W. (1982). Covariance structures. In D. M. Hawkins (Ed.), Topics in applied multivariate analysis (pp. 72-141). Cambridge: Cambridge University Press.
Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures.
British Journal of Mathematical and Statistical Psychology, 37, 62-83.
Browne, M. W. (1987). Robustness of statistical inference in factor analysis and related models. Biometrika, 74, 375-384.
Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5-32.
Corbin, B. C., & Lindsey, R. (1985). Concepts of physical fitness (5th ed.). Dubuque, IA: Wm. C. Brown.
Ferguson, T. S. (1958). A method of generating best asymptotically normal estimates with applications to the
estimation of bacterial densities. Annals of Mathematical Statistics, 29, 1046-1062.
Fu, F. H., & Chan, K. M. (1987). Synopsis of sports medicine and sports science. Hong Kong: The Chinese University of Hong Kong Press.
Johnson, N. L., & Kotz, S. (1972). Distributions in statistics: Continuous multivariate distributions. New
York: Wiley.
Jöreskog, K. G., & Sörbom, D. (1988a). PRELIS: A preprocessor for LISREL. Mooresville, IN: Scientific Software.
Jöreskog, K. G., & Sörbom, D. (1988b). LISREL 7: A guide to the program and applications. Mooresville, IN: Scientific Software.
Lee, S.-Y. (1981). The multiplier method in constrained estimation of covariance structure models. Journal
of Statistical Computation and Simulation, 12, 247-257.
Lee, S.-Y., & Bentler, P. M. (1980). Some asymptotic properties of constrained generalized least squares
estimation in covariance structure models. South African Statistical Journal, 14, 121-136.
Lee, S.-Y., & Jennrich, R. I. (1979). A study of algorithms for covariance structure analysis with specific
comparisons using factor analysis. Psychometrika, 44, 99-113.
Lee, S.-Y., & Poon, W.-Y. (1986). Maximum likelihood estimation of polyserial correlations. Psychometrika, 51, 113-121.
Lee, S.-Y., Poon, W.-Y., & Bentler, P. M. (1990a). A three-stage estimation procedure for structural equation models with polytomous variables. Psychometrika, 55, 45-52.
Lee, S.-Y., Poon, W.-Y., & Bentler, P. M. (1990b). Full maximum likelihood analysis of structural equation
models with polytomous variables. Statistics and Probability Letters, 9, 91-97.
McDonald, R. P., & Swaminathan, H. (1973). A simple matrix calculus with application to multivariate
analysis. General Systems, 18, 37-54.
Mislevy, R. J. (1986). Recent developments in factor analysis of categorical variables. Journal of Educational
Statistics, 11, 3-31.
Muthén, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551-560.
Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115-132.
Muthén, B. (1987). LISCOMP: Analysis of linear structural equations with a comprehensive measurement model. Mooresville, IN: Scientific Software.
Olsson, U., Drasgow, F., & Dorans, N. J. (1982). The polyserial correlation coefficient. Psychometrika, 47,
337-347.
Poon, W.-Y., & Lee, S.-Y. (1987). Maximum likelihood estimation of multivariate polyserial and polychoric correlation coefficients. Psychometrika, 52, 409-430.
Satorra, A., & Bentler, P. M. (1990). Model conditions for asymptotic robustness in analysis of linear
relations. Computational Statistics & Data Analysis, 10, 235-249.
Schervish, M. J. (1984). Multivariate normal probabilities with error bound. Applied Statistics, 33, 81-94.
SPSS (1988). SPSS-X user's guide (3rd ed.). Chicago, IL: Author.
Wilk, M. B., & Gnanadesikan, R. (1968). Probability plotting methods for the analysis of data. Biometrika,
55, 1-17.
Manuscript received 12/18/88
Final version received 4/1/91
