
Long Range Planning xxx (2014) 1–8

Contents lists available at ScienceDirect

Long Range Planning


journal homepage: http://www.elsevier.com/locate/lrp

PLS’ Janus Face – Response to Professor Rigdon’s ‘Rethinking Partial Least Squares Modeling: In Praise of Simple Methods’

Theo K. Dijkstra

I respond to two of the theses put forward by Professor Rigdon: 1. PLS should sever every tie with factor modeling, and 2.
The fundamental problem of factor indeterminacy makes factor-based methods fundamentally unsuited to prediction-oriented
research. With respect to the first thesis my response is that adherence to its advice would waste a potentially very useful
method: there is a version of PLS that appears to be a valuable alternative to the mainstream approaches in factor modeling,
both linear and non-linear. The response to the second thesis is that one can generate predictions with factor models, and that all
models and techniques are ultimately instruments that transform data into predictions about behavior. Our task is to find
out which approach works best in which circumstances. I would not support an a priori exclusion of tools. In an addendum I attempt
to sketch the historical path-dependent development of Herman Wold’s PLS, in order to elucidate some of its characteristic
features.
© 2014 Elsevier Ltd. All rights reserved.

Introduction

In a thought provoking, carefully crafted and worded paper Professor Rigdon has put forward a number of strong
statements (Rigdon, 2013). They have as their core an unreserved choice for PLS as an extension of principal components
and canonical variables analysis. He strongly rejects PLS as a tool for estimating structural relationships between latent
variables, ‘factors’. Instead he calls for a full-blown development of path modeling in terms of linear composites, with
tailor-made inferential tools and proper means to assess measurement validity.
Of the statements put forward I choose two to comment on, the ones I feel most strongly about. In essence I wish to
maintain the double-sided nature of PLS that has characterized it from the very start: in the family of structural equation
estimators, PLS, when properly adjusted, can be a valuable member as well.

PLS should sever all ties with factor-based SEM

Response: This will be a somewhat elaborate reply, but the topic seems to demand it. First I will indicate why application
of traditional PLS to factor models could indeed be problematic. Next I will specify a simple way to correct for the ‘distortions’ that the use of proxies as stand-ins for the latent variables generates. This covers linear as well as nonlinear
models. I will report some recent results that allow one, in my view at least, to be optimistic about the feasibility and
usefulness of the new approach. Its core is still PLS, and it therefore maintains PLS’ numerical efficiency; standard PLS
software produces essentially all the input one needs. The high complexity of alternative methods in mainstream latent
variable modeling (see for example Schumacker and Marcoulides 1998, or Kelava et al. 2011) makes this variation of PLS a
valuable option. In a nutshell: I contend that if Professor Rigdon’s advice is adhered to, we may waste a potentially very
useful tool.
Our starting point is Wold’s ‘basic design’: a system of structural equations between latent variables, each of which is
measured by a unique set of indicators (at least two indicators), and where all measurement errors are mutually independent,
and independent of all latent variables. We assume we have a random sample from a population with finite moments of
appropriate order. It is well known that in this setting PLS, when applied to the population (probability limits), will yield
loadings that are too large in absolute value, and correlations between the latent variables that are too small in absolute value
(Dijkstra, 1981, 2010). Also, the squared multiple correlation coefficients, the R²’s, are too small (see Dijkstra, 2010). These

http://dx.doi.org/10.1016/j.lrp.2014.02.004
0024-6301/© 2014 Elsevier Ltd. All rights reserved.

Please cite this article in press as: Dijkstra, T.K., PLS’ Janus Face – Response to Professor Rigdon’s ‘Rethinking Partial Least Squares
Modeling: In Praise of Simple Methods’, Long Range Planning (2014), http://dx.doi.org/10.1016/j.lrp.2014.02.004

properties are reflected as tendencies in finite samples. If we focus on the correlations between the latent variables, where a typical latent variable is denoted by $\eta_i$, we have

$$ r_{ij} = \rho_{ij}\, q_i\, q_j. \qquad (1) $$

Here $r_{ij}$ is the correlation between the PLS proxies $\bar\eta_i$ and $\bar\eta_j$ for $\eta_i$ and $\eta_j$; the correlation between the latter is $\rho_{ij}$. The quality
$q_i$ of the PLS proxy $\bar\eta_i$ for $\eta_i$ is by definition its positive correlation with $\eta_i$, and $q_j$ is defined similarly. The quality depends on
the ‘mode’. With less than maximal qualities (and they can never equal one in regular situations) we have a ‘distorted’ picture
of the relationships between the latent variables. To give an extreme, randomly generated example, suppose the true correlation matrix of the three latent variables $\eta_1$, $\eta_2$ and $\eta_3$ equals

$$ \begin{pmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{pmatrix} = \begin{pmatrix} 1 & -.97 & .69 \\ -.97 & 1 & -.58 \\ .69 & -.58 & 1 \end{pmatrix} \qquad (2) $$

We assume here, as elsewhere, that the latent variables have zero mean and unit variance. Let

$$ \eta_3 = b_1\eta_1 + b_2\eta_2 + \zeta \qquad (3) $$

be the regression equation of interest, i.e. the coefficients $b_1$ and $b_2$ are chosen such that the implied residual $\zeta$ is uncorrelated
with $\eta_1$ and $\eta_2$. Every other choice for the b’s produces a residual whose average squared value is higher than $E(\zeta^2)$. So the b’s
must satisfy the so-called normal equations, $0 = E(\zeta\eta_1) = E((\eta_3 - b_1\eta_1 - b_2\eta_2)\eta_1) = \rho_{13} - b_1 - b_2\rho_{12}$ and $0 = E(\zeta\eta_2)$. In matrix
notation:

$$ \begin{pmatrix} \rho_{13} \\ \rho_{23} \end{pmatrix} = \begin{pmatrix} 1 & \rho_{12} \\ \rho_{12} & 1 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}. \qquad (4) $$

Solving for $b_2$ yields

$$ b_2 = \frac{\rho_{23} - \rho_{12}\rho_{13}}{1 - \rho_{12}^2} \qquad (5) $$

with an analogous expression for $b_1$. For the values at hand we get (the numbers are rounded to two decimals)

$$ \eta_3 = 2.16\,\eta_1 + 1.51\,\eta_2 + \zeta \quad\text{and}\quad R^2 = .61. \qquad (6) $$

If $q_i = .90$ for $i = 1, 2, 3$, PLS will yield for the correlation matrix between the proxies

$$ \begin{pmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{pmatrix} = \begin{pmatrix} 1 & -.79 & .56 \\ -.79 & 1 & -.47 \\ .56 & -.47 & 1 \end{pmatrix} \qquad (7) $$

and a regression produces

$$ \bar\eta_3 = .50\,\bar\eta_1 - .08\,\bar\eta_2 + \text{residual} \quad\text{and}\quad R^2 = .31. \qquad (8) $$

The coefficients are way too small in absolute value, one of them is of the wrong sign as well, and the R² is close to half the
true size. Note that

$$ \bar b_2 = \frac{\rho_{23} - \rho_{12}\rho_{13}q_1^2}{1 - \rho_{12}^2 q_1^2 q_2^2}\; q_2 q_3 \qquad (9) $$

Table 1
Range of PLS-values as a function of proxy-quality

q²      Range of b̄ᵢ        Range of R̄²
.50     [−.50, .50]        [0, .33]
.75     [−.76, .76]        [0, .64]
.90     [−1.16, 1.16]      [0, .85]
.95     [−1.61, 1.61]      [0, .93]
1.00    [−4.40, 4.40]      [0, 1]


so, as is easily verified, the coefficient of the second proxy will be negative as long as $q_1 < .93$, with the $\rho_{ij}$’s as specified. I
emphasize that the regression numbers as reported are not estimates; they are ‘population values’, around which sample
values will fluctuate. Clearly, the dramatic distortion of the true relationship is mainly due to the very low value of $\rho_{12}$, close to
minus one.
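The numbers in (6) and (8) are easy to verify; the following sketch (plain Python, with a function name of my own choosing) computes the regression coefficients and R² via (5), first from the true correlations of (2) and then from their attenuated counterparts in (7):

```python
def regression(r12, r13, r23):
    """Regress the 3rd standardized variable on the first two, eqs (4)-(5)."""
    b1 = (r13 - r12 * r23) / (1 - r12 ** 2)
    b2 = (r23 - r12 * r13) / (1 - r12 ** 2)
    r_squared = b1 * r13 + b2 * r23   # R^2 = b'rho for standardized variables
    return b1, b2, r_squared

# true latent correlations, eq (2)
rho12, rho13, rho23 = -.97, .69, -.58
print(regression(rho12, rho13, rho23))   # ≈ (2.16, 1.51, 0.61), eq (6)

# proxy correlations with quality q = .90 for every proxy, eq (1): r = rho*q*q
q = .90
print(regression(rho12 * q * q, rho13 * q * q, rho23 * q * q))
# ≈ (0.50, -0.08, 0.31), eq (8)
```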
A more systematic analysis can be obtained by varying across all possible and permissible values of the correlations¹ $\rho_{12}$,
$\rho_{13}$ and $\rho_{23}$. For each set of correlations we calculated the corresponding regression coefficients, $b_i$, and the R²; for PLS we
assumed that all proxies have the same quality q. The results are in Table 1.
The last row gives the undistorted, true picture: what we would get with the latent variables instead of with the proxies; R²
is uniformly distributed in this case. The true R² is always larger than the value for PLS; the true regression coefficients are in
absolute value larger than those of PLS in 94% of the cases for q² = .50, and in 92% of the cases for each of the other values of q².
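The construction behind Table 1 can be sketched as follows (my own code; I use plain rejection sampling instead of Joe’s (2006) method — both give the uniform distribution over the permissible region — and take the q² = .50 row as illustration):

```python
import random

def regression(r12, r13, r23):
    # eqs (4)-(5): regression of the 3rd standardized variable on the first two
    b1 = (r13 - r12 * r23) / (1 - r12 ** 2)
    b2 = (r23 - r12 * r13) / (1 - r12 ** 2)
    return b1, b2, b1 * r13 + b2 * r23

def permissible(r12, r13, r23):
    # positive definiteness of the 3x3 correlation matrix, plus footnote 1's filter
    det = 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23
    return det > 0 and r12 ** 2 < .95

random.seed(1)
q2 = .50                                  # common proxy quality q^2
dominated = total = n_kept = 0
max_pls_r2 = 0.0
while n_kept < 20000:
    r = [random.uniform(-1, 1) for _ in range(3)]
    if not permissible(*r):
        continue                          # rejection sampling: uniform on the region
    n_kept += 1
    b1, b2, true_r2 = regression(*r)
    a1, a2, pls_r2 = regression(*(q2 * v for v in r))   # eq (1): r = q^2 * rho
    assert true_r2 >= pls_r2 - 1e-12      # the true R^2 always dominates
    max_pls_r2 = max(max_pls_r2, pls_r2)
    dominated += (abs(b1) > abs(a1)) + (abs(b2) > abs(a2))
    total += 2
# max_pls_r2 stays within Table 1's [0, .33]; dominated/total is roughly .94
```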
Irrespective of one’s interpretation of this table, I would maintain that there are grounds to try an old idea, first put forward
by me in print in 1981, but never tested until 2011 (see Dijkstra 1981, 2011). Depending on the PLS mode, I suggested adjusting
the correlation estimates using easy estimates of the quality of the proxies. For mode A we have a consistent estimator for $q_i$
by the formula

$$ \hat q_i := \left(\hat w_i^{\,T}\hat w_i\right)\hat c_i \qquad (10) $$

where $\hat w_i$ is the estimated vector of weights for the $i$th proxy and

$$ \hat c_i := \left[\frac{\hat w_i^{\,T}\left(S_{ii} - \mathrm{diag}(S_{ii})\right)\hat w_i}{\hat w_i^{\,T}\left(\hat w_i\hat w_i^{\,T} - \mathrm{diag}\left(\hat w_i\hat w_i^{\,T}\right)\right)\hat w_i}\right]^{1/2}. \qquad (11) $$

The matrix $S_{ii}$ is the covariance/correlation matrix of the $i$th block of indicators. The use of diag means that only the off-diagonal elements of the matrices are used, so the numerator within brackets is just $\sum_{a\neq b}\hat w_{i,a}\hat w_{i,b}S_{ii,ab}$, et cetera. We get
consistent estimates for the $\rho_{ij}$’s by dividing PLS’ estimates by $\hat q_i$ and $\hat q_j$. Loadings can be estimated consistently by $\hat c_i\hat w_i$ (see
for example Dijkstra, 2011, or Dijkstra and Henseler 2012, and Dijkstra and Schermelleh-Engel 2012 for derivations and
additional details). It is worth pointing out that standard PLS software yields all the required input.
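To make formulas (10) and (11) concrete, here is a small sketch (my own code and names; S is the correlation matrix of one block and w the mode-A weight vector, normalized as usual so that the proxy has unit variance; the numerical example is mine):

```python
from math import sqrt

def c_and_q(w, S):
    """Eqs (10)-(11): c uses only the off-diagonal elements; q = (w'w) * c."""
    p = len(w)
    num = sum(w[a] * w[b] * S[a][b] for a in range(p) for b in range(p) if a != b)
    den = sum((w[a] * w[b]) ** 2 for a in range(p) for b in range(p) if a != b)
    c = sqrt(num / den)
    return c, c * sum(wa * wa for wa in w)

# a one-factor block with all loadings .8 and unit-variance indicators
lam = .8
S = [[1.0 if a == b else lam * lam for b in range(3)] for a in range(3)]
k = 1 / sqrt(3 + 6 * lam * lam)      # equal weights giving unit proxy variance
c, q = c_and_q([k, k, k], S)
loadings = [c * k] * 3               # consistent loading estimates: c * w

r12_proxy = 0.42                     # a proxy correlation, say
rho12 = r12_proxy / (q * q)          # eq (1) inverted: rho = r/(q_i q_j)
```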
We now have consistent estimators for the loadings and for the correlations between the latent variables; they are in fact
asymptotically normal as well. The same is true for the standard estimators of the structural relationships, and for the
theoretical, structured correlation matrix of the vector of indicators. The latter fact opens up the possibility of testing the overall
goodness of fit. See Dijkstra and Henseler (2012) for some encouraging results for a linear structural model with feedback.
Incidentally, goodness-of-fit tests are a conditio sine qua non in mainstream factor modeling, but appear to be alien to the ‘PLS
scene’. In the final section of this paper, the addendum, I try to shed some light on this issue by revisiting the past, the path-dependent genesis of PLS.
An exciting new development, a PhD project at UCLA (Huang, 2013), exploits an old idea of Bentler and Dijkstra (1985): if
we update a consistent estimator once, using Newton-Raphson or Gauss-Newton on the first-order conditions of a suitable
fitting function, we get an asymptotically efficient estimator. So PLS can be made as efficient as GLS- or ML-estimators
by a simple adjustment followed by one iteration of a standard optimization routine.
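The one-step idea can be illustrated on a toy problem (my own illustration, not Huang’s or Bentler and Dijkstra’s actual setting): for a tau-equivalent block with three indicators, all covariances equal λ², and a least-squares fitting function is f(λ) = Σ(s_ab − λ²)² over the three off-diagonal moments. Starting from the consistent but inefficient λ̂ = √s₁₂, one Newton-Raphson step on f′(λ) = 0 already lands very close to the minimizer √(Σs_ab/3), which uses all three moments:

```python
from math import sqrt

s12, s13, s23 = .52, .47, .50     # hypothetical sample covariances
S = s12 + s13 + s23

def fprime(lam):                  # f'(lam) = -4*lam*(S - 3*lam^2)
    return -4 * lam * (S - 3 * lam ** 2)

def fsecond(lam):                 # f''(lam) = 36*lam^2 - 4*S
    return 36 * lam ** 2 - 4 * S

lam0 = sqrt(s12)                  # consistent starting value (one moment only)
lam1 = lam0 - fprime(lam0) / fsecond(lam0)   # one Newton-Raphson step
lam_star = sqrt(S / 3)            # exact minimizer, shown for comparison
```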
The approach as outlined here can be extended to recursive systems of regression equations with interaction between the
latent variables (modeled by products $\eta_i\eta_j$). No distributional assumptions beyond the existence of moments are needed (see
Dijkstra, 2011 and Dijkstra and Schermelleh-Engel 2013). To spell it out for the simplest case, consider

$$ \eta_3 = \gamma_1\eta_1 + \gamma_2\eta_2 + \gamma_{12}\left(\eta_1\eta_2 - \rho_{12}\right) + \text{residual}. \qquad (12) $$

The γ’s satisfy the following normal equations, as induced by the demand that the residual is uncorrelated with the three
‘explanatory’ variables:

$$ \begin{pmatrix} E\eta_1\eta_3 \\ E\eta_2\eta_3 \\ E\eta_1\eta_2\eta_3 \end{pmatrix} = \begin{pmatrix} 1 & E\eta_1\eta_2 & E\eta_1^2\eta_2 \\ E\eta_1\eta_2 & 1 & E\eta_1\eta_2^2 \\ E\eta_1^2\eta_2 & E\eta_1\eta_2^2 & E\eta_1^2\eta_2^2 - \rho_{12}^2 \end{pmatrix} \begin{pmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_{12} \end{pmatrix} \qquad (13) $$

(Recall that the latent variables are standardized to have zero mean and unit variance.) The moments can be found from
the ‘PLS moments’ for the proxies and the q’s:

¹ I used H. Joe (2006) to generate a million correlations, uniformly from the ellipsoid that describes all permissible correlations (the ensuing correlation
matrix has to be positive definite). I selected those outcomes where $\rho_{12}^2 < .95$, effectively discarding five from every thousand outcomes. Table 1 is based on
995,155 correlation matrices.


$$ E\bar\eta_1\bar\eta_2 = q_1 q_2\, E\eta_1\eta_2 \qquad (14) $$

$$ E\bar\eta_1\bar\eta_3 = q_1 q_3\, E\eta_1\eta_3 \qquad (15) $$

$$ E\bar\eta_2\bar\eta_3 = q_2 q_3\, E\eta_2\eta_3 \qquad (16) $$

$$ E\bar\eta_1^2\bar\eta_2 = q_1^2 q_2\, E\eta_1^2\eta_2 \qquad (17) $$

$$ E\bar\eta_1\bar\eta_2^2 = q_1 q_2^2\, E\eta_1\eta_2^2 \qquad (18) $$

$$ E\bar\eta_1\bar\eta_2\bar\eta_3 = q_1 q_2 q_3\, E\eta_1\eta_2\eta_3 \qquad (19) $$

$$ E\bar\eta_1^2\bar\eta_2^2 - 1 = q_1^2 q_2^2\left(E\eta_1^2\eta_2^2 - 1\right). \qquad (20) $$

Only the last equation deviates from the easy pattern. The moments for the proxies are estimated simply by their corresponding sample equivalents. For example, $E\bar\eta_1^2\bar\eta_2^2$ is estimated by

$$ \frac{1}{n}\sum_{t=1}^{n}\left(\hat w_1^{\,T}y_{1t}\right)^2\left(\hat w_2^{\,T}y_{2t}\right)^2 \qquad (21) $$

where n is the sample size and $y_{1t}$ is the vector of scores on the indicators of the first block for the $t$th individual or entity, with an
analogous definition for $y_{2t}$. Note that for this to work we need to have the raw data, not just the moments of the sample. Inserting
the estimators for the moments of the latent variables, as adjusted for the imperfect correlations with the proxies, in (13) and
solving for the γ’s yields consistent and (under appropriate conditions) asymptotically normal estimators for the coefficients.
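As a concrete sketch (the code, the dictionary keys and the numerical example are mine): take proxy moments generated from known latent moments via (14)-(20), invert those relations, and solve the normal equations (13) by Cramer’s rule. In the population of a normal example the γ’s are recovered exactly:

```python
def latent_moments(pm, q1, q2, q3):
    """Invert eqs (14)-(20): divide each proxy moment by the q's involved."""
    m = {'12': pm['12'] / (q1 * q2),
         '13': pm['13'] / (q1 * q3),
         '23': pm['23'] / (q2 * q3),
         '112': pm['112'] / (q1 * q1 * q2),
         '122': pm['122'] / (q1 * q2 * q2),
         '123': pm['123'] / (q1 * q2 * q3)}
    m['1122'] = 1 + (pm['1122'] - 1) / (q1 * q1 * q2 * q2)   # eq (20)
    return m

def solve_gammas(m):
    """Normal equations (13) for the interaction model (12), via Cramer."""
    A = [[1.0, m['12'], m['112']],
         [m['12'], 1.0, m['122']],
         [m['112'], m['122'], m['1122'] - m['12'] ** 2]]
    b = [m['13'], m['23'], m['123']]
    def det(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
              - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
              + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    d = det(A)
    out = []
    for j in range(3):
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = b[i]
        out.append(det(Aj) / d)
    return out

# population check: normal eta1, eta2 with correlation .5, and
# eta3 = .3*eta1 + .4*eta2 + .25*(eta1*eta2 - .5) + noise
rho, g1, g2, g12 = .5, .3, .4, .25
lm = {'12': rho, '13': g1 + g2 * rho, '23': g1 * rho + g2,
      '112': 0.0, '122': 0.0,                    # odd Gaussian moments vanish
      '123': g12 * (1 + rho ** 2), '1122': 1 + 2 * rho ** 2}
q1, q2, q3 = .9, .85, .8
pm = {'12': lm['12'] * q1 * q2, '13': lm['13'] * q1 * q3,
      '23': lm['23'] * q2 * q3, '112': 0.0, '122': 0.0,
      '123': lm['123'] * q1 * q2 * q3,
      '1122': 1 + (lm['1122'] - 1) * q1 * q1 * q2 * q2}
gam = solve_gammas(latent_moments(pm, q1, q2, q3))   # recovers (.3, .4, .25)
```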
With squares and higher-order terms the same approach can be followed, except that now some distributional assumptions appear to be needed. With normality (for the exogenous latent variables), recursive systems of structural equations
with polynomial terms are in principle estimable. In this way one keeps the high numerical efficiency and stability of PLS, as
well as its robustness against misspecification, and gets consistent and asymptotically normal estimators for non-trivial
structural models. And, as stated before, in terms of complexity the new approach appears to compare rather favorably
with mainstream alternatives (see for example Schumacker and Marcoulides 1998 and Kelava et al., 2011). A Monte Carlo
study for a model with an interaction term and squares, in which the new approach was compared with the most efficient
maximum likelihood method around (LMS, ‘Latent Moderated Structural Equations’), produced quite acceptable results:
unbiased estimates with standard errors comparable to those of LMS, computed much faster than by LMS (Dijkstra
and Schermelleh-Engel 2013).

Factor indeterminacy and (factor) prediction

Professor Rigdon stresses the fact that factor scores cannot be determined unambiguously, and argues for the exclusive use
of composites instead. He also pleads for new ways to validate the model, by forecasting theoretical concepts using the
observations on the indicators. Determining the gap between proxy and concept would be the main way of evaluating validity.
Response: I found the argument hard to follow, and I may well have misconstrued it, but here is my reply. It is indeed
undeniable that factor scores cannot be determined unambiguously. At best, we can get some estimates of aspects of the
conditional distribution of the latent variable given its indicators. But how does selecting one number as representative of
this distribution solve the issue? A composite as generated by mode B, for example, is proportional to the mode of the
conditional distribution in the case of normality, or proportional to the best linear approximation to the conditional expectation of
the latent variable given its indicators in more general situations. A plethora of other choices could be made. The use of a
specific proxy cannot take away the inherent and real uncertainty.
I find it also difficult to understand how the gap between the concept and its proxy can be measured, if we can have
numbers for the proxy only. To think of a concrete example: in (Chin et al., 2003) the relationship between the concept
‘intention to use’ (of a particular IT-application) and the concepts ‘perceived usefulness’ and ‘enjoyment’ was investigated (see
also Dijkstra and Henseler 2011 and Dijkstra and Schermelleh-Engel 2013). One of the models employed was the interaction
model specified above. One can of course predict the proxy for ‘intention to use’ via the indicators for the other concepts/
proxies, but I do not see the number with which to compare it. If in a follow-up study one can ascertain actual use as well, then
we can test the quality of the predicted ‘intention to use’ proxy as a forecast for actual use, via, e.g., a logistic regression. I would
like to emphasize that the latent variable model could be employed for this purpose as well: $E(\eta_3\,|\,y_1, y_2)$, where $y_1$ and $y_2$ are
the indicator vectors for ‘perceived usefulness’ and ‘enjoyment’, respectively, can be shown to be a linear expression in terms
of linear composites and their products, using normality as a working hypothesis. At the end of the day, we have algorithms
that transform observations on indicators into predictions of actual behavior. We can pit the algorithms against each other
and find out which works best in which circumstances. To me, techniques with proxies and techniques with latent variables
are instruments or tools. I am not inclined to endorse a categorical ban on either of them.


Addendum, a sketch of the genesis of PLS

Herman O. A. Wold (1908–1992) had a lifelong fascination with ‘least squares’ and its role in modeling behavioral relationships in prediction and estimation. The evolution of the least squares (LS) body of methods is the red thread in his
research (Wold, 1980).² It started with his doctoral thesis (1938), in which he established by LS methods the ‘decomposition
theorem for stationary time series’ that brought him fame. His treatise on Demand Analysis (Wold, 1953, 1982a), a synthesis of
Paretoan utility theory, the theory of stationary processes, and regression theory, was entirely based on LS analysis. In the late
fifties he saw that the rationale of LS analysis could be developed on the basis of ‘predictor specification’, by defining a causal-predictive relation as the conditional expectation of the predictand. This would also reflect its intended use: prediction in the case
of single equations, and prediction by the ‘chain principle’, i.e. repeated substitutions, in the case of recursive systems of
equations. For a long time Wold insisted on a conditional expectation specification for behavioral equations. Systems of
equations would qualify as causal-predictive only when each separate equation had that property, hence his strong preference for recursive systems, or ‘causal chains’ as he would call them (Wold, 1964). The causal interpretation of simultaneous
equations, in which there is feedback between the variables, has long been a very controversial topic in econometrics, and
appears to have been resolved only recently (see Pearl 2009). At any rate, in the sixties Wold saw a way to reformulate non-recursive
systems that would allow a causal-predictive interpretation of the separate equations, and estimation by (iterative)
least squares. Consider the following model

$$ \eta_1 = \beta_{12}\eta_2 + \gamma_{11}x_1 + \gamma_{12}x_2 + \zeta_1 \qquad (22) $$

$$ \eta_2 = \beta_{21}\eta_1 + \gamma_{23}x_3 + \gamma_{24}x_4 + \zeta_2 \qquad (23) $$

where all variables, the ‘endogenous’ $\eta_1$ and $\eta_2$, and the ‘exogenous’ $x_1$, $x_2$, $x_3$ and $x_4$, are scalar.³ This is known as Summers’
model (Summers, 1965); it has been used often in econometric studies, see also Mosbaek and Wold (1970). In the classical
setting the residuals $\zeta_1$ and $\zeta_2$ are assumed to be uncorrelated with the exogenous variables x. The feedback between the
endogenous variables then entails that the coefficients of the equations are not regression coefficients: $\zeta_1$ cannot be uncorrelated with $\eta_2$, nor can that be the case for $\zeta_2$ and $\eta_1$. This is most easily seen by solving for the endogenous variables. We
can write $\eta_1 = \eta_1^* + \text{residual}_1$, where $\eta_1^*$ is a linear combination of the four exogenous variables, in fact the regression of $\eta_1$
on x, and $\text{residual}_1$ combines $\zeta_1$ and $\zeta_2$ linearly; similarly for $\eta_2$: $\eta_2 = \eta_2^* + \text{residual}_2$. So $\zeta_1$ is correlated with $\text{residual}_2$ and
therefore with $\eta_2$. It was observed in the fifties that systems like (22, 23) could be rewritten as:

$$ \eta_1 = \beta_{12}\eta_2^* + \gamma_{11}x_1 + \gamma_{12}x_2 + \text{residual}_1 \qquad (24) $$

$$ \eta_2 = \beta_{21}\eta_1^* + \gamma_{23}x_3 + \gamma_{24}x_4 + \text{residual}_2 \qquad (25) $$

In each equation the residual is uncorrelated with the right-hand-side explanatory variables. So now the coefficients are
regression coefficients. Estimation could proceed in two rounds: first regress each endogenous variable on the four exogenous
variables, which yields estimates for $\eta_1^*$ and $\eta_2^*$, and then use regressions on (24) and (25). This is known as ‘two-stage LS’ (see
Theil, 1958 and Basmann, 1957 for the original development).⁴ Note that

$$ \eta_1^* = \beta_{12}\eta_2^* + \gamma_{11}x_1 + \gamma_{12}x_2 \qquad (26) $$

$$ \eta_2^* = \beta_{21}\eta_1^* + \gamma_{23}x_3 + \gamma_{24}x_4 \qquad (27) $$

Wold suggested iterating the procedure, i.e. using the freshly obtained estimates for the β’s and the γ’s to get new values for
the η*’s, based on (26) and (27); then updating the coefficient estimates by new regressions, using them to get new η*’s, followed
by yet another set of regressions, et cetera, continuing until the values no longer change. This was known as the
Fix-Point method (see Mosbaek and Wold, 1970). Later, see the same reference, Wold noted that the algorithm would also
work for the model where the residuals are not uncorrelated with all four exogenous variables, but only with the relevant
variables on the same side of the equation. Previously we specified eight zero correlations for six parameters, now only six. In
the parlance of econometrics we lose the ‘over-identifying’ restrictions, used for testing and estimation, but in the parlance of
PLS we now adhere to the ‘parity principle’, making the model more modest, and possibly more robust. See Mosbaek and
Wold (1970) for an extensive analysis of the Generalized Interdependent (GEID) systems.
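The Fix-Point iteration is easy to simulate (a minimal sketch of my own, with parameter values of my choosing and the classical assumptions: residuals independent of all x’s): simulate Summers’ model, then alternate the regressions (24)-(25), rebuilding the systematic parts via (26)-(27) after each round:

```python
import random

def ols3(y, x1, x2, x3):
    """OLS of y on three regressors (no intercept), via Cramer's rule."""
    cols = (x1, x2, x3)
    G = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(3)]
         for i in range(3)]
    r = [sum(a * b for a, b in zip(cols[i], y)) for i in range(3)]
    def det(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
              - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
              + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    d = det(G)
    out = []
    for j in range(3):
        Gj = [row[:] for row in G]
        for i in range(3):
            Gj[i][j] = r[i]
        out.append(det(Gj) / d)
    return out

# simulate Summers' model (22)-(23), coefficients chosen for illustration
random.seed(7)
n = 4000
b12, b21, g11, g12, g23, g24 = .4, .3, 1.0, .5, .8, .6
x = [[random.gauss(0, 1) for _ in range(n)] for _ in range(4)]
z1 = [random.gauss(0, .3) for _ in range(n)]
z2 = [random.gauss(0, .3) for _ in range(n)]
u1 = [g11 * a + g12 * b + c for a, b, c in zip(x[0], x[1], z1)]
u2 = [g23 * a + g24 * b + c for a, b, c in zip(x[2], x[3], z2)]
det_b = 1 - b12 * b21
eta1 = [(a + b12 * b) / det_b for a, b in zip(u1, u2)]   # reduced form
eta2 = [(b + b21 * a) / det_b for a, b in zip(u1, u2)]

# Fix-Point: alternate the regressions (24)-(25), rebuilding eta1*, eta2*
# from (26)-(27) with the freshly estimated coefficients after every round
e1s, e2s = eta1[:], eta2[:]
for _ in range(60):
    c1 = ols3(eta1, e2s, x[0], x[1])   # estimates (b12, g11, g12)
    c2 = ols3(eta2, e1s, x[2], x[3])   # estimates (b21, g23, g24)
    e1s = [c1[0] * a + c1[1] * b + c1[2] * c for a, b, c in zip(e2s, x[0], x[1])]
    e2s = [c2[0] * a + c2[1] * b + c2[2] * c for a, b, c in zip(e1s, x[2], x[3])]
```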

² I refer here and below to a number of mimeographed documents, mostly written by Herman Wold, which may or may not be available in print. Naturally,
I will send a copy on request.
³ For ease of readability I am deliberately ‘sloppy’ in the specification of the model, ignoring constraints on the coefficients et cetera, but this will not
obscure the main point I want to make. Also, very special parameter values may contradict the general statements, but I will not specify them. Finally, I will
freely confuse zero correlation with independence.
⁴ Theil had in fact discovered two-stage LS in 1953, and described it in an internal report of the Central Planning Bureau of the Netherlands.


As I see it, Wold tended to use the word ‘model’ not in the conventional way, as a restriction on moments or on distributions, but more as a rule for constructing variables. He often used the conditional expectation specification formally, but as
far as I know never actually tested its validity. With, e.g., $\eta_1 = f(x) + \zeta_1$ where $E(\eta_1\,|\,x) = f(x)$ for a function f of a specified form,
we have $E(\zeta_1\,|\,x) = 0$. The zero conditional expectation is a very strong property: it says essentially that $\zeta_1$ is uncorrelated with
all functions of x, not just linear functions. An informal test would be to plot the estimated residual against the elements of x
and check for nonlinear patterns. But this was never done. Instead, the implication of zero correlation between $\zeta_1$ and the
elements of x was used to justify LS. The ensuing estimate for the LS residual is uncorrelated with the elements of x by
construction. Constraints as imposed on factor models, see below, were not used for testing either.
So in the early sixties an important feature that would turn out to be characteristic of Wold’s future work had emerged:
the estimation of both parameters and variables by means of ‘alternating least squares’, and the implicit definition of those
entities as a fixed point of an algorithm. Generally, there was no overall fitting criterion to be optimized; instead one performed a sequence of local optimizations where the outputs of the previous LS programs temporarily set part of the parameters of the next program. A lecture tour in 1964 in the USA turned out to be pivotal for PLS’ genesis: when he presented
the Fix-Point method it was pointed out to Wold that principal components could be calculated in an analogous way.⁵ To spell
this out, consider a set of vectors $\{y_t\}_{t=1}^n$ where for each (individual or entity) t the vector $y_t$ contains the observations on a fixed
set of indicators, possibly standardized. Write

$$ y_t = \tilde\lambda\,\tilde\eta_t + \tilde\varepsilon_t \qquad (28) $$

where the tildes indicate that these are not factor model parameters and variables. Here the scalar variable $\tilde\eta_t$ is treated as a
parameter as well and is to be estimated alongside the loading vector $\tilde\lambda$. Since doubling, say, $\tilde\lambda$ and halving $\tilde\eta_t$ yields the same product,
we need to set the scale. One convention among others is to impose unit sample variance on the $\tilde\eta_t$’s. If we now define the
loadings and the ‘latent variable’ as those values that minimize the sum of the squared residuals $\sum_{t=1}^n\tilde\varepsilon_t^{\,T}\tilde\varepsilon_t$, we find that for a
given $\tilde\lambda$ the corresponding $\tilde\eta_t$ is a linear combination of $y_t$, obtainable by a regression of $y_t$ on $\tilde\lambda$, and for given $\{\tilde\eta_t\}_{t=1}^n$ the $k$th
element of $\tilde\lambda$ is obtainable by a regression of the $y_{k,t}$’s on the $\tilde\eta_t$’s. Switching back and forth between these regressions until
they settle down leads to a fixed point, also known as the first principal component. Note that, again, by construction the
residuals are uncorrelated with the ‘latent variable’ (but they cannot be mutually uncorrelated, since $\tilde\lambda^{\,T}\tilde\varepsilon_t$ must be zero). Also
note that (28) is not a model in the classical sense, since there is no restriction at all: every sample covariance matrix has a
largest eigenvalue and a corresponding real eigenvector. A one-factor model with uncorrelated residuals does impose a non-trivial restriction, even for the ‘saturated model’ with three indicators (Dijkstra, 1992), which can be used for testing via a
goodness-of-fit test. A principal component ‘model’, on the other hand, can ‘only’ be tested by means of its usefulness, for
example, in condensing and summarizing high-dimensional data.
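The alternating regressions behind (28) can be written out in a few lines (a minimal sketch of my own; the unit-sample-variance convention for the η̃’s is skipped, the data and names are mine):

```python
def first_component(Y, rounds=500):
    """Alternate the two LS regressions of (28): given lam, each eta_t is the
    regression coefficient of y_t on lam; given the eta's, each lam_k is the
    regression coefficient of the kth indicator on the eta's."""
    p = len(Y[0])
    lam = [1.0] * p
    for _ in range(rounds):
        ss = sum(l * l for l in lam)
        eta = [sum(l * yk for l, yk in zip(lam, y)) / ss for y in Y]
        se = sum(e * e for e in eta)
        lam = [sum(e * y[k] for e, y in zip(eta, Y)) / se for k in range(p)]
    # one final eta-step, so that lam' * eps_t = 0 holds exactly for the output
    ss = sum(l * l for l in lam)
    eta = [sum(l * yk for l, yk in zip(lam, y)) / ss for y in Y]
    return lam, eta

# a small data set; at the fixed point lam spans the first principal component
Y = [[2.0, 1.9, 2.2], [-1.0, -1.1, -0.8], [0.5, 0.4, 0.7],
     [-2.1, -2.0, -1.9], [1.2, 1.0, 1.3], [-0.4, -0.2, -0.5]]
lam, eta = first_component(Y)
resid = [[y[k] - lam[k] * e for k in range(3)] for y, e in zip(Y, eta)]
```

At convergence the residuals are uncorrelated with the η̃’s (the lam-regression’s normal equations), while λ̃ᵀε̃ₜ = 0 for every t by the η̃-regression, exactly as noted above.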
Continuing with the historical sketch, it is clear that extensions first to two blocks, later to more blocks and more than one
interrelationship between the blocks, were not immediate. Attempts to build models for two blocks are described in Wold
(1975). These include of course the canonical variables, also present in Wold (1966). There were others that did not use
two latent variables, each one combining the indicators of one block only, but one latent variable combining indicators of both
blocks. Eventually Wold decided on the form as we know it (Wold, 1976). But the extension to more blocks and their
relationships was still not settled. See Wold (1976) for some of the complicated schemes he devised. Some of my attempts to
solve the issue, as a member of Wold’s research group at the Wharton School in 1977, built upon extensions of the principal
components approach. One could perhaps, I thought, minimize the sum of the squared residuals of both blocks plus the
squared residuals of the inner equation, as a function of the loadings, the structural parameters, and all values of the latent
variables, and similarly for more blocks. The ensuing latent variables typically combined all indicators, and for larger models
things seemed to get hopelessly messy and the calculations time consuming and unstable. In 1977, Wold opted for a radical
simplification, maybe, if I may flatter myself, stimulated partly by noting that approaches such as mine were going nowhere:
he introduced the sign-weighted sum of ‘adjacent’ latent variables, and weights for a latent variable were simply to be
determined by a regression of its corresponding sign-weighted sum on the indicators of its block, or in reverse order,
depending on the ‘mode’. I vividly remember how relieved he was when PLS seemed to have settled down and attained a
form on which one could build. He may have liked the quotation attributed to Winston Churchill: out of intense complexities,
intense simplicities emerge. And he would have been enormously pleased to learn that at this time of writing (2013) the main
software program, SmartPLS, has 40,000 registered users! The Handbook of PLS (Esposito Vinzi et al., 2010) and its many
references is an impressive tribute to the fruitfulness and intellectual stimulus of Wold’s approach.
The double-sided nature of PLS, as a descriptive/predictive approach as well as an alternative to SEM in high-dimensional
data in a low-structure environment, always seemed to be a natural thing to Wold, as I think one can read him. The conceptual
structure of factor models was the backdrop against which Wold developed and tested his methods. This is particularly clear
in his discussion of the ‘basic design’ (see for example Wold, 1982b). If he had just wanted to extend principal components and
canonical variables analysis, there would have been no need to develop the concept of consistency-at-large, which allows one to

⁵ Wold acknowledges Professor G. S. Tolley, University of North Carolina, and Dr. R. A. Porter, Spindletop Research Center, Lexington, Kentucky. See Wold
(1966).


say when the difference between a factor model and one of PLS’s modes would be small. In fact, he insisted that a fundamental
principle of soft modeling is that all interaction between the blocks of observables is conveyed by the latent variables (Wold,
1981, 1982b). What is still surprising to me is that the implication of this principle, namely that the covariance matrix has an
explicit and clear structure (the covariance matrices between the blocks all have rank one), is neither tested nor exploited. This is
all the more surprising since the (probability limits of the) weights of mode A and the loadings of mode B are then proportional to
the true loadings; in principle one could recover all factor model parameters consistently, as described above. Of course, it
would be absurd to say that Wold shunned testing. He strongly preferred ‘local tests’, if I may call them that. He
emphasized the need to check whether, say, signs and sizes of particular estimates were as expected and in agreement with
theory. He strongly favored distribution-free sample-reuse methods like the jackknife for tests of significance, and the Stone-Geisser test for predictive quality. But the implied structural constraints of a model were never exploited.⁶ A pragmatic,
instrumental approach to science may call for a neutral integration of the two approaches.
PLS is one of those families of ideas that evokes emotions, and apparently calls upon people to make a choice. I have been
on both sides of the fence. In the time I worked on my PhD thesis my sympathies were clearly with the structural maximum
likelihood approach, (Dijkstra, 1981, 1983). Later, inspired by Kettenring’s wonderful paper on canonical analysis (Kettenring,
1971) I regained my appreciation and enthusiasm for methods like PLS (Dijkstra, 2009, 2010) and Dijkstra and Henseler
(2011). Now I have a convinced pragmatic stance and refuse to condemn one and praise only the other. Let us establish
empirically where each works best. For problems in well-established fields highly structured approaches like mainstream SEM
may be appropriate, other fields will be well served by highly efficient means of extracting information from high-
dimensional data. And methods that blend the two, as I am working on, may well occupy their own niche.
I thank the editors for inviting me to respond to Professor Rigdon’s paper, and I am grateful to him for a stimulating contribution that
helped improve my thinking on PLS again.

References

Basmann, R.L., 1957. A generalized classical method of linear estimation of coefficients in a structural equation. Econometrica 25, 77–83.
Bentler, P.M., Dijkstra, T.K., 1985. Efficient estimation via linearization in structural models. In: Krishnaiah, P.R. (Ed.), Multivariate Analysis, vol. VI, chapter 2.
North-Holland, Amsterdam, pp. 9–43.
Chin, W.W., Marcolin, B.L., Newsted, P.R., 2003. A partial least squares latent variable modeling approach for measuring interaction effects. Information
Systems Research 14, 189–217.
Dijkstra, T.K., 1981. Latent Variables in Linear Stochastic Models. PhD thesis, second ed., 1985. Sociometric Research Foundation, Amsterdam.
Dijkstra, T.K., 1983. Some comments on maximum likelihood and partial least squares methods. Journal of Econometrics 22, 67–90.
Dijkstra, T.K., 1992. On statistical inference with parameter estimates on the boundary of the parameter space. British Journal of Mathematical and Statistical Psychology 45, 289–300.
Dijkstra, T.K., 2009. PLS for Path Diagrams Revisited, and Extended. Paper Published in the Proceedings of the 6th International Conference on Partial Least
Squares, Sept 4–7, Beijing, China. See also http://www.rug.nl/staff/t.k.dijkstra/research.
Dijkstra, T.K., 2010. Latent variables and indices. In: Esposito Vinzi, V., Chin, W.W., Henseler, J., Wang, H. (Eds.), Handbook of Partial Least Squares, chapter 1.
Springer-Verlag, Heidelberg, pp. 23–46.
Dijkstra, T.K., 2011. Consistent Partial Least Squares Estimators for Linear and Polynomial Factor Models. Working Paper. http://www.rug.nl/staff/t.k.dijkstra/
research.
Dijkstra, T.K., Henseler, J., 2011. Linear indices in nonlinear structural equation models: best fitting proper indices and other composites. Quality & Quantity 45,
1505–1518.
Dijkstra, T.K., Henseler, J., 2012. Consistent and Asymptotically Normal PLS-estimators for Linear Structural Equations. Working Paper. http://www.rug.nl/
staff/t.k.dijkstra/research.
Dijkstra, T.K., Schermelleh-Engel, K., 2013. Consistent partial least squares for nonlinear structural equation models. Psychometrika. http://dx.doi.org/10.1007/s11336-013-9370-0.
Esposito Vinzi, V., Chin, W.W., Henseler, J., Wang, H. (Eds.), 2010. Handbook of Partial Least Squares. Concepts, Methods and Applications. Springer Verlag,
Heidelberg.
Huang, W., 2013. PLSe: Efficient Estimators and Tests for Partial Least Squares. Ph. D. thesis. University of California, Department of Psychology, Los Angeles.
Joe, H., 2006. Generating random correlation matrices based on partial correlations. Journal of Multivariate Analysis 97, 2177–2189.
Kelava, A., et al., 2011. Advanced nonlinear latent variable modeling: distribution analytic LMS and QML estimators of interaction and quadratic effects.
Structural Equation Modeling 18 (3), 465–491.
Kettenring, J.R., 1971. Canonical analysis of several sets of variables. Biometrika 58 (3), 433–451.
Mosbaek, E.J., Wold, H., 1970. Interdependent Systems: Structure and Estimation. North-Holland, Amsterdam.
Wold, H. (Ed.), 1964. Econometric Model Building. Essays on the Causal Chain Approach. North-Holland, Amsterdam.
Pearl, J., 2009. Causality: Models, Reasoning, and Inference, second ed. Cambridge University Press, Cambridge.
Rigdon, E.E., 2013. Rethinking partial least squares path modeling: in praise of simple methods. Long Range Planning 45 (5–6), 341–358.
Schumacker, R.E., Marcoulides, G.A. (Eds.), 1998. Interaction and Nonlinear Effects in Structural Equation Modelling. Lawrence Erlbaum Ass., Mahwah, NJ.
Summers, R., 1965. A capital intensive approach to the small sample properties of various simultaneous equation estimators. Econometrica 33, 1–41.
Theil, H., 1958. Economic Forecasts and Policy. North-Holland, Amsterdam.
Wold, H., 1966. Nonlinear estimation by iterative least squares procedures. In: David, F.N. (Ed.), Research Papers in Statistics. Festschrift for J. Neyman.
Wiley, New York, pp. 411–444.
Wold, H., 1975. Path models with latent variables: the nipals approach. In: Blalock, H.M., et al. (Eds.), Quantitative Sociology, International Perspectives on
Mathematical and Statistical Modeling. Academic Press, New York, pp. 307–359.
Wold, H., 1976. On the Transition from Pattern Recognition to Model Building. European Meeting of the Econometric Society, Helsinki, Finland, 23–27
August.
Wold, H., 1980. Contributions to the Rationale of LS, PLS and Soft Modelling in My Published Work. Geneva, May 1980, mimeographed document.

6 A PLS-reviewer of a paper I submitted, in which I showed that an adjustment of PLS would allow goodness-of-fit tests, informed me that ‘we have
no need for this’. My subjective impression, unsupported by serious interviews, is that this lack of appreciation of the possibility to check for the presence of
structure in the data could be widespread in the PLS-community.

Please cite this article in press as: Dijkstra, T.K., PLS’ Janus Face – Response to Professor Rigdon’s ‘Rethinking Partial Least Squares
Modeling: In Praise of Simple Methods’, Long Range Planning (2014), http://dx.doi.org/10.1016/j.lrp.2014.02.004

Wold, H., 1981. Comments on the Papers by J. B. Grubman et al, Comments Integrated with a Briefing of PLS (Partial Least Squares) Soft Modeling. ASA-
CENSUS-NBER Conference on Applied Time Series Analysis of Economic Data, October 13–15.
Wold, H., 1982a. Demand Analysis, a Study in Econometrics. Greenwood Press, Westport, reprint of 1953 edition. Wiley, New York.
Wold, H., 1982b. The basic design and some extensions. In: Jöreskog, K.G., Wold, H. (Eds.), Systems under Indirect Observation, part II, pp. 1–54. North-
Holland, Amsterdam.

Biography

Dr. Theo K. Dijkstra is Professor of Econometrics, part-time, at the Faculty of Economics and Business, University of Groningen, The Netherlands, and independent researcher. He specializes in multivariate statistics and multi-criteria decision analysis. See http://www.rug.nl/staff/t.k.dijkstra/ and https://www.researchgate.net/profile/Theo_Dijkstra/ for more information on Dr. Dijkstra. E-mail: t.k.dijkstra@rug.nl
