PERSONNEL PSYCHOLOGY
coefficient provides one rough index of how good a job you are likely
to do in selecting among applicants, and when combined with a number
of other parameters of selection decisions (e.g., selection ratios, costs of
testing) helps provide an estimate of the utility of these tests (Boudreau,
1991; Boudreau, Sturman, & Judge, 1994).
Much of what we know about the validity of selection devices is based
on analyses of univariate relationships between tests and criterion measures. However, personnel selection is almost always a multivariate process, involving multiple X variables and multiple Y variables. For example, organizations typically use more than one selection measure or
test when hiring (Boudreau et al., 1994; Gatewood & Feild, 1994; Hakstian, Woolley, Woolley, & Kryger, 1991a, 1991b; Jones & Wright, 1992;
Milkovich & Boudreau, 1994). Assessment methods (e.g., tests, interviews) that tap multiple domains (e.g., cognitive ability, personality) are
the norm in most selection systems. More important, there is growing
recognition of the fact that the domain of job performance is complex
and multidimensional (Astin, 1964; Borman, Hanson, & Hedge, 1997;
Campbell, 1990; Conway, 1996; Murphy, 1989, 1996). As we will note
in sections that follow, the different facets that underlie the construct
job performance may in some cases be only weakly intercorrelated, and
different organizational policies for emphasizing one facet or another
when defining job performance could lead to substantially different conclusions about the validity of selection tests.
In this paper, we will argue that fully multivariate approaches, which
consider multiple predictors and multiple facets of the criterion domain,
are better and more realistic for studying the validity of selection tests
than the univariate approaches that have historically dominated this literature. We will illustrate how moving from a univariate to a multivariate framework can change the way we think about the validity and usefulness of selection tests (Boudreau et al., 1994; Murphy, 1996), and in
particular will discuss the implications of changing conceptions of the
domain of job performance for evaluating the validity of selection tests
and test batteries.
Overview
We believe that there are several reasons why a multivariate framework for evaluating the validity of selection tests is preferable to current
univariate approaches. First, as noted above, job performance is not a
simple or unitary phenomenon, and models that treat performance as
a single entity without considering how the different facets of this complex construct are combined can present a misleading picture of the validity and utility of these tests. This implies that multiple Y variables
should routinely be considered when assessing selection validity.
Second, there is abundant evidence that multiple X variables are relevant for predicting which applicants are most likely to perform well in
most jobs. At the most general level, there is considerable evidence that
both general cognitive ability and broad personality traits (e.g., conscientiousness) are relevant to predicting success in a wide array of jobs
(Barrick & Mount, 1991; Hunter & Hirsh, 1987; Hunter & Hunter, 1984;
Ree & Earles, 1994; Tett, Jackson, & Rothstein, 1991). Discussions of
the validity of any one class of tests or measures, considered in isolation,
can present a misleading or incomplete picture of the validity of these
measures in field settings, where multiple X and multiple Y variables are
likely to be the rule rather than the exception.
Third, there is evidence that different facets of job performance have
different antecedents. That is, the attributes that lead some applicants to
excel in specific aspects of performance (e.g., performing individual job
tasks) appear to be different from those that lead some applicants to excel in other aspects of job performance (e.g., teamwork; see Borman et
al., 1997; Day & Silverman, 1989; McCloy, Campbell, & Cudeck, 1994;
McHenry, Hough, Toquam, Hanson, & Ashworth, 1990; Motowidlo &
Van Scotter, 1994; Rothstein, Paunonen, Rush, & King, 1994). The common practice of using multiple tests or assessment methods in personnel
selection might be interpreted as a recognition that any one class of measures, no matter how good, is unlikely to capture the range of attributes
that is likely to be relevant to predicting job performance.
In the sections that follow, we first review research on the multidimensional nature of job performance. This research suggests that job
performance can mean quite different things, depending on the relative
emphasis given to various aspects of the performance domain. Next, we
briefly review research on the individual difference variables most likely
to be related to performance. This review suggests that measures of ability and personality will be relevant to predicting performance in a wide
range of jobs.
Next, we discuss research on the conditions under which the strategy
for combining predictor and/or criterion information is likely to have
a large or a small effect on the correlation between the predictor and
criterion sets. As we will show, the problem of predicting multidimensional job performance criteria on the basis of broad individual difference measures is one in which strategies for combining predictors and/or
criterion dimensions is likely to have a substantial impact on conclusions
about test validity. Next, we review partially multivariate approaches,
which involve either using multiple tests to predict a single unitary performance construct, or which involve differential weighting of criterion
attributes to arrive at decisions involving multiple criteria.
Finally, we discuss a fully multivariate approach, in which there are
multiple X variables and multiple Y variables, and in which decisions
must be made about how to weight the Xs to form predictor composites
and about how to combine the Ys to create an overall index of performance. We use a Monte Carlo study, with parameter values obtained
from large-scale studies and meta-analyses, to illustrate how policies for
defining the construct performance and for combining predictor scores
can be important influences on the validity of a personnel selection system. Even when all of the tests used in selection are known to be "valid"
(e.g., as would be the case if psychometrically sound measures of cognitive ability and broad personality traits such as conscientiousness were
used in selection), the level of validity can vary substantially, depending
on the extent to which the strategy for selecting applicants is consistent
with the definition of job performance adopted by a particular organization.
Campbell's (1990) model of performance includes behaviors such as volunteering, persisting, helping, and maintaining individual discipline (see
also Campbell et al., 1993). Labels such as contextual performance,
organizational citizenship and prosocial behaviors have been applied to this facet of the performance domain, and while these three
terms are not interchangeable, they all capture aspects of effective job
performance that are not always directly linked to accomplishing specific
individual tasks. The use of teams and group-oriented methods of work
organization has grown tremendously in the last decade, and the aspects
of performance that are most relevant to the effective functioning of
teams and work groups appear to be increasingly important in defining
the construct of job performance (McIntyre & Salas, 1995).
A great deal has been written about changes in the way jobs, roles,
and work organizations are defined (e.g., see Cascio, 1995; Howard,
1995; Ilgen & Hollenbeck, 1991; McIntyre & Salas, 1995). These changes
have clear relevance for understanding precisely what job performance
means in different jobs, organizations, and so forth. In particular, it is
likely that very different sets of behaviors might define effective performance in the same job, depending on how work is organized (e.g.,
individual vs. team-oriented production methods), or how organizations
are structured (e.g., hierarchical structures with rigid job descriptions
vs. fluid structures and relatively undefined job descriptions), and so
forth. Different organizations that do similar work might place substantially different emphasis on individual versus team- or group-oriented
facets of job performance, and validity studies that do not pay careful
attention to possible differences in the meaning and antecedents of job
performance across organizations, or across time, may yield misleading
estimates of the contributions of specific tests or sets of tests to the task
of selecting job applicants who are most likely to perform well.
Predictors of Performance
Throughout much of the history of personnel selection research, substantial attention has been devoted to the study of the validity of various selection techniques. Several influential reviews, notably Reilly and
Chao (1982) and Hunter and Hunter (1984), compared the validity of
written tests, interviews, biodata instruments, and other selection instruments. Other important reviews have concentrated on evaluating the
validity and utility of one specific method or family of methods (e.g.,
Gaugler, Rosenthal, Thornton, and Bentson, 1987, reviewed research
on assessment centers).
In recent years, the focus of research and theory in the prediction
of job performance has shifted somewhat from a focus on methods or
simulation to show just how much validity might vary, even when selection is done solely on the basis of measures that are known to be valid
predictors of at least some aspect of performance in virtually all jobs.
Partially Multivariate Approaches to Predicting Performance
Ours is not the first study to advocate examining validity using either
multiple predictors or multiple criterion dimensions. Several studies
have examined the relationship between multiple predictors and a single performance criterion, often using multiple regression to link predictors with the criterion measure (e.g., Guion, 1991; Hakstian et al., 1991a;
Ones et al., 1993a; Ones, Schmidt, & Viswesvaran, 1993b; Schmidt, 1992;
Schmidt, Hunter, & Outerbridge, 1986; see also Hunter and Schmidt,
1990, pp. 502-503). These studies illustrate the use of multiple X variables to predict a single Y (i.e., performance is treated as a univariate construct). Multi-attribute utility models (see Bazerman, 1990; Edwards & Newman, 1982; Pritchard, 1990; Roth, 1994) often include multiple criterion dimensions, but they rarely include multiple predictors.
These studies illustrate the use of multiple Y variables, but they do not typically include multiple X variables.
The literature on selection validity contains few studies that employ
a fully multivariate model, in which the links between multiple predictors and multiple criteria are simultaneously examined (Murphy, 1996).
As we have noted above, there are several aspects of the personnel selection paradigm that suggest that fully multivariate models would have
considerable utility (e.g., multiple predictors and criterion dimensions,
low correlations among predictors and criterion dimensions, etc.). Such
a model is described in the section that follows, and is applied in a Monte
Carlo simulation to illustrate the importance of the specific definition of
the job performance construct in reaching conclusions about the validity
of selection tests.
A Fully Multivariate Approach for Estimating Selection Test Validity
organization, individual task performance is very important and organizational citizenship behavior is not (e.g., if production is done on an individual piece-rate basis). This definition of performance is fundamentally
different from that in another organization where OCBs are seen as just
as important as individual task performance. Similarly, the process of
defining a selection composite is essentially a process of deciding about
the relative weight given to each test in the selection battery.
Although the definition of the performance composite is essentially a
matter of organizational policy (i.e., the organization's definition of what
constitutes job performance), the definition of the predictor composite
need not be. One possibility is to identify the unique set of weights for
the predictor set that maximizes the validity of the selection composite,
given a particular set of weights for the performance composite.1 Although optimal weights are useful for understanding the upper limits of
prediction, they do not always reflect organizational practice. That is,
organizations often use simpler systems for combining information from
multiple predictors (e.g., equal weighting). In the Monte Carlo study
presented below, we will examine the effects of choosing a wide range
of weights, and will not focus on the statistically optimal weighting system
for combining predictors. We will, however, show that when statistically
optimal weights are used to combine predictors, the same overall pattern
of results holds as when simpler weighting strategies are used.
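To make the weighting arithmetic concrete, the correlation between a weighted predictor composite and a weighted criterion composite can be sketched as follows. This Python sketch is ours, not part of the original study: it uses the mean correlation estimates introduced later in the paper, and the ITP-OCB correlation of .20 is our own placeholder, since the text does not report a value for that link.

```python
import numpy as np

# Correlation estimates from the paper's Figure 1 (introduced later in the
# text); the ITP-OCB correlation of .20 is an assumed placeholder.
Rxx = np.array([[1.00, 0.10],          # ability, conscientiousness
                [0.10, 1.00]])
Rxy = np.array([[0.50, 0.30],          # ability    -> ITP, OCB
                [0.20, 0.20]])         # conscient. -> ITP, OCB
Ryy = np.array([[1.00, 0.20],          # ITP, OCB (off-diagonal is assumed)
                [0.20, 1.00]])

def composite_validity(wx, wy):
    """Correlation between the weighted predictor and criterion composites."""
    num = wx @ Rxy @ wy
    return num / np.sqrt((wx @ Rxx @ wx) * (wy @ Ryy @ wy))

wy = np.array([0.5, 0.5])                  # equal criterion weights
wx_equal = np.array([0.5, 0.5])            # simple equal predictor weights
wx_opt = np.linalg.solve(Rxx, Rxy @ wy)    # regression (optimal) weights

print(composite_validity(wx_equal, wy))    # ~.52
print(composite_validity(wx_opt, wy))      # ~.56
```

For a fixed criterion composite, the regression weights can never yield lower validity than equal weights, which is why the statistically optimal solution provides an upper limit on prediction.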
used in selection are arbitrary, and can be rescaled at the test user's convenience. The true variability in various aspects of performance is not directly under the researcher's control, and it can have a substantial effect
on the effective weight of a specific performance dimension in defining
overall job performance. For example, an organization might set a policy
that says that individual task performance and organizational citizenship
are the two key facets of job performance, and that individual task performance is twice as important as organizational citizenship. We refer to this statement, that individual task performance should be given twice the emphasis of organizational citizenship, as a nominal weight. However, if individual differences on OCBs are twice as large as individual
differences on individual task performance, the effective weights of the
two facets of performance in defining overall job performance (and the
validity of the selection composite) will be identical. Regardless of the
organization's stated policy, if subjects actually differ more on one of
the performance dimensions than on the other, the effective weight of
each performance facet in defining overall job performance might be
substantially different from the nominal weight.
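The nominal-versus-effective-weight distinction can be illustrated with a small simulation. This sketch is ours, with arbitrary illustrative numbers (facet SDs of 1 and 2, and an assumed facet intercorrelation of .20): a 2-to-1 nominal policy favoring individual task performance is exactly offset by a 1-to-2 ratio of facet standard deviations, leaving the two facets with identical effective weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical facet scores: ITP with SD 1, OCB with SD 2 (twice as variable);
# the .20 facet intercorrelation is an illustrative assumption.
r, sd_itp, sd_ocb = 0.20, 1.0, 2.0
cov = [[sd_itp**2, r * sd_itp * sd_ocb],
       [r * sd_itp * sd_ocb, sd_ocb**2]]
itp, ocb = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Nominal policy: individual task performance counts twice as much as OCB.
overall = 2.0 * itp + 1.0 * ocb

# Effective weight = nominal weight x facet SD: 2x1 vs. 1x2 -- identical, so
# each facet ends up equally correlated with "overall performance."
print(np.corrcoef(overall, itp)[0, 1])   # ~.77
print(np.corrcoef(overall, ocb)[0, 1])   # ~.77
```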
One reason why it is important to consider the variability of each
facet of performance is that individual differences, selection policies, organizational socialization experiences or organizational cultures could
conceivably lead to restricted variability in some aspects of performance
and enhanced variability in others. For example, suppose an organization provided extensive training, performance aids and technical support
to assist individual task performance but did nothing to increase organizational citizenship behaviors. This could easily lead to a high mean but
a small standard deviation for individual task performance measures,
and to relatively larger variability in OCBs. Alternatively, strong organizational cultures supporting OCBs might lead to restricted variability
in that facet of performance and to relatively larger variability in individual task performance. In each case, individual and organizational factors that affected the standard deviations of performance facets would
also have an impact on the effective weight of these facets in defining the
performance construct.
cognitive ability tests and personality tests that measure the dimension
conscientiousness, in predicting performance. Performance, in turn,
will be defined as some combination of individual task performance and
organizational citizenship behaviors. The relationships between ability,
conscientiousness, individual task performance and organizational citizenship behaviors have all been studied extensively (there are several
meta-analyses that summarize research on specific pairs of variables),
and we can use this research base to build a realistic and informative
Monte Carlo study that examines the effects of a number of critical parameters on the validity of predictor batteries.
The underlying model for the prediction process is illustrated in Figure 1. Organizations use some combination of ability tests and personality tests to predict the future performance of applicants, and overall
job performance is defined as some composite of individual task performance and organizational citizenship. Figure 1 includes estimates of the
correlations between each of the pairs of variables along with (in parentheses) estimates of the standard deviations of those correlations. We
include standard deviations in Figure 1 because none of the correlations
shown in Figure 1 represents a known fixed quantity. For example, .50
represents a reasonable estimate of the relationship between measures
of cognitive ability and measures of individual task performance, but this
number is not likely to be constant across all jobs (Gutenberg, Arvey, Osburn, & Jeanneret, 1983). The standard deviation of .10 reflects the fact
that the correlations between ability tests and individual performance
measures are not completely invariant, but rather fall in some range.
It is important to emphasize that the figures included in Figure 1 are
used solely for the purposes of illustrating the importance of decisions
about the weights assigned to predictor and/or criterion dimensions in
determining the validity of selection tests or batteries. Every one of
the values shown in Figure 1 has been the focus of considerable research
and debate, and researchers in these areas might reasonably propose
alternatives to any of the specific mean or SD values shown in this figure.
Rather than viewing Figure 1 as the definitive summary of what is known
about these constructs, we suggest that it be used as an illustration of
the consequences of using these sorts of constructs
to predict performance (where performance might take on a variety of
definitions) across a range of job types. In specific settings, jobs, or
types of organizations, one or more of the correlations among these four
constructs might differ substantially from those shown in Figure 1.
Similarly, it is important to emphasize that the outcomes of any
Monte Carlo study depend heavily on the range of parameter values
studied. In this paper, we consider the implications of potentially large
[Figure omitted: a path diagram linking the two predictors (Cognitive Ability, Conscientiousness) and the two performance dimensions (Individual Task Accomplishment, Organizational Citizenship), annotated with the estimated correlations and, in parentheses, the standard deviations of those estimates discussed in the text.]
Figure 1: Relationships Among Selection Tests and Performance Dimensions: Estimated Correlations (and Standard Deviations of Estimates)
differences in the way organizations value different facets of job performance, in the emphasis they give to ability versus personality dimensions
as predictors, and so on. Different choices about the range of parameter
values to be studied would of course affect the results of such a simulation. The purpose of our simulation study is not to establish definitively
the range of validities across all situations researchers might encounter,
but rather to illustrate concretely the principles that are implicit in the
preceding discussions of the effects of the weights given to predictors
and/or criterion dimensions when combining multiple X and/or Y variables. In other words, our purpose here is to illustrate just how much difference the choice of parameter values might make in reaching conclusions about the validity of selection test batteries as predictors of overall
job performance.
Before discussing our Monte Carlo simulation, we will briefly discuss
the sources for each of the values of the estimated correlations (and their
standard deviations) shown in Figure 1.
Ability linkages. In Figure 1, cognitive ability is related to conscientiousness, individual task performance, and organizational citizenship. The first link, between cognitive ability and individual task performance, is one of the most widely studied topics in psychology; psychologists have studied general cognitive ability as a predictor of performance
since the turn of the century (Schmidt, 1994). Large-scale studies and
meta-analyses of ability-performance relationships (where performance
is typically measured using supervisory ratings) have typically reported
uncorrected validities of .35 or above, with some variability across jobs
that differ in complexity (Hunter & Hunter, 1984; McHenry et al., 1990;
Nathan & Alexander, 1988; Ree & Earles, 1994; Schmidt et al., 1986).2
Many of these studies use measures that confound individual task performance and OCBs (e.g., supervisory ratings are probably affected by
both); studies that focus more exclusively on individually oriented performance measures (e.g., work samples; see Hunter; 1986) often report
uncorrected validities of approximately S O .
2 In this simulation we use uncorrected validities for two reasons. First, statistical theory
for understanding multivariate results on the basis of multiple corrected correlations is not
well developed. Corrections for attenuation, range restriction, and so forth affect the interpretations of confidence intervals, significance tests, and even effect size measures, and
analyses based on combining several corrected r values can be very difficult to interpret.
Second, it is likely that some of the apparent unreliability of performance measures is due
to the fact that the domain is multidimensional, which will yield low internal consistency
and low agreement between raters who might place different emphasis on the individual
facets of performance. Reliability can be increased by developing and appropriately combining measures of each homogeneous facet of the performance domain, and as reliability
increases, corrections for attenuation have a vanishingly small effect. Range restriction corrections
would have been applicable to only a few of the predictor-criterion correlations studied
here, and their effects are typically also small unless selection ratios are quite low.
The precise extent of variability in validities is a matter of some dispute; we will use a mean of S O and a standard deviation of .lo in our
Monte Carlo study to represent the distribution of correlations between
cognitive ability and individual task performance. The effective range of
ability-individual task performance correlations (i.e., three standard deviations from the mean in either direction) would then be approximately
.20 to .80.
As we noted earlier, numerous studies suggest that cognitive ability and conscientiousness are reasonably independent (Ackerman et al.,
1995; Barrick et al., 1994; Brand, 1994; Cattell & Butcher, 1968; Cattell & Kline, 1977; Dreger, 1968; Ones et al., 1993a; Wolfe & Johnson,
1995). In our Monte Carlo analysis, we will specify a mean correlation
of .10, with a standard deviation of .03, to represent the feasible range of
the ability-personality correlations. This mean and SD yields an effective
range of approximately .01 to .19.
The relationships between cognitive ability and organizational citizenship behaviors have not been as extensively researched as ability-task
performance or ability-personality relationships; virtually all of the studies of the antecedents of OCBs have focused on attitudinal or personality variables rather than abilities (Organ & Ryan, 1995). One large-scale
study (i.e., U.S. Army Project A) reported a reliable estimate of the relationship between general cognitive ability and the Effort and Leadership factor of performance in the military (McHenry et al., 1990). This aspect
of performance is closest to the concept of OCB; McHenry et al. (1990)
reported that ability measures correlated .31 with this aspect of performance. In our Monte Carlo study, we will describe this relationship using a mean correlation of .30 and a standard deviation of .05, yielding an
effective range of .15 to .45.
Conscientiousness linkages. Conscientiousness is linked to both individual task performance and OCBs. Based on meta-analyses by Barrick
and Mount (1991), Mount and Barrick (1995), and Tett et al. (1991), we
estimate the mean and standard deviation of the distribution of correlations between conscientiousness and individual task performance to be
.20 and .04, respectively, yielding an effective range of .08 to .32.
There have been several studies and meta-analyses linking conscientiousness and organizational citizenship behaviors (Barrick, Mount, &
Strauss, 1992; Becker & Randall, 1994; Organ, 1994; Organ & Konovsky,
1989; Organ & Lingl, 1995; Organ & Ryan, 1995). On the basis of these
studies, we will use a mean correlation of .20 and a standard deviation
of .05 to characterize this relationship. This mean and SD yields an effective range of conscientiousness-OCB correlations of .05 to .35.
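Taken together, the five distributions above suffice to simulate the prediction model. The following sketch (ours, not the authors' code) draws one correlation matrix from these distributions and computes the resulting composite validity for equal predictor and criterion weights; the ITP-OCB correlation is fixed at an assumed placeholder of .20, since the text does not report a value for that link.

```python
import numpy as np

rng = np.random.default_rng(42)

# (mean, SD) of each correlation distribution, as specified in the text.
# Variable order: 0 = ability, 1 = conscientiousness, 2 = ITP, 3 = OCB.
LINKS = [((0, 1), 0.10, 0.03),   # ability-conscientiousness
         ((0, 2), 0.50, 0.10),   # ability-individual task performance
         ((0, 3), 0.30, 0.05),   # ability-OCB
         ((1, 2), 0.20, 0.04),   # conscientiousness-ITP
         ((1, 3), 0.20, 0.05)]   # conscientiousness-OCB
R_ITP_OCB = 0.20                 # not specified in the text; assumed placeholder

def draw_validity(wx, wy):
    """Sample one correlation matrix and return the composite validity."""
    R = np.eye(4)
    for (i, j), m, s in LINKS:
        R[i, j] = R[j, i] = rng.normal(m, s)
    R[2, 3] = R[3, 2] = R_ITP_OCB
    Rxx, Rxy, Ryy = R[:2, :2], R[:2, 2:], R[2:, 2:]
    return (wx @ Rxy @ wy) / np.sqrt((wx @ Rxx @ wx) * (wy @ Ryy @ wy))

w = np.array([0.5, 0.5])
vals = np.array([draw_validity(w, w) for _ in range(10_000)])
print(vals.mean(), vals.std())
```

Repeating the draw many times yields a distribution of validities rather than a single number, which is the logic behind the ranges reported in the tables that follow.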
TABLE 1
Validity Estimates

                                 Mean r    SD of r values    Range
Entire population                 .49          .15           .20 - .78
Weight^a ability
  0.3                             .41          .09           .23 - .59
  0.5                             .51          .14           .24 - .77
  0.7                             .54          .17           .21 - .87
Weight individual task perf.
  0.3                             .46          .16           .14 - .78
  0.5                             .50          .17           .16 - .84
  0.7                             .52          .17           .18 - .86
SD individual task perf.
  2                               .38          .15           .09 - .66
  6                               .48          .16           .16 - .80
  10                              .53          .17           .16 - .86
  14                              .54          .17           .20 - .88
  18                              .51          .18           .17 - .86

^a Note that weights assigned to X and Y variables sum to 1.0, and SD values for Individual Task Performance and OCBs sum to 20, so if the weight or SD of Ability or Individual Task Performance is known, the weight or SD of Conscientiousness or OCBs is also known.
TABLE 2
Effects of Weights and Standard Deviations on Validity

                                                         Eta squared
Main effects:
  Weights assigned to selection tests (Wx)                  .23
  Weights assigned to performance dimensions (Wy)           .02
  Standard deviations of performance dimensions (SD)        .24
Interactions:
  Wx x Wy                                                   .003
  Wx x SD                                                   .02
  Wy x SD                                                   .09
  Wx x Wy x SD                                              .004
Residual                                                    .38
TABLE 3
Breakdown of Effects of Criterion Weights and Criterion SD Values on Validity

Weight^a ITP    SD ITP    Mean r    SD of r values    Range
.3              2          .35          .07           .21 - .49
                6          .40          .09           .22 - .57
                10         .47          .12           .24 - .70
                14         _.56_        .17           .22 - .89
                18         .53          .18           .18 - .89
.5              2          .37          .08           .21 - .53
                6          .47          .12           .23 - .67
                10         _.56_        .17           .22 - .87
                14         .55          .18           .18 - .92
                18         .50          .17           .17 - .84
.7              2          .42          .10           .23 - .61
                6          _.56_        .17           .23 - .89
                10         .55          .19           .18 - .92
                14         .52          .19           .17 - .87
                18         .48          .17           .15 - .80

^a ITP = Individual Task Performance. Highest means in each set of cells receiving the same criterion weights are underlined.
of the two tests or on one of the two performance facets, validity can
decrease. This point is most clearly illustrated by examining the interaction between the weights assigned to performance dimensions and the
standard deviations of each performance dimension; the cell means that
define this interaction are shown in Table 3.
As Table 3 shows, validity is highest when the effective weights of the
two performance dimensions are equal (the values underlined in each
section of Table 3 represent mean validities when the effective weight assigned to individual task performance is the same as the effective weight
for OCBs). As the effective weights assigned to the two facets of performance diverge (in either direction), estimated validity drops. Table 3
suggests that validity is highest when both performance dimensions are
important in defining the composite entitled performance; as the effective definition of this composite shifts toward a heavier emphasis on either individual task performance or organizational citizenship, validities
tend to drop.
Unexplained variance in validities. Table 2 shows that a substantial
portion of the variance in estimated validities (i.e., 38%) is not explained
by the weights attached to tests or performance facets or by the differences in the variability of performance facets. The confidence intervals
shown in Tables 1 and 3 provide a concrete illustration of just how much
validities might vary; confidence intervals of 50 to 60 points are quite
common. This extensive variability in validities illustrates an important
aspect of virtually all multivariate procedures-that is, that uncertainty
compounds.
The procedures outlined here combine information from a 4 x 4 matrix (i.e., two predictors and two criterion facets) to obtain an overall
validity estimate. Each of the correlations that are included in these calculations (i.e., estimated correlations among ability, conscientiousness,
individual task performance, and organizational citizenship) represents
an uncertain quantity; the standard deviations shown in parentheses in
Figure 1 illustrate how much each one of these values might vary. When
these correlations are combined to estimate the overall validity of the selection battery, you will generally be less certain of this calculated value
than you were of any of the individual values used to compute it (for
example, the overall standard deviation of estimated validities shown in
Table 1 is .15, a value that is larger than any of the standard deviations
shown in Figure 1). We will examine the implications of this aspect of
multivariate validity models in the sections that follow.
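The compounding of uncertainty can be made concrete with a small self-contained simulation, in the spirit of (but not identical to) the study reported above: each replication draws the five uncertain correlations from the distributions given earlier and also varies the weight and SD policies, so the spread of the resulting validities reflects all of these sources at once. The ITP-OCB correlation of .20 is again an assumed placeholder, and the uniform ranges for the policy parameters are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)

def one_validity():
    # The five uncertain correlations (means/SDs as given in the text);
    # the ITP-OCB correlation of .20 is an assumed placeholder.
    r_ac = rng.normal(0.10, 0.03)
    r_at = rng.normal(0.50, 0.10)
    r_ao = rng.normal(0.30, 0.05)
    r_ct = rng.normal(0.20, 0.04)
    r_co = rng.normal(0.20, 0.05)
    r_to = 0.20
    # Policy also varies across replications: test weights, nominal criterion
    # weights, and facet SDs (which sum to 20, as in Table 1).
    wa = rng.uniform(0.3, 0.7); wc = 1.0 - wa
    wt = rng.uniform(0.3, 0.7); wo = 1.0 - wt
    sd_t = rng.uniform(2.0, 18.0); sd_o = 20.0 - sd_t
    et, eo = wt * sd_t, wo * sd_o            # effective criterion weights
    num = wa * (r_at * et + r_ao * eo) + wc * (r_ct * et + r_co * eo)
    var_x = wa**2 + wc**2 + 2.0 * wa * wc * r_ac
    var_y = et**2 + eo**2 + 2.0 * et * eo * r_to
    return num / np.sqrt(var_x * var_y)

vals = np.array([one_validity() for _ in range(20_000)])
# The spread here combines correlation uncertainty with policy variation.
print(vals.mean(), vals.std())
```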
forth, and jobs that are similar in terms of their titles, main duties, and so
forth may still yield very different definitions of what constitutes good or
poor performance. The model described here suggests that statements
about the validity and utility of any test or set of tests as predictors of performance must be preceded by a careful analysis of precisely what overall
job performance means in a specific setting. Organizations might vary
considerably in the extent to which they value or emphasize individual
task performance versus behaviors that enhance others' performance,
and different workforces might vary substantially on one facet and be
highly homogeneous on another. The multivariate model described here
provides both an impetus and a mechanism for investigating the effects
of organizational policies and workforce characteristics (i.e., variability
in specific aspects of performance) on validity.
There is a long tradition of expressing validity in terms of a single
number (i.e., r_xy). Multivariate models use the same correlational scale,
but the process by which a validity estimate is obtained is likely to be fundamentally different than that which has characterized validity research
to date. Most validity studies have either taken some specific measure
as an approximation of an ultimate criterion (i.e., the best measure of
performance), or have simply ignored the criterion problem altogether
and treated performance as a simple quantity that can be estimated by a
single data source. As recognition of the complex, multidimensional nature of the performance domain has emerged, it has become increasingly
clear that the traditional univariate models do not provide a sufficient
basis for understanding how personnel selection will in fact affect job
performance (Murphy, 1994, 1996). If an organization happens to value
organizational citizenship highly, the fact that a particular test is valid
as a predictor of individual task performance does not necessarily say
much about its relevance for hiring the best workers for that organization. The model developed here suggests that tests must be matched with
the definition of performance adopted by the organization to provide the
best tools for personnel selection.
Three aspects of our simulation results stand out as particularly important. First, the mean of the uncorrected correlations between predictor composites and criterion composites, across all conditions, is .49.
Note that this is lower than the simple correlation between one of the
tests (i.e., cognitive ability) and one of the facets of the criterion domain
(i.e., individual task performance); the uncorrected correlation between
these two measures was assumed to have a mean value of .50. That is,
studies that focus on the validity of ability tests for predicting this single
facet of the performance domain might overestimate our success in predicting job performance (where performance is defined to include both
MURPHY AND SHIARELLA
845
individual- and group-oriented facets), even when ability tests are combined with other valid predictors. Our simulation shows that there are a variety of conditions in which validities well in excess of .50 might be expected, but it also shows that there are numerous situations in which
these same tests might not do a very good job predicting which applicants
are most likely to perform well.
Second, our simulation suggests that as group-oriented facets of performance become more important (e.g., as the relative weight given to
OCBs in defining the performance construct increases), validities tend
to decrease. As Table 2 shows, average validities tend to be higher when
individual task performance is given more emphasis in defining job performance. However, Table 3 suggests that broadening the domain of job performance to include group-oriented facets does not automatically depress validities. This table shows that when the effective weights of the
two facets of the domain are roughly equal (i.e., the underlined coefficients in Table 3 are ones in which the two facets of the performance
domain receive equal effective weights), average validities can be consistently high.
Third, results shown in Table 2 provide concrete support for our general assertion that weights matter. Roughly 23% of the variance in the
correlations between predictor and criterion composites could be explained in terms of the weights assigned to cognitive ability and personality measures. Roughly 34% of the variance in validities could be explained in terms of the effective weights of individual task performance
and OCBs in defining job performance. There are many areas of research in personnel psychology in which the weights assigned to predictors and/or criteria do not matter, but this is not one of those areas. If
the constructs used to predict performance are relatively distinct, if the
facets that define the performance domain are also relatively distinct,
and if different aspects of performance have different antecedents, the
weights assigned to predictors and criteria have a substantial impact on
conclusions about the validity of your selection test battery. The literature reviewed here shows that all three of these conditions may hold; our
Monte Carlo simulation is merely an illustration of the concrete consequences of using these types of predictors to predict multidimensional
performance criteria.
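The practical force of this point can be sketched in a few lines of Python. All of the correlations below are invented for illustration (they are not values from our simulation): the same two-test battery is scored against three definitions of performance that shift weight from individual task performance toward OCBs, and the composite validity changes accordingly.

```python
from math import sqrt

# Hypothetical correlations (illustrative only): ability predicts task
# performance well but OCB weakly; personality shows the reverse pattern.
R_XY = [[0.50, 0.10],   # ability with task performance, with OCB
        [0.10, 0.30]]   # personality with task performance, with OCB
R_X12 = 0.10            # assumed ability-personality intercorrelation
R_Y12 = 0.20            # assumed task-OCB intercorrelation

def validity(wx, wy):
    """Correlation between the weighted predictor composite and the weighted
    criterion composite (standardized variables; the algebra is given in the
    Appendix)."""
    var_s = wx[0]**2 + wx[1]**2 + 2 * wx[0] * wx[1] * R_X12
    var_p = wy[0]**2 + wy[1]**2 + 2 * wy[0] * wy[1] * R_Y12
    cov = sum(wx[i] * wy[j] * R_XY[i][j] for i in range(2) for j in range(2))
    return cov / (sqrt(var_s) * sqrt(var_p))

wx = [0.5, 0.5]  # the same test battery throughout
for wy in ([0.9, 0.1], [0.5, 0.5], [0.1, 0.9]):  # task-heavy to OCB-heavy
    print(wy, round(validity(wx, wy), 3))
```

With these invented inputs, the task-heavy definition of performance yields a noticeably higher composite validity than the OCB-heavy one, even though nothing about the tests themselves has changed.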
The practice of examining tests or assessment methods in isolation becomes less appealing as simple methods for taking into account the fundamentally multivariate nature of real selection decisions are made available. There is a
good deal to be learned by examining the validity of a single test or class
of tests (e.g., cognitive ability tests), but the use of validity coefficients
that connect scores on a single test with a single performance measure
is unlikely to provide a good basis for evaluating the probable success of a personnel selection system. Research on the broad constructs that
appear to define the domain of job performance and that appear to be
important antecedents to the various facets of job performance has progressed far in the last 10-20 years, and our accumulated knowledge base
is sufficiently rich to support a more sophisticated approach to selection
validity research.
The multivariate framework outlined here also implies that some of
the questions that seemed to be settled in univariate validity research might need to be reexamined from a multivariate perspective. For example, it is generally accepted that cognitive ability tests are among the
most valid predictors of performance in virtually all jobs. However, the
research that supports this conclusion is notoriously vague in defining
just what performance means, and as the jobs and methods of work organization change, the relative importance of those behaviors that are
most closely related to cognitive ability might also change. A better articulation of the job performance construct might help to illustrate shortcomings in our current knowledge that are not at all evident if the term
job performance is taken at face value.
Finally, the results presented here suggest that aggregation of validity
across settings may sometimes be unwise. In particular, if different organizations have fundamentally different definitions of job performance, it
is likely that the true validity of predictors will vary. It is well known that
current meta-analytic methods can have insufficient power for detecting true and meaningful differences in validity across settings (Kemery,
Mossholder, & Roth, 1987; Osburn, Callender, Greener, & Ashworth,
1983; Sackett, Harris, & Orr, 1986); the current analysis suggests concrete factors that might lead to real variance in validity. Our model provides a starting point for studying the fully multivariate model of validity,
but a good deal of work will be needed to develop an adequate statistical model and to determine concretely the circumstances under which
multivariate validities are likely to be sufficiently similar across settings
to conclude that validity in fact generalizes.
Implications for Practice
The model and analyses described in this paper carry a number of
implications for practice. First, our analyses suggest that generic statements about the validity of tests or test batteries can be misleading. Depending on how performance is defined by the organization, the same
tests might show relatively high or relatively low levels of criterion-related validity. Thus, the first task in estimating the validity of a test or
a battery of tests is likely to be a thorough explication of the performance
construct, as it is defined in that organization. Traditional job-analytic
methods for determining the dimensions of job performance may not
be sufficient; a psychologist who wants to understand and predict job
performance might want to start with an organizational-level analysis of what behaviors are most consistently valued and supported by the organization. Murphy and Cleveland (1995) note that organizational decision
makers are not always skilled in articulating their values and the relative
weights they assign to different facets of the performance domain; they
suggest ways that tools from decision research might be applied to help
managers articulate what they mean by job performance.
The mean correlations shown for different combinations of predictor and criterion weights in Tables 1 and 3 suggest that decisions about how
to combine predictors and criterion measures require careful thought
and can have meaningful consequences. The results presented in Tables 1 and 3 also highlight the sometimes extensive variability of validities, even in situations where similar strategies for combining predictors and/or criteria are held constant. In almost all cases, the confidence intervals for validity estimates are quite wide; it is not unusual to find confidence intervals .30 to .50 points wide. This finding contrasts sharply with the common conclusion in validity generalization research that univariate validities are usually quite similar across a range of situations (see Hunter & Hunter, 1984, and Hunter & Hirsh, 1987, for reviews). In fact,
there is no real conflict between these two sets of findings. Multivariate validity estimates will often vary extensively across settings because
uncertainty compounds. That is, a test user might be reasonably certain about the univariate relationships between specific tests and specific facets of job performance (i.e., confidence intervals for each individual test-performance facet correlation might be small), but still be
uncertain about the overall validity of a battery of tests used to predict
a multi-faceted criterion. As we noted in our introduction, personnel
selection is almost always a multivariate problem, involving both multiple predictors and multiple facets of the criterion domain. Univariate
validity studies, or meta-analyses of these studies, may underestimate the uncertainty that a test user is likely to encounter in estimating the overall validity of a battery of tests.
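This compounding of uncertainty can be illustrated with a small Monte Carlo sketch (all values below are invented; this is not the simulation reported in this paper). Even if each univariate test-facet correlation is pinned down to within plus or minus .05, the validity of an equally weighted composite still varies across a nontrivial range:

```python
import random
from math import sqrt

random.seed(1)

def composite_r(wx, wy, rx12, ry12, rxy):
    """Validity of a composite of two predictors against a composite of two
    criterion facets (standardized variables; rxy is the 2x2 cross-correlation
    matrix)."""
    a = wx[0]**2 + wx[1]**2 + 2 * wx[0] * wx[1] * rx12  # Var of predictor composite
    b = wy[0]**2 + wy[1]**2 + 2 * wy[0] * wy[1] * ry12  # Var of criterion composite
    c = sum(wx[i] * wy[j] * rxy[i][j] for i in range(2) for j in range(2))
    return c / (sqrt(a) * sqrt(b))

# Suppose each univariate test-facet correlation is known only to within
# +/- .05 of its center value (the centers here are hypothetical).
centers = [[0.50, 0.10], [0.10, 0.30]]
vals = []
for _ in range(10_000):
    rxy = [[random.uniform(c - 0.05, c + 0.05) for c in row] for row in centers]
    vals.append(composite_r([0.5, 0.5], [0.5, 0.5], 0.10, 0.20, rxy))
print(round(min(vals), 3), round(max(vals), 3))  # spread of composite validities
```

Narrow intervals around each univariate correlation do not translate into a comparably narrow interval around the composite validity, because the small uncertainties accumulate across every test-facet pairing.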
As we have emphasized throughout, the Monte Carlo simulation presented here is used solely to illustrate the concrete implications of the
principles of measurement and analysis that underlie the multivariate
validation model, and it is important to understand the limitations of this
simulation. First, the correlations shown in Figure 1, which form the
heart of this simulation, are all subject to debate. Sensible arguments
could be made for different values, and changes in these correlations
could lead to numerous changes in the outcomes of our simulation. Second, the validities discussed in this paper are mean observed validities,
and few efforts have been made to correct for various factors that could
attenuate these correlations. We think there are good reasons to focus
on observed validities (see footnote 2), but it is important to keep the
distinction between observed and corrected validity estimates firmly in
mind when evaluating our results.
There are also important limitations to the state of our current understanding of the job performance domain. There have been several
recent notable advances, both in the areas of construct explication (e.g.,
Borman & Motowidlo, 1993; Campbell et al., 1993) and in empirical
analyses of performance measures (e.g., Conway, 1996; Viswesvaran,
1996), but there are still important questions to be asked about the domain of job performance, and about the extent to which broad factors
such as those studied here can be used to characterize performance
across a variety of jobs, organizations, and so forth. In particular, there
are still important questions about the strength of the relationship between individual- and group- or team-oriented facets of performance, and about the extent to which this relationship varies across jobs, settings, and so forth (Conway, 1996).
The main conclusion that this paper has to offer is that there is much
to be gained by thinking carefully about exactly what job performance
means in different settings, and about the implications of this construct
definition for research on selection test validity. This paper presents a
simple multivariate framework for thinking about and studying test validity, and we believe that the advantages of applying this approach outweigh the difficulties. First, it presents a relatively simple method that
can be applied to any number of tests or assessments. Because selection
tests are often designed to tap distinct (but not necessarily orthogonal)
domains, the effects of using several tests in combination can be quite
different from using them in isolation, and the multivariate model developed here provides a framework for studying such combinations. More
important, the model encourages you to recognize and incorporate information about the complex, multidimensional domain we refer to as
job performance. If your goal is to predict who is most likely to perform
well or poorly, it is important to start with a careful analysis of exactly
what you mean by job performance, and an analysis of the performance
domain is likely to lead you to the conclusion that multivariate models
are preferable to the univariate models that have long characterized research on the validity of selection tests.
REFERENCES
Ackerman PL, Kanfer R, Goff M. (1995). Cognitive and noncognitive determinants and
consequences of complex skill acquisition. Journal of Experimental Psychology: Applied, 1, 270-304.
Astin A. (1964). Criterion-centered research. Educational and Psychological Measurement, 24, 807-822.
Avila RD, Fern EF, Mann OK. (1988). Unraveling the criteria for assessing the performance of salespeople: A causal analysis. Journal of Personal Selling and Sales Management, 8, 45-54.
Barrick MR, Mount MK. (1991). The Big Five personality dimensions and job performance: A meta-analysis. PERSONNEL PSYCHOLOGY, 44, 1-26.
Barrick MR, Mount MK, Strauss JP. (1992, May). The Big Five and ability predictors of citizenship, delinquency, and sales performance. Paper presented at the Seventh Annual Conference of the Society for Industrial and Organizational Psychology, Inc., Montreal.
Barrick MR, Mount MK, Strauss JP. (1994). Antecedents of involuntary turnover due to
a reduction in force. PERSONNEL PSYCHOLOGY, 47, 515-535.
Bazerman M. (1990). Judgment in managerial decision making (2nd ed.). New York: Wiley.
Becker TE, Randall DM. (1994). Validation of a measure of organizational citizenship
behavior against an objective behavioral criterion. Educational and Psychological
Measurement, 54, 160-167.
Borman WC, Hanson M, Hedge J. (1997). Personnel selection. Annual Review of Psychology, 48, 299-337.
Borman WC, Motowidlo SJ. (1993). Expanding the criterion domain to include elements
of contextual performance. In Schmitt N, Borman WC (Eds.), Personnel selection
in organizations (pp. 71-98). San Francisco: Jossey-Bass.
Boudreau JW. (1991). Utility analysis for decisions in human resource management. In
Dunnette M, Hough L (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 621-745). Palo Alto, CA: Consulting Psychologists Press.
Boudreau JW, Sturman MC, Judge TA. (1994). Utility analysis: What are the black boxes,
and do they affect decisions? In Anderson N, Herriot P (Eds.), Assessment and
selection in organizations: Methods and practice for recruitment and appraisal (pp. 1196). New York: Wiley.
Brand CR. (1994). Open to experience–closed to intelligence: Why the Big Five are really the Comprehensive Six. Special Issue: The fifth of the Big Five. European Journal of Personality, 8, 299-310.
Brief AP, Motowidlo SJ. (1986). Prosocial organizational behaviors. Academy of Management Review, 10, 710-725.
Campbell JP. (1990). Modeling the performance prediction problem in industrial and organizational psychology. In Dunnette MD, Hough LM (Eds.), Handbook of industrial and organizational psychology (Vol. 1, pp. 687-732). Palo Alto, CA: Consulting Psychologists Press.
Campbell JP, McCloy RA, Oppler SH, Sager CE. (1993). A theory of performance. In Schmitt N, Borman W (Eds.), Personnel selection in organizations (pp. 35-70). San Francisco: Jossey-Bass.
Cascio WF. (1995). Whither industrial and organizational psychology in a changing world
of work. American Psychologist, 50, 928-939.
Cattell RB, Butcher HJ. (1968). The prediction of achievement and creativity. Indianapolis:
Bobbs-Merrill.
Cattell RB, Child D. (1977). The scientific analysis of personality and motivation. London: Academic Press.
Conway JM. (1996). Additional evidence for the task-contextual performance distinction.
Human Performance, 9, 309-330.
Dawes RM. (1979). The robust beauty of improper linear models in decision making.
American Psychologist, 34, 571-582.
Dawes RM, Corrigan B. (1974). Linear models in decision making. Psychological Bulletin, 81, 95-106.
Day DV, Silverman SB. (1989). Personality and job performance: Evidence of incremental
validity. PERSONNEL PSYCHOLOGY, 42, 25-36.
Dreger RM. (1968). General temperament and personality factors related to intellectual
performance. Journal of Genetic Psychology, 113, 275-293.
Edwards JE, Morrison RE. (1994). Selecting and classifying future Naval officers: The
paradox of greater specialization in broader arenas. In Rumsey M, Walker C,
Harris J (Eds.), Personnel selection and classification (pp. 69-84). Hillsdale, NJ: Erlbaum.
Edwards W, Newman JR. (1982). Multiattribute evaluation. Beverly Hills: Sage.
Einhorn H, Hogarth R. (1975). Unit weighting schemes for decision making. Organizational Behavior and Human Performance, 13, 171-192.
Gatewood RD, Field HS. (1994). Human resource selection (3rd ed.). Hinsdale, IL: Dryden Press.
Gaugler BB, Rosenthal DB, Thornton GC, Bentson C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72, 493-511.
Guion RM. (1991). Personnel assessment, selection, and placement. In Dunnette M,
Hough L (Eds.), Handbook of industrial and organizational psychology (2nd ed.,
Vol. 2, pp. 327-398). Palo Alto, CA: Consulting Psychologists Press.
Gutenberg RL, Arvey RD, Osburn HG, Jeanneret PR. (1983). Moderating effects of decision-making/information-processing job dimensions on test validities. Journal of Applied Psychology, 68, 602-608.
Hakstian AR, Woolley RM, Woolley LK, Kryger BR. (1991a). Management selection by
multiple-domain assessment: I. Concurrent validity. Educational and Psychological
Measurement, 51, 883-898.
Hakstian AR, Woolley RM, Woolley LK, Kryger BR. (1991b). Management selection by multiple-domain assessment: II. Utility to the organization. Educational and Psychological Measurement, 51, 899-911.
Howard A. (1995). The changing nature of work. San Francisco, CA: Jossey-Bass.
Hunter JE. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of Vocational Behavior, 29, 340-362.
Hunter JE, Hirsh HR. (1987). Applications of meta-analysis. In Cooper CL, Robertson IT (Eds.), International review of industrial and organizational psychology (pp. 321-357). Chichester: Wiley.
Hunter JE, Hunter RF. (1984). The validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.
Hunter JE, Schmidt FL. (1990). Methods of meta-analysis. Newbury Park, CA: Sage.
Ilgen D, Hollenbeck J. (1991). The structure of work: Job design and roles. In Dunnette M, Hough L (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 165-208). Palo Alto, CA: Consulting Psychologists Press.
Jones GR, Wright PM. (1992). An economic approach to conceptualizing the utility of
human resource management practices. In Ferris G, Rowland K (Eds.), Research
in human resources management (Vol. 10, pp. 31-72). Greenwich, CT: JAI Press.
Kemery ER, Mossholder KW, Roth L. (1987). The power of the Schmidt and Hunter additive model of validity generalization. Journal of Applied Psychology, 72, 30-37.
Landy FJ, Shankster LJ, Kohler SS. (1994). Personnel selection and placement. Annual
Review of Psychology, 45, 261-296.
McCloy RA, Campbell JP, Cudeck R. (1994). A confirmatory test of a model of performance determinants. Journal of Applied Psychology, 79, 493-505.
McHenry JJ, Hough LM, Toquam JL, Hanson MA, Ashworth S. (1990). Project A validity
results: The relationship between predictor and criterion domains. PERSONNEL
PSYCHOLOGY, 43, 335-355.
McIntyre RM, Salas E. (1995). Measuring and managing for team performance: Emerging principles from complex environments. In Guzzo R, Salas E (Eds.), Team effectiveness and decision making in organizations (pp. 9-45). San Francisco: Jossey-Bass.
Milkovich GT, Boudreau JW. (1994). Human resource management (7th ed.). Homewood, IL: Richard D. Irwin.
Motowidlo SJ, Van Scotter JR. (1994). Evidence that task performance should be distinguished from contextual performance. Journal ofApplied Psychology, 79, 475-480.
Mount MK, Barrick MR. (1995). The Big Five personality dimensions: Implications for research and practice in human resource management. In Ferris G (Ed.), Research in personnel and human resource management (Vol. 13, pp. 153-200). Greenwich, CT: JAI.
Schmidt FL. (1994). The future of personnel selection in the U.S. Army. In Rumsey
M, Walker C, Harris J (Eds.), Personnel selection and classification (pp. 333-349).
Hillsdale, NJ: Erlbaum.
Schmidt FL, Hunter JE, Outerbridge AN. (1986). Impact of job experience and ability on job knowledge, work sample performance, and supervisory ratings of job performance. Journal of Applied Psychology, 71, 432-439.
Schmidt FL, Kaplan LB. (1971). Composite versus multiple criteria: A review and resolution of the controversy. PERSONNEL PSYCHOLOGY, 24, 419-434.
Schmidt FL, Ones DS, Hunter JE. (1992). Personnel selection. Annual Review of Psychology, 43, 627-670.
Smith CA, Organ DW, Near JP. (1983). Organizational citizenship behavior: Its nature and antecedents. Journal of Applied Psychology, 68, 653-663.
Stevens J. (1986). Applied multivariate statistics for the social sciences. Hillsdale, NJ:
Erlbaum.
Sturman M, Judge TA. (1995). Utility analysis for multiple selection devices and multiple outcomes. Center for Advanced Human Resource Studies Working Paper 95-12, Cornell University.
Tett RP, Jackson DN, Rothstein M. (1991). Personality measures as predictors of job performance: A meta-analytic review. PERSONNEL PSYCHOLOGY, 44, 703-745.
Viswesvaran C. (1996, April). Modeling job performance: Is there a general factor? Paper presented at the 11th Annual Conference of the Society for Industrial and Organizational Psychology, Inc., San Diego.
Wainer H. (1976). Estimating coefficients in linear models: It don't make no never mind. Psychological Bulletin, 83, 213-217.
Wolfe RN, Johnson SD. (1995). Personality as a predictor of college performance. Educational and Psychological Measurement, 55, 177-185.
APPENDIX

Let wx and wy be vectors of weights applied to the X (predictor) and Y (criterion) variables, and let Cx, Cy, and Cxy be the matrices of covariances among the X variables, among the Y variables, and between the X and Y variables, respectively.
Then, the covariance between the two composites and the variance of each is defined as:

Cov_SP = wx Cxy wy'   (wy' is the transpose of wy)   [1]

Var_S = wx Cx wx'   [2]

Var_P = wy Cy wy'   [3]

which means that the correlation between a selection composite and a performance composite is given by:

r_SP = Cov_SP / (sqrt(Var_S) * sqrt(Var_P))   [4]
An equivalent formulation that does not use matrix algebra starts with the correlations among all X and Y variables. Compute:

a = Σi (wxi^2) + 2 ΣΣi<j (wxi * wxj * rxixj)   [5]

b = Σi (wyi^2) + 2 ΣΣi<j (wyi * wyj * ryiyj)   [6]

c = ΣΣij (wxi * wyj * rxiyj)   [7]

where rxixj is the correlation between xi and xj, and so on; then r_SP = c / (sqrt(a) * sqrt(b)).
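The scalar formulation in equations [5]-[7] can be implemented directly in a few lines of Python; the sketch below computes r_SP for standardized variables. The example correlations are invented for illustration and are not values from the simulation.

```python
from math import sqrt

def composite_validity(wx, wy, Rx, Ry, Rxy):
    """r_SP from equations [5]-[7], for standardized X and Y variables.
    Rx, Ry: correlation matrices among predictors / among criterion facets;
    Rxy[i][j]: correlation between predictor i and criterion facet j."""
    nx, ny = len(wx), len(wy)
    # [5]: a = Var(S): squared weights plus twice the weighted intercorrelations
    a = sum(wx[i]**2 for i in range(nx)) + \
        2 * sum(wx[i] * wx[j] * Rx[i][j]
                for i in range(nx) for j in range(i + 1, nx))
    # [6]: b = Var(P), the same form on the criterion side
    b = sum(wy[i]**2 for i in range(ny)) + \
        2 * sum(wy[i] * wy[j] * Ry[i][j]
                for i in range(ny) for j in range(i + 1, ny))
    # [7]: c = Cov(S, P): every cross-correlation weighted by both sets of weights
    c = sum(wx[i] * wy[j] * Rxy[i][j] for i in range(nx) for j in range(ny))
    return c / (sqrt(a) * sqrt(b))  # equation [4]

# Illustrative values (not taken from the simulation): two tests, two facets.
r = composite_validity(
    wx=[0.5, 0.5], wy=[0.5, 0.5],
    Rx=[[1.0, 0.1], [0.1, 1.0]],
    Ry=[[1.0, 0.2], [0.2, 1.0]],
    Rxy=[[0.5, 0.1], [0.1, 0.3]])
print(round(r, 3))  # below the .50 univariate ability-task correlation
```

Note that with these inputs the composite validity falls below the strongest univariate correlation in Rxy, which is the pattern discussed in the text.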