
PERSONNEL PSYCHOLOGY

1997, 50

IMPLICATIONS OF THE MULTIDIMENSIONAL


NATURE OF JOB PERFORMANCE FOR THE
VALIDITY OF SELECTION TESTS: MULTIVARIATE
FRAMEWORKS FOR STUDYING TEST VALIDITY
KEVIN R. MURPHY, ANN HARRIS SHIARELLA
Colorado State University
Although most studies of criterion-related validity focus on univariate
relationships, the complex and multidimensional nature of the performance construct and the widespread use of multiple selection devices argue in favor of multivariate frameworks for evaluating validity.
Using a Monte Carlo simulation we estimated the validity of general
cognitive ability tests and personality tests in predicting job performance, where performance is conceptualized as a composite of multiple performance measures (i.e., individual job task performance and
organizational citizenship behaviors). The validity of a selection battery varies substantially as a function of the relative weight given to
both predictors and criteria; the 95% confidence interval for validities
ranged from .20 to .78. The effective weights given to performance dimensions accounted for 34% of the variance in selection battery validities; depending on precisely how performance is defined, the same
test battery can have relatively high or relatively low levels of validity.
Our model suggests that the way an organization defines job performance is a source of true and important variability in validities, and
that the validity of selection tests for predicting complex performance
criteria may show considerably less generalizability that current metaanalysis of univariate validities would suggest.

The validity of the tests or assessment devices used in personnel


selection is usually assessed in terms of the correlation (r_xy) between
scores on a test and scores on some performance measure. There have
been literally thousands of studies of the validity of selection tests (See
Hunter & Hirsh, 1987; Hunter & Hunter, 1984; Schmidt, Ones, &
Hunter, 1992 for reviews of validation research), and there is also a substantial body of methodological research aimed at achieving the best possible estimate of r_xy (Landy, Shankster, & Kohler, 1994). The validity
We thank Mike Sturman and Tim Judge for sharing valuable preprints with us and
the reviewers for their many excellent suggestions. An earlier version of this paper was
presented at the Annual Conference of the Society for Industrial and Organizational Psychology (1996; San Diego).
Correspondence and requests for reprints should be sent to Kevin R. Murphy, Department of Psychology, Colorado State University, Fort Collins, CO 80523-1876, or e-mail:
KRMURPHY@LAMAR.COLOSTATE.EDU.
COPYRIGHT © 1997 PERSONNEL PSYCHOLOGY, INC.


coefficient provides one rough index of how good a job you are likely
to do in selecting among applicants, and when combined with a number
of other parameters of selection decisions (e.g., selection ratios, costs of
testing) helps provide an estimate of the utility of these tests (Boudreau,
1991; Boudreau, Sturman, & Judge, 1994).
Much of what we know about the validity of selection devices is based
on analyses of univariate relationships between tests and criterion measures. However, personnel selection is almost always a multivariate process, involving multiple X variables and multiple Y variables. For example, organizations typically use more than one selection measure or
test when hiring (Boudreau et al., 1994; Gatewood & Field, 1994; Hakstian, Woolley, Woolley, & Kryger, 1991a, 1991b; Jones & Wright, 1992;
Milkovich & Boudreau, 1994). Assessment methods (e.g., tests, interviews) that tap multiple domains (e.g., cognitive ability, personality) are
the norm in most selection systems. More important, there is growing
recognition of the fact that the domain of job performance is complex
and multidimensional (Astin, 1964; Borman, Hanson, & Hedge, 1997;
Campbell, 1990; Conway, 1996; Murphy, 1989, 1996). As we will note
in sections that follow, the different facets that underlie the construct
job performance may in some cases be only weakly intercorrelated, and
different organizational policies for emphasizing one facet or another
when defining job performance could lead to substantially different conclusions about the validity of selection tests.
In this paper, we will argue that fully multivariate approaches that
consider multiple predictors and multiple facets of the criterion domain,
are better and more realistic for studying the validity of selection tests
than the univariate approaches that have historically dominated this literature. We will illustrate how moving from a univariate to a multivariate framework can change the way we think about the validity and usefulness of selection tests (Boudreau et al., 1994; Murphy, 1996), and in
particular will discuss the implications of changing conceptions of the
domain of job performance for evaluating the validity of selection tests
and test batteries.

Overview
We believe that there are several reasons why a multivariate framework for evaluating the validity of selection tests is preferable to current
univariate approaches. First, as noted above, job performance is not a
simple or unitary phenomenon, and models that treat performance as
a single entity without considering how the different facets of this complex construct are combined can present a misleading picture of the validity


and utility of these tests. This implies that multiple Y variables
should routinely be considered when assessing selection validity.
Second, there is abundant evidence that multiple X variables are relevant for predicting which applicants are most likely to perform well in
most jobs. At the most general level, there is considerable evidence that
both general cognitive ability and broad personality traits (e.g., conscientiousness) are relevant to predicting success in a wide array of jobs
(Barrick & Mount, 1991; Hunter & Hirsh, 1987; Hunter & Hunter, 1984;
Ree & Earles, 1994; Tett, Jackson, & Rothstein, 1991). Discussions of
the validity of any one class of tests or measures, considered in isolation,
can present a misleading or incomplete picture of the validity of these
measures in field settings, where multiple X and multiple Y variables are
likely to be the rule rather than the exception.
Third, there is evidence that different facets of job performance have
different antecedents. That is, the attributes that lead some applicants to
excel in specific aspects of performance (e.g., performing individual job
tasks) appear to be different from those that lead some applicants to excel in other aspects of job performance (e.g., teamwork. See Borman et
al., 1997; Day & Silverman, 1989; McCloy, Campbell, & Cudeck, 1994;
McHenry, Hough, Toquam, Hanson, & Ashworth, 1990; Motowidlo &
Van Scotter, 1994; Rothstein, Paunonen, Rush, & King, 1994). The common practice of using multiple tests or assessment methods in personnel
selection might be interpreted as a recognition that any one class of measures, no matter how good, is unlikely to capture the range of attributes
that is likely to be relevant to predicting job performance.
In the sections that follow, we first review research on the multidimensional nature of job performance. This research suggests that job
performance can mean quite different things, depending on the relative
emphasis given to various aspects of the performance domain. Next, we
briefly review research on the individual difference variables most likely
to be related to performance. This review suggests that measures of ability and personality will be relevant to predicting performance in a wide
range of jobs.
Next, we discuss research on the conditions under which the strategy
for combining predictor and/or criterion information is likely to have
a large or a small effect on the correlation between the predictor and
criterion sets. As we will show, the problem of predicting multidimensional job performance criteria on the basis of broad individual difference measures is one in which strategies for combining predictors and/or
criterion dimensions is likely to have a substantial impact on conclusions
about test validity. Next, we review partially multivariate approaches,


which involve either using multiple tests to predict a single unitary performance construct, or which involve differential weighting of criterion
attributes to arrive at decisions involving multiple criteria.
Finally, we discuss a fully multivariate approach, in which there are
multiple X variables and multiple Y variables, and in which decisions
must be made about how to weight the Xs to form predictor composites
and about how to combine the Ys to create an overall index of performance. We use a Monte Carlo study, with parameter values obtained
from large-scale studies and meta-analyses, to illustrate how policies for
defining the construct performance and for combining predictor scores
can be important influences on the validity of a personnel selection system. Even when all of the tests used in selection are known to be "valid"
(e.g., as would be the case if psychometrically sound measures of cognitive ability and broad personality traits such as conscientiousness were
used in selection), the level of validity can vary substantially, depending
on the extent to which the strategy for selecting applicants is consistent
with the definition of job performance adopted by a particular organization.

The Domain of Job Performance


The construct job performance represents a set of behaviors that is
relevant to the goals of the job or the organization (Astin, 1964). Several studies have examined the dimensionality of job performance (e.g.,
Campbell, 1990; Campbell, McCloy, Oppler, & Sager, 1993; Conway,
1996; Murphy, 1989), and they suggest a number of potential facets to
the performance domain. These facets can be grouped into two broad
categories: (a) individual task performance and (b) behaviors that create and maintain the social and organizational context that allows others
to carry out their individual tasks. Individual task performance (ITP) involves learning the task and the context in which it is performed as well as
being able and motivated to perform the task when it is called for. Many
validity studies appear to equate individual task performance with overall job performance (Hunter, 1986; Murphy, 1989, 1996).
In addition to the specific tasks that are included on most job descriptions, the domain of job performance includes a wide range of behaviors such as teamwork, customer service, and organizational citizenship,
which are not always necessary to accomplish the specific tasks in an
individual's job, but are absolutely necessary for the smooth functioning of teams and organizations (Borman & Motowidlo, 1993; Brief &
Motowidlo, 1986; Edwards & Morrison, 1994; McIntyre & Salas, 1995;
Murphy, 1989; Organ, 1988; Smith, Organ, & Near, 1983). For example,


Campbell's (1990) model of performance includes behaviors such as volunteering, persisting, helping and maintaining individual discipline (See
also Campbell et al., 1993). Labels such as contextual performance,
organizational citizenship and prosocial behaviors have been applied to this facet of the performance domain, and while these three
terms are not interchangeable, they all capture aspects of effective job
performance that are not always directly linked to accomplishing specific
individual tasks. The use of teams and group-oriented methods of work
organization has grown tremendously in the last decade, and the aspects
of performance that are most relevant to the effective functioning of
teams and work groups appear to be increasingly important in defining
the construct of job performance (McIntyre & Salas, 1995).
A great deal has been written about changes in the way jobs, roles,
and work organizations are defined (e.g., See Cascio, 1995; Howard,
1995; Ilgen & Hollenbeck, 1991; McIntyre & Salas, 1995). These changes
have clear relevance for understanding precisely what job performance
means in different jobs, organizations, and so forth. In particular, it is
likely that very different sets of behaviors might define effective performance in the same job, depending on how work is organized (e.g.,
individual vs. team-oriented production methods), or how organizations
are structured (e.g., hierarchical structures with rigid job descriptions
vs. fluid structures and relatively undefined job descriptions), and so
forth. Different organizations that do similar work might place substantially different emphasis on individual versus team- or group-oriented
facets of job performance, and validity studies that do not pay careful
attention to possible differences in the meaning and antecedents of job
performance across organizations, or across time, may yield misleading
estimates of the contributions of specific tests or sets of tests to the task
of selecting job applicants who are most likely to perform well.

Predictors of Performance
Throughout much of the history of personnel selection research, substantial attention has been devoted to the study of the validity of various selection techniques. Several influential reviews, notably Reilly and
Chao (1982) and Hunter and Hunter (1984) compared the validity of
written tests, interviews, biodata instruments, and other selection instruments. Other important reviews have concentrated on evaluating the
validity and utility of one specific method or family of methods (e.g.,
Gaugler, Rosenthal, Thornton, and Bentson, 1987, reviewed research
on assessment centers).
In recent years, the focus of research and theory in the prediction
of job performance has shifted somewhat from a focus on methods or


techniques to a focus on underlying constructs. Broad consensus has


been reached in two areas. First, cognitive ability appears to be relevant to predicting performance in virtually every job studied (Hunter &
Hunter, 1984; McHenry et al., 1990; Nathan & Alexander, 1988; Ree
& Earles, 1994; Schmidt, Hunter, & Outerbridge, 1986). Second, there
are broad personality traits that show generalizable validity across a wide
range of jobs. For example, Barrick and Mount (1991) suggested that
the dimension conscientiousness was a valid predictor of performance
across many jobs. Other analyses (e.g., Tett et al., 1991) have suggested
that other similarly broad personality attributes might also show generalizable validity, and have confirmed the finding that individual differences in conscientiousness appear to be consistently related to job performance.
Using a combination of cognitive ability measures and measures of
personality traits such as conscientiousness to predict various facets of
job performance can yield higher validities than those obtained when
ability or personality measures are used alone. There are two reasons
for this. First, as noted above, both classes of measures show generalizable univariate validities. Second, general cognitive ability and conscientiousness appear to be only weakly related (Ackerman, Kanfer, &
Goff, 1995; Barrick, Mount, & Strauss, 1994; Brand, 1994; Cattell &
Butcher, 1968; Cattell & Kline, 1977; Dreger, 1968; Ones, Schmidt, &
Viswesvaran, 1993a; Wolfe & Johnson, 1995), which implies that a combination of measures from these two domains will capture variance that
is not adequately captured by even the best measures of ability or personality considered alone. It also implies that the exact way in which you
combine predictors could have a substantial impact on the validity of a
selection battery that includes ability and personality measures.

Combining Multiple Measures: Conditions Under Which Weights Can


Make a Difference
If we think of both the predictor and the criterion domain as being
multidimensional, questions naturally arise about how information from
multiple predictors and/or from multiple criterion dimensions should be
integrated. For example, if measures of both cognitive ability and specific personality attributes (e.g., conscientiousness) are used to predict a
performance construct that is itself a composite of individual task performance and group or team-oriented facets of performance (e.g., organizational citizenship behaviors), decisions must be made about how to
combine information from multiple predictors and criteria.


When multiple predictors and/or criterion measures are combined,


the correlation between the predictor composite and the criterion composite is determined in part by the weights assigned to predictors and
criteria (Stevens, 1986). The choice of weights in forming composites
sometimes makes little practical difference, and the use of unit weights is
often advocated for combining multiple measures into composite predictor or criterion variables (Dawes, 1979; Dawes & Corrigan, 1974; Einhorn & Hogarth, 1975; Schmidt & Kaplan, 1971; Wainer, 1976). However, there are circumstances where the choice of weights for combining
predictors and/or criterion measures does make a meaningful difference.
When these conditions are met, the validity of a multidimensional predictor composite for predicting a multidimensional criterion can depend
substantially on the weights assigned to predictors and to facets of the
criterion domain.
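To make the quantity at issue concrete, the sketch below computes the correlation between a weighted predictor composite and a weighted criterion composite from the correlation matrices involved. This is a standard composite-correlation calculation, not the derivation given in the authors' Appendix; the matrices and weights are purely illustrative, and the variables are assumed to be standardized.

```python
import numpy as np

def composite_validity(w_x, w_y, R_xx, R_yy, R_xy):
    """Correlation between a weighted predictor composite and a weighted criterion
    composite, computed from the predictor intercorrelations (R_xx), the criterion
    intercorrelations (R_yy), and the predictor-criterion correlations (R_xy)."""
    w_x, w_y = np.asarray(w_x, float), np.asarray(w_y, float)
    num = w_x @ R_xy @ w_y
    return num / (np.sqrt(w_x @ R_xx @ w_x) * np.sqrt(w_y @ R_yy @ w_y))

# Two predictors (ability, conscientiousness) and two criteria (task performance, OCB).
R_xx = np.array([[1.0, 0.1], [0.1, 1.0]])
R_yy = np.array([[1.0, 0.0], [0.0, 1.0]])
R_xy = np.array([[0.5, 0.3],     # ability with task performance, OCB
                 [0.2, 0.2]])    # conscientiousness with task performance, OCB
print(composite_validity([0.5, 0.5], [0.5, 0.5], R_xx, R_yy, R_xy))   # about .57
print(composite_validity([0.5, 0.5], [0.9, 0.1], R_xx, R_yy, R_xy))   # different weights, different validity
```

Changing either weight vector changes the resulting validity, which is the point developed in the remainder of the paper.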
The choice of weights used in forming composites makes a substantial difference when the following conditions are met: (a) predictors
(and/or criterion dimensions) are not highly intercorrelated, (b) each
predictor is correlated with one or more criterion dimension, and (c)
each of the criterion dimensions is most strongly related to a different
predictor variable (Dawes & Corrigan, 1974; Stevens, 1986). All three
of these conditions are likely to be met when ability and personality measures are used to predict multidimensional measures of job performance.
First, as we noted earlier, ability and personality variables such as conscientiousness are only weakly related; the same appears to be true for individual and group or team-oriented facets of the performance domain.
Second, ability and personality constructs seem relevant in predicting
both individual and group or team-oriented facets of performance. Finally, the relative importance of ability and personality characteristics as
predictors appears to differ for individual and group or team-oriented
facets of performance. Evidence bearing on these last two points will be
reviewed in sections that follow.
The importance of the low correlations between alternative predictors and between different facets of job performance cannot be overemphasized. The pattern of correlations observed among these constructs
in the literature suggests that the validity of a selection battery that contains both ability and personality measures will vary considerably, depending on how much emphasis is given to individual versus group or
team-oriented facets of the performance construct. In other words, the
validity of a selection battery might vary considerably depending on the
relative weights given to each of these facets when defining the construct
of overall job performance, and might also vary considerably depending on the relative weight given to ability and personality as predictors of
performance. In a later section of this paper, we will use a Monte Carlo


simulation to show just how much validity might vary, even when selection is done solely on the basis of measures that are known to be valid
predictors of at least some aspect of performance in virtually all jobs.
Partially Multivariate Approaches to Predicting Performance
Ours is not the first study to advocate examining validity using either
multiple predictors or multiple criterion dimensions. Several studies
have examined the relationship between multiple predictors and a single performance criterion, often using multiple regression to link predictors with the criterion measure (e.g., Guion, 1991; Hakstian et al., 1991a;
Ones et al., 1993a; Ones, Schmidt, & Viswesvaran, 1993b; Schmidt, 1992;
Schmidt, Hunter, & Outerbridge, 1986; see also Hunter and Schmidt,
1990, pp. 502-503). These studies illustrate the use of multiple X variables to predict a single Y (i.e., performance is treated as a univariate construct). Multi-attribute utility models (See Bazerman, 1990; Edwards & Newman, 1982; Pritchard, 1990; Roth, 1994) often include multiple criterion dimensions, but they rarely include multiple predictors.
These studies illustrate the use of multiple Y variables, but do not typically include multiple predictors, or multiple independent variables.
The literature on selection validity contains few studies that employ
a fully multivariate model, in which the links between multiple predictors and multiple criteria are simultaneously examined (Murphy, 1996).
As we have noted above, there are several aspects of the personnel selection paradigm that suggest that fully multivariate models would have
considerable utility (e.g., multiple predictors and criterion dimensions,
low correlations among predictors and criterion dimensions, etc.). Such
a model is described in the section that follows, and is applied in a Monte
Carlo simulation to illustrate the importance of the specific definition of
the job performance construct in reaching conclusions about the validity
of selection tests.
A Fully Multivariate Approach for Estimating Selection Test Validity

The Appendix presents the details of a fully multivariate approach


for predicting a multifaceted performance criterion, using information
from multiple selection tests. The strategy outlined in the Appendix involves forming weighted linear composites of both predictors and criteria. As we note in this Appendix, the process of defining overall job
performance is essentially one of deciding how much weight to assign
to each facet of the performance domain in forming an overall performance composite. For example,you might decide that in your particular


organization, individual task performance is very important and organizational citizenship behavior is not (e.g., if production is done on an individual piece-rate basis). This definition of performance is essentially
different from that in another organization where OCBs are seen as just
as important as individual task performance. Similarly, the process of
defining a selection composite is essentially a process of deciding about
the relative weight given to each test in the selection battery.
Although the definition of the performance composite is essentially a
matter of organizational policy (i.e., the organization's definition of what
constitutes job performance), the definition of the predictor composite
need not be. One possibility is to identify the unique set of weights for
the predictor set that maximizes the validity of the selection composite,
given a particular set of weights for the performance composite.[1] Although optimal weights are useful for understanding the upper limits of
prediction, they do not always reflect organizational practice. That is,
organizations often use simpler systems for combining information from
multiple predictors (e.g., equal weighting). In the Monte Carlo study
presented below, we will examine the effects of choosing a wide range
of weights, and will not focus on the statistically optimal weighting system
for combining predictors. We will, however, show that when statistically
optimal weights are used to combine predictors, the same overall pattern
of results holds as when simpler weighting strategies are used.

[1] Sturman and Judge (1995) note that once w_y is defined, it is possible to solve for the optimal w_x, using a matrix equation that is equivalent to:

    optimal w_x = (R_xx^-1 R_xy w_y) / (w_y' R_yy w_y)^(1/2)

where R_xx, R_yy, and R_xy represent the matrices of correlations among the variables in X, among the variables in Y, and between the variables in X and the variables in Y, respectively.
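As an illustration only (not the authors' code), the footnote equation can be evaluated directly. The matrices below use the mean correlations that appear later in Figure 1, and the variables are assumed to be standardized; the rescaling in the denominator does not change the resulting composite correlation.

```python
import numpy as np

def optimal_predictor_weights(w_y, R_xx, R_xy, R_yy):
    """Predictor weights proportional to Rxx^-1 Rxy w_y, rescaled as in Footnote 1."""
    w_y = np.asarray(w_y, dtype=float)
    return np.linalg.solve(R_xx, R_xy @ w_y) / np.sqrt(w_y @ R_yy @ w_y)

# Illustrative matrices (mean correlations from Figure 1, standardized variables).
R_xx = np.array([[1.0, 0.1], [0.1, 1.0]])        # ability, conscientiousness
R_yy = np.array([[1.0, 0.0], [0.0, 1.0]])        # task performance, OCB
R_xy = np.array([[0.5, 0.3], [0.2, 0.2]])
print(optimal_predictor_weights([0.5, 0.5], R_xx, R_xy, R_yy))
```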

Nominal Versus Effective Weights in Defining the Performance Construct


An organization that used ability tests and conscientiousness measures to select individuals into jobs where overall job performance was
defined in terms of both individual task performance and organizational
citizenship behaviors (OCBs) would find substantially different levels of
validity, depending on how selection tests were combined (i.e., w_x; see
the Appendix) and on how performance dimensions were combined (i.e.,
w_y). However, the weights assigned to tests and performance dimensions are not the only factors affecting validity.
An additional factor affecting the validity of a set of tests as a predictor of a multidimensional performance construct is the extent to which
individuals actually differ on each of the performance dimensions (i.e.,
the SD of each Y variable). The standard deviations of each of the tests
used in selection are arbitrary, and can be rescaled at the test user's convenience. The true variability in various aspects of performance is not directly under the researcher's control, and it can have a substantial effect
on the effective weight of a specific performance dimension in defining
overall job performance. For example, an organization might set a policy
that says that individual task performance and organizational citizenship
are the two key facets of job performance, and that individual task performance is twice as important as organizational citizenship. We refer
to this statement that organizational citizenship should be given twice
the emphasis as individual performance as a nominal weight. However, if individual differences on OCBs are twice as large as individual
differences on individual task performance, the effective weights of the
two facets of performance in defining overall job performance (and the
validity of the selection composite) will be identical. Regardless of the
organization's stated policy, if subjects actually differ more on one of
the performance dimensions than on the other, the effective weight of
each performance facet in defining overall job performance might be
substantially different from the nominal weight.
One reason why it is important to consider the variability of each
facet of performance is that individual differences, selection policies, organizational socialization experiences or organizational cultures could
conceivably lead to restricted variability in some aspects of performance
and enhanced variability in others. For example, suppose an organization provided extensive training, performance aids and technical support
to assist individual task performance but did nothing to increase organizational citizenship behaviors. This could easily lead to a high mean but
a small standard deviation for individual task performance measures,
and to relatively larger variability in OCBs. Alternatively, strong organizational cultures supporting OCBs might lead to restricted variability
in that facet of performance and to relatively larger variability in individual task performance. In each case, individual and organizational factors that affected the standard deviations of performance facets would
also have an impact on the effective weight of these facets in defining the
performance construct.
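A small numeric illustration of the distinction drawn above, under the assumption that a facet's effective weight is proportional to the product of its nominal weight and its standard deviation:

```python
# Stated policy: individual task performance (ITP) is weighted twice as heavily as OCB.
nominal_weight = {"ITP": 2.0, "OCB": 1.0}
# But suppose employees actually differ twice as much on OCBs as on ITP.
sd = {"ITP": 5.0, "OCB": 10.0}
# Effective weight of each facet ~ nominal weight x SD of the facet.
effective_weight = {facet: nominal_weight[facet] * sd[facet] for facet in nominal_weight}
print(effective_weight)   # {'ITP': 10.0, 'OCB': 10.0} -- the facets end up contributing equally
```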

Estimating the Validity of Ability and Personality Composites as Predictors


of Multidimensional Performance Composites
To show how the model outlined in the Appendix applies to the problem of predicting future job performance (where the definition of performance depends on the policies of the organization and the extent to
which employees actually vary in different facets of their job behavior),
we will examine the validity of two widely-researched sets of predictors,


cognitive ability tests and personality tests that measure the dimension
conscientiousness, in predicting performance. Performance, in turn,
will be defined as some combination of individual task performance and
organizational citizenship behaviors. The relationships between ability,
conscientiousness, individual task performance and organizational citizenship behaviors have all been studied extensively (there are several
meta-analyses that summarize research on specific pairs of variables),
and we can use this research base to build a realistic and informative
Monte Carlo study that examines the effects of a number of critical parameters on the validity of predictor batteries.
The underlying model for the prediction process is illustrated in Figure 1. Organizations use some combination of ability tests and personality tests to predict the future performance of applicants, and overall
job performance is defined as some composite of individual task performance and organizational citizenship. Figure 1 includes estimates of the
correlations between each of the pairs of variables along with (in parentheses) estimates of the standard deviations of those correlations. We
include standard deviations in Figure 1 because none of the correlations
shown in Figure 1 represents a known fixed quantity. For example, .50
represents a reasonable estimate of the relationship between measures
of cognitive ability and measures of individual task performance, but this
number is not likely to be constant across all jobs (Gutenberg, Arvey, Osburn, & Jeanneret, 1983). The standard deviation of .10 reflects the fact
that the correlations between ability tests and individual performance
measures are not completely invariant, but rather fall in some range.
It is important to emphasize that the figures included in Figure 1 are
used solely for the purposes of illustrating the importance of decisions
about the weights assigned to predictor and/or criterion dimensions in
determining the validity of selection tests or batteries. Every one of
the values shown in Figure 1 has been the focus of considerable research
and debate, and researchers in these areas might reasonably propose
alternatives to any of the specific mean or SD values shown in this figure.
Rather than viewing Figure 1 as the definitive summary of what is known
about these constructs, we suggest that it be used as an illustration of
the implications of the consequences of using these sorts of constructs
to predict performance (where performance might take on a variety of
definitions) across a range of job types. In specific settings, jobs, or
types of organizations, one or more of the correlations among these four
constructs might differ substantially from those shown in Figure 1.
Similarly, it is important to emphasize that the outcomes of any
Monte Carlo study depend heavily on the range of parameter values
studied. In this paper, we consider the implications of potentially large

[Figure 1: Relationships Among Selection Tests and Performance Dimensions: Estimated Correlations (and Standard Deviations of Estimates). The figure diagrams the two predictors (Cognitive Ability, Conscientiousness) and the two performance dimensions (Individual Task Accomplishment, Organizational Citizenship), with the following estimated correlations (SDs in parentheses): ability-individual task performance = .50 (.10); ability-OCB = .30 (.05); ability-conscientiousness = .10 (.03); conscientiousness-individual task performance = .20 (.04); conscientiousness-OCB = .20 (.05); individual task performance-OCB = .00 (.10).]


differences in the way organizations value different facets of job performance, in the emphasis they give to ability versus personality dimensions
as predictors, and so on. Different choices about the range of parameter
values to be studied would of course affect the results of such a simulation. The purpose of our simulation study is not to establish definitively
the range of validities across all situations researchers might encounter,
but rather to illustrate concretely the principles that are implicit in the
preceding discussions of the effects of the weights given to predictors
and/or criterion dimensions when combining multiple X and/or Y variables. In other words, our purpose here is to illustrate just how much difference the choice of parameter values might make in reaching conclusions about the validity of selection test batteries as predictors of overall
job performance.
Before discussing our Monte Carlo simulation, we will briefly discuss
the sources for each of the values of the estimated correlations (and their
standard deviations) shown in Figure 1.
Ability linkages. In Figure 1, cognitive ability is related to conscientiousness, individual task performance, and organizational citizenship. The first link, between cognitive ability and individual task performance, is one of the most widely studied topics in psychology; psychologists have studied general cognitive ability as a predictor of performance
since the turn of the century (Schmidt, 1994). Large-scale studies and
meta-analyses of ability-performance relationships (where performance
is typically measured using supervisory ratings) have typically reported
uncorrected validities of .35 or above, with some variability across jobs
that differ in complexity (Hunter & Hunter, 1984; McHenry et al., 1990;
Nathan & Alexander, 1988; Ree & Earles, 1994; Schmidt et al., 1986).[2]
Many of these studies use measures that confound individual task performance and OCBs (e.g., supervisory ratings are probably affected by
both); studies that focus more exclusively on individually oriented performance measures (e.g., work samples; see Hunter, 1986) often report
uncorrected validities of approximately .50.
[2] In this simulation we use uncorrected validities for two reasons. First, statistical theory
for understanding multivariate results on the basis of multiple corrected correlations is not
well developed. Corrections for attenuation, range restriction, and so forth affect the interpretations of confidence intervals, significance tests and even effect size measures, and
analyses based on combining several corrected r values can be very difficult to interpret.
Second, it is likely that some of the apparent unreliability of performance measures is due
to the fact that the domain is multidimensional, which will yield low internal consistency
and low agreement between raters who might place different emphasis on the individual
facets of performance. Reliability can be increased by developing and appropriately combining measures of each homogeneous facet of the performance domain, and as reliability
increases, corrections for attenuation have a vanishingly small effect. Corrections for range restriction
would have been applicable to only a few of the predictor-criterion correlations studied
here, and their effects are typically also small unless selection ratios are quite low.


The precise extent of variability in validities is a matter of some dispute; we will use a mean of S O and a standard deviation of .lo in our
Monte Carlo study to represent the distribution of correlations between
cognitive ability and individual task performance. The effective range of
ability-individual task performance correlations (i.e., three standard deviations from the mean in either direction) would then be approximately
.20 to .80.
As we noted earlier, numerous studies suggest that cognitive ability and conscientiousness are reasonably independent (Ackerman et al.,
1995; Barrick et al., 1994; Brand, 1994; Cattell & Butcher, 1968; Cattell & Kline, 1977; Dreger, 1968; Ones et al., 1993a; Wolfe & Johnson,
1995). In our Monte Carlo analysis, we will specify a mean correlation
of .10, with a standard deviation of .03, to represent the feasible range of
the ability-personality correlations. This mean and SD yields an effective
range of approximately .01 to .19.
The relationships between cognitive ability and organizational citizenship behaviors have not been as extensively researched as ability-task
performance or ability-personality relationships; virtually all of the studies of the antecedents of OCBs have focused on attitudinal or personality variables rather than abilities (Organ & Ryan, 1995). One large-scale
study (i.e., U.S. Army Project A) reported a reliable estimate of the relationship of general cognitive ability and the Effort and Leadership factor of performance in the military (McHenry et al., 1990). This aspect
of performance is closest to the concept of OCB; McHenry et al. (1990)
reported that ability measures correlated .31 with this aspect of performance. In our Monte Carlo study, we will describe this relationship using a mean correlation of .30 and a standard deviation of .05, yielding an
effective range of .15 to .45.
Conscientiousness linkages. Conscientiousness is linked to both individual task performance and OCBs. Based on meta-analysesby Barrick
and Mount (1991), Mount and Barrick (1995), and Tett et al. (1991), we
estimate the mean and standard deviation of the distribution of correlations between conscientiousness and individual task performance to be
.20 and .04, respectively, yielding an effective range of .08 to .32.
There have been several studies and meta-analyses linking conscientiousness and organizational citizenship behaviors (Barrick, Mount, &
Strauss, 1992; Becker & Randall, 1994; Organ, 1994; Organ & Konovsky,
1989; Organ & Lingl, 1995; Organ & Ryan, 1995). On the basis of these
studies, we will use a mean correlation of .20 and a standard deviation
of .05 to characterize this relationship. This mean and SD yields an effective range of conscientiousness-OCB correlations of .05 to .35.


Organizational citizenship linkages. In Figure 1, OCBs are linked to


individual task performance. We know of one study reporting correlations of .49 and .63 between task performance measures and OCB measures (Avila, Fern, & Mann, 1988). However, the performance measures in that study are not restricted to individual job task performance,
and the correlations they reported probably overestimate the relationship between OCBs and individual task performance. Conway (1996)
reports substantial correlations between measures of individual performance and measures of contextual performance in some jobs (with relatively small correlations in others), but notes that halo error is likely to
substantially inflate correlations between measures of various aspects of
job performance. On the assumption that OCBs and individual task performance could be positively related, negatively related, or independent
(See Organ, 1988; Organ & Ryan, 1995, for discussions of the relationship between OCBs and task performance), we used a mean and SD of
.00 and .10, respectively, to characterize the relationship between OCBs
and individual task performance. This yields an effective range of -.30
to .30.
Method
We used a Monte Carlo simulation to examine the effects of varying
the weights given to the two tests (i.e., w_x), the weights given to the two
performance dimensions (i.e., w_y), and the standard deviations of the
two performance dimensions (i.e., SD_ITP and SD_OCB)
on the correlation
between the selection composite and the performance composite. As is
shown in Table 1, we examined 45 different combinations of w_x, w_y, and
SD values in a 3 x 3 x 5 factorial design.
First, we examined three possible schemes for weighting cognitive
ability relative to conscientiousness in forming a selection battery, using
weights of .3, .5, and .7 for the ability test (coupled with weights of .7, .5,
and .3 for the conscientiousness measure, so that the test weights always
sum to 1.0). We also examined three possible schemes for weighting individual task performance in defining the performance composite, once
again using weights of .3, .5, and .7 for the individual performance dimension (coupled with weights of .7, .5, and .3 for the OCB dimension
measure, so that the criterion weights always sum to 1.0). Finally, we examined five possible combinationsof SD values for the two performance
dimensions, using values of 2, 6, 10, 14, and 18 to represent the standard
deviation of individual task performance (coupled with standard deviations for OCBs of 18, 14, 10, 6, and 2, so that the criterion dimension SD
values always sum to 20).

TABLE 1
Validity Estimates

                                      Mean r    SD of r values    95% confidence interval
Entire population                      .49          .15               .20 - .78
Weight ability
  0.3                                  .41          .09               .23 - .59
  0.5                                  .51          .14               .24 - .77
  0.7                                  .54          .17               .21 - .87
Weight individual task perf.
  0.3                                  .46          .16               .14 - .78
  0.5                                  .50          .17               .16 - .84
  0.7                                  .52          .17               .18 - .86
SD individual task perf.
  2                                    .38          .15               .09 - .66
  6                                    .48          .16               .16 - .80
  10                                   .53          .17               .16 - .86
  14                                   .54          .17               .20 - .88
  18                                   .51          .18               .17 - .86

Note. Weights assigned to X and Y variables sum to 1.0, and SD values for individual task performance and OCBs sum to 20, so if the weight or SD of ability or individual task performance is known, the weight or SD of conscientiousness or OCBs is also known.

We chose to examine a wider range of potential values for the SDs


of the two performance facets than for the weights attached to predictor and criterion measures for several reasons. The relative weights attached to the cognitive ability test and to the conscientiousness measure
represent decisions on the part of the organization, and we reasoned that
they would not choose to include particular tests in a battery and then
assign them a very low weight (so, for example, we did not study situations
where the relative weights of the two tests were .9 and .1). Based on a
similar logic, we chose a relatively narrow range of values (i.e., from .3
to .7) to represent the nominal weights attached to criterion dimensions.
In contrast, we chose to examine a wider range of relative SD_y values
for the two performance facets. As noted earlier, these are typically not
under the direct control of the organization, and it is plausible that selection, socialization, or organizational cultures could lead to considerable
homogeneity in some aspects of performance, and therefore to relatively
large amounts of variation in other aspects of job performance. Because
differences in the standard deviations of performance dimensions may
be the indirect (and perhaps inadvertent) result of aspects of the organization or the subject population, it seemed plausible to consider situations where one facet of performance might show substantially more
variability than another, leading to our decision to examine a relatively
wide array of possible SD_y values.
Sensitivity analysis. As Figure 1 shows, each of the correlations
among selection tests and performance dimensions is better thought of


in terms of a distribution of values than in terms of a fixed value (i.e.,


the standard deviation of each of the correlations is larger than 0). To
estimate the effects of uncertainty in our estimates of each of the correlations shown in Figure 1, we used a multiple-replication procedure,
in which each of these correlations was treated as an estimate sampled
from a known distribution, rather than treating them as fixed quantities.
For each of the combinations of simulation parameters shown in Table 1,
we ran 100 replications, each time randomly sampling values for each of
the correlations among the X and Y variables from normal distributions
of correlation coefficients; the means and standard deviations of these
distributions are shown in Figure 1.
For example, we computed 100 separate analyses in which Ability received a weight of .30 (and Conscientiousness received a weight of .70) in
defining the predictor composite, Individual Task Performance received
a weight of .30 (and Organizational Citizenship received a weight of .70)
in defining the criterion composite, and the standard deviation of Individual Task Performance was set to be 2.00 (SD,,b = 18.00). In each of
these 100 analyses, the correlation between Ability and Individual Task
Performance was estimated by randomly sampling from a normal distribution of T values with mean of S O and a standard deviation of .lo.
Each of the other correlations involved in calculating the relationship
between the predictor and criterion composites was estimated similarly
(See Figure 1 for the means and standard deviations of each distribution
of correlations). We followed the same procedure in each cell of our 3
x 3 x 5 design.
This procedure yielded a total N of 4,500 (i.e., 3 x 3 x 5 x 100). This
sampling procedure allows us to determine both the best estimate of the
validity of the selection composite, given a particular set of values for
wz, wy, and the SDs, and the range of validities that might reasonably
be observed, given what is known about the individual correlations
that go into estimating the correlation between the selection composite
and the performance composite. By incorporating concrete measures of
the extent to which specific correlations might be expected to vary across
situations, we are able to realistically estimate not only the mean of the
distribution of selection validities, but more importantly its standard deviation. We are also able to determine the effect of each of the simulation parameters (e.g., wz,w y ) and combinationsof these parameters on
the validity of the selection composites.
We wrote a FORTRAN program that sampled correlations from the
distributions described in Figure 1 and computed the validity of the selection composite for each replication. We then analyzed the resulting
4,500 correlations to determine how the policies for combining predictors and performance dimensions, as well as the variability of different


aspects of job performance, might affect the validity and usefulness of


these two tests as predictors of overall job performance.
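The FORTRAN program itself is not reproduced here. As a rough sketch of the sampling logic just described (and assuming, as one reading of the procedure, that each criterion dimension's effective weight is its nominal weight multiplied by its standard deviation, applied to standardized variables), something like the following could be used; it is illustrative only and will not reproduce the article's results exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mean and SD of each correlation, taken from Figure 1.
MEAN = {"ab_itp": 0.50, "ab_ocb": 0.30, "ab_con": 0.10,
        "con_itp": 0.20, "con_ocb": 0.20, "itp_ocb": 0.00}
SD = {"ab_itp": 0.10, "ab_ocb": 0.05, "ab_con": 0.03,
      "con_itp": 0.04, "con_ocb": 0.05, "itp_ocb": 0.10}

def composite_validity(w_x, w_y, r):
    """Correlation between the weighted test composite (ability, conscientiousness)
    and the weighted performance composite (ITP, OCB), given sampled correlations r."""
    R_xx = np.array([[1.0, r["ab_con"]], [r["ab_con"], 1.0]])
    R_yy = np.array([[1.0, r["itp_ocb"]], [r["itp_ocb"], 1.0]])
    R_xy = np.array([[r["ab_itp"], r["ab_ocb"]],
                     [r["con_itp"], r["con_ocb"]]])
    num = w_x @ R_xy @ w_y
    return num / (np.sqrt(w_x @ R_xx @ w_x) * np.sqrt(w_y @ R_yy @ w_y))

validities = []
for w_ab in (0.3, 0.5, 0.7):                # weight given to the ability test
    for w_itp in (0.3, 0.5, 0.7):           # nominal weight given to ITP
        for sd_itp in (2, 6, 10, 14, 18):   # SD of ITP; SD of OCB = 20 - sd_itp
            for _ in range(100):            # 100 replications per cell
                r = {k: rng.normal(MEAN[k], SD[k]) for k in MEAN}
                w_x = np.array([w_ab, 1.0 - w_ab])
                # Effective criterion weights: nominal weight times the dimension's SD.
                w_y = np.array([w_itp * sd_itp, (1.0 - w_itp) * (20 - sd_itp)])
                validities.append(composite_validity(w_x, w_y, r))

# The article reports a mean validity of .49 (SD = .15) over its 4,500 replications.
print(round(np.mean(validities), 2), round(np.std(validities), 2))
```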
Simulation Results
Our simulation provided both descriptive data (i.e., mean validities
and variability of validities at different levels of wz, wy, etc.) and data
that could be used to illustrate the effect of different simulation parameters on validity.
Validity levels. Table 1 presents a summary of our findings for validity over all conditions. Over all the replications run for every condition, the mean uncorrected validity for the selection composite that uses
measures of cognitive ability and conscientiousness to predict overall job
performance (defined as a combination of individual task performance
and organizational citizenship) was .49, with a standard deviation of .15.
The 95% confidence interval of validities ranged from .20 to .78.
As Table 1 shows, as the relative weight of the general cognitive ability test increases, validity tends to increase; mean validities are .41, .51,
and .54 as the relative weight given the cognitive ability tests increases
from .3 to .5 to .7 (If statistically optimal weights are used to combine
predictors, the same pattern is obtained; average multiple R values of
.46, .56, and .61 are found when individual task performance is given a
relative weight of .3, .5, and .7, respectively). As Table 1 also shows,
validity increases as the nominal weight of individual job task performance increases. When individual task performance is weighted .7 the
mean validity reaches .52, and is lower as the relative weight of OCBs in
defining job performance increases. That is, performance is more easily
predicted on the basis of ability and personality tests when this construct
is defined in a way that emphasizes individual task accomplishment than
when it is defined in a way that emphasizes organizational citizenship.
Validity also increases as the standard deviation of individual job
task performance increases, although this increase is not uniform; the
increase in validity levels off at the point where the SD of individual job
performance is 14 and the SD of organizational citizenship is 6 (r = .54).
Lower validity is obtained when the standard deviations of individual
task performance and OCBs are 18 and 2, respectively (i.e., r = .51).
The mean validities obtained using different combinations of predictor and criterion weights can vary extensively. For example, when OCBs
are the most important aspect of performance (a weight of .7) and cognitive ability tests are given the most emphasis (a weight of .7) in the
selection battery, mean validities can be as low as r = .31. In contrast,
if individual task performance is the most important aspect of the job (a
weight of .7) and cognitive ability tests are the most important part of


TABLE 2
Effects of Weights and Standard Deviations on Validity

                                                              Eta squared
Main effects
  Weights assigned to selection tests (w_x)                       .23
  Weights assigned to performance dimensions (w_y)                .02
  Standard deviations of performance dimensions (SD)              .24
Interactions
  w_x x w_y                                                       .003
  w_x x SD                                                        .02
  w_y x SD                                                        .09
  w_x x w_y x SD                                                  .004
Residual                                                          .38

the selection battery (a weight of .7), mean validities can be as high as r = .62. In other words, the validity of the same test battery for
predicting job performance might be doubled or cut in half, depending
on how heavily individual versus group- or team-oriented facets of performance are emphasized in determining the overall effectiveness of each
employee (and on the degree to which individual task performance and
organizational citizenship behaviors actually vary in a particular setting).
Effects of weights and SD values on validity. We used the analysis of
variance to determine the effects of w_x, w_y, and SD values on the validity of the selection composite (correlations were converted to Fisher's z
for this analysis). The results of this analysis are presented in Table 2.
All effects in the model were significant beyond the .001 level, but as the
effect size estimates (i.e., eta squared) show, some effects were substantially stronger than others.
The weights assigned to the selection tests accounted for 23% of the
variance in validities, whereas the nominal weights assigned to performance dimensions accounted for only 2% of the variance. However,
the main effect of the standard deviations of performance dimensions
accounted for an additional 24% of the variance, and the interaction
between SD values and performance dimension weights accounted for
another 9%. Because the effective weight of each of the performance
dimensions is a function of the nominal weights (i.e., wy), the standard
deviations, and their interactions, the total effects of the weights assigned
to and the variability of performance dimensions accounts for 34% of the
variance in validities (i.e., .02 + .23 + .09).
In general, validity is higher when more emphasis is placed on cognitive ability as a predictor or on individual task performance as a criterion (See Table 1). However, if too much emphasis is placed on one of the two tests or on one of the two performance facets, validity can decrease. This point is most clearly illustrated by examining the interaction between the weights assigned to performance dimensions and the standard deviations of each performance dimension; the cell means that define this interaction are shown in Table 3.


TABLE 3
Breakdown of Effects of Criterion Weights and Criterion SD Values on Validity

Weight ITP    SD ITP    Mean r    SD of r values    95% confidence interval
.3              2        .35          .07               .21 - .49
                6        .40          .09               .22 - .57
               10        .47          .12               .24 - .70
               14        .56*         .17               .22 - .89
               18        .53          .18               .18 - .89
.5              2        .37          .08               .21 - .53
                6        .47          .12               .23 - .67
               10        .56*         .17               .22 - .87
               14        .55          .18               .18 - .92
               18        .50          .17               .17 - .84
.7              2        .42          .10               .23 - .61
                6        .56*         .17               .23 - .89
               10        .55          .19               .18 - .92
               14        .52          .19               .17 - .87
               18        .48          .17               .15 - .80

Note. ITP = individual task performance. The highest mean in each set of cells receiving the same criterion weight is marked with an asterisk.

As Table 3 shows, validity is highest when the effective weights of the
two performance dimensions are equal (the values marked with an asterisk in each
section of Table 3 represent mean validities when the effective weight assigned to individual task performance is the same as the effective weight
for OCBs). As the effective weights assigned to the two facets of performance diverge (in either direction), estimated validity drops. Table 3
suggests that validity is highest when both performance dimensions are
important in defining the composite entitled performance; as the effective definition of this composite shifts toward a heavier emphasis on either individual task performance or organizational citizenship, validities
tend to drop.
Unexplained variance in validities. Table 2 shows that a substantial
portion of the variance in estimated validities (i.e., 38%) is not explained
by the weights attached to tests or performance facets or by the differences in the variability of performance facets. The confidence intervals
shown in Tables 1 and 3 provide a concrete illustration of just how much
validities might vary; confidence intervals of 50 to 60 points are quite
common. This extensive variability in validities illustrates an important
aspect of virtually all multivariate procedures-that is, that uncertainty
compounds.


The procedures outlined here combine information from a 4 x 4 matrix (i.e., two predictors and two criterion facets) to obtain an overall
validity estimate. Each of the correlations that are included in these calculations (i.e., estimated correlations among ability, conscientiousness,
individual task performance, and organizational citizenship) represents
an uncertain quantity; the standard deviations shown in parentheses in
Figure 1 illustrate how much each one of these values might vary. When
these correlations are combined to estimate the overall validity of the selection battery, you will generally be less certain of this calculated value
than you were of any of the individual values used to compute it (for
example, the overall standard deviation of estimated validities shown in
Table 1 is .15, a value that is larger than any of the standard deviations
shown in Figure 1). We will examine the implications of this aspect of
multivariate validity models in the sections that follow.

Discussion

There are three important reasons why a multivariate framework for


selection validity research might be preferable to the standard univariate
approach. First, personnel selection is much more likely to be a multivariate process than a univariate one. Multiple tests or assessments are
common, and the goals of personnel selection (e.g., maximize individual
performance, maximize OCBs) are complex and multidimensional. Second, broadly useful predictors (i.e., ability and broad personality measures) appear to capture distinct aspects of performance, which suggests
that multivariate approaches can make a substantial difference in evaluations of the validity of selection tests. Third, as we have shown above,
the way you combine predictors and/or criteria can have a substantial
impact on your evaluations of the success of your personnel selection
system.
The fact that the weights assigned to different tests and to different
dimensions of performance affects the validity of selection test batteries
is hardly surprising; the mathematics of linear combinations virtually
guarantees this effect. What is perhaps surprising is the extent to which
test battery validities vary (e.g., the 95% confidence interval for validities
across all conditions ranged from .20 to .78). Both of the classes of tests
examined here are thought to be "valid" for virtually all jobs (Barrick &
Mount, 1991; Hunter & Hunter, 1984; McHenry et al., 1990). However,
it is clear that the level of validity depends very substantially on how
predictors are combined and on how the construct job performance ends
up being defined.
The construct job performance is one that is defined by the demands
of the job, the structure, strategy and mission of the organization, and so


forth, and jobs that are similar in terms of their titles, main duties, and so
forth may still yield very different definitions of what constitutes good or
poor performance. The model described here suggests that statements
about the validity and utility of any test or set of tests as predictors of performance must be preceded by a careful analysis of precisely what overall
job performance means in a specific setting. Organizations might vary
considerably in the extent to which they value or emphasize individual
task performance versus behaviors that enhance others' performance,
and different workforces might vary substantially on one facet and be
highly homogeneous on another. The multivariate model described here
provides both an impetus and a mechanism for investigating the effects
of organizational policies and workforce characteristics (i.e., variability
in specific aspects of performance) on validity.
There is a long tradition of expressing validity in terms of a single
number (i.e., r_xy). Multivariate models use the same correlational scale,
but the process by which a validity estimate is obtained is likely to be fundamentally different than that which has characterized validity research
to date. Most validity studies have either taken some specific measure
as an approximation of an ultimate criterion (i.e., the best measure of
performance), or have simply ignored the criterion problem altogether
and treated performance as a simple quantity that can be estimated by a
single data source. As recognition of the complex, multidimensional nature of the performance domain has emerged, it has become increasingly
clear that the traditional univariate models do not provide a sufficient
basis for understanding how personnel selection will in fact affect job
performance (Murphy, 1994,1996). If an organization happens to value
organizational citizenship highly, the fact that a particular test is valid
as a predictor of individual task performance does not necessarily say
much about its relevance for hiring the best workers for that organization. The model developed here suggests that tests must be matched with
the definition of performance adopted by the organization to provide the
best tools for personnel selection.
Three aspects of our simulation results stand out as particularly important. First, the mean of the uncorrected correlations between predictor composites and criterion composites, across all conditions, is .49.
Note that this is lower than the simple correlation between one of the
tests (i.e., cognitive ability) and one of the facets of the criterion domain
(i.e., individual task performance); the uncorrected correlation between
these two measures was assumed to have a mean value of .50. That is,
studies that focus on the validity of ability tests for predicting this single
facet of the performance domain might overestimate our success in predicting job performance (where performance is defined to include both
individual- and group-oriented facets), even when ability tests are combined with other valid predictors. Our simulation shows that there are a
variety of conditions in which validities well in excess of .50 might be expected, but it also shows that there are numerous situations in which
these same tests might not do a very good job predicting which applicants
are most likely to perform well.
Second, our simulation suggests that as group-oriented facets of performance become more important (e.g., as the relative weight given to
OCBs in defining the performance construct increases), validities tend
to decrease. As Table 2 shows, average validities tend to be higher when
individual task performance is given more emphasis in defining job performance. However, Table 3 suggests that broadening the domain of job
performance to include group-oriented facets does not automatically depress validities. This table shows that when the effective weights of the
two facets of the domain are roughly equal (i.e., the underlined coefficients in Table 3 are ones in which the two facets of the performance
domain receive equal effective weights), average validities can be consistently high.
Third, results shown in Table 2 provide concrete support for our general assertion that weights matter. Roughly 23% of the variance in the
correlations between predictor and criterion composites could be explained in terms of the weights assigned to cognitive ability and personality measures. Roughly 34% of the variance in validities could be explained in terms of the effective weights of individual task performance
and OCBs in defining job performance. There are many areas of research in personnel psychology in which the weights assigned to predictors and/or criteria do not matter, but this is not one of those areas. If
the constructs used to predict performance are relatively distinct, if the
facets that define the performance domain are also relatively distinct,
and if different aspects of performance have different antecedents, the
weights assigned to predictors and criteria have a substantial impact on
conclusions about the validity of your selection test battery. The literature reviewed here shows that all three of these conditions may hold; our
Monte Carlo simulation is merely an illustration of the concrete consequences of using these types of predictors to predict multidimensional
performance criteria.

Implications for Research


Adopting a fully multivariate framework is likely to lead to important
changes in research on selection validity. First, the common practice of
examining tests or assessment methods in isolation becomes less appealing as simple methods for taking into account the fundamentally multivariate nature of real selection decisions are made available. There is a
good deal to be learned by examining the validity of a single test or class
of tests (e.g., cognitive ability tests), but the use of validity coefficients
that connect scores on a single test with a single performance measure
is unlikely to provide a good basis for evaluating the probable success
of a personnel selection system. Research on the broad constructs that
appear to define the domain of job performance and that appear to be
important antecedents to the various facets of job performance has progressed far in the last 10-20 years, and our accumulated knowledge base
is sufficiently rich to support a more sophisticated approach to selection
validity research.
The multivariate framework outlined here also implies that some of
the questions that seemed to be settled in univariate validity research
might need to be reexamined from a multivariate perspective. For example, it is generally accepted that cognitive ability tests are among the
most valid predictors of performance in virtually all jobs. However, the
research that supports this conclusion is notoriously vague in defining
just what performance means, and as the jobs and methods of work organization change, the relative importance of those behaviors that are
most closely related to cognitive ability might also change. A better articulation of the job performance construct might help to illustrate shortcomings in our current knowledge that are not at all evident if the term
job performance is taken at face value.
Finally, the results presented here suggest that aggregation of validity
across settings may sometimes be unwise. In particular, if different organizations have fundamentally different definitions of job performance, it
is likely that the true validity of predictors will vary. It is well known that
current meta-analytic methods can have insufficient power for detecting true and meaningful differences in validity across settings (Kemery,
Mossholder, & Roth, 1987; Osburn, Callender, Greener, & Ashworth,
1983; Sackett, Harris, & Orr, 1986); the current analysis suggests concrete factors that might lead to real variance in validity. Our model provides a starting point for studying the fully multivariate model of validity,
but a good deal of work will be needed to develop an adequate statistical model and to determine concretely the circumstances under which
multivariate validities are likely to be sufficiently similar across settings
to conclude that validity in fact generalizes.


Implications for Practice
The model and analyses described in this paper carry a number of
implications for practice. First, our analyses suggest that generic statements about the validity of tests or test batteries can be misleading. Depending on how performance is defined by the organization, the same
tests might show relatively high or relatively low levels of criterion-related validity. Thus, the first task in estimating the validity of a test or
a battery of tests is likely to be a thorough explication of the performance
construct, as it is defined in that organization. Traditional job-analytic
methods for determining the dimensions of job performance may not
be sufficient; a psychologist who wants to understand and predict job
performance might want to start with an organizational-level analysis of
what behaviors are most consistently valued and supported by the organization. Murphy and Cleveland (1995) note that organizational decision
makers are not always skilled in articulating their values and the relative
weights they assign to different facets of the performance domain; they
suggest ways that tools from decision research might be applied to help
managers articulate what they mean by job performance.
The mean correlations shown for different combinations of predictor and criterion weights in Tables 1 and 3 suggest that decisions about how to combine predictors and criterion measures require careful thought and can have meaningful consequences. The results presented in Tables 1 and 3 also highlight the sometimes extensive variability of validities, even in situations where similar strategies for combining predictors and/or criteria are held constant. In almost all cases, the confidence intervals for validity estimates are quite wide; it is not unusual to find confidence intervals .30 to .50 points wide. This finding contrasts sharply with the common conclusion in validity generalization research that univariate validities are usually quite similar across a range of situations (see Hunter & Hunter, 1984, and Hunter & Hirsh, 1987, for reviews). In fact,
there is no real conflict between these two sets of findings. Multivariate validity estimates will often vary extensively across settings because
uncertainty compounds. That is, a test user might be reasonably certain about the univariate relationships between specific tests and specific facets of job performance (i.e., confidence intervals for each individual test-performance facet correlation might be small), but still be
uncertain about the overall validity of a battery of tests used to predict
a multi-faceted criterion. As we noted in our introduction, personnel
selection is almost always a multivariate problem, involving both multiple predictors and multiple facets of the criterion domain. Univariate
validity studies, or meta-analyses of these studies, may underestimate
the uncertainty that a test user is likely to encounter in estimating the
value of a test or set of tests to his or her organization. We hesitate to
recommend a wholesale return to local validation studies (small samples are likely to be an even more pressing problem in fully multivariate
validity studies than in univariate ones), but caution must be observed
in transporting validity estimates across contexts where the fundamental
definitions of job performance might vary.
Finally, aspects of the organization, the workforce, or specific human resource systems (e.g., selection, compensation) might substantially affect the extent to which employees actually vary in specific aspects of performance (i.e., the standard deviations of individual task performance and of OCBs). As we noted earlier, the effective weight of individual task performance versus organizational citizenship behaviors in determining precisely what job performance means depends both on the organization's stated policies and on the extent to which individuals vary in each facet of job performance. Practitioners who apply models such as ours must consider the possibility that nominal weights will
differ substantially from effective weights, and should take appropriate
steps (e.g., rescaling weights to incorporate SD information) to make
sure that the actual weights of performance facets in defining the performance domain correspond with the policies articulated by organizational decision makers.
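As a simple illustration of this rescaling step, the short sketch below is a minimal example of our own (written in Python, with purely hypothetical weights and standard deviations, not code from the simulation reported here). It treats the effective weight of a facet, for simplicity, as the nominal weight times the facet's SD (ignoring covariances among facets), and shows how dividing the desired effective weights by the facet SDs yields nominal weights that restore the intended balance.

import numpy as np

# Hypothetical policy: the organization wants individual task performance and
# OCBs to contribute to the overall performance composite in a 70/30 ratio.
desired_effective = np.array([0.70, 0.30])

# Hypothetical standard deviations of the two performance facets in the
# current workforce (task performance varies more than OCBs in this example).
facet_sd = np.array([1.5, 0.8])

# If the effective weight of a facet is approximated as nominal weight * SD,
# dividing the desired effective weights by each facet's SD yields nominal
# weights that restore the intended 70/30 balance.
nominal = desired_effective / facet_sd
nominal = nominal / nominal.sum()        # normalize so the weights sum to 1

effective = nominal * facet_sd
effective = effective / effective.sum()  # recovers approximately [0.70, 0.30]

print("nominal weights:  ", np.round(nominal, 3))
print("effective weights:", np.round(effective, 3))

In this hypothetical case, task performance is the more variable facet, so it must receive a nominal weight smaller than .70 if its effective contribution is to match the stated 70/30 policy.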

Limitations and Conclusions

As we have emphasized throughout, the Monte Carlo simulation presented here is used solely to illustrate the concrete implications of the
principles of measurement and analysis that underlie the multivariate
validation model, and it is important to understand the limitations of this
simulation. First, the correlations shown in Figure 1, which form the
heart of this simulation, are all subject to debate. Sensible arguments
could be made for different values, and changes in these correlations
could lead to numerous changes in the outcomes of our simulation. Second, the validities discussed in this paper are mean observed validities,
and few efforts have been made to correct for various factors that could
attenuate these correlations. We think there are good reasons to focus
on observed validities (see footnote 2), but it is important to keep the
distinction between observed and corrected validity estimates firmly in
mind when evaluating our results.
There are also important limitations to the state of our current understanding of the job performance domain. There have been several
recent notable advances, both in the areas of construct explication (e.g.,
Borman & Motowidlo, 1993; Campbell et al., 1993) and in empirical
analyses of performance measures (e.g., Conway, 1996; Viswesvaran,
1996), but there are still important questions to be asked about the do-
main of job performance, and about the extent to which broad factors
such as those studied here can be used to characterize performance
across a variety of jobs, organizations, and so forth. In particular, there
are still important questions about the strength of the relationship between individual and group- or team-oriented facets of performance,
and about the extent to which this relationship varies across jobs, settings, and so forth (Conway, 1996).
The main conclusion that this paper has to offer is that there is much
to be gained by thinking carefully about exactly what job performance
means in different settings, and about the implications of this construct
definition for research on selection test validity. This paper presents a
simple multivariate framework for thinking about and studying test validity, and we believe that the advantages of applying this approach outweigh the difficulties. First, it presents a relatively simple method that
can be applied to any number of tests or assessments. Because selection
tests are often designed to tap distinct (but not necessarily orthogonal)
domains, the effects of using several tests in combination can be quite
different from using them in isolation, and the multivariate model developed here provides a framework for studying such combinations. More
important, the model encourages you to recognize and incorporate information about the complex, multidimensional domain we refer to as
job performance. If your goal is to predict who is most likely to perform
well or poorly, it is important to start with a careful analysis of exactly
what you mean by job performance, and an analysis of the performance
domain is likely to lead you to the conclusion that multivariate models
are preferable to the univariate models that have long characterized research on the validity of selection tests.
REFERENCES
Ackerman PL, Kanfer R, Goff M. (1995). Cognitive and noncognitive determinants and consequences of complex skill acquisition. Journal of Experimental Psychology: Applied, 1, 270-304.
Astin A. (1964). Criterion-centered research. Educational and Psychological Measurement, 24, 807-822.
Avila RD, Fern EF, Mann OK. (1988). Unraveling the criteria for assessing the performance of salespeople: A causal analysis. Journal of Personal Selling and Sales Management, 8, 45-54.
Barrick MR, Mount MK. (1991). The Big Five personality dimensions and job performance: A meta-analysis. PERSONNEL PSYCHOLOGY, 44, 1-26.
Barrick MR, Mount MK, Strauss JP. (1992, May). The Big Five and ability predictors of citizenship, delinquency, and sales performance. Paper presented at the Seventh Annual Conference of the Society for Industrial and Organizational Psychology, Inc., Montreal.
Barrick MR, Mount MK, Strauss JP. (1994). Antecedents of involuntary turnover due to a reduction in force. PERSONNEL PSYCHOLOGY, 47, 515-535.


Bazerman M. (1990). Judgment in managerial decision making (2nd ed.). New York: Wiley.
Becker TE, Randall DM. (1994). Validation of a measure of organizational citizenship behavior against an objective behavioral criterion. Educational and Psychological Measurement, 54, 160-167.
Borman WC, Hanson M, Hedge J. (1997). Personnel selection. Annual Review of Psychology, 48, 299-337.
Borman WC, Motowidlo SJ. (1993). Expanding the criterion domain to include elements of contextual performance. In Schmitt N, Borman WC (Eds.), Personnel selection in organizations (pp. 71-98). San Francisco: Jossey-Bass.
Boudreau JW. (1991). Utility analysis for decisions in human resource management. In Dunnette M, Hough L (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 621-745). Palo Alto, CA: Consulting Psychologists Press.
Boudreau JW, Sturman MC, Judge TA. (1994). Utility analysis: What are the black boxes, and do they affect decisions? In Anderson N, Herriot P (Eds.), Assessment and selection in organizations: Methods and practice for recruitment and appraisal (pp. 1196). New York: Wiley.
Brand CR. (1994). Open to experience - closed to intelligence: Why the Big Five are really the Comprehensive Six. Special Issue: The fifth of the Big Five. European Journal of Personality, 8, 299-310.
Brief AP, Motowidlo SJ. (1986). Prosocial organizational behaviors. Academy of Management Review, 11, 710-725.
Campbell JP. (1990). Modeling the performance prediction problem in industrial and organizational psychology. In Dunnette MD, Hough LM (Eds.), Handbook of industrial and organizational psychology (Vol. 1, pp. 687-732). Palo Alto, CA: Consulting Psychologists Press.
Campbell JP, McCloy RA, Oppler SH, Sager CE. (1993). A theory of performance. In Schmitt N, Borman W (Eds.), Personnel selection in organizations (pp. 35-70). San Francisco: Jossey-Bass.
Cascio WF. (1995). Whither industrial and organizational psychology in a changing world of work. American Psychologist, 50, 928-939.
Cattell RB, Butcher HJ. (1968). The prediction of achievement and creativity. Indianapolis: Bobbs-Merrill.
Cattell RB, Kline P. (1977). The scientific analysis of personality and motivation. London: Academic Press.
Conway JM. (1996). Additional evidence for the task-contextual performance distinction. Human Performance, 9, 309-330.
Dawes RM. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34, 571-582.
Dawes RM, Corrigan B. (1974). Linear models in decision making. Psychological Bulletin, 81, 95-106.
Day DV, Silverman SB. (1989). Personality and job performance: Evidence of incremental validity. PERSONNEL PSYCHOLOGY, 42, 25-36.
Dreger RM. (1968). General temperament and personality factors related to intellectual performance. Journal of Genetic Psychology, 113, 275-293.
Edwards JE, Morrison RE. (1994). Selecting and classifying future Naval officers: The paradox of greater specialization in broader arenas. In Rumsey M, Walker C, Harris J (Eds.), Personnel selection and classification (pp. 69-84). Hillsdale, NJ: Erlbaum.
Edwards W, Newman JR. (1982). Multiattribute evaluation. Beverly Hills, CA: Sage.
Einhorn H, Hogarth R. (1975). Unit weighting schemes for decision making. Organizational Behavior and Human Performance, 13, 171-192.


Gatewood RD, Field HS. (1994). Human resource selection (3rd ed.). Hinsdale, IL: Dryden Press.
Gaugler BB, Rosenthal DB, Thornton GC, Bentson C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72, 493-511.
Guion RM. (1991). Personnel assessment, selection, and placement. In Dunnette M, Hough L (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 327-398). Palo Alto, CA: Consulting Psychologists Press.
Gutenberg RL, Arvey RD, Osburn HG, Jeanneret PR. (1983). Moderating effects of decision-making/information-processing job dimensions on test validities. Journal of Applied Psychology, 68, 602-608.
Hakstian AR, Woolley RM, Woolley LK, Kryger BR. (1991a). Management selection by multiple-domain assessment: I. Concurrent validity. Educational and Psychological Measurement, 51, 883-898.
Hakstian AR, Woolley RM, Woolley LK, Kryger BR. (1991b). Management selection by multiple-domain assessment: II. Utility to the organization. Educational and Psychological Measurement, 51, 899-911.
Howard A. (1995). The changing nature of work. San Francisco, CA: Jossey-Bass.
Hunter JE. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of Vocational Behavior, 29, 340-362.
Hunter JE, Hirsh HR. (1987). Applications of meta-analysis. In Cooper CL, Robertson IT (Eds.), International review of industrial and organizational psychology (pp. 321-357). Chichester: Wiley.
Hunter JE, Hunter RF. (1984). The validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.
Hunter JE, Schmidt FL. (1990). Methods of meta-analysis. Newbury Park, CA: Sage.
Ilgen D, Hollenbeck J. (1991). The structure of work: Job design and roles. In Dunnette M, Hough L (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 165-208). Palo Alto, CA: Consulting Psychologists Press.
Jones GR, Wright PM. (1992). An economic approach to conceptualizing the utility of human resource management practices. In Ferris G, Rowland K (Eds.), Research in human resources management (Vol. 10, pp. 31-72). Greenwich, CT: JAI Press.
Kemery ER, Mossholder KW, Roth L. (1987). The power of the Schmidt and Hunter additive model of validity generalization. Journal of Applied Psychology, 72, 30-37.
Landy FJ, Shankster LJ, Kohler SS. (1994). Personnel selection and placement. Annual Review of Psychology, 45, 261-296.
McCloy RA, Campbell JP, Cudeck R. (1994). A confirmatory test of a model of performance determinants. Journal of Applied Psychology, 79, 493-505.
McHenry JJ, Hough LM, Toquam JL, Hanson MA, Ashworth S. (1990). Project A validity results: The relationship between predictor and criterion domains. PERSONNEL PSYCHOLOGY, 43, 335-355.
McIntyre RM, Salas E. (1995). Measuring and managing for team performance: Emerging principles from complex environments. In Guzzo R, Salas E (Eds.), Team effectiveness and decision making in organizations (pp. 9-45). San Francisco: Jossey-Bass.
Milkovich GT, Boudreau JW. (1994). Human resource management (7th ed.). Homewood, IL: Richard D. Irwin.
Motowidlo SJ, Van Scotter JR. (1994). Evidence that task performance should be distinguished from contextual performance. Journal of Applied Psychology, 79, 475-480.
Mount MK, Barrick MR. (1995). The Big Five personality dimensions: Implications for research and practice in human resource management. In Ferris G (Ed.), Research in personnel and human resource management (Vol. 13, pp. 153-200). Greenwich, CT: JAI.


Murphy KR. (1989). Dimensions of job performance. In Dillon R, Pellegrino J (Eds.), Testing: Applied and theoretical perspectives (pp. 218-247). New York: Praeger.
Murphy KR. (1994). Toward a broader conception of jobs and job performance: Impact of changes in the military environment on the structure, assessment, and prediction of job performance. In Rumsey M, Walker C, Harris J (Eds.), Personnel selection and classification (pp. 85-102). Hillsdale, NJ: Erlbaum.
Murphy KR. (1996). Individual differences and behavior in organizations: Much more than g. In Murphy K (Ed.), Individual differences and behavior in organizations (pp. 3-30). San Francisco: Jossey-Bass.
Murphy KR, Cleveland JN. (1995). Understanding performance appraisal: Social, organizational, and goal-based perspectives. Thousand Oaks, CA: Sage.
Nathan BR, Alexander RA. (1988). A comparison of criteria for test validation: A meta-analytic investigation. PERSONNEL PSYCHOLOGY, 41, 517-535.
Nunnally JC. (1978). Psychometric theory. New York: McGraw-Hill.
Ones DS, Schmidt FL, Viswesvaran C. (1993a, May). Integrity and ability: Implications for incremental validity and adverse impact. Presented at the Eighth Annual Conference of the Society for Industrial and Organizational Psychology, Inc., San Francisco.
Ones DS, Schmidt FL, Viswesvaran C. (1993b, May). Nomological net for measures of integrity and conscientiousness. Presented at the Eighth Annual Conference of the Society for Industrial and Organizational Psychology, Inc., San Francisco.
Organ DW. (1988). Organizational citizenship behavior: The good soldier syndrome. Lexington, MA: Lexington Books.
Organ DW. (1994). Personality and organizational citizenship behavior. Journal of Management, 20, 465-478.
Organ DW, Konovsky MA. (1989). Cognitive versus affective determinants of organizational citizenship behavior. Journal of Applied Psychology, 74, 157-164.
Organ DW, Lingl A. (1995). Personality, satisfaction, and organizational citizenship behavior. Journal of Social Psychology, 135, 339-350.
Organ DW, Ryan K. (1995). A meta-analytic review of attitudinal and dispositional predictors of organizational citizenship behavior. PERSONNEL PSYCHOLOGY, 48, 775-802.
Osburn HG, Callender JC, Greener JM, Ashworth S. (1983). Statistical power of tests of the situational specificity hypothesis in validity generalization studies: A cautionary note. Journal of Applied Psychology, 68, 115-122.
Pritchard RD. (1990). Measuring and improving organizational productivity. New York: Praeger.
Ree MJ, Earles JA. (1994). The ubiquitous predictiveness of g. In Rumsey MG, Walker CB, Harris JH (Eds.), Personnel selection and classification (pp. 127-136). Hillsdale, NJ: Erlbaum.
Reilly RR, Chao GT. (1982). Validity and fairness of some alternate employee selection procedures. PERSONNEL PSYCHOLOGY, 35, 1-67.
Roth PL. (1994). Multi-attribute utility analysis using the ProMES approach. Journal of Business and Psychology, 9, 69-80.
Rothstein MG, Paunonen SV, Rush JC, King GA. (1994). Personality and cognitive ability predictors of performance in graduate business school. Journal of Educational Psychology, 86, 516-530.
Sackett PR, Harris MM, Orr JM. (1986). On seeking moderator variables in the meta-analysis of correlational data: A Monte Carlo investigation of statistical power and resistance to Type I error. Journal of Applied Psychology, 71, 302-310.
Schmidt FL. (1992). What do data really mean? Research findings, meta-analysis, and cumulative knowledge in psychology. American Psychologist, 47, 1173-1181.


Schmidt FL. (1994). The future of personnel selection in the U.S. Army. In Rumsey M, Walker C, Harris J (Eds.), Personnel selection and classification (pp. 333-349). Hillsdale, NJ: Erlbaum.
Schmidt FL, Hunter JE, Outerbridge AN. (1986). Impact of job experience and ability on job knowledge, work sample performance, and supervisory ratings of job performance. Journal of Applied Psychology, 71, 432-439.
Schmidt FL, Kaplan LB. (1971). Composite versus multiple criteria: A review and resolution of the controversy. PERSONNEL PSYCHOLOGY, 24, 419-434.
Schmidt FL, Ones DS, Hunter JE. (1992). Personnel selection. Annual Review of Psychology, 43, 627-670.
Smith CA, Organ DW, Near JP. (1983). Organizational citizenship behavior: Its nature and antecedents. Journal of Applied Psychology, 68, 653-663.
Stevens J. (1986). Applied multivariate statistics for the social sciences. Hillsdale, NJ: Erlbaum.
Sturman M, Judge TA. (1995). Utility analysis for multiple selection devices and multiple outcomes. Center for Advanced Human Resource Studies Working Paper 95-12, Cornell University.
Tett RP, Jackson DN, Rothstein M. (1991). Personality measures as predictors of job performance: A meta-analytic review. PERSONNEL PSYCHOLOGY, 44, 703-745.
Viswesvaran C. (1996, April). Modeling job performance: Is there a general factor? Presented at the 11th Annual Conference of the Society for Industrial and Organizational Psychology, Inc., San Diego.
Wainer H. (1976). Estimating coefficients in linear models: It don't make no nevermind. Psychological Bulletin, 83, 213-217.
Wolfe RN, Johnson SD. (1995). Personality as a predictor of college performance. Educational and Psychological Measurement, 55, 177-185.

APPENDIX

Calculating Validity Coefficients in a Fully Multivariate Model


If personnel selection typically involves both multiple predictors and
multiple criterion dimensions, the validity of the set of selection tests will
be defined as the correlation between the selection composite (S) and the performance composite (P). To determine how well a set of selection tests predicts a multidimensional performance criterion, we can use
well known matrix equations for the correlations between composites
(Nunnally, 1978).
If Y is an NY x N matrix of measures of specific performance facets for each of N examinees, and w_y is a 1 x NY vector of weights that reflect the relative importance of each of these facets in defining overall job performance, the performance composite (P) is defined as P = w_y Y.
In other words, the composite P is formed by multiplying scores on each
facet of the performance domain by a weight that reflects its importance
to the organization, the researcher, or whoever defines the meaning of
the term overall job performance.


Similarly, if X is an NX x N matrix of scores on selection tests for each of N examinees, and w_x is a 1 x NX vector of weights that reflect the relative importance of each of these tests, the selection composite is defined as a linear combination of test scores, or S = w_x X. The correlation between a composite of the X variables and a composite of the Y variables is easily obtained using the following equations. Define:

C_X = variance-covariance matrix (NX x NX) among the X variables
C_Y = variance-covariance matrix (NY x NY) among the Y variables
C_XY = NX x NY matrix of covariances between X variables and Y variables

Then, the covariance between the two composites and the variance of each is defined as:

Cov_SP = w_x C_XY w_y'   (where w_y' is the transpose of w_y)   [1]
Var_S = w_x C_X w_x'   [2]
Var_P = w_y C_Y w_y'   [3]

which means that the correlation between a selection composite and a performance composite is given by:

r_SP = Cov_SP / (sqrt(Var_S) * sqrt(Var_P))   [4]
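To make Formulas 1 through 4 concrete, the following sketch is an illustration of our own (written in Python; the correlation matrix and the weights are hypothetical values chosen for the example, not results from the simulation reported in this paper). It computes r_SP for two predictors and two performance facets.

import numpy as np

# Hypothetical correlation matrix for two predictors (x1 = cognitive ability,
# x2 = personality) and two criteria (y1 = task performance, y2 = OCB),
# ordered [x1, x2, y1, y2]. With standardized variables, covariances equal correlations.
R = np.array([
    [1.00, 0.20, 0.50, 0.15],
    [0.20, 1.00, 0.25, 0.30],
    [0.50, 0.25, 1.00, 0.35],
    [0.15, 0.30, 0.35, 1.00],
])

C_X  = R[:2, :2]    # covariances among the X variables
C_Y  = R[2:, 2:]    # covariances among the Y variables
C_XY = R[:2, 2:]    # covariances between the X and Y variables

w_x = np.array([0.6, 0.4])   # hypothetical predictor weights
w_y = np.array([0.7, 0.3])   # hypothetical criterion weights

cov_sp = w_x @ C_XY @ w_y                   # Formula 1
var_s  = w_x @ C_X  @ w_x                   # Formula 2
var_p  = w_y @ C_Y  @ w_y                   # Formula 3
r_sp   = cov_sp / np.sqrt(var_s * var_p)    # Formula 4

print(round(float(r_sp), 3))   # about .51 with these illustrative values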
An equivalent formulation that does not use matrix algebra starts with the correlations among all X and Y variables. Compute:

a = Σ(w_xi^2) + 2 * ΣΣ(w_xi * w_xj * correlation between x_i and x_j)   [5]
b = Σ(w_yi^2) + 2 * ΣΣ(w_yi * w_yj * correlation between y_i and y_j)   [6]
c = ΣΣ(w_xi * w_yj * correlation between x_i and y_j)   [7]

Here, single summation (Σ) is used to designate summing the squared weights in X or Y, while double summation (ΣΣ) is used to indicate summing products of the weights of variables i and j, multiplied by the correlation between the two variables, taken over all pairs of variables. The correlation between a selection composite and a performance composite is given by:

r_SP = c / (a^(1/2) * b^(1/2))   [8]
Formulas 4 and 8 are exactly equivalent.
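For completeness, the same calculation can be carried out with the scalar Formulas 5 through 8. The function below is again a sketch of our own (using the same hypothetical correlations and weights as in the matrix example above, and assuming standardized variables); it returns the identical value, illustrating the equivalence noted in the text.

def composite_validity(w_x, w_y, R_xx, R_yy, R_xy):
    # Scalar version of Formulas 5-8; assumes standardized variables, so that
    # composite variances can be built directly from correlations and weights.
    nx, ny = len(w_x), len(w_y)
    a = sum(w_x[i] ** 2 for i in range(nx)) + 2 * sum(
        w_x[i] * w_x[j] * R_xx[i][j]
        for i in range(nx) for j in range(i + 1, nx))                   # Formula 5
    b = sum(w_y[i] ** 2 for i in range(ny)) + 2 * sum(
        w_y[i] * w_y[j] * R_yy[i][j]
        for i in range(ny) for j in range(i + 1, ny))                   # Formula 6
    c = sum(w_x[i] * w_y[j] * R_xy[i][j]
            for i in range(nx) for j in range(ny))                      # Formula 7
    return c / (a ** 0.5 * b ** 0.5)                                    # Formula 8

r = composite_validity([0.6, 0.4], [0.7, 0.3],
                       R_xx=[[1.00, 0.20], [0.20, 1.00]],
                       R_yy=[[1.00, 0.35], [0.35, 1.00]],
                       R_xy=[[0.50, 0.15], [0.25, 0.30]])
print(round(r, 3))   # matches the matrix-based value above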
