
Wonderlic Critique 1

Running head: WONDERLIC CRITIQUE

Technical Critique of the Wonderlic Personnel Test (WPT)

Stephanie A. Chadwick

University of Denver

Technical Critique of the Wonderlic Personnel Test (WPT)

Testing an athlete for future success or performance might seem to call for week-long assessments of physical skills such as reaction time, speed, coordination, and other relevant bodily tasks. The National Football League (NFL), however, takes a more wide-ranging approach to assessing an athlete’s future performance, combining physical tests with standardized testing in an event collectively known as the NFL Combine (History, 2010). Other well-known standardized tests include the SAT, the GRE, and, perhaps best known of all, the Stanford-Binet Intelligence Quotient (IQ) test. The NFL uses none of these, but it does use an IQ test called the Wonderlic Personnel Test-Quicktest (WPT-Q) to assess general intelligence. Although the NFL has not made any official statement about why it uses the WPT-Q, or any other intelligence test, Wonderlic, Inc. (2010) currently suggests that “How well a player will learn the playbook and adapt within the scope of the team is forecast by the WPT. Of course, this is true for any team, in any workplace” (Football and “The Wonderlic” section, ¶ 11). This rationale has been amended slightly over the years (Wonderlic, 2004), but in general, low scores on the Wonderlic tend to be taken as a sign of an aptitude or “learning problem” (Mulligan, 2004). While the overall purpose of the Wonderlic may be to assess general intelligence (g), which evidently determines a player’s ability to learn the playbook, many have used psychometric criteria to question the Wonderlic Personnel Test both as a measure of intelligence and as a tool for predicting success and performance in the NFL. Unfortunately for this examination, official Wonderlic, Inc. materials were inaccessible because of their proprietary nature, so recent research on the subject was analyzed instead to provide evidence on these criteria.



History

The Wonderlic Personnel Test, developed in 1937 by Eldon Wonderlic, was first published in 1950 as a pre-employment test (Wonderlic, 2010). Since its original publication, Wonderlic, Inc. (now run by Eldon Wonderlic’s grandson, Charles Wonderlic) has amended and republished the original WPT. The company has also developed several versions of the WPT, including the slightly shorter Wonderlic Personnel Test-Quicktest (WPT-Q) used by the NFL, as well as a variety of other tests for employers and educators. The Wonderlic Personnel Test costs about $10¹ (Blumberg, 2006) and is a timed, 12-minute test of 50 questions that can be completed on a computer or with paper and pencil; the WPT-Q is simply a short version of the WPT and takes only eight minutes. Bertsch and Pesta (2009) report that the WPT has a population mean score of 22 out of 50 and a population standard deviation (SD) of 7. The NFL average score on the Wonderlic in 2004 was 19 (Mulligan, 2004). The Wonderlic is most often used as an employment selection tool and is reportedly taken by almost three million people every year through almost seven thousand clients (Hatch, 2009).
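To make these norms concrete, the sketch below converts the 2004 NFL average of 19 into a z-score relative to the population mean of 22 and SD of 7 reported by Bertsch and Pesta. It assumes, as a simplification not stated in the sources, that WPT scores are approximately normally distributed, so the percentile is only a rough approximation; the function names are illustrative.

```python
import math

# Norms reported in the text: population mean 22, SD 7 (out of 50 items).
POP_MEAN = 22.0
POP_SD = 7.0


def z_score(raw: float, mean: float = POP_MEAN, sd: float = POP_SD) -> float:
    """Standardize a raw WPT score: how many SDs it lies from the mean."""
    return (raw - mean) / sd


def normal_percentile(z: float) -> float:
    """Share of a normal distribution falling below z (standard normal CDF).

    Assumes normally distributed scores, which is a simplification.
    """
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


nfl_average_2004 = 19  # Mulligan (2004)
z = z_score(nfl_average_2004)
print(f"z = {z:.2f}")                              # z = -0.43
print(f"percentile ~ {normal_percentile(z):.0%}")  # percentile ~ 33%
```

Under the normality assumption, the 2004 NFL average sits a little less than half an SD below the population mean, roughly at the 33rd percentile.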

Reliability

Reliability is one of the two primary criteria for evaluating the psychometric value of a psychological test. Reliability is established by the stability or consistency of a set of test scores, whether from one person or from a group of people (Johnson & Christensen, 2008). There are several types of reliability, any of which is a suitable way to verify the stability of test scores, including test-retest reliability, alternate (equivalent) forms reliability, inter-scorer reliability, and internal consistency. Concerning the reliability of the Wonderlic Personnel Test, Bell, Matthews, Lassiter, and Leverett (2002) found that “Internal consistency reliabilities of the WPT range from .88 to .94 while alternate-form reliability estimates range from .73 to .95 (Wonderlic, 1992). Test-retest reliabilities range from .82 to .94 (Dodrill, 1983; Wonderlic, 1992)” (p. 116). Similarly, Geisinger (2001) reported from the Wonderlic manual test-retest reliability coefficients of .82 to .94, alternate-forms reliability coefficients of .73 to .95, and a Kuder-Richardson-20 (internal consistency) coefficient of .88. Reliability therefore appears to have remained fairly stable over time and meets the minimum psychometric criteria² for a psychological test, although only the upper ends of these ranges reach the ≥ .90 level suggested for decisions about individuals.

¹ The Fourteenth Mental Measurements Yearbook (2001) reports the price of one test in 2000 as $1.80. The price of the test may vary depending on the volume of use and the type of form used.
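The Kuder-Richardson-20 coefficient mentioned above is computed from item-level right/wrong answers. The sketch below implements the standard KR-20 formula on a small, made-up response matrix (not Wonderlic data, which is proprietary) and checks the reported coefficients against the .70 and .90 rule of thumb quoted from Johnson and Christensen (2008). The function names, labels, and toy data are illustrative assumptions.

```python
from statistics import pvariance
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """Kuder-Richardson Formula 20 for dichotomously scored items.

    responses[person][item] is 1 (correct) or 0 (incorrect).
    KR-20 = k / (k - 1) * (1 - sum(p_i * q_i) / var(total scores)),
    where p_i is the proportion answering item i correctly and q_i = 1 - p_i.
    """
    n_people = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    # Sum of item variances p_i * q_i across all items.
    sum_pq = 0.0
    for item in range(k):
        p = sum(person[item] for person in responses) / n_people
        sum_pq += p * (1.0 - p)
    # Population variance of examinees' total scores (same denominator as p_i).
    totals = [sum(person) for person in responses]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1.0 - sum_pq / total_var)


def rule_of_thumb(coefficient: float) -> str:
    """Classify a reliability coefficient against the .70 / .90 rule of thumb."""
    if coefficient >= 0.90:
        return "adequate for decisions about individuals"
    if coefficient >= 0.70:
        return "adequate for research only"
    return "below the research minimum"


# Hypothetical 0/1 answers: 4 examinees x 3 items (not real WPT data).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(toy), 2))  # 0.75

# Coefficients reported in the text (Bell et al., 2002; Geisinger, 2001).
for label, r in [("KR-20", 0.88), ("alternate forms, low end", 0.73),
                 ("test-retest, high end", 0.94)]:
    print(label, r, rule_of_thumb(r))
```

Population variance is used for the total scores so that it shares a denominator with the item proportions p_i; some texts use the sample variance instead, which shifts the result slightly for small groups.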

Validity

The second measure of psychometric value is validity. There are three dimensions of the validity of a psychological test: content-related validity, criterion-related validity, and construct validity. The validity of a test establishes whether the test measures exactly, and only, what it claims to measure, and, unlike the interchangeable types of reliability, all three of these dimensions are equally necessary to establish validity.

Content-related validity is established by making a “judgment of the degree to which the evidence suggests that the items, tasks, or questions on your test adequately represent the domain of interest” (Johnson & Christensen, 2008, p. 152). Unfortunately, the content validity of the Wonderlic Personnel Test was not addressed in the available literature. The only evidence of content-related validity came from Geisinger (2001), who reported that the test’s creator, Eldon Wonderlic, adapted it from a well-known and often-used test, the Otis Self-Administering Tests of Mental Ability. Eldon Wonderlic was also an industrial psychologist, so his understanding of then-current practices for assessing employee qualifications may have addressed the content universe to some degree (Hatch, 2009).

² “A popular rule of thumb is that the size of coefficient alpha should generally be, at a minimum, greater than or equal to .70 for research purposes and somewhat greater than the value (e.g. ≥ .90) for clinical purposes (i.e. for assessing single individuals)” (Johnson & Christensen, 2008, p. 149).

Selecting questions that adequately represent the domain is the first way to establish validity; ensuring that a test predicts what it claims to predict, a criterion, is the second. According to Wonderlic, Inc. (2010), “The WPT-Q is a short-form measure of general intelligence or cognitive ability – the most powerful predictor of job success” (¶ 1). To further define success, Michael Callans, President of Wonderlic Consulting, states that “‘Corporations could learn a lot from the NFL's use of testing,’ …Intelligence and personality determine a candidate's success, on the field or off. Team owners recognize that the most successful [draft] choices are those players who not only have the strength to play the game but the mental acuity to win. And they don't rely on gut instinct to tell them who's right for the job. They get proof through testing” (Wonderlic, 2004, ¶ 9). However, such “proof,” and measures of mental acuity as they relate to “success” in the NFL, are often subjective and relative, and they mean different things to the public, the players, and especially the team managers and owners. Nonetheless, independent measures of success serve in this analysis as the criteria against which we can determine whether the Wonderlic Personnel Test measures what it claims.

Kuzmits and Adams (2008) offer similar definitions of success for three positions: quarterbacks, wide receivers, and running backs. The first measure is draft order; two others, salary and games played, are measured over the first three years of a player’s career. Other measures vary by position and include quarterback rating³, average yards gained per carry for running backs, and average yards gained per reception for wide receivers (Kuzmits & Adams, 2008). In a replication study, Kuzmits and Adams (2008) correlated these measures of success with players’ scores from the NFL Combine and, consistent with previous research, “failed to show a relationship between the WPT and NFL scores” (p. 1726). Indeed, of the 30 correlations run, most coefficients were below r = .20, and only two were statistically significant. This lack of a relationship between the WPT and measures of success indicates a problem of predictive validity, which is a type of criterion-related validity. Predictive validity evidence concerns the relationship between the current measure and events that happen in the future (Furr & Bacharach, 2008).

³ A quarterback’s rating is based on four criteria: “percentage of completions per attempt, average yards gained per attempt, percentage of touchdown passes per attempt, and percentage of interceptions per attempt” (NCAA and NFL passing efficiency computation, 2008, p. 1723).
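Because Kuzmits and Adams ran 30 correlations, a few “significant” results could appear by chance alone. The sketch below estimates how many, under two simplifying assumptions that are not taken from the study: each test uses α = .05, and the 30 tests are independent (they are not fully independent, since the same players and WPT scores recur across tests). It uses only Python’s standard library; the function name is illustrative.

```python
from math import comb


def prob_at_least(k_min: int, n_tests: int, alpha: float) -> float:
    """P(at least k_min of n_tests are 'significant') when every null is true.

    Treats each test as an independent Bernoulli trial with success
    probability alpha, so the count of false positives is Binomial(n, alpha).
    """
    below = sum(
        comb(n_tests, k) * alpha**k * (1 - alpha) ** (n_tests - k)
        for k in range(k_min)
    )
    return 1.0 - below


N_TESTS = 30   # correlations reported by Kuzmits & Adams (2008)
ALPHA = 0.05   # assumed significance level

expected_false_positives = N_TESTS * ALPHA
print(f"expected false positives: {expected_false_positives:.1f}")  # 1.5
print(f"P(2 or more by chance): {prob_at_least(2, N_TESTS, ALPHA):.2f}")  # 0.45
```

Under these assumptions, about 1.5 significant correlations would be expected even if the WPT were unrelated to every success measure, and two or more would occur by chance nearly half the time, which is consistent with the authors’ reading that the WPT showed no meaningful relationship with NFL performance.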

A second type of criterion-related validity evidence is concurrent evidence, which is obtained by administering a test together with a second, already validated test (called a focal test) and verifying that the two sets of scores correlate. Wonderlic, Inc. (2010) claims the WPT is a test of general intelligence and cognitive ability. To obtain concurrent evidence for the WPT against another measure of general intelligence, Matthews and Lassiter (2007) compared the Wonderlic with measures of general intelligence (g) in the Woodcock-Johnson Revised Tests of Cognitive Ability (WJ-R), which also measure crystallized intelligence (Gc) and fluid intelligence (Gf). Fluid intelligence is “an ability to solve novel problems quickly and efficiently… [and] crystallized intelligence is related to cognitive abilities based on experience” (Matthews & Lassiter, 2007, pp. 707-709). The authors concluded that the WPT did show concurrent validity with the WJ-R full battery (r = .55, p ≤ .01), but weaker associations with crystallized intelligence (r = .34, p ≤ .05) and fluid intelligence (r = .26, n.s.), which may indicate problems with convergent validity, a type of construct validity discussed in the next section.
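The coefficients reported by Matthews and Lassiter are Pearson correlations, and squaring them (r²) gives the share of score variance the two measures have in common, which makes the gap between .55 and .26 easier to judge. The sketch below is illustrative: `pearson_r` is a from-scratch helper for paired scores (Python 3.10+ also offers `statistics.correlation`), and it is applied only to the published coefficients, since the study’s raw data are not available here.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def shared_variance(r: float) -> float:
    """Coefficient of determination: share of variance two measures have in common."""
    return r * r


# Coefficients reported by Matthews and Lassiter (2007) for the WPT vs. the WJ-R.
reported = {"WJ-R full battery": 0.55, "Gc (crystallized)": 0.34, "Gf (fluid)": 0.26}
for label, r in reported.items():
    print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.0%}")
```

Read this way, the WPT shares roughly 30% of its variance with the full WJ-R battery, about 12% with crystallized intelligence, and only about 7% with fluid intelligence.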

In addition to content and criterion validity, construct validity is the third aspect of validity. McIntire and Miller (2007) describe the process of establishing construct validity as a “gradual accumulation of evidence that the scores on the test relate to the observable behaviors in the ways predicted by the theory underlying the construct” (p. 228). Although construct validity is by far the most difficult type to define and demonstrate, methodologists have established two specific strategies: discriminant validity (measures of constructs that should not be related are in fact unrelated) and convergent validity (test scores are related to other measures of the same construct) (McIntire & Miller, 2007). Addressing both discriminant and convergent validity, Schraw (2001) reported that the “construct validity of the [WPT] instrument is also nicely addressed; its correlations with instruments such as the [Wechsler Adult Intelligence Scale] WAIS Full Scale IQ and the General Aptitude Test Battery's 'Aptitude G' (for general mental ability or intelligence) are high – in the range of .70-.92. In contrast, the WPT is uncorrelated with a wide variety of personality assessment measures” (¶ 6). Although Schraw’s (2001) finding that the WPT correlates highly with other measures of intelligence sounds similar to criterion-related, concurrent validity, there are specific differences. Using a tool concurrently with another tool at the outset of design is a valuable way to establish concurrent validity. Over time, however, the test must continue to correlate with a host of other psychometrically sound tools that measure the same construct (i.e., IQ), which is convergent validity, and must not measure constructs it does not claim to measure (i.e., personality traits), which is discriminant validity. In addition to finding concurrent validity (described above), Matthews and Lassiter (2007) found that the WPT has difficulty capturing the constructs of newer theories of intelligence, namely fluid and crystallized intelligence. Whether Wonderlic, Inc. is addressing this problem is currently unknown.
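The convergent/discriminant logic described above can be expressed as a simple check: correlations with measures of the same construct should be high, and correlations with unrelated constructs should be near zero. The sketch below is a minimal illustration under stated assumptions: the cut-offs (.50 and .30) are hypothetical values chosen for the example, not thresholds from the cited sources, and the personality-scale coefficient is a made-up placeholder for Schraw’s description of the WPT as “uncorrelated” with personality measures.

```python
from typing import Dict, Tuple

# Hypothetical cut-offs chosen for illustration only; the cited sources
# do not prescribe these exact values.
CONVERGENT_MIN = 0.50    # same-construct measures should correlate at least this strongly
DISCRIMINANT_MAX = 0.30  # different-construct measures should stay below this (absolute value)


def check_construct_validity(
    correlations: Dict[str, Tuple[float, bool]]
) -> Dict[str, bool]:
    """Flag whether each correlation fits the convergent/discriminant pattern.

    correlations maps a measure name to (r with the WPT, same_construct), where
    same_construct is True for other intelligence measures and False otherwise.
    """
    verdicts = {}
    for measure, (r, same_construct) in correlations.items():
        if same_construct:
            verdicts[measure] = r >= CONVERGENT_MIN        # convergent evidence
        else:
            verdicts[measure] = abs(r) < DISCRIMINANT_MAX  # discriminant evidence
    return verdicts


evidence = {
    # Lower bound of the .70-.92 range Schraw (2001) reports for IQ measures.
    "WAIS Full Scale IQ": (0.70, True),
    # Coefficient reported by Matthews and Lassiter (2007).
    "WJ-R fluid intelligence (Gf)": (0.26, True),
    # Hypothetical near-zero value standing in for "uncorrelated" personality scales.
    "personality scale (hypothetical)": (0.05, False),
}
print(check_construct_validity(evidence))
```

With these illustrative cut-offs, the WAIS and personality entries fit the expected pattern, while the fluid intelligence correlation fails the convergent check, mirroring the concern Matthews and Lassiter raise.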

Unfortunately, even with all of this evidence of validity, McDonald (2005) found that although the Wonderlic Personnel Test does measure intelligence, with regard to player success, “within the modern draft era, there exists no statistically significant relationship between intelligence and quarterback performance at either the collegiate or professional level. Likewise, more intelligent quarterbacks are neither selected earlier nor compensated more for their mental abilities” (Summary and Conclusions section, ¶ 2). Considering that the criterion-related and content-related validity evidence presented here both pose problems for the NFL in defining successful players by their intelligence, it seems the Wonderlic should be reconsidered as a selection tool.

Conclusion

The Wonderlic Personnel Test is a commonly used tool for employment selection. There is little debate about whether the Wonderlic has psychometric value in terms of reliability and as a measure of general intelligence, although it may be a less adequate measure of the constructs in more current theories of intelligence, such as crystallized and fluid intelligence. Although the justification for using the Wonderlic, or any IQ test, in the NFL draft is somewhat unclear, the evidence does not seem to show a relationship between this part of the Combine and successful outcomes for players. The NFL also appears to be the only professional sports organization to use this test (Hatch, 2009), and in using it, teams risk eliminating players who may be well suited for play but did not score well on the test. One possible flaw in this examination, and in others, is that the authors of many of these studies may define “success” and “performance” differently than the NFL does. Until the NFL discloses its specific use of the Wonderlic or its definitions of these terms, the purpose of using this test may remain unclear.


References

Bell, N., Matthews, T., Lassiter, K., & Leverett, J. (2002). Validity of the Wonderlic Personnel Test as a measure of fluid or crystallized intelligence: Implications for career assessment. North American Journal of Psychology, 4(1), 113. Retrieved from Academic Search Complete database.

Bertsch, S., & Pesta, B. (2009). The Wonderlic Personnel Test and elementary cognitive tasks as predictors of religious sectarianism, scriptural acceptance and religious questioning. Intelligence, 37(3), 231-237. doi:10.1016/j.intell.2008.10.003

Blumberg, J., & C., S. (2006). Choose your weapon. Inc., 28(8), 96. Retrieved from Academic Search Complete database.

Furr, R. M., & Bacharach, V. R. (2008). Psychometrics: An introduction. Thousand Oaks, CA: Sage Publications.

Geisinger, K. (2001). Review of the Wonderlic Personnel Test and Scholastic Level Exam. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 1360-1363). Lincoln, NE: The Buros Institute of Mental Measurements.

Hatch, C. (2009). Fourth and short on equality: The disparate impact of the NFL’s use of the Wonderlic Intelligence Test and the case for a football-specific test. Connecticut Law Review, 41, 1669-1699.

History. (2010). NFL Scouting Combine. Retrieved May 5, 2010, from www.nflcombine.com

Johnson, B., & Christensen, L. (2008). Educational research: Quantitative, qualitative, and mixed methods (3rd ed.). New York, NY: Pearson.

Kuzmits, F., & Adams, A. (2008). The NFL combine: Does it predict performance in the National Football League? Journal of Strength and Conditioning Research, 22(6), 1721-1727. Retrieved May 6, 2010, from ProQuest Health and Medical Complete. (Document ID: 1669462821)

Matthews, D. T., & Lassiter, K. S. (2007). What does the Wonderlic Personnel Test measure? Psychological Reports, 100, 707-712.

McDonald, P. M. (2005). Intelligence and football: Testing for differentials in collegiate quarterback passing performance and NFL compensation. The Sport Journal, 8. Retrieved from http://www.thesportjournal.org/tags/2005?page=2

McIntire, S. A., & Miller, L. A. (2007). Foundations of psychological testing (2nd ed.). Thousand Oaks, CA: Sage Publications.

Mulligan, M. (2004, April 22). Wonderlic scores have NFL teams wondering: Angelo acknowledges the disparity in test results has a lot of NFL people questioning the validity of the scores. Chicago Sun-Times (IL), 125. Retrieved May 5, 2010, from NewsBank online database (Access World News).

NCAA and NFL passing efficiency computation. (2008). In F. Kuzmits & A. Adams, The NFL combine: Does it predict performance in the National Football League? Journal of Strength and Conditioning Research, 22(6), 1721-1727. Retrieved May 6, 2010, from ProQuest Health and Medical Complete. (Document ID: 1669462821)

Plake, B. S., & Impara, J. C. (Eds.). (2001). The fourteenth mental measurements yearbook. Lincoln, NE: The Buros Institute of Mental Measurements.

Schraw, G. (2001). Review of the Wonderlic Personnel Test and Scholastic Level Exam. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 1360-1363). Lincoln, NE: The Buros Institute of Mental Measurements.

Wonderlic, Inc. (2004). How smart is your first round draft pick? Wonderlic.com. Retrieved May 7, 2010, from http://www.wonderlic.com/news/summer04/mm_article1.htm, reposted on http://www.freerepublic.com/focus/f-news/1312628/posts

Wonderlic, Inc. (2010). Retrieved May 12, 2010, from www.wonderlic.com
