To cite this article: Steven M. Paul (1986) The Advanced Ravens Progressive Matrices, The Journal of Experimental
Education, 54:2, 95-100, DOI: 10.1080/00220973.1986.10806404
To link to this article: http://dx.doi.org/10.1080/00220973.1986.10806404
ABSTRACT
Normative data for the Advanced Raven's Progressive Matrices are presented based on 300 University
of California, Berkeley, students. Correlations with the
Wechsler Adult Intelligence Scale and the Terman Concept Mastery Test are reported. The relationship between the Advanced Raven's Progressive Matrices and
Spearman's g is explored.
Subjects
Three hundred students (190 female, 110 male) from
the University of California, Berkeley, served as subjects. Their average age was 252 months (21 years) with
a standard deviation of 32 months.
Procedure
Each subject was tested individually. The basic procedure of the matrices test was explained by the experimenter using examples (problems A1 and C5) from
the SPM. Subjects were instructed to put some answer
down for every question and were given a loose time
limit of 1 hour. If the subject was not finished in an
hour, an additional 10 to 15 minutes was given to complete the test. A subject's score was the total number of
items answered correctly.
One hundred fifty of the subjects were also individually given the Terman Concept Mastery Test (CMT), a
high level test of verbal ability. A different set of 62 subjects out of the 300 were also individually administered
the Wechsler Adult Intelligence Scale (WAIS).
PAUL
total score was also 27.0. The mean total score of the
normative group of 170 university students presented by
Raven (1965) was only 21 (SD = 4). Gibson (1975) also
found data on the APM which were significantly higher
than the published university norms. The mean total score
of 281 applicants to a psychology honors course at Hatfield Polytechnic in Great Britain was 24.28 (SD = 4.67).
Table 1 presents the absolute frequency, cumulative
frequency percentile, t score, and normalized t score for
the total APM score values based on the sample of 300
students. The 95th percentile corresponds to a total
score between 34 and 35 for this sample. The 95th percentile value based on Raven's normative group with
similar ages is between 23 and 24. The Berkeley sample
scored much higher overall than the normative sample
of Raven's 1962 edition of the APM.
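The two derived scales listed for Table 1 can be illustrated with a short sketch. It assumes the conventional definitions, which the text does not spell out: a linear T score is 50 + 10z, and a normalized T score is 50 + 10 times the normal deviate corresponding to the score's percentile rank. The function names and the sample scores are hypothetical and are not the study's data.

```python
from statistics import NormalDist, mean, pstdev

def linear_t(score, scores):
    """Linear T score: 50 + 10 * z, with z based on the sample mean and SD."""
    m, sd = mean(scores), pstdev(scores)
    return 50 + 10 * (score - m) / sd

def percentile_rank(score, scores):
    """Mid-point percentile rank as a proportion: cases below plus half of ties."""
    below = sum(s < score for s in scores)
    ties = sum(s == score for s in scores)
    return (below + 0.5 * ties) / len(scores)

def normalized_t(score, scores):
    """Normalized T score: percentile rank mapped through the inverse normal CDF.

    Assumes the score occurs in the sample, so the rank lies strictly
    between 0 and 1 (inv_cdf is undefined at exactly 0 or 1).
    """
    p = percentile_rank(score, scores)
    return 50 + 10 * NormalDist().inv_cdf(p)

# Hypothetical total scores, for illustration only.
sample = [18, 21, 23, 25, 26, 27, 27, 28, 29, 30, 31, 33]
```

The linear scale preserves the shape of the raw-score distribution, whereas the normalized scale forces it toward a normal curve, so the two diverge most in the tails of a skewed distribution.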
The internal consistency reliability based on the
Kuder-Richardson formula (KR-20) is .83. That is, approximately 83% of the variance in total test scores is
attributable to true score variance, i.e., to what the
APM is actually testing.
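As a minimal sketch of how such a coefficient is obtained, the function below computes KR-20 from a matrix of items scored 0 or 1, using the standard formula (k / (k - 1)) * (1 - sum(p * q) / variance of total scores). The function name and the toy response matrix are hypothetical; population variances are assumed throughout.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 for a list of examinee rows, each a list of 0/1 item scores."""
    n_people = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    total_var = pvariance(totals)
    # p = proportion passing an item; p * (1 - p) is that item's variance.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_var)

# Hypothetical responses: 5 examinees x 4 items (illustration only).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
```

For this toy matrix the total-score variance is 2 and the item variances sum to 0.80, so `kr20(toy)` works out to 0.80.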
There is strong agreement between the rank order of
the items, according to the frequency with which they
are solved, presented by Raven and those determined
for this sample (r = .94). However, there is one noteworthy exception. The item Raven ranked 13th turned
out to be much more difficult for the Berkeley students
than would have been expected. It ranked as only the
22nd most frequently solved item.
The item involves changes in three variables: object
shape (diamond, square, circle), number of internal
lines (one, two, three), and slant of internal lines (45°,
90°, 135°). The majority of subjects who did not choose
the correct response (#2) were attracted to a distractor
(#5) that ignored the necessary change in the slant of the
internal lines.
Information beyond what is provided by just total
score values can sometimes be found in an examination
of the incorrect responses to the APM (Thissen, 1976).
Selection of distractor items, incorrect multiple-choice
alternatives, for each of the problems of the APM was
examined to determine if patterns developed that would
aid in the discrimination between subjects. Two subgroups of the total sample of 300 were formed. The low
group came from the bottom 24th percentile receiving
total scores less than or equal to 23 (n = 72). The high
group comprised those in the top 26th percentile who
scored greater than or equal to 31 (n = 78). A comparison was made between the two groups to see if distractors chosen by the high group were different from or
perhaps better (i.e., closer to the correct response) than
the incorrect responses chosen by the low group. No differences between the two groups were found.
Unlike most studies of the Raven's Progressive Matrices, a significant difference (α = .05) was found between the average total score of males and females. In
[Table 1. Absolute frequency, cumulative frequency percentile, t score, and normalized t score for total APM scores; table values missing.]
Spearman's g
One of the most solidly established phenomena in
psychology is that scores on all mental ability tests, no
matter how diverse the mental skills or areas they cover,
are positively intercorrelated when they are obtained in
a representative sample of the general population. It
was Spearman who first hypothesized that there is some
"general factor" of mental ability that is measured in
common by all of the intercorrelated mental tests. He
gave the label "g" to this general factor.
Spearman developed the mathematical method known
as factor analysis which enabled him to extract the g
from all the intercorrelations among a collection of diverse tests and show the correlation between each test
and the hypothetical general ability factor. The correlation of a particular test with the g factor common to all
tests in the analysis is called the test's g loading. The
square of a test's g loading indicates the proportion of
the total variance in the scores on the test that is due to
individual differences in this general ability.
It is important to note that the g factor may not show
up on some tests given to highly selected groups, such as
the often tapped pool of university students, although
these tests show moderate g loadings when given to the
general population. The explanation is that these groups
have already been highly selected on g-loaded tests, such
as college entrance exams, and therefore their scores indicate less individual variation on the g factor. This
limits the intercorrelations among the various tests and
thereby prevents the g factor from showing up strongly
in a factor analysis of the matrix of intercorrelations.
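The restriction-of-range argument above can be made concrete with a small simulation. This is only a sketch under stated assumptions: every test is modeled as a common factor plus independent unique variance, all three tests have the same hypothetical loading of .70, and "selection" means keeping the top 10% on an entrance test. None of these values come from the study.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simulate(n=20000, loading=0.7, keep_top=0.10, seed=0):
    """Correlation between two g-loaded tests, before and after selection."""
    rng = random.Random(seed)
    unique = sqrt(1 - loading ** 2)
    people = []
    for _ in range(n):
        g = rng.gauss(0, 1)
        # Three tests share g; each adds its own independent component.
        entrance = loading * g + unique * rng.gauss(0, 1)
        test_a = loading * g + unique * rng.gauss(0, 1)
        test_b = loading * g + unique * rng.gauss(0, 1)
        people.append((entrance, test_a, test_b))
    # Keep only the top scorers on the g-loaded entrance test.
    people.sort(key=lambda p: p[0], reverse=True)
    selected = people[: int(n * keep_top)]
    r_full = pearson([p[1] for p in people], [p[2] for p in people])
    r_selected = pearson([p[1] for p in selected], [p[2] for p in selected])
    return r_full, r_selected
```

Under these assumptions the correlation between the two tests in the full sample is close to the squared loading (.49), while in the selected group it is noticeably lower, which is the attenuation the paragraph describes.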
Spearman originally hypothesized that each test measures only g plus some specific ability, s, which is tapped
only by the particular test. This theory that any given
test score is composed of only g + s, as well as measurement error, was soon refuted by the finding that there
are other common factors besides g in many mental
ability tests. However, they cannot be considered general factors because they do not enter into all tests, as
does g, but enter only into certain groups of tests. In
a factor analysis of a large number of various mental
tests, the first unrotated factor (or principal component)
is g or general mental ability. It usually accounts for
almost half of the total variance in a large battery of diverse tests. The several other smaller factors, the group
factors, show highly differential loadings on tests that
are often characterized as verbal, numerical, spatial, or
involving memory.
Factor analysis by itself does not and cannot explain
the basis for the existence of g. Spearman himself stated
that factor analysis cannot reveal the essential nature of
g but only reveals where to look for it. Examination of
the characteristics of a wide variety of tests in connection with their g loadings can provide some descriptive
generalizations about the common features that characterize
tests that have relatively high g loadings as compared with tests that have relatively low g loadings.
Spearman originally tried to get at the psychological
nature of g by factor analyzing more than 100 tests,
each fairly homogeneous in content, and then comparing their g loadings (Spearman & Jones, 1950). He characterized the most g-loaded tests essentially as those requiring "the eduction of relations and correlates," that
is, perceiving relationships, inducing the general from
the particular, and deducing the particular from the
general. Such tests require inductive or inventive as contrasted to reproductive or rule-applying behavior. The
most g-loaded test in the whole battery was the Raven's
Progressive Matrices (RPM), which, as previously mentioned, depends almost entirely on perceiving key
features and relationships and discovering the abstract
rules that govern the differences among the elements in
the matrix.
There is much more test material available now than
was available to Spearman more than 50 years ago. This
has led to broader generalizations about g. The g factor
is manifested in tests to the degree that they involve
mental manipulation of the input elements, choice, decision, invention in contrast to selection, meaningful
memory in contrast to rote memory, long-term memory
in contrast to short-term memory, and distinguishing
relevant information from irrelevant information in
solving complex problems (Jensen, 1979).
Task complexity and the amount of conscious mental
manipulation required seem to be the most basic determinants of the g loading of a task. There are many examples in which a slight increase in task complexity is
accompanied by an increase in the g loading of the task.
Virtually any task involving mental activity that is complex enough to be recognized as involving some kind of
conscious mental effort is substantially g loaded. It is
the task's complexity rather than its content that is most
related to g. An almost infinite variety of test items, regardless of sensory modality, substantive or cultural
content, or the form of effector activity involved in the
required response, is capable of measuring g. This observation led to Spearman's principle of "the indifference of the indicator," meaning that the manifestation
of g is not limited to any particular types of information
or item types.
Previous research suggests that the Raven's Progressive Matrices administered to the general population
measures g and little else (Burke, 1958). The occasional
loadings found on other factors, independently of g, are
mostly trivial and inconsistent from one analysis to
another. Although many other tests measure g to a
similar extent, unlike the Raven, they also have loadings
on the major group factors such as verbal, numerical,
spatial, and memory. The RPM does not measure perceptual ability or spatial-visualization ability as is commonly believed. In fact, the Raven has very small loadings on these factors when g is excluded.
Factor analysis of the RPM at the item level should
result in only a single factor. Some investigations that
have found more than one factor have employed improper orthogonal rotations of the principal components. This method can artificially create the appearance of several factors even in correlation matrices
that are artificially constructed so as to contain only one
factor plus random error (Jensen, 1980). Some of the
small spurious factors that emerge from factor analysis
of the inter-item correlations are not really ability factors at all but are "difficulty" factors, due to varying
degrees of restriction of variance on items of widely differing difficulty levels and to nonlinear regression of
item difficulties on age and ability (McDonald, 1965).
When these psychometric artifacts are taken into account, the RPM seems to measure only a single factor of
mental ability, which can be termed g.
The APM test results of this study were factor analyzed
at the item level. The first principal factor of the intercorrelation matrix of the 36 items, scored correct or incorrect, accounts for 15% of the total inter-item variance. A factor loading correlation matrix was created
from this first principal factor and subtracted from the
original inter-item correlation matrix. The resulting
residual matrix was tested and found not significantly
different from zero at α = .05. Therefore, the APM can
be considered to measure only one factor. That this factor only accounts for 15% of the total inter-item variance indicates that the variance of each item is due
mostly to uniqueness, that is, item specificity and error.
The items are not highly intercorrelated. However, what
they do have in common may indeed be Spearman's g.
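The extraction and residual steps described above can be sketched as follows. The text does not say how the first principal factor was computed, so this sketch assumes a one-factor principal-axis solution (communalities iterated on the diagonal, leading eigenvector found by power iteration). All function names and the three-item correlation matrix are hypothetical, and the significance test on the residual matrix is not reproduced.

```python
from math import sqrt

def leading_eigen(M, iters=200):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    n = len(M)
    v = [1.0 / sqrt(n)] * n
    value = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        value = sqrt(sum(x * x for x in w))
        v = [x / value for x in w]
    return value, v

def first_principal_factor(R, rounds=50):
    """One-factor principal-axis loadings for a correlation matrix R."""
    n = len(R)
    communality = [1.0] * n  # start from unities on the diagonal
    loadings = [0.0] * n
    for _ in range(rounds):
        reduced = [[communality[i] if i == j else R[i][j] for j in range(n)]
                   for i in range(n)]
        value, v = leading_eigen(reduced)
        loadings = [sqrt(value) * x for x in v]
        communality = [l * l for l in loadings]
    return loadings

def residual_offdiag(R, loadings):
    """Off-diagonal residuals: observed r minus the product of the two loadings."""
    n = len(R)
    return [R[i][j] - loadings[i] * loadings[j]
            for i in range(n) for j in range(i + 1, n)]

# Hypothetical 3-item correlation matrix (illustration only).
R = [
    [1.0, 0.3, 0.3],
    [0.3, 1.0, 0.3],
    [0.3, 0.3, 1.0],
]
loadings = first_principal_factor(R)
residuals = residual_offdiag(R, loadings)
share = sum(l * l for l in loadings) / len(R)  # proportion of total variance
```

In this toy case every loading converges to about .55, the residuals vanish, and the factor accounts for 30% of the total variance. By the same arithmetic, a factor accounting for 15% of the variance of 36 items corresponds to a sum of squared loadings of about 0.15 × 36 = 5.4.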
The first principal factor should not be considered
just a difficulty factor. The correlation between the item
loadings on the first principal factor and item difficulty
levels (percent passing) is -.36. (Only 13% of the variance in the first principal factor loadings can be explained
by differences in item difficulty.) The correlation between the item loadings on the first principal factor and
item variance is .41. (Sixteen percent of the variance in
the first principal factor loadings can be explained by
differences in item variance.) The correlation between
item difficulty and item variance is -.37. When item
variance is held constant, i.e., partialled out, the correlation between item loadings on the first principal factor and difficulty levels is .01.
The loadings of each item with the first principal factor and the correlations of each item with total score are
shown in Table 2. The total score on the APM can be
considered a reasonable measure of general mental ability. This notion is at the very least intuitively appealing.
There is near perfect agreement between the correlations
of each item with total score and the correlations of each
item with the hypothesized g factor. The correlation be-
Table 2. Item correlations with total score and item loadings on the first principal factor

Item   Total score   First principal factor     Item   Total score   First principal factor
  1        .08               .04                  19       .30               .26
  2        .20               .24                  20       .27               .23
  3        .25               .29                  21       .50               .49
  4        .38               .43                  22       .41               .36
  5        .34               .37                  23       .56               .57
  6        .17               .17                  24       .53               .50
  7        .24               .24                  25       .37               .30
  8        .35               .37                  26       .36               .30
  9        .34               .41                  27       .50               .44
 10        .34               .37                  28       .45               .38
 11        .30               .34                  29       .50               .44
 12        .42               .46                  30       .45               .38
 13        .24               .18                  31       .47               .41
 14        .30               .29                  32       .39               .33
 15        .24               .20                  33       .43               .37
 16        .45               .47                  34       .53               .51
 17        .39               .36                  35       .49               .45
 18        .35               .31                  36       .43               .37
based on the 62 subjects who took both tests. The correlation between the high g items and WAIS Full Scale
IQ scores was .67. Low g items and WAIS Full Scale IQ
scores correlated .56. Although the two correlations are
not significantly different from each other (t = 1.39, α
= .05), a trend was apparent. Despite the fact that the
items of the APM and the WAIS are drastically different in content, those items correlating highest with
the hypothesized g factor derived from the APM show a
stronger relationship to WAIS Full Scale IQ scores than
those items with a low correlation with the hypothesized
APM g. Since WAIS Full Scale IQ scores have been
shown to be highly g loaded in previous research
(Matarazzo, 1972; Jensen, 1980), the pattern found here
can be interpreted to indicate that the hypothesized g of
the APM is the same g that is measured by the WAIS.
In summary then, the distribution of scores for a
large cross section of University of California students
on the Advanced Raven's Progressive Matrices is
markedly higher than the estimated score distribution of
REFERENCES
Burke, H. R. (1958). Raven's Progressive Matrices: A review and critical evaluation. Journal of Genetic Psychology, 93, 199-228.
Court, J. H., & Kennedy, R. J. (1976). Sex as a variable in Raven's
Standard Progressive Matrices. Proceedings of the 21st International Congress of Psychology, Paris, France.
Forbes, A. R. (1964). An item analysis of the Advanced Matrices.
British Journal of Educational Psychology, 34, 1-14.
Foulds, G. A., & Raven, J. C. (1950). An experimental survey with
Progressive Matrices (1947). British Journal of Educational Psychology, 20, 4-10.
Gibson, H. B. (1975). Relations between performance on the Advanced Matrices and the EPI in high-intelligence subjects. British
Journal of Social and Clinical Psychology, 14, 363-369.