Educational Testing and Measurement, pp. 61-62

COMPARING NORM-REFERENCED AND CRITERION-REFERENCED TESTS
As you may have guessed, criterion-referenced tests must be very
specific if they are to yield information about individual skills. This is
both an advantage and a disadvantage. Using a very specific test enables
you to be relatively certain that your students have mastered or failed to
master the skill in question. The major disadvantage of criterion-referenced tests is that many such tests would be necessary to make
decisions about the multitude of skills typically taught in the average
classroom.
The norm-referenced test, in contrast, tends to be general. It
measures a variety of specific and general skills at once, but it fails to
measure them thoroughly. Thus you are not as sure as you would be with a
criterion-referenced test that your students have mastered the individual skills
in question. However, you get an estimate of ability in a variety of skills in a
much shorter time than you could through a battery of criterion-referenced tests. Since there is a trade-off in the uses of criterion- and
norm-referenced measures, there are situations in which each is
appropriate. Determining the appropriateness of a given type of test
depends on the purpose of testing.
Finally, the difficulty of items in NRTs and CRTs also differs. In the
NRT, items vary in level of difficulty from those that almost no one
answers correctly to those that almost everyone answers correctly. In the
CRT, the items tend to be equivalent to each other in difficulty. Following a
period of instruction, students tend to find CRT items easy and answer most
correctly. In a CRT, about 80% of the students completing a unit of
instruction are expected to answer each item correctly, whereas in an NRT
about 50% are expected to do so. Table 4.1 illustrates differences between
NRTs and CRTs.
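
To make the idea of item difficulty concrete, the short Python sketch below computes the proportion of students answering each item correctly from a small matrix of scored responses. The data and names are purely illustrative assumptions, not taken from the text; in an NRT these proportions would be spread widely around .50, while in a CRT given after instruction most would cluster near .80 or higher.

# Hypothetical scored responses: rows = students, columns = items,
# 1 = correct, 0 = incorrect. Illustrative data only.
responses = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
]

num_students = len(responses)
num_items = len(responses[0])

# Item difficulty (p-value) = proportion of students answering the item correctly.
for item in range(num_items):
    p = sum(row[item] for row in responses) / num_students
    print(f"Item {item + 1}: p = {p:.2f}")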

NORM- AND CRITERION-REFERENCED TESTS AND CONTENT VALIDITY EVIDENCE
TABLE 4.1 Comparing NRTs and CRTs

Dimension: Average number of students who get an item right
  NRT: 50%
  CRT: 80%

Dimension: Compares a student's performance to
  NRT: The performance of other students.
  CRT: Standards indicative of mastery.

Dimension: Breadth of content sampled
  NRT: Broad, covers many objectives.
  CRT: Narrow, covers a few objectives.

Dimension: Comprehensiveness of content sampled
  NRT: Shallow, usually one or two items per objective.
  CRT: Comprehensive, usually three or more items per objective.

Dimension: Variability
  NRT: Since the meaningfulness of a norm-referenced score basically depends on the relative position of the score in comparison with other scores, the more variability or spread of scores, the better.
  CRT: The meaning of the score does not depend on comparison with other scores; it flows directly from the connection between the items and the criterion. Variability may be minimal.

Dimension: Item construction
  NRT: Items are chosen to promote variance or spread. Items that are "too easy" or "too hard" are avoided. One aim is to produce good "distractor options."
  CRT: Items are chosen to reflect the criterion behavior. Emphasis is placed upon identifying the domain of relevant responses.

Dimension: Reporting and interpreting considerations
  NRT: Percentile rank and standard scores used (relative rankings).*
  CRT: Number succeeding or failing or range of acceptable performance used (e.g., 90% proficiency achieved, or 80% of class reached 90% proficiency).

*For a further discussion of percentile rank and standard scores, see Chapters 13 and 14.

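As a rough illustration of the two reporting styles in the last row of Table 4.1, the Python sketch below uses hypothetical scores and cutoffs (assumptions for illustration, not from the text) to report one student's percentile rank among classmates, NRT style, and the percentage of the class reaching a 90% proficiency cutoff, CRT style.

# Hypothetical percent-correct scores for a class; illustrative only.
scores = [95, 92, 90, 88, 85, 84, 80, 78, 75, 70, 65, 60]

def percentile_rank(score, all_scores):
    # Percent of scores falling below the given score (one common definition).
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

# NRT-style report: where does a score of 85 stand relative to the class?
print(f"Percentile rank of 85: {percentile_rank(85, scores):.0f}")

# CRT-style report: what share of the class reached the 90% proficiency cutoff?
cutoff = 90
reached = sum(1 for s in scores if s >= cutoff)
print(f"{100 * reached / len(scores):.0f}% of the class reached {cutoff}% proficiency")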