Interpreting Test Scores & Item Analysis
Lecture Overview
Interpreting Test Scores & Item Analysis
1.0 Introduction
• Reviewing the process of test construction
• Reviewing item formats: objective (selected-response) and subjective (free/constructed-response)
2.0 Interpreting Test Scores
• Measures of central tendency: mode, median, mean
• Measures of dispersion: range, standard deviation
• Method of calculating the s.d.
3.0 Item Analysis
• Item difficulty (facility value): FV
• Item discrimination: D
• Item difficulty and discrimination
• Interpreting item difficulty and discrimination indices and making statements about the suitability of items
4.0 Moderating Classroom Tests
• Why the need to do so?
5.0 Item Banking
• Merits of having an item bank
1.0 INTRODUCTION
2.0 Interpreting test scores
Frequency distribution
Frequency distribution (cont'd)
• In the event of ties (same scores), two ways of rendering these are shown; the usual classroom practice is the one shown in parentheses. Where statistical work is to be done on the ranks, it is essential to record the average rank (e.g. candidates J, N and O, each with the same mark, occupy places 10, 11 and 12 in the list, averaging 11).
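The tie-averaging rule above can be sketched in Python. The mark list here is a hypothetical, shortened one (not the full class of 26), so candidates J, N and O land in places 3, 4 and 5 rather than 10, 11 and 12:

```python
from collections import defaultdict

def average_ranks(scores):
    """Return each candidate's rank (1 = highest score), averaging ties."""
    # Order candidates from highest to lowest mark.
    ordered = sorted(scores, key=scores.get, reverse=True)
    # Record which list positions (1-based) each distinct mark occupies.
    places = defaultdict(list)
    for place, name in enumerate(ordered, start=1):
        places[scores[name]].append(place)
    # Each candidate receives the mean of the places the tie occupies.
    return {name: sum(places[scores[name]]) / len(places[scores[name]])
            for name in scores}

# Hypothetical marks: J, N and O tie on 26.
ranks = average_ranks({"A": 35, "B": 30, "J": 26, "N": 26, "O": 26})
print(ranks["J"])  # 4.0  (average of places 3, 4 and 5)
```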
• Table 3 contains a frequency distribution showing the number of students who obtained each mark awarded. The strokes to the right of the numbers (e.g. ////) are called tallies and are included simply to illustrate the method of counting the frequency of scores.
• The frequency polygon (see graph) illustrates the distribution of the scores.
Measures of Central Tendency
• Mode
• The mode is the score which most candidates obtained: in this case it is 26, as five candidates scored this mark.
• Median
• The median is the score gained by the middle candidate in the order of merit. With 26 students here (as in all cases involving an even number of testees), there is no obvious middle person, so the score halfway between the lowest score in the top half and the highest score in the bottom half is taken as the median. The median score in this case is also 26.
Measures of Central Tendency (cont'd)
• Mean
• The mean score of any test is the arithmetical average: i.e. the sum of the separate scores divided by the total number of candidates.
• The mode, median, and mean are all measures of central tendency. The mean is the most efficient measure of central tendency, but it is not always appropriate.
• Table 1 shows the formula and how the mean is calculated. X denotes the score, N the number of candidates, and m the mean. The symbol f denotes the frequency with which a score occurs, and ∑ means "the sum of".
• (See next table)
Table 1

X     f    fx
35    1    35
34    1    34
33    2    66
32    1    32
30    2    60
29    2    58
27    3    81
26    5   130
25    3    75
24    1    24
23    2    46
22    1    22
20    1    20
19    1    19

N = ∑f = 26    ∑fx = 702

m = ∑fx / N = 702 / 26 = 27
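The calculation in Table 1 can be reproduced directly from the frequency distribution (scores and frequencies copied from the table):

```python
# Frequency distribution from Table 1: score -> frequency.
scores = {35: 1, 34: 1, 33: 2, 32: 1, 30: 2, 29: 2, 27: 3,
          26: 5, 25: 3, 24: 1, 23: 2, 22: 1, 20: 1, 19: 1}

N = sum(scores.values())                        # number of candidates
total = sum(x * f for x, f in scores.items())   # ∑fx
mean = total / N                                # m = ∑fx / N

print(N, total, mean)  # 26 702 27.0
```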
Measures of Dispersion
One simple method of calculating the s.d. is shown below:

s.d. = √(∑d² / N)

1. Find the amount by which each score deviates from the mean (d);
2. Square each deviation (d²);
3. Total the squared deviations (∑d²);
4. Divide the total by the number of test-takers (∑d² / N); and
5. Find the square root of this result (√(∑d² / N)).
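The five steps can be sketched as follows (the scores are hypothetical, and the population formula with divisor N is assumed, as in the slide):

```python
import math

def standard_deviation(scores):
    """Population s.d. per the five steps: sqrt(sum(d^2) / N)."""
    N = len(scores)
    mean = sum(scores) / N
    deviations = [x - mean for x in scores]   # step 1: d
    squared = [d ** 2 for d in deviations]    # step 2: d²
    total = sum(squared)                      # step 3: ∑d²
    return math.sqrt(total / N)               # steps 4-5: √(∑d²/N)

# Hypothetical set of eight scores.
print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```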
• Standard deviation is also useful for providing information concerning characteristics of different groups.
• If, for example, the standard deviation on a certain test is 4.08 for one class but 8.96 on the same test for another class, then it can be inferred that the latter class is far more heterogeneous than the former.
3.0 Item Analysis
Item Difficulty (Facility Value)

FV = R / N

• R represents the number of correct answers and N the number of students taking the test.
• Thus, if 21 out of 26 students tested obtained the correct answer for one of the items, that item would have an index of difficulty (or facility value) of .81, or 81 per cent.
• In this case, the particular item is a fairly easy one, since 81 per cent of the students taking the test answered it correctly.
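A minimal sketch of FV = R/N, using 21 correct answers out of 26 test-takers:

```python
# Facility value FV = R / N: the proportion of test-takers
# who answered the item correctly.
def facility_value(correct, total):
    return correct / total

fv = facility_value(21, 26)
print(round(fv, 2))  # 0.81
```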
• Although an average facility value of .5, or 50 per cent, may be desirable for many public achievement tests and for a few progress tests (depending on the purpose for which one is testing), the facility value of a large number of individual items will vary considerably.
• While aiming for test items with facility values falling between .4 and .6, many test constructors may be prepared in practice to accept items with facility values between .3 and .7.
• Items with FV values close to zero (very "hard" items) or close to one (very "easy" items) do not, in general, contain much information about the performance of the target population, except for the tails of the distribution.
• For most applied measurement situations, tests having items with item difficulty values between 0.3 and 0.7, inclusive, will maximally differentiate students throughout the entire performance range.
• If, however, the intent is to discriminate at a specific level of performance (say, for purposes of selection), then items with FV values in the narrow interval consistent with that performance level are chosen.
Item Discrimination
• It is presupposed that the total score on the test is a valid measure of the student's ability (i.e. the good student tends to do well on the test as a whole and the poor student badly).
• On this basis, the score on the whole test is accepted as the criterion measure, and it thus becomes possible to separate the "good" students from the "poor" ones in performance on individual items.
• If the "good" students tend to do well on an item and the "poor" students badly on the same item, then the item is a good one, because it distinguishes the "good" from the "poor" in the same way as the total test score.
• This is the argument underlying the index of discrimination.
How to Find the Index of Discrimination (D)
• Arrange the scripts in rank order of total score and divide them into two groups of equal size (i.e. the top half and the bottom half). If there is an odd number of scripts, dispense with one script chosen at random.
• Count the number of candidates in the upper group answering the first item correctly; then count the number of lower-group candidates answering the item correctly.
• Subtract the number of correct answers in the lower group from the number of correct answers in the upper group: i.e. find the difference in the proportion passing in the upper group and the proportion passing in the lower group.
• Divide this difference by the total number of candidates in one group:
Discrimination Index Formula

D = (Correct U − Correct L) / n

• D = discrimination index
• n = number of candidates in one group
• U = upper half
• L = lower half
• (The index D is thus the difference between the proportions passing the item in U and in L.)
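The formula can be sketched directly (the figures used are from the worked example that follows: 15 upper-group and 6 lower-group correct answers, with n = 20 per group):

```python
# Discrimination index D = (Correct_U - Correct_L) / n, where n is
# the number of candidates in ONE group (half the test-takers).
def discrimination_index(correct_upper, correct_lower, n):
    return (correct_upper - correct_lower) / n

print(discrimination_index(15, 6, 20))  # 0.45
```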
Example
• The following item, which was taken from a test administered to 40 students, produced the results shown:

D = (15 − 6) / 20 = 9/20 = .45

• Such an item, with a discrimination index of .45, functions fairly effectively, although clearly it does not discriminate as well as an item with an index of .6 or .7.
• Discrimination indices can range from +1 (an item which discriminates perfectly, i.e. it shows perfect correlation with the candidates' results on the whole test), through 0 (an item which does not discriminate in any way at all), to −1 (an item which discriminates in entirely the wrong way).
Item Difficulty and Discrimination
The following table, compiled from the results of a test, shows how these
measures are recorded.
• Items showing a discrimination index of below .30 are of doubtful use, since they fail to discriminate effectively.
• Thus, on the results listed in the table above, only items 3, 4 and 5 could be safely used in future tests without being rewritten.
• However, many test writers would keep item 1 simply as a lead-in to put the students at ease.
      U    L   U + L
A.    1    4     5
B.    2    5     7
C.*  14    4    18
D.    3    7    10

FV = (U + L) / 2n = 18/40 = .45
D = (U − L) / n = 10/20 = .50

The item has a facility value of .45 and a discrimination index of .50, and appears to have functioned efficiently: the distractors attract the poorer students but not the better ones.
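Both indices can be computed together from the per-option counts in the table above (option letters and counts copied from the slide; the key is option C, with n = 20 candidates per group):

```python
# FV and D from per-option answer counts in the upper and lower groups.
def item_stats(upper, lower, key, n):
    """upper/lower map each option to the number of candidates choosing it;
    key is the correct option; n is the size of one group."""
    fv = (upper[key] + lower[key]) / (2 * n)   # FV = (U + L) / 2n
    d = (upper[key] - lower[key]) / n          # D  = (U - L) / n
    return fv, d

upper = {"A": 1, "B": 2, "C": 14, "D": 3}
lower = {"A": 4, "B": 5, "C": 4, "D": 7}
fv, d = item_stats(upper, lower, key="C", n=20)
print(fv, d)  # 0.45 0.5
```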
The performance of the following item, with a low discrimination index, is of particular interest:

      U    L   U + L
A.    7    3    10
B.    4    3     7
C.    1    9    10
D.*   8    5    13
    (20) (20)  (40)

FV = 13/40 = .325
D = (8 − 5) / 20 = .15
4.0 Moderating Class Tests
5.0 Item Banking