
Multiple Choice Test Item Analysis


Workshop Format
1. What is Multiple Choice Test Item
Analysis?
2. Background information
3. Fundamentals
4. Guided Practice
5. Individual Practice
What is Multiple Choice Test Item Analysis?
It is the statistical analysis of your multiple choice test items,
done so that you can ensure that your items are effectively
evaluating student learning.
Background information
What does a test score mean?
Reliability and Validity
Norm-referenced or Criterion-referenced
What does a Test Score Mean?
A test score reflects both what you really knew (the true
score) and error (things like atmosphere and nerves
that modify your true score).
The purpose of a systematic approach to test
design is to reduce error in test taking.
Reliability and Validity
Reliability: the test scores are consistent.
Test-retest reliability (an individual's score
is consistent over time)
Inter-rater reliability (individual judges' ratings
of a performance are consistent)
Validity: the test measured what it was supposed
to measure.

You want your test to be both reliable and valid.
Norm-referenced or Criterion-referenced
Norm-referenced: defines the performance of
test-takers in relation to one another. Uses the
frequency distribution and can rank students.
Often used to predict success, as with the GRE or GMAT.
Criterion-referenced: defines the performance of
each test-taker without regard to the performance
of others. Success means being able to perform a
specific task or set of competencies. Uses a
mastery curve.
Item analysis
How you interpret the results of a test and use
individual item statistics to improve the quality of a
test
Terms used
Standard deviation (SD): how far scores spread above and below the
average score; the more the scores are spread out,
the higher the SD
Mean: average score
N: number of items on the test
Raw scores: actual scores
Variance: standard deviation squared
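
As a quick illustration of these terms, the sketch below computes the mean, variance, and standard deviation of the ten raw scores from the practice table used later in this workshop. It relies on Python's standard `statistics` module and uses the population forms (`pvariance`, `pstdev`). That choice is an assumption, since the slides do not say whether population or sample formulas are intended.

```python
import statistics

# Raw scores for students A-J, copied from the practice table in this workshop.
raw_scores = [8, 6, 6, 4, 2, 8, 10, 6, 8, 4]

mean = statistics.mean(raw_scores)           # average score
variance = statistics.pvariance(raw_scores)  # population variance (assumed form)
sd = statistics.pstdev(raw_scores)           # population standard deviation

# The variance is the standard deviation squared, as stated in the terms list.
print(f"Mean = {mean:.2f}, Variance = {variance:.2f}, SD = {sd:.2f}")
# prints: Mean = 6.20, Variance = 5.16, SD = 2.27
```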
Fundamentals of Item Analysis

1. Were any of the items too difficult or easy?


2. Do the items discriminate between those
students who really knew the material from those
that did not?
3. What is the reliability of the exam?
1. Were any of the items too difficult
or too easy?
Use the Difficulty Factor of a question
Proportion of respondents selecting the right answer
to that item
D=c/n
D = difficulty factor
c = number of correct answers
n = number of respondents
Range: 0 to 1
The HIGHER the difficulty factor the easier the
question is, so a value of 1 would mean all the
students got the question correct and it may be too
easy
Difficulty Factor
Optimal Level is .5
To be able to discriminate between different levels
of achievement, the difficulty factor should be
between .3 and .7
If you want the students to master the topic area,
high difficulty values should be expected.

D=c/n
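
The bands above can be expressed as a small helper. This is only a sketch: the function name `interpret_difficulty` is hypothetical, the label "may be too difficult" is wording added here for values below .3, and treating .3 and .7 as inclusive bounds is an assumption.

```python
def interpret_difficulty(d: float) -> str:
    """Label a difficulty factor D using the bands given in the slides."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("D is a proportion and must lie between 0 and 1")
    if d == 0.5:
        return "optimal"
    if 0.3 <= d <= 0.7:  # assumed inclusive bounds
        return "good"
    # Outside .3-.7 the item separates achievement levels less well.
    return "may be too easy" if d > 0.7 else "may be too difficult"


# D = c / n: for example, 8 correct answers out of 10 respondents.
c, n = 8, 10
d = c / n
print(d, interpret_difficulty(d))
```

For a mastery-oriented test, a high D is expected, so "may be too easy" is a prompt to review the item rather than a verdict.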
Guided Practice
What is the D for Items 1-3?

Student  Raw score  Item 1  Item 2  Item 3  Item 4  Item 5
A            8        a       b       a       d       e
B            6        c       b       e       c       e
C            6        a       c       e       c       b
D            4        a       b       e       a       c
E            2        c       a       b       d       c
F            8        a       b       c       c       e
G           10        a       b       a       c       e
H            6        a       b       c       d       e
I            8        a       c       a       c       e
J            4        a       c       a       d       b
Difficulty Factor
Item # 1 = .8
Item # 2 = .6
Item # 3 = .4

What does it mean?
Item # 1 = .8 (may be too easy)
Item # 2 = .6 (good)
Item # 3 = .4 (good)
Individual Practice
What is the D for Items 4-5?

Student  Raw score  Item 1  Item 2  Item 3  Item 4  Item 5
A            8        a       b       a       d       e
B            6        c       b       e       c       e
C            6        a       c       e       c       b
D            4        a       b       e       a       c
E            2        c       a       b       d       c
F            8        a       b       c       c       e
G           10        a       b       a       c       e
H            6        a       b       c       d       e
I            8        a       c       a       c       e
J            4        a       c       a       d       b
Difficulty Factor
Item # 4 = .5
Item # 5 = .6

What does it mean?
Item # 4 = .5 (optimal)
Item # 5 = .6 (good)

Overall, you can say that only Item # 1 may be too easy.
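
The difficulty factors for all five items can be checked with a short script. The slides do not give an answer key, so the `ANSWER_KEY` below is an assumption: it is the set of options that reproduces the stated D values for every item. The function name `difficulty_factors` is also hypothetical.

```python
# Responses of students A-J to Items 1-5, copied from the practice table.
responses = {
    "A": "abade", "B": "cbece", "C": "acecb", "D": "abeac", "E": "cabdc",
    "F": "abcce", "G": "abace", "H": "abcde", "I": "acace", "J": "acadb",
}

# ASSUMED answer key (not given in the slides): the options that reproduce
# the difficulty factors stated for Items 1-5.
ANSWER_KEY = "abace"


def difficulty_factors(responses, key):
    """Return D = c / n for each item, where c counts correct answers."""
    n = len(responses)  # number of respondents
    return [
        sum(answers[i] == correct for answers in responses.values()) / n
        for i, correct in enumerate(key)
    ]


for item, d in enumerate(difficulty_factors(responses, ANSWER_KEY), start=1):
    print(f"Item # {item} = {d:.1f}")
```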
2. Do the items discriminate between those students
who really knew the material from those that did not?

The Discrimination Index


DI = (a - b) / n
a = response frequency of the High group (number answering the item correctly)
b = response frequency of the Low group (number answering the item correctly)
n = number of respondents

Point-Biserial Correlation


The point-biserial correlation relates each test-taker's performance
on a single test item to their total score.
Range: +1.00 to -1.00
Items that discriminate well tend to
have difficulty factors between .3 and .7.
A positive coefficient means that test-takers who got
the item right generally did well on the test as a
whole, while those who got the item wrong generally did
poorly on the test.
A negative coefficient means that test-takers who
did well on the test tended to miss the item, while those
who did poorly tended to get the item right.
A zero coefficient means the item does not separate stronger from
weaker test-takers, as happens when all test-takers got the
item correct or all got it incorrect.
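
A minimal sketch of the point-biserial calculation, applied to the practice table, is shown below. It uses the common textbook form r = (M1 - M0) / s × sqrt(p × q), where M1 and M0 are the mean total scores of those who got the item right and wrong, s is the standard deviation of all total scores, and p is the proportion correct. Several things here are assumptions: the answer key (not given in the slides), the use of the raw-score column as the total score, the population form of s, and the function name `point_biserial`. Results may therefore differ slightly from figures computed another way.

```python
import math
import statistics

# (raw score, answers to Items 1-5) for students A-J, from the practice table.
students = {
    "A": (8, "abade"), "B": (6, "cbece"), "C": (6, "acecb"), "D": (4, "abeac"),
    "E": (2, "cabdc"), "F": (8, "abcce"), "G": (10, "abace"), "H": (6, "abcde"),
    "I": (8, "acace"), "J": (4, "acadb"),
}
ANSWER_KEY = "abace"  # ASSUMED key; the slides do not state one


def point_biserial(students, key, item):
    """Correlate right/wrong on one item (0-based index) with total score."""
    scores = [score for score, _ in students.values()]
    right = [s for s, ans in students.values() if ans[item] == key[item]]
    wrong = [s for s, ans in students.values() if ans[item] != key[item]]
    sd = statistics.pstdev(scores)  # population SD (assumed form)
    if not right or not wrong or sd == 0:
        # Everyone right or everyone wrong: reported as zero, as in the slides.
        return 0.0
    p = len(right) / len(scores)  # proportion answering correctly
    mean_gap = statistics.mean(right) - statistics.mean(wrong)
    return mean_gap / sd * math.sqrt(p * (1 - p))


for i in range(len(ANSWER_KEY)):
    print(f"Item # {i + 1} = {point_biserial(students, ANSWER_KEY, i):.2f}")
```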

The Discrimination Index Steps


1. Rank test scores from highest to lowest, so the
highest is at the top of the list
2. Define high group (top 27%)
3. Define low group (bottom 27%)
4. Calculate DI = (a - b) / n
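
The four steps can be sketched in Python using the practice table. Treat this as one possible reading rather than a definitive method: the answer key is an assumption, the 27% group size is rounded to the nearest whole student (3 of 10), ties in raw score are broken by table order, and n is read as the number of respondents in each group, which is a common convention but is not spelled out in the slides. The function name `discrimination_index` is hypothetical.

```python
# (raw score, answers to Items 1-5) for students A-J, from the practice table.
students = {
    "A": (8, "abade"), "B": (6, "cbece"), "C": (6, "acecb"), "D": (4, "abeac"),
    "E": (2, "cabdc"), "F": (8, "abcce"), "G": (10, "abace"), "H": (6, "abcde"),
    "I": (8, "acace"), "J": (4, "acadb"),
}
ANSWER_KEY = "abace"  # ASSUMED key; the slides do not state one


def discrimination_index(students, key, item, fraction=0.27):
    """Return DI = (a - b) / n for one item (0-based index)."""
    # Step 1: rank test scores from highest to lowest (stable for ties).
    ranked = sorted(students.values(), key=lambda s: s[0], reverse=True)
    # Steps 2-3: define the high and low groups (top and bottom 27%).
    group_size = max(1, round(fraction * len(ranked)))
    high, low = ranked[:group_size], ranked[-group_size:]
    # Step 4: a and b count correct responses in each group.
    a = sum(ans[item] == key[item] for _, ans in high)
    b = sum(ans[item] == key[item] for _, ans in low)
    return (a - b) / group_size  # n taken as the group size (assumption)


for i in range(len(ANSWER_KEY)):
    print(f"Item # {i + 1}: DI = {discrimination_index(students, ANSWER_KEY, i):.2f}")
```

With n read as the group size, DI runs from -1 to +1, which matches the range of the point-biserial correlation and makes the two measures easier to compare.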
What does it mean?
Point Biserial
Item # 1 = .48
Item # 2 = .43
Item # 3 = .47
Item # 4 = .62
Item # 5 = .83

All five coefficients are clearly positive; Item 5 (.83) discriminates most strongly.

Overall the test does discriminate.
3. What is the reliability of the exam?
1. Kuder-Richardson 20
2. Kuder-Richardson 21
3. Cronbach's alpha
Range: 0 to 1
A higher value indicates a stronger relationship
between the items and the test
A lower value indicates a weaker relationship
between the items and the test

r = [n / (n - 1)] × [(s² - Σ pᵢqᵢ) / s²]

n = number of items on the test
s² = variance of the total test scores (s = standard deviation)
pᵢ = proportion of correct responses to item i
qᵢ = 1 - pᵢ
Σ = sum over all n items
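
A minimal sketch of Kuder-Richardson 20 in its standard form, r = [n / (n - 1)] × [(s² - Σ pq) / s²], is given below using the practice-table responses. The answer key is an assumption (the slides do not give one), total scores are recomputed as the number of items correct rather than taken from the raw-score column, the population variance is used for s², and the function name `kr20` is hypothetical.

```python
import statistics

# Answers of students A-J to Items 1-5, copied from the practice table.
responses = [
    "abade", "cbece", "acecb", "abeac", "cabdc",
    "abcce", "abace", "abcde", "acace", "acadb",
]
ANSWER_KEY = "abace"  # ASSUMED key; the slides do not state one


def kr20(responses, key):
    """Kuder-Richardson 20 reliability for dichotomously scored items."""
    n = len(key)  # number of items on the test
    # Score every answer as 1 (correct) or 0 (incorrect).
    scored = [[int(ans[i] == key[i]) for i in range(n)] for ans in responses]
    totals = [sum(row) for row in scored]  # number correct per student
    s2 = statistics.pvariance(totals)      # population variance (assumed form)
    if n < 2 or s2 == 0:
        raise ValueError("KR-20 needs at least two items and varying totals")
    sum_pq = 0.0
    for i in range(n):
        p = sum(row[i] for row in scored) / len(scored)  # proportion correct
        sum_pq += p * (1 - p)                            # p times q for item i
    return (n / (n - 1)) * (s2 - sum_pq) / s2


print(f"KR-20 = {kr20(responses, ANSWER_KEY):.2f}")
```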
What does it mean?
Kuder 20
Item # 1 = .88
Item # 2 = .63
Item # 3 = .40
Item # 4 = .76
Item # 5 = .89

Item 3 may not relate as well.

Overall the test is reliable.
Review
Purpose - statistically analyze multiple choice
test items to ensure items are effectively
evaluating student learning.
1. Were any of the items too difficult or easy?
(Difficulty index)
2. Do the items discriminate between those
students who really knew the material from those
that did not? (Discrimination index or Point
Biserial)
3. What is the reliability of the exam? (Kuder 20)
More Practice

Item  Difficulty  Discrimination  Reliability
#1       .28           .40            .80
#2       .30           .68            .76
#3       .80           .78            .70
#4       .10         -1.00            .20


Thank you for your Time

Any Questions or Comments?