Chapter 5: Reliability
Everyday Psychometrics: The Importance of the Method Used for Estimating Reliability
Self-Assessment
Term to Learn
Reliability: A synonym for dependability and consistency. Broadly speaking, in the language
of psychometrics, reliability refers to consistency in measurement.
Green, C. E., Chen, C. E., Helms, J. E., & Henze, K. T. (2011). Recent Reliability Reporting
Practices in Psychological Assessment: Recognizing the People Behind the Data.
Psychological Assessment, 23(3), 656–669.
Kieffer, K. M., & MacDonald, G. (2011). Exploring Factors that Affect Score Reliability and
Variability in the Ways of Coping Questionnaire Reliability Coefficients: A Meta-Analytic
Reliability Generalization Study. Journal of Individual Differences, 32(1), 26–38.
Markon, K. E., Chmielewski, M., & Miller, C. J. (2011). The Reliability and Validity of Discrete
and Continuous Measures of Psychopathology: A Quantitative Review. Psychological
Bulletin, 137(5), 856–879.
What does it mean when a tool of assessment is characterized as reliable? Under what conditions
might one expect an otherwise useful tool of assessment to be unreliable?
Here is a list of questions that may be used to stimulate class discussion, as well as critical and
generative thinking, about some of the material presented in this chapter.
1. Ask the class to draw parallels between a reliable person and a reliable test. What are the
similarities and differences between the two?
3. Create a list of various types of tests and measurements on the board. Ask students to
name the type(s) of reliability estimate that would be appropriate for each test. You may
also ask them to name the statistic that should be calculated in each case.
A sample list of tests is as follows:
a. Typing test (timed)
b. Color blindness
c. Weight of an individual, measured every month from the age of 6 weeks until the age of 21 years
d. Intelligence
e. Mood
f. Weight of melting ice cubes
g. Test of reaction time
h. Multiple choice exams in this course for midterm and final (two different exams)
i. Test of test anxiety
j. Essay exam in an English literature course
k. Presidential preference poll
l. Art aptitude test, which includes judging the quality of a clay sculpture
4. Why are different methods needed for estimating the reliability of norm-referenced tests
and criterion-referenced tests?
5. Why is an observed score always represented as the sum of the true score and error?
6. What factors affect the test-retest reliability of the tests designed to measure the developing
cognitive and motor skills of infants?
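For question 5, a short simulation can make the classical model concrete. The sketch below is a minimal illustration, not a procedure from the text: it assumes normally distributed true scores and random errors that have a mean of zero and are independent of the true scores (all parameter values are arbitrary). It builds each observed score as X = T + E and estimates reliability as the ratio of true-score variance to observed-score variance, which should land near 10² / (10² + 5²) = .80.

```python
import random
import statistics

def simulate_ctt(n_examinees=1000, true_mean=50.0, true_sd=10.0, error_sd=5.0, seed=1):
    """Simulate observed scores under the classical model X = T + E."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_examinees)]
    # Errors are random, mean zero, and drawn independently of the true scores.
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_examinees)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed

true_scores, errors, observed = simulate_ctt()
var_t = statistics.pvariance(true_scores)  # true-score variance
var_x = statistics.pvariance(observed)     # observed-score variance
reliability = var_t / var_x                # proportion of observed variance that is true variance
print(f"Var(T) = {var_t:.1f}, Var(X) = {var_x:.1f}, reliability = {reliability:.2f}")
```

Raising `error_sd` in the call to `simulate_ctt` shows students how reliability drops as error variance grows while true-score variance stays fixed.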
In-Class Demonstrations
Bring the following materials to class to demonstrate some of the concepts discussed in the
chapter.
a. Test manuals of tests that do and do not employ a true score model
Bring some manuals for tests that do and do not employ a true score model of
measurement to class. Discuss the similarities and differences between the
information presented in the manuals.
Bring students’ scores from the past two quizzes or examinations in your course to class.
Scores from the measurement course or another course may be substituted. Have students
analyze the data for test-retest reliability.
Bring a copy of the most recent edition of the Standards for Educational and Psychological
Testing to class, and discuss the material dealing with reliability and errors of measurement.
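For the analysis of quiz or examination scores described above, students can compute the coefficient by hand or in a spreadsheet. As a cross-check, the sketch below computes a Pearson product-moment correlation in plain Python. The function name `pearson_r` and the eight pairs of quiz scores are hypothetical, invented only to show the calculation.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))  # sum of cross-products of deviations
    return cross / math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))

# Hypothetical scores for the same eight students on two quizzes (illustration only).
quiz_1 = [18, 15, 20, 12, 17, 14, 19, 11]
quiz_2 = [17, 14, 19, 13, 18, 12, 20, 10]
print(f"r = {pearson_r(quiz_1, quiz_2):.2f}")
```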
Create two forms of a “Basic Arithmetic” test (“Form A” and “Form B”) that could
reasonably be administered to your class in five minutes. The test should tap
students’ knowledge of basic arithmetic operations such as addition, subtraction,
multiplication, and division. Administer Form A with a five-minute time limit, under
regular test-taking conditions. Collect the exam papers. Now, administer Form B also
with a five-minute time limit. During the administration of Form B, however, create
adverse testing conditions for students by playing loud or distracting music, flashing
the lights on and off, and so on. Collect the papers. Now, redistribute all of the papers
so that different students are marking different students’ tests and no student is
marking his or her own paper. Using the group data, calculate Spearman’s rank-order
correlation coefficient (rho) between Form A and Form B scores; because two different forms
are involved, this is a measure of alternate-forms reliability. Is it lower than expected
because of the various sources of error variance introduced?
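A rank-order (Spearman’s rho) coefficient for the Form A and Form B scores can be checked with the sketch below. It is a minimal illustration under stated assumptions: tied scores receive the mean of their ranks, and rho is computed as the Pearson correlation of the two sets of ranks, which stays correct when ties occur. The function names and the six students’ scores are hypothetical.

```python
import math

def average_ranks(scores):
    """Rank scores from 1 (lowest) upward; tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores that starts at sorted position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two sets of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cross = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical items-correct counts for six students (illustration only).
form_a = [24, 19, 30, 22, 27, 15]  # quiet conditions
form_b = [18, 17, 21, 12, 16, 11]  # distracting conditions
print(f"rho = {spearman_rho(form_a, form_b):.2f}")
```

When there are no ties, this gives the same value as the familiar shortcut 1 − 6Σd² / (n(n² − 1)), so students can verify the result by hand.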
Create a simple speed test in which the test taker’s task is to correctly alphabetize a
list of words. Administer the test to the class. As a class, calculate the split-half
reliability of the resulting data. Then, discuss why such an approach to estimating
reliability is inappropriate for use in determining the reliability of a speeded test.
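To preview the discussion, the sketch below shows why an odd-even split-half coefficient is spuriously high on a speeded test. It assumes a hypothetical 40-item test of easy items on which every student answers each attempted item correctly and differs only in how far he or she got before time ran out. Under that assumption the odd-item and even-item half scores are nearly identical by construction, so the half-test correlation, and the Spearman-Brown estimate built from it, approach 1.0 regardless of how consistently the test measures anything.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def odd_even_split_half(item_matrix):
    """Correlate odd-item and even-item half scores, then apply the Spearman-Brown formula."""
    odd = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    r_full = (2 * r_half) / (1 + r_half)            # estimate for the full-length test
    return r_half, r_full

# Hypothetical speeded test (illustration only): 1 = correct, 0 = not reached.
N_ITEMS = 40
attempted = [12, 18, 21, 25, 27, 30, 33, 38]
responses = [[1 if i < k else 0 for i in range(N_ITEMS)] for k in attempted]

r_half, r_full = odd_even_split_half(responses)
print(f"half-test r = {r_half:.3f}, Spearman-Brown estimate = {r_full:.3f}")
```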
Invite a guest speaker to class. The guest speaker could be any of the individuals listed
below.
a. A faculty member
Invite a faculty member (from your university or a neighboring one) who is an expert
in item response theory, classical test theory, or the area of reliability to elaborate on
the material presented in this chapter.
Invite a local user of psychological tests from any setting who can elaborate on the
principles of reliability as used in everyday work.
Divide the class into three groups: (1) WDHI test developers, (2) investors, and (3)
advisers to investors. Members of Group 1 play the role of the test developers; their task
is to pitch a well-informed proposal to Group 2 seeking investment funds to develop their
test. Members of Group 2, who are also well informed about many assessment issues,
particularly reliability, should question Group 1 about the test, especially about the
reasons for investing in it. After the exchange between Groups 1 and 2, members of Group 3
act as advisers to Group 2 and advise them on whether the WDHI test would be a good
investment.
2. Debate: Classical Test Theory (CTT) versus Item Response Theory (IRT)
Give students the following topic: Should all psychological tests developed from the
present day forward rely on CTT or IRT? Ask them to research the relevant issues on the
given topic and come prepared for a debate in the next class. One half of the students in the
class will be assigned to the “CTT” team, while the other half of the students will be
assigned to the “IRT” team. Students could be encouraged to make homemade T-shirts with the
appropriate lettering as team “uniforms” for the debate.
Arrange for the class to go on a field trip to the locations given below.
Arrange for the class to visit the human resources (HR) department of a local
business or a large corporation that employs psychological tests to see how they are
used in practice, with a specific focus on reliability issues.
Arrange for the class to visit a local consumer research firm that uses statistical
methods. Ask a member of the firm to discuss the testing the firm conducts and the
measures in place to ensure the reliability of its test data.
Suggested Assignments
Critically evaluate any existing ability test with regard to all of the possible sources of error
that may be inherent in measurement. Explain why some sources of error are likely to be
greater than others in magnitude for this particular test.
Create a table with two headings: Appropriate and Inappropriate. For each method of
estimating reliability, list at least one type of test for which the method would be
appropriate and at least one for which it would not. For example, your listing should
include one type of test for which the test-retest method of estimating reliability would
be appropriate, and one type of test for which the test-retest method would be inappropriate.
3. Read-then-Discuss Exercises
Prior to class, do some independent reading about item response theory and classical
test theory. During class, be prepared to discuss how the two theories differ.
Check out and review a test manual from your college or university test library. Then,
write a report on the types of reliability estimates provided for the test you chose.
Moore (1981) suggested using the measurement of lines to teach related reliability
concepts such as true score, true variance, and the standard error of measurement.
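The exercise itself is described in Moore (1981); the sketch below only illustrates one of the concepts it targets, the standard error of measurement (SEM), using the classical formula SEM = SD × √(1 − r_xx). The numbers are hypothetical (observed line lengths with a standard deviation of 10 mm and an estimated reliability of .91), and the ±1.96 SEM band around an observed score is the usual approximate 95% interval for locating the true score.

```python
import math

def standard_error_of_measurement(sd_observed, reliability):
    """Classical SEM: observed-score SD times the square root of (1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)

def confidence_band(observed_score, sem, z=1.96):
    """Approximate band expected to contain the true score about 95% of the time."""
    return observed_score - z * sem, observed_score + z * sem

# Hypothetical line-measurement values (illustration only).
sem = standard_error_of_measurement(sd_observed=10.0, reliability=0.91)
low, high = confidence_band(observed_score=120.0, sem=sem)
print(f"SEM = {sem:.1f} mm; 95% band for a 120 mm reading: {low:.1f} to {high:.1f} mm")
```

Running it with a lower reliability (for example, .64) widens the band, which makes the inverse relation between reliability and measurement error easy to see.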
Media Resources
On the Web
http://www.ehd.org/science_technology_testresults.php
This website presents applied examples of test reliability from fields other than
psychology.
http://www.youtube.com/watch?v=DS8Hw0Ort4w&feature=related
http://www.youtube.com/watch?v=LolwQXYjuh8&feature=related
https://www.youtube.com/watch?v=CZQlqVswAq8
This three-part video segment explains aspects of test reliability and validity.
http://edres.org/irt/
This website provides information on item response theory.
References
Buck, J. (1992). When True Scores Equal Zero: A Reply to Allen. Teaching of Psychology, 19,
111–112.