Chap005 Testbank
Chapter 05 - Reliability
Chapter 05
Reliability
1. According to Gil et al. (2016), which of the following is a source of error in scores on
psychological tests?
A. whether or not the examiner has a beard
B. whether the testtaker's country is at war or peace
C. the body weight of the testtaker two weeks prior to the test
D. None of these
2. Hawkins et al. (2016) found that subjects with _____ fasting glucose levels made nearly _____
times as many errors as subjects with fasting glucose levels in the normal range.
A. low; one-quarter
B. high; one-quarter
C. high; four times
D. high; twice
3. Berman et al. (2015) observed that one source of error in evaluations of the suicide risk of
patients is
A. whether or not the patient has previously attempted suicide.
B. whether or not the clinician previously had a patient attempt suicide.
C. how religious the evaluating clinician is.
D. how religious the evaluated patient is.
Copyright © 2018 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of
McGraw-Hill Education.
6. What has been called a "replicability crisis" in psychology emerged as a result of a number
of factors. Which is not one of those factors?
A. a general lack of published attempts to replicate research
B. editorial preferences for papers with positive findings
C. questionable research practices on the part of study authors
D. unwillingness or inability of original study authors to share data
7. Makel et al. (2012) observed that only about _____ of the published literature replicated
previous work.
A. 1%
B. 3%
C. 5%
D. 7%
11. Unreliable findings that reach general acceptance in the academic community
A. tend to self-correct.
B. tend to linger too long.
C. become exposed through social media.
D. are not admissible in a court of law.
14. As compared to what was business as usual in the past, more researchers are coming to the
realization that replication is
A. really not as necessary as what researchers once thought.
B. not something that can ever completely "right" past wrongs.
C. mandatory given the influence of social media.
D. needed if published findings are to be relied on.
15. With critical variables in a research study held constant, different methods used to
estimate reliability will typically yield
A. virtually no differences in the magnitude of the estimate.
B. sizable differences in the magnitude of the estimate.
C. skewed estimates of reliability.
D. identical estimates of reliability.
16. Generally, diagnostic reliability is necessary. However, which of the following is NOT a
reason that diagnostic reliability is necessary?
A. It is necessary for accurate diagnosis.
B. It is necessary for any double-blind study.
C. It is necessary to determine the effectiveness of treatments.
D. It is necessary to track changes in a disorder over time.
17. Field trials of DSM-5 demonstrated a mean kappa that was indicative of a _____ level of
agreement among raters.
A. poor
B. fair
C. good
D. "kinder and gentler"
18. Prior to research on inter-rater reliability for DSM-5, DSM inter-rater reliability estimates
were obtained using the _____ method.
A. test-retest
B. paired-paragraph
C. audio-recording
D. one-way mirror
19. Prior to DSM-5, a problem with the primary method used to estimate reliability of the
DSM was that the method
A. did not allow for truly independent judgments.
B. resulted in overestimates of reliability.
C. artificially constrained information provided to clinicians.
D. All of these
21. Which of the following terms is used in your textbook to describe the test-retest method of
estimating diagnostic reliability?
A. methodologically sound
B. artificially constrained
C. psychometrically balanced
D. ecologically valid
22. In their study of the diagnostic reliability of DSM-IV diagnoses, Chmielewski et al. (2015)
used the "gold standard" in diagnostic instruments. The tool they used was the
A. MAST-2.
B. SCID I/P.
C. SCI-5.
D. Semi-Structured Diagnostic Interview (SSDI).
23. In classical test theory, an observed score on an ability test is presumed to represent the
testtaker's
A. true score.
B. true score less the variance.
C. true score combined with extraneous factors.
D. true score and error.
24. In an illustrative scenario described in Chapter 5 of your text, a group of 12th grade "whiz
kids" in math, newly arrived to the United States from China, perform poorly on a test of 12th
grade math. According to the text, what probably accounted for this?
A. lower standards in China as compared to the US for measuring math ability.
B. higher standards in the US as compared to China for earning high grades.
C. the ability of the Chinese students to read what was required in English.
D. the reliability of the instrument used to test 12th grade math skills.
27. A Wall Street securities firm that is actually located on Wall Street is testing a group of
candidates for their aptitude in finance and business. As the testing begins, an unexpected
"Occupy Wall Street" sit-in takes place. From a psychometric perspective in the context of
this testing, the sit-in is viewed as
A. systematic error.
B. random error.
C. test administration error.
D. background error.
28. A research study entails behavioral observation and rating of front desk clerks in the
hospitality industry to determine whether or not they greet guests with a smile. Which type of
error is this test most susceptible to?
A. test administration error
B. test construction error
C. examiner-related error
D. polling error
30. Stanley (1971) wrote that in classical test theory, a so-called "true score" is "not the
ultimate fact in the book of the recording angel." By this, Stanley meant that
A. it would be imprudent to trust in Divine influence when estimating variance.
B. the amount of test variance that is true relative to error may never be known.
C. it is near impossible to separate fact from fiction with regard to "true scores."
D. All of these
31. The term test heterogeneity BEST refers to the extent to which test items measure
A. different factors.
B. the same factor.
C. a unifactorial trait.
D. a nonhomogeneous trait.
34. Cronbach's alpha is to similarity of scores on test items as average proportional distance is
to
A. difference in scores on test items.
B. inter-item consistency.
C. test-retest reliability.
D. parallel forms reliability.
35. One of the problems associated with classical test theory has to do with
A. the notion that there is a "true score" on a test has great intuitive appeal.
B. the fact that CTT assumptions are often characterized as "weak."
C. its assumptions concerning the equivalence of all items on a test.
D. its assumptions allow for its application in most situations.
36. Which of the following is NOT an alternative to classical test theory cited in your text?
A. generalizability theory
B. representational theory
C. domain sampling theory
D. latent trait theory
38. The multiple-choice test items on this examination (yes, the one that you're taking right at
this moment) are all examples of
A. dichotomous test items.
B. latent trait test items.
C. polytomous test items.
D. None of these
45. Why might ability test scores among testtakers most typically vary?
A. because of the true ability of the testtaker
B. because of irrelevant, unwanted influences
C. Both because of the true ability of the testtaker and because of irrelevant, unwanted
influences
D. None of these
48. Which type of reliability estimate is obtained by correlating pairs of scores from the same
person (or people) on two different administrations of the same test?
A. a parallel-forms estimate
B. a split-half estimate
C. a test-retest estimate
D. an au-paire estimate
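Study note: correlating pairs of scores from two administrations is usually done with a Pearson correlation. The sketch below is a minimal illustration, not a method taken from the textbook; the function name `pearson_r` and all scores are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each administration
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores for the same five testtakers on two administrations
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 13, 14, 19, 21]
test_retest_r = pearson_r(first_administration, second_administration)
print(round(test_retest_r, 3))
```

The function assumes the scores vary; if every testtaker earned the same score on one administration, the denominator would be zero and the coefficient undefined.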
49. Which type of reliability estimate would be appropriate only when evaluating the
reliability of a test that measures a trait that is presumed to be relatively stable over time?
A. parallel-forms
B. alternate-forms
C. test-retest
D. split-half
52. Which of the following is TRUE for estimates of alternate- and parallel-forms reliability?
A. Two test administrations with the same group are required.
B. Test scores may be affected by factors such as motivation, fatigue, or intervening events
like practice, learning, or therapy.
C. Item sampling is a source of error variance.
D. All of these
54. Which source of error variance affects parallel- or alternate-form reliability estimates but
does not affect test-retest estimates?
A. fatigue
B. learning
C. practice
D. item sampling
55. Which of the following types of reliability estimates is the most expensive due to the costs
involved in test development?
A. test-retest
B. parallel-form
C. internal-consistency
D. Spearman's rho
56. What term refers to the degree of correlation between all the items on a scale?
A. inter-item homogeneity
B. inter-item consistency
C. inter-item heterogeneity
D. parallel-form reliability
58. Which of the following is usually minimized when using split-half estimates of reliability
as compared with test-retest or parallel/alternate-form estimates of reliability?
A. time and expense
B. reliability and validity
C. reliability only
D. time spent in scoring and interpretation
59. Which of the following factors may influence a split-half reliability estimate?
A. fatigue
B. anxiety
C. item difficulty
D. All of these
63. Typically, adding items to a test will have what effect on the test's reliability?
A. Reliability will decrease.
B. Reliability will increase.
C. Reliability will stay the same.
D. Reliability will first increase and then decrease.
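Study note: the relationship between test length and reliability is commonly quantified with the Spearman-Brown formula. The sketch below assumes the added or removed items are comparable to the existing ones; the function name `spearman_brown` and the starting reliability of .80 are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length.

    length_factor is new length / original length (2.0 doubles the test).
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical test with a reliability of .80
doubled = spearman_brown(0.80, 2.0)    # twice as many comparable items
shortened = spearman_brown(0.80, 0.8)  # e.g. 100 items cut to 80
print(round(doubled, 3), round(shortened, 3))
```

With these illustrative values the predicted coefficient rises when the test is lengthened and falls when it is shortened, which is the pattern the formula always produces for a positive starting reliability below 1.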
65. If items from a test are measuring the same trait, estimates of reliability yielded from
split-half methods will typically be _____ as compared to estimates from KR-20.
A. higher
B. lower
C. similar
D. approximately the same
66. Which of the following is NOT an acceptable way to divide a test when using the split-
half reliability method?
A. Randomly assign items to each half of the test.
B. Assign odd-numbered items to one half and even-numbered items to the other half of the
test.
C. Assign the first-half of the items to one half of the test and the second half of the items to
the other half of the test.
D. Assign easy items to one half of the test and difficult items to the other half of the test.
67. If items on a test are measuring very different traits, estimates of reliability yielded from
split-half methods will typically be _____ as compared with estimates from KR-20.
A. higher
B. lower
C. similar
D. approximately the same
68. KR-20 is the statistic of choice for tests with which types of items?
A. multiple-choice
B. true-false
C. All of these
D. None of these
70. Many assumptions must be met when using KR-21 to estimate reliability. Which is NOT
such an assumption?
A. Items should be dichotomous.
B. Items should be of equal difficulty.
C. Items should be homogeneous.
D. Items should be scorable by computer.
71. Which of the following is, generally speaking, the preferred statistic for obtaining a
measure of internal-consistency reliability?
A. KR-20
B. KR-21
C. Kendall's Tau
D. coefficient alpha
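Study note: coefficient alpha can be computed from item variances and the variance of total scores. The sketch below is a minimal illustration using population variances; the function name `coefficient_alpha` and the response data are hypothetical. When every item is scored 0 or 1, this calculation gives the same value as KR-20.

```python
from statistics import pvariance

def coefficient_alpha(scores):
    """Cronbach's coefficient alpha.

    scores: list of rows, one per testtaker, each a list of item scores.
    """
    k = len(scores[0])              # number of items
    items = list(zip(*scores))      # transpose: one tuple of scores per item
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical right/wrong (1/0) data: 4 testtakers, 3 items
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = coefficient_alpha(data)
print(alpha)
```

For this small data set the item variances sum to 0.625 and the total-score variance is 1.25, so alpha works out to 0.75. The function assumes at least two items and some variation in total scores.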
72. Coefficient alpha is appropriate to use with all of the following test formats EXCEPT
A. multiple-choice.
B. true-false.
C. short-answer for which partial credit is awarded.
D. essay exam with no partial credit awarded.
78. Which BEST conveys the meaning of an inter-scorer reliability estimate of .90?
A. Ninety percent of the scores obtained are reliable.
B. Ninety percent of the variance in the scores assigned by the scorers was attributed to true
differences and 10% to error.
C. Ten percent of the variance in the scores assigned by the scorers was attributed to true
differences and 90% to error.
D. Ten percent of the test's items are in need of revision according to the majority of the test's
users.
79. When more than two scorers are used to determine inter-scorer reliability, the statistic of
choice is
A. Pearson r.
B. Spearman's rho.
C. KR-20.
D. coefficient alpha.
80. For determining the reliability of tests scored using nominal scales of measurement, the
statistic of choice is
A. Kendall's Tau.
B. the Kappa statistic.
C. KR-20.
D. coefficient alpha.
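Study note: for two raters assigning nominal categories, agreement corrected for chance is often computed as Cohen's kappa. The sketch below assumes exactly two raters (an assumption; other kappa variants exist for more raters), and the function name `cohens_kappa` and the ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category counts
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical diagnoses from two clinicians for the same four patients
clinician_1 = ["anxiety", "anxiety", "depression", "depression"]
clinician_2 = ["anxiety", "depression", "depression", "depression"]
kappa = cohens_kappa(clinician_1, clinician_2)
print(kappa)
```

Here the raters agree on 3 of 4 cases (0.75) while chance agreement is 0.50, giving a kappa of 0.5. The function is undefined when chance agreement equals 1, for example when both raters use a single category throughout.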
82. Which type(s) of reliability estimates would be most appropriate for a measure of heart
rate?
A. test-retest
B. alternate-form
C. parallel form
D. internist consistency
83. If a time limit is long enough to allow test-takers to attempt all items, and if some items
are so difficult that no test-taker is able to obtain a perfect score, then the test is referred to as
a _____ test.
A. speed
B. power
C. reliable
D. valid
85. Which type(s) of reliability estimates would be appropriate for a speed test?
A. test-retest
B. alternate-form
C. split-half from two independent testing sessions
D. All of these
86. Which of the following would result in the LEAST appropriate estimate of reliability for a
speed test?
A. test-retest
B. alternate-form
C. split-half from a single administration of the test
D. split-half from two independent testing sessions
87. A Kuder-Richardson (KR) or split-half estimate of reliability for a speed test would
provide an estimate that is
A. spuriously low.
B. spuriously high.
C. insignificant.
D. equal to a test-retest method.
88. A measure of clerical speed is obtained by a test that has respondents alphabetize index
cards. The manual for this test cites a split-half reliability coefficient for a single
administration of the test of .95. What might you conclude?
A. The test is highly reliable.
B. The published reliability estimate is spuriously low and would have been higher had
another estimate been used.
C. The split-half estimate should not have been used in this instance.
D. Clerical speed is too vague a construct to measure.
89. The Spearman-Brown formula can be used for which types of tests?
A. speed and multiple-choice
B. true-false and multiple-choice
C. speed, true-false, and multiple-choice
D. road or driving tests
93. Traditional measures of reliability are inappropriate for criterion-referenced tests because
variability
A. is maximized with criterion-referenced tests.
B. is minimized with criterion-referenced tests.
C. is variable with criterion-referenced tests.
D. cannot be determined with criterion-referenced tests.
94. If traditional measures of reliability are applied to a criterion-referenced test, the reliability
estimate will likely be
A. spuriously low.
B. spuriously high.
C. exactly zero.
D. None of these
95. The fact that the length of a test influences the size of the reliability coefficient is based on
which theory of measurement?
A. classical test theory (CTT)
B. generalizability theory
C. domain sampling theory
D. item response theory (IRT)
96. Which estimate of reliability is most consistent with the domain sampling theory?
A. test-retest
B. alternate-form
C. internal-consistency
D. interscorer
97. Classical reliability theory estimates the portion of a test score that is attributed to
_____, and domain sampling theory estimates _____.
A. specific sources of variation; error
B. error; specific sources of variation
C. the skills being measured; variation
D. the skills being measured; content knowledge
100. The standard deviation of a theoretically normal distribution of test scores obtained by
one person on equivalent tests is
A. the standard error of the difference between means.
B. the standard error of measurement.
C. the standard deviation of the reliability coefficient.
D. the variance.
101. Which of the following is NOT a part of the formula for the standard error of
measurement for a particular test?
A. the validity of the test
B. the reliability of the test
C. the standard deviation of the group of test scores
D. Both the reliability of the test and the standard deviation of the group of test scores
102. "Sixty-eight percent of the scores for a particular test fall between 58 and 61" is a
statement regarding
A. the utility of a test.
B. the reliability of a test.
C. the validity of a test.
D. None of these
103. The standard error of measurement of a particular test of anxiety is 8. A student earns a
score of 60. What is the confidence interval for this test score at the 95% level?
A. 52-68
B. 40-68
C. 44-76
D. 36-84
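Study note: a confidence interval around an observed score is built by adding and subtracting a multiple of the standard error of measurement. The sketch below uses hypothetical values (observed score 100, SEM 5) and a hypothetical function name, `confidence_interval`. The exact multiplier for 95% under a normal curve is 1.96; introductory treatments often round it to 2, which is what the second call assumes.

```python
def confidence_interval(observed_score, sem, z=1.96):
    """Band around an observed score: score +/- z * SEM."""
    margin = z * sem
    return observed_score - margin, observed_score + margin

# Hypothetical values: observed score 100, SEM 5
low_68, high_68 = confidence_interval(100, 5, z=1.0)  # about 68%
low_95, high_95 = confidence_interval(100, 5, z=2.0)  # about 95%, z rounded to 2
print((low_68, high_68), (low_95, high_95))
```

With these values the 68% band runs from 95.0 to 105.0 and the rounded 95% band from 90.0 to 110.0.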
104. As the confidence interval increases, the range of scores into which a single test score
falls is likely to
A. decrease.
B. increase.
C. remain the same.
D. alternately decrease and increase.
106. If the standard deviations of two tests are identical but the reliability is lower for Test A
as compared to Test B, then the standard error of measurement will be _____ for Test A
as compared with Test B.
A. higher
B. lower
C. the same
D. hard to tell
107. Which statistic can help the test user determine how large a difference must exist for
scores yielded from two different tests to be considered statistically different?
A. standard error of measurement between two scores
B. standard error of the difference between two scores
C. observed variance minus error variance
D. standard error of the difference between two means
108. The standard error of the difference between two scores is larger than the standard error
of measurement for either score because the standard error of the difference between the two
scores is affected by
A. the true score variance of each score.
B. the standard deviation of each score summed.
C. the measurement error inherent in both scores.
D. All of these
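Study note: the standard error of the difference between two scores is commonly computed as the square root of the sum of the two squared standard errors of measurement. The sketch below assumes both scores are on the same scale; the function name and the SEM values of 3 and 4 are hypothetical.

```python
from math import sqrt

def standard_error_of_difference(sem_1, sem_2):
    """Combine the measurement error of two scores on the same scale."""
    return sqrt(sem_1 ** 2 + sem_2 ** 2)

# Hypothetical standard errors of measurement for two tests
sed = standard_error_of_difference(3.0, 4.0)
print(sed)
```

For these values the result is 5.0, larger than either SEM taken alone.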
112. The universe score in Cronbach et al.'s generalizability theory is analogous to the _____
in classical test theory.
A. coefficient of generalizability
B. true score
C. standard deviation
D. internal-consistency estimate
113. In classical test theory, there exists only one true score. In Cronbach's generalizability
theory, how many "true scores" exist?
A. one
B. as many as the number of times the test is administered to the same individual
C. many, depending on the number of different universes
D. None of these
115. If a device to measure blood pressure consistently overestimated every assessee's actual
blood pressure by 10 units, which of the following could reasonably be expected to be TRUE
of the reliability of this measuring device as the years passed?
A. It would increase.
B. It would decrease.
C. It would not be affected.
D. It would alternately decrease and increase.
116. In general, which of the following is TRUE of the relationship between the magnitude of
the test-retest reliability estimate and the length of the interval between test administrations?
A. The longer the interval, the lower the reliability coefficient.
B. The longer the interval, the higher the reliability coefficient.
C. The magnitude of the reliability coefficient is typically not affected by the length of the
interval between test administrations.
D. The magnitude of the reliability coefficient is always affected by the length of the interval
between test administrations, but one cannot predict how it is affected.
117. What is the difference between alternate forms and parallel forms of a test?
A. Alternate forms do not necessarily yield test scores with equal means and variances.
B. Alternate forms are designed to be equivalent only with regard to level of difficulty.
C. Alternate forms are different only with respect to how they are administered.
D. There are no differences between alternate and parallel forms of a test.
119. In which type(s) of reliability estimates would test construction NOT be a significant
source of error variance?
A. test-retest
B. alternate-form
C. split-half
D. Kuder-Richardson
120. If the variance of either variable is restricted by the sampling procedures used, then the
magnitude of the coefficient of reliability will be
A. lowered.
B. raised.
C. unaffected.
D. affected only in tests with a true-false format.
123. If a student received a score of 50 on a math test with a standard error of measurement of
3, which of the following statements would be TRUE of the "true score"?
A. In 68% of the cases, the "true score" would be expected to be between 44 and 56.
B. In 68% of the cases, the "true score" would be expected to be between 47 and 53.
C. In 95% of the cases, the "true score" would be expected to be between 47 and 53.
D. In 95% of the cases, the "true score" would be expected to be between 44 and 56.
124. A psychologist administers a test and the test-taker scores a 52. If the cut-off score for
eligibility for a particular program is 50, what index will best help the psychologist determine
how much confidence to place in the test-taker's obtained score of 52?
A. the standard error of difference
B. the standard error of measurement
C. measures of central tendency: mean, median, or mode
D. measures of variability such as the standard deviation
125. Which of the following is TRUE of both the standard error of measurement and the
standard error of difference?
A. Both provide confidence levels.
B. Both can be used to compute confidence intervals for short answer tests.
C. Both can be used to compare performance between two different tests.
D. Both are abbreviated by SEM.
126. The meaning of reliability in the psychometric sense differs from the meaning of
reliability in the "everyday" use of that word in that
A. reliability in the "everyday sense" is usually "a good thing."
B. reliability in the psychometric sense is usually "a good thing."
C. reliability in the psychometric sense has greater implications.
D. None of these
127. A police officer administers a breathalyzer test to a suspected drunk driver, does not put
on his glasses to read the meter, and as a result, mistakenly records the blood alcohol level.
This is the kind of mistake that is BEST associated with which type of reliability estimates?
A. test-retest
B. inter-scorer
C. internal-consistency
D. situational
128. Which of the following statements is TRUE regarding the differences between a power
test and a speed test?
A. Power tests involve physical strength; speed tests do not.
B. In a power test, the testtaker has time to complete all items; in a speed test, a specific time
limit is imposed.
C. In a power test, a broad range of knowledge is assessed; in a speed test, a narrower range
of knowledge is assessed.
D. Both of these: (1) in a power test, the testtaker has time to complete all items; in a speed test,
a specific time limit is imposed; and (2) in a power test, a broad range of knowledge is assessed;
in a speed test, a narrower range of knowledge is assessed.
129. The index that allows a test user to compare two people's scores on a specific test to
determine if the true scores are likely to be different is
A. the standard error of the mean.
B. the standard error of the difference.
C. the standard deviation.
D. the correlation coefficient.
132. A test of attention span has a reliability coefficient of .84. The average score on the test
is 10, with a standard deviation of 5. Lawrence received a score of 64 on the test. We can be
95% sure that Lawrence's "true" attention span score falls between
A. 63 and 65.
B. 62 and 66.
C. 60 and 68.
D. 54 and 74.
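Study note: when only a reliability coefficient and a standard deviation are given, the standard error of measurement is obtained as the standard deviation times the square root of one minus the reliability. The sketch below uses hypothetical values (SD 15, reliability .91) and a hypothetical function name.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)

# Hypothetical test: standard deviation 15, reliability coefficient .91
sem = standard_error_of_measurement(15, 0.91)
print(round(sem, 2))
```

For these values the SEM is about 4.5. Holding the standard deviation fixed, the SEM shrinks as the reliability coefficient approaches 1.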
134. Using estimates of internal consistency, which of the following tests would likely yield
the highest reliability coefficients?
A. a test of general intelligence
B. a test of achievement in a basic skill such as mathematics
C. a test of reading comprehension
D. a test of vocational interest
135. What type of reliability estimate is appropriate for use in a comparison of "Form A" to
"Form B" of a picture vocabulary test?
A. test-retest
B. alternate-forms
C. inter-rater
D. internal-consistency
136. What index of reliability would be BEST used to compare two evaluators' assessments of
a group of job applicants?
A. KR-20
B. coefficient alpha
C. the Kappa statistic
D. the Spearman-Brown correction
138. What type of reliability estimate is obtained by correlating pairs of scores from the same
person on two different administrations of the same test?
A. parallel-forms
B. split-half
C. interrater
D. test-retest
139. A test containing 100 items is revised by deleting 20 items. What might be expected to
happen to the magnitude of the reliability estimate for that test?
A. It will be expected to increase.
B. It will be expected to decrease.
C. It will be expected to stay the same.
D. It cannot be determined based on the information provided.
141. The greater the proportion of the total variance attributed to true variance, the more
_____ the test.
A. scientific
B. variable
C. reliable
D. expensive
142. A score earned by a testtaker on a psychological test may BEST be viewed as equal to
A. the raw score plus the observed score.
B. the error score.
C. the true score.
D. the true score plus error.
145. Which of the following is TRUE about systematic and unsystematic error in the
assessment of physical and psychological abuse?
A. Few sources of unsystematic error exist, due to the nature of what is being assessed.
B. Few sources of systematic error exist.
C. Gender represents a source of systematic error.
D. None of these
146. In general, approximately what percentage of scores would be expected to fall within
two standard deviations above or below the standard error of measurement of the "true score"
on a test?
A. 85%
B. 90%
C. 95%
D. 99%
147. Manuel earns a 90 on a standardized math test. The standard error of measurement for
this test is 5. Approximately 95% of the scores fall between _____.
A. 85 and 95.
B. 80 and 100.
C. 80 and 100.
D. Cannot determine based on the information provided.
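Worked illustration for item 147. This is a minimal sketch, assuming normally distributed measurement error and the common rule of thumb that about 95% of scores lie within two standard errors of measurement (SEM) of the score. The function name score_band is hypothetical.

```python
def score_band(observed: float, sem: float, n_sems: float = 2.0) -> tuple[float, float]:
    """Return the (lower, upper) band of n_sems standard errors around a score."""
    margin = n_sems * sem
    return observed - margin, observed + margin

# Values from the item: observed score 90, SEM 5, band of two SEMs.
low, high = score_band(90, 5)
print(low, high)  # 80.0 100.0
```

Using the more exact multiplier of 1.96 instead of 2 would narrow the band only slightly.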
148. In Chapter 5 of your textbook, you read of the writing surface on a school desk that was
"riddled with heart carvings, the legacy of past years' students who felt compelled to express
their eternal devotion to someone now long forgotten." This imagery was designed to
graphically illustrate sources of error variance during test
A. development.
B. administration.
C. scoring.
D. interpretation.
149. In the Chapter 5 Meet an Assessment Professional feature, Dr. Bryce B. Reeve noted the
necessity for very brief questionnaires in his work due to the fact that many of his clients
were:
A. young children with very short attention spans.
B. seriously ill and would find taking tests burdensome.
C. visually impaired and unable to focus for an extended period of time.
D. All of these
150. In the Chapter 5 Meet an Assessment Professional feature, Dr. Bryce B. Reeve cited an
experience in which he learned that the "Excellent" response category on a test was best
translated as meaning ________ in Chinese.
A. "super bad"
B. "superlative"
C. "bad"
D. None of these
151. The items of a personality test are characterized as heterogeneous in nature. This tells us
that the test measures
A. aspects of family history.
B. ability to relate to the opposite sex.
C. unconscious motivation.
D. more than one trait.
154. Most reliability coefficients, regardless of the specific type of reliability they are
measuring, range in value from
A. -1 to +1.
B. 0 to 100.
C. 0 to 1.
D. negative infinity to positive infinity.
156. The precise amount of error inherent in the reliability estimate published in a test manual
will vary with
A. the purchase price of the test (the more expensive, the less the error).
B. the sample of test-takers from which the data were drawn.
C. the population of test users actually using a published test.
D. All of these
158. A test of infant development contains three scales: (1) Cognitive Ability, (2) Motor
Development, and (3) Behavior Rating. Because these three scales are designed to measure
different characteristics (that is, they are not homogeneous), it would be inappropriate to
combine the three scales in calculating estimates of the test's
A. alternate-forms reliability.
B. internal-consistency reliability.
C. test-retest reliability.
D. interrater reliability.
159. The fact that young children develop rapidly and in "growth spurts" is a problem when it
comes to the estimation of which type of reliability for an infant development scale?
A. internal-consistency reliability
B. alternate-forms reliability
C. test-retest reliability
D. interrater reliability
160. In the language of psychological testing and assessment, reliability BEST refers to
A. how well a test measures what it was originally designed to measure.
B. the complete lack of any systematic error.
C. the proportion of total variance that can be attributed to true variance.
D. whether or not a test publisher consistently publishes high quality instruments.
161. Because of the unique problems in assessing very young children, which of the following
would be the BEST practice when attempting to estimate the reliability of tests designed to
measure cognitive and motor abilities in infants?
A. Use relatively short test-retest intervals.
B. Use relatively long test-retest intervals.
C. Do not use the test-retest method for estimating reliability of the test.
D. Use only inter-scorer reliability estimates.
162. If the variance of either variable in a correlational analysis is restricted by the sampling
procedure used, then the resulting correlation coefficient tends to be
A. higher.
B. lower.
C. unaffected.
D. unstable.
163. If the variance of either variable in a correlational analysis is inflated by the sampling
procedure used, then the resulting correlation coefficient tends to be
A. higher.
B. lower.
C. unaffected.
D. unstable.
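Illustration for items 162 and 163. This is a minimal simulation sketch rather than a proof. It assumes a linear relationship with normally distributed noise, and the cutoffs of 0.5 and 1.0 are arbitrary choices. It compares the correlation in a full sample with the correlation in a subsample whose range on one variable is restricted and in a subsample that keeps only the extremes.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_for(subset):
    """Correlation between the two members of each (x, y) pair."""
    return pearson_r([p[0] for p in subset], [p[1] for p in subset])

random.seed(0)  # fixed seed so the simulated sample is reproducible
pairs = []
for _ in range(5000):
    x = random.gauss(0, 1)
    pairs.append((x, x + random.gauss(0, 1)))  # y = x plus random error

r_full = r_for(pairs)
r_restricted = r_for([p for p in pairs if abs(p[0]) < 0.5])  # narrow range of x
r_inflated = r_for([p for p in pairs if abs(p[0]) > 1.0])    # extremes only
print(round(r_full, 2), round(r_restricted, 2), round(r_inflated, 2))
```

In this setup the three coefficients differ even though every pair was generated by the same underlying relationship, so the difference comes from the sampling procedure alone.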
164. The directions for scoring a particular motor ability test instruct the examiner to "Give
credit if the child holds his hands open most of the time." Because what constitutes "most of
the time" is not specifically defined, directions such as these could result in lowered reliability
estimates for
A. test-retest reliability.
B. alternate-forms reliability.
C. inter-rater reliability.
D. parallel-forms reliability.
165. A vice president (VP) of personnel employs a "Corporate Screening Test" in the hiring
process. For future testing purposes, the VP maintains records of scores achieved by
________ as opposed to ________ in order to avoid restriction-of-range effects.
A. job applicants; hired employees
B. hired employees; job applicants
C. successful employees; hired employees
D. successful employees; other corporate officers
166. According to generalizability theory, a variable such as "number of items in the test" is a
description of one
A. facet of the universe.
B. true element of the dominion.
C. dominion in the domain.
D. None of these
167. Advocates of generalizability theory prefer the use of which of the following terms as an
alternative to the use of the term "reliability"?
A. generalizability
B. universality
C. regularity
D. dependability
169. As used in Chapter 5 of your text, the term inflation of variance is synonymous with
A. restriction of variance.
B. restriction of range.
C. inflation of range.
D. None of these
170. Test items with little discriminative ability prompt the test developer to consider the
possibility that
A. the content of the item does not match the construct measured by the other items in the
scale.
B. the item is poorly worded and needs to be rewritten.
C. the item is too complex for the educational level of the population.
D. All of these
171. The fact that cultural factors may be operating to weaken an item's ability to discriminate
between groups is evident from
A. a general lack of reliability in culture-specific tests.
B. latent trait theory.
C. Georg Rasch's unauthorized biography, You Can Never Be Too Rich or Too "Rasch."
D. None of these
178. Why isn't IRT used more by "mom-and-pop" test developers such as classroom
teachers?
A. most classroom teachers were trained in generalizability theory
B. IRT has no application in classroom tests
C. applying IRT requires statistical sophistication
D. All of these
182. Latent trait models differ from classical test theory (CTT) in many key ways including
the fact that
A. in CTT, no assumptions are made about the frequency distribution of test scores.
B. latent trait models do not presume that test items can carry with them different "weight."
C. latent trait models typically provide for follow-up studies to support the existence of the
presumed trait.
D. All of these