
Reliability

• Reliability refers to the consistency of test scores and
other evaluation results from one measurement to
another.
• Consistency of test results is considered over time,
i.e. whether administering the test at a different time
produces different scores.
• A test or any evaluation procedure is considered
reliable if results are consistent over different
occasions and over different samples of the same
domain.

12/27/2019 1
• Therefore, reliability refers to the consistency
of a measure. A test is considered reliable if
we get the same result repeatedly. For example,
if a test is designed to measure a trait (such as
reading ability), then each time the test is
administered to a subject, the results should be
approximately the same.

Points to Note in Clarifying
Reliability
1. Reliability refers to the results obtained
with an evaluation instrument and not to
the instrument itself.
2. An estimate of reliability always refers to
a particular type of consistency. Results
may be reliable over different periods of time,
over different samples of questions, over
different raters, and the like.

3. Reliability is a necessary but not a
sufficient condition for a test to be valid.
4. Reliability is primarily statistical. The
logical analysis of a test will provide little
evidence concerning the reliability of the
scores. Thus, the test must be
administered one or more times to an
appropriate group of persons and the
consistency of the results determined.
• Unfortunately, it is impossible to calculate
reliability exactly, but there are several
different ways to estimate it. These are:
1. Test-Retest Method
2. Equivalent-Forms Method
3. Test-Retest with equivalent-forms method
4. Split-Half Method
5. Kuder-Richardson Method
Test-Retest Method

• To estimate reliability by means of the
test-retest method, the same test is
administered twice to the same pupils with
a given time interval between the two
administrations.

• The test scores are correlated, and this
correlation coefficient provides a measure
of stability, that is, it indicates how stable
the test results are over the given period of
time.
• One important factor to keep in mind when
interpreting measures of stability is the
time interval between tests.
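The computation itself is simply a correlation of the two sets of scores. A minimal Python sketch, using made-up scores for eight pupils on the two administrations:

```python
# Hypothetical scores for the same eight pupils on two administrations
# of the same test, separated by a time interval.
scores_first = [40, 55, 62, 48, 70, 35, 58, 66]
scores_second = [42, 53, 60, 50, 72, 33, 59, 64]

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

# The coefficient of stability: how stable the results are over the interval.
stability = pearson_r(scores_first, scores_second)
```

A coefficient near 1.0, as here, indicates stable results over the chosen interval; the interpretation still depends on how long that interval was.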

If time is short, say a day or two, the
constancy of the results will be increased
because pupils will remember some of
their answers from the first test to the
second. If the time interval is long, say
about a year, the results will be influenced
not only by the instability of the testing
procedure but also by the actual changes
in the pupils over that period of time.
• In general, the longer the time interval
between test and retest, the more the results
will be influenced by changes in the pupil
characteristics being measured, and the smaller
the reliability coefficient will be.
• The best time interval between test
administrations will depend largely on the
use to be made of the results.

Equivalent-Forms Method
• Equivalent-forms reliability is estimated by
comparing two different tests that were
created using the same content. This is
accomplished by creating a large pool of test
items that measure the same quality and then
randomly dividing the items into two
separate tests. The two tests should then be
administered to the same subjects at the same
time and the resulting test scores are
correlated.
This correlation coefficient provides a measure of
equivalence. Thus it indicates the degree to which
both forms of the test are measuring the same aspects
of behaviour.
The equivalent-forms method tells us nothing about the
long-term stability of the pupil characteristics being
measured but, rather, reflects short-term constancy of
pupils' performance and the extent to which the test
represents an adequate sample of the characteristics
being measured.
• Note: if equivalent forms are constructed,
a detailed table of specifications should be
prepared.

Test-Retest with Equivalent-Forms
Method
• The equivalent-forms method is
sometimes used with a time interval
between the administration of the two
forms of the test. Under these test-retest
conditions, the resulting reliability
coefficient provides a measure of stability
and equivalence.

• This is the most rigorous estimate of test
reliability because the stability of the testing
procedure, the constancy of the pupil
characteristics being measured, and the
representativeness of the sample of tasks
included in the test are all taken into
account.

Split-Half Method
• Reliability can also be estimated from a single
administration of a single form of a test. The
test is administered to a group of pupils in the
usual manner and then is divided in half for
scoring purposes.
• To split the test into halves that are most
equivalent, the usual procedure is to score the
even-numbered and odd-numbered items
separately, for example:

  Sum of odd items correct    Sum of even items correct    Full test
  ________________________    _________________________    __________________________
  Items 1, 3, 5, ...          Items 2, 4, 6, ...           Items 1, 2, 3, 4, 5, 6, ...
  Odd score = 40              Even score = 42              Total score = 82

• The two scores of each pupil, when
correlated, provide a measure of internal
consistency. This coefficient indicates the
degree to which consistent results are
obtained from the two halves of the test.
• To estimate the reliability of scores on the
full-length test, the Spearman-Brown
formula is usually applied, which is:

Reliability of full test =
(two times the correlation between half-tests) /
(one plus the correlation between half-tests)

Mathematically this can be expressed as:

Reliability of full test = 2r / (1 + r)

• For example, if the correlation coefficient
between the test's two halves is 0.60, the
reliability of the full test will be:

Reliability of full test = (2 × 0.60) / (1 + 0.60) = 1.2 / 1.6 = 0.75

• This correlation coefficient of 0.75 estimates
the reliability of the full test when the
half-tests correlate 0.60.
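The step-up can be sketched in a few lines of Python; the function below simply encodes the Spearman-Brown formula and reproduces the worked example:

```python
def spearman_brown(r_half):
    """Step a half-test correlation up to full-test reliability:
    r_full = 2r / (1 + r)."""
    return (2 * r_half) / (1 + r_half)

# A half-test correlation of 0.60 gives a full-test reliability of 0.75.
full_test_reliability = spearman_brown(0.60)
```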

• The split-half method is similar to the equivalent-
forms method in that it indicates the extent to
which the sample of test items is a dependable
sample of the content being measured.
• A high correlation between scores on the two
halves of a test denotes the equivalence of the
two halves and consequently the adequacy of
the sampling.

Split-Half reliabilities tend to be higher than
equivalent-form reliabilities because the
split-half method is based on the
administration of a single test form.

Kuder-Richardson Method
• The Kuder-Richardson reliability, like the
split-half reliability, is determined from a
single administration of a test. However,
unlike the split-half method, it is not
necessary to score each half separately.

• An advantage of the Kuder-Richardson
method is that it avoids the problem of
deciding how a test should be split.
Since the test is not split into parts for
separate scoring, there is no need to use
the Spearman-Brown formula. Like other
reliability coefficients, Kuder-Richardson
reliabilities range in value from 0 to
1.0.
Two Kuder-Richardson formulas are KR20
and KR21;

KR20 = (n / (n − 1)) × ((SD² − Σpq) / SD²)

where:
• n = the number of items on the test
• SD² = the variance of scores (the standard deviation squared)
• p = the difficulty level of each item, or the proportion of the
group that responded to the item correctly
• q = the proportion of students who failed the item, or 1 − p
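With these definitions, KR20 can be computed directly from a pupils-by-items matrix of right/wrong scores. A minimal Python sketch with made-up data:

```python
def kr20(item_matrix):
    """KR20 from a list of per-pupil lists of 0/1 item scores."""
    n_items = len(item_matrix[0])
    n_pupils = len(item_matrix)
    totals = [sum(pupil) for pupil in item_matrix]
    mean = sum(totals) / n_pupils
    # SD squared: population variance of the total scores.
    sd2 = sum((t - mean) ** 2 for t in totals) / n_pupils
    # Sum of p * q over items: p = proportion correct, q = 1 - p.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(pupil[i] for pupil in item_matrix) / n_pupils
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (sd2 - sum_pq) / sd2

# Four pupils, three items (hypothetical responses).
responses = [[1, 1, 1],
             [1, 1, 0],
             [1, 0, 0],
             [0, 0, 0]]
kr20_value = kr20(responses)  # 0.75 for this matrix
```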

• The formula for KR21 is:

KR21 = (n / (n − 1)) × (1 − (M(n − M) / (n × SD²)))

where:
• n = number of items on the test
• M = mean score on the test
• SD² = variance, or SD squared
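Because KR21 uses only the number of items, the mean, and the variance, it can be computed without item-level data. A minimal Python sketch with hypothetical summary statistics:

```python
def kr21(n_items, mean, sd2):
    """KR21 from the number of items, the mean score, and the
    variance of scores (SD squared)."""
    return (n_items / (n_items - 1)) * (1 - mean * (n_items - mean) / (n_items * sd2))

# Hypothetical summary statistics for a 3-item test:
# mean total score 1.5, variance 1.25.
kr21_value = kr21(3, 1.5, 1.25)  # 0.6 for these statistics
```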
• The Kuder-Richardson formulas (KR20 and
KR21) yield identical values only if the items are
of equal difficulty (i.e. have equal p values).
If the p values vary among the items, KR21
will consistently underestimate KR20.

Objectivity
• Objective tests are objectively marked and
thus reduce the effect of the scoring
procedure on the results. Subjective (essay)
tests, on the other hand, are subjectively
marked, so the scoring procedure strongly
affects the results.

Usability
• Usability is another important
characteristic of measuring instruments,
because the practical considerations of
evaluation instruments cannot be
neglected. The test must have practical
value from the points of view of time,
economy, and administration. This may be
termed usability.

Practical aspects must be
taken into account:
• Ease of administration:
The test should be easy to administer so
that general classroom teachers can use
it. Therefore simple and clear directions
should be given, and timing the test
should not be too difficult.

• Time required for administration:
Appropriate time limit to take the test should
be provided.
• Ease of interpretation and application:
Another important aspect of test scores is
the interpretation and application of the
results. If results are misinterpreted, they
are harmful; if they are not applied, they
are useless.
• Cost of testing:
A test should be economical from the
points of view of preparation,
administration, and scoring.

Factors Influencing Reliability
Measures
1. Factors related to test.
2. Factors related to testee.
3. Factors related to testing procedure.

Factors related to test
• Length of Test
In general, the longer the test, the higher
its reliability will be. This is because a
longer test provides a more adequate
sample of the behaviour being measured,
and the scores are apt to be less
distorted by chance factors such as
guessing.

Spread of scores
• Reliability coefficients are directly
influenced by the spread of scores in the
group tested. Other things being equal, the
larger the spread of scores, the higher the
estimate of reliability will be.
• Larger reliability coefficients result when
individuals tend to stay in the same
relative position in the group from one test
to another.

• Another way to consider the effect of
variability on reliability is to see what
happens when variability is 0. In this
extreme case, all individuals receive the
same score, the standard deviation is 0,
and each pupil is at the mean with a
z-score of 0.

Difficulty of Test
• A very easy or very hard test is incapable of
measuring individual differences because
examinees all tend to respond uniformly.
This is because both easy and difficult tests
result in a restricted spread of scores. For
a difficult test, the scores are grouped
together at the bottom end of the scale;
for an easy test, the scores are close
together at the top end of the scale.
• For both, however, the differences among
individuals are small and tend to be
unreliable. Therefore, when variability is
low, reliability will usually be low also.

Factors related to testee
• Heterogeneity of the group:
When the group tested is homogeneous, the
spread of scores is likely to be small;
when it is heterogeneous, the spread of
scores is likely to be large. Therefore the
reliability coefficient for a heterogeneous
group will be higher than for a
homogeneous group.

• Test wiseness of the students:
Experience of test taking also affects the
reliability of test scores. Students' practice
in taking sophisticated tests increases
test reliability. But when the students in a
group do not all have the same level of
test-wiseness, measurement errors are
greater.

• Motivation of the students:
When the students are not motivated to take
the test, they will not represent their best
achievement. This depresses the test
scores.

Factors related to testing
procedure
• Time limit of test
• Cheating opportunity given to the
students

