
VALIDITY AND RELIABILITY
Measurement and Evaluation in Education
Graduate School- LSU Ozamis
Presenter: Euprime B. Regalado
Validity
• The degree to which a test measures
what it is supposed to measure
• The most central and essential
quality in the development,
interpretation, and use of
educational measures.
Evidence
• Is used to determine whether a
test is measuring what it is
supposed to measure
• Grouped into three categories:
– Construct-related evidence
– Content-related evidence
– Criterion-related evidence
Construct-Related Evidence
• Establishes a link between
the underlying
psychological construct
we wish to measure and the
visible performance we
choose to observe
• In measuring learned
knowledge, bear in mind the
question:

“How would I indicate to others that I have this knowledge?”
Example:
• Give other examples of a
rectangle
• What happens to electrical
voltage if resistance is
decreased while current
remains constant?
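By Ohm’s law (V = I × R), decreasing resistance at constant current decreases the voltage. A minimal numeric check, with made-up values:

```python
# Ohm's law: voltage (V) = current (I) * resistance (R)
def voltage(current_amps: float, resistance_ohms: float) -> float:
    return current_amps * resistance_ohms

# Hypothetical circuit: current held constant at 2 A while resistance is halved
v_before = voltage(2.0, 10.0)  # 20.0 V
v_after = voltage(2.0, 5.0)    # 10.0 V
print(v_before, v_after)  # 20.0 10.0
```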
Content-Related Evidence
• Indicates how well the
content of a test
corresponds to the
student performance to
be observed.
2 techniques
• Table of Specifications
(TOS)
• Performance Objectives
Performance Objectives for a test on validity
• Describe characteristics of each of the 6 methods used to establish evidence of validity
• When provided an example of evidence used to establish validity, classify by type the evidence being illustrated (No. of questions: 4)
Other Procedures Used to Judge
the Relevance of Test Content
Face Validity

• Test questions have face validity when they appear to be relevant to the group being examined.
Example:
Question: How many square feet are contained in a 4ft by 6ft rectangle?

The same question, worded for each group tested:
• Carpenters: How many square feet are contained in a 4ft by 6ft piece of plywood?
• Glass Cutters: How many square feet of glass are contained in a 4ft by 6ft window?
• Painters: How many square feet of painted surface are contained in a 4ft by 6ft area?
Curricular and Instructional Validity
• An evaluation of the
extent to which the
content of a test agrees
with the content of
instruction
Criterion-Related Evidence
• Indicates how well performance on a test
correlates with performance on relevant
criterion measures external to the test.

“If a test is valid, with what other things should performance on the test correlate?”

A teacher’s present knowledge of a student is used as the criterion for informal assessment.
Factors Affecting Validity
• Test-related factors
• Functioning Content and
Teaching Procedures
• Intervening events
• Nature of the group and the criterion
• Reliability
RELIABILITY

•The consistency of
measurements
A RELIABLE TEST
Produces similar scores
across various conditions
and situations, including
different evaluators and
testing environments.
How do we account for an individual who does not get exactly the same test score every time he or she takes the test?
1. Test-taker’s temporary psychological or physical state
2. Environmental factors
3. Test form
4. Multiple raters
TEST-RETEST RELIABILITY
Suggests that subjects tend to obtain the same score when tested at different times.
Split-Half Reliability
• Sometimes referred to as internal consistency
• Indicates that subjects’ scores on some trials consistently match their scores on other trials
INTERRATER RELIABILITY
Involves having two raters independently
observe and record specified behaviors,
such as hitting, crying, yelling, and getting
out of the seat, during the same time period

TARGET BEHAVIOR
A specific behavior the
observer is looking to record
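A simple interrater index is interval-by-interval percent agreement between the two observers’ records. A sketch with hypothetical data:

```python
# Hypothetical records from two independent observers over ten intervals
# (1 = target behavior recorded, 0 = not recorded)
rater_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

# Count intervals where both observers recorded the same thing
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = 100 * agreements / len(rater_a)
print(percent_agreement)  # 90.0
```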
ALTERNATE FORMS RELIABILITY
• Also known as equivalent forms reliability or
parallel forms reliability
• Obtained by administering two equivalent
tests to the same group of examinees
• Items are matched for difficulty on each test
• It is necessary that the time frame between
giving the two forms be as short as possible
OBTAINED SCORE
•The score you get when you administer a test
•Consists of two parts: the true score and the
error score
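This decomposition is the classical true-score model: obtained score = true score + error score. A minimal illustration with hypothetical numbers:

```python
# Classical true-score model: obtained score = true score + error score
true_score = 80     # hypothetical score under error-free measurement
error_score = -3    # random error on this particular administration
obtained_score = true_score + error_score
print(obtained_score)  # 77
```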
STANDARD ERROR of
MEASUREMENT (SEM)
Gives the margin of error that you should expect in an individual test score because of the imperfect reliability of the test
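SEM is commonly computed from the test’s standard deviation and its reliability coefficient: SEM = SD × √(1 − r). A sketch with hypothetical values:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: standard deviation 10, reliability 0.91
sem = standard_error_of_measurement(10, 0.91)
print(round(sem, 2))  # 3.0
```

An obtained score of 75 on this hypothetical test would then carry a band of roughly 75 ± 3 (one SEM).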
FACTORS AFFECTING RELIABILITY
1. Test length
2. Test-retest interval
3. Variability of scores
4. Guessing
5. Variation within the test
situation
Let the target (bull’s eye) be the content objective.
Let the darts be the TEST SCORES.
Let the distance of the darts from the target be the measure of VALIDITY.
Let the distance of the darts from each other be the measure of RELIABILITY.

[Figure: two dartboards; on the first the darts cluster tightly but away from the bull’s eye (RELIABLE), on the second they cluster tightly on the bull’s eye (RELIABLE and VALID).]

THUS, A RELIABLE TEST IS NOT ALWAYS VALID
Thank you
for
LISTENING!
