Chapter 05: Measurement, Reliability, and Validity
Levels of Measurement
Level: Nominal
Qualities: Assignment of labels
Examples: Gender (male or female); preference (like or dislike); voting record (for or against)
What you can say: Each observation belongs to its own category.
What you can't say: An observation represents "more" or "less" than another observation.

Level: Ordinal
Qualities: Assignment of values along some underlying dimension
Examples: Rank in college; order of finishing a race
What you can say: One observation is ranked above or below another.
What you can't say: The amount that one observation is more or less than another.

Level: Interval
Qualities: Equal distances between points
Examples: Number of words spelled correctly; intelligence test scores; temperature
What you can say: One score differs from another on some measure that has equal-appearing intervals.
What you can't say: The amount of difference is an exact representation of differences on the variable being studied.

Level: Ratio
Qualities: Meaningful and nonarbitrary zero
Examples: Age; weight; time
What you can say: One value is twice as much as another, or no quantity of that variable can exist.
What you can't say: Not much!
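To make these distinctions concrete, here is a minimal sketch (all values invented) of which statements each level of measurement supports:

```python
# Hypothetical illustration of the statements each measurement level supports.

# Nominal: labels only -- equality checks are meaningful, order is not.
gender_a, gender_b = "male", "female"
print(gender_a == gender_b)      # category membership is meaningful -> False

# Ordinal: order is meaningful, but not the size of the gap.
rank_a, rank_b = 1, 3            # rank in college
print(rank_a < rank_b)           # "ranked above" is meaningful -> True

# Interval: differences are meaningful, ratios are not (zero is arbitrary).
temp_a, temp_b = 10.0, 20.0      # degrees Celsius
print(temp_b - temp_a)           # a 10-degree difference is meaningful
# but temp_b is NOT "twice as hot" as temp_a: 0 C is not "no temperature".

# Ratio: a true, nonarbitrary zero makes ratios meaningful.
age_a, age_b = 15, 30            # years
print(age_b / age_a)             # 30 years really is twice 15 years -> 2.0
```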
A continuous variable is one that can assume any value along some underlying continuum.
You can have the sexiest-looking car on the road, but if the tires are
out of balance, you can forget good handling and a comfortable
ride. The tires, or where "the rubber meets the road," are crucial.
In the same way, you can have the most imaginative research
question with a well-defined, clearly articulated hypothesis, but if
the tools you use to measure the behavior you want to study are
faulty, you can forget your plans for success.
The reliability (or consistency) and validity (or does-what-it-should quality) of a measurement instrument are essential because the absence of either could lead you to incorrectly accept or reject your research hypothesis.
Method Error
Trait Error
Increasing Reliability
Test-Retest Reliability
Inter-Rater Reliability
Inter-rater reliability is a measure of the consistency from rater to rater, rather than from time to time or even from test to test.
For example, let's say you are conducting a study that measures aggression in preschool children. As part of the study, you train several of your colleagues to collect data accurately. You have developed a scale consisting of a list of different behaviors preschool children participate in, numbered 1 through 5, each representing a different type of behavior.
For all raters to be in agreement with one another, inter-rater reliability must be established. This is done by having the raters rate the behavior and then examining the percentage of agreement between them.
The consequences of low inter-rater reliability can be serious. If one of your raters misclassifies 20% of the occurrences, then 20% of your data might be wrong, which can throw everything into a tizzy!
Internal Consistency
Types of Validity
Content Validity
Content validity indicates the extent to which a test represents the
universe of items from which it is drawn, and it is especially helpful
when evaluating the usefulness of achievement tests or tests that
sample a particular area of knowledge.
Let's say you are dealing with eighth-grade history, and the unit deals
with the discovery of North America and the travels and travails of
several great European explorers. If you were to develop a history test that asks questions about this period and wanted to establish the validity of the questions, you could show it to an expert in early American history and ask, "Do these questions fairly represent the universe, or domain, of early American history?"
You don't have to use such 25-cent words as universe
and domain, but you need to know whether you have covered what
you need to cover.
Criterion Validity
Criterion validity is concerned with either how well a test estimates
present performance (called concurrent validity) or how well it
predicts future performance (called predictive validity).
Criterion validity is most often used to evaluate the validity of
ability tests (current skills) and aptitude tests (potential skills).
The important thing about criterion validity is the use of a criterion.
Let's say you want to investigate the use of graduate school grades in predicting which people in a clinical psychology program will become especially successful researchers. To that end, you locate a
sample of "good" researchers (as defined by the number of journal
articles they have published in the past 20 years). Then, you would
find out how well those researchers did as graduate students and how
well their school performance (or grades) predicted membership in
the "good" group. You might also want to locate a group of "not
good" researchers (or those who did not publish at all) and compare
how well their graduate school grades predicted membership in the
"good" or "not good" group. In this case, graduate school grades
would have predictive validity (of success as a researcher) if grades
(the test) predicted performance as a researcher (the criterion).
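In practice, predictive validity of this kind is often summarized as a correlation between the test and the criterion. A minimal sketch, with entirely invented grades and publication counts:

```python
# Hypothetical check of predictive validity: do graduate GPAs (the test)
# correlate with later publication counts (the criterion)? Numbers invented.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

gpas         = [3.9, 3.2, 3.7, 2.9, 3.5, 3.0]  # graduate school grades
publications = [14, 5, 11, 2, 9, 4]            # journal articles over 20 years

r = pearson_r(gpas, publications)
print(round(r, 2))  # a strong positive r would support predictive validity
```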
M. Jahangir Alam Chowdhury 5-21
University of Dhaka
Chapter 05 Business Research Methodology
Construct Validity
Construct validity examines whether test performance reflects an
underlying construct or set of related variables.
For example, construct validity allows one to say that a test labeled as
an "intelligence test" actually measures intelligence.
It is a time-consuming and often difficult type of validity to
establish, yet it is also the most desirable.
If you develop a set of test items based on the
construct and if you can show that the items reflect the contents of
the construct, then you are on your way to establishing the construct
validity of the test.
Therefore, the first step in the development of a test that has
construct validity is establishing the validity (in the most general
scientific terms) of the underlying construct on which the test will
be based.
Once the evidence for the validity of the construct is there, you then
could move on to the design of a test that reflects the construct.
There are a variety of ways in which construct validity can be established.
First, you can look for the correlation between the test you are
developing and some established test which has already been shown
to possess construct validity. This is a bit of a "chicken-and-egg"
problem because there is always the question of how construct
validity was first established.
Second, you can show that the scores on the newly designed test will
differ between groups of people with and without certain traits or
characteristics. For example, if you are developing a test for
aggression, you might want to compare the results for people
known to be aggressive with the results of those who are not.
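This known-groups approach can be sketched with a standardized mean difference; the scores and the use of Cohen's d below are illustrative assumptions, not part of the original text:

```python
# Hypothetical known-groups check of construct validity: scores on a new
# aggression test should differ between groups known to be aggressive or not.
from statistics import mean, stdev

def cohens_d(group_1, group_2):
    """Standardized mean difference with a pooled SD; a large d supports validity."""
    n1, n2 = len(group_1), len(group_2)
    pooled_var = ((n1 - 1) * stdev(group_1) ** 2 +
                  (n2 - 1) * stdev(group_2) ** 2) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / pooled_var ** 0.5

aggressive_group    = [42, 38, 45, 40, 44, 39]  # invented test scores
nonaggressive_group = [21, 25, 19, 23, 26, 20]

print(round(cohens_d(aggressive_group, nonaggressive_group), 1))
```

If the new test cannot separate groups that the construct says should differ, its construct validity is in doubt no matter how reliable it is.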
Third, you can analyze the task requirements of the items and
determine whether these requirements are consistent with the theory
underlying the development of the test. If your theory of intelligence
says that memory is important, then you would expect
to have items that tap this ability on your test.
Make sure your goals and objectives are clearly defined and
operationalized. Expectations of students should be written down.
Match your assessment measure to your goals and objectives.
Additionally, have the test reviewed by faculty at other schools to
obtain feedback from an outside party who is less invested in the
instrument.
Get others involved, and look over the assessment for troublesome wording or other difficulties.
If possible, compare your measure with other measures or data that may be available.
A test can be reliable but not valid, but a test cannot be valid
without first being reliable.
In other words, reliability is a necessary, but not sufficient,
condition of validity.
2+2=?
Now, we can almost guarantee that this is a reliable item because it
is likely to result in a consistent assessment of whether the person
taking the test knows simple addition.
But what if we called it a spelling test? It is obviously not a test of spelling and would certainly be invalid as such. This lack of validity, however, does not affect the test's reliability.