
RESEARCH METHODOLOGY

LESSON 12:
MEASUREMENT & SCALING

Levels of Measurement
The level of measurement is the scale by which a variable is measured. For about 50 years, with few detractors, science has used the Stevens (1951) typology of measurement levels (scales). There are three things to remember about this typology:
• Anything that can be measured falls into one of the four types.
• The higher the level of measurement, the more precise the measurement.
• Every level up contains all the properties of the previous level.
The four levels of measurement, from lowest to highest, are as follows:
• Nominal
• Ordinal
• Interval
• Ratio

Types of Measurement Scales

Ordinal and nominal data are always discrete. Continuous data must be at either the interval or the ratio level of measurement. Now let us discuss these levels in detail:
Nominal Level of Measurement
The nominal level of measurement describes variables that are categorical in nature. Nominal variables include demographic characteristics such as sex, race, and religion. The characteristics of the data you're collecting fall into distinct categories:
• If there is a limited number of distinct categories, the variable is discrete; with exactly two categories, you're dealing with a dichotomous variable.
• If there is an unlimited or infinite number of distinct values, then you're dealing with a continuous variable, which is no longer nominal and has to be measured at the interval or ratio level.
Ordinal Level of Measurement
• The ordinal level of measurement describes variables that can be ordered or ranked in some order of importance.
• It describes most judgments about things, such as big or little, strong or weak.
• Most opinion and attitude scales or indexes in the social sciences are ordinal in nature.
Interval Level of Measurement
The interval level of measurement describes variables that have more or less equal intervals, or meaningful distances between their ranks. For example, if you were to ask somebody if they were a first, second, or third generation immigrant, the assumption is that the distance, or number of years, between each generation is the same.
Ratio Level of Measurement
The ratio level of measurement describes variables that have equal intervals and a fixed zero (or reference) point. It is possible to have zero income, zero education, and no involvement in crime, but rarely do we see ratio level variables in social science since it's almost impossible to have zero attitudes on things, although "not at all", "often", and "twice as often" might qualify as ratio level measurement.
Advanced statistics require at least interval level measurement, so:
• The researcher always strives for this level, accepting the ordinal level (which is the most common) only when they have to.
• Variables should be conceptually and operationally defined with levels of measurement in mind, since this is going to affect the analysis of data later.
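The idea that every level of measurement contains all the properties of the level below it can be illustrated with a small sketch. The mapping used here (mode for nominal, median for ordinal, mean for interval and ratio) is a common textbook convention and an assumption of this example, not something the lesson spells out; the function names and sample data are hypothetical.

```python
from statistics import mean, median, mode

# Levels ordered from lowest to highest, following the Stevens typology.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Summary statistic that first becomes meaningful at each level
# (an illustrative, conventional assignment, not taken from the lesson).
NEW_AT_LEVEL = {
    "nominal": ["mode"],
    "ordinal": ["median"],
    "interval": ["mean"],
    "ratio": [],  # ratios of values become meaningful; no new summary here
}


def permissible_statistics(level):
    """Return the statistics allowed at `level`, including inherited ones."""
    idx = LEVELS.index(level)
    allowed = []
    for lower in LEVELS[: idx + 1]:
        allowed.extend(NEW_AT_LEVEL[lower])
    return allowed


def summarize(values, level):
    """Compute only the summaries that are meaningful at `level`."""
    functions = {"mode": mode, "median": median, "mean": mean}
    return {name: functions[name](values) for name in permissible_statistics(level)}


if __name__ == "__main__":
    # Nominal data: only the most frequent category is meaningful.
    print(summarize(["Hindu", "Muslim", "Hindu"], "nominal"))
    # Ordinal data coded 1 (lowest rank) to 5 (highest rank).
    print(summarize([1, 2, 2, 3, 5], "ordinal"))
    # Ratio data such as income in thousands.
    print(summarize([10, 20, 20, 30], "ratio"))
```

The sketch also shows why the level should be decided when a variable is defined: the same list of numeric codes gets a different set of summaries depending on the level declared for it.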

© Copy Right: Rai University


Figure: Levels Of Measurement

Reliability and Validity
For a research study to be accurate, its findings must be both reliable and valid.
Reliability
Reliability in research means that the findings would be consistently the same if the study were done over again.
Validity
A valid measure is one that provides the information that it was intended to provide. The purpose of a thermometer, for example, is to provide information on the temperature, and if it works correctly, it is a valid thermometer.
A study can be reliable but not valid, and it cannot be valid without first being reliable. There are many different threats to validity as well as reliability, but an important early consideration is to ensure you have internal validity.

Figure: Not Reliable (so not valid either); Reliable but not Valid; Reliable AND Valid

Internal validity means that you are using the most appropriate research design for what you're studying (experimental, quasi-experimental, survey, qualitative, or historical), and it also means that you have screened out spurious variables as well as thought out the possible contamination of other variables creeping into your study. Anything you do to standardize or clarify your measurement instrument to reduce user error will add to your reliability.
It's also important to consider, as early as possible, the time frame that is appropriate for what you're studying. Some social and psychological phenomena (most notably those involving behaviour or action) lend themselves to a snapshot in time. If so, your research need only be carried out for a short period of time, perhaps a few weeks or a couple of months. In such a case, your time frame is referred to as cross-sectional. Cross-sectional research is sometimes criticized as being unable to determine cause and effect. When cross-sectional data fails to depict the cause-effect relationship, a longer time frame is called for, one that is called longitudinal, which may add years onto carrying out your research. There are many different types of longitudinal research, such as those that involve time-series (such as tracking a third world nation's economic development over four years or so). The general rule is to use longitudinal research the greater the number of variables you've got operating in your study and the more confident you want to be about cause and effect.
Methods of Measuring Reliability
How do you measure the reliability of a particular measure? There are four good methods of measuring reliability:
• Test-retest
• Multiple forms
• Inter-rater
• Split-half
Test-retest
The test-retest technique is to administer your test, instrument, survey, or measure to the same group of people at different points in time. Most researchers administer what is called a pretest for this, and to troubleshoot bugs at the same time.
Reliability estimates are usually in the form of a correlation coefficient, so here, all you do is calculate the correlation coefficient between the two sets of scores obtained from the group and report it as your reliability coefficient.
Multiple Forms
The multiple forms technique has other names, such as parallel forms and disguised test-retest, but it's simply the scrambling or mixing up of questions on your survey, for example, and giving it to the same group twice. It's a more rigorous test of reliability.
Inter-rater
Inter-rater reliability is most appropriate when you use assistants to do interviewing or content analysis for you. To calculate this kind of reliability, all you do is report the percentage of agreement on the same subject between your raters, or assistants.
Split-half
Split-half reliability is estimated by taking half of your test, instrument, or survey, and analyzing that half as if it were the whole thing. Then, you compare the results of this analysis with your overall analysis.
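Two of the reliability estimates described above reduce to simple arithmetic, and a minimal sketch may help. It assumes the test-retest coefficient is the Pearson correlation between the two administrations (the lesson only says "correlation coefficient") and that inter-rater reliability is plain percentage agreement. The function names and the sample scores are hypothetical.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    covariance = sum((a - mean_first) * (b - mean_second)
                     for a, b in zip(first, second))
    spread_first = sqrt(sum((a - mean_first) ** 2 for a in first))
    spread_second = sqrt(sum((b - mean_second) ** 2 for b in second))
    return covariance / (spread_first * spread_second)


def percent_agreement(rater_one, rater_two):
    """Percentage of subjects on which two raters gave the same code."""
    matches = sum(1 for a, b in zip(rater_one, rater_two) if a == b)
    return 100.0 * matches / len(rater_one)


if __name__ == "__main__":
    # Test-retest: the same five people measured at two points in time.
    pretest = [12, 15, 9, 20, 17]
    retest = [13, 14, 10, 19, 18]
    print("test-retest reliability:", round(pearson_r(pretest, retest), 3))

    # Inter-rater: two assistants coding the same four interviews.
    codes_one = ["positive", "negative", "neutral", "positive"]
    codes_two = ["positive", "negative", "positive", "positive"]
    print("inter-rater agreement (%):", percent_agreement(codes_one, codes_two))
```

Percentage agreement is used here because it is what the lesson describes; it does not correct for agreement that would occur by chance.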


Methods of Measuring Validity
Once you find that your measurement of the variable under study is reliable, you will want to measure its validity. There are four good methods of estimating validity:
• Face
• Content
• Criterion
• Construct
Face Validity
Face validity is the least statistical estimate (validity overall is not as easily quantified as reliability), as it's simply an assertion on the researcher's part claiming that they've reasonably measured what they intended to measure. It's essentially a "take my word for it" kind of validity. Usually, a researcher asks a colleague or expert in the field to vouch for the items measuring what they were intended to measure.
Content Validity
Content validity goes back to the ideas of conceptualization and operationalization. If the researcher has focused in too closely on only one type or narrow dimension of a construct or concept, then it's conceivable that other indicators were overlooked. In such a case, the study lacks content validity. Content validity is making sure you've covered all the conceptual space.
There are different ways to estimate it, but one of the most common is a reliability approach where you correlate scores on one domain or dimension of a concept on your pretest with scores on that domain or dimension on the actual test. Another way is to simply look over your inter-item correlations.
Criterion Validity
Criterion validity is using some standard or benchmark that is known to be a good indicator. There are different forms of criterion validity:
• Concurrent validity is how well something estimates actual day-by-day behavior;
• Predictive validity is how well something estimates some future event or manifestation that hasn't happened yet. It is commonly found in criminology.
Construct Validity
Construct validity is the extent to which your items are tapping into the underlying theory or model of behavior. It's how well the items hang together (convergent validity) or distinguish different people on certain traits or behaviors (discriminant validity). It's the most difficult validity to achieve. You have to either do years and years of research or find a group of people to test that have the exact opposite traits or behaviors you're interested in measuring.
Attitude Measurement
Many of the questions in a questionnaire are designed to measure attitudes. Attitudes are a person's general evaluation of something. Customer attitude is an important factor for the following reasons:
• Attitude helps to explain how ready one is to do something.
• Attitudes do not change much over time.
• Attitudes produce consistency in behavior.
• Attitudes can be related to preferences.
Attitudes can be measured using the following procedures:
• Self-reporting - subjects are asked directly about their attitudes. Self-reporting is the most common technique used to measure attitude.
• Observation of behaviour - assuming that one's behaviour is a result of one's attitudes, attitudes can be inferred by observing behaviour. For example, one's attitude about an issue can be inferred by whether he/she signs a petition related to it.
• Indirect techniques - use unstructured stimuli such as word association tests.
• Performance of objective tasks - assumes that one's performance depends on attitude. For example, the subject can be asked to memorize the arguments of both sides of an issue. He/she is more likely to do a better job on the arguments that favor his/her stance.
• Physiological reactions - the subject's response to a stimulus is measured using electronic or mechanical means. While the intensity can be measured, it is difficult to know if the attitude is positive or negative.
• Multiple measures - a mixture of techniques can be used to validate the findings; especially worthwhile when self-reporting is used.
There are several types of attitude rating scales, which are discussed later in this lesson.
Scaling Defined
Scaling is a "procedure for the assignment of numbers (or other symbols) to a property of objects in order to impart some of the characteristics of numbers to the properties in question." Thus, one assigns a number scale to the various levels of heat and cold and calls it a thermometer.
Response Methods
Questioning is a widely used stimulus for measuring concepts. A manager may be asked his or her views concerning an employee. The response is "a good machinist," "a troublemaker," "a union activist," "reliable," or "a fast worker with a poor record of attendance." These answers represent different frames of reference for evaluating the worker and are often of limited value to the researcher.
Two approaches improve the usefulness of such replies. First, various properties may be separated and the respondent asked to judge each specific facet. Here, several questions are substituted for

a single one. Second, we can replace the free-response reply with structuring devices.
To quantify dimensions that are essentially qualitative, rating scales or ranking scales are used.
Rating Scales
One uses rating scales to judge properties of objects without reference to other similar objects. These ratings may be in such forms as "like-dislike," "approve-indifferent-disapprove," or other classifications using even more categories. There is little conclusive support for choosing a three-point scale over scales with five or more points. Some researchers think that more points on a rating scale provide an opportunity for greater sensitivity of measurement and extraction of variance. The most widely used scales range from three to seven points, but it does not seem to make much difference which number is used, with two exceptions. First, a larger number of scale points is needed to produce accuracy with single-item versus multiple-item scales. Second, in cross-cultural measurement, the culture may condition respondents to a standard metric, such as a ten-point scale in Italy.
Ranking Scales
In ranking scales, the subject directly compares two or more objects and makes choices among them. Frequently, the respondent is asked to select one as the "best" or the "most preferred." When there are only two choices, this approach is satisfactory, but it often results in "ties" when more than two choices are found. For example, respondents are asked to select the most preferred among three or more models of a product. Assume that 40 percent choose model A, 30 percent choose model B, and 30 percent choose model C. Which is the preferred model? The analyst would be taking a risk to suggest that A is most preferred. Perhaps that interpretation is correct, but 60 percent of the respondents chose some model other than A. Perhaps all B and C voters would place A last, preferring either B or C to it. This ambiguity can be avoided by using some of the techniques described in this section.
Some of the measurement scales are discussed below:
Equal-appearing Interval Scaling
In this scale a set of statements is assembled. These statements are selected according to their position on an interval scale of favorableness. Statements are chosen that have a small degree of dispersion. Respondents are then asked to indicate with which statements they agree.
Likert Method of Summated Ratings
In this scale a statement is made and the respondents indicate their degree of agreement or disagreement on a five-point scale (Strongly Disagree, Disagree, Neither Agree Nor Disagree, Agree, Strongly Agree). It actually extends beyond the simple ordinal choices of "strongly agree", "agree", "disagree", and "strongly disagree". In fact, Likert scaling is initially assigned through a process that calculates the average index score for each item in an index and subsequently ranks them in order of intensity (recall the process for constructing Thurstone scales). Once ordinality has been assigned, the assumption is that a respondent choosing a response weighted with, say, a 15 out of 20 in an increasing scale of intensity is placed at that level for the index.
Example of a Likert Scale
How would you rate the following aspects of your food store?
              Extremely                       Extremely
              important                       unimportant
Service       1    2    3    4    5    6    7
Check outs    1    2    3    4    5    6    7
Bakery        1    2    3    4    5    6    7
Deli          1    2    3    4    5    6    7
Semantic Differential scale
A semantic differential scale is constructed using phrases describing attributes of the product to anchor each end. For example, the left end may state, "Hours are inconvenient" and the right end may state, "Hours are convenient". The respondent then marks one of the seven blanks between the statements to indicate his/her opinion about the attribute.
The process entitled Semantic Differential employs a similar approach to Likert scaling in that it seeks a range of responses between extreme polarities, but it seeks to place the ordinal range of responses between two keywords expressing opposite "ideas" or concepts. Babbie's example provides the best illustration of the concept.

Semantic Differential: Feelings about Musical Selections
              Very    Some-              Some-   Very
              Much    what     Neither   what    Much
Enjoyable     ___     ___      ___       ___     ___     Unenjoyable
Simple        ___     ___      ___       ___     ___     Complex
Discordant    ___     ___      ___       ___     ___     Harmonic
Traditional   ___     ___      ___       ___     ___     Modern

One of the first things that strikes you is the highly interpretative nature of Babbie's example. Choices such as "enjoyable" and "unenjoyable" simply reflect preference, but the other choices are sufficiently ambiguous as to invite imprecise understanding.
If you are seeking nothing more than attitudinal information about an abstract social artifact such as a piece of music, the process of semantic differential may be usable. Otherwise, its ambiguity in application remains problematic.
As with the Likert, Bogardus and Thurstone scales, Guttman scaling seeks to place indicators into an ordinal progression from "weak" indicators to "strong" ones (well, that's the difference between a scale and an index in the first place).
Similarly, the assumption is that a respondent indicating a given level of preference, attitude or belief will also demonstrate all "weaker" indicators of the same thing.
However, the premise of the Guttman scale extends even further, in that it examines all of the responses to the survey and separates out the number of responses that do not exactly reflect the scalar

pattern; that is, the number of response sets that do not reflect the assumption that a respondent choosing one level of response would give the same type of response to all inferior levels.
The number of response sets that violate the scalar pattern is compared to the number that do reflect the pattern, and the result is referred to as a coefficient of reproducibility. Again, Babbie's illustration provides a very clear understanding.

Guttman Scaling and Coefficient of Reproducibility
               Response   Number     Index    Scale    Total
               Pattern    of Cases   Scores   Scores   Scale Errors
Scale Types    +++        612        3        3        0
               ++-        448        2        2        0
               +--        92         1        1        0
               ---        79         0        0        0
Mixed Types    -+-        15         1        2        15
               +-+        5          2        3        5
               --+        2          1        0        2
               -++        5          2        3        5

Coefficient of Reproducibility = 1 - (Number of Errors / Number of Guesses)
In the example: 1 - 27 / (1,258 x 3) = 1 - 27 / 3,774 = .993 or 99.3%

The entire exercise is really just a way of indicating that the degree to which a set of responses accurately reflects the scalar assumptions is an indication of the degree to which the entire set could be recreated from the scale itself. What the above illustration shows is that if we were to project an imaginary "sample" from the coefficient of reproducibility of 99.3%, then the projection would reflect the real sample to that degree. Guttman scaling shows that a well constructed scale can very accurately reproduce the profile of a response set. But then, you only know the coefficient of reproducibility after you have run the survey and crunched the numbers, so it is not a predictive tool; it is a proof of the strength of the scale as a measure.
A brief word on typologies is in order. So far, we have limited ourselves to an examination of unidirectional variables; that is, one thing in one direction (attitudes for or against abortion, etc.). Often relationships are better explained as the function of the intersection of several variables. This is referred to as a typology. Remember what we have noted about making sure that your indices and scales are composed of single dimension indicators. Recall that while "religion" can have a strong correlation with "attitudes on abortion", that does not mean that a question on religion belongs in an index or scale of questions on "attitudes on abortion". But, if you wish to examine the intersection of the two, you can construct a typology effectively showing, for example, that "Catholics" may be "conservative" on "abortion" but remain "liberal" on "other human rights".
Babbie warns us that typologies are useful as independent variables ("religion" may be a good causal factor in "attitudes on abortion") but can be problematic as dependent variables (explaining the "why" isn't always clear). Catholics may be more anti-abortion because the church has forbidden it, but what of other groups? You can get onto some very shaky ground using typologies as the "effect" or dependent variable.
Example of Semantic Differential
How would you describe Kmart, Target, and Wal-Mart on the following scale?
Clean          ___ ___ ___ ___ ___   dirty
Bright         ___ ___ ___ ___ ___   dark
Low quality    ___ ___ ___ ___ ___   high quality
Conservative   ___ ___ ___ ___ ___   innovative
Stapel Scale
It is similar to the semantic differential scale except that numbers identify points on the scale, only one statement is used, the respondent marks a negative number if he/she disagrees, and there are 10 positions instead of seven. This scale does not require that bipolar adjectives be developed, and it can be administered by telephone.
Q-sort Technique
In the Q-sort technique the respondent is forced to construct a normal distribution by placing a specified number of cards in one of 11 stacks according to how desirable he/she finds the characteristics written on the cards. This technique is faster and less tedious for subjects than paired comparison measures. It also forces the subject to conform to quotas at each point of the scale so as to yield a normal or quasi-normal distribution. Thus we can say that the objective of the Q-technique is intensive study of individuals.
Selection of an Appropriate Attitude Measurement Scale
We have examined a number of different techniques that are available for the measurement of attitudes. Each method has certain strengths and weaknesses. Almost all the techniques can be used for the measurement of any component of attitudes, but not all the techniques are suitable for all purposes. The selection depends upon the stage and size of the research.
Generally, the Q-sort and the semantic differential scale are preferred in the preliminary stages. The Likert scale is used for item analysis. For specific attributes the semantic differential scale is very appropriate. Overall, the semantic differential is simple in concept, and the results obtained are comparable with those of more complex, one-dimensional methods. Hence it is widely used.
Limitations of Attitude Measurement Scales
The main limitation of these tools is the emphasis on describing attitudes rather than predicting behaviour. This is primarily because of a lack of models that describe the role of attitudes in behaviour.
Tutorial
Prepare a questionnaire on any one of the following objectives:
1. To know the corporate productivity
2. Job analysis / needs and satisfaction level of employees / motivation level of employees / job involvement etc.
3. Product testing / feedback on after-sales services
References
Cooper, Donald R. - Business Research Methods (Tata McGraw-Hill Publication)
Kothari, C. R. - Quantitative Techniques (Vikas Publishing House, 3rd ed.)
Levin, R. I. & Rubin, D. S. - Statistics for Management (Prentice Hall of India, 2002)
