
Assessment Strategies and Tools: Checklists, Rating Scales and

Rubrics

Checklists, rating scales and rubrics are tools that state specific criteria
and allow teachers and students to gather information and to make
judgments about what students know and can do in relation to the
outcomes. They offer systematic ways of collecting data about specific
behaviors, knowledge and skills.

The quality of information acquired through the use of checklists, rating


scales and rubrics is highly dependent on the quality of the descriptors
chosen for assessment. Their benefit is also dependent on students’
direct involvement in the assessment and understanding of the feedback
provided.

The purpose of checklists, rating scales and rubrics is to:

 provide tools for systematic recording of observations

 provide tools for self-assessment

 provide samples of criteria for students prior to collecting and
evaluating data on their work

 record the development of specific skills, strategies, attitudes and
behaviours necessary for demonstrating learning

 clarify students' instructional needs by presenting a record of
current accomplishments.
Tips for Developing Checklists, Rating Scales and Rubrics

1. Use checklists, rating scales and rubrics in relation to outcomes


and standards.
2. Use simple formats that can be understood by students and that
will communicate information about student learning to parents.
3. Ensure that the characteristics and descriptors listed are clear,
specific and observable.
4. Encourage students to assist with constructing appropriate criteria.
For example, what are the descriptors that demonstrate levels of
performance in problem solving?
5. Ensure that checklists, rating scales and rubrics are dated to track
progress over time.
6. Leave space to record anecdotal notes or comments.
7. Use generic templates that become familiar to students and to
which various descriptors can be added quickly, depending on the
outcome(s) being assessed.
8. Provide guidance to students to use and create their own checklists,
rating scales and rubrics for self-assessment purposes and as
guidelines for goal setting.

Checklists usually offer a yes/no format in relation to student


demonstration of specific criteria. This is similar to a light switch; the
light is either on or off. They may be used to record observations of an
individual, a group or a whole class.

Rating Scales allow teachers to indicate the degree or frequency of the


behaviors, skills and strategies displayed by the learner. To continue the
light switch analogy, a rating scale is like a dimmer switch that provides
for a range of performance levels. Rating scales state the criteria and
provide three or four response selections to describe the quality or
frequency of student work.

Teachers can use rating scales to record observations and students can
use them as self-assessment tools. Teaching students to use descriptive
words, such as always, usually, sometimes and never helps them
pinpoint specific strengths and needs. Rating scales also give students
information for setting goals and improving performance. In a rating
scale, the descriptive word is more important than the related number.
The more precise and descriptive the words for each scale point, the
more reliable the tool.
Effective rating scales use descriptors with clearly understood measures,
such as frequency. Scales that rely on subjective descriptors of quality,
such as fair, good or excellent, are less effective because the single
adjective does not contain enough information on what criteria are
indicated at each of these points on the scale.
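
To make the link between descriptive words and recorded values concrete, here is a minimal sketch in Python; the frequency descriptors follow the always/usually/sometimes/never wording above, while the skill names and the observation itself are hypothetical.

# Map frequency descriptors to values; the words carry the meaning,
# the numbers are only a convenient shorthand for recording.
FREQUENCY_SCALE = {"never": 1, "sometimes": 2, "usually": 3, "always": 4}

# Hypothetical observation record for one student on two criteria.
observations = {
    "contributes ideas to group discussion": "usually",
    "listens without interrupting": "sometimes",
}

for criterion, descriptor in observations.items():
    value = FREQUENCY_SCALE[descriptor]
    print(f"{criterion}: {descriptor} ({value})")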

Added value

Increase the assessment value of a checklist or rating scale by adding


two or three additional steps that give students an opportunity to identify
skills they would like to improve or the skill they feel is most important.
For example:

 put a star beside the skill you think is the most important for
encouraging others
 circle the skill you would most like to improve
 underline the skill that is the most challenging for you.

Rubrics use a set of criteria to evaluate a student's performance. They


consist of a fixed measurement scale and detailed description of the
characteristics for each level of performance. These descriptions focus
on the quality of the product or performance and not the quantity; e.g.,
not the number of paragraphs, examples used to support an idea, or spelling errors.
Rubrics are commonly used to evaluate student performance with the
intention of including the result in a grade for reporting purposes.
Rubrics can increase the consistency and reliability of scoring.

Rubrics use a set of specific criteria to evaluate student performance.


They may be used to assess individuals or groups and, as with rating
scales, may be compared over time.

Developing Rubrics and Scoring Criteria

Rubrics are increasingly recognized as a way to both effectively assess


student learning and communicate expectations directly, clearly and
concisely to students. The inclusion of rubrics in a teaching resource
provides opportunities to consider what demonstrations of learning look
like, and to describe stages in the development and growth of
knowledge, understandings and skills. To be most effective, rubrics
should allow students to see the progression of mastery in the
development of understandings and skills.

Rubrics should be constructed with input from students whenever


possible. A good start is to define what quality work looks like based on
the learning outcomes. Exemplars of achievement need to be used to
demonstrate to students what an excellent or acceptable performance is.
This provides a collection of quality work for students to use as
reference points. Once the standard is established, it is easy to define
what exemplary levels and less-than-satisfactory levels of performance
look like. The best rubrics have three to five descriptive levels to allow
for discrimination in the evaluation of the product or task. Rubrics may
be used for summative purposes to gauge marks by assigning a score to
each of the various levels.

When developing a rubric, consider the following:

 What are the specific outcomes in the task?


 Do the students have some experience with this or a similar task?
 What does an excellent performance look like? What are the
qualities that distinguish an excellent response from other levels?
 What do other responses along the performance quality continuum
look like?
 Is each description qualitatively different from the others? Are
there an equal number of descriptors at each level of quality? Are
the differences clear and understandable to students and others?

Begin by developing criteria to describe the Acceptable level. Then use


Bloom's taxonomy to identify differentiating criteria as you move up the
scale. The criteria should not go beyond the original performance task,
but reflect higher order thinking skills that students could demonstrate
within the parameters of the initial task.
When developing the scoring criteria and quality levels of a rubric,
consider the following guidelines.

 Level 4 is the Standard of excellence level. Descriptions should


indicate that all aspects of work exceed grade level expectations
and show exemplary performance or understanding. This is a
"Wow!"
 Level 3 is the Approaching standard of excellence level.
Descriptions should indicate some aspects of work that exceed
grade level expectations and demonstrate solid performance or
understanding. This is a "Yes!"
 Level 2 is the Meets acceptable standard. This level should
indicate minimal competencies acceptable to meet grade level
expectations. Performance and understanding are emerging or
developing but there are some errors and mastery is not thorough.
This is a "On the right track, but …".
 Level 1 Does not yet meet acceptable standard. This level
indicates what is not adequate for grade level expectations and
indicates that the student has serious errors, omissions or
misconceptions. This is a "No, but …". The teacher needs to make
decisions about appropriate intervention to help the student
improve.
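
As a rough illustration of how the four levels above might be organized for summative scoring, the sketch below (Python) stores each level's label and converts a teacher's level judgments into a mark; the criterion names are hypothetical and the level descriptions paraphrase the guidelines above.

# Hypothetical four-level rubric; in practice each level would be
# described per criterion and per outcome.
RUBRIC_LEVELS = {
    4: "Standard of excellence: all aspects exceed grade level expectations",
    3: "Approaching standard of excellence: some aspects exceed expectations",
    2: "Meets acceptable standard: minimal competencies, some errors",
    1: "Does not yet meet acceptable standard: serious errors or omissions",
}

# Level assigned to each criterion for one student (hypothetical).
student_scores = {"ideas": 4, "organization": 3, "conventions": 2}

total = sum(student_scores.values())
maximum = 4 * len(student_scores)
print(f"Summative mark: {total}/{maximum}")
for criterion, level in student_scores.items():
    print(f"  {criterion}: level {level} - {RUBRIC_LEVELS[level]}")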

Creating Rubrics with Students

Learning increases when students are actively involved in the


assessment process. Students do better when they know the goal, see
models and know how their performance compares to learning
outcomes.

Learning outcomes are clarified when students assist in describing the


criteria used to evaluate performance. Use brainstorming and discussion
to help students analyze what each level looks like. Use student-friendly
language and encourage students to identify descriptors that are
meaningful to them. For example, a Grade 3 class might describe levels
of quality with phrases such as the following.

 Super!
 Going beyond
 Meets the mark
 Needs more work.

Use work samples to help students practice and analyze specific criteria
for developing a critical elements list. They can also use samples to
practice assigning performance levels and compare criteria from level to
level.

Although rubrics are often used as assessment of learning tools, they can
also be used as assessment for learning tools. Students can benefit from
using rubrics as they become more competent at judging the quality of
their work and examining their own progress.
Example:

 Involve students in the assessment process by having them


participate in the creation of a rubric. This process facilitates a
deeper understanding of the intended outcomes and the associated
assessment criteria.
 After a rubric has been created, students can use it to guide their
learning. Criteria described in a rubric serve to focus student
reflection on their work and facilitate the setting of learning goals
for a particular performance assessment. Through self-assessment
or peer-assessment, students can use a rubric to assess work
completed to date and use it to guide their planning for the "next
steps" in learning.
Using Anecdotal Records in the Classroom
Anecdotal notes are concise, objective narratives about an incident or person. In classrooms,
teachers can write anecdotal notes recording their observations of students – behaviors, skills,
attitudes, performance, and classroom incidents. Teachers can write, compile and use their
anecdotal notes on students as a documentation system.
Writing Anecdotal Notes:
Anecdotal notes must contain factual information about a significant event, behavior or learning
outcome. Here are some tips which can help teachers to write good anecdotal notes:
Pre-Observation plan: Teachers must decide in advance which specific behaviors and learning
outcomes they intend to observe and record. This helps teachers prepare and avoid confusion
while recording. Teachers can also decide when to observe to gain balanced profiles of their
students.
Content of anecdotal notes:
 Must be dated and include the name of student being observed.
 Should specify student strengths and positive traits.
 Can follow the ABC format for recording – Antecedent (why or
how), Behavior, Consequence of behavior and Context of incident.
 Can include teachers’ comments, plan for action and recommendation for further
observations.
 Can summarize identified learning patterns.
Time for writing: While in class, teachers can quickly write down any observations on sticky
notes (dated and named) and stick them in the specific student’s records. After class or when
time permits, teachers can refer to their sticky notes and write properly formatted notes for the
record.
Setting and Maintaining Anecdotal Records
Teachers can use a three-ring binder for storing their anecdotal notes on students. At the
beginning of the binder teachers can keep:
 An index page with the names of all students and spaces for recording observation dates.
 A second page that includes the list of common behaviors and learning outcomes to be
observed. A similar sheet may be used for each student, with additional columns to record
the observation date for each point.
These sheets enable teachers to keep track of which students were observed and how frequently, and
to ensure that all students were uniformly observed for the pre-identified behaviors and
outcomes.
Separate pages can be maintained for individual students. Teachers can maintain a standard
recording template which can include:
 Date of observation.
 A three-columned table to record events and behaviors (ABC format).
 Additional space or separate page for adding comments, recommendations and action plan.
Using this template, teachers can track students’ progress efficiently.
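
A minimal sketch of that recording template as a Python data structure; the field names mirror the ABC format described above, and the sample content (student, date, events) is entirely hypothetical.

from datetime import date

# One anecdotal entry following the ABC format: Antecedent, Behavior,
# Consequence/Context, plus space for comments and an action plan.
anecdotal_entry = {
    "student": "A.J.",                              # hypothetical student
    "date": date(2024, 12, 14).isoformat(),         # hypothetical date
    "antecedent": "Group was asked to share materials",
    "behavior": "Offered scissors to a peer without prompting",
    "consequence": "Peer thanked A.J.; both continued the task",
    "comments": "Shows growth in cooperation",
    "action_plan": "Observe again during the next group activity",
}

for field, value in anecdotal_entry.items():
    print(f"{field:12}: {value}")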
How are Anecdotal records useful?
Anecdotal records are useful as they:
 Are quick and easy to write.
 Require no additional training for teachers on how to document record sheets.
The notes help teachers:
 Record qualitative information about students.
 Identify students’ needs, behavior and learning patterns.
 Track progress and changes in students’ behavior and performance when generated over a
period of time.
 Plan for activities and strategies to use in classroom.
 Determine the efficiency of pedagogies in learning.
 Demonstrate students’ progress to parents at parent-teacher conferences.
Teachers can share their notes with students in one-on-one sessions:
 To give them feedback on behaviors and academic performance.
 To identify their areas of weakness and plan for interventions, thereby enabling students to
improve themselves.
This also helps to build and strengthen healthy student-teacher relationships.
The few disadvantages of anecdotal records are that they:
 Are not standardized.
 Depend on the teacher's memory for accuracy and may therefore be biased.
Nonetheless, anecdotal records are an informal documentation system, which, if implemented in
classrooms, simplifies documentation of student performance and facilitates easy sharing of
records between teachers, students and parents.

ABOUT ANECDOTAL RECORDS


What:
An anecdotal record is a short, objective, descriptive summary of one event or incident
written down after the event has taken place. You often use anecdotes when telling your
friends a story about something that happened over the weekend or something cute or funny
your child did. A classroom anecdotal record differs a little in that the purpose is to learn
something specific about the child. This is a very relaxed method of recording observations.
The observer does not need any special forms to fill out, no particular setting, and no time
limitations. Anecdotal records are simply brief stories about something that happened.

Why:
Over time, a collection of anecdotal records provides a great deal of information about
a child. Like an investigator, the teacher can collect ongoing evidence of a child’s
development in a particular area. For instance, the teacher may jot down anecdotes about
how a child explores through her senses, creates with materials, displays leadership, and so on.
The teacher can then use these records to plan environments and curriculum, or to note the
curriculum that emerges from the children.

Who:
Anecdotal records focus on one child at a time and since they are written down later,
the observer can be a participant in the children’s activity.

Tips:

 Anecdotes are important to include in a child’s portfolio!


 Observe with an understanding of the developmental characteristics of the age-group you
are working with.
 Record significant happenings.
 Jot down brief notes while the activity is happening and fill in details as soon after the
event as possible.
 Date each anecdote and include the child’s age in years and months.
 Write in past tense.
 Be clear, objective, and concise.
 Put the developmental objective (e.g., physical, creative, leadership, sensory) in
parentheses at the end of the anecdote.
 Organize your anecdotes into files for each child.

Sample Anecdotal Record:

12/14 AJ (2.2). While playing with a book that had buttons attached to it by strings and
corresponding circles for the buttons to fit into, AJ placed four buttons in circles and counted
in French, “Un, deux, trois, quatre”. (Cognitive – Math/Language)

12/14 AJ (2.2). When AJ’s friend arrived at the house AJ shouted, “Yay, Maddy!” Then
grabbed Maddy’s hand and pulled her, running into the play room. (Social)

12/14 AJ (2.2). When asked not to touch the model train, AJ lay on her tummy and scooted
as close to the tracks as she could without touching. There she stayed for approximately 10
minutes watching the train go around. (Following Directions)
Attitude Scales - Rating Scales to measure data
Scaling Techniques for Measuring Data Gathered
from Respondents
The term scaling refers to attempts to measure attitudes objectively. An attitude is the result of a
number of external and internal factors. Depending upon the attitude to be measured,
appropriate scales are designed. Scaling is a technique used for measuring qualitative responses
of respondents, such as those related to their feelings, perceptions, likes, dislikes, interests and
preferences.

Types of Scales

Most frequently used Scales


1. Nominal Scale
2. Ordinal Scale
3. Interval Scale
4. Ratio Scale

Self Rating Scales


1. Graphic Rating Scale
2. Itemized Rating Scales
a. Likert Scale
b. Semantic Differential Scale
c. Stapel’s Scale
d. Multi Dimensional Scaling
e. Thurstone Scales
f. Guttman Scales/Scalogram Analysis
g. The Q Sort technique

Four types of scales are generally used for Marketing Research.

1. Nominal Scale
This is a very simple scale. It consists of assigning facts/choices to various
alternative categories which are usually exhaustive as well as mutually exclusive. The
numbers serve only as labels, making this the least restrictive of all the scales. Instances of
a Nominal Scale are credit card numbers, bank account numbers, employee id numbers,
etc. It is simple and widely used when the relationship between two variables is to be studied.
In a Nominal Scale, numbers are no more than labels and are used specifically to identify
different categories of responses. The following example illustrates this:
What is your gender?
[ ] Male
[ ] Female

Another example is - a survey of retail stores done on two dimensions - way of


maintaining stocks and daily turnover.

How do you stock items at present?


[ ] By product category
[ ] At a centralized store
[ ] Department wise
[ ] Single warehouse

What is the daily turnover of consumers?


[ ] Between 100 – 200
[ ] Between 200 – 300
[ ] Above 300

A two-way classification can be made as follows:

Daily Turnover \ Stock Method | Product Category | Department wise | Centralized Store | Single Warehouse
100 – 200                     |                  |                 |                   |
200 – 300                     |                  |                 |                   |
Above 300                     |                  |                 |                   |

The mode is the measure most frequently used to summarize each response category.
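
A short Python sketch of how responses to the two nominal questions above could be cross-tabulated and the modal category found; the store responses are invented purely for illustration.

from collections import Counter

# Each tuple is one store's (stocking method, daily turnover) response.
responses = [
    ("Product category", "100-200"),
    ("Department wise", "200-300"),
    ("Product category", "Above 300"),
    ("Centralized store", "100-200"),
    ("Product category", "100-200"),
]

# Two-way classification: count the stores falling in each cell.
cross_tab = Counter(responses)
for (stock_method, turnover), count in cross_tab.items():
    print(f"{stock_method:18} | {turnover:9} | {count}")

# The mode is the only meaningful "average" for nominal data.
method, count = Counter(m for m, _ in responses).most_common(1)[0]
print(f"Most common stocking method: {method} ({count} stores)")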

2. Ordinal Scale
Ordinal scales are the simplest attitude measuring scale used in Marketing Research. It is
more powerful than a nominal scale in that the numbers possess the property of rank
order. The ranking of certain product attributes/benefits as deemed important by the
respondents is obtained through the scale.
Example 1: Rank the following attributes (1 - 5), on their importance in a microwave
oven.

1. Company Name
2. Functions
3. Price
4. Comfort
5. Design
The most important attribute is ranked 1 by the respondents and the least important is
ranked 5. Instead of numbers, letters or symbols can also be used to rate in an ordinal scale.
Such a scale makes no attempt to measure the degree of favorability of different rankings.

Example 2 - If there are 4 different types of fertilizers and they are ordered on the basis
of quality as Grade A, Grade B, Grade C and Grade D, this is again an Ordinal Scale.

Example 3 - If there are 5 different brands of Talcum Powder and a respondent ranks
them based on, say, “Freshness” – Rank 1 having maximum Freshness, Rank 2 the
second maximum Freshness, and so on – an Ordinal Scale results.

The median and mode are meaningful for an ordinal scale.

3. Interval Scale
In an Interval Scale the distances between the various categories are equal, unlike in a
Nominal Scale (where numbers are mere labels) or an Ordinal Scale (where only rank order
matters). Interval Scales are also termed Rating Scales. An Interval Scale has an arbitrary
zero point with further numbers placed at equal intervals. A very good example of an
Interval Scale is a thermometer.

Illustration 1 - How do you rate your present refrigerator for the following qualities?

Company Name           Less Known           1 2 3 4 5   Well Known
Functions              Few                  1 2 3 4 5   Many
Price                  Low                  1 2 3 4 5   High
Design                 Poor                 1 2 3 4 5   Good
Overall Satisfaction   Very Dissatisfied    1 2 3 4 5   Very Satisfied

Such a scale permits the researcher to say that position 5 on the scale is above position 4
and that the distance from 5 to 4 is the same as the distance from 4 to 3. Such a scale,
however, does not permit the conclusion that position 4 is twice as strong as position 2
because no zero position has been established. The data obtained from an Interval Scale can
be used to calculate the mean score of each attribute over all respondents. The standard
deviation (a measure of dispersion) can also be calculated.
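
A minimal Python sketch of that calculation, assuming hypothetical 1-5 ratings of one attribute collected from several respondents.

import statistics

# Hypothetical interval-scale ratings (1-5) of "Design" from seven respondents.
design_ratings = [4, 5, 3, 4, 2, 5, 4]

mean_score = statistics.mean(design_ratings)
std_dev = statistics.stdev(design_ratings)   # sample standard deviation

print(f"Mean rating: {mean_score:.2f}")
print(f"Standard deviation: {std_dev:.2f}")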

4. Ratio Scale
Ratio Scales are not widely used in Marketing Research unless a base item is made
available for comparison. In the above example of an Interval scale, a score of 4 on one
quality does not necessarily mean that the respondent is twice as satisfied as the
respondent who marks 2 on the scale. A Ratio scale has a natural zero point, and further
numbers are placed at equal intervals. Examples are scales for measuring
physical quantities like length, weight, etc.
Ratio scales are very common in physical scenarios. Quantified responses forming a
ratio scale are analytically the most versatile. Ratio scales possess all the characteristics of
an interval scale, and the ratios of the numbers on these scales have meaningful
interpretations. Data on certain demographic or descriptive attributes, if they are obtained
through open-ended questions, will have ratio-scale properties. Consider the following
questions:

Q 1) What is your annual income before taxes? ______ $


Q 2) How far is the Theater from your home? ______ miles

Answers to these questions have a natural, unambiguous starting point, namely zero.
Since the starting point is not chosen arbitrarily, computing and interpreting ratios makes
sense. For example, we can say that a respondent with an annual income of $40,000 earns
twice as much as one with an annual income of $20,000.

Self Rating Scales


1. Graphic Rating Scale

The respondents rate the objects by placing a mark at the appropriate position on a line
that runs from one extreme of the criterion variable to another. Example

0 1 5 7
(poor quality) (bad quality) (neither good nor bad) (good quality)
BRAND 1

This is also known as a continuous rating scale. The respondent can mark any position on the line.
Here one attribute is taken, e.g., the quality of a brand of ice cream.

poor good
BRAND 2

This line can be vertical or horizontal, and scale points may be provided. No other
indication appears on the continuous scale; only a range is provided. To quantify the responses
to the question “Indicate your overall opinion about ice-cream Brand 2 by placing a tick
mark at the appropriate position on the line”, we measure the physical distance between the
left extreme and the response position on the line; the greater the distance, the
more favourable is the response or attitude towards the brand.
Its limitation is that coding and analysis require a substantial amount of time, since we
first have to measure the physical distance on the scale for each respondent.
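
A small Python sketch of that quantification step, assuming the tick-mark position is recorded as a distance in millimetres from the left end of a 100 mm line; the line length and positions are hypothetical.

# Length of the printed line and the measured position of each
# respondent's tick mark, in millimetres (hypothetical values).
LINE_LENGTH_MM = 100
tick_positions_mm = [23, 67, 88, 45]

# Convert each distance into a 0-100 favourability score: the greater the
# distance from the left extreme, the more favourable the attitude.
scores = [round(pos / LINE_LENGTH_MM * 100) for pos in tick_positions_mm]
print(scores)   # [23, 67, 88, 45] since the line is 100 mm long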

2. Itemized Rating Scales


These scales are different from continuous rating scales. They have a number of brief
descriptions associated with each category. They are widely used in Marketing Research.
They essentially take the form of multiple category questions. The most common are the
Likert, Semantic Differential, Stapel and Multi Dimensional scales. Others are the Thurstone
and Guttman scales.
a. Likert Scale
It was developed by Rensis Likert. Here the respondents are asked to indicate their
degree of agreement or disagreement with each of a series of statements. Each
scale item has 5 response categories ranging from strongly agree to strongly
disagree.

5 4 3 2 1
Strongly agree Agree Indifferent Disagree Strongly disagree

Each statement is assigned a numerical score ranging from 1 to 5. It can also be


scaled as -2 to +2.

-2 -1 0 1 2

For example, “The quality of Mother Dairy ice-cream is not good” is a negative
statement, and strongly agreeing with it means the respondent thinks the quality is poor.

Each degree of agreement is given a numerical score and the respondent's total
score is computed by summing these scores. This total score reveals the
respondent's overall opinion.
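
A minimal Python sketch of that summing step, using hypothetical statements; note the reverse-scoring of the negatively worded item, in line with the Mother Dairy example above.

# Responses on a 1-5 scale (5 = strongly agree) to three statements.
# The second statement is negatively worded, so it is reverse-scored.
responses = {
    "The ice cream tastes fresh": 5,
    "The quality of the ice cream is not good": 2,   # negative statement
    "I would buy this brand again": 4,
}
negatively_worded = {"The quality of the ice cream is not good"}

total = 0
for statement, score in responses.items():
    if statement in negatively_worded:
        score = 6 - score          # 1<->5, 2<->4, 3 stays 3
    total += score

print("Respondent's total attitude score:", total)   # 5 + 4 + 4 = 13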

The Likert Scale is of the ordinal type; it enables one to rank attitudes, but not to
measure the difference between attitudes. Likert scales take about the same amount of
effort to create as a Thurstone scale and are considered more discriminating and
reliable because of the larger range of responses typically given on a Likert scale.

A typical Likert scale has 20 - 30 statements. While designing a good Likert


Scale, first a large pool of statements relevant to the measurement of the attitude has
to be generated and then, from the pool of statements, those which are vague
and non-discriminating have to be eliminated.

Thus, the Likert scale is a five-point scale ranging from ’strong agreement’ to
’strong disagreement’. No panel of judges is involved in this method.

b. Semantic Differential Scale


This is a seven point scale and the end points of the scale are associated with
bipolar labels.
Unpleasant   1   2   3   4   5   6   7   Pleasant
Submissive   1   2   3   4   5   6   7   Dominant

Suppose we want to assess the personality of a particular person. We have the following
bipolar adjective pairs:

1. Unpleasant/Pleasant
2. Submissive/Dominant

Bipolar means two opposite extremes. An individual can score between 1 and 7 (or -3 and
+3) on each pair. On the basis of these responses, profiles are made. We can analyse two or
three products, and by plotting these profiles together we get a profile analysis. It could take
any shape depending on the number of variables.

Profile Analysis

[Profile chart: each product's mean ratings across the adjective pairs are plotted as a line
and the lines are overlaid for comparison.]

Mean and median are used for comparison. This scale helps to determine overall
similarities and differences among objects.

When the Semantic Differential Scale is used to develop an image profile, it provides
a good basis for comparing the images of two or more items. The big advantage of
this scale is its simplicity, while producing results comparable with those of the
more complex scaling methods. The method is easy and fast to administer, and it
is also sensitive to small differences in attitude, highly versatile, reliable and
generally valid.
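
A brief Python sketch of the profile analysis described above: the mean rating on each bipolar pair is computed for two hypothetical brands so their image profiles can be compared side by side.

import statistics

# Ratings on a 1-7 semantic differential (1 = unpleasant/submissive,
# 7 = pleasant/dominant) from several respondents, per brand (hypothetical).
ratings = {
    "Brand A": {"unpleasant-pleasant": [6, 7, 5], "submissive-dominant": [4, 5, 5]},
    "Brand B": {"unpleasant-pleasant": [3, 2, 4], "submissive-dominant": [6, 7, 6]},
}

# The mean score per adjective pair forms each brand's image profile.
for brand, pairs in ratings.items():
    profile = {pair: statistics.mean(scores) for pair, scores in pairs.items()}
    print(brand, profile)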

c. Stapel’s Scale
It was developed by Jan Stapel. This scale has some distinctive features:

i. Each item has only one word/phrase indicating the dimension it represents.
ii. Each item has ten response categories.
iii. Each item has an even number of categories.
iv. The response categories have numerical labels but no verbal labels.

For example, in the following item, suppose for the quality of ice cream we ask
respondents to rate it from +5 to -5. Select a plus number for words which
describe the ice cream accurately. Select a minus number for words you think do
not describe the ice cream quality accurately. Thus, we can select any number
from +5, for words we think are very accurate, to -5, for words we think are very
inaccurate. This scale is usually presented vertically.
+5
+4
+3
+2
+1
High Quality
-1
-2
-3
-4
-5

This is a unipolar rating scale.

d. Multi Dimensional Scaling


It consists of a group of analytical techniques which are used to study consumer
attitudes related to perceptions and preferences. It is used to study:

I. The major attributes of a given class of products perceived by the consumers in
considering the product and by which they compare the different brands.
II. Which brands compete most directly with each other.
III. Whether the consumers would like a new brand with a combination of
characteristics not found in the market.
IV. What the consumer's ideal combination of product attributes would be.
V. What sales and advertising messages are compatible with consumers' brand
perceptions.

It is a computer-based technique. The respondents are asked to place the various
brands into different groups like similar, very similar, not similar, and so on. A
goodness of fit is traded off over a large number of attributes, and a lack-of-fit
index is then calculated by a computer program. The purpose is to find a reasonably
small number of dimensions which will eliminate most of the stress. After the
configuration for the consumer's preference has been developed, the next step is
to determine the preference with regard to the product under study. These
techniques attempt to identify the product attributes that are important to
consumers and to measure their relative importance.

Scaling on a single attribute involves the unrealistic assumption that a consumer who
compares different brands would perceive the differences on the basis of only one attribute.
For example, consider the attributes for joining an M.Com course. The responses may be:
to do a postgraduate degree, to go into the teaching line, to gain knowledge, or to appear for
the NET. There are a number of attributes, and you cannot base a decision on one attribute
only. Therefore, when consumers are choosing between brands, they base their decision on
various attributes. In practice, the perceptions of consumers involve different attributes, and
any one consumer perceives each brand as a composite of a number of different attributes.
This is the shortcoming that multi-dimensional scaling addresses: whenever we choose from
a number of alternatives on several attributes, multi-dimensional scaling is appropriate.
There are many possible uses of such scaling, such as market segmentation, product life
cycle analysis, vendor evaluation and advertising media selection.

The limitation of this scale is that it is difficult to clearly define the concepts of
similarity and preference. Further, the distances between the items may be perceived
differently by different respondents.

e. Thurstone Scales
These are also known as equal-appearing interval scales. They are used to
measure the attitude towards a given concept or construct. For this purpose a large
number of statements are collected that relate to the concept or construct being
measured. The judges rate these statements along an 11-category scale in which
each category expresses a different degree of favourableness towards the concept.
The items are then ranked according to the mean or median ratings assigned by
the judges and are used to construct a questionnaire of twenty to thirty items that
are chosen more or less evenly across the range of ratings.

The statements are worded in such a way that a person can agree or disagree
with them. The scale is then administered to a sample of respondents whose
scores are determined by computing the mean or median value of the items agreed
with. A person who disagrees with all the items has a score of zero. The
advantage of this scale is that it is an interval measurement scale, but it is a
time-consuming and labour-intensive method. Thurstone scales are commonly used
in psychology and education research.
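
A compact Python sketch of Thurstone scoring as described above: each statement carries a scale value (here the median of the judges' 1-11 ratings), and a respondent's attitude score is the mean scale value of the statements they agreed with; all statements, ratings and responses are hypothetical.

import statistics

# Scale value of each statement = median of the judges' 1-11 ratings.
judge_ratings = {
    "Statement A": [2, 3, 2, 4],
    "Statement B": [6, 5, 7, 6],
    "Statement C": [10, 9, 11, 10],
}
scale_values = {s: statistics.median(r) for s, r in judge_ratings.items()}

# A respondent's score = mean scale value of the statements agreed with.
agreed_with = ["Statement B", "Statement C"]
respondent_score = statistics.mean(scale_values[s] for s in agreed_with)
print(f"Respondent's attitude score: {respondent_score:.1f}")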

f. Guttman Scales/Scalogram Analysis


It is based on the idea that items can be arranged along a continuum in such a way
that a person who agrees with an item or finds an item acceptable will also agree
with or find acceptable all other items expressing a less extreme position. For
example, "Children should not be allowed to watch indecent programmes", "The
government should ban these programmes" and "These programmes should not
be allowed to air on television" are all statements relating to one aspect.

In this scale each score represents a unique set of responses and therefore the total
score of every individual is obtained. This scale takes a lot of time and effort in
development.

They are very commonly used in political science, anthropology, public opinion
research and psychology.

g. The Q Sort technique


It is used to discriminate among a large number of objects quickly. It uses a rank
order procedure in which the objects are sorted into piles based on similarity with
respect to some criterion. The number of objects to be sorted should be between
60 and 140, approximately. For example, here we are taking nine brands. On the
basis of taste we classify the brands into tasty, moderate and not tasty.

We can also classify on the basis of price: low, medium, high. Then we can obtain
people's perception of whether they prefer a low-priced, moderate or high-priced
brand. We can classify sixty brands by sorting them into three piles, so the
objects are placed in three piles: low, medium or high.

Thus, the Q-sort technique is an attempt to classify subjects in terms of their
similarity with respect to the attribute under study.
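
A tiny Python sketch of the sorting idea with nine hypothetical brands: each brand's taste rating is used to place it into one of three piles of roughly equal size.

# Nine hypothetical brands with taste ratings on a 1-9 criterion.
brands = {"B1": 9, "B2": 2, "B3": 5, "B4": 7, "B5": 1,
          "B6": 4, "B7": 8, "B8": 3, "B9": 6}

# Sort by the criterion, then split into three piles: not tasty, moderate, tasty.
ranked = sorted(brands, key=brands.get)
piles = {
    "not tasty": ranked[:3],
    "moderate": ranked[3:6],
    "tasty": ranked[6:],
}
print(piles)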
Measurement & Measurement Scales

Attitudes, Behaviors, and Rating Scales

Researchers are interested in people's attitudes. An attitude is a psychological construct. It is a


person's predisposition to respond favorably or unfavorably to activities, people, events, and objects.
Attitudes are often considered precursors to behavior.

Attitudes have three components:

1) Affective, which deals with a person's feelings and emotions


2) Cognitive, which deals with a person's awareness and knowledge

3) Behavioral, which deals with a person's actions

Researchers have developed a variety of attitude rating scales to measure the intensity of an attitude's
affective, cognitive, and behavioral components. These scales may require a respondent to rank, rate,
sort, or choose when assessing an attitude.

Scaling refers to the process of assigning numbers or symbols to measure the intensity of abstract
attitudes. Scales can be uni-dimensional or multi-dimensional. Uni-dimensional scales measure a
single aspect or dimension of an attitude. Multi-dimensional scales measure more than one
dimension of an attitude.

Ranking: Ranking is a measurement that asks respondents to rank a small number of items on some
characteristic. Respondents might be asked to rank their favorite hot breakfast beverage: Hot
Chocolate, Tea, Coffee, or Herbal Tea. Ranking delivers an ordinal score.

Rating: Rating asks respondents the extent to which an item of interest possesses a characteristic.
Scales that require respondents to rate an item result in a quantitative score.

Sorting: Sorting is a measurement task that asks respondents to sort several items into categories.

Choice: Choice is a measurement task that requires respondents to select among two or more
alternatives.

Category Scales: Category scales are the simplest type of rating scale. They contain only two
choices: yes/no or agree/disagree.

Example:

I approve of the Affordable Care Act or Obama Care.

We can expand the number of response categories to give respondents greater flexibility in rating the
item of interest.

Example:

How often do you think positively about the Affordable Care Act or Obama Care?
Category scales can deal with a wide variety of issues: Quality, Importance, Interest, Satisfaction,
Frequency, Truth, and Uniqueness.

Graphic Rating Scales: Graphic ratings scales include a graphic continuum anchored between two
extremes. When used for online surveys, graphic rating scales may have a "slider," which
respondents can move up or down the scale. Sliders allow respondents to make finely tuned
responses using a continuous scale.

[Image: example of a slider-style graphic rating scale. Source: http://www.iperceptions.com/en/blog/2013/august/3-easy-steps-to-build-a-great-survey]

Graphic rating scales are easy to create. Researchers must be careful about using overly extreme
anchors, which tend to push responses toward the center of the scale. Graphic rating scales are
frequently used when conducting research among children. Graphic rating scales are considered non-
comparative scales because respondents make their judgments without making comparisons to other
objects, concepts, people, or brands.

Eating a Happy Meal at McDonald's makes me feel:

Itemized Rating Scales: Itemized rating scales require respondents to select from a limited number
of ordered alternatives. These scales are easy to construct, but they do not allow the respondent to
make the fine distinctions of a graphic rating scale using a slider.

Example:
How likely are you to use an open-source textbook in the courses you teach?

Graphic rating scales and itemized rating scales ask respondents about a single concept in isolation.
Such scales are often called monadic rating scales.

Rank-Order Scales: Unlike graphic rating scales and itemized rating scales, rank-order scales are
comparative scales. Respondents rank the objects, concepts, people, or brands by comparing them to
similar alternatives.

Example:

Rank the following smart phones with one being the brand that best meets the characteristic and
six being the brand that is the worst on the characteristic.

Rank-order scales have the following disadvantages: First, if a relevant alternative is missing, the
respondent's answer could be misleading. In the question above, the Blackberry 10 is not listed. If
that is the respondent's choice, the answer to this question might not reflect his or her real attitude.
Second, the answers provided are on an ordinal scale. We will not have the "distance" between the
ranks. Third, the question does not offer information as to why the respondent chose the order he or
she selected.

Paired Comparisons: Paired comparisons is a measurement scale that asks respondents to select one
of two alternatives.

Example:
Listed below are some of the characteristics of a McDonald's Big Mac and a Burger King
Whopper. Select the answer that best matches your opinion.

Which of the two brands tastes better?

Which of the two brands is healthier?

Which of the two brands is a better value for the money?

Paired comparisons overcome some of the problems of rank-order scales. First, it is easier for
respondents to select one item from a choice of two than to rank a larger set of objects, concepts,
people, or brands. The question of order bias—bias caused by how the objects, concepts, people, or
brands are ordered—is removed. But, the number of pairs to be compared should be kept to a
minimum to avoid tiring respondents.
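
A short Python sketch showing why the number of pairs grows quickly with the number of items, and how choices might be tallied; the brands and votes are hypothetical.

from itertools import combinations
from collections import Counter

brands = ["Big Mac", "Whopper", "Brand C", "Brand D"]   # hypothetical set

# Number of pairs to compare = n(n-1)/2, which grows quickly with n.
pairs = list(combinations(brands, 2))
print(len(brands), "brands ->", len(pairs), "pairs")    # 4 brands -> 6 pairs

# Tally which brand respondents picked from each pair (hypothetical votes).
votes = ["Big Mac", "Whopper", "Big Mac", "Brand C", "Big Mac", "Whopper"]
print(Counter(votes).most_common())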

Constant Sum Scales: Constant sum scales require respondents to divide a set number of points,
usually 100, among two or more attributes. The problem with constant sum scales is that respondents
find it difficult to allocate points, especially if there are a lot of attributes to be measured.

Example:

Below are five attributes of the iPhone 6 Plus. Please allocate 100 points to these attributes so that
they reflect the importance of each attribute. Please make certain that the total number of points
adds up to 100.
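
A minimal Python validation sketch for a constant sum response, assuming five hypothetical attribute allocations that must total 100 points.

# Points a respondent allocated across five attributes (hypothetical).
allocation = {"Screen": 35, "Battery": 25, "Camera": 20, "Price": 15, "Design": 5}

total = sum(allocation.values())
if total != 100:
    print(f"Invalid response: points sum to {total}, not 100")
else:
    # The allocations double as importance weights out of 100.
    print("Attribute importance (%):", allocation)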
Semantic Differential Scales: Semantic differential scales measure respondents' attitudes about the
strengths and weaknesses of a concept or construct. With this scale, researchers select a pair of
dichotomous adjectives to describe the concept under investigation. Typically researchers use a scale
from 1 through 7. The mean of each pair is calculated and then plotted on the table.

Example:

Below is a list of characteristics of Kmart stores. For each pair of adjectives, place an "X" at the
point that you believe best reflects your experience at Kmart.

Semantic Differential Scale Summary Chart

The semantic differential scale is widely used in marketing research because studies have repeatedly
shown that this scale is an efficient way to examine the differences in image attributes among a
variety of brands or companies. But, semantic differential scales are not without shortcomings. First,
there are no general-purpose scales. Researchers must develop valid and reliable adjective scales for each
research project. Researchers should also watch for a "halo" effect, which will bias a respondent's
answers. The halo effect is when a respondent's overall impression overwhelms his or her views on a
single adjective pair. To counteract the halo effect, researchers never place all of the positive
adjectives on the same side of the scale.

Stapel Scale: The Stapel Scale is a uni-polar scale that requires respondents to rate a concept on a
scale of negative 5 to positive 5 on how closely an adjective at the center of the scale represents the
concept. The chief advantage of the Stapel Scale is that the researcher does not have to spend the
time and energy creating bipolar adjective pairs.

Example:

Select the appropriate plus number for the phrase that best represents attributes of the iPhone 6. If
the phrase does not represent the iPhone 6, select the appropriate negative number that reflects
your attitude.

Likert Scale: The Likert scale allows respondents to state how strongly they agree or disagree with
an attitude. The scale is named after Rensis Likert, who developed this scale in 1932 for his doctoral
dissertation. Likert is pronounced "Lick-ert," not "Like-urt."

Although the Likert scale is typically a five-point scale that ranges from "strongly disagree" through
neutral to "strongly agree," it is not uncommon to see a six-point or seven-point variant. A six-point
Likert scale has three levels of disagreement and three levels of agreement with no neutral point. The
seven-point Likert scale adds a neutral point.

Example:

McDonald's Happy Meals are good value for the money.


My children like eating McDonald's Happy Meals.

Researchers disagree on whether the Likert Scale is an ordinal or interval scale. Those who argue that
it is an ordinal scale say the intervals between the five points of the scale are unknowable. Those
who argue that it is an interval scale score "Strongly Disagree" as a 1, "Disagree" as a 2, "Neutral" as
a 3, "Agree" as a 4, and "Strongly Agree" as a 5.

Closely related to the Likert Scale is a Purchase Intent scale. The disagreement and agreement
statements are replaced with answers that reflect a respondent's intent to purchase a product.

Example:

After owning a Chevrolet Impala for three years, how likely are you to purchase a new Chevrolet
Impala?

A five-point purchase intent scale is widely used in new product development and advertising testing.

Things to consider when selecting scales:

First consider the objectives of the research and whether the selected scales will help achieve the
research objectives. Typically researchers conduct qualitative research before designing the scales.
Qualitative research is used to help the researcher gain a deeper understanding of the constructs
under investigation.
Using qualitative research helps the researcher select the scales and craft how the scales will be
written. Once the scales are written, the researcher will pre-test the survey to make certain it works as
expected.

An important question to consider in developing scales is how the survey will be administered: by
an interviewer, self-administered by the respondent on the Internet, or self-administered by the
respondent using a survey delivered through the mail.

Creating a scale typically involves eight steps.[1]

Step 1: Clarify what is to be measured.

Step 2: Select scale formats (Likert, Stapel, Semantic Differential, etc.). Researchers typically
restrict themselves to a limited number of scale formats.

Step 3: Generate a pool of items that will be used to measure the concept or construct.

Step 4: Have others critique the pool of items.

Step 5: Consider adding items that will provide a check on internal consistency. For example, in
non-adjacent places ask the respondent's age and birth date.

Step 6: Pre-test the instrument. This is a critical step because it helps researchers learn if
respondents are misinterpreting questions.

Step 7: Drop redundant items.

Step 8: Optimize the scale, which involves consideration of reliability and the length of the
instrument.

Another consideration: How long does the researcher have to develop the scales? Rank-order scales
can be developed quickly, while developing a semantic differential scale can take a long time.

Balanced versus Unbalanced Scales: Researchers must decide whether to employ balanced or
unbalanced scales. Balanced scales have an equal number of positive and negative categories while
unbalanced scales do not. Unbalanced scales are often used when pilot studies suggest that more
opinions are positive than negative, or more negative than positive. In these cases, unbalanced scales
will give the researcher a more nuanced view of respondents' attitudes.

Forced versus Non-Forced Choice: Sometimes researchers will add a "do not know" category to
the range of possible answers when they are concerned that respondents with limited knowledge
will tend to answer with a "neutral" option, if available. Some researchers avoid using a "do not
know" answer out of fear that lazy respondents will often check this answer without much reflection.

The argument for "forcing" respondents to answer a question is that it makes them think about their
feelings and attitudes. The argument against "forcing" an answer is that respondents will give a
"false" answer, or they may refuse to answer the question.

What is the Guttman Scale?


In the social sciences, the Guttman or “cumulative” scale measures how much of a positive or
negative attitude a person has towards a particular topic.
The Guttman scale is one of the three major types of unidimensional measurement scales. The
other two are the Likert Scale and the Thurstone Scale. A unidimensional measurement scale has
only one (“uni”) dimension. In other words, it can be represented by a number range, like 0 to
100 lbs or “depressed on a scale of 1 to 10”. By giving the test, a numerical value can be
placed on a topic or factor.
The scale has YES/NO answers to a set of questions that increase in specificity. The idea is that a
person will get to a certain point and then stop. For example, on a 5-point quiz, if a person gets to
question 3 and then stops, it implies they do not agree with questions 4 and 5. If one person stops
at 3, another at 1, and another at 5, the three people can be ranked along a continuum.

Examples
The scale is designed to measure one factor or subject. For example, the following shows a
questionnaire for a person’s attitudes towards depression:

Sometimes, sensitive topics are concealed within other questions to disguise the intent of the
questionnaire. For example, this one quizzes for possible gaming addiction:
One disadvantage of the Guttman scale is that respondents may feel overly committed to their
earlier answers; they may continue to answer YES beyond the point where they should have stopped.
Using a concealed questionnaire helps to avoid this issue.

Use in Education
In the social sciences, the Guttman scale is often used to measure an increasing amount of
“attitude” towards a single topic. In education, it’s sometimes used to show a student’s logical
progression through coursework. For example, consider the expected progression through math
topics for 3 children. It's expected that a child does well in fractions before they are able to grasp
algebra. A "0" means that the student hasn't mastered a topic, while a "1" means that the
student has mastered it.
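
The original mastery table is not reproduced here, so the Python sketch below uses an invented 0/1 matrix for three children to show how a cumulative (Guttman) pattern is read and scored.

# Hypothetical mastery matrix: 1 = topic mastered, 0 = not yet.
# Topics are ordered from easiest to hardest, as a Guttman scale assumes.
topics = ["counting", "fractions", "algebra"]
students = {
    "Child 1": [1, 1, 1],
    "Child 2": [1, 1, 0],
    "Child 3": [1, 0, 0],
}

for name, pattern in students.items():
    score = sum(pattern)               # Guttman score = number of 1s
    cumulative = pattern == sorted(pattern, reverse=True)
    print(f"{name}: score {score}, fits the cumulative pattern: {cumulative}")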

In practice, it’s rare to find data that fits perfectly to a Guttman scale. More often than not,
you’re actually testing more than one factor. For example, algebra may need good reading skills
as well as the ability to solve an equation. If you know (or suspect) that your instrument is
measuring two or more factors, use multidimensional scaling or Correspondence Analysis to
analyze your results.

Semantic Differential Scale – Definition & Examples To Get Started

Let’s talk about the semantic differential scale. If you’ve been following us
from the beginning, you already know how important soliciting customer
feedback is when working to improve the quality of your company’s services.
And, you probably know that surveys are a great way to collect that feedback.
But… the feedback you get from these outreach initiatives is only valuable if
you take the time to truly put yourself in your customer’s shoes.

Unfortunately, this advice has been repeated so often that it’s pretty much
lost all meaning. So let’s be very clear about what we actually mean when we
say it:
When collecting and analyzing customer feedback, it’s essential that you not
only consider the responses they provide, but what these responses actually
mean.

One of the most effective ways to discover this meaning is by using the
semantic differential scale.

The goal of the semantic differential scale is to gauge a person’s attitude


toward a specific subject. This allows you to understand the value that person
places on the subject at hand while also giving them the opportunity to report
on how your services lived up to their expectations.

That might sound a bit complicated, so let’s make it a bit clearer by


comparing the semantic differential scale to another survey type we’ve
discussed in the past: the Likert scale.

Semantic Differential Scale vs. Likert Scale: What’s The Difference?

At first glance, semantic differential scales and Likert scales might seem
quite similar:

 Both ask respondents to report on a certain part of their experience by


choosing an answer from a list of possible options (usually about five)
 Both can be supplemented with further explanation if the customer chooses to
do so
 Both are used to gain a better understanding of what customers do or don’t
like about your product, service, or brand as a whole

The differences between the two question types lie in how the questions are
asked and which information the surveyor hopes to get from the customers’
responses.

Wording

On the surface, semantic differential scales differ from Likert scales in the
way in which questions are asked.
Consider the following examples.

First, a Likert scale question:

Next, a semantic differential scale question:

The obvious difference between the two is that the Likert scale question asked the
customer to agree or disagree with a given statement, while the semantic
differential scale question asked the customer to complete a statement,
offering two polarized options along with some middle-of-the-road options.

Of course, both questions serve the same relative purpose: to determine how
a customer would rate the checkout process of a specific store. But the way in
which the question is asked (as well as how the answer choices are listed) can
make a huge difference in how customers respond.

In the Likert scale example, the customer is given a statement that he must
agree or disagree with (or respond neutrally to). But think about it: if
absolutely anything about the checkout process was confusing, they simply
can’t truthfully say they “strongly agree” that the process was straightforward
(even if the problem they experienced was relatively minor).

In contrast, the semantic differential scale offers two descriptors as choices


(along with “in the middle” options, as well): straightforward and confusing.
In the case that a part of the process was slightly confusing, the customer
doesn’t necessarily need to “strongly disagree” that the process was
straightforward – they can simply skew their answer a bit to the right.

The way in which survey questions are worded can also distort responses due
to the power of suggestion.

Consider the first example of the Likert scale-based question. Again, unless
something went majorly wrong while a customer was checking out, they’ll
probably agree the process was straightforward.

But, if the question instead said “The checkout process was confusing” and
provided the same agree/disagree options, any seemingly minor incident
during checkout would immediately come to mind, causing customers to
“strongly agree” that the process was confusing.

The semantic differential scale example, on the other hand, provides polar
opposite choices, allowing the customer to determine the degree to which the
checkout process was straightforward or confusing.

(Again, it’s worth noting that you can elicit more in-depth responses
from both types of surveys by including space for customers to expand upon
their responses.)

Perspective

Semantic differential scale questions empower respondents to make their


voices heard and their unique perspectives understood – much more so than
Likert scale questions.
This is because the way in which Likert scale questions are worded
unintentionally assumes each respondent is “coming from the same place,”
while semantic differential scale questions operate under the assumption that
each respondent will understand the question differently – and thus answer
according to his or her own understanding.

That was a bit convoluted, so let’s take a break to look at this picture of a
sunset:

Don’t worry – there’s a point to this…

Take a good, long look at that photograph.

Now, complete the following Likert scale-like question about it:

The above photograph is beautiful. Disagree / No opinion / Agree

Now, do the same for this semantic differential scale-based question:


The above photograph is: Ugly ____ ____ ____ ____ ____ Beautiful

When I asked the Likert scale-based question, I defined the photograph as


beautiful, and am simply asking whether or not you agree with me.

When I asked the semantic differential scale-based question, I kept my


opinion to myself, and let you decide how to define the picture.

Semantic differential scale surveys allow your customers to define the value
of a specific factor on their own. In turn, you ensure the responses they
provide come from their own feelings and attitude, and have not been
influenced by any outside factors.

When to Use Semantic Differential Scale Surveys

Though semantic differential scale surveys can be used for a variety of


means, they’re most valuable when you need to:

 Understand the weight of a specific aspect of your service (in terms of


customer satisfaction)
 Gain insight regarding your customer’s attitudes, needs and goals

These two points essentially go hand in hand: the aspects of your service your
customers find most valuable are those which will best help them achieve
their goals.

In terms of what this means for your business, semantic differential scale
surveys can help you not only pinpoint your company's strengths and
weaknesses, but also determine which of these aspects to focus on
improving in the future.

Let’s again go back to our first example:

Of course, this is only one question out of possibly a dozen or so you might
ask your customers to respond to. Let's assume, then, that their answer skews
toward “confusing,” but their responses to most other questions (including
one about their overall experience) are generally positive. This would tell you
that, to this customer, the fact that the checkout process was a bit confusing
wasn’t a deal-breaker, and had little to no impact on their experience with
your store.

On the other hand, if most of their other responses were generally positive,
but they reported their overall experience to be surprisingly negative, you’d
be able to deduce that a streamlined checkout process is part of their
expectations when doing business with your company – and that you should
focus on improving this aspect of your service immediately.

As mentioned earlier, it takes time and effort to assess semantic differential


scale survey responses, as the goal of doing so is to deduce qualitative
information from quantitative data.

Rather than simply collecting data and taking it at face value, you should
always take the time to understand why a customer responded a certain way –
and what it means for your company moving forward.

Now that we understand the best use cases for semantic differential scale
surveys, let’s take a look at the advantages and disadvantages of using them.

Pros and Cons of Semantic Differential Scale Surveys

We’ve discussed semantic differential scale surveys in a pretty positive light,


so let’s summarize the benefits of using them in customer satisfaction
surveys.

Most importantly, semantic differential scale questions are inherently user-


facing, and their answers user-defined. As mentioned earlier, a customer’s
response to a semantic differential scale question is their personal response.

Barring the inclusion of a narrative explanation of their responses, your


customers’ responses to semantic differential scale questions are one of your
most valuable assets in terms of truly getting to know your customers.
Furthermore, because semantic differential scale questions allow the
customer to define their answers, there’s less chance that they’ll unwittingly
misrepresent themselves, or that you (the surveyor) will misunderstand their
answer.

Consider the picture of the sunset from above.

If you were given the Likert scale question and responded that you “disagree”
that the picture is beautiful, that doesn’t necessarily mean you think it’s
hideous. Maybe you have a high standard for what you consider beautiful, or
maybe you don’t find nature all that appealing.

The only way a surveyor could know for sure what you think of the picture
based on your answer to that question is if you answered in the affirmative.

On the other hand, the semantic differential scale question allows you to
report exactly what you think: you either think the picture is beautiful or ugly
– or you’re completely indifferent to it. Regardless of your response, the
surveyor will understand your answer completely.

Of course, there are some disadvantages to using the semantic differential


scale, as well.

The goal of using this type of questioning is to gain a more intimate
understanding of your customers' attitudes and goals. Because of this, you'll
need to go beyond simply scoring their responses and looking at the
numerical data.

To get the most out of this data, you need to view their survey as a story – not
isolated numbers. This will take time, effort, and other resources – which you
may or may not have at the moment, depending on where your company
currently stands.

Now, I know I just said that semantic differential scales are generally less
convoluted or confusing than Likert scales. But there are two ways in which
they can be:
 When they provide too many response options
 When they provide too few response options

On the one hand, too many options may overwhelm your customer. Imagine
if, for example, the question about their checkout experience had ten different
“middle ground” options instead of three. Now, imagine every question on
the survey was like that. Safe to say, not many customers would take the time
to dissect their shopping experience to that degree of certainty.

On the other hand, too few response options limit your customers’ responses
– which is exactly what a semantic differential scale attempts to avoid in the
first place.

Using the same example, imagine if there were only two options: “confusing”
and “straightforward.” In this instance, it’s easy to imagine individuals
getting caught in a thought loop: “Well, it wasn’t confusing, but there was a
problem at one point…but I definitely knew what I needed to do…but still, it
wasn’t exactly straightforward, either…” In turn, whichever answer they end
up choosing is almost certainly not 100% accurate.

Conclusion

Semantic differential scale surveys can be a powerful tool to help you truly
understand your customers not as personas, but as unique individuals with
their own attitudes, goals, and needs.

Though collecting and analyzing the data gleaned from these surveys takes a
little more time and effort than with Likert scales and other customer satisfaction
surveys, the results will help you focus on improving the aspects of your
service that your customers have defined as most important to them.

Finally, if you need a simple way to create beautiful and insightful surveys
which you can send to your customers, Fieldboom can help.
