Features of the holistic rating scale used in this study.


the main idea was indicated, but not clearly

the essay was not so well organized

the vocabulary choice was fair

some major grammatical errors

Features of the analytic rating scale used in this study.

Raters: Three trained native speakers of English.
Items: Five criteria were rated: (1) Originality of Content, (2) Organization, (3) Vocabulary, (4) Grammar, (5) Cohesion & Logical Consistency.
Rating scale: A four-point scale (1, 2, 3, 4) was used with the following criteria:
Originality of Content
4 points: interesting ideas were stated clearly
3 points: interesting ideas were stated fairly clearly
2 points: ideas somewhat
1 point: ideas not clear

Organization
4 points: well organized
3 points: fairly well
2 points: loosely
1 point: ideas

Vocabulary
4 points: very effective choice of words
3 points: effective choice of words
2 points: fairly good
1 point: limited vocabulary range of

Grammar
4 points: almost no
3 points: few minor
2 points: some errors
1 point: many errors

Cohesion & Logical Consistency
4 points: sentences logically combined
3 points: sentences fairly logically combined
2 points: sentences poorly combined
1 point: many unfinished
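To make the structure of the analytic scale concrete, the following is a minimal sketch of how one essay's ratings (three raters, five criteria, scores from 1 to 4) might be stored and summarized. The function name `summarize_essay`, the example scores, and the choice to average across raters and then across criteria are illustrative assumptions; the study does not state how the analytic scores were combined.

```python
from statistics import mean

# The five criteria of the analytic scale, as listed above.
CRITERIA = [
    "Originality of Content",
    "Organization",
    "Vocabulary",
    "Grammar",
    "Cohesion & Logical Consistency",
]

VALID_SCORES = (1, 2, 3, 4)  # four-point scale


def summarize_essay(ratings):
    """Summarize one essay's analytic ratings.

    `ratings` is a list with one dict per rater, mapping each criterion
    to a score from 1 to 4. Returns (per-criterion means, overall mean).
    Averaging is an assumption made for illustration only.
    """
    for rater_scores in ratings:
        for criterion in CRITERIA:
            score = rater_scores[criterion]
            if score not in VALID_SCORES:
                raise ValueError(f"{criterion}: score {score!r} is not on the 1-4 scale")

    # Mean across raters for each criterion.
    per_criterion = {c: mean(r[c] for r in ratings) for c in CRITERIA}
    # Unweighted mean across the five criteria.
    overall = mean(per_criterion.values())
    return per_criterion, overall


# Hypothetical scores from three raters for a single essay.
example_ratings = [
    dict(zip(CRITERIA, (4, 3, 3, 2, 3))),
    dict(zip(CRITERIA, (4, 3, 2, 2, 3))),
    dict(zip(CRITERIA, (4, 3, 4, 2, 3))),
]

per_criterion, overall = summarize_essay(example_ratings)
for criterion, value in per_criterion.items():
    print(f"{criterion}: {value}")
print(f"Overall: {overall}")
```

Keeping the per-criterion means alongside the overall figure preserves the diagnostic detail that distinguishes an analytic scale from a single holistic score.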



A comparison of holistic and analytic scoring methods in the assessment of writing

by Yuji Nakamura (Tokyo Keizai University)

Several conclusions can be drawn from this study.

1. For practical and economic reasons, holistic assessment (a single evaluation item) can be used, but to avoid the
risk of idiosyncratic ratings, analytic assessment (several evaluation items) is strongly recommended.
2. In terms of rating options, the best practice is to have multiple raters and multiple rating items. The next best
practice is to have multiple raters and one overall evaluation item. The third choice, in order of preference, is
to have one rater and multiple items. The least recommended of these options is to have one rater and one
item. Even worse than this, however, is to have one rater using an impressionistic scale.
3. This study suggests that it is very risky for one classroom teacher to judge students using a holistic rating system
(cf. Table 1 and the discussion).
4. The more ratings a person receives, the higher the rating precision. One obvious condition, however, is that
construct and content validity must come before statistical reliability. Otherwise, we do not know what the test is
measuring (cf. Tables 3 and 4 and the discussion).
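The relationship in the fourth conclusion, where more ratings yield higher precision, can be illustrated with the Spearman-Brown prophecy formula from classical test theory. This is a swapped-in textbook formula and not necessarily the analysis used in this study. It assumes the ratings are parallel measurements, and the single-rating reliability of 0.5 below is a hypothetical value chosen only for illustration.

```python
def spearman_brown(single_rating_reliability, n_ratings):
    """Predicted reliability of the mean of `n_ratings` parallel ratings.

    Spearman-Brown prophecy formula: n * r / (1 + (n - 1) * r),
    where r is the reliability of a single rating.
    """
    r = single_rating_reliability
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if n_ratings < 1:
        raise ValueError("n_ratings must be at least 1")
    return n_ratings * r / (1 + (n_ratings - 1) * r)


# Hypothetical reliability of one rater scoring one item.
r_single = 0.5

# Number of ratings per essay under a few simplified designs.
designs = {
    "one rating": 1,
    "three ratings": 3,
    "fifteen ratings": 15,
}

for label, n in designs.items():
    print(f"{label}: predicted reliability = {spearman_brown(r_single, n):.4f}")
```

Treating ratings from different raters and different items as interchangeable is a simplification. In practice raters and items contribute different sources of error, so the figures indicate only the general trend, and they say nothing about validity.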


As Weigle (2002) further suggests:

1. If large numbers of students need to be placed into writing courses with limited time and limited resources, a
holistic scale may be the most appropriate choice in terms of practicality. Concerns about reliability, validity, and
impact can be ameliorated by the possibility of adjusting placement scores.
2. A test of writing used for research purposes should have reliability and construct validity as central concerns, and
practicality and impact issues should be of lesser significance.
3. The choice of testing procedures should involve finding the best possible combination of the qualities (reliability,
validity, etc.) and deciding which qualities are most relevant in a given situation.