
Assessing Writing

Most teachers find that it is relatively easy to write subjective test item
prompts as contrasted to objective ones. The difficulty lies in clearly
specifying the task for the student so that grading is fair and equitable to all
students. Some teachers find that the best approach is to write a sample
answer and then analyze the elements of that answer. Alternatively, it is
useful to ask a colleague to write a sample answer and critique the prompt.
Writing good subjective items is an interactive, negotiated process.

The F/SL literature generally addresses two types of writing: free writing and
guided writing. The former requires students to read a prompt that poses a
situation and write a planned response based on a combination of
background knowledge and knowledge learned from the course. Guided
writing, however, requires students to manipulate content that is provided in
the prompt, usually in the form of a chart or diagram.

Guided Writing
Guided writing is a bridge between objective and subjective formats. This
task requires teachers to be very clear about what they expect students to do.
Decide in advance whether mechanical issues like spelling, punctuation and
capitalization matter when the task focuses on comprehension. Some
important points to keep in mind for guided writing are:

• Be clear about the expected form and length of response (one paragraph,
a 250-word essay, a letter etc.).

• If you want particular information included, clearly specify it in the prompt (e.g. three causes and effects, two supporting details etc.).

• Similarly, specify the discourse pattern(s) the students are expected to use (e.g. compare and contrast, cause and effect, description etc.).

• Since guided writing depends on the students' manipulation of the information provided, be sure to ask them to provide something beyond the prompt, such as an opinion, an inference, or a prediction.

• Be amenable to revising the anticipated answer even as you grade.

Free Writing
All of the above suggestions are particularly germane to free writing. The goal
for teachers is to elicit comparable products from students of different ability
levels.

• The use of multiple raters is especially important in evaluating free writing. Agree on grading criteria in advance and calibrate before the actual grading session.

• Decide whether to use holistic, analytical or a combination of the two as a rating scale for marking.

• If using a band scale, adjust it to the task.

• Acquaint students with the marking scheme in advance by using it for teaching, grading homework and providing feedback.

• Subliminally teach good writing strategies by providing students with enough space for an outline, a draft and the finished product.

• In ES/FL classrooms, be aware of cultural differences and sensitivities among students. Avoid contentious issues that might offend or disadvantage students.

Writing Assessment Scales

The F/SL assessment literature generally recognises two different types of writing
scales for assessing student written proficiency: holistic marking and analytical
marking.

Holistic Marking Scales


Holistic marking is where the scorer "records a single impression of the impact of the performance as a whole" (McNamara, 2000: 43). In short, holistic marking is based on the marker's total impression of the essay as a whole. Holistic marking is variously termed impressionistic, global or integrative marking.

Experts in holistic marking maintain that this type of marking is quick and reliable if three to four people mark each script. The general rule of thumb for holistic marking is to mark for two hours and then take a rest, grading no more than 20 scripts per hour. Holistic marking is most successful using scales of a limited range (e.g. 0-6).
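
To make the multi-rater procedure concrete, here is a minimal sketch, in Python with invented band scores, of how three raters' 0-6 holistic bands might be combined. The averaging and the rule of referring widely divergent scripts for a further reading are illustrative assumptions, not a procedure prescribed above.

# Minimal sketch: combining holistic band scores (0-6) from several raters.
# The scores and the "more than one band apart" re-marking rule are
# illustrative assumptions, not a prescribed procedure.

scores = {
    "script_01": [4, 5, 4],   # three raters' holistic bands
    "script_02": [2, 4, 3],
    "script_03": [6, 6, 5],
}

for script, bands in scores.items():
    average = sum(bands) / len(bands)
    divergent = max(bands) - min(bands) > 1   # raters more than one band apart
    note = "refer for a further reading" if divergent else "accept average"
    print(f"{script}: bands={bands} average={average:.1f} -> {note}")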

FL/SL educators have identified a number of advantages to this type of marking. First, it is reliable if done under no time constraints and if teachers receive adequate training. Second, this type of marking is generally perceived to be quicker than other types of writing assessment and enables a large number of scripts to be scored in a short period of time. Third, since overall writing ability is
assessed, students are not disadvantaged by one lower component such as poor
grammar bringing down a score.

Several disadvantages of holistic marking have also been identified. First of all,
this type of marking can be unreliable if marking is done under short time
constraints and with inexperienced, untrained teachers (Heaton, 1990). Secondly,
Cohen (1994) has cautioned that longer essays often tend to receive higher
marks. Testers also point out that reducing a score to one figure tends to reduce the reliability of the overall mark. The most serious problem associated with holistic marking is its inability to provide feedback to those involved. More specifically, when marks are gathered through a holistic marking scale, no information or washback on how those marks were awarded is available.
Thus, testers often find it difficult to justify the rationale for the mark. Hamp-Lyons
(1990) has stated that holistic marking is severely limited in that it does not
provide a profile of the student's writing ability.

Analytical Marking Scales


Analytical marking is where “raters provide separate assessments for each of a
number of aspects of performance” (Hamp-Lyons, 1991). In other words, raters
mark selected aspects of a piece of writing and assign point values to quantifiable
criteria (Coombe & Evans, 2001). In the literature, analytical marking has been
termed discrete point marking and focused holistic marking.

Analytical marking scales are generally more effective with inexperienced teachers. They are also more reliable when the scale has a larger point range.

A number of advantages have been identified with analytical marking. Firstly,
unlike holistic marking, analytical writing scales provide teachers with a "profile" of
their students' strengths and weaknesses in the area of writing. Additionally, this
type of marking is very reliable even with inexperienced teachers who have had little training and who grade under short time constraints (Heaton, 1990).
Finally, training raters is easier because the scales are more explicit and detailed.

Just as there are advantages to analytical marking, educators point out a number
of disadvantages associated with using this type of scale. Analytical marking is
perceived to be more time consuming because it requires teachers to rate various
aspects of a student's essay. It also requires that a set of specific criteria be written and that markers be trained and attend frequent moderation or calibration sessions. These moderation sessions are intended to ensure that inter-marker differences are reduced, thereby increasing reliability. Also, because teachers look at specific areas in a given essay, the most common being content, organization, grammar, mechanics and vocabulary, marks are often lower than for their holistically marked counterparts. Another disadvantage is that analytical marking scales remove the integrative nature of writing assessment.
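
To make the contrast with holistic marking concrete, the sketch below shows an analytical scale in Python. The five criteria mirror the common set mentioned above (content, organization, grammar, mechanics and vocabulary), but the weights and raw scores are invented for illustration; an actual rubric would define its own criteria, descriptors and weights.

# Minimal sketch of analytical marking: each criterion is scored separately
# and weighted, producing both a total mark and a diagnostic profile.
# Weights and raw scores are illustrative assumptions.

WEIGHTS = {            # out of 100 in total
    "content": 30,
    "organization": 25,
    "grammar": 20,
    "vocabulary": 15,
    "mechanics": 10,
}

def analytical_mark(raw_scores_out_of_10):
    """Convert per-criterion scores (0-10) into a weighted total and a profile."""
    profile = {}
    for criterion, weight in WEIGHTS.items():
        profile[criterion] = raw_scores_out_of_10[criterion] / 10 * weight
    return sum(profile.values()), profile

total, profile = analytical_mark(
    {"content": 7, "organization": 6, "grammar": 5, "vocabulary": 8, "mechanics": 9}
)
print(f"Total: {total:.1f}/100")
for criterion, points in profile.items():
    print(f"  {criterion}: {points:.1f}")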

Selecting the Appropriate Marking Scale


Selecting the appropriate marking scale depends upon the context in which a
teacher works. This includes the availability of resources, amount of time
allocated to getting reliable writing marks to administration, the teacher population
and management structure of the institution. Reliability can be increased by using
multiple marking, which reduces the scope for error that is inherent in a single
score.

Writing Moderation/Calibration Process

For test reliability, it is recommended that clear criteria for grading be established and that rater training in using these criteria take place prior to marking. The criteria can be based on holistic or analytical rating scales. However, whatever scale is chosen, it is crucial that all raters adhere to the same scale regardless of their personal preference.

The best way to achieve inter-rater reliability is to practice. Start early in the
academic year by employing the marking criteria in non-test situations. Make
students aware from the outset of the criteria and expectations for their work.
Collect a range of student writing samples on the same task and have
teachers evaluate and discuss them until they arrive at a consensus score.
Involve students in peer-grading of classroom writing to familiarize them with
marking criteria. This has the benefit of making students more aware of ways
in which they can edit and improve their writing.
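
One rough way to check whether this calibration is working is to compare the marks two raters give to the same set of practice scripts. The sketch below, with invented marks, reports the mean absolute difference and the proportion of scripts on which the raters agree within one band; both the statistics and the one-band tolerance are illustrative choices rather than fixed standards.

# Minimal sketch: a rough check of inter-rater agreement after a calibration
# session. Marks are invented; the one-band tolerance is an arbitrary choice.

rater_a = [4, 5, 3, 6, 2, 4, 5, 3]
rater_b = [4, 4, 3, 5, 3, 4, 6, 3]

differences = [abs(a - b) for a, b in zip(rater_a, rater_b)]
mean_abs_diff = sum(differences) / len(differences)
within_one_band = sum(d <= 1 for d in differences) / len(differences)

print(f"Mean absolute difference: {mean_abs_diff:.2f} bands")
print(f"Agreement within one band: {within_one_band:.0%}")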

Recommendations for Writing Assessment

As always, assessment should first and foremost reflect the goals of the course.
In order for writing assessment to be fair to students, they should have plenty of opportunities to practice writing tasks of different types and lengths. In
other words, tests of writing should be shorter and more frequent, not just a
"snapshot" approach at midterm and final exams.

Assessing Reading

Most language teachers assess reading through the component subskills. Since
reading is a receptive language skill, we can only get an idea of how students
actually process texts through techniques such as think aloud protocols. It is not
possible to observe reading behavior directly. For assessment, we normally focus
on certain important skills which can be divided up into major and minor (or
contributing) reading skills.

Major reading skills include:


– Reading quickly to skim for gist, scan for specific details, and
establish overall organization of the passage
– Reading carefully for main ideas, supporting details, author’s
argument and purpose, relationship of paragraphs, fact vs. opinion
– Information transfer from nonlinear texts

Minor reading skills include:

– understanding at the sentence level
  • syntax, vocabulary, cohesive markers
– understanding at the inter-sentence level
  • reference, discourse markers
– understanding components of nonlinear texts
  • the meaning of graph or chart labels, keys, and the ability to find and interpret intersection points.

It should be noted that the designations major and minor largely relate to whether
the skills pertain to large segments of the text or whether they focus on certain
local structural or lexical points. Increasingly, grammar and vocabulary are
contextualized as part of reading passages instead of being assessed separately
in a discrete point fashion. However, there are times when it is appropriate to
assess structure, vocabulary, and language-in-use separately.

Reading texts include both prose passages and nonlinear texts such as tables,
graphs, schedules, advertisements and diagrams. Texts for assessment should
be carefully chosen to fit the purpose of assessment and the level of the students, taking factors such as text length, density and readability into account. For
assessment, avoid texts with controversial or biased material because they can
upset students and affect the reliability of test results. Ninety percent of the
vocabulary in a prose passage should be known to the students (Nation, 1990).

Reading tests use many of the formats already discussed. Recognition formats
include MCQs, TFNs, matching and cloze with answers provided. If limited
production formats such as short answer are used, usually the emphasis is on
meaning, not spelling. Of course, there will be authentic tasks such as reading
directions for form-filling where accuracy is important.
Specifications
As with all skills assessment, it is important to start with a clear understanding of
program objectives, intended outcomes and target uses of English. Once these
are clear, you can develop specifications or frameworks for developing
assessment. Specifications will clearly state what and how you will assess, what
the conditions of assessment will be (length and overall design of the test), and
will provide criteria for marking or grading. Here are typical features of
specifications:

Content
• What material will the test cover? What aspects of this material?
• What does the student have to be able to do? For example, in reading, perhaps a student has to scan for detailed information.
• For reading passages, specifications state the type of text (prose or nonlinear), the number of words in the passage and the readability level.
• Acceptable topics and the treatment of vocabulary are usually set forth in
specifications. For instance, topics may be restricted to those covered in
the student book and vocabulary may focus on core vocabulary in the
course.

Conditions
• Specifications usually provide information about the structure of the
examination and the component parts. For example, a reading
examination may include 5 subsections which use different formats and
texts to test different subskills.
• Specific formats or a range of formats are usually given in specifications in
addition to the number of questions for each format or section.
• Timing is another condition which specifications state. The time for the
entire test may be given or sometimes for each individual subsection. For
example, you can place time-dependent skills such as skimming and
scanning in separately timed sections or you can place them at the end of a
longer reading test where students typically are reading faster to finish
within the allocated time.

Grading criteria
• Specifications indicate how the assessment instrument will be marked. For
instance, the relative importance of marks for communication as contrasted
to those for mechanics (spelling, punctuation, capitalization) should reflect
the overall approach and objectives of the instructional program. Similarly,
if some skills are deemed more important or require more processing than
other skills, they may be weighted more heavily.

In short, specifications help teachers and administrators establish a clear linkage between the overall objectives for the program and the design of particular assessment instruments. Specifications are especially useful for ensuring even coverage of the main skills and content of courses as well as developing tests that are comparable to one another because they are based on the same guidelines.
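
By way of illustration only, a reading-test specification along the lines described above might be captured in a simple data structure. Every value below (skills, formats, item counts, timings, marking decisions) is invented; a real specification would be derived from the program's own objectives.

# Minimal sketch: a reading-test specification captured as a data structure.
# All values are invented examples, not a recommended test design.

reading_spec = {
    "content": {
        "skills": ["skim for gist", "scan for detail", "identify main ideas"],
        "text_types": ["prose passage", "nonlinear text (table or chart)"],
        "passage_length_words": (300, 450),
        "vocabulary": "core course vocabulary; at least 90% known words",
    },
    "conditions": {
        "sections": [
            {"skill": "skimming/scanning", "format": "short answer", "items": 8, "minutes": 10},
            {"skill": "careful reading", "format": "MCQ", "items": 12, "minutes": 25},
        ],
    },
    "grading": {
        "marks_per_item": 1,
        "spelling_penalised": False,   # emphasis on meaning, not mechanics
    },
}

total_items = sum(s["items"] for s in reading_spec["conditions"]["sections"])
total_time = sum(s["minutes"] for s in reading_spec["conditions"]["sections"])
print(f"{total_items} items in {total_time} minutes")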

Recommendations for Reading Assessment

Texts
Texts can be purpose written, taken directly from authentic material or adapted.
The best way to develop good reading assessments is to constantly be on the
watch for appropriate material. Keep a file of authentic material from newspapers,
magazines, brochures, instruction guides – anything that is a suitable source of
real texts. Other ways to find material on particular topics are to use an
encyclopedia written at an appropriate readability level or to use an Internet
search engine. Whatever the source, cite it properly.

Microsoft Word provides word counts and readability statistics. First, highlight the passage, and then select Word Count from the Tools menu. To access readability information, go to Options under the Tools menu, then Spelling and Grammar, and tick "Show readability statistics". The readability scores are based on word and sentence length (Word also reports the percentage of passive sentences), so you can raise or lower the level by changing these. You can also add line numbers and other special features to texts.
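
If Word is not available, the same word- and sentence-length logic can be approximated directly. The sketch below computes the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas, which are the scores Word reports, using a very rough syllable-counting heuristic, so treat the output as an estimate only.

# Rough readability estimate from word and sentence length, mirroring the
# statistics Word reports. The syllable counter is a crude heuristic, so
# results are approximate.
import re

def count_syllables(word):
    # Count groups of vowels as syllables; drop a common silent final 'e'.
    groups = re.findall(r"[aeiouy]+", word.lower())
    count = len(groups)
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def readability(text):
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences          # words per sentence
    spw = syllables / len(words)          # syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade

ease, grade = readability("The cat sat on the mat. It was a sunny day in the park.")
print(f"Flesch Reading Ease: {ease:.1f}  Flesch-Kincaid Grade: {grade:.1f}")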

Questions
Make sure that questions are written at a slightly lower level than the reading
passages. Reading questions should be in the same order as the material in the
passage itself. If you have two types of questions or two formats based on one
text, go through the text with different colored markers to check that you have
evenly covered the material in order.

For objective formats such as multiple choice and true/false/not given, try to make
all statements positive. If you phrase a statement negatively and an option is
negative as well, students have to deal with the logical problems of double
negatives. Whenever possible, rephrase material using synonyms to avoid
students scanning for verbatim matches. Paraphrasing encourages vocabulary
growth as positive washback.

Assessing Listening

The assessment of listening abilities is one of the least understood, least developed and yet one of the most important areas of language testing and assessment (Alderson & Bachman, 2001). In fact, Nunan (2002) calls listening comprehension "the poor cousin amongst the various language skills" because it is the most neglected skill area. As teachers we recognize the importance of teaching and then assessing the listening skills of our students, but, for a number of reasons, we are often unable to do this effectively. One reason for this neglect is the scarcity of culturally appropriate listening materials suitable for EF/SL contexts. The biggest
challenges for teaching and assessing listening comprehension center around
the production of listening materials. Indeed, listening comprehension is often
avoided because of the time, effort and expense required to develop,
rehearse, record and produce high quality audio tapes or CDs.

Approaches to Listening Assessment


Buck (2001) has identified three major approaches to the assessment of
listening abilities: discrete point, integrative and communicative approaches.

The discrete-point approach became popular during the early 1960s with the
advent of the Audiolingual Method. This approach identified and isolated
listening into separate elements. Some of the question types that were
utilized in this approach included phonemic discrimination, paraphrase
recognition and response evaluation. An example of phonemic discrimination
is assessing students by their ability to distinguish minimal pairs like
ship/sheep. Paraphrase recognition is a format that requires students to listen to a statement and then select the option closest in meaning to the statement.
Response evaluation is an objective format that presents students with
questions and then four response options. The underlying rationale for the
discrete-point approach stemmed from two beliefs. First, it was important to be able to isolate one element of language from a continuous stream of speech. Second, spoken language was seen as the same as written language, only presented orally.

The integrative approach, which emerged in the early 1970s, called for integrative testing. The underlying rationale for this approach is best explained by Oller
(1979:37) who stated “whereas discrete items attempt to test knowledge of
language one bit at a time, integrative tests attempt to assess a learner’s
capacity to use many bits at the same time.” Proponents of the integrative
approach to listening assessment believed that the whole of language is
greater than the sum of its parts. Common question types in this approach
were dictation and cloze.

The third approach, the communicative approach, arose at approximately the
same time as the integrative approach as a result of the Communicative
Language Teaching movement. In this approach, the listener must be able to
comprehend the message and then use it in context. Communicative
question formats must be authentic in nature.

Issues in Listening Assessment

A number of issues make the assessment of listening different from the assessment of other skills. Buck (2001) has identified several issues that need to be taken into account. They are: setting, rubric, input, voiceovers, test structure, formats, timing, scoring and finding texts. Each is briefly described below and recommendations are offered.

Setting
The physical characteristics of the test setting or venue can affect the validity
and/or reliability of the test. Exam rooms must have good acoustics and
minimal background noise. Equipment used in test administrations should be
well maintained and checked out beforehand. In addition, an AV technician
should be available for any potential problems during the administration.

Rubric
Context is extremely important in the assessment of listening comprehension
as test takers don’t have access to the text as they do in reading. Context
can be written into the rubric which enhances the authenticity of the task.
Instructions to students should be in the students’ L1 whenever possible.
However, in many teaching situations, L1 instructions are not allowed. When
L2 instructions are used, they should be written at one level of difficulty lower
than the actual test. Clear examples should be provided for students and
point values for questions should be included in the rubrics.

Input
Input should have a communicative purpose. In other words, the listener
must have a reason for listening. Background or prior knowledge needs to be
taken into account. There is a considerable body of research that suggests
that background knowledge affects comprehension and test performance. In
a testing situation, we must take care to ensure that students are not able to
answer questions based on their background knowledge rather than on their
comprehension.

Voiceovers
Anyone recording a segment for a listening test should receive training and
practice beforehand. In large-scale testing, it is advisable to use a mixture of
genders, accents and dialects. To be fair to all students, listening voiceovers
should match the demographics of the teacher population. Other issues are
the use of non-native speakers for voiceovers and the speed of delivery. Our
belief is that non-native speakers of English constitute the majority of English
speaking people in the world. Whoever is used for listening test voiceovers,
whether native or non-native speakers, should speak clearly and enunciate
carefully. The speed of delivery of a listening test should be consistent with
the level of the students and the materials used for instruction. If your
institution espouses a communicative approach, then the speed of delivery for listening assessments should be at or near native speed. The delivery of
the test should be standard for all test takers. If live readers are used, they
should practice reading the script before the test and standardize with other
readers.

Test Structure
The way a test is structured depends largely on who constructs it. There are
generally two schools of thought on this: British and the American
perspectives. British exam boards generally grade input from easy to difficult
in a test and mix formats within a section. This means that the easier sections
come first with the more difficult sections later. American exam boards, on
the other hand, usually grade question difficulty within each section of an
exam and follow the 30/40/30 rule. This rule states that 30% of the questions
within a test or test section are of an easy level of difficulty; 40% of the
questions represent mid range levels of difficulty; and the remaining 30% of
the questions are of an advanced level of difficulty (in a 20-item section, for example, roughly six easy, eight mid-range and six difficult questions). American exam boards
usually use one format within each section. The structure you use should be
consistent with external benchmarks you use in your program. It is advisable
to start the test with an ‘easy’ question. This will lower students’ test anxiety
by relaxing them at the outset of the test.

Within a listening test, it is important to test as wide a range of skills as possible. Questions should be ordered as they are heard in the passage and should be well spaced throughout it for good content coverage. It is recommended that no content from the first 15-20 seconds of the recording be tested, to allow students to adjust to the listening. Many teachers only include content which is easy to test, such as dates and numbers; include some paraphrased content to challenge students.

Formats
Perhaps the most important piece of advice here is that students should never
be exposed to a new format in a testing situation. If new formats are to be
used, they should be first practiced in a teaching situation and then introduced
into the testing repertoire. Objective formats like MCQs and T/F are often
used because they are more reliable and easier to mark and analyze. When
using these formats, make sure that the N option is dropped from T/F/N and
that three response options instead of four are utilized for MCQs. Remember
that with listening comprehension, memory plays a role. Since students don’t
have repeated access to the text, more options add to the memory load and
affect the difficulty of the task and question. Visuals are often used as part of
listening comprehension assessment. When using them as input, make
certain that you use clear copies that reproduce well.

Skill contamination is an issue that is regularly discussed with regard to
listening comprehension. Skill contamination is the idea that a test-taker must
use other language skills in order to answer questions on a listening test. For
example, a test taker must first read the question and then write the answer.
Whereas skill contamination used to be viewed negatively in the testing
literature, it is now viewed more positively and termed ‘skill integration.’

Timing
The length of a listening test is generally determined by one of two things: the
length of the tape or the number of repetitions of the passages. Most
published listening tests do not require the proctor to attend to timing. He/she
simply inserts the tape or CD into the machine. The test is over when the
proctor hears a pre-recorded “this is the end of the listening test” statement.
For teacher-produced listening tests, the timing of a test will usually be
determined by how many times the test takers are permitted to hear each
passage. Proficiency tests like the TOEFL usually allow one repetition
whereas achievement tests usually repeat the input twice. Buck (2001)
recommends that if you’re assessing main idea, input should be heard once
and if you’re assessing detail, input should be heard twice. According to
Carroll (1972), listening tests should not exceed 30 minutes.

It is important to remember to give students time to pre-read the questions before the test and to answer the questions throughout the test. If students are required to transfer their answers from the test paper to an answer sheet, extra time to do this should be built into the exam.

Scoring
The scoring of listening tests provides numerous challenges to the
teacher/tester. Dichotomous scoring (questions that are either right or wrong)
is easier and more reliable. However, it doesn’t lend itself to many of the
communicative formats such as note-taking. Other issues are whether points
are deducted for grammar or spelling mistakes or non-adherence to word
counts. When more than one teacher is participating in the marking of a
listening test, calibration or standardization training should be completed to
ensure fairness to all students.

Finding Suitable Texts


Many teachers feel that the unavailability of suitable texts is listening
comprehension’s most pressing issue. The reason for this is that creating
scripts which have the characteristics of oral language is not an easy task.
Some teachers simply take a reading text and ‘transform’ it into a listening
script. The transformation of reading texts into listening scripts results in
contrived and inauthentic listening tasks because written texts often lack the
redundant features which are so important in helping us understand speech.
A better strategy is to look for texts that concentrate on characteristics that are
unique to listening. If you start collecting texts that have the right oral
features, you can then construct tasks around them. When graphics or
visuals are used as test context, teachers often find themselves ‘driven by clip
art’. This occurs when teachers build a listening script around readily

available graphics. It is best to inventory the topics in a course and collect
appropriate material well in advance of exam construction.

To produce more extemporaneous listening recordings, use available programs on your computer like Sound Recorder or shareware like Audacity and PureVoice to record scripts for use as listening assessments in the classroom.
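
If you prefer a scripted approach, a short program can capture a take directly from the microphone. The sketch below assumes the third-party sounddevice and soundfile Python packages are installed (pip install sounddevice soundfile); the duration, sample rate and file name are arbitrary illustrations.

# Minimal sketch: record a listening-script take from the default microphone
# and save it as a WAV file. Requires the sounddevice and soundfile packages.
import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 44100      # mono, CD-quality sample rate
DURATION = 60            # seconds; adjust to the length of the script

print("Recording...")
take = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
sd.wait()                # block until the recording is finished
sf.write("listening_script_take1.wav", take, SAMPLE_RATE)
print("Saved listening_script_take1.wav")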

Vocabulary
Research suggests that students need to know between 90% and 95% of the words in a text or script in order to understand it. Indeed, the level of the vocabulary that you utilize in your scripts can affect the difficulty and hence the comprehension of
students. If your institution employs word lists, it is recommended that you
seed vocabulary from your own word lists into listening scripts whenever
possible. To determine the vocabulary profile of your text/script, go to
http://www.er.uqam.ca/nobel/r21270/cgi-bin/webfreqs/web_vp.cgi for
Vocabulary Profiler, a very user-friendly piece of software. By simply pasting
your text into the program, you will receive information about the percentage
of words that come from Nation’s 1000 Word List and the Academic Word
List.
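
A rough, do-it-yourself version of this kind of profiling is sketched below: it reports what proportion of the tokens in a script appear in a known-word list, which can then be compared against the 90-95% guideline above. The word-list and script file names and the simple tokenisation are illustrative assumptions; the online Vocabulary Profiler gives a far fuller breakdown.

# Minimal sketch: what proportion of the tokens in a listening script are
# covered by a known-word list (e.g. an institutional list)?
# The file names and tokenisation are illustrative assumptions.
import re

def coverage(script_text, wordlist_path="known_words.txt"):
    with open(wordlist_path, encoding="utf-8") as f:
        known = {line.strip().lower() for line in f if line.strip()}
    tokens = re.findall(r"[a-z']+", script_text.lower())
    covered = sum(token in known for token in tokens)
    return covered / len(tokens) if tokens else 0.0

script = open("listening_script.txt", encoding="utf-8").read()
print(f"Known-word coverage: {coverage(script):.0%}  (aim for roughly 90-95%)")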

Another thing to remember about vocabulary is that ‘lexical overlap’ can affect
difficulty. Lexical overlap occurs when words used in the passage reappear in the questions and response options. When words from the passage are
used in the correct answer or key, the question is easier. The question
becomes more difficult if lexical overlap occurs from the passage/script to the
distractors. A final thought on vocabulary is that unknown vocabulary should
never occur as a keyable response (the actual answer) in a listening test.

Final Recommendations for Listening Assessment


No matter what the skill area, test developers should, as always, be guided by the cornerstones of good testing practice when constructing tests.
• Validity (Does it measure what it says it does?)
• Reliability (Are the results consistent?)
• Practicality (Is the test “teacher-friendly”?)
• Washback (Is feedback channeled to everyone concerned?)
• Authenticity (Do the tasks mirror real life contexts?)
• Transparency (Are expectations clear to students? Do students and teachers have access to information about the test/assessment?)
• Security (Are exams and item banks secure? Can they be reused?)

Assessing Speaking

Always keeping the cornerstones of good assessment in mind, why do we want to test speaking? In a general English program, speaking is an important channel of
communication in daily life. We want to simulate real-life situations in which
students engage in conversation, ask and answer questions, and give information.
In an academic English program, the emphasis might be on participating in class
discussions and debates or giving academic presentations. In a Business English
course, students might develop telephone skills, interact in a number of common
situations involving meetings, travel, and sales as well as make reports. Whatever
the teaching focus, valid assessment should reflect the course objectives and the
eventual target language.

Speaking is a productive language skill like writing and thus shares many issues
such as whether to grade holistically or analytically. However, unlike writing,
speaking is more ephemeral unless measures are taken to record student
performance. Yet the presence of recording equipment can inhibit students and
often recording is not practical or feasible. To score reliably, it is often necessary
to have two teachers assess together. When this happens, one is the interlocutor
who interacts with the speaker(s) while the other teacher, the assessor, tracks the
student’s performance.

Based on Bygate’s categories, Weir (1993) divides oral skills into two main
groups: speaking skills that are part of a repertoire of routines for exchanging
information or interacting, and improvisational skills such as negotiating meaning
and managing the interaction. The routine skills are largely associated with
language functions and the spoken language required in certain situations. By
contrast, the improvisational skills are more general and may be brought into play
at any time for clarification, to keep a conversation flowing, to change topics or to
take turns. In circumstances when presentation skills form an important
component of a program, naturally they should be assessed. However, avoid
situations where a student simply memorizes a prepared speech. Decide which
speaking skills are most germane to a particular program and then create
assessment tasks that sample skills widely with a variety of tasks.

While it is possible to assess speaking skills on an individual basis, most large exam boards opt to test pairs of students with pairs of testers. Within tests
organized in this way, there are times when only one student speaks and other
times when the students interact in a conversation. This setup makes it possible
to test common routine functions as well as a range of improvisational skills. For
reliability, interlocutors should work from a script so that all students get similar
questions framed in the same way. In general, the teacher or interlocutor should
keep in the background and only intercede if truly necessary.

Common speaking assessment formats

It is good practice to start the speaking assessment with a simple task that puts
students at ease so they perform better. Often this takes the form of asking the
students for some personal information.

Interview: can be teacher to student or student to student. Teacher to student is more reliable when the questions are scripted.

Description of a photograph or item: Students describe what they see.

Narration: This is often an elaboration of a description. The student is given a series of pictures or a cartoon strip showing the major events in a story.

Information gap activity: One student has information the other lacks and vice
versa. Students have to exchange information to see how it fits together.

Negotiation task: Students work together on a task where they may have
different opinions. They have to reach a conclusion in a limited period of time.

Roleplays: Students are given cue cards with information about their “character”
and the setting. Some students find it difficult to project themselves into an
imaginary situation and this lack of “acting ability” may affect reliability.

Oral presentations: Strive to make them impromptu instead of rehearsed.

Recommendations for Speaking Assessment

Decide with your colleagues which speaking subskills are most important and
adopt a grading scale that fits your program. Whether you adopt a holistic or
analytical approach to grading, create a recording form that enables you to track
students’ production and later give feedback for improvement.

Think about these factors: fluency vs. accuracy, appropriate responses (indicating
comprehension), pronunciation, accent and intonation, use of repair strategies.

Train teachers in scoring and practice together until there is a high rate of inter-rater reliability. Use moderation sessions for high-stakes exams.

Keep skill contamination in mind. Don’t give students lengthy written instructions
which must be read and understood before speaking.

Remember that larger samples of language are more reliable. Make sure that
students speak long enough on a variety of tasks.

Choose tasks that generate positive washback for teaching and learning!
