
Topic 10: Assessment in ESL

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Explain the basic concepts and constructs of language testing;
2. Identify the different types of tests and test formats;
3. Determine the reasons behind language testing;
4. Identify the characteristics of a good test;
5. Examine the different types of alternative assessments; and
6. Determine the effects of testing.

INTRODUCTION

Have you ever wondered why there is a necessity for tests in learning? As
teachers, we should be clear about the purposes of testing. Tests can have a
positive or negative effect on student learning. In many cases, tests still conjure
up negative associations and reactions. However, the nature of tests has
changed radically in recent years; testing has become less of an imposition,
more humanistic, and is increasingly seen as "a neutral assessment of what
people do" (McNamara, 2000: 4). Newer forms of language assessment do not involve the
ordeal of a single test performance under time constraints. Tests serve the many
needs of teaching. It is thus important for teachers to understand the principles
and practice of language assessment.


Topic 10 focuses on basic concepts and constructs in language assessment,
particularly on language testing rather than on the broader issue of assessment.
A brief overview of assessment is provided, which includes topics on:
(a) The nature of testing and evaluating ESL
(b) Validity, reliability and practicality
(c) Types of language tests
(d) Test format
(e) Alternative assessment
(f) Effects of testing

10.1 TESTING AND EVALUATING ESL

What means of assessment in Second Language Learning are you familiar with?
Why do teachers need to have a critical understanding of the principles and
practice of language assessment? Some reasons are:
(a) Language tests play a significant role in many people's lives. Language tests are "instruments for the institutional control of individuals" (McNamara, 2000: 4).

(b) As teachers, we need to be aware of what is involved in testing: teaching to a test, administering tests, relying on information from tests to make decisions on students' performance, and even developing tests to assess students' progress.

(c) In action research, information about students' proficiency is required. You may need to use measures of the students' proficiency, either an existing test or one you develop yourself.

Before we discuss tests and the nature of tests, let us first clarify the terms
evaluation, assessment and test. Table 10.1 provides a definition of each term.


Table 10.1: Terms and Definitions

Term         Definition
Evaluation   Systematic gathering of information for the purpose of making decisions.
Assessment   Procedure used to gain information about student learning and to form value judgments concerning learning progress.
Test         A procedure for obtaining information on students' performance.

Testing and teaching are interrelated and interdependent. However, their focus is
not necessarily the same. Tests focus on the assessment of the products of learning.
Teaching, on the other hand, emphasises enabling students to succeed in the
learning process (Chitravelu et al., 1995). A test is a yardstick a teacher uses to
measure a learner's performance (Baker, 1989). Most tests are administered under
examination conditions (e.g., formal, standardised tests and school
examinations), whereas others are conducted as an integral part of the teaching
and learning processes. Formal tests are systematic, planned sampling techniques
designed to assist teachers and students in appraising students' achievement.
Informal tests are unplanned assessments made as a course is conducted.
Testing can help students in many ways. Madsen (1983) points out that well-constructed tests:
(a) Help to create positive attitudes toward the class; and
(b) Assist students in mastering the language.

Testing helps teachers too, as they are expected to be accountable for the results
of their instruction. Tests help teachers answer the question, "Have I been
effective in my teaching?" Thus, tests can act as a check for teachers in
diagnosing their own efforts as well as the students'. As a teacher examines the
students' tests, s/he might ask a series of questions: "Are my lessons pitched at
the right level? Am I aiming my instruction too high or low? Am I teaching some
skills effectively and others less effectively? What areas need more practice?
Which points need reviewing?" (Madsen, 1983: 5). Tests therefore can benefit
students, teachers, and administrators through confirmation of progress. Good
tests can sustain or boost class morale and encourage learning.
For detailed information on testing and evaluation in ESL, visit the following site:
http://www.middleweb.com/Assmntlinks.html


In constructing tests, three fundamental factors need to be considered seriously:
(a) Validity
(b) Reliability
(c) Practicality

10.1.1 Validity

Validity is defined as "the degree to which the test actually measures what it is
supposed to measure" (Brown, 2001: 387). For instance, if the objective of a test is
to measure a student's ability to communicate in English, the test is valid when it
actually measures the student's ability to communicate. If it only measures the
student's vocabulary, then it is not a valid test. There are four basic types of
validity:
(a) Face Validity
A test is said to have face validity if it looks like a test that measures a particular skill, for instance, pronunciation ability. Brown (2001) defines face validity in the form of the question, "Does the test, on the surface of it, appear to test what it is designed to test?" Face validity is seen in the light of content: only if the test samples the actual content of what learners have achieved or are expected to achieve can face validity be perceived.

(b) Content Validity
Content validity is a matter of experts' opinions on whether a test is valid. A test has content validity if its content constitutes a representative sample of the language skills with which it is meant to be concerned. The greater the test's content validity, the more likely it is to be an accurate measure of what it is supposed to measure. Unfortunately, too often the content of tests is determined by what is easy to test rather than what is important to test.

(c) Criterion-Related Validity
Criterion-related validity is the extent to which the results of the test agree with those provided by some independent and highly dependable assessment of the candidate's ability. There are two main types of criterion-related validity:
(i) Concurrent validity: established when the test and the criterion are administered at about the same time.

(ii) Predictive validity: the degree to which a test can predict a candidate's future performance.

(d) Construct Validity
A test is construct-valid if it can be shown that it measures just the ability which it is supposed to measure. "Construct" refers to any underlying ability which is hypothesised in a theory of language ability. If we try to measure the ability to infer the meaning of unknown words from context in a test, then that part of the test is construct-valid only if we can demonstrate that we are indeed measuring just that ability.
Visit this site for information on construct validity by James Dean Brown:
http://www.jalt.org/test/bro_8.htm

A summary of the types of validity is given in Table 10.2.

Table 10.2: Types of Validity

Type of Validity   Test
Face               Looks like a good one to the learner/layperson
Content            Accurately reflects the syllabus it is based on
Predictive         Accurately predicts future performance
Concurrent         Gives similar results to already validated tests or other immediate external criteria (e.g. teachers' subjective assessment)
Construct          Reflects closely a valid theory of foreign language learning that it takes as its model

ACTIVITY 10.1
1. Distinguish the key difference between content and construct validity.
2. If content validity is absent, why does construct validity assume greater importance?
3. Explain why there is no final, absolute, and objective measure of validity. Why does validity ultimately go back to the subjective opinion of testers and theorists?

10.1.2 Reliability

Validity deals with the issue of whether a test measures what the tester wants to
know about the performance of her/his students. Reliability is concerned with
the precision with which the test measures (Chitravelu et al., 1995). A reliable test
must yield the same results on the same sample of students whenever the test is
administered. Reliability is concerned with consistency. There are basically two
types of reliability for language tests: test-retest and inter-item reliability.
Test-retest reliability is the extent to which the test achieves the same results,
consistently, time after time. For example, if a test is given to a group of
students, its results should be essentially the same if given to the same group of
students at another time.
Inter-item reliability is the extent to which all items on the test measure the
same thing. The reliability of a test can be gauged in terms of a reliability
coefficient, which allows the reliability of different tests to be compared. A test
with a reliability coefficient of 1.0 would give precisely the same results for a
group of test-takers regardless of when the test is taken. Well-constructed
vocabulary, structure, and reading tests are usually in the 0.90 to 0.99 range.
Auditory comprehension tests are more often in the 0.80 to 0.89 range. Oral
production tests can be in the 0.70 to 0.79 range. Teacher-constructed or
classroom tests often have a more moderate range, especially when administered
for the first time.
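The coefficients above can be estimated from data. As a minimal sketch (not a procedure from the source text), test-retest reliability is commonly estimated as the Pearson correlation between two administrations of the same test; the scores and the `pearson` helper below are hypothetical illustrations:

```python
# Sketch: estimating test-retest reliability as the Pearson correlation
# between two administrations of the same test. Scores are hypothetical.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

first_sitting  = [55, 62, 70, 48, 81, 66]   # hypothetical scores, time 1
second_sitting = [58, 60, 73, 50, 79, 68]   # same students, time 2

r = pearson(first_sitting, second_sitting)
print(f"test-retest reliability = {r:.2f}")
```

For these fairly consistent hypothetical scores the coefficient comes out close to 1; widely diverging scores would push it towards 0.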
How can teachers construct reliable language tests? The following are some
guidelines suggested by Brown (1994):
(a) Take enough samples of behaviour (more items).
(b) Write clear, unambiguous items.
(c) Provide clear and explicit instructions.
(d) Ensure that tests are well laid out and perfectly legible.
(e) Make sure test-takers are familiar with the format and test techniques.
(f) Provide uniform and non-distracting conditions of test administration.
(g) Use items that allow for objective scoring as far as possible.


(h) Make comparisons between test-takers as directly as possible.
(i) Provide a detailed scoring key.
(j) Train scorers.
(k) Agree on what constitutes acceptable responses and appropriate scores at the outset of scoring.
(l) Identify candidates by number, not name, to avoid bias.
(m) Employ multiple, independent scoring.


Another way of ensuring that a test is valid and reliable is for the teacher to
conduct an item analysis based on the scores from a preliminary run of the test.
That is, if time permits, the teacher should pilot the test before its actual
administration. The scores on the pilot can be analysed to determine whether the
items are consistent, well-constructed, and valid. Bailey (1998), Cohen (2001),
and Heaton (1990) provide descriptions of how items on tests can be analysed.
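As an illustration of what such an item analysis can involve (a sketch under our own assumptions, not the specific procedures in Bailey, Cohen or Heaton), two common statistics are the facility value and the discrimination index:

```python
# Sketch of a simple item analysis, assuming dichotomous (right/wrong)
# item scores from a pilot run. The facility value is the proportion of
# candidates answering an item correctly; the discrimination index
# compares the top- and bottom-scoring thirds of the group.

def facility(item_scores):
    """Proportion of candidates who got the item right (0.0 to 1.0)."""
    return sum(item_scores) / len(item_scores)

def discrimination(responses, item):
    """Top-third correct rate minus bottom-third correct rate for one item."""
    ranked = sorted(responses, key=sum, reverse=True)  # best candidates first
    third = max(1, len(ranked) // 3)
    top = sum(r[item] for r in ranked[:third]) / third
    bottom = sum(r[item] for r in ranked[-third:]) / third
    return top - bottom

# Hypothetical pilot data: each row is one candidate, 1 = correct.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

for item in range(4):
    fv = facility([r[item] for r in responses])
    d = discrimination(responses, item)
    print(f"item {item + 1}: facility {fv:.2f}, discrimination {d:+.2f}")
```

Items with very high or very low facility values, or with low (or negative) discrimination, are candidates for revision before the actual administration.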

10.1.3 Practicality

Some tests are ideal in theory but difficult in practice. For example, mass
oral testing is difficult. Oral proficiency testing is important in the learning
process; however, it is seldom part of national or school-level examinations.
Mass oral proficiency testing is expensive to conduct and time-consuming, and
its reliability is often low because of inter-rater variability. Thus, the
efficiency or practicality of a test involves issues of economy, ease of
administration, scoring, and interpretation of results. The longer a test takes
to construct, administer, and score, and the more skilled personnel and
equipment required, the higher the costs are likely to be.

SELF-CHECK 10.1
In language testing, which test seems easy at a glance but challenging
to answer?

10.2 TYPES OF LANGUAGE TESTS

Nunan (1991) lists the following types of tests:


(a) Direct Tests Versus Indirect Tests
In direct testing, the test-taker performs precisely the skill being measured. For instance, if we want to test speaking skills, the students tested should be asked to speak. Indirect testing, on the other hand, attempts to measure the abilities underlying the skills in which we are interested.

(b) Discrete Point Versus Integrative Tests
Discrete point testing involves testing one element at a time, item by item. An example of this would be to have a number of items testing a particular grammatical structure. By contrast, integrative testing requires the test-taker to combine many language elements in order to complete the task.

(c) Norm-Referenced Tests Versus Criterion-Referenced Tests
Testing which relates one candidate's performance to that of other candidates is norm-referenced testing. For example, a student may obtain a score that places him/her in the top ten per cent of candidates who sat for the test, but this does not tell us directly what the candidate can do. By contrast, criterion-referenced testing classifies people according to whether or not they are able to perform some task or set of tasks satisfactorily.

(d) Objective Tests Versus Subjective Tests
The difference between these two lies in the scoring. If no judgment is needed on the part of the scorer, the scoring is objective. If judgment is required, the scoring is subjective.

(e) Communicative Language Testing
Communicative language testing involves using a test which measures the ability of candidates to take part in acts of communication, including reading and writing (Weir, 1993). Such tests are normally intended to measure how test-takers are able to use language in real-life situations. Communicative tests are often context-specific.

10.3 TEST FORMATS

There are various test formats, sometimes called test techniques, used in
assessing language ability. Through test formats we obtain information about the
candidate's language skill or ability. Some formats are better suited to testing
certain language skills or abilities. Formats do not, in general, determine what
can be tested. For instance, we can use a multiple-choice format to test grammar,
reading, vocabulary, and for diagnostic testing, among other purposes.
In choosing a particular format, we need to consider the following questions:
(a) Does the test format allow us to obtain the information we need about the students' ability in the skill we are testing? Will the results we obtain be valid and reliable?
(b) Is using the format the most economical way to obtain the information we want? Will it have good backwash effects? (The backwash effect is the effect testing has on teaching and learning.)
(c) Are the students sufficiently familiar with the format?

The following is a discussion of two of the more common test formats, namely the
multiple-choice item and the cloze. For other types of tests, refer to Weir (1993),
Heaton (1990) and McNamara (2000).
(a) Multiple-choice Questions (MCQ)
Hughes (1989) provides the following as the basic structure of a multiple-choice question (MCQ) test:
There is a stem:
    Enid has been here ______ half an hour.
A number of options is provided, one of which is correct while the others are distractors:
(i) during
(ii) for
(iii) while
(iv) since


The candidate has to identify the correct or most appropriate option.
The most obvious advantage of the MCQ is that scoring can be perfectly
reliable, rapid and economical. Another advantage is that it is possible to
include more items, because it is quick and easy for students to respond to the
questions by putting a mark on the paper. This makes for greater test
reliability.
Limitations of the MCQ
There is a number of disadvantages of the MCQ format. First, if there is no
fit between candidates productive and receptive skills, the MCQ may give
an inaccurate picture of the candidates ability.
Second, guessing may have a considerable but unknowable effect on test
scores. On the average, we expect a person to score 33 on a 100-item test by
mere chance. The restricted number of response allows guessing to happen
for the more difficult questions.
Another limitation is that the format severely restricts what can be tested.
MCQs require distractors, and distractors are not always available and
require skill to construct. In addition, it is difficult and time-consuming to
write successful items, although the time saved in administration and scoring
may outweigh the time spent on constructing a successful test.
A fourth limitation is that backwash effects may be harmful. Practising for
the test can have a harmful effect on learning and teaching, as practising
MCQ items may not be the best way to improve language proficiency.
Lastly, the MCQ may facilitate cheating. The responses on the MCQ (a, b, c, d)
are simple enough to communicate to other candidates non-verbally. One
way to avoid this is to have two versions of the test, one form having the
reverse order of options.
(b) Clozentropy or Cloze Procedure
Cloze tests are prepared by deleting a certain number of words from a text and replacing the deleted words with blanks. Candidates then fill in the blanks with an appropriate word or answer. A pure cloze is one in which every nth word is deleted (fifth, sixth, tenth, etc.), depending on the level of difficulty. When words are deleted mechanically wherever the nth word falls, regardless of the type of word, this is called a random cloze. When the test constructor deliberately chooses the type of words to delete, this is called a rational cloze.
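The fixed-ratio deletion described above is mechanical enough to sketch in code. The following is a hypothetical illustration; the passage, deletion rate and the `make_cloze` helper are our own assumptions, not from the source:

```python
# Sketch: building a fixed-ratio ("pure") cloze by deleting every nth
# word after a lead-in stretch of text. The sample text and n are arbitrary.
import re

def make_cloze(text, n=6, lead_in_words=10):
    """Blank out every nth word after the first `lead_in_words` words.

    Returns the gapped text and the list of deleted words (the key).
    """
    words = text.split()
    key = []
    for i in range(lead_in_words + n - 1, len(words), n):
        key.append(re.sub(r"\W", "", words[i]))  # keep the bare word as the answer
        words[i] = "______"
    return " ".join(words), key

passage = ("The boy walked slowly to the village because the road was "
           "muddy and his shoes were old and thin and let in the water.")
gapped, answers = make_cloze(passage, n=5)
print(gapped)
print("Key:", answers)
```

In practice the teacher would still inspect each generated blank, since a mechanically chosen deletion can land on a word that makes the gap unguessable.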


The cloze test is used to test grammar, vocabulary, reading and writing
skills. The kinds of blank chosen and the kinds of options given determine
what the cloze test tests.
Constructing the Cloze Test
In choosing texts for the cloze, the teacher must ensure that the language is
appropriate to the students' level. In general, a text passage that is too
factual does not lend itself to good cloze testing. The best types of text for a
cloze are stories, descriptions of processes, and explanations.
In addition, students should be given clear directions on how to sit for the
cloze test and how to respond to it. Teachers can instruct students to
read through the whole text before attempting to fill in the blanks.
Another guideline in constructing the cloze is that a sentence or two must
be left intact at the beginning of the text; this is called the lead-in. In
writing the options (if alternatives are provided), ensure that the options
are placed in one of the following ways:
(i) Immediately after each blank;
(ii) In the margin;
(iii) Below the text, in numbered groups; or
(iv) On a separate answer sheet.
A cloze with alternatives can be scored using a key. Cloze tests which do not
have options can be marked in one of two ways: by accepting only the words in
the original text (before deletion), called exact word marking, or by accepting
any word which fits the blank, termed acceptable word marking.
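The two marking schemes can be sketched as follows; the `score_cloze` function, keys and responses are illustrative assumptions, not from the source:

```python
# Sketch: scoring an optionless cloze under the two marking schemes
# described above. The answer data are hypothetical.

def score_cloze(responses, exact_key, acceptable_key=None):
    """Return (exact_score, acceptable_score) for one candidate.

    exact_key: the original deleted words, one per blank.
    acceptable_key: optional sets of additionally acceptable words.
    """
    exact = sum(r.lower() == k.lower() for r, k in zip(responses, exact_key))
    if acceptable_key is None:
        return exact, exact
    acceptable = sum(
        r.lower() == k.lower() or r.lower() in extra
        for r, k, extra in zip(responses, exact_key, acceptable_key)
    )
    return exact, acceptable

exact_key = ["village", "muddy", "shoes"]
acceptable_key = [{"town"}, {"wet", "dirty"}, {"boots"}]
candidate = ["town", "muddy", "boots"]

print(score_cloze(candidate, exact_key, acceptable_key))  # prints (1, 3)
```

The gap between the two scores shows how much stricter exact word marking is; acceptable word marking needs an agreed list of alternatives (or scorer judgment) for each blank.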
Strengths and Weaknesses of the Cloze:
(i) It is easily constructed, administered, and scored.
(ii) It has a high degree of reliability because it is an objective test.
(iii) Cloze is not appropriate for diagnostic purposes.
(iv) The cloze can be a difficult task.
There are other test formats. Refer to Chitravelu et al. (1995) and other books on language testing for a detailed account of this topic.


What types of test format can be used in a reading comprehension test?

ACTIVITY 10.2
Draw a mind-map on testing and evaluating ESL. Think of other
advantages and limitations of using each of the test procedures
described in this topic.

10.4 ALTERNATIVE ASSESSMENT

Recent developments in testing provide teachers with new perspectives on
testing language abilities. The following are some issues related to this recent
change in testing (Brown, 2001: 403-410):
(a) New Perspectives on Intelligence
Intelligence was traditionally seen as the ability to perform linguistic and logical-mathematical problem solving, and our world is still dominated by standardised, norm-referenced tests that are timed, multiple-choice, tricky, long, and artificial. A more recent theory of intelligence, proposed by Gardner (1983), states that there are various forms of intelligence, as discussed in an earlier topic:
(i) Linguistic intelligence
(ii) Logical-mathematical intelligence
(iii) Spatial intelligence (the ability to find your way around an environment, to form mental images of reality)
(iv) Musical intelligence (the ability to perceive and create pitch and rhythmic patterns)
(v) Bodily-kinesthetic intelligence (fine motor movement, athletic prowess)
(vi) Interpersonal intelligence (the ability to understand others, how they feel, and to interact effectively with them)
(vii) Intrapersonal intelligence (the ability to understand oneself and to develop a sense of self-identity)


These new perspectives on intelligence pose a challenge to teachers in testing
students' abilities, as students are endowed with multiple intelligences, each of
which varies from student to student. We now need to be able to test
interpersonal, creative, communicative and interactive skills, and in doing so we
have to "trust our own subjectivity and intuition" (Brown, 2001: 404).
(b) Performance-based Testing
In educational settings around the world, testing has taken on a new agenda. Instead of just the traditional paper-and-pencil, single-answer tests, performance-based tests are being introduced in schools. They involve:
(i) Open-ended problems
(ii) Labs
(iii) Hands-on projects
(iv) Student portfolios
(v) Experiments
(vi) Essay writing
(vii) Group projects
Although such testing is time-consuming and thus expensive, the losses in practicality are compensated for by higher validity. Students are tested based on actual performance. Learners are gauged on the process of performing the criterion, and this establishes high content validity. In language teaching, the teacher needs to create a balance between formal and informal testing. Through more formative evaluation of students' performance on various tasks, the teacher can move towards meeting the goals of performance-based testing.
(c) Interactive Language Tests
Interactive tests come under performance-based testing. Interactive tests are constructed in line with Gardner's and Sternberg's theories of intelligence, in which students are assessed in the process of interacting with others (Brown, 2001). Tests thus have to involve people actually performing the behaviour that we want to measure.

(d) Traditional and Alternative Assessments
These developments imply a move towards alternative ways of testing, in that more authentic elicitation of meaningful communication is emphasised. Table 10.3 compares traditional and alternative assessments (Brown, 2001: 408):


Table 10.3: A Comparison between Traditional and Alternative Assessment

Traditional Assessment            Alternative Assessment
One-shot standardised exams       Continuous long-term assessment
Timed, multiple-choice format     Untimed, free-response format
Decontextualised test items       Contextualised communicative tasks
Scores suffice for feedback       Formative, interactive feedback
Norm-referenced scores            Criterion-referenced scores
Focus on the right answer         Open-ended, creative answers
Summative                         Formative
Oriented to product               Oriented to process
Non-interactive performance       Interactive performance
Fosters extrinsic motivation      Fosters intrinsic motivation

Brown (2001) notes that traditional testing offers significantly higher levels of
practicality: more time and budget are needed to conduct and evaluate
assessments that require the subjective evaluation, individualisation, and
interactive feedback found in alternative assessments. However, the payoff comes
in the form of useful feedback to students, better chances for intrinsic
motivation, and eventually greater validity.
Brown (2001: 408-409) suggests the following four principles for converting
ordinary, traditional tests into authentic, intrinsically motivating learning
opportunities for learners:
(a) Test-taking Strategies
Teachers can help learners with appropriate and useful strategies for test taking. To ensure students are prepared to do their best in tests, teachers should consider the following as a guide:
(i) Before the Test
- Provide the students with the necessary information about the test: What will the test cover? Which topics will be the most important? What kind of items will be included? How long will the test be?
- Encourage students to do a systematic review of material, e.g., skim the book, outline the major points, write down examples, etc.
- Give students practice tests or exercises, if available.
- Facilitate the formation of study groups, if possible.
- Remind students to get a good night's rest before the test.
- Remind students to get to the classroom early.
(ii) During the Test
- As soon as the test is distributed, instruct students to quickly skim through the whole test to get a good grasp of the different parts.
- Advise students to concentrate as carefully as possible.
- Alert students a few minutes before the test ends so that they can proofread their answers, catch careless errors, and still finish on time.
(iii) After the Test
- When the test is given back to students, give feedback on specific things each student did well, what s/he did not do well on, and the possible reasons for such a judgment.
- Advise students to concentrate on the feedback you give in class.
- Encourage questions from students.
- Remind students to focus for the future on points that they are weak on.

(b) Face Validity
Sometimes students are not fully aware of what is being tested. They may feel that the test is not testing what it is supposed to test. Face validity means that the students must perceive the test to be valid. To help foster this perception, the teacher can:
(i) Prepare a carefully constructed, well-thought-out format.
(ii) Develop a test that can be completed within the time given.
(iii) Write items that are clear and uncomplicated.
(iv) Include directions that are crystal clear.
(v) Give tasks that are familiar and related to the students' coursework.
(vi) Ensure the difficulty level of the test is appropriate for the students.


(c) Authenticity
The teacher needs to ensure that the language in the test is as authentic as possible. Provide context so that items are not a string of unrelated language samples, e.g., through thematic organisation of items. The tasks must also be in a format with which students are familiar. A classroom test is not the time to introduce new tasks, because we will not know whether student difficulty stems from the task or from the language we are testing.

(d) Washback
The benefit that tests offer is known as washback. Formal tests must be learning devices through which students can receive feedback and a diagnosis of their strengths and weaknesses. It is important for teachers to give prompt feedback to foster intrinsic motivation. The teacher needs to give a generous number of specific comments as feedback, instead of limiting it to a letter grade or number score. Give praise for strengths and constructive criticism for weaknesses. Washback also means that the teacher is available to discuss with students and go over their strengths and limitations. The following section deals with the issue of washback in more detail.

10.5 EFFECTS OF TESTING

"Tests have a strong influence on the curriculum because they determine the
future opportunities of individuals and influence the reputation of teachers and
schools" (McNamara, 2000: 73). We cannot agree more, because in Malaysia
standardised examinations play a pivotal role in the opportunities available to
students. We need to know some of the effects of testing on our teaching and on
students' learning in order to create a healthy balance between practicality and
the ideals of education.
Washback, or the effect of tests on teaching and learning, is a constantly debated
issue in education. Ethical language testing practice should work to ensure
positive washback from tests. For example, it is often argued that performance
assessments have better washback than multiple-choice (MCQ) or cloze test
formats. Performance assessments require the integration of knowledge and
skills, so preparing for them encourages both students and teachers to invest
time in practising such tasks in the classroom. In contrast, MCQs often test
knowledge of grammar or vocabulary, which may inhibit communicative
approaches to learning and teaching.


However, research on assessments and washback has found that washback is often
unpredictable (McNamara, 2000: 74). McNamara further adds that whether or not
the desired effect is achieved in efforts at curriculum change depends much on
local conditions in the classroom, the established traditions of teaching, the
immediate motivation of learners, and the unpredictable ways in which
interactions develop.
According to Bachman and Palmer (1997: 30-34), washback is discussed in the
testing literature largely as the direct impact of testing on individuals. Hughes
(1989) defines washback as the direct effects of testing on teaching and learning,
and points out that testing can have either a beneficial or a harmful effect on
teaching and learning. Cohen (1994, in Bachman and Palmer, 1997: 30) discusses
the effects of washback as the effect of assessment instruments on educational
practice and beliefs. Research has also shown that washback is a complex and
difficult issue, rather than simply the effect of testing on teaching (Wall and
Alderson, 1993). Some of the impacts discussed by Bachman and Palmer
(1997: 30-35) are as follows:
(a) Impact on Individuals
The stakeholders directly affected by tests are the test-takers and the test users or decision-makers. Others, such as test-takers' future classmates, co-workers and future employers, are indirectly affected. Every member of a particular system (society) is indirectly affected by the use of tests.

(b) Impact on Test-takers
The following aspects of a testing procedure can affect test-takers:
(i) The experience of taking and preparing for the test;
(ii) The feedback received on their performance on the test; and
(iii) The decisions made about them based on their test scores.

Figure 10.1: Tests can affect test takers


    The experience of taking a test can have an impact on the test-taker
    (Figure 10.1). If the topical or cultural information in the test is new, this
    affects the test-taker's performance. The test-taker's language knowledge
    can also be affected, especially when something provided in the test is
    considered grammatically correct when, in fact, it is not. In addition, the
    types of feedback given to test-takers are likely to affect them directly.
    Feedback thus has to be highly relevant, complete, and meaningful to the
    test-takers. Verbal descriptions and written comments can produce a
    positive response from test-takers in terms of their perceptions of the test.

    Finally, decisions made on the basis of test scores directly affect the
    test-takers; for example, acceptance into an instructional program,
    advancement, or employment are decisions that can have serious
    consequences for test-takers.
    Are decision procedures and criteria applied uniformly to all groups of
    test-takers? Fair test use, according to Bachman and Palmer (1997), is related
    to the relevance and appropriateness of the test score to the decision. Is it fair
    to make a life-affecting decision solely on the basis of a test score?
(c)

    Impact on Teachers
    Test-users such as teachers are also directly affected by tests. Most
    teachers are familiar with the amount of influence testing can have on their
    instruction. Teachers may prefer to teach in a particular way, but if they
    have to do it in a specified way, then they may find "teaching to the test"
    unavoidable. Bachman and Palmer (1997:33) describe the term "teaching to
    the test" as doing something in teaching that may not be compatible with the
    teacher's own values and goals, or with the values and goals of the
    instructional program. If teachers feel that what they teach is not relevant
    to the test (or vice versa), authenticity is lacking, and the test may have
    harmful washback, or a negative impact, on instruction.

(d)

    Impact on Society and Education Systems
    Test-developers and users need to consider the societal and educational
    value systems that inform test use. However, the values and goals that
    inform test use may vary cross-culturally. For instance, one culture may
    place a high value on individual effort and achievement, while in another,
    cooperation and respect for authority may be highly valued. In addition,
    values and goals change over time.


    Confidentiality and access to information are today regarded as basic rights
    for test-takers, although they were once not considered at all. In high-stakes
    tests involving decision-making on a large group of individuals (e.g.,
    standardised tests), tests have a direct impact on teaching practices and
    programs.
    The following is a list of things that we can do to organise our assessment
    of the potential consequences of tests (Bachman and Palmer, 1997:35):

    (i)   List the intended uses of the test;

    (ii)  List the potential consequences (positive and negative) of using the
          test in these ways;

    (iii) Rank the possible outcomes in terms of the desirability or
          undesirability of their occurring; and

    (iv)  Gather information to determine how likely each of the various
          outcomes is.

    An analysis of the possible consequences of using a particular test needs to
    be complemented by also considering the consequences of using alternatives
    to testing to achieve the same purpose.

ACTIVITY 10.3

1.  Think of tests with which you are familiar (e.g., a classroom test). What
    do you think is the backwash effect of each of them? Are they harmful or
    beneficial? What are your reasons for your conclusions?

2.  Study a classroom test and describe the test in terms of the purpose of
    the test, its validity and reliability, and the potential backwash effects.
    Do you think the test provides accurate information about the students?


•   This topic provided a brief overview of some basic concepts of language
    testing, including validity, reliability, and practicality.

•   The different types of test format, such as direct versus indirect tests,
    discrete versus integrative tests, norm-referenced versus criterion-referenced
    tests, and so forth, were also described.

•   The topic included a discussion of the two common language testing
    formats: multiple choice and cloze.

•   A brief discussion was also included on the topic of alternative assessment.

•   Finally, the effects of testing on students and society at large were described.

Cloze tests

Construct validity

Content validity

Criterion-related validity

Direct test

Face validity

Indirect test

Performance-based testing

Practicality

Reliability
