Topic 10: Assessment in ESL
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1.
2.
3.
4.
5.
6.
INTRODUCTION
Have you ever wondered why there is a necessity for tests in learning? As
teachers, we should be clear about the purposes of testing. Tests can have a
positive or negative effect on student learning. In many cases, tests still conjure
up negative associations and reactions. However, the nature of tests has
changed radically in recent years: tests have become less of an imposition,
more humanistic, and are seen as a neutral assessment of what people do
(McNamara, 2000: 4). Newer forms of language assessment do not involve the
ordeal of a single test performance under time constraints. Tests serve the many
needs of teaching. It is thus important for teachers to understand the principles
and practice of language assessment.
(b)
(c)
(d) Test format
(e) Alternative assessment
(f) Effects of testing
10.1
What means of assessment in Second Language Learning are you familiar with?
Why do teachers need to have a critical understanding of the principles and
practice of language assessment? Some reasons are:
(a)
(b)
(c)
Before we discuss tests and the nature of tests, let us first clarify the terms
evaluation, assessment and tests. Table 10.1 provides the definition of each term.
Table 10.1: Definitions of Evaluation, Assessment and Test

Term          Definition
Evaluation
Assessment
Test
Testing and teaching are interrelated and interdependent. However, the focus is
not necessarily similar. Tests focus on the assessment of the products of learning.
On the other hand, teaching emphasises enabling students to succeed in the
learning process (Chitravelu et al., 1995). A test is a yardstick a teacher uses to
measure a learner's performance (Baker, 1989). Most tests are administered under
examination conditions (e.g., formal, standardised tests and school
examinations); whereas others are conducted as an integral part of the teaching
and learning processes. Formal tests are systematic, planned sampling techniques
designed to assist teachers and students in appraising students' achievement.
Informal tests are unplanned assessments made as a course is conducted.
Testing can help students in many ways. Madsen (1983) points out that well-constructed tests:
(a)
(b)
Testing helps teachers too, as they are expected to be accountable for the results
of their instruction. Tests help teachers answer the question, "Have I been
effective in my teaching?" Thus, tests can act as a check for teachers in
diagnosing their own efforts as well as their students'. As teachers examine their
students' tests, they might ask a series of questions: "Are my lessons pitched at
the right level? Am I aiming my instruction too high or low? Am I teaching some
skills effectively and others less effectively? What areas need more practice?
Which points need reviewing?" (Madsen, 1983: 5). Tests therefore can benefit
students, teachers, and administrators through confirmation of progress. Good
tests can sustain or boost class morale and encourage learning.
For detailed information on testing and evaluation in ESL, visit the following site
http://www.middleweb.com/Assmntlinks.html
(a) Validity
(b) Reliability
(c) Practicality
10.1.1 Validity
Validity is defined as the degree to which the test actually measures what it is
supposed to measure (Brown, 2001: 387). For instance, if the objective of a test is
to measure a student's ability to communicate in English, the test is valid when it
actually measures the student's ability to communicate. If it only measures the
student's vocabulary, then it is not a valid test. There are four basic types of
validity:
(a) Face Validity
A test is said to have face validity if it looks like a test that measures a
particular skill, for instance, pronunciation ability. Brown (2001) defines
face validity in the form of the question, "Does the test, on the surface of it,
appear to test what it is designed to test?" Face validity is seen in light of
content. If the test samples the actual content of what learners have
achieved or are expected to achieve, only then can face validity be
perceived.
(b) Content Validity
Content validity is a combination of experts' opinions on whether a test is
valid. A test has content validity if its content constitutes a representative
sample of the language skills with which it is meant to be concerned. The
greater the test's content validity, the more likely it is to be an accurate
measure of what it is supposed to measure. Unfortunately, too often the
content of tests is determined by what is easy to test rather than what is
important to test.
(c) Criterion-Related Validity
Criterion-related validity is the extent to which the results of the test agree
with those provided by some independent and highly dependable
assessment of the candidate's ability. There are mainly two types of
criterion-related validity:

(i) Concurrent validity, which is established when the test and the criterion
are administered at about the same time.
(ii) Predictive validity, which is established when the test is used to predict
the candidate's future performance on the criterion.

(d) Construct Validity
A test is construct-valid if it can be shown that it measures just the ability
which it is supposed to measure. Construct refers to any underlying
ability which is hypothesised in the theory of language ability. If we try to
measure the ability to infer the meaning of unknown words from context in
a test, then that part of the test is construct-valid only if we are able to
demonstrate that we are indeed measuring just that ability.
Visit this site for information on construct validity by James Dean Brown:
http://www.jalt.org/test/bro_8.htm
[Figure: types of test validity: face, content, predictive, concurrent and construct]
ACTIVITY 10.1
1.
2.
3.
10.1.2 Reliability
Validity deals with the issue of whether a test measures what the tester wants to
know about the performance of her/his students. Reliability is concerned with
the precision with which the test measures (Chitravelu et al., 1995). A reliable test
must yield the same results on the same sample of students whenever the test is
administered. Reliability is concerned with consistency. There are basically two
types of reliability for language tests: test-retest reliability and inter-item reliability.
Test-retest reliability is the extent to which the test achieves the same results
consistently time after time. For example, if a test is given to a group of
students, its results should be technically the same if given to the same group of
students at another time.
Inter-item reliability is the extent to which all items on the test are measuring the
same thing. The reliability of a test can be gauged in terms of a reliability
coefficient, which allows the comparison of the reliability of different tests. A
test with a reliability coefficient of 1.0 would give precisely the same results for a
group of test-takers regardless of when the test is taken. Well-constructed
vocabulary, structure, and reading tests are usually in the 0.90 to 0.99 range.
Auditory comprehension tests are more often in the range of 0.80 to 0.89. Oral
production tests can be in the 0.70 to 0.79 range. Teacher-constructed or
classroom tests often have a more moderate range, especially when they are
constructed for the first time.
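Test-retest reliability is commonly estimated as the correlation between scores from two administrations of the same test to the same students. The following sketch is illustrative only (it is not from the text, and the score lists are hypothetical); it computes a Pearson correlation coefficient as a simple reliability estimate:

```python
# Illustrative sketch: estimating test-retest reliability as the Pearson
# correlation between two administrations of the same test.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

first_sitting = [72, 65, 88, 54, 91, 60]    # hypothetical scores
second_sitting = [70, 67, 85, 58, 93, 61]   # same students, a later date

r = pearson(first_sitting, second_sitting)
print(round(r, 2))  # prints 0.99
```

A coefficient this close to 1.0 would suggest the test ranks the students consistently across sittings; values nearer the 0.70 range, as the text notes for oral production tests, indicate much more variability.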
How can teachers construct reliable language tests? The following are some
guidelines suggested by Brown (1994):
(a)
(b)
(c)
(d) Ensure that tests are well laid out and perfectly legible.
(e)
(f)
(g)
(h)
(i)
(j) Train scorers.
(k)
(l)
10.1.3 Practicality
Some tests are ideal theoretically but are difficult in practice. For example, mass
oral testing is difficult. Oral proficiency testing is important in the learning
process; however, it is seldom part of national or school-level examinations.
Mass oral proficiency testing is expensive to conduct and time-consuming, and
its reliability is often low because of inter-rater variability. Thus, the efficiency or
practicality of a test involves issues of economy, ease of administration, scoring,
and interpretation of results. The longer a test takes to construct, administer, and
score, and the more skilled personnel and equipment it requires, the higher the
costs are likely to be.
SELF-CHECK 10.1
In language testing, which test seems easy at a glance but challenging
to answer?
10.2
(b)
(c)
(d)
(e)
10.3 TEST FORMATS
There are various test formats, sometimes called test techniques, used in
assessing language ability. Through test formats we obtain information about the
candidate's language skill or ability. Some formats are suitable for testing certain
language skills or abilities. Formats do not, in general, determine what can be
tested. For instance, we can use a multiple-choice format to test grammar,
reading, vocabulary, and for diagnostic testing, etc.
In choosing a particular format, we need to consider the following questions:

(a) Does the test format allow us to obtain the information we need about the
students' ability in the skill we are testing? Will the results we obtain be
valid and reliable?

(b) Is using the format the most economical way to obtain the information we
want? Will it have good backwash effects? The backwash effect is the effect
testing has on teaching and learning.
(c)

(i) during
(ii) for
(iii) while
(iv) since
The cloze test is used to test grammar, vocabulary, reading and writing
skills. The kinds of blank chosen and the kinds of options given determine
what the cloze test tests.

Constructing the Cloze Test
In choosing texts for the cloze, the teacher must ensure that the language is
appropriate to the students' level. In general, a text passage that is too
factual does not lend itself to good cloze testing. The best types of text for a
cloze are stories, descriptions of processes, and explanations of something.
In addition, students should be given clear directions on how to sit for the
cloze test and how to respond to it. Teachers can instruct students to
read through the whole text before attempting to fill in the blanks.
Another guideline in constructing the cloze is that a sentence or two must
be left intact at the beginning of the text. This is called the lead-in. In
writing the options (if alternatives are provided), ensure that the options
are placed as follows:

(i)
(ii) In the margin;
(iii)
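The construction procedure described above can be sketched in code. This is an illustrative sketch only (not from the text): it leaves a lead-in sentence intact, blanks every nth word of the rest of the passage, and returns the test text together with the answer key. The sample passage is hypothetical:

```python
import re

def make_cloze(text, n=7, lead_in_sentences=1):
    """Blank every nth word after the lead-in; return (test_text, answers)."""
    # A sentence or two at the start stays intact as the lead-in.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    lead_in = ' '.join(sentences[:lead_in_sentences])
    words = ' '.join(sentences[lead_in_sentences:]).split()
    answers = []
    for i in range(n - 1, len(words), n):
        core = words[i].rstrip('.,;:')        # keep trailing punctuation on the blank
        answers.append(core)
        words[i] = '_____' + words[i][len(core):]
    return lead_in + ' ' + ' '.join(words), answers

passage = ("The rain fell all night. The river rose quickly and "
           "the village was flooded by morning.")
cloze_text, key = make_cloze(passage, n=4)
print(cloze_text)
print(key)
```

Varying `n` changes the difficulty: a smaller deletion interval removes more of the context that students rely on to fill each blank.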
ACTIVITY 10.2
Draw a mind map on testing and evaluating ESL. Think of other
advantages and limitations of using each of the test procedures
described in this topic.
10.4 ALTERNATIVE ASSESSMENT
(i) Linguistic intelligence
(ii) Logical-mathematical intelligence
(iii) Bodily-kinesthetic intelligence (fine motor movement, athletic prowess)
(c) Performance-based Testing
In educational settings around the world, testing has taken on a new
agenda. Instead of just the traditional paper-and-pencil single-answer tests,
performance-based tests are being introduced in schools. They involve:

(i) Open-ended problems
(ii) Labs
(iii) Experiments
(d)
Traditional Tests               Alternative Assessment
decontextualised items          contextualised communicative tasks
norm-referenced scores          criterion-referenced scores
summative                       formative
oriented to product             oriented to process
non-interactive performance     interactive performance
Brown (2001) proposes that traditional testing offers significantly higher levels of
practicality. More time and budget are needed to conduct and evaluate
assessments that require subjective evaluation, individualisation, and interaction
in feedback found in alternative assessments. However, the payoff comes in the
form of useful feedback to students, better chances for intrinsic motivation, and
eventually greater validity.
Brown (2001: 408-409) suggests the following four principles for converting
ordinary, traditional tests into authentic, intrinsically motivating learning
opportunities for learners:
(a)

(ii) Alert students a few minutes before the test ends so that they can
proofread their answers, catch careless errors, and still finish on time.

(iii) After the Test
Remind students to focus for the future on points that they are weak on.

(b) Face Validity
Sometimes students are not fully aware of what is being tested. They may
feel that the test is not testing what it is supposed to test. Face validity
means that the students must perceive the test to be valid. To help foster
this perception, the teacher can:

(i)
(ii)
(vi) Ensure the difficulty level of the test is appropriate for the students.
(c) Authenticity
The teacher needs to ensure that the language in the test is as authentic as
possible. Provide the language with context so that items are not a string of
unrelated language samples, e.g., through thematic organisation of items.
The tasks must also be in a format with which students are familiar. A
classroom test is not the time to introduce new tasks, because we will not
know whether student difficulty is a factor of the task or of the language we
are testing.
(d) Washback
The benefit that tests offer is known as washback. Formal tests must be
learning devices through which students can receive feedback and a
diagnosis of their strengths and weaknesses. It is important for teachers to
give prompt feedback to foster intrinsic motivation. The teacher needs to
give a generous number of specific comments as feedback, instead of only
limiting it to a letter grade or number score. Give praise for strengths and
constructive criticism for weaknesses. Washback also means that the
teacher is available to discuss with students and go over their strengths
and limitations. The following section deals with the issue of washback in
more detail.
10.5 EFFECTS OF TESTING
Tests have a strong influence on the curriculum because they determine the
future opportunities of individuals and influence the reputation of teachers and
schools (McNamara, 2000: 73). We cannot agree more on this because, in
Malaysia, standardised examinations play a pivotal role in the opportunities
available to students. We need to know some of the effects of testing on our
teaching and on students' learning in order to create a healthy balance between
practicality and the ideals in education.
Washback, or the effect of tests on teaching and learning, is a constantly debated
issue in education. Ethical language testing practice should work to ensure
positive washback from tests. For example, it is often argued that performance
assessments have better washback than multiple choice (MCQ) test formats or
cloze. Performance assessments require the integration of knowledge and skills,
therefore they require preparation which encourages both students and teachers
to invest time in practising such tasks in the classroom. In contrast, MCQs often
test knowledge of grammar or vocabulary which may inhibit communicative
approaches to learning and teaching.
(a) Impact on Individuals
Stakeholders who are directly affected by tests are the test-takers and the test
users or decision-makers. Others, such as the test-takers' future classmates,
co-workers and future employers, are indirectly affected. Every member of a
particular system (society) is indirectly affected by the use of tests.
(b)
(ii)
(iii) The decisions made about them based on the test scores.
(c) Impact on Teachers
Test users such as teachers are also directly affected by tests. A majority of
teachers are familiar with the amount of influence testing can have on their
instruction. Teachers may prefer to teach in a particular way, but if they
have to do it in a specified way, then they may find teaching to the test
unavoidable. Bachman and Palmer (1997: 33) describe the term "teaching to
the test" as doing something in teaching that may not be compatible with
teachers' own values and goals, or with the values and goals of the
instructional program. If teachers feel that what they teach is not relevant
to the test (or vice versa), this is seen as a lack of authenticity, whereby the
test may have harmful washback or negative impacts on instruction.
(d)
(ii)
ACTIVITY 10.3
1.
2.
The different types of test format, such as direct versus indirect tests, discrete
versus integrative tests, norm-referenced versus criterion-referenced tests,
and so forth, were also described.
Finally, the effects of testing on students and society at large were described.
Cloze tests
Construct validity
Content validity
Criterion-related validity
Direct test
Face validity
Indirect test
Practicality
Reliability