
Book review

Portal, M. editor, 1986: Innovations in language testing.

Windsor: NFER-Nelson. x + 108 pp. ISBN 0-7005-1035-4.
This collection of papers represents the proceedings of the Ninth Inter-
national Interuniversitäre Sprachtestgruppe (IUS) language testing sympo-
sium held in April 1985, hosted by the National Foundation for Educational
Research (NFER) in Britain. The aim of this volume is 'to tackle directly
some of the most challenging and intransigent of the issues common to many
different areas of testing' (p. vii). There are seven contributions, of which
five are from the field of EFL/ESP, with one each from the areas of foreign
language testing (French) and English as a mother tongue.
The first paper is by Keith Morrow who writes on 'The evaluation of tests
of communicative performance'. In his first section, which is descriptive, he
provides a clear overview and details of the test specification for the Com-
municative Use of English as a Foreign Language (CUEFL), an examination
for the Royal Society of Arts. In the second part, which takes up the title of
his paper, he questions, within the context of public examinations, the long
established and accepted criteria for test validity. He adopts a somewhat
extreme position in arguing for another kind of test validity, stressing the
importance of 'washback validity' (p. 6), especially since he admits, 'I am
unsure at this stage how it could be measured' (p. 6). In any case, the
relevance of washback within the context of communicative teaching and
testing is not a new issue, as this was highlighted by Swain (1985). Swain
argued for four general principles highly relevant to large-scale communica-
tive language testing, one of which was 'work for washback' (1985: 36). Few
would question its importance, but the difference between the two positions
is that Morrow elevates the notion of washback to the key validity criterion
('the first validity criterion': p. 6), whereas for Swain (1985), with whom I
would agree, washback remains at the level of principles. As Morrow
admits, there is a problem in establishing this type of 'validity', but he does
make the interesting suggestion that language testing researchers should
become involved in what goes on in the classroom, implying a reduction in
the distance between language testing on the one hand and language teaching
on the other.
Morrow's comments on concurrent validity are less than clear. It would
appear that he would rather dispose of concurrent validity, as irrelevant to
examination boards and the practical consumer, but his argument is not per-
suasive. On the contrary, in the light of his earlier suggestion for involving
testers in the observation of the teaching and learning process, the need for
test results to be acceptable to test candidates' tutors is, I would argue, of key
concern for a variety of reasons. For example, for a test such as the CUEFL,
which is relatively new and trying to establish itself as a reputable measure of
communicative language abilities, acceptability by teachers and students is
central to its survival. Establishing its concurrent validity via teachers'
judgements about their students' performances would provide an excellent
means towards gaining this credibility and acceptability. It would also have
the benefit of creating opportunities for dialogue between teachers, test
takers and examination boards, a feature nonexistent in most formal testing
situations. In this way, the principle of 'washback' not only remains intact,
but also operational and innovative as far as testing practices are concerned.
Throughout, Morrow questions the statistical validation of tests, which is
healthy in some respects. Because the CUEFL is radically different from
tests designed in the structuralist/psychometric tradition, in terms of both
the 'view of language' and 'testing', he cannot see the place for conventional
statistical analyses within the context of a test which, he claims, is criterion-
referenced. The issue, however, is that in design the CUEFL may be
criterion-referenced but in terms of its annual administration it is not.
Candidates should be judged fairly against those who have sat an earlier
CUEFL examination. To this extent, you also have a norm-referenced situa-
tion, hence aspects of norm-referencing are relevant. As a consequence,
there would seem to be a place for both conventional 'quantitative' reports
on students and test performance and other more 'qualitative' analyses of
test performance. There is a growing awareness of the need for different pro-
cedures for analysing the goodness of fit of tests that include both types of
analyses; Morrow is indeed right in drawing attention to the relevance and
importance of this issue and in widening the scope of test validation to
include procedures which require cooperation between testers, teachers and
students. However, it is not the either/or situation that Morrow depicts.
The aim of Harrison's paper, 'Assessing text in action', is to 'lay down
some tentative rules for the assessment of learners' text by analysis of how it
is constructed' (p. 14). The questions he asks and attempts to answer are
interesting ones and will be familiar to anyone involved in the assessment of
students' oral production, i.e., how is language chunked for assessment
purposes? He critically demonstrates the difficulties encountered when
dividing up student language into meaningful units for assessment purposes,
analysing factors such as size and function of utterances, turns and other
categories of language use. His discussion is informative and very well illus-
trated with a range of good examples of different approaches for the com-
municative assessment of spoken text, covering both foreign language and
L1 settings. He sets up guidelines for the analysis of a student's available
language for communicative use and takes on board the inevitable diffi-
culties associated with the need to control student language for objective
marking purposes on the one hand, whilst creating opportunities for the
expression of personal meanings on the other.
In his paper 'A pragmatic approach to criterion-referenced foreign
language testing', Hughes presents an account of his work on the develop-
ment of a criterion-referenced proficiency test (in the areas of listening,
writing and reading) for undergraduate students at Bogazici University, an
English medium university in Istanbul, Turkey. He begins by outlining the
basis of criterion-referenced testing and, like Morrow, emphasises the
positive washback benefits of this approach to student assessment. Simi-
larly, in the same way as Portal (also in this volume) he draws attention to
face validity and the importance of taking the practical consumers of tests
into account. He presents a clear description of the stages in the development
of the university proficiency test, and gives an example of a listening task. He
takes up two aspects of testing for further consideration, by examining the
relevance of a needs analysis approach to test design and the nature of the
pass/fail criterion on a test of the type he describes. He confirms the benefits
that can be obtained from a criterion-referenced approach to testing
evidenced by, in his context, an observed and immediate change in the
language syllabus and materials for the development of students' English
language skills, to bring the teaching situation in line with the testing process.
It is a pity that the references cited in this contribution are not given in the
bibliography for the volume as a whole.
The paper by Alan Davies, 'Indirect ESP testing: old innovations', is to
some extent cautionary in tone. He usefully questions assumptions under-
lying the need to test language communicatively, specifically addressing
issues of content validity and what is meant by 'a representative sample of
the language behaviour to be measured'. He draws on findings from small-
scale experiments conducted in the areas of medical English and EAP. In the
course of his paper, he examines the psychologist's position (Anastasi, 1961),
according to which a test is 'a thorough analysis of the language behaviour it
is intended to predict, not a sample performance of that behaviour' (p. 59), and asks
whether communicative tests are tests at all or whether in fact they are merely
samples of language performance. With reference to the data he cites, he
raises issues such as the notion of directness (indirect or direct) for language
tests and the content of these measures, i.e., language skills or language
knowledge, and also identifies a weak and a strong claim in ESP testing. The
former view would be that 'all texts presented to testees must be samples of
materials in their subject/professional fields', whereas the latter would hold
that the tasks themselves 'should also be genuine samples of what profes-
sionals in these areas actually do with such texts' (p. 60). What is of value in
this paper is that it questions some of the basic premises of communicative
language testing. In our quest for innovation, he asks whether we are being
theoretically rigorous and using tests appropriately. Whilst ESP testing,
context-based and needs analysis approaches to testing need not be
abandoned, he cautions that 'we should not be simplistic and naive about it'
(p. 66).
Two papers are devoted to work associated with the Assessment of Perfor-
mance Unit (APU) language surveys. The activities of the APU are con-
cerned with building up a national picture of performance of pupils at
different levels in different languages in the skill areas of listening, reading,
speaking and writing, within different age ranges in schools in England,
Wales and Northern Ireland. The main concern of both contributions is the
development of acceptable testing tasks, reflective of the communicative
approach to language testing and teaching, which are at the same time
reliable and practicable within a large-scale testing situation.
The paper by Matthew Portal is concerned with methods of testing
development in speaking in the APU French surveys. Of interest in the
design of these test materials are the constraints imposed by the nature of
large-scale testing on the one hand, and the theoretical demands for the test
base on the other. As far as the French Survey is concerned the measures are
expected to be: (1) communicative in character; (2) suitable for virtually the
whole ability range at the age of 13; and (3) congruent with current concerns
about face validity. At the same time these measures must be reliable. Thus,
Portal's paper covers a range of issues affecting the French speaking tests, in
particular the rationale for the selection of the speaking tasks, face validity
for both teachers and students, and the steps taken in the training of asses-
sors and the reliability of marking. A detailed description of the tasks them-
selves is provided, ranging from highly structured and therefore somewhat
artificial tasks to open-ended formats involving role play, and reporting and
descriptive tasks. Portal also outlines some of the difficulties that pupils
experience, especially the less able ones, demonstrating how the nature of the
different tasks can present varying complexities that affect an individual
student's performance.
Also within the context of the APU, Tom Gorman and Greg Brooks
describe assessments of oracy, i.e., speaking and listening, of English as a
mother tongue. They define the nature of 'talk', as relevant to their purpose
of constructing tests of oracy (p. 70), and because of the inadequacy of suit-
able frameworks, have developed their own for oracy testing. This represents
a departure from traditional approaches in that emphasis is placed on the
inter-relatedness of listening and speaking and the tasks reflect this view, in
that there is an attempt to link the two in a natural way in one activity. Their
work reflects a concern for direct tests that approximate real life com-
munication. They also provide examples of two tasks from the survey, a
descriptive task described as involving 'transmissive' talk, i.e., not requiring
interaction, and a far more interesting task involving a map which illustrates
a category of 'collaborative problem solving which was felt to be one of the
most important types that pupils should encounter and be given the oppor-
tunity to master' (p. 80). This is followed by a detailed analysis of the per-
formance of two pupils at the upper end of the range on the map task,
accompanied by the transcript of their performances.
These two accounts are highly informative. The contribution by Gorman
and Brooks is also innovative in terms of the rationale underpinning the
oracy tests and the types of test used for large-scale national monitoring
purposes. Of particular interest are the purposes of talk that they have
described and the range and variety of activity types, which should prove
influential when thinking about the content of oral tests.
Charles Alderson examines the nature of innovation in language tests and
language test techniques in his paper 'Innovations in language testing?'. He
first explores the nature of innovation by asking a number of very relevant
questions which innovators or those involved in innovations, of whatever
kind, should ask themselves. He considers, for example, whether innova-
tions are necessary or desirable and how this is determined; whether the
changes that are claimed to be innovations are actually new; and whether it
matters that the innovations are new, provided that they actually work. In
this informed way, he provides a very useful checklist of questions to be
asked by those involved in considerations of change. His second section
deals with four specific aspects of and directions for innovation: consumer-
oriented testing; student-centredness; innovations in technology; and the
role of tests as innovators. His analysis of the relevance of student-centred
approaches is particularly valuable, and he considers three ways in which
students, i.e., the test takers, may inform innovations in testing, through: (1)
consultation with testees for their views on test content and method; (2) self-
evaluation procedures; and (3) introspection - all areas which in fact are
receiving increased attention in the literature, at the level of both principles
and practice (see, e.g., Faerch and Kasper, 1987). Alderson also suggests
ways in which innovations in testing can alter teaching practices (in
harmony with the views of Morrow and Hughes) and concludes that
'perhaps we should be looking for and fostering, not only innovations in
language testing, but also and importantly, innovations through language
testing' (p. 105).
Reviewing this volume four years after the date of the conference provides
an excellent opportunity to examine the present relevance of the issues
raised. I consider five of these below. Taking 'washback' first, an issue com-
mented on by several of the contributors, this characteristic appears very
much to have become an accepted function of language tests, at least as far as
the UK and the USA are concerned. Secondly, in terms of analysing tests,
the move away from the traditional scientific paradigm of experimental pro-
cedures towards participatory and collaborative (academics/testers, teachers
and students) endeavours was predicted in a number of these papers. In
current practice this is also the case, with a growing literature and research
base that makes use of a variety of investigative techniques (in language
teaching and language testing), i.e., the use of a range of procedures both
quantitative and qualitative (e.g. Grotjahn, 1987). Next, and importantly,
this volume includes discussion of the integrative nature of language skills
and implications for testing practice, demonstrating through the work of
Gorman and Brooks, and to some extent Harrison, a move away from
testing language skills as isolates. Further, the need to assess language 'com-
municatively' has forced testers to introduce different criteria for the assess-
ment of 'communicative performance' and the concerns of Harrison's paper
are still very much current issues. Finally, innovations in technology
(Alderson), such as computer-based English language testing (CBELT) are
still providing plenty of 'challenge' for some testers, but have yet to gain
wide acceptance. More generally, however, there is an awareness of the con-
cept of 'innovation' and a discernible move towards 'tests as innovators'
(Alderson, p. 103).
Whilst the treatment is superficial in some respects, a number of 'challenging
and intransigent ... issues common to many different areas of testing', including the
place and role of innovations in language testing, are examined in this
collection of papers in a clear and lucid manner and pointers for future work
are to be found. Of particular value is the innovative approach of the APU
work on the assessment of oracy. However, it is a pity that links between the
contributions have not been made, through additional comments either by
the editors or via a report of the discussion of participants at the conference.
Not only would these have brought together the ideas raised by each of the
papers, but the relevance of these points for the other papers could have been
examined in more depth. (For an example of a highly informative format for
reporting proceedings, see Alderson and Hughes, 1981.) In summary, this
slim volume (108 pages) covers a range of concerns in language testing, con-
taining issues of interest and relevance to both language testers and post-
graduate students.
Alderson, J.C. and Hughes, A., editors, 1981: Issues in language testing.
ELT Documents 111. London: The British Council.
Anastasi, A. 1961: Psychological testing, second edition. New York: Macmillan.
Faerch, C. and Kasper, G., editors, 1987: Introspection in second language
research. Clevedon: Multilingual Matters.
Feldmann, U. and Stemmer, B. 1987: Thin_ aloud a_ retrospective da_ in
C-te_ taking: diffe_ languages - diff_ learners - sa_ approaches? In
Faerch, C. and Kasper, G., editors, 1987.
Grotjahn, R. 1987: On the methodological basis of introspective measures.
In Faerch, C. and Kasper, G., editors, 1987.
Swain, M. 1985: Large-scale communicative language testing: a case study.
In Lee, Y.P., Fok, A.C., Lord, R. and Low, G., editors, New directions
in language testing. Oxford: Pergamon.
Ealing College of Higher Education Pauline Rea-Dickins