086784_LTJ_271-286.qxd 26/2/08 4:27 PM Page 273

Language Testing 2008 25 (2) 273–288

Book reviews

Brown, H. Douglas (2004). Language assessment: Principles
and classroom practices. White Plains, NY: Pearson
Education. 324 pp. $48.00 paper. ISBN 013 098834 0.
Brown, James Dean (2005). Testing in language programs:
A comprehensive guide to English language assessment.
New York: McGraw-Hill. 307 pp. $32.85 paper. ISBN
007294836 1.

Two new textbooks for introductory courses on language testing clarify the basic principles of language assessment in different ways for relatively
similar audiences. Both H. Douglas Brown's Language Assessment:
Principles and Classroom Practices and James Dean Brown's Testing
in Language Programs: A Comprehensive Guide to English Language
Assessment give excellent and concise summaries of the processes
involved in test development, but they differ in perspective. H. D. Brown's
book views testing from the standpoint of the teacher in the language
classroom, providing, in his own words, a "clear and reader-friendly"
(p. ix) overview of the foundations of second language (L2) testing
and specifically avoiding instruction on statistics related to testing,
a choice that could be viewed as allowing the course instructor more freedom to teach test statistics in the manner and with the supplements he or
she chooses. In contrast, J. D. Brown's book engages testing primarily from the viewpoint of the program administrator, giving a step-by-step overview of statistical applications and procedures for
testing that, in his own words, "provide[s] a balance between the
technical and practical aspects of language testing that is neither too
complex nor too simplistic" (p. x).

© 2008 SAGE Publications (Los Angeles, London, New Delhi and Singapore)
DOI: 10.1177/0265532207086784

The contrast between the two books highlights the fact that
introductory, graduate-level language testing classes often include two
different types of students: those from MA or undergraduate TESOL
programs who plan to teach and/or be program administrators, and
those from PhD programs in applied linguistics who wish to construct
L2 tests for research purposes and/or conduct research in the area of
language testing. It is often difficult to find an introductory text
that bridges both sets of interests. We believe an instructor of an integrated language testing class may find adopting aspects of both books
to be beneficial. Some of the more general and basic discussions of
language testing in the H. D. Brown book are very pertinent
for students new to testing. For example, many students do ask questions such as "What is the difference between standardized testing and
standards?" H. D. Brown's book discusses these and other important
general topics clearly. Many of these same students may ask, "Why do
we need to know how to calculate the Cronbach alpha coefficient? Do
classroom language teachers ever really use that?" The answer right
now may be no, classroom teachers do not really use it or other test
statistics when evaluating classroom tests, but perhaps they should.
Thus J. D. Brown's book stands out by clearly teaching basic test
statistics that we believe every educated language professional in the
field should know. Below, we will first review each book, focusing on
the structures and chapters of each volume, along with their unique
contributions. Then, we will discuss some of the books' differing contents and how the two books may complement each other.
The aim of H. D. Brown's Language Assessment: Principles and
Classroom Practices is to provide teachers with a practical overview
of the fundamental principles for evaluating and designing various
assessments. The intended audience, as mentioned above, is pre- or in-service teachers of ESL, either at the undergraduate or graduate level.
The text is also appropriate for teachers of second or foreign languages
other than English. The most important contribution this book offers
this audience is the directness and clarity of the definitions, which provide students with a solid foundation in basic terminology. The
book has eleven chapters that can be grouped into three topic areas:
defining the field (the first five chapters), how to assess the language
skills of listening, speaking, reading, and writing (the next four chapters), and alternative and classic systems for grading (the last two
chapters). Students will enjoy the chapters with their short and easily
digestible sections, clear examples, and interesting situations to imagine. Each chapter goes into depth in defining terms related to the chapter's topic and has detailed and creative exercises at the end that aptly
highlight the readings. In addition, each chapter concludes with a helpful and carefully selected annotated bibliography of two to three further readings, usually seminal textbooks on the
chapter's topic. This is helpful not only to the student readers, but also
to instructors who may be looking for more background.

A point of interest in chapter 1 is a section entitled "New Views on
Intelligence," which briefly reviews work by psychologists Howard
Gardner, Robert Sternberg, and Daniel Goleman. This review leads
into a brief discussion of test developers' responsibility to create
tests that tap into the interpersonal, creative, communicative, and
interactive subsets of L2 competence, and to that end, H. D. Brown
reviews alternative types of assessment, such as untimed, free-response format tests. The chapter ends by providing a brief overview of computer-based tests.
The second chapter reviews five principles of testing: practicality,
reliability, validity, authenticity, and washback. Included also are terms
subsumed under them such as rater reliability and content validity. At
the end of the chapter, 12 testing scenarios are presented in a chart, and
readers are asked to indicate, on a scale of 1 to 5, whether each scenario is fulfilling each of the five testing principles. (For example, on
p. 39, scenario 3 asks the reader to imagine the following for a student
(S): "One-on-one oral interview to assess overall oral production ability. S receives one holistic score ranging between 0 and 5.") This is a
nice review of the five principles; however, the chapter lacks discussion of how reliability can actually be estimated, and reliability is not
discussed anywhere else in the book. It is questionable whether readers can fully appreciate and understand the meaning of test consistency
if they are not taught to envision how they can estimate reliability
through basic statistics. Here one must go to another source, such as
J. D. Brown's book, to learn how to calculate reliability coefficients
(test–retest, equivalent forms, internal consistency, or interrater reliability estimates), which any professional in the field of language testing should know. What H. D. Brown does do very well in this chapter
is familiarize novice teachers with strategies they can use in the classroom to make tests more user-friendly and less anxiety-filled.
It should be noted, however, that H. D. Brown and J. D. Brown
approach the notion of washback in different ways. Unlike most language testing professionals, H. D. Brown uses the term to refer primarily to various forms of feedback from tests in a classroom setting. In
contrast, J. D. Brown focuses on the kind (i.e. positive or negative)
and the degree (i.e. strong to weak) of washback and discusses factors
that may lead to differences in washback, such as test design, test content, and other logistics. Readers of H. D. Brown's book may also want to
read J. D. Brown's more expanded view of washback to get a fuller
picture of the term and its history in the field of language testing.
Chapter 3 reviews different types of tests (aptitude, proficiency,
placement, diagnostic, and achievement) and provides clear and
practical steps for test construction, including how to design test
specifications and how to create items based on them, with most
emphasis given to the multiple-choice format. An important feature
of this chapter is a short discussion of how to score, grade,
and give feedback.
Chapter 4 expertly reviews standardized testing and explains the
objectives and formats of some of the larger standardized tests in
ESL/EFL education. The goal of this chapter is to guide classroom
teachers to understand the development and use of such tests and to
prepare them to work on teams that may revise, expand, or create
standardized tests for programs in which they teach. This chapter
also provides an excellent appendix with examples of some of the
most influential ESL/EFL tests, including the Test of English as
a Foreign Language (TOEFL), the Michigan English Language
Assessment Battery (MELAB), the International English Language
Testing System (IELTS) and the Test of English for International
Communication (TOEIC). One caveat is that the book provides little
guidance for choosing a commercial, off-the-shelf standardized test
for use in a language program. Readers who need to know how to
evaluate the usefulness of commercial tests and to use test manuals
to learn more about them could turn to Stoynoff and Chapelle's (2005)
volume ESOL Tests and Testing for this information.
Chapter 5 efficiently reviews standards-based assessment. In addition, the chapter covers ethical issues involved in critical language
testing, but a teacher using this book might want to supplement the
topic of test ethics with materials from two other books in addition to
those in the annotated bibliography: Shohamy's (2001) book, The
Power of Tests, and Snow's (2000) edited volume Implementing the
ESL Standards for Pre-K-12 Students Through Teacher Education.
Both of these texts address in greater depth some of the issues
touched upon by H. D. Brown, such as the long-range impacts of
standardized tests. The book entitled Standards for Foreign Language
Learning in the 21st Century, published by ACTFL (2006), can also
help instructors supplement this chapter. This chapter uses Web page
references throughout; for example, on p. 106 the reader is directed to
a Web page to view the standards developed by the California
Department of Education; however, the standards are no longer published at the location given. Readers should be advised that they may
need to hunt for the referenced Web texts.
Chapters 6, 7, 8, and 9 provide guidelines and illustrated examples
for the design of items for listening, speaking, reading, and writing
tests. Each chapter describes the skill generally and at macro- and
micro-levels. Following the skill descriptions, the chapters turn to the
craft of designing items. These chapters can be seen as the heart of the
book, and instructors will want to spend considerable time reviewing
the types of items presented and guiding students through item writing practice sessions. Readers will find the example items quite helpful, especially the multiple picture-description and picture-cue tasks,
which can be used as guides in finding or designing other picture-based prompts for oral or written test tasks at multiple levels, and
which are very beneficial as testing tasks for more visual learners.
Chapter 10 is an example of H. D. Brown's commitment to treating standardized testing issues and alternatives in assessment equally
and objectively. Here he explains the history of alternative assessment, including a discussion of how standardized tests
have shortcomings and how other options (portfolios, journals, observations, self-assessments, peer-assessments) hold more ethical
potential. The chapter goes on to explain the pros and cons of various types of alternatives in assessment with helpful comparisons
between large-scale standardized tests and portfolios, journals, and
conferences. Another useful topic in this chapter is a discussion of
performance-based assessment. H. D. Brown continues with benefits
and guidelines for using portfolios, journals, conferences/interviews,
observations, and self-/peer-assessments. There is also a nice comparison table (Table 10.1, p. 278) of all the above assessment forms
at the end of the chapter that rates each form of assessment according to the principles of practicality, reliability, face validity, content
validity, washback, and authenticity.
Chapter 11 is enlightening for its frank discussion of the impact
grades have on students' lives and for its attention to
the topic of assigning grades, which is often a central concern
for new teachers. There are sections on guidelines for selecting grading criteria, absolute versus relative grading,
teachers' perceptions of appropriate grade distributions, institutional
expectations and constraints, the question of difficulty from a cross-cultural perspective, and the eternal question: what do letter grades
mean? This chapter is a useful reference for new teachers as they
design their first syllabus and form their grading philosophies.
Another excellent feature of chapter 11 is its reference to the
issues covered by the National Center for Fair and Open Testing, also
known as Fairtest, a non-profit watchdog of large-scale, standardized
tests in the USA. Knowledge of the goals of this organization and
knowing how to interpret and use the information stemming from it
will be of particular benefit to US-based MA TESOL students long
after they graduate. On the other hand, since Fairtest is a US-based
organization, an international audience may find this and other references US-centric rather than international in scope. For example,
the Test of English as a Foreign Language (TOEFL) is described and
discussed in detail, but the International English Language Testing
System (IELTS) is merely mentioned in passing.
J. D. Brown's book Testing in Language Programs, which is a
revised edition of his 1996 volume, views testing from two perspectives: for making program-level decisions (norm-referenced testing),
and for making classroom-level decisions (criterion-referenced testing). The main goal of the book is to equip teachers and administrators with the tools they need to critically evaluate L2 tests'
reliability, validity, and dependability from several theoretical and
statistical vantage points. Part of this goal is met by instructing readers to conduct robust statistical procedures using Microsoft Excel.
The book spends less time on discussing test construction and test
items and more time on how to evaluate tests and test items that
already exist. The book also focuses on how to use test data to investigate the soundness of a test. There are 11 chapters in this book, and
they can be divided into four areas, as described below.
The first two chapters provide an overview of testing from practical and theoretical perspectives. Furthermore, J. D. Brown delves
into the differences between discrete-point and integrative testing,
prompting the reader to consider which type of testing he or she
prefers. In contrast, H. D. Brown presents these two types of tests
from a theoretical standpoint, emphasizing the arguments for and
against discrete-point and integrative testing in light of the unitary
trait hypothesis, which argues that discrete points of language cannot
be separated from each other during language production.
The next two chapters review item development issues from qualitative and quantitative perspectives. Chapter 3 answers the question
of what makes a good item (from a qualitative perspective) and
chapter 4 answers the question of how to analyze items (from a
quantitative perspective). Chapter 3 is highly effective in that it presents principles of sound and valid item construction through a series
of questions; for example, J. D. Brown asks, when item writing, "have
race, gender, and nationality bias been avoided?" (p. 46). He then
explains what this means and how we can avoid such bias in items.
Scoring procedures are also discussed, including how to design and
use rating scales. Chapter 4 begins what is an in-depth yet user-friendly
approach to teaching and learning test statistics. J. D. Brown should
be commended for his attention to detail in this section and those that
follow. A person unversed in mathematics beyond a high-school
level, yet armed with a spreadsheet program such as Excel, can work
through J. D. Brown's book and teach him- or herself the basics of
test statistics, such as (in chapter 4) calculating item facility (IF) and
item discrimination (ID), calculating an item's difference index
(DI, with pre- and posttesting), or calculating an item's B-index.
J. D. Brown takes pains to provide screen shots of an Excel spreadsheet
showing how data can be organized for these statistical analyses, and
describes in detail each and every step, from how to open Excel and
how to enter data, to how to find and use formulas in Excel to automatically calculate IF and ID, so that readers cannot get lost and, at
the end of the chapter, can calculate the statistics explained and
understand what they mean.
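As a rough illustration of the item statistics named above, the following Python sketch computes item facility, item discrimination, and the difference index from scored responses (1 = correct, 0 = incorrect). It is not taken from J. D. Brown's book, which works in Excel; the function names, the sample data, and the use of upper and lower thirds as comparison groups are assumptions made for illustration.

```python
from typing import Sequence


def item_facility(responses: Sequence[int]) -> float:
    """Proportion of test takers who answered the item correctly."""
    return sum(responses) / len(responses)


def item_discrimination(item: Sequence[int], totals: Sequence[float]) -> float:
    """IF of the top-scoring third minus IF of the bottom-scoring third.

    Grouping by thirds of the total score is an assumption of this sketch.
    """
    n = max(1, len(totals) // 3)  # size of each comparison group
    order = sorted(range(len(totals)), key=lambda i: totals[i])  # low to high
    lower = [item[i] for i in order[:n]]
    upper = [item[i] for i in order[-n:]]
    return item_facility(upper) - item_facility(lower)


def difference_index(pretest: Sequence[int], posttest: Sequence[int]) -> float:
    """Posttest IF minus pretest IF for the same item."""
    return item_facility(posttest) - item_facility(pretest)


# Hypothetical data: one item answered by nine students, with their total scores.
item = [1, 1, 1, 0, 1, 0, 0, 1, 0]
totals = [9, 8, 7, 6, 5, 4, 3, 2, 1]

facility = item_facility(item)
discrimination = item_discrimination(item, totals)
gain = difference_index(pretest=[0, 0, 1, 0], posttest=[1, 1, 1, 0])
print(f"IF = {facility:.2f}, ID = {discrimination:.2f}, DI = {gain:.2f}")
```

In this made-up example the strongest students all answer the item correctly while most of the weakest do not, so the discrimination value is clearly positive.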
Chapters 5 and 6 discuss test results and explain descriptive statistics associated with the interpretation of test results. In addition,
J. D. Brown refers to large-scale tests, such as the SAT, GRE and
TOEFL to help explain standardized scores, thus giving theoretical
points a very concrete reference point.
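To make the idea of standardized scores concrete, here is a minimal Python sketch that converts raw scores to z scores, T scores (mean 50, standard deviation 10), and CEEB-style scores (mean 500, standard deviation 100). The raw scores are invented, and the choice of the population standard deviation is an assumption of the sketch, not a claim about how either book computes these values.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def standardized_scores(raw: Sequence[float]) -> List[Dict[str, float]]:
    """Return z, T, and CEEB-style scores for each raw score."""
    m = mean(raw)
    sd = pstdev(raw)  # population SD; an assumption of this sketch
    rows = []
    for x in raw:
        z = (x - m) / sd  # distance from the mean in SD units
        rows.append({"raw": x, "z": z, "T": 10 * z + 50, "CEEB": 100 * z + 500})
    return rows


# Hypothetical raw scores for three test takers.
scores = standardized_scores([40, 50, 60])
for row in scores:
    print(row)
```

A test taker at the group mean has z = 0 and therefore lands exactly on the midpoint of each reporting scale.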
Chapters 7, 8, 9, and 10 constitute the book's strongest
point: a review of statistical procedures for demonstrating a test's
(a) correlation with other variables, (b) reliability, (c) dependability,
and (d) validity, respectively. These four chapters are the core of the book in that each
chapter explains not only important test theories and their constructs,
but also ways to quantitatively measure these important constructs.
J. D. Brown spends considerable time addressing the different test statistics not just from a mathematical perspective (all formulas are
presented in nicely contained boxes), but also from theoretical and
practical perspectives. Again, as in chapter 4, the statistics themselves
are taught to the reader through clear examples, screen shots of actual
data sheets, and step-by-step instructions on how to do the calculations
using formulas in Excel. J. D. Brown also teaches readers how to use
Excel's Chart Wizard, and shows which charts best present data from
which types of statistical analyses. For example, in chapter 7, J. D.
Brown (a) explains what a Pearson product–moment correlation coefficient theoretically can do, (b) explains what to do with missing data,
(c) provides the formula for calculating the Pearson product–moment
correlation coefficient with a key to all mathematical symbols used in
the formula, and (d) provides a screen shot of what data should look
like in Excel prior to calculation. Assumptions of the coefficient are
discussed, diverse scatterplots showing positive and negative correlation coefficients are explained, and finally, following J. D. Brown's
detailed directions, readers use sample data in Excel to calculate the
Pearson correlation coefficient and create a scatterplot.
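For readers who prefer a script to a spreadsheet, the calculation described above can be sketched in Python as follows. This is an illustration only: the function names and data are hypothetical, and dropping any pair with a missing score (listwise deletion) is just one common way of handling missing data, not necessarily the treatment the book recommends.

```python
from math import sqrt
from typing import List, Optional, Sequence, Tuple


def drop_missing(
    x: Sequence[Optional[float]], y: Sequence[Optional[float]]
) -> Tuple[List[float], List[float]]:
    """Keep only the pairs in which both scores are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation from deviation scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical scores on two tests; None marks a missing score.
test_a = [10, 12, None, 15, 18]
test_b = [20, 24, 30, None, 36]

clean_a, clean_b = drop_missing(test_a, test_b)
r = pearson_r(clean_a, clean_b)
print(f"r = {r:.3f} based on {len(clean_a)} complete pairs")
```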

The concluding chapter, chapter 11, deals with the placement of
tests within the curriculum and discusses testing as an integrated
system. This chapter will be of particular interest to program administrators who may mandate the administration of tests at certain points
in the curriculum. What is especially engaging about this chapter is
that it is written in the first person: J. D. Brown discusses his personal
experiences as language testing director at the English Language
Institute at the University of Hawaii at Manoa and describes his experiences with placement, diagnostic, and achievement tests. Two
aspects of this book are worth special mention. First of all, J. D. Brown
provides seven very practical checklists pertaining to test construction
and evaluation that teachers and administrators can easily use. For
example, the detailed test evaluation checklist on p. 32 can be used to
evaluate a test before deciding whether to adopt or adapt it. On p. 172
there is a checklist for potential sources of error variance, which can
be used to identify negative conditions associated with a test's administration. The second highlight of Testing in Language Programs is the
detail it provides in guiding students to use a spreadsheet program to
evaluate tests. The end of chapter 2 begins the book-long lesson on
using a spreadsheet program such as Excel to do everything from
entering test grades and performing item analyses to doing correlations and calculating reliability coefficients and the standard error of
measurement. This is very helpful for a typical MA TESOL class on
testing in which students may exhibit a wide range of knowledge
about spreadsheet programs. In addition, MA and PhD students in
applied linguistics will find it an appropriate starting point for designing valid and reliable tests for SLA research purposes. J. D. Brown
should be commended for making test statistics a very approachable
topic and for empowering readers, armed with no more than a spreadsheet program, with the ability to perform sophisticated test statistics
on their own data, without the purchase of costly statistical packages.
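The same point, that these statistics need no costly software, can be illustrated outside a spreadsheet as well. The Python sketch below estimates internal-consistency reliability with Cronbach's alpha and then derives the standard error of measurement (SEM) as the standard deviation of total scores times the square root of one minus the reliability. The score matrix and function names are hypothetical, and the use of population variances is an assumption of the sketch.

```python
from math import sqrt
from statistics import pstdev, pvariance
from typing import List, Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Alpha for a matrix with one row per test taker and one column per item."""
    k = len(scores[0])  # number of items
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def standard_error_of_measurement(totals: Sequence[float], reliability: float) -> float:
    """SEM = SD of total scores * sqrt(1 - reliability)."""
    return pstdev(totals) * sqrt(1 - reliability)


# Hypothetical right/wrong scores: four test takers, three items.
matrix: List[List[int]] = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

alpha = cronbach_alpha(matrix)
sem = standard_error_of_measurement([sum(row) for row in matrix], alpha)
print(f"alpha = {alpha:.2f}, SEM = {sem:.2f} raw-score points")
```

With so few test takers and items the numbers mean little in themselves; the sketch is meant only to show how few steps the calculation involves.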
In conclusion, both texts are well written and well conceptualized.
We believe the texts would pair nicely for an introductory course on
language testing in which there are both beginning-level MA TESOL
and advanced PhD students in applied linguistics because together the
books address an extremely wide range of knowledge in language
testing. MA TESOL and PhD students need to know how to design
language tests in the four skills (a strong point of H. D. Brown's
book) and how to run statistics on their test data (a strong point of
J. D. Brown's book). In a multi-level class, using H. D. Brown's book
Language Assessment alone might prove too simple. MA students and
PhD candidates would lack direction in statistical analysis, and
may crave more information about the controversial issues in language
testing that are addressed in J. D. Brown's book. H. D. Brown's book
oversimplifies some definitions, and a few weeks into a semester-long course on language testing, a student may want more information on some of testing's more controversial issues. On the other
hand, using J. D. Brown's book Testing in Language Programs alone
in a mixed class with some novice teachers may present
its own problems. Some of the controversial issues in language testing
presented in the book, such as theoretical, practical, and fairness
issues, may be a bit advanced for beginners in the field who do not
have a solid foundation in linguistics or language testing. Then again,
because language testing is a crucially important social science
involving practical activities that can have serious ethical and social
consequences (Davies, 1997), perhaps it is not wrong to push all students to consider the more advanced topics in language testing and to
be trained to use the statistics needed for an advanced analysis of language tests. Bachman (2000) called for more guided practice in
language testing for all those involved in the field, and we feel
J. D. Brown's book answers that call in that it does not assume that
statistics are too laborious or superfluous for language teachers. He
also does not assume that all language teachers have access to statistical packages, and does an excellent job of empowering those teachers
to use Excel to calculate the statistics needed for analyzing tests.
These quibbles aside, both books possess a number of strengths:
both volumes are reader-friendly in that they place key words in bold
print and provide useful tables to summarize information. The end-of-chapter review questions are excellent teaching tools, and both
books cover the essentials of language testing well. We recommend
both books to L2 testing instructors and those interested in learning
more about the basics of L2 testing. We further recommend
J. D. Brown's book for teachers of graduate-level L2 testing classes
and for those wishing to learn the fundamentals of test statistics and
how to calculate them in Excel.

References
ACTFL. (2006). Standards for foreign language learning in the 21st century
(3rd ed.). Lawrence, KS: Allen Press.
Bachman, L. F. (2000). Modern language testing at the turn of the century:
Assuring that what we count counts. Language Testing, 17(1), 1–42.
Davies, A. (1997). Demands of being professional in language testing.
Language Testing, 14, 328–339.
Shohamy, E. (2001). The power of tests: A critical perspective on the uses of
language tests. Harlow, Essex: Pearson Education.
Snow, M. A., editor (2000). Implementing the ESL standards for pre-K-12
students through teacher education. Alexandria, VA: TESOL.
Stoynoff, S., & Chapelle, C. A. (2005). ESOL tests and testing. Alexandria, VA:
TESOL.

Jennifer Lund and Paula M. Winke
Michigan State University

Luoma, S. (2004). Assessing speaking. Cambridge: Cambridge
University Press. 212 pp. ISBN: 0521804876 (paperback)

According to Fulcher (2003), tests of second language speaking "have
always been thought of as important, but [that] they were too unreliable and impractical to use in the kind of large-scale testing that
emerged in the 1920s with a rapidly expanding educational system"
(p. 16). Fulcher's book traces the rich history of speaking assessment,
which illuminates an interesting fact: although there are a number of
books that report on research into speaking tests, until recently, very
few texts have targeted practitioners who need to learn about speaking assessment. Fulcher's is, of course, one; Sari Luoma's Assessing
Speaking is another.
Assessing Speaking is the seventh book in the 10-book Cambridge
Language Assessment Series, edited by J. C. Alderson and L. F. Bachman.
In these texts, as the editors state, "applied linguistics theory and
research are drawn upon in order to enhance our understanding of the
nature of what is to be tested and assessed" (p. x). Assessing Speaking,
though grounded in research and theory, is "highly practical and is
aimed at those who need to develop assessments of speaking ability"
(p. x). In this review, then, I will evaluate the book's strengths and
weaknesses in terms of its theoretical grounding, research base, and
practical advice.
Luoma's introductory chapter begins with four scenarios that
entail the assessment of speaking: the oral section of a large foreign
language assessment, a tape-mediated speaking test in a language
