
Editorial

Advances in translating and adapting educational and psychological tests

Ronald K. Hambleton, University of Massachusetts, Amherst, USA
John H.A.L. de Jong, Language Testing Services, The Netherlands

Address for correspondence: Ronald K. Hambleton, Department of Educational Policy, Research, and Administration, 152 Hills South, University of Massachusetts, Amherst, MA 01003–4140, USA; email: rkh@educ.umass.edu

Language Testing 2003 20 (2) 127–134 10.1191/0265532203lt247xx © 2003 Arnold

I Introduction
Today, there is substantial evidence of the great need for multi-language educational and psychological tests (see, for example, Hambleton, 2002; van de Vijver and Hambleton, in press). More than 40 countries participated recently in the Third and Fourth International Mathematics and Science Studies (TIMSS and TIMSS-R), for which achievement tests and questionnaires were prepared in over 30 languages. In the OECD/PISA project to assess the yield of national education systems, 15-year-old students were tested in reading, mathematics, and science literacy in 32 languages and cultures in the first cycle (2000), and well over 40 countries are participating in the second cycle (2003). A number of popular personality and intelligence tests, such as Spielberger's State-Trait Anxiety Inventory and the Wechsler Intelligence Scale for Children, are now available in more than 50 languages each, and many other achievement tests, credentialing examinations, personality measures, and quality of life measures have been adapted into 10 or more languages.
The list of translated and adapted tests is long and growing rapidly.
Substantially more test translations and adaptations can be expected
in the future because:
· international exchanges of tests have become more common;
· credentialing exams produced by companies such as Novell and Microsoft are being made available in many countries; and
· interest in cross-cultural psychology and international comparative studies of achievement has grown.
The purposes of this special issue are:
· to focus attention on some of the emerging methodological developments for conducting studies to increase the quality of test translations and test adaptations; and
· to highlight several practical studies of test translation and adaptation.
It should be recognized at the outset that translated and adapted tests and instruments are nearly always going to be different from the source language versions. The goal then should be to minimize the differences to acceptable semantic, psychometric, linguistic, and psychological levels. The articles in this special issue are all about finding those acceptable levels of differences to insure valid uses of the tests and instruments. Contributors to the special issue are from seven countries: Australia, Belgium, Canada, Israel, Korea, The Netherlands, and the USA.
In the remainder of this introduction, two topics are addressed. First, guidelines developed by the International Test Commission (ITC) for translating and adapting tests are introduced. In the ITC's work, test translation is viewed as part of the test adaptation process, but it is only one of several steps that must be carefully carried out to produce a test or instrument that is equally valid in two or more languages and cultures (for more on steps, see Hambleton and Patsula, 1999). Second, remarks about the articles in this special issue are provided in order to set a context for readers.

II International Test Commission guidelines for translating and adapting tests
In 1992 the International Test Commission (ITC) began a project to
prepare guidelines for translating and adapting tests and psychological
instruments, and establishing score equivalence across language
and/or cultural groups. Several organizations assisted the ITC in
preparing the guidelines:
· the European Association of Psychological Assessment;
· the European Test Publishers Group;
· the International Association for Cross-Cultural Psychology;
· the International Association of Applied Psychology;
· the International Association for the Evaluation of Educational
Achievement;
· the International Language Testing Association; and
· the International Union of Psychological Science.


A committee of 12 representatives from these organizations worked for several years to prepare 22 guidelines, and the guidelines were then field-tested (see, for example, Hambleton et al., 1999; Tanzer and Sim, 1999; Hambleton, 2001; Hambleton et al., in press). Subsequently, the guidelines were approved by the ITC for distribution to national psychological societies, test publishers, and researchers. The guidelines, organized into four categories, appear below:

1 Context
C.1 Effects of cultural differences which are not relevant or important to
the main purposes of the study should be minimized to the extent
possible.
C.2 The amount of overlap in the construct measured by the test or instru-
ment in the populations of interest should be assessed.

2 Test development and adaptation
D.1 Test-developers/publishers should insure that the adaptation process
takes full account of linguistic and cultural differences among the
populations for whom adapted versions of the test or instrument
are intended.
D.2 Test-developers/publishers should provide evidence that the language use in the directions, rubrics, and items themselves, as well as in the handbook, is appropriate for all cultural and language populations for whom the test or instrument is intended.
D.3 Test-developers/publishers should provide evidence that the choice
of testing techniques, item formats, test conventions, and procedures
are familiar to all intended populations.
D.4 Test-developers/publishers should provide evidence that item content
and stimulus materials are familiar to all intended populations.
D.5 Test-developers/publishers should implement systematic judgmental
evidence – both linguistic and psychological – to improve the accu-
racy of the adaptation process and compile evidence on the equival-
ence of all language versions.
D.6 Test-developers/publishers should ensure that the data collection
design permits the use of appropriate statistical techniques to establish
item equivalence between the different language versions of the test
or instrument.
D.7 Test-developers/publishers should apply appropriate statistical techniques to (1) establish the equivalence of the different versions of the test or instrument, and (2) identify problematic components or aspects of the test or instrument which may be inadequate for one or more of the intended populations.
D.8 Test-developers/publishers should provide information on the
evaluation of validity in all target populations for whom the adapted
versions are intended.
D.9 Test-developers/publishers should provide statistical evidence of the
equivalence of questions for all intended populations.
D.10 Non-equivalent questions between versions intended for different
populations should not be used in preparing a common scale or in
comparing these populations. However, they may be useful in
enhancing content validity of scores reported for each population sep-
arately.

3 Administration
A.1 Test-developers and administrators should try to anticipate the types
of problems that can be expected, and to take appropriate actions to
remedy these problems through the preparation of appropriate
materials and instructions.
A.2 Test administrators should be sensitive to a number of factors related
to the stimulus materials, administration procedures, and response
modes that can moderate the validity of the inferences drawn from
the scores.
A.3 Those aspects of the environment that influence the administration of a test or instrument should be made as similar as possible across populations of interest.
A.4 Test administration instructions should be in the source and target languages to minimize the influence of unwanted sources of variation across populations.
A.5 The test manual should specify all aspects of the administration that
require scrutiny in a new cultural context.
A.6 The administrator should be unobtrusive and the administrator–exam-
inee interaction should be minimized. Explicit rules that are described
in the manual for administration should be followed.

4 Documentation/score interpretations
I.1 When a test or instrument is adapted for use in another population, documentation of the changes should be provided, along with evidence of the equivalence.
I.2 Score differences among samples of populations administered the test
or instrument should not be taken at face value. The researcher has
the responsibility to substantiate the differences with other empiri-
cal evidence.
I.3 Comparisons across populations can only be made at the level of
invariance that has been established for the scale on which scores
are reported.
I.4 The test-developer should provide specific information on the ways in which the socio-cultural and ecological contexts of the populations might affect performance, and should suggest procedures to account for these effects in the interpretation of results.
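Guideline I.3 turns on the notion of levels of invariance. As a gloss (ours, not part of the ITC text), the factor-analytic literature commonly distinguishes a hierarchy for a test modelled in each language group $g$ as

$$x^{(g)} = \tau^{(g)} + \Lambda^{(g)}\xi + \delta^{(g)},$$

where $\tau^{(g)}$ collects the item intercepts and $\Lambda^{(g)}$ the factor loadings:

$$\begin{aligned}
\text{configural:}\quad & \Lambda^{(1)}, \Lambda^{(2)} \text{ share the same pattern of zero and nonzero loadings};\\
\text{metric:}\quad & \Lambda^{(1)} = \Lambda^{(2)};\\
\text{scalar:}\quad & \Lambda^{(1)} = \Lambda^{(2)} \text{ and } \tau^{(1)} = \tau^{(2)}.
\end{aligned}$$

Metric invariance licenses comparisons of relationships (covariances, regression slopes) across groups; only scalar invariance licenses comparisons of latent means, and hence of reported score levels.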

The guidelines and suggestions for implementing them can be found in van de Vijver and Hambleton (1996), Muniz and Hambleton (1997), van de Vijver and Tanzer (1997), and Hambleton et al. (in press). These guidelines have become a frame of reference for many psychologists working in the test translation and adaptation area, and more general adoption of the guidelines can be expected in the coming years as the guidelines are more widely disseminated and the standards for translating and adapting tests are raised.
Particularly noteworthy about the guidelines is their emphasis on the importance of compiling both judgmental and empirical evidence to support the validity of a test or instrument translation and adaptation. No amount of care in the selection of translators, or in the design for using them, can substitute for first-hand empirical evidence from persons for whom the translated and adapted test or instrument is intended. The seven articles that follow highlight the role of both judgmental and empirical procedures in compiling validity evidence to support the use of a test or instrument in a second language and culture.
From a practical point of view, two major contexts can be distinguished for applying the ITC guidelines:

· the translation/adaptation of existing tests and instruments; and
· the development of new tests and instruments for international use.

The first context refers to the situation where tests and instruments that have originally been developed in a particular language for use in some national context are to be made appropriate for use in one or more other languages and/or national contexts. Often in such cases the aim of the translation/adaptation process is to produce a test or instrument with psychometric qualities comparable to those of the original. Even for nonverbal tests, adaptations are necessary not only of the accompanying verbal materials for administration and score interpretation but also of graphic materials in the test proper, to avoid cultural bias (see van de Vijver and Hambleton, in press). Growing recognition of multiculturalism has raised awareness of the need to provide multiple language versions of tests and instruments intended for use within a single national context.
The second context refers to the development of tests and instruments that from their conception are intended for international comparisons. The advantage here is that versions for use in different languages and/or different national contexts can be developed in parallel; that is, there is no need to maintain a pre-existing set of psychometric qualities. The problem here often lies in the sheer size of the operation: the large number of versions that need to be developed and the many people involved in the development process.

III Overview

Two articles that advance test adaptation methodology appear first in the special issue. Bruno Zumbo from the University of British Columbia in Canada demonstrates the inadequacy of judging the suitability of a test adaptation by looking only at the factor structure of the test in two or more language versions. It has been common to assume that if the factor structure of a test remains the same in a second language version, then the test adaptation was successful. Zumbo provides convincing evidence that item-level bias can still be present even when structural equation modeling of the test in two languages reveals an equivalent factorial structure. Since it is the scores from a test or instrument that are ultimately used to achieve the intended purpose, those scores may be contaminated by item-level bias, and valid inferences from them become problematic.
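Zumbo's point can be made concrete with a small simulation (our illustration, not his analysis; the loadings, sample size, and bias value are hypothetical). When two language groups share identical loadings and uniquenesses, their covariance matrices, and therefore any factor-structure comparison built on them, look equivalent, even though a shifted intercept on one translated item biases that item's scores:

```python
# Hypothetical one-factor simulation: identical factor structure in two
# language groups, but uniform bias on one translated item.
import numpy as np

rng = np.random.default_rng(42)
n = 5000
loadings = np.array([0.8, 0.7, 0.6, 0.75, 0.65])   # identical in both groups
bias = np.array([0.0, 0.0, 0.5, 0.0, 0.0])         # uniform bias on item 3 only

def simulate(intercept_shift):
    """x = shift + loading * theta + noise, with unit item variances."""
    theta = rng.normal(0.0, 1.0, n)                 # equal latent means
    noise = rng.normal(0.0, np.sqrt(1.0 - loadings**2), (n, loadings.size))
    return intercept_shift + np.outer(theta, loadings) + noise

x_source = simulate(np.zeros_like(bias))            # source-language version
x_target = simulate(bias)                           # adapted version

# The two covariance matrices are essentially identical, so a purely
# structural comparison would call the adaptation successful ...
print(np.round(np.cov(x_source.T) - np.cov(x_target.T), 2))

# ... yet item 3's mean is shifted by about 0.5 despite equal latent
# means: item-level bias invisible to the factorial comparison.
print(np.round(x_target.mean(axis=0) - x_source.mean(axis=0), 2))
```

Running the sketch prints near-zero covariance differences alongside a mean difference of roughly 0.5 on the biased item, exactly the pattern a structure-only check would miss.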
Stephen Sireci from the University of Massachusetts in the USA and Avi Allalouf from the National Institute for Testing and Evaluation in Israel continue the theme introduced by Zumbo. They describe several straightforward methods to detect item-level flaws that can arise in the test adaptation process. They illustrate the methods by describing their recent research to detect flaws in a college admissions language test adapted from Hebrew into Russian. Especially important in Sireci and Allalouf's work are the steps they take to explain the flaws detected using item bias review procedures. If the causes can be identified, it may be possible to reduce the problems in the future, either by informing item writers about what content, item formats, and test conditions to avoid, or by looking specifically for similar problems in other tests at the test adaptation stage.
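As a concrete instance of the kind of straightforward screening method discussed here (a minimal sketch under our own assumptions; the function, data layout, and flagging threshold are illustrative and not taken from Sireci and Allalouf's article), the Mantel-Haenszel procedure compares the odds of a correct answer in the two language groups among examinees matched on the rest of the test:

```python
import numpy as np

def mantel_haenszel_dif(responses, group, item):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    responses: (n_persons, n_items) array of 0/1 item scores
    group:     length n_persons array; 0 = source-language group,
               1 = target-language (focal) group
    item:      column index of the studied item
    Returns (alpha_mh, delta_mh), delta on the ETS delta metric.
    """
    # Match examinees on the rest score (total excluding the studied item).
    rest = responses.sum(axis=1) - responses[:, item]
    num = den = 0.0
    for k in np.unique(rest):
        stratum = rest == k
        ref = stratum & (group == 0)
        foc = stratum & (group == 1)
        a = responses[ref, item].sum()      # source group correct
        b = ref.sum() - a                   # source group incorrect
        c = responses[foc, item].sum()      # focal group correct
        d = foc.sum() - c                   # focal group incorrect
        n_k = ref.sum() + foc.sum()
        if n_k > 0:
            num += a * d / n_k
            den += b * c / n_k
    alpha_mh = num / den                    # > 1: item favours source group
    delta_mh = -2.35 * np.log(alpha_mh)     # ETS delta metric
    return alpha_mh, delta_mh
```

Items with large absolute delta values (a common rule of thumb flags |delta| of 1.5 or more) would then be routed to bilingual reviewers, in the spirit of the judgmental follow-up that Sireci and Allalouf emphasize.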
Jan Lokan and Marianne Fleming from the Australian Council for Educational Research provide an interesting example of the issues and approaches involved in adapting materials from one country to another in the same language. They describe their efforts to adapt a computer-assisted career guidance system, developed by the Educational Testing Service in the USA, for use in Australia. Theirs is a fascinating study of the issues that can arise and how they were resolved. Even when language translation is not involved, the issues of adapting a test from one culture to another can be complex, and in this instance the two cultures appear, on the surface, to be relatively similar.
Sunhee Chae from the Korea Institute of Curriculum and Evaluation extends the examples of test adaptation work to assessments involving pictures and to tests administered to pre-school children. Her article brings translation/adaptation evidence to populations where very little research has been carried out.
Charles Stansfield from Second Language Testing, Inc. in the USA describes some of his own first-hand experiences in translating and adapting tests. His article deals with a wide range of problems that arise in practice, such as legal requirements, deciding whether or not to translate a test, political implications, and even typesetting. Stansfield illustrates these issues with examples from his survey work on the use of translated and adapted tests in the USA. One additional feature of his article is its focus on test publishers and their perspective on the process.
Joy McQueen and Juliette Mendelovits from the Australian Council for Educational Research describe the steps implemented in the OECD/PISA/2000 Project to adapt the reading literacy assessment into 32 languages and cultures. McQueen and Mendelovits present the perspective of test-developers and provide an overview of all procedures implemented in the PISA/OECD project to insure the test materials would be appropriate for international student comparisons. Their account starts with the initial steps of collecting materials, continues with many intermediate steps of international exchange on test development and cultural reviews, and concludes with the final evaluation of field trial data. They evaluate the procedures as they were followed against the ITC guidelines for test adaptation, and present a number of interesting examples of problems that emerged in the practice of their translation/adaptation work.
Aletta Grisay, recently retired from the University of Liège, Belgium, considers in more detail the procedures and standards implemented in the OECD/PISA/2000 Project for translation/adaptation, and reports on their effectiveness. Grisay builds on her experience in previous surveys of educational achievement, such as TIMSS and other studies. In the PISA test adaptation procedures, the first step is to produce near-equivalent French and English versions, and then to produce additional language and cultural adaptations from both the French and English versions. Further steps include having adaptations prepared by separate translators, whose products are then judged by other translators, ultimately to produce a single adaptation that represents the best possible translation from the work of multiple translators. Because several participating countries deviated at the national level from the prescribed ideal procedures, Grisay was able to analyse and compare the effectiveness of different procedures. Grisay's work highlights the level of commitment to the test adaptation process that is essential when national governments intend making policy decisions based on the findings from international studies.

We hope this collection of seven articles furthers the advancement of sound test translation and adaptation practices around the world. While language testing is not the focus of each article, the methodological advances and the practical work described will surely be relevant and interesting to researchers working in the language testing field. Moreover, as Stansfield suggests in his contribution to this special issue of Language Testing, it may be highly appropriate for the specific expertise of language testers to be involved to some degree in any test translation/adaptation activity.

IV References
Hambleton, R.K. 2001: The next generation of the ITC test translation and adaptation guidelines. European Journal of Psychological Assessment 17, 164–72.
—— 2002: Adapting achievement tests into multiple languages for international assessments. In Porter, A. and Gamoran, A., editors, Methodological advances in large-scale cross-national education surveys. Washington, DC: National Academy of Sciences, 58–79.
Hambleton, R.K., Merenda, P. and Spielberger, C., editors, in press: Adapting educational and psychological tests for cross-cultural assessment. Hillsdale, NJ: Lawrence Erlbaum.
Hambleton, R.K. and Patsula, L. 1999: Increasing the validity of adapted tests: myths to be avoided and guidelines for improving test adaptation practices. Journal of Applied Testing Technology 1, 1–16.
Hambleton, R.K., Yu, J. and Slater, S.C. 1999: Field-test of the ITC guidelines for adapting psychological tests. European Journal of Psychological Assessment 15, 270–76.
Muniz, J. and Hambleton, R.K. 1997: Directions for the translation and adaptation of tests. Papeles del Psicologo, August, 63–70.
Tanzer, N.K. and Sim, C.O.E. 1999: Adapting instruments for use in multiple languages and cultures: a review of the ITC guidelines for test adaptations. European Journal of Psychological Assessment 15, 258–69.
van de Vijver, F.J.R. and Hambleton, R.K. 1996: Translating tests: some practical guidelines. European Psychologist 1, 89–99.
—— in press: Adapting educational tests for multicultural assessment. Educational Measurement: Issues and Practice.
van de Vijver, F.J.R. and Tanzer, N.K. 1997: Bias and equivalence in cross-cultural assessment: an overview. European Review of Applied Psychology 47, 263–79.
