Editorial
A Test Is Much More Than Just the
Test Itself
Some Thoughts on Adaptation and Equivalence
Samuel Greiff1 and Dragos Iliescu2
1 Cognitive Science and Assessment, University of Luxembourg, Esch, Luxembourg
2 Department of Psychology, University of Bucharest, Bucharest, Romania

European Journal of Psychological Assessment, 2017, Vol. 33(3), 145-148. doi: 10.1027/1015-5759/a000428

The very substance of psychological assessment is based on carefully developed tests, but the field is not exclusively a tributary to the test instrument itself. In fact, tests, and indirectly through such tests, the field of psychological assessment is to a large extent influenced by the characteristics of the testing process: How the test is administered, scored, reported, secured, or disposed of are important circumstantial variables. Stated differently, assessment is more than just the test, and research on assessment supersedes attention to just the instrument. Many of the caveats of test usage, and maybe even of failed attempts to show validity, are due to process-related rather than to test-related issues. This holds in particular for test adaptations, a highly relevant topic given that many instruments are used across languages and cultures. Of note, a substantial number of papers published in the European Journal of Psychological Assessment involve adapted versions of psychological test instruments and their validity.

Test adaptation, which often also goes under the names test localization or test indigenization, is a scientific and professional activity that refers to the development of a derived test version, the adapted test, which is obtained by transferring the original test from its source language or culture to a target language or culture. The adaptation process usually includes a translation of the instrument, but is much more than mere translation: It involves a thorough scientific process and is guided by the principles of the scientific method, most prominent of all being the need to offer proof of the psychometric appropriateness of the adapted test in the new language and culture and of its psychometric similarity (equivalence) to the original test.

The current editorial is written in the light of the often insufficiently covered topics of adaptation and equivalence. It champions the need to be more inclusive in our reports of research on psychological assessment instruments, with aspects related to the adaptation process (both the development/adaptation and the actual testing process) and not only with regard to the actual outcome of the adaptation (the final adapted test). However, editorials can serve only as an initial spark and as a way to raise awareness of an important topic. Comprehensive information on test adaptation and equivalence is provided by the work of the International Test Commission, such as its various guidelines (e.g., International Test Commission [ITC], 2006, 2012, 2014, 2015, 2017). More specifically, some process-related issues for test adaptation have been covered by such important documents as the International Guidelines on Computer-Based and Internet Delivered Testing (ITC, 2006), the International Guidelines on Quality Control in Scoring, Test Analysis, and Reporting of Test Scores (ITC, 2012), the International Guidelines on the Security of Tests, Examinations, and Other Assessments (ITC, 2014), the International Guidelines for Practitioner Use of Test Revisions, Obsolete Tests, and Test Disposal (ITC, 2015), and others.

Although they may be strictly correct in their approach, researchers who use or develop test adaptations seem to have absorbed the technical and statistical skills more easily than the underlying (and yet equally important) philosophy of test development and adaptation, which emphasizes the process and assigns relatively greater importance to the a priori steps (before data collection) than to the a posteriori steps (after data collection) of test development and adaptation. This, in turn, leads to a number of gaps in the general manner in which research papers are approached:

- Low variability in the statistical approach used in the various papers (mostly structural equation modeling (SEM) instead of a broad array, including multidimensional scaling, cluster analysis, item response theory (IRT), and so forth).
- Lack of integration of multiple sources of data (e.g., data from test-takers and experts).
- A lack of detail when reporting on the actual development or adaptation process (the test craft, based on important a priori analyses; see below), and a tendency to lean more toward the statistical (confirmatory) aspects.
- Lack of sophistication in the adaptation design (e.g., using mostly target-monolingual designs based on back translation, and avoiding mono- or multisample bilingual designs).

We argue that a stronger and more inclusive emphasis on the a priori steps in test adaptation and test validation is much needed. Along this line of thinking, the current editorial is a continuation of a previous editorial (Ziegler & Bensch, 2013) that outlined how mere translations, as they offer a very limited approach to test adaptations, are not interesting to the European Journal of Psychological Assessment. To extend this view, in this editorial we would like to encourage (a) a holistic understanding of equivalence in test adaptations as well as (b) a more comprehensive view of the methodologies employed to establish equivalence.

(a) A Holistic Understanding of Equivalence in Test Adaptations

Equivalence, often referred to (with a more statistical undertone) as invariance, refers to the comparability of scores that are obtained from the administration of different forms (original vs. adapted) of a test and is considered a specific source of validity. The fact that one form of a test is equivalent to another has two important implications. First, test scores derived from the two forms can be directly compared (at the level of equivalence they reflect). Second, any evidence generated by one form is also valid for the other form in the sense that validity evidence is transferable. The terms equivalence and bias are closely connected: Bias is associated with error and is an expression of nonequivalence. If the original and adapted forms of a test are not equivalent, not only can responses collected with the two forms of the test not be directly compared, but conclusions based on evidence from the source form cannot be advanced for scores generated with the target form.

Bias and equivalence related to adaptations of psychological tests come in at least three major forms (van de Vijver & Poortinga, 2005): construct bias (i.e., incomplete overlap of the measured constructs in the original and adapted versions of the test), method bias (i.e., nuisance factors arising from aspects of the method, e.g., sample, instrument, or administration conditions), and item bias (i.e., anomalies in items such as those stemming from incorrect or poor item translation, differential item familiarity, cultural appropriateness, and so forth).
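To make the construct-related part of this taxonomy concrete: a common first screen for construct equivalence is to compare the factor structure of the source and target versions, for instance with Tucker's congruence coefficient. The sketch below (in Python) shows the core computation; the stand-in data, the use of exploratory factor analysis, and the .95 convention are illustrative choices on our part, not prescriptions from the sources cited here.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def tucker_phi(x, y):
    """Tucker's congruence coefficient between two factor-loading vectors.

    The sign of an extracted factor is arbitrary, so the absolute value is
    interpreted; |phi| above roughly .95 is conventionally read as
    factorial similarity across groups."""
    return abs(np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2)))

def factor_congruence(source_responses, target_responses, n_factors=1):
    """Fit the same exploratory factor model in each cultural group and
    compare loading patterns factor by factor. For several factors, the
    solutions would first need rotation to a common target (e.g.,
    Procrustes), which this sketch omits."""
    fa_s = FactorAnalysis(n_components=n_factors).fit(source_responses)
    fa_t = FactorAnalysis(n_components=n_factors).fit(target_responses)
    # components_ has shape (n_factors, n_items)
    return [tucker_phi(ls, lt)
            for ls, lt in zip(fa_s.components_, fa_t.components_)]

# Illustrative call with random stand-in data (rows = persons, columns = items).
rng = np.random.default_rng(seed=1)
source = rng.normal(size=(300, 10))
target = rng.normal(size=(280, 10))
print(factor_congruence(source, target))
```

A low congruence value only signals a structural discrepancy; as argued throughout this editorial, explaining it requires the qualitative, a priori side of the craft.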
Most of these forms of bias, albeit all important when establishing equivalence, are not sufficiently considered by much of the published research with adapted tests, which tends to focus on only some narrow aspects of construct and item bias/equivalence. For example, in terms of construct (non)equivalence, construct contamination (i.e., the incomplete overlap of construct-relevant indicators across the source and target culture), construct deficiency (i.e., the incomplete coverage of the construct in the target culture), or the differential appropriateness of construct-relevant indicators across the source and target culture are rarely if ever discussed. In terms of method (non)equivalence, sample bias, which may emerge from a lack of comparability or even minor differences in sample characteristics between the source and target culture samples, is hardly discussed at all. Instrument bias, which may emerge from phenomena such as the differential familiarity of test takers from the two cultures with the stimulus material or with the response procedures, is not discussed much either. And finally, administration bias, which may emerge from technological, physical, or social administration conditions or the differential expertise of test administrators, is also rarely if ever discussed. In terms of item (non)equivalence, incorrect or poor item translations, inadequate item formulation, or cultural variations in item familiarity or item appropriateness are usually not addressed either.

All of these different forms of equivalence could, generally speaking, be established on the basis of a posteriori analyses. However, tests are rarely (extremely rarely) perfectly equivalent. They might be equivalent at a certain level (for a review, see Schmitt & Kuljanin, 2008), which implies that they are also nonequivalent at a certain level. The sources of nonequivalence should be carefully documented. Of note, documentation of the actual reason behind nonequivalence cannot be performed with a posteriori analyses but requires analytical depth, qualitative reasoning, and supplementary data. For instance, take a case in which the researcher establishes partial equivalence and advances the hypothesis that nonequivalence stems from differential item familiarity. Whereas IRT (or SEM) can isolate the biased items, confirming the source of bias (differential familiarity in the two cultures) is impossible without a dedicated design, more data, and professional reasoning.
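Isolating biased items is indeed the part that statistics can do. For dichotomously scored items, one classic screen (alongside the IRT- and SEM-based methods just mentioned) is the Mantel-Haenszel procedure, which matches test takers from the two language samples on total score and checks each item for a group difference that survives the matching. What follows is a minimal, self-contained sketch of our own; production work should rely on a vetted implementation.

```python
import numpy as np

def mantel_haenszel_dif(items, group):
    """Screen dichotomous items for DIF with the Mantel-Haenszel procedure.

    items: (n_persons, n_items) array of 0/1 item scores
    group: (n_persons,) array; 0 = source-language sample, 1 = target sample
    Returns the MH common odds ratio per item (values near 1 = no DIF)."""
    total = items.sum(axis=1)          # matching variable: total test score
    odds_ratios = []
    for j in range(items.shape[1]):
        num = den = 0.0
        for s in np.unique(total):     # stratify on matched ability
            in_stratum = total == s
            ref = in_stratum & (group == 0)
            foc = in_stratum & (group == 1)
            a = items[ref, j].sum()    # reference group, item correct
            b = ref.sum() - a          # reference group, item incorrect
            c = items[foc, j].sum()    # focal group, item correct
            d = foc.sum() - c          # focal group, item incorrect
            n = ref.sum() + foc.sum()
            if n > 0:
                num += a * d / n
                den += b * c / n
        odds_ratios.append(num / den if den > 0 else np.nan)
    return np.array(odds_ratios)
```

The common odds ratio is often re-expressed on the ETS delta scale (−2.35 times its natural logarithm) for classification. But, exactly as argued above, a flagged item is only a symptom: the statistic cannot say whether differential familiarity, a poor translation, or something else produced the flag.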

As a first approach to a more inclusive view on adaptation and equivalence, we encourage authors to consider the following suggestions and guidelines:

- Including different forms of bias in the analyses (not only construct or item bias): method bias, which is considered by some authors to be the most insidious and pervasive of all the three types of bias (e.g., van de Vijver & Leung, 2011), as well as the interplay between item and construct bias.
- Collecting supplementary variables about the context in which the test was administered; this may help in identifying specific forms of bias. For instance, instrument bias may pass undetected if data are not collected on variables such as the familiarity of test takers with the item format, cultural response sets, or the (culturally moderated) social desirability of items. Administration bias may pass undetected if data are not collected on variables such as the physical, social, or technological administration conditions, the differential expertise of test administrators, or other administrator/interviewer characteristics. Obviously, considering these aspects places high demands on the research design, but it also increases sophistication and aids the discovery of new areas of research;
- Combining qualitative with quantitative approaches: Some forms of bias (e.g., construct deficiency, i.e., the incomplete coverage of the construct in the target culture) are virtually undetectable through exclusively quantitative research;
- Iterating across several cycles of test adaptation: If bias is detected, simply shrugging the flagged item(s) off (e.g., by deletion or minimization of impact) is the easy way out. More appropriate explanations may be gained by going back and specifically collecting additional data to unravel the mechanism through which the bias was generated.

The ways in which we construe the relation of an original test with its various test adaptations have evolved over time along with our understanding of a test's validity. Logically and chronologically, we could probably outline three such evolutionary phases. The first and most simplistic view is that the test is the label (i.e., the name of the test), no matter whether we are looking at the original form or at an adaptation. In this view, no evidence of equivalence is needed: the fact that the adaptation carries the same name as the original is considered proof supreme that it is the same test. Obviously, this view is outdated today.

The second understanding emphasizes the difference between the original form and the various adaptations. Test adaptations are derived versions, which are legally and empirically different from the original though inspired by it. Items may change, scales may disappear; cultural indigenization may change the adapted form of the original test in such a way that, although the original may still be discernible, the two forms are not identical but share a complicated relationship.

The third and most recent understanding emphasizes the fact that we cannot have real indigenization without strong evidence of equivalence (see also Ziegler & Bensch, 2013). When such evidence exists, the two forms are virtually identical for a given purpose (e.g., measurement equivalence), and in a sense, this makes the adapted form of the test part of the original: Equivalence of the adapted form contributes to the validity of the entire test and all its forms (original and other forms). This is, of course, in many respects an oversimplification, and the depth of the full process of cultural adaptation (see the nine levels at which test indigenization is conducted as outlined by Church, 2001), the method used for translation and indigenization of item content (e.g., back translation vs. decentering), and other variables may play a role.

In this third understanding of equivalence, a test needs to be considered an extended family, with the original form and derived forms contributing to the overall validity of the family and ideally together contributing to a comprehensive picture. This would, for instance, imply comparing several different forms with each other rather than comparing just one form with another. A good example that adopted such an approach is work by Byrne and van de Vijver (2014), who compared the structure of the Family Values Scale across 27 countries with a multilevel equivalence framework. However, few if any papers have comprehensively discussed the issue of the equivalence of adapted forms, which is actually one that is very relevant for the field of psychological assessment and, thus, also for the European Journal of Psychological Assessment. From an editorial perspective, we wonder how much of the current replication crisis in the social sciences might be due to a lack of measurement equivalence between test forms. After all, equivalence is a form of validity (Iliescu, 2017), and a lack of validity, including a lack of equivalence, has many sources and implications.

(b) On the Preferred (and the Forgotten) Methods for Establishing Equivalence

Many papers that deal with adapted tests report models with a good fit (please note that an editorial on model fit is planned to appear later in 2017), for instance, when it comes to the internal structure of the adapted instrument. Often, the reader might have the impression that these almost perfect versions just fell into the researcher's lap without much effort at all: all that was needed was a quick translation, and then the data were collected, and the results of the analyses were reported. However, anybody who has ever been involved in test adaptation knows how difficult and sometimes even cumbersome it is to obtain an appropriate translation, how much tweaking, pulling, and pushing of items it requires, and how many cycles of going back and forth (often with regard to minor and unexpected cultural issues between the original and the new version) are required.

This effort is rarely (if ever) reported, and papers often focus exclusively on the quantitative (and a posteriori) aspects of adapting the test, for instance, by establishing measurement invariance in SEM. However, robust reports on the cultural comparability of various forms of the same test should be inspired by both a priori (usually judgment-driven) and a posteriori (usually data-driven) methods (van de Vijver, 2011). A priori procedures are applied to prevent the appearance of bias. Examples of a priori procedures are judgment-based approaches regarding the translation and cultural adaptation of the various components of the test, such as items, rating scales, and instructions, as well as structured or unstructured approaches to the work of the adaptation committee, including qualitative (e.g., think-aloud studies) and quantitative methods (e.g., ratings of item appropriateness by members of the translation panel). Interactions with actual test takers, as long as these are not aimed at collecting data, also qualify as a priori procedures, for instance, cognitive interviews or ratings of the cultural appropriateness or social desirability of items (Iliescu, 2017).
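Even these judgment-driven steps can be reported in quantitative form. As a small, purely hypothetical illustration of the panel-rating idea just mentioned (the rating scale, thresholds, and flagging rule below are our own conventions, not ITC requirements):

```python
import numpy as np

def flag_items(ratings, min_mean=4.0, max_spread=1.0):
    """Summarize translation-panel ratings of item appropriateness.

    ratings: (n_raters, n_items) array, e.g., a 1-5 appropriateness scale.
    An item is flagged for another adaptation cycle when the panel mean
    is low or the raters disagree (thresholds are illustrative only)."""
    mean = ratings.mean(axis=0)
    spread = ratings.std(axis=0)
    flagged = np.where((mean < min_mean) | (spread > max_spread))[0]
    return mean, spread, flagged

# Example: 5 panel members rate 8 items on a 1-5 scale.
rng = np.random.default_rng(seed=7)
ratings = rng.integers(1, 6, size=(5, 8))
mean, spread, flagged = flag_items(ratings)
print("items to revisit:", flagged)
```

Reporting such summaries, together with what was changed in response to them, is exactly the kind of a priori documentation this editorial asks for.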

A posteriori procedures are those that are used after the data have been collected and are much more commonly seen in published manuscripts. These procedures may detect the existence of bias, and they may sometimes statistically control for the effects of nonequivalence, but they can never actively generate the adapted form of the test by themselves. That is, a posteriori procedures are reactive; they may tell the researcher that there is a problem but not what the problem is or how to solve it. Examples of a posteriori procedures include invariance analyses in SEM or, more generally, the analysis of psychometric characteristics.
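The prototypical a posteriori sequence is the invariance ladder reviewed by Schmitt and Kuljanin (2008): a configural baseline, then equal loadings (metric), then equal intercepts (scalar), then equal residuals (strict), each step compared with the previous one. The sketch below encodes only that decision logic; the fit indices would come from whatever SEM software is used, and the .01 change-in-CFI cutoff is one common rule of thumb, not a universal standard.

```python
# Hypothetical helper: walk the measurement-invariance ladder using fit
# indices (here, CFI) produced by any multigroup CFA/SEM package.
LADDER = ["configural", "metric", "scalar", "strict"]

def highest_invariance_level(cfi_by_model, max_cfi_drop=0.01):
    """Return the highest level of invariance supported by the fitted models.

    cfi_by_model: dict mapping model name to its CFI. A step is retained
    only if its CFI does not drop by more than max_cfi_drop (a common,
    but not universal, convention) relative to the previous retained step."""
    achieved, previous = None, None
    for level in LADDER:
        if level not in cfi_by_model:
            break
        current = cfi_by_model[level]
        if previous is not None and previous - current > max_cfi_drop:
            break
        achieved, previous = level, current
    return achieved

# Example: the scalar step degrades fit too much, so scores can be compared
# only at the metric level (relations among constructs, not latent means).
fits = {"configural": 0.962, "metric": 0.958, "scalar": 0.941}
print(highest_invariance_level(fits))  # -> metric
```

Note that the helper reports the level of equivalence reached, which is precisely the point made above: forms are rarely equivalent or nonequivalent outright, but equivalent at a certain level.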
Here, we highlight the fact that test adaptation should mainly result from carefully conducted a priori procedures (possibly after repeated iterations) and should only marginally rely on a posteriori procedures. Given this situation, it is worth noting that we continue to see mostly reports of a posteriori analyses across various papers, without much insight into the a priori approaches. The a priori efforts are the actual craft (the test craft, if you will) of test development or adaptation, and they are invaluable learning points for future research and practice. Statistical reports cannot inspire future projects as much as prospective insights into the craft can.

Conclusion

In a nutshell, this editorial asks for a more holistic approach to the conducting of test adaptations and the methods employed to do so. Obviously, the request for more information conflicts with the natural limitations of journal space (which is rather strict in the European Journal of Psychological Assessment). An excellent opportunity to be more inclusive is to make use of electronic supplementary material (ESM) to inform readers about both the a priori and the a posteriori measures that were used to establish equivalence. This will allow researchers to achieve two ends at the same time: informing readers concisely and precisely of what was done in the main article and providing additional information in the ESM to readers who would like to dive into the adaptation process more deeply.
References

Byrne, B., & van de Vijver, F. J. R. (2014). Validating factorial structure of the Family Values Scale from a multilevel-multicultural perspective. International Journal of Testing, 14, 168-192. doi: 10.1080/15305058.2013.870903

Church, T. A. (2001). Personality measurement in cross-cultural perspective. Journal of Personality, 69, 979-1006. doi: 10.1111/1467-6494.696172

Iliescu, D. (2017). Adapting tests in linguistic and cultural contexts. Cambridge, UK: Cambridge University Press.

International Test Commission. (2006). International guidelines on computer-based and Internet-delivered testing. International Journal of Testing, 6, 143-171. doi: 10.1207/s15327574ijt0602_4

International Test Commission. (2012). International guidelines on quality control in scoring, test analysis, and reporting of test scores. Retrieved from http://www.intestcom.org. doi: 10.1037/e508492012-002

International Test Commission. (2014). International guidelines on the security of tests, examinations, and other assessments. Retrieved from http://www.intestcom.org. doi: 10.1080/15305058.2015.1111221

International Test Commission. (2015). International guidelines for practitioner use of test revisions, obsolete tests, and test disposal. Retrieved from http://www.intestcom.org

International Test Commission. (2017). The ITC guidelines for translating and adapting tests (2nd ed.). Retrieved from http://www.intestcom.org

Schmitt, N., & Kuljanin, G. (2008). Measurement invariance: Review of practice and implications. Human Resource Management Review, 18, 210-222. doi: 10.1016/j.hrmr.2008.03.003

van de Vijver, F. J. R. (2011). Bias and real differences in cross-cultural differences: Neither friends nor foes. In F. J. R. van de Vijver, A. Chasiotis, & S. M. Breugelmans (Eds.), Fundamental questions in cross-cultural psychology (pp. 235-258). New York, NY: Cambridge University Press. doi: 10.1017/cbo9780511974090.011

van de Vijver, F. J. R., & Leung, K. (2011). Equivalence and bias: A review of concepts, models, and data analytic procedures. In D. R. Matsumoto & F. J. R. van de Vijver (Eds.), Cross-cultural research methods in psychology (pp. 17-45). New York, NY: Cambridge University Press. doi: 10.1017/cbo9780511779381.003

van de Vijver, F. J. R., & Poortinga, Y. (2005). Conceptual and methodological issues in adapting tests. In R. K. Hambleton, P. F. Merenda, & C. D. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 39-64). Mahwah, NJ: Erlbaum.

Ziegler, M., & Bensch, D. (2013). Lost in translation: Thoughts regarding the translation of existing psychological measures into other languages. European Journal of Psychological Assessment, 29, 81-83. doi: 10.1027/1015-5759/a000167

Samuel Greiff
Cognitive Science and Assessment
University of Luxembourg
6, rue Richard Coudenhove-Kalergi
4366 Esch-sur-Alzette
Luxembourg
samuel.greiff@uni.lu

Dragos Iliescu
Department of Psychology
University of Bucharest
Sector 6, Sos. Panduri 90
050663 Bucharest
Romania
dragos.iliescu@fpse.unibuc.ro
