Getting Entangled in The Nomological Net


European Journal of Psychological Assessment
© 2013 Hogrefe Publishing

Editor-in-Chief: Matthias Ziegler, Department of Psychology, Humboldt University Berlin, Rudower Chaussee 18, 12489 Berlin, Germany (Tel. +49 30 2093 9447, Fax +49 30 2093 9361, E-mail zieglema@hu-berlin.de)

Editorial Assistant: Doreen Bensch, Department of Psychology, Humboldt University Berlin, Rudower Chaussee 18, 12489 Berlin, Germany (Tel. +49 30 2093 9441, Fax +49 30 2093 9360, E-mail benschdx@cms.hu-berlin.de)

Associate Editors: Itziar Alonso-Arbiol, Spain; Martin Bäckström, Sweden; Laurence Claes, Belgium; Johnny Fontaine, Belgium; Richard Griffith, USA; Carolyn MacCann, Australia; Symeon Vlachopoulos, Greece

Consulting Editors: Paul De Boeck, Belgium; Christine DiStefano, USA; Anastasia Efklides, Greece; Rocío Fernández-Ballesteros, Spain; Brian F. French, USA; Christoph Kemper, Germany; Klaus Kubinger, Austria; Kerry Lee, Singapore; Helfried Moosbrugger, Germany; Janos Nagy, Hungary; Willibald Ruch, Germany; Manfred Schmitt, Germany; Stéphane Vautier, France; Fons J.R. van de Vijver, The Netherlands; Cilia Witteman, The Netherlands

Founders: Rocío Fernández-Ballesteros and Fernando Silva

Supporting Organizations: The journal is the official organ of the European Association of Psychological Assessment (EAPA). The EAPA was founded to promote the practice and study of psychological assessment in Europe as well as to foster the exchange of information on this discipline around the world. Members of the EAPA receive the journal in the scope of their membership fees. Further, the Division for Psychological Assessment and Evaluation, Division 2, of the International Association of Applied Psychology (IAAP) is sponsoring the journal: Members of this association receive the journal at a special rate (see below).

Publisher: Hogrefe Publishing, Merkelstr. 3, D-37085 Göttingen, Germany, Tel. +49 551 999-500, Fax +49 551 999-50111, E-mail publishing@hogrefe.com, Web http://www.hogrefe.com
North America: Hogrefe Publishing, 38 Chauncy Street, Suite 1002, Boston, MA 02111, USA, Tel. +1 866 823-4726, Fax +1 617 354-6875, E-mail customerservice@hogrefe-publishing.com, Web http://www.hogrefe.com

Production: Regina Pinks-Freybott, Hogrefe Publishing, Merkelstr. 3, D-37085 Göttingen, Germany, Tel. +49 551 999-500, Fax +49 551 999-50111, E-mail journalsproduction@hogrefe.com

Subscriptions: Hogrefe Publishing, Herbert-Quandt-Strasse 4, D-37081 Göttingen, Germany, Tel. +49 551 50688-900, Fax +49 551 50688-998

Advertising/Inserts: Melanie Beck, Hogrefe Publishing, Merkelstr. 3, D-37085 Göttingen, Germany, Tel. +49 551 999-500, Fax +49 551 999-50111, E-mail marketing@hogrefe.com

ISSN: ISSN-L 1015-5759, ISSN-Print 1015-5759, ISSN-Online 2151-2426

Copyright Information: © 2013 Hogrefe Publishing. This journal as well as the individual contributions and illustrations contained within it are protected under international copyright law. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without prior written permission from the publisher. All rights, including translation rights, reserved.

Publication: Published in 4 issues per annual volume.

Subscription prices:
Annual subscription, Institutions (2013): €284.00, US $384.00, £228.00
Annual subscription, Individuals (2013): €142.00, US $192.00, £114.00
Postage and handling: €12.00, US $16.00, £10.00
Special rates: IAAP/Colegio Oficial de Psicólogos members: €85.00, US $114.00, £74.00 (+ €12.00, US $16.00, £10.00 postage and handling); EAPA members: Included in membership
Single issues: €70.00, US $98.00, £56.00 (+ postage and handling)

Payment: Payment may be made by check, international money order, or credit card, to Hogrefe Publishing, Merkelstr. 3, D-37085 Göttingen, Germany, or, for North American customers, to Hogrefe Publishing, 38 Chauncy Street, Suite 1002, Boston, MA 02111, USA.

Electronic Full Text: The full text of the European Journal of Psychological Assessment is available online at www.psyjournals.com and in PsycARTICLES.

Abstracting/Indexing Services: The journal is abstracted/indexed in Current Contents / Social & Behavioral Sciences (CC/S&BS), Social Sciences Citation Index (SSCI), Social SciSearch, PsycINFO, Psychological Abstracts, PSYNDEX, ERIH, and Scopus. Impact Factor (2011): 2.529.

European Journal of Psychological Assessment
Volume 29, No. 3, 2013
Official Organ of the European Association of Psychological Assessment

Contents

Editorial

Getting Entangled in the Nomological Net: Thoughts on Validity and Conceptual Overlap
Matthias Ziegler, Tom Booth, and Doreen Bensch

Original Articles

Framing Work Competencies Through Personality Traits: The Big Five Competencies Grid
Chiara Consiglio, Guido Alessandri, Laura Borgogni, and Ronald F. Piccolo

Development and Validation of the German Test for (Highly) Intelligent Kids T(H)INK
Tanja Gabriele Baudson and Franzis Preckel

The MMPI-2-Restructured Form and the Standard MMPI-2 Clinical Scales in Relation to DSM-IV
Paul T. van der Heijden, Jos I. M. Egger, Gina M. P. Rossi, Gitte Grundel, and Jan J. L. Derksen

The Factor Structure and Psychometric Properties of the Greek Version of the Eating Disorders Examination Questionnaire (EDE-Q)
Theodoros Giovazolias, Ioannis Tsaousis, and Christina Vallianatou

Self-Efficacy in Regulating Positive and Negative Emotions: A Validation Study in Germany
Catherine Gunzenhauser, Tobias Heikamp, Maria Gerbino, Guido Alessandri, Antje von Suchodoletz, Laura Di Giunta, Gian Vittorio Caprara, and Gisela Trommsdorff

Reliability and Validity of the German Version of the Young Schema Questionnaire Short Form 3 (YSQ-S3)
Levente Kriston, Janina Schäfer, Gitta A. Jacob, Martin Härter, and Lars P. Hölzel

Validation of the German Version of the Exercise Dependence Scale
Astrid Müller, Laurence Claes, Dirk Smits, Olaf Gefeller, Anja Hilbert, Antonia Herberg, Verena Müller, Dirk Hofmeister, and Martina de Zwaan

Age Differences and Measurement Invariance of Working Memory in 5- to 12-Year-Old Children
K. Michalczyk, N. Malstädt, M. Worgt, T. Könen, and M. Hasselhorn
Editorial
Getting Entangled in the Nomological Net
Thoughts on Validity and Conceptual Overlap

Matthias Ziegler¹, Tom Booth², and Doreen Bensch¹

¹Humboldt-Universität zu Berlin, Germany
²Centre for Cognitive Ageing and Cognitive Epidemiology, Department of Psychology, The University of Edinburgh, UK
European Journal of Psychological Assessment 2013; Vol. 29(3):157–161
DOI: 10.1027/1015-5759/a000173
© 2013 Hogrefe Publishing

Psychological research relies heavily on tests and questionnaires to measure constructs and traits. Thus, tests and questionnaires are not only in high demand, but are constantly being developed anew. Likewise, researchers have suggested many new traits or constructs. Such suggestions then cause a wave of test and questionnaire development. This investment of time and research resources is necessary to ensure high-quality measurement tools that can be trusted by other researchers and practitioners alike to assess the intended trait or construct.

For this reason, journals such as the European Journal of Psychological Assessment publish studies evaluating the psychometric properties of such new measurement tools. Typically, such evaluation studies include some estimate of reliability (Schweizer, 2011) and concentrate on demonstrating the validity of the score derived from the new measure. A look at the history of this journal reveals that the number of publications that apply some form of factor analysis has risen from around 28% in the 1990s to 40% since the year 2000 (Alonso-Arbiol & van de Vijver, 2010). From this one can assume that the factorial validity of the published measurement tools has been a central theme of published research. Factorial validity is of course an extremely important issue and provides information necessary to all scoring procedures. Since it is trait scores (in their various forms) that are most commonly used in applied studies, factorial validity should not be neglected. However, while a newly devised measurement tool may demonstrate factorial validity and produce reliable test scores, its utility in the field is far from assured. Construct validity-related evidence is still necessary to ensure that the new measure truly captures the trait it was intended to capture. Campbell and Fiske (1959) asserted this as follows:

We believe that before one can test the relationships between a specific trait and other traits, one must have some confidence in one's measures of that trait. Such confidence can be supported by evidence of convergent and discriminant validation. (p. 100)

Within this Editorial we would like to outline some problems related to construct validity and suggest some lines of research to address them. Schweizer (2012) already discussed problems with convergent validity at length in an editorial in this journal. Therefore, we will try to broaden the focus and include some additional issues we deem important as well.

The Idea of Convergent and Discriminant Validity as Proposed by Campbell and Fiske (1959)

Campbell and Fiske (1959) started their seminal paper by pointing out four aspects important to a validation process. They stated, first, that convergent validity necessarily requires independent measurement procedures, i.e., it is necessary to apply different measurement approaches (e.g., paper-pencil and observation). Their second demand is that, besides convergent validity-related evidence, discriminant validity-related evidence is also required. Only then does a full picture of validity emerge. The third point Campbell and Fiske made is that each measure includes variance due to a trait and variance due to method. Without disentangling these different variance sources, validity estimates for a test score might be inflated. Finally, in their fourth point Campbell and Fiske stated that, in order to achieve these goals, it is necessary to employ more than one method and to assess more than one trait. The approach they suggest, the multitrait-multimethod matrix (MTMM), allows all of these aspects to be part of an analysis. Such a matrix summarizes the correlations computed based on data for several traits, all assessed with the same methods. Importantly, it should be done with more than one method. Within the matrix Campbell and Fiske differentiate reliability diagonals, validity diagonals, heterotrait-monomethod triangles, and heterotrait-heteromethod triangles. Moreover, by specifying the relationships between validity diagonals and triangles as well as between the correlational patterns within the triangles, Campbell and Fiske defined what evidence is needed to speak of convergent and discriminant validity. Schweizer (2012) outlined some of the problems around this approach, which we do not need to repeat here.

Two Implications From Campbell and Fiske's MTMM Approach

Issue 1: Selecting Traits for the Study of Convergent and Discriminant Validity

We want to focus on two important issues. The first is important to discriminant validity-related evidence. In selecting discriminant traits, Campbell and Fiske emphasized the importance of providing a definition as well as positioning the discriminant trait within a nomological net of the trait to be measured by the new instrument. Such a framework provides the necessary depth of information to select appropriate discriminant traits. Surprisingly, validity studies often include correlations with numerous different operationalizations of the same trait, suggesting convergent validity-related evidence. When it comes to discriminant validity-related evidence, sometimes no clear underlying rationale for selecting exactly these traits becomes obvious, making the selection look arbitrary. However, as Campbell and Fiske already pointed out:

When a dimension of personality is hypothesized, when a construct is proposed, the proponent invariably has in mind distinctions between the new dimension and other constructs already in use. One cannot define without implying distinctions, and the verification of these distinctions is an important part of the validational process. (p. 84)

Following this statement and the demands for selecting discriminant traits, it seems necessary to again recall the requirement to clearly define the trait to be measured, embed it in a nomological net, and base the selection of discriminant traits on this network. This way, obtaining discriminant validity-related evidence is more difficult but more informative. It is necessary to show that a new measure assessing a trait can be distinguished from an existing measure capturing a (closely) related trait. Nevertheless, these findings tell us a lot more about discriminant validity than do correlations with measures assessing very distant traits.

Visualizing the Nomological Net of Personality

Pace and Brannick (2010) conducted a bare bones meta-analysis (corrections only for sampling error) for different Big Five questionnaires. The underlying assumption here was that all questionnaires should basically capture the same trait. Pace and Brannick concluded:

Convergent validities were lower than expected, indicating substantial differences among tests. Such a result begs for an explanation of the differences among tests as well as a consideration of the implications of such differences for theory and practice. (p. 674)

In fact, the largest overall convergent correlation was found for Extraversion at .56, whereas the estimated overall reliability of Extraversion measures was .83. Thus, even in the best case, about 50% of the instruments' reliable variance is not shared but rather unique to the specific questionnaires.

What the Pace and Brannick (2010) study highlights is what has become known as the jingle-jangle fallacy, namely, that scales with the same name may measure different things, and that scales with a different name may measure the same thing. Here we demonstrate the utility of network diagrams (see Epskamp, Cramer, Waldorp, Schmittmann, & Borsboom, 2012) in representing cross-sectional association matrices to visualize the jingle-jangle within personality inventories and to highlight the challenge of selecting traits in discriminant and convergent validity studies.

Figure 1 shows a network representation of the correlation matrix between 113 personality facet scale scores from the NEO-PI-R, HEXACO, 6FPQ, 16PF, MPQ, and JPI derived from the Eugene-Springfield Community Sample (Goldberg, 2005). Correlations are based on a sample of 459 participants for whom complete data were available. Within the figure, each facet scale score is a node (circle) and the magnitude of the correlations between them is depicted as an edge (line). The thickness of the edge represents the magnitude of the associations. For clarity, associations less than r = .35 have been suppressed.

Marked in gray are three clusters and two pairs of facets that represent a series of situations with respect to the jingle-jangle fallacy and new test construction. First, consider the two pairs of scales on the right hand side of Figure 1. The two nodes labeled TR are the facet scales of Traditionalism from the MPQ and JPI. As might be expected from two scales that share a label, they are highly associated (r = .77). However, the two nodes labeled CR and IN are the Creativity facet of the HEXACO and the Innovation scale of the CPI, respectively. Despite being labeled differently, the pairwise association between these scales is nearly identical (r = .76) to that of the Traditionalism scales. Thus, the question is whether this correlation is evidence for convergent or discriminant validity.

This same question arises when we consider both broader clusters of traits within the nomological net and traits that may be considered to be among the most highly
researched in the field. For example, consider the cluster of nodes at the bottom center of Figure 1. Two nodes are the Social Boldness (SB; r = .70) facets of the 16PF and HEXACO. The remaining three nodes, which share equivalent associations (mean r = .72; range = .61 to .79) with the other nodes in the node cluster, are the Exhibition facet of the 6FPQ (EX), the Social Potency facet of the MPQ (SP), and the Social Confidence facet of the JPI (SC). If we regard the magnitude of associations between the two Social Boldness scales as being indicative of their convergent validity, then we have four different labels for the same construct within this single cluster. Next, consider the cluster of four nodes to the left of Figure 1, which represent the Anxiety (AX) facet from the NEO-PI-R, JPI, and HEXACO, and the Stress Reaction (SR) facet of the MPQ. The situation within this cluster is the same as that found for the sociability scales. The Anxiety scales have a mean correlation of .68, whereas the average correlation of the Stress Reaction scale with the three Anxiety scales is .72.

Figure 1. A network diagram of the correlations between 113 facet scales from the NEO-PI-R, HEXACO, 6FPQ, 16PF, MPQ, and JPI derived from the Eugene-Springfield Community Sample (n = 459). The diagram was constructed using the qgraph package in R (Epskamp, Cramer, Waldorp, Schmittmann, & Borsboom, 2012). The graph uses the spring option and produces a diagram in which the length of the edges is dependent on the weight (correlation) between nodes. This has the visual effect of drawing more closely associated nodes together in the graph layout.

Finally, consider the cluster of nodes at the top of Figure 1. Two are the Order (OR) facets of the 6FPQ and NEO-PI-R, two are the Perfectionism (PF) facets of the 16PF and HEXACO, and two are the Organization (OG) facets of the JPI and HEXACO. Of note here is that, while most nodes are quite highly related, something we may expect as they can be argued to all cluster under some (perhaps higher-order) Conscientiousness factor, the two Perfectionism scales have a notably different pattern of associations with other related scales, despite sharing a facet label. As such, when selecting a Perfectionism scale from an extant inventory to study the convergent or discriminant validity of a new measure, our choice of comparison Perfectionism scale may have profound implications for whether we consider our new scale to be distinct or not.

A cursory glance at the rest of the network graph shown in Figure 1 highlights many other areas of local clustering not emphasized here. Thus, when researchers follow Campbell and Fiske's guidelines and select measures that purportedly capture the same trait to ascertain discriminant validity, they might be in for a surprise: The pattern of convergent and discriminant correlations may not be as expected. Test constructors have to be careful when selecting convergent measures and ensure the highest possible conceptual and statistical overlap. Again, this judgment requires a clearly defined construct embedded in a clearly defined nomological net. Network visualizations of the nomological net of personality facets may greatly aid such decisions during scale development.

Possible Reasons for the Jingle-Jangle

Reasons for the low convergent validities Pace and Brannick provided in their paper are item context (e.g., general context or work specific context), breadth of the instrument, and test family. The latter refers to the differences between instruments from the NEO family and those from the BFI family (see also Miller, Gaughan, Maples, & Price, 2011). However, the first as well as the second reason stated bear further implications for assessment-oriented research. It is a well-documented fact that changing the context of an item, for example, by adding "in school", changes and (mostly) improves test-criterion correlations. Reasons for this might be found within the ideas of Brunswik's lens model (see also Miller et al., 2011). More important here, though, is the question of how this added piece of information might change the construct validity of the measurement tool used. Thus, we need empirical research to investigate these effects.

The second reason for the low convergent correlations was breadth of the measurement tool. It is no new insight that most traits can be described as being hierarchically organized: Below a rather abstract domain there are narrower facets. As before, there is evidence suggesting that such facets improve test-criterion correlations (Brunswik, 1955). However, for most traits there is no common agreement about the number and nature of such facets. Pace and Brannick stated:

Recognition of the facets measured by tests may lead toward understanding similarities and differences among personality tests, and perhaps the nature of any differential prediction by tests. (p. 675)

Issue 2: The Issue of Method Variance

The second issue we want to raise with regard to Campbell and Fiske is method variance. Campbell and Fiske (1959) wrote:

The interpretation of the validity diagonal in an absolute fashion requires the fortunate coincidence of both an independence of traits and an independence of methods, represented by zero values in the heterotrait-heteromethod triangles. . . . In practice, perhaps all that can be hoped for is evidence for relative validity, that is, for common variance specific to a trait, above and beyond shared method variance. (p. 84)

This pessimistic conclusion can be mitigated today. There are different methodological approaches to modeling all kinds of method effects (e.g., Eid, Lischetzke, Nussbeck, & Trierweiler, 2003; Podsakoff, MacKenzie, Lee, & Podsakoff, 2003). Despite these new modeling techniques, Campbell and Fiske's remark should call our attention to the fact that we still do not know enough about the nature of method variance. Oftentimes method variance is perceived as variance due to the administration mode (e.g., paper-pencil). However, method variance could also be social desirability (Ziegler & Bühner, 2009), response sets or styles (Wetzel, Carstensen, & Böhnke, 2013), or acquiescence (Rammstedt & Kemper, 2011), to name just three examples. All of these terms are well known. However, with the possible exception of social desirability (Paulhus, 2002; Ziegler, MacCann, & Roberts, 2011), elaborated theories of such method variance producing phenomena are scarce. Thus, researchers with an interest in psychological assessment should strengthen their efforts to shed light on method variance producing phenomena like social desirability, response sets and styles, or acquiescence.

Conclusion

In this Editorial we wanted to raise awareness concerning some of the problems we believe to have been identified regarding convergent and discriminant validation efforts. Summarizing the thoughts outlined above, let us stress three aspects that papers reporting validation studies should follow: (1) The trait to be measured should be clearly defined and embedded within a nomological network. (2) Besides convergent validity, discriminant validity is important in order to gain a more complete picture of the validity of an instrument. To this end, different traits have to be assessed with different methods. The nomological network should guide the selection of the discriminant trait(s). (3) Effects of method variance should be modeled. Moreover, this Editorial also revealed the need for more research in the areas of method variance producing phenomena (e.g., acquiescence, response sets and styles, and social desirability), effects of item context (e.g., items specifically phrased for school or work context), and the facet structure underlying and defining broad domains.

We want to end this Editorial with a quote from Campbell and Fiske (1959), which in our opinion is as true today as it was in 1959:

The test constructor is asked to generate from his literary conception or private construct not one operational embodiment, but two or more, each as different in research vehicle as possible. Furthermore, he is asked to make explicit the distinction between his new variable and other variables, distinctions which are almost certainly implied in his literary definition. In his very first validational efforts, before he ever rushes into print, he is asked to apply the several methods and several traits jointly. His literary definition, his conception, is now best represented in what his independent measures of the trait hold distinctively in common. (p. 101)
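The distinction Campbell and Fiske draw between validity diagonals, heterotrait-monomethod triangles, and heterotrait-heteromethod triangles can be illustrated with a small sketch. The Python code below is only a toy illustration under stated assumptions: the two traits ("A", "B"), the two methods ("self", "peer"), all correlation values, and the helper names `classify` and `summarize` are hypothetical and do not come from the editorial or from Campbell and Fiske (1959). It sorts each correlation into its MTMM region and averages within each region.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy data: each measure is a (trait, method) pair.
# Names and numbers are illustrative assumptions, not results from the editorial.
measures = [("A", "self"), ("B", "self"), ("A", "peer"), ("B", "peer")]
corr = {
    (0, 1): 0.40,  # A-self vs. B-self
    (0, 2): 0.60,  # A-self vs. A-peer
    (0, 3): 0.15,  # A-self vs. B-peer
    (1, 2): 0.10,  # B-self vs. A-peer
    (1, 3): 0.55,  # B-self vs. B-peer
    (2, 3): 0.35,  # A-peer vs. B-peer
}


def classify(m1, m2):
    """Name the MTMM region that a pair of (trait, method) measures falls into."""
    same_trait = m1[0] == m2[0]
    same_method = m1[1] == m2[1]
    if same_trait and not same_method:
        return "validity diagonal"
    if not same_trait and same_method:
        return "heterotrait-monomethod"
    if not same_trait and not same_method:
        return "heterotrait-heteromethod"
    # Same trait and same method would be a reliability entry (absent in the toy data).
    return "reliability diagonal"


def summarize(measures, corr):
    """Average the correlations within each MTMM region."""
    groups = {}
    for i, j in combinations(range(len(measures)), 2):
        region = classify(measures[i], measures[j])
        groups.setdefault(region, []).append(corr[(i, j)])
    return {region: mean(values) for region, values in groups.items()}


summary = summarize(measures, corr)
for region, value in summary.items():
    print(f"{region}: {value:.3f}")
```

In this made-up example the validity diagonal exceeds both kinds of heterotrait correlations, which is the pattern one would hope to see. The gap between the heterotrait-monomethod and heterotrait-heteromethod averages gives a rough impression of shared method variance, although a formal analysis would rely on the modeling approaches cited above.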
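The network in Figure 1 was drawn with qgraph in R. As a rough Python analogue of only the thresholding step described there (suppressing associations below r = .35), the sketch below turns a small correlation matrix into a list of edges. Only the values .77 (the two Traditionalism scales) and .76 (Creativity and Innovation) are taken from the text. Every other correlation, the node labels, the function name `edges_above`, and the decision to threshold on absolute values are assumptions made for illustration.

```python
# Node labels loosely follow the editorial's examples (TR, CR, IN).
labels = ["TR_MPQ", "TR_JPI", "CR", "IN"]

# Symmetric correlation matrix. Only .77 and .76 are reported in the text;
# the remaining off-diagonal values are made-up placeholders.
r = [
    [1.00, 0.77, 0.10, 0.05],
    [0.77, 1.00, 0.20, 0.12],
    [0.10, 0.20, 1.00, 0.76],
    [0.05, 0.12, 0.76, 1.00],
]

THRESHOLD = 0.35  # associations weaker than this are suppressed


def edges_above(labels, r, threshold):
    """Return (node, node, r) edges whose absolute correlation meets the threshold."""
    edges = []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only, so each pair appears once
            if abs(r[i][j]) >= threshold:
                edges.append((labels[i], labels[j], r[i][j]))
    # Strongest associations first, mirroring the thicker edges in a network plot.
    return sorted(edges, key=lambda edge: -abs(edge[2]))


edges = edges_above(labels, r, THRESHOLD)
for a, b, weight in edges:
    print(f"{a} -- {b}: r = {weight:.2f}")
```

With these placeholder values only the same-label pair and the differently labeled pair survive the threshold, and they are nearly indistinguishable in strength. This is the jingle-jangle situation the editorial describes. A layout algorithm, such as the spring option mentioned in the figure caption, would then position the remaining nodes.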
References

Alonso-Arbiol, I., & van de Vijver, F. J. R. (2010). A historical analysis of the European Journal of Psychological Assessment. European Journal of Psychological Assessment, 26, 238–247. doi 10.1027/1015-5759/a000032

Brunswik, E. (1955). Representative design and probabilistic theory in a functional psychology. Psychological Review, 62, 193–217.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.

Eid, M., Lischetzke, T., Nussbeck, F. W., & Trierweiler, L. I. (2003). Separating trait effects from trait-specific method effects in multitrait-multimethod models: A multiple-indicator CT-C(M-1) model. Psychological Methods, 8, 38–60. doi 10.1037/1082-989x.8.1.38

Epskamp, S., Cramer, A. O. J., Waldorp, L. J., Schmittmann, V. D., & Borsboom, D. (2012). Qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 48, 1–18.

Goldberg, L. R. (2005). The Eugene-Springfield community sample: Information available from the research participants (Vol. 45, 1, Technical Report). Eugene, OR: Oregon Research Institute.

Miller, J. D., Gaughan, E. T., Maples, J., & Price, J. (2011). A comparison of agreeableness scores from the Big Five Inventory and the NEO PI-R: Consequences for the study of narcissism and psychopathy. Assessment, 18, 335–339. doi 10.1177/1073191111411671

Pace, V. L., & Brannick, M. T. (2010). How similar are personality scales of the same construct? A meta-analytic investigation. Personality and Individual Differences, 49(7), 669–676. doi 10.1016/j.paid.2010.06.014

Paulhus, D. L. (2002). Socially desirable responding: The evolution of a construct. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp. 49–69). Mahwah, NJ: Erlbaum.

Podsakoff, P. M., MacKenzie, S. B., Lee, J. Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879–903.

Rammstedt, B., & Kemper, C. J. (2011). Measurement equivalence of the Big Five: Shedding further light on potential causes of the educational bias. Journal of Research in Personality, 45, 121–125.

Schweizer, K. (2011). On the changing role of Cronbach's α in the evaluation of the quality of a measure. European Journal of Psychological Assessment, 27, 143–144. doi 10.1027/1015-5759/a000069

Schweizer, K. (2012). On issues of validity and especially on the misery of convergent validity. European Journal of Psychological Assessment, 28, 249–254. doi 10.1027/1015-5759/a000156

Wetzel, E., Carstensen, C. H., & Böhnke, J. R. (2013). Consistency of extreme response style and nonextreme response style across traits. Journal of Research in Personality, 47, 178–189. doi 10.1016/j.jrp.2012.10.010

Ziegler, M., & Bühner, M. (2009). Modeling socially desirable responding and its effects. Educational and Psychological Measurement, 69, 548–565.

Ziegler, M., MacCann, C., & Roberts, R. (Eds.). (2011). New perspectives on faking in personality assessments. New York: Oxford University Press.

Matthias Ziegler
Institut für Psychologie
Humboldt University Berlin
Rudower Chaussee 18
12489 Berlin
Germany
Tel. +49 30 2093-9447
Fax +49 30 2093-9361
E-mail zieglema@hu-berlin.de

Doreen Bensch
Institut für Psychologie
Humboldt University Berlin
Rudower Chaussee 18
12489 Berlin
Germany
Tel. +49 30 2093 9447
Fax +49 30 2093 9361
E-mail benschd@cms.hu-berlin.de

Tom Booth
Centre for Cognitive Ageing and Cognitive Epidemiology
Department of Psychology
The University of Edinburgh
Edinburgh EH8 9AD
United Kingdom
Tel. +44 131 650-8405
E-mail tom.booth@ed.ac.uk
European Journal of Psychological Assessment 2013; Vol. 29(3):162–170
© 2012 Hogrefe Publishing

Original Article
Framing Work Competencies Through Personality Traits
The Big Five Competencies Grid

Chiara Consiglio¹, Guido Alessandri¹, Laura Borgogni¹, and Ronald F. Piccolo²

¹Department of Psychology, Sapienza, University of Rome, Italy
²Rollins College, Winter Park, FL, USA
Abstract. The present study investigated the construct validity and reliability of the Big Five Competencies grid (BFC grid), an instrument
framed and validated within a well-established conceptual model, the Five Factor Model (FFM) of personality. A first sample of 1,307
employees (65% males) in a variety of job types completed the BFC grid and a list of Big Five prototypical adjectives. Exploratory
structural equation modeling attested a six-factor competencies structure corresponding to four broad traits (Extraversion, Agreeableness,
Emotional stability, Openness) and two facets of conscientiousness. The six scales showed adequate reliability and dominant associations
with the corresponding personality traits. In a second sample of 150 employees (87% males), the six-factor structure was confirmed
together with convergent validity between self- and other-ratings of work competencies. Implications and suggestions for future results
are discussed.
Keywords: competencies, ESEM, Big Five traits, BFC grid
DOI: 10.1027/1015-5759/a000139

In most modern organizational settings, which tend to be organic, fluid, and characterized by near constant change, descriptions for preferred employee behavior (i.e., job descriptions) have evolved to capture a broad range of diverse functional activities. Employees are expected to be flexible in their work activities, not limited in terms of their contributions to organizational functioning. Traditional human resource management (HRM) systems, however, apply a functional perspective that relies on context-specific skills based on traditional job-based management of activities, which often fail to adequately capture the broad array of skills and behaviors that employees bring to modern organizations. As an alternative, we assert that organizations benefit from employee-assessment systems based on work competencies, which are more dynamic, flexible, and generalizable than context-dependent functional skills (Schippmann et al., 2000). Competencies are expressed in behaviors that reflect how individuals perform successfully in a variety of circumstances, in order to capture skills that are transferable and more indicative of work activity in contemporary organizations. Competency models, by identifying the specific dimensions linked to successful performance applicable to different roles and situations (Bartram, 2005), provide a map of behaviors that are valued and rewarded within organizations.

A competency approach to evaluating personnel has become increasingly popular in the last several years, such that approximately 75–80% of companies use competency-based HRM systems (Becker, Huselid, & Ulrich, 2001). However, despite the great popularity and apparent utility of a competency-based approach, there are at least two main concerns with application and measurement of these models: (1) the lack of a widely accepted conceptual definition and interpretation of competency; and (2) the need for reliable and valid competency-based models and applications.

Over the years, a number of scholars have proposed varied definitions of the word competency (Spencer & Spencer, 1993), some that may or may not include different individual characteristics causally related to successful performance. Consequently, the majority of competency models used in organizations are expressed as a miscellanea of indistinct individual descriptors from varied constructs such as traits, motivations, abilities, skills, knowledge, and behaviors (Markus, Cooper-Thomas, & Allpress, 2005), whereas they should refer exclusively to observable work behaviors (Bartram, 2005). In addition, competency models are seldom validated, increasing the risk of bias and evaluation errors (Markus et al., 2005). Hence, practitioners have routinely called for reliable and validated competency tools so critical for the effectiveness of HRM programs.

The use of a competency-based approach implies the choice between specific and universal models. Many HR departments seem to prefer generic behavioral models which are simpler, user-friendly, and allow for comparisons across roles. However, most of these models lack a valid conceptual framework that might identify which competencies should be assessed and developed (Hayes, Rose-Quirie, & Allison, 2005; for a notable exception see Bartram, 2005).

In that vein, we believe one such conceptual framework is the Five-Factor Model of personality (FFM), which represents a universal and parsimonious model for personality description. The FFM may provide a relevant source of universal work competencies that is able to describe the main behavioral domains related to job performance. Since competencies are primarily predicated on traits (Bartram, 2005; Kanfer & Ackerman, 2005), we propose in the present study that a comprehensive and well-validated personality description model, namely the FFM, represents a consistent and sound frame of reference for rooting a competency model applicable across diverse roles and organizations.

Hence, the purpose of our study is fourfold:
1. To present the Big Five Competencies grid (BFC grid) for assessing work competencies. This grid is framed within a valid personality model (the FFM);
2. To conduct a preliminary test of the construct validity and reliability of the BFC grid;
3. To examine relationships between each of the five personality traits and the competencies assessed with the BFC grid;
4. To examine the convergent validity between self and other ratings of the BFC grid.

Traits and Competencies

Personality traits are pervasive and enduring styles of thought, emotional expression, and behavior that account for individual differences (McCrae & Costa, 1997). Traits represent stable individual proclivities acting as potentialities informing individual functioning. The FFM, created at the convergence of lexical and questionnaire approaches to the study of personality, represents a framework of individual differences that is extensively validated across languages and cultures (McCrae & Costa, 1997). The model provides a consensual, objective, quantifiable description of essential surface tendencies that capture the between-person uniqueness. Specifically, the FFM provides a "comprehensive, economical, comprehensible, and reasonably sufficient framework for organizing individual differences in personality disposition" (Caprara & Cervone, 2000, p. 72). The FFM includes Extraversion, Agreeableness, Emotional Stability, Conscientiousness, and Openness (see Caprara & Cervone, 2000, for a complete description of the FFM). Consistent with the original lexical hypothesis of personality description (Goldberg, 1992), the FFM was originally formed using an adjectival descriptors approach to measuring traits, which has demonstrated high levels of scale reliability and construct validity (Barbaranelli, Caprara, & Steca, 2002).

The FFM has been largely employed in organizational settings, and is widely regarded as occupationally relevant given its consistent relationship to effective work behaviors (e.g., Barrick & Mount, 1991). Conscientiousness is the most reliable predictor of job performance across occupations, organizations and countries, while different combinations of traits have yielded strong validities for specific occupations (Barrick, Mount, & Judge, 2001; Ones, Dilchert, Viswesvaran, & Judge, 2007). A similar organizational approach was taken by the NEO Job Profiler (Costa, McCrae, & Kay, 1995), which was designed to identify the different job requirements defined in terms of traits that contribute to performance in specific jobs.

In our view, broad personality dimensions may serve as a general frame to organize sets of effective work behaviors, namely generic competencies which may apply to all professions. If traits portray an individual's behavioral patterns in both work and nonwork related situations, then competencies, being rooted in traits (Kanfer & Ackerman, 2005; Spencer & Spencer, 1993), set the limits of how individual potentialities turn into effective organizational behaviors.

The Big Five Competencies Grid

Given that personality is the most important source of competent behavior, we used the FFM as the theoretical framework for generating the BFC grid. Because research attests that all five traits predict work performance across different professional groups (Ones et al., 2007), we assert that the basic five traits may represent relevant sources of five distinct sets of effective work behaviors, each one operationalized as a generic work competency, namely:
1. Proactivity (i.e., starting and rapidly carrying on one's own work also through fast decision making) related to Extraversion;
2. Teamwork (i.e., effectively cooperating with colleagues, supporting them and integrating the contribution of different people) related to Agreeableness;
3. Innovation (i.e., handling work problems and situations in innovative and creative manners) pertaining to Openness;
4. Emotion management (i.e., effectively facing work stress, and emotionally effortful situations, such as negative feedback or pressure on deadlines) related to Emotional Stability;
5. Accomplishment (i.e., organizing work activities and persisting over obstacles and impediments to reach goals) related to Conscientiousness.

The present article presents two studies. The first, conducted on a large sample of employees from different positions and organizational settings, explores the factorial validity of the hypothesized five competencies and their relationship with traits, measured by adjectives. We do so by testing two hypotheses:

Hypothesis 1: The five identified competencies (namely, Proactivity, Teamwork, Innovation, Emotion management, and Accomplishment) represent five different but correlated latent factors, hence each item will load on its corresponding factor.
Hypothesis 2: Each of the five competency scales is primarily related to the corresponding personality trait measured by prototypical adjectives.

The second study, carried out on an independent work sample, replicates the results of the first study concerning the factorial validity of the BFC grid and examines the convergent validity between self- and other-ratings (by colleagues) of work competencies. Competency tools are mainly used in multisource evaluation systems which involve different informants (e.g., the job incumbent, supervisor, colleagues). Hence, in order to test whether the BFC grid is a valid measure of both self- and other-ratings of competencies, we test the following hypothesis:

Hypothesis 3: Each competency of the BFC grid will demonstrate an adequate level of convergent validity between self and other ratings.

Study 1

Method

Participants and Procedure

The research involved a sample of 1,307 employees (35% females) from a cross-section of organizations (52% from the public sector). Job positions included a wide range of occupational roles (28% were clerks, 32% were professionals, 29% were blue-collar workers). The majority of these employees (44%) had worked between 7 and 10 years within the same organization. The employees completed the questionnaire anonymously. In order to reduce sequence effects, presentation of the measures of interest was counterbalanced such that half of the respondents had a survey with items from the BFC grid first and adjectives from the FFM second, while half had a survey with the FFM model first, BFC grid second. We measured the two scales using different items and anchors in order to reduce common method bias.

Measures

The BFC Grid

Initially, 90 items were derived from well-known existing competency models (Bartram, 2005; Spencer & Spencer, 1993) together with a list of competency repertoires used in performance appraisal systems of different organizations. Only observable work behaviors formulated in action terms were selected, worded in the first person. Based on content analysis, three independent judges assigned each item to one of the five competency categories, or to a sixth category if an item was deemed unfit for one of the five categories. The 40 items conjointly assigned by all three judges to one of the five competencies were selected to form the final version of the BFC grid comprising the following scales:
Proactivity, measured with 6 items (e.g., "At work I initiate, rather than wait for instructions");
Teamwork, measured by 8 items (e.g., "I help colleagues that are in troubles");
Innovation, measured by 7 items (e.g., "I adopt innovative approaches when faced with problems");
Emotion management, measured by 8 items (e.g., "I keep calm even in presence of a negative work climate");
Accomplishment, measured by 11 items (e.g., "I guarantee the accomplishment of the goals assigned to me").

Each item was evaluated on a frequency scale ranging from 1 (= never) to 7 (= always). This choice was made because behavioral frequency scales to assess competencies, as compared to other judgment scales, are considered more evidence based, acceptable, reliable and at the same time independent from the job content (Grote, 1996).

Personality

Each participant gave self-descriptions by responding to a list of 25 prototypical adjective markers of the FFM in the Italian lexicon (Barbaranelli et al., 2002), indicating the extent to which each adjective was characteristic of his or her personality (1 = not at all, 5 = very much). The list included five markers for each dimension: Extraversion (e.g., active, dynamic); Agreeableness (e.g., cordial, unselfish); Conscientiousness (e.g., reliable, scrupulous); Emotional Stability (e.g., level-headed, relaxed), and Openness (e.g., sharp, up-to-date). The α coefficients were .85, .70, .77, .81, and .83, respectively.

Statistical Analyses

When the number of factors is expected to be known but the specific pattern of salient and nonsalient loadings is not, the exclusive pursuit of strictly exploratory or strictly confirmatory models is discouraged (Bollen, 1989). Alternatively, partially confirmative models (Jöreskog & Sörbom, 1979) are recommended. Accordingly, using MPlus (version 5.2, Muthén & Muthén, 2008), we performed an exploratory structural equation modeling (ESEM) procedure, which provides parameter estimates, standard errors, and goodness-of-fit statistics typically associated with factor analysis. Unlike CFA, in which each indicator is forced to load on one factor and to have zero loadings on all other factors, ESEM estimates all possible factor loadings subject to constraints, allowing the model to be identified (Asparouhov & Muthén, 2009). This approach is preferred when (1) items pertaining to the same dimensions may differ broadly in content and yield significant loadings on secondary factors (as may be the case in this study), and when (2) a simple structure (i.e., solution with the highest number of unambiguous marker variables per factor and the fewest number of variables with meaningful loadings, > .40, on more than one factor), rather than an independent clusters model (ICM) (i.e., items have significant loadings on a factor and zero loading on all other factors), is hypothesized.

As we expected positive relationships among different competencies, we used an oblique Geomin rotation with ε = .05 as recommended by Asparouhov and Muthén (2009). Because multivariate normality cannot be retained in our data (skewness normalized estimate = 74.98, p < .001; kurtosis normalized estimate = 1011.20, p < .001), we used a Robust Maximum likelihood estimator (MLR), which takes the nonnormal distribution of the data into account. For the evaluations of covariance structure models, we used the χ² goodness of fit supplemented by RMSEA, SRMR, and comparative fit indexes (CFI). Given that few applications of ESEM have systematically addressed the appropriateness of traditional indices of model fit (Marsh et al., 2009), in the absence of well-established standards, we evaluated models utilizing the same recommended standards for traditional structural equation model analysis (e.g., RMSEA < .08, SRMR < .06, CFI > .90, see Kline, 2005). With the aim of reliably comparing the alternative models, we adopted Akaike's information index (AIC), which takes into account not only model fit but also model parsimony (Akaike, 1974): The lower the AIC index, the better the goodness of fit. Moreover, in the evaluation of our model's fit, we also considered: (1) parameter estimates in relation to a priori hypotheses, (2) findings from previous research, (3) generative theory, and (4) common sense, as recommended by Marsh and colleagues (2009).

We also computed factor determinacies (i.e., correlations between item score estimates and their respective factors). These are crucial in the evaluation of factor analytic findings. A highly indeterminate factor can produce radically different factor scores that are nonetheless equally consistent with the obtained factor loadings. Relations among the FFM personality traits and competencies were examined using zero-order and partial correlations (i.e., by computing the partial correlation between each competence and its related trait, controlling for the other four traits).

Results

Means (M), standard deviations (SD), skewness, and kurtosis were calculated for each of the 40 items in this study. Means of the items ranged from 4.76 (item 5) to 6.16 (item 31), with an overall item mean (M) of 5.21 and standard deviation (SD) of 0.57. The scores revealed a reasonably normal distribution according to skewness 0.83 (SD = 0.32), and kurtosis 0.69 (SD = 1.01). With the exception of three items (19, 25, and 36), none of the items had skewness or kurtosis greater than 1.

ESEM Analysis¹

The five-factor solution showed a good data fit, χ²(556) = 1592.35, p < .01, CFI = .918, RMSEA = .042 (90% CI = .040–.045), SRMR = .029, AIC = 123300.96, and accounted for 42.17% of variance. An inspection of the rotated matrix revealed that the fifth factor was comprised of only four of the 11 hypothesized items for Accomplishment (the remaining seven were sparse across factors). Thus, we proceeded further in extracting a six-factor solution, which showed an adequate fit to the data, χ²(522) = 1327.70, p < .01, CFI = .936, RMSEA = .039 (90% CI = .036–.041), SRMR = .025, AIC = 123104.31, and accounted for 54% of the variance. This solution provided a better fit than the five-factor solution, Δχ²(34) = 264.65, p < .01, and showed a lower AIC². Each factor was meaningfully and readily interpretable, with the Accomplishment items (6 and 5, respectively) loading on two different factors (Table 1).

The first factor (27% variance explained; labeled Process Management) was comprised of six items from the original Accomplishment scale. These six items referred to behaviors concerning planning, organization and timeliness of the work processes (e.g., "In my job, I regularly organize my activities"). The second factor (7% variance explained) was comprised of the six items aimed at measuring Proactivity; the third factor (5% variance explained) of the eight items aimed at measuring Emotion Management; the fourth factor (4% variance explained) of the eight items aimed at measuring Teamwork; the fifth factor (4% variance explained) of the seven items aimed at measuring Innovation. Finally, the sixth factor (3% variance explained) was comprised of the five Accomplishment items that referred to behaviors concerning goal attainment through effort and persistence (e.g., "I guarantee the accomplishment of the goals assigned").

All primary loadings were greater than .40 (M = .50; SD = 0.09); all secondary loadings were lower than .29 (M = .04, SD = 0.08)³. Correlations between factors
1 All models tested differed only by the number of factors they implied. All were congeneric models, with no correlations among uniquenesses allowed.
2 As the number of factors in the final model was different from what was hypothesized, we performed a parallel analysis (Horn, 1965) in order to further corroborate this unexpected result. Only six eigenvalues of the unreduced and unrotated correlation matrix (i.e., 10.69, 2.34, 1.62, 1.45, 1.35, 1.24, 1.05, 1.03, .95, .92, .88, .85, .80, .77, .76) were greater than the corresponding eigenvalues (at the 95th percentile) obtained from random data (i.e., 1.38, 1.34, 1.28, 1.25, 1.23, 1.21, 1.19, 1.18, 1.16, 1.14, 1.13, 1.11, 1.10, 1.08, 1.08). Accordingly, the extraction of six factors seemed justified.
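The retention rule behind this parallel analysis can be sketched in a few lines. The snippet below is only an illustration: the two lists hold the eigenvalues reported in the footnote, while the function name `factors_to_retain` is a hypothetical helper, and the step that generates the random-data eigenvalues is not reproduced here.

```python
# Eigenvalues as reported in footnote 2 (only the first 15 are listed there).
observed = [10.69, 2.34, 1.62, 1.45, 1.35, 1.24, 1.05, 1.03,
            0.95, 0.92, 0.88, 0.85, 0.80, 0.77, 0.76]
random_95th = [1.38, 1.34, 1.28, 1.25, 1.23, 1.21, 1.19, 1.18,
               1.16, 1.14, 1.13, 1.11, 1.10, 1.08, 1.08]


def factors_to_retain(observed_eigs, random_eigs):
    """Count leading observed eigenvalues that exceed the random-data ones.

    Hypothetical helper: counting stops at the first eigenvalue that does
    not exceed its random-data counterpart.
    """
    retained = 0
    for obs, rnd in zip(observed_eigs, random_eigs):
        if obs <= rnd:
            break
        retained += 1
    return retained


n_factors = factors_to_retain(observed, random_95th)
print(n_factors)  # 6
```

With the reported values, the seventh observed eigenvalue (1.05) is the first to fall below its random-data counterpart (1.19), which is why six factors are retained.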
3 Following a recommendation by Asparouhov and Muthén (2009), we also performed a Target rotation. The results further corroborated
those presented in Table 1. When target-rotation was used, all items showed a substantial loading on the intended factors of the six-factor
Table 1. The ESEM solution for the best fitting model

            F1      F2      F3      F4      F5      F6
Process management
Item1     .62**   .10*    .06     .00     .07     .07
Item2     .55**   .06     .05     .09     .19*    .05
Item3     .41**   .15*    .17     .00     .02     .07
Item4     .40**   .18*    .08     .03     .11*    .05
Item5     .52**   .02     .02     .04     .23*    .11*
Item6     .51**   .04     .04     .05     .11*    .06
Proactivity
Item7     .15**   .43**   .02     .13**   .05     .01
Item8     .07     .46**   .06     .05     .18**   .11*
Item9     .05     .66**   .09     .12**   .02     .13**
Item10    .08     .55**   .00     .01     .01     .05
Item11    .17**   .59**   .03     .03     .06     .11*
Item12    .05     .76**   .12*    .02     .03     .09
Emotion management
Item13    .06     .16**   .11*    .42**   .08     .04
Item14    .03     .07     .15**   .44*    .09     .00
Item15    .12**   .02     .18**   .40**   .04     .01
Item16    .05     .05     .14**   .61**   .04     .01
Item17    .01     .02     .04     .49**   .06     .26**
Item18    .04     .28**   .01     .49**   .00     .11*
Item19    .03     .11*    .19**   .45**   .03     .13**
Item20    .04     .05     .00     .42**   .19**   .01
Teamwork
Item21    .02     .15**   .46**   .04     .03     .21**
Item22    .01     .08     .49**   .00     .03     .26**
Item23    .09     .01     .44**   .10*    .05     .14**
Item24    .02     .11*    .68**   .06     .03     .04
Item25    .02     .01     .54**   .06     .22**   .01
Item26    .10*    .04     .42**   .08     .08     .21**
Item27    .05     .04     .41**   .21**   .12**   .23**
Item28    .03     .00     .41**   .15**   .04     .18**
Innovation
Item29    .04     .08     .04     .00     .69**   .03
Item30    .12**   .00     .01     .01     .52**   .10*
Item31    .00     .26**   .01     .02     .56**   .03
Item32    .01     .08     .05     .16**   .52**   .09
Item33    .07     .04     .01     .05     .46**   .07
Item34    .10*    .13**   .05     .14**   .42**   .09
Item35    .09     .17     .07     .01     .49**   .15**
Accomplishment
Item36    .20**   .04     .07     .08     .05     .57**
Item37    .13**   .06     .03     .03     .03     .45**
Item38    .12*    .00     .06     .03     .01     .48**
Item39    .06     .11*    .15**   .04     .03     .43**
Item40    .06     .11*    .15**   .04     .03     .43**

Table 1 (continued)

                          1       2       3       4       5       6
1. Process management   (.74)
2. Proactivity          .49**   (.71)
3. Emotion management   .49**   .35**   (.79)
4. Teamwork             .41**   .40**   .33**   (.78)
5. Innovation           .36**   .51**   .28**   .37**   (.76)
6. Accomplishment       .25**   .26**   .21**   .10*    .24**   (.76)

Notes. *p < .05, **p < .01. Primary loadings are in bold. Reliabilities are on the diagonal within brackets.
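As an illustration of how scale reliabilities like those on the diagonal might be obtained, here is a minimal sketch of Cronbach's alpha for a unit-weighted scale. It assumes the reported reliabilities are alpha coefficients, which this table does not state explicitly; the function name `cronbach_alpha` and the response data are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale.

    item_scores: one list per item, each holding one score per respondent
    (all lists in the same respondent order).
    """
    k = len(item_scores)
    n_respondents = len(item_scores[0])
    # Unit-weighted total score of each respondent.
    totals = [sum(item[i] for item in item_scores) for i in range(n_respondents)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of five respondents to three items on a 1-7 scale.
items = [
    [5, 6, 4, 7, 5],
    [5, 7, 4, 6, 5],
    [6, 6, 3, 7, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))  # 0.91
```

The made-up items are highly consistent by construction, so the resulting value is higher than the .71 to .79 range shown in the table.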
Table 2. Correlations between competencies and the Big Five personality traits

         PM             EM             TM             IN             AC             PR
      r      rp      r      rp      r      rp      r      rp      r      rp      r      rp
A   .21**  .08*    .29**  .06     .63**  .37*    .18**  .01     .22**  .09*    .06    .02
E   .36**  .10*    .26**  .14**   .21**  .08*    .27**  .19*    .31**  .10*    .38**  .18**
C   .60**  .49**   .29**  .10*    .31**  .16*    .28**  .02     .53**  .43**   .23**  .11**
S   .08    .06     .45**  .26**   .21**  .02     .11**  .03     .06    .09*    .03    .04
O   .02    .05     .04    .11**   .07**  .01     .59**  .35**   .01    .04     .06    .08

Notes. r = Pearson zero-order correlation; rp = partial correlation. Numbers in bold indicate hypothesized competency–Big Five relationships.
A = Agreeableness; E = Energy; C = Conscientiousness; S = Emotional stability; O = Openness; PM = Process Management; EM = Emotion
Management; TM = Teamwork; IN = Innovation; AC = Accomplishment; PR = Proactivity.
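The difference between the r and rp columns can be illustrated with the standard recursive formula for partial correlations. The sketch below is not the authors' procedure: the function `partial_corr` is a hypothetical helper, the two zero-order values (.63 and .21) are taken from the table, and the trait intercorrelation of .30 is an invented placeholder because the correlations among the traits are not reported here. For that reason the table's rp values cannot be reproduced from this example.

```python
from math import sqrt


def partial_corr(r, x, y, controls):
    """Partial correlation between x and y, controlling for `controls`.

    r maps frozenset({a, b}) to the zero-order Pearson correlation of a and b.
    Control variables are removed one at a time with the recursive formula
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    if not controls:
        return r[frozenset((x, y))]
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(r, x, y, rest)
    r_xz = partial_corr(r, x, z, rest)
    r_yz = partial_corr(r, y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


corr = {
    frozenset(("TM", "A")): 0.63,  # Teamwork with Agreeableness (Table 2)
    frozenset(("TM", "E")): 0.21,  # Teamwork with Energy (Table 2)
    frozenset(("A", "E")): 0.30,   # hypothetical value, not reported
}
rp_tm_a = partial_corr(corr, "TM", "A", ["E"])
print(round(rp_tm_a, 2))  # 0.61
```

Controlling for all four remaining traits, as in the study, would only require passing a longer `controls` list together with the full correlation matrix.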
ranged from .08 to .51 (M = .34, SD = 0.13), indicating a good degree of discrimination among the extracted factors⁴. Moreover, all items showed a satisfying degree of factor simplicity (e.g., the ratio between the percentage of variance shared with its factor and the percentage of variance shared with all other factors). Factor score determinacies proved to be adequately high (from .80 to .93) as were reliabilities for the unit weighted scales (see Table 1). Finally, according to the Harman (1967) single factor test, we extracted a one-factor solution to determine whether the majority of the item variances-covariances can be accounted for by one general factor that captures the influence of method bias on observed item covariances. This solution explained 24% of variance and showed a poor data fit: χ²(702) = 3988.10, p < .01, CFI = .739, RMSEA = .067 (90% CI = .065–.069), SRMR = .059.

Relationships with the Big Five

All six factors of the competencies scale were strongly associated with individuals' personality traits (Table 2). According to partial correlations, Teamwork was primarily associated with Agreeableness, Proactivity was primarily associated with Extraversion, Process Management and Accomplishment were both strongly associated with Conscientiousness, and Emotion Management had dominant correlations with Emotional Stability. Finally, Innovation was primarily related to Openness. We further investigated relationships between competencies and personality traits using ESEM. First, we established a measurement model for the BFC grid and the Big Five traits. In particular, the five traits were modeled as single-indicator latent variables. Then, this measurement model for the five traits was combined with the six-factor model of the BFC grid. In this measurement model, we freely estimated the correlations between latent variables (i.e., between all six competencies and all the traits). This model showed an adequate fit, χ²(687) = 1706.56, p < .01, CFI = .928, RMSEA = .038 (90% CI = .036–.040), SRMR = .026. Then, the correlation of each competency with the conceptually most closely related trait was fixed to one. Fixing the correlation between (1) Process Management and Conscientiousness (Δχ²(1) = 220.61, p < .01); (2) Emotion Management and Emotional Stability (Δχ²(1) = 401.61, p < .01); (3) Teamwork and Agreeableness (Δχ²(1) = 200.61, p < .01); (4) Innovation and Openness (Δχ²(1) = 196.51, p < .01); (5) Accomplishment and Conscientiousness (Δχ²(1) = 356.61, p < .01); and (6) Proactivity and Extraversion (Δχ²(1) = 391.03, p < .01) resulted in a significant decrease in fit. Accordingly, competencies can be considered empirically distinct from personality.
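The nested-model comparisons reported in this study (the five- versus six-factor solutions, and the models with one competency–trait correlation fixed to one) rest on a chi-square difference test. The following sketch shows the arithmetic using the fit statistics reported for Study 1. It takes the reported values at face value: with a robust estimator such as MLR a scaled difference test is normally required, and the scaling factors are not reported here, so this is an unscaled approximation. The helper names `chi2_sf` and `chi2_difference` are hypothetical.

```python
from math import erfc, exp, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate.

    Closed forms are used, so only df == 1 or even df are supported.
    """
    if df == 1:
        return erfc(sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        # Accumulates sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return exp(-half) * total
    raise ValueError("only df == 1 or even df are supported in this sketch")


def chi2_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Unscaled chi-square difference test between two nested models."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


# Five-factor (more restricted) versus six-factor solution in Study 1.
d_chi2, d_df, p = chi2_difference(1592.35, 556, 1327.70, 522)
print(round(d_chi2, 2), d_df, p < 0.01)  # 264.65 34 True

# One of the single-df tests: Innovation and Openness fixed to one.
print(chi2_sf(196.51, 1) < 0.01)  # True
```

The difference of 264.65 on 34 degrees of freedom matches the value reported for the comparison of the five- and six-factor solutions.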
solution (M = .51; SD = 0.07). Furthermore, we found a fully comparable pattern of small secondary loadings (M = .03, SD = 0.09).
4 To better understand the functioning of the instrument, we compared the interscale correlations with the interfactor correlations by computing individuals' mean scores for each dimension. In general, the factor correlations were lower (M = .07; SD = 0.04) than the observed score correlations.
Figure 1. CTCM model of self- and coworker-evaluated competencies. Note. All coefficients are significant at p < .05,
except those in italics. Self = self-evaluations; COW = coworker evaluations.
Study 2

Participants and Procedure

The participants of this study were 150 employees working for a national mail-delivery company. The mean age of participants was 34.62 (SD = 9.99); 87% were men. The majority (64%) had worked between 10 and 15 years in the same organization. For each participant, we obtained a peer rating from one of his/her colleagues on the dimensions of interest. Workers were separated to avoid possible data manipulations (e.g., impression management, socially desirable responding).

Measures

Competencies were assessed using the same 40 items of Study 1. Reliability coefficients for each of the 6 subscales were adequate, with αs ranging from .75 (Proactivity) to .85 (Teamwork). These same items were worded in the third person for the coworker measure of participants' competencies. Again, reliability was good, with αs ranging from .79 (Accomplishment) to .92 (Emotion management).

Results

As a preliminary step, we ran two ESEM analyses, one for self-ratings and one for coworker evaluations of competencies. With regard to self-rated competencies, goodness of fit indexes attested a good fit for the six-factor solution, χ²(522) = 890.88, p < .01, CFI = .943, RMSEA = .036 (90% CI = .026–.071), SRMR = .029. All items showed significant primary loadings (M = .55; SD = 0.05) on their intended factors, whereas the secondary loadings were generally low (M = .04, SD = 0.08). Similarly, the ESEM six-factor solution provided a good fit to the data for coworker-rated competencies, χ²(522) = 771.90, p < .01, CFI = .951, RMSEA = .031 (90% CI = .023–.081), SRMR = .022⁵. Primary loadings were high (M = .53; SD = 0.06), and secondary loadings were small (M = .05, SD = 0.11).

To evaluate interrater agreement, we applied the correlated traits correlated methods (CTCM) approach using confirmatory factor analysis (CFA) and the estimated factor scores for self- and coworker-rated competencies⁶. This framework treats each informant (self and coworker) as a different method in order to decompose the observed score variance into two orthogonal components: the method (unsubstantive or nuisance) variance, and the construct (or substantive) variance. It also allows the latent method factors to be correlated across informants. This model fit our data well, χ²(27) = 21.42, p = .77, CFI = 1.00, RMSEA = .00 (.00,
5 The five-factor solution was also tested for the self-ratings (χ²(556) = 990.10, p < .01, CFI = .891, RMSEA = .041 (90% CI = .043–.111), SRMR = .038) and coworker ratings (χ²(556) = 891.10, p < .01, CFI = .902, RMSEA = .039 (90% CI = .029–.093), SRMR = .032) of the BFC grid. In both cases the six-factor solution fitted better, as revealed by a significant partial χ² test for both self-ratings, Δχ²(34) = 99.22, p < .01, and coworker ratings of the BFC grid, Δχ²(34) = 99.0, p < .01.
6 The correlation matrix for this analysis may be obtained upon request from the corresponding author.
.052), SRMR = .059. Standardized parameter estimates are presented in Figure 1. Factor loadings were all statistically significant.

Overall, convergent validity was found for all competencies. Moreover, all competencies were substantively correlated. Higher latent correlations were observed between Teamwork and Process Management, Teamwork and Emotion Management, Emotion Management, and Proactivity.

Discussion and Implications

Results support the factorial validity of the BFC grid, providing evidence of a six-competency model, namely Accomplishment, Process Management, Innovation, Emotion Management, Teamwork, and Proactivity. Contrary to our expectations, six and not five competencies emerged from the ESEM analysis, with two primary and differentiated factors associated with conscientiousness. Process Management is likely associated with the dependability facet of conscientiousness, defined as the tendency to be reliable, meet attendance standards and deadlines, avoid risks, and refrain from impulsiveness; such aspects correspond to the inhibitory aspects of conscientiousness (Moon, 2001). In contrast, Accomplishment is likely derived from the active aspect of conscientiousness, ascribable to the achievement-orientation facet, defined as the tendency to set challenging goals, work diligently, and strive to improve performance. It is well known that conscientiousness is the most important and generalizable predictor of job performance (Barrick et al., 2001). Moreover, among its facets, dependability and achievement orientation have been identified as the two dimensions more closely associated with job performance (Ones et al., 2007).

A satisfactory level of internal consistency was observed for the model as each item was highly correlated with its respective scale. As predicted, personality was strongly correlated with perceived competencies. Specifically, in Study 1, we were able to identify strong convergence between each individual competency and the hypothesized FFM dimension, thus providing support for the theoretical associations between each competency and the corresponding trait. Moreover, the distinction between each of the six competencies and its corresponding personality factors was also demonstrated. In Study 2, we confirmed the psychometric properties of the BFC grid and its convergent validity for the two different informants (self and peer). Hence, competencies and traits represent different but related constructs.

Taken together, these results attest to the high psychometric validity of this instrument offering evidence for in-

(FFM) that tap different aspects of performance. Our purpose was to test a competency model of generic work behaviors, in order to focus on general areas that can be relevant across different jobs (clerks, blue-collar workers, and professionals), across private and public organizations, and across different raters. In that vein, we assert that the BFC grid can be used in performance appraisal systems to evaluate different occupational roles, covering all personality areas relevant for performance, helping to reduce evaluation errors, and with all scales operationalized at the behavioral level. Moreover, the BFC grid offers the opportunity to integrate the performance appraisal system with other tools used in HR processes, such as selection and potential evaluation, all framed within the same conceptual model, namely the FFM. Lastly, the attested convergent validity among different sources of evaluation can be considered an encouraging result given the fact that one of the possible uses of the BFC grid is related to multisource performance appraisal systems and 360° feedback.

Concerning the limitations of this study, a potential weakness is that, in the first study, traits and competencies were assessed by self-report, which exposes the observed results to social desirability. As all items are positively keyed and competency behaviors are all socially desirable, people could have altered their responses regarding the occurrence of effective work behaviors. Even if it has been demonstrated that social desirability does not destroy the construct validity of five-factor tools (Ones & Viswesvaran, 1998), future research should add to the BFC grid a social desirability scale in order to account for the potential risk of positive distortion. That being said, our second study overcomes this limitation by integrating self-assessment with evaluations of competencies made by other raters. Another limitation is related to the lack of an objective measure of performance. However, all the BFC items were derived from well-known competency models and from existing performance appraisal systems of different organizations. Moreover, some authors argue that other-rated work behaviors may constitute an external performance criterion depending on the reliability of the colleagues. In fact, colleagues' ratings are routinely used to obtain evaluations of contextual performance (Nielsen, Hrivnak, & Shaw, 2009) and of job performance (Barrick et al., 2001; Piccolo, Greenbaum, den Hartog, & Folger, 2010). Future studies should explore the specific contribution and incremental validity of each competency in predicting objective performance criteria beyond traits. Moreover, the role of alternative predictors of job performance, such as mental ability, and its relationship with the six competencies should also be explored. Finally, the results of Study 1 revealed the existence of some low to moderate secondary loadings in our model; hence factor scores
ternal consistency, factorial validity and convergent valid- would yield a more accurate representation of the con-
ity in relation to traits and sources of evaluations. structs of the BFC grid. In this regard, future research
One direct implication of this study is the development should further evaluate the validity of factor scores as com-
of a valid and reliable tool able to assess relevant work pared to unit weighted scores of competencies in predicting
behaviors (competencies) related to personality traits job performance.
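The contrast between unit-weighted scores and scores that take secondary loadings into account can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the loading matrix, the respondent's standardized item values, and the function names are hypothetical and not taken from the BFC data, and the loading-weighted composite shown here is only a coarse approximation, not the model-based factor scores an ESEM program would estimate.

```python
# Hypothetical standardized loadings: rows are items, columns are two competencies.
LOADINGS = [
    [0.70, 0.05],
    [0.65, 0.25],  # secondary loading on the second competency
    [0.10, 0.75],
    [0.30, 0.60],  # secondary loading on the first competency
]


def unit_weighted_scores(z, loadings):
    """Average the items assigned to each factor by their primary loading."""
    n_factors = len(loadings[0])
    sums, counts = [0.0] * n_factors, [0] * n_factors
    for z_i, row in zip(z, loadings):
        # Each item counts only for the factor on which it loads highest.
        k = max(range(n_factors), key=lambda f: abs(row[f]))
        sums[k] += z_i
        counts[k] += 1
    return [s / c for s, c in zip(sums, counts)]


def loading_weighted_scores(z, loadings):
    """Coarse weighted-sum scores: every item contributes to every factor,
    weighted by its loading and rescaled by the sum of absolute weights."""
    n_factors = len(loadings[0])
    scores = []
    for k in range(n_factors):
        weights = [row[k] for row in loadings]
        total = sum(w * z_i for w, z_i in zip(weights, z))
        scores.append(total / sum(abs(w) for w in weights))
    return scores


# Hypothetical standardized item responses of one respondent.
z_items = [1.0, 0.5, -0.5, 0.0]
print(unit_weighted_scores(z_items, LOADINGS))
print(loading_weighted_scores(z_items, LOADINGS))
```

With secondary loadings present, the two scoring rules give different values for the same respondent, which is the difference the validity comparison suggested above would have to evaluate.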
In conclusion, organizations would benefit from including the BFC grid in their performance appraisal systems in order to assess and develop generic work competencies with a theoretically sound instrument that has applicability to a variety of jobs and organizations.
References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397–438.
Barbaranelli, C., Caprara, G. V., & Steca, P. (2002). BFA. Big Five Adjectives, manual. Firenze: Giunti OS.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1–26.
Barrick, M. R., Mount, M. K., & Judge, T. A. (2001). Personality and job performance at the beginning of the new millennium: What do we know and where do we go next? International Journal of Selection and Assessment, 9, 9–30.
Bartram, D. (2005). The great eight competencies: A criterion-centric approach to validation. Journal of Applied Psychology, 90, 1185–1203.
Becker, B. E., Huselid, M. A., & Ulrich, D. (2001). The HR Scorecard: Linking people, strategy, and performance. Boston, MA: Harvard Business School Press.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Caprara, G. V., & Cervone, D. (2000). Personality. Determinants, dynamics, and potentials. Cambridge, UK: Cambridge University Press.
Costa, P. T., McCrae, R. R., & Kay, G. (1995). Persons, places, and personality: Career assessment using the revised NEO personality inventory. Journal of Career Assessment, 3, 123–139.
Goldberg, L. R. (1992). The development of the markers of the big five factor structure. Psychological Assessment, 4, 26–42.
Grote, R. C. (1996). The complete guide to performance appraisal. New York: Amacom.
Harman, H. H. (1967). Modern factor analysis. Chicago, IL: University of Chicago Press.
Hayes, J., Rose-Quirie, A., & Allison, C. V. (2005). Senior managers' perceptions of the competencies they require for effective performance; implications for training and development. Personnel Review, 29, 92–105.
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179–185.
Jöreskog, K. G., & Sörbom, D. (1979). Advances in factor analysis and structural equation models. Cambridge, MA: Abt Books.
Kanfer, R., & Ackerman, P. L. (2005). Work competence: A person-oriented perspective. In A. J. Elliot & C. S. Dweck (Eds.), Handbook of competence and motivation (pp. 336–353). New York: Guilford.
Kline, R. B. (2005). Principles and practices of structural equation modeling. New York: Guilford.
Markus, L. H., Cooper-Thomas, H. D., & Allpress, K. N. (2005). Confounded by Competencies. An evaluation and use of competency models. New Zealand Journal of Psychology, 34, 117–126.
Marsh, H. W., Muthén, B., Asparouhov, T., Lüdtke, O., Robitzsch, A., Morin, A. J. S., & Trautwein, U. (2009). Exploratory structural equation modeling, integrating CFA and EFA: Application to students' evaluations of university teaching. Structural Equation Modeling, 16, 439–476.
McCrae, R. R., & Costa, P. T. (1997). Personality trait structure as a human universal. American Psychologist, 52, 509–516.
Moon, H. (2001). The two faces of conscientiousness: Duty and achievement striving in escalation of commitment dilemmas. Journal of Applied Psychology, 86, 533–540.
Muthén, L. K., & Muthén, B. (2008). MPlus user's guide. Los Angeles, CA: Muthén & Muthén.
Nielsen, T. M., Hrivnak, G. A., & Shaw, M. (2009). Organizational citizenship behavior and performance: A meta-analysis of group-level research. Small Group Research, 40, 555–577.
Ones, D. S., Dilchert, S., Viswesvaran, C., & Judge, T. A. (2007). In support of personality assessment in organizational settings. Personnel Psychology, 60, 995–1027.
Ones, D. S., & Viswesvaran, C. (1998). The effects of social desirability and faking on personality and integrity assessment for personnel selection. Human Performance, 11, 245–269.
Piccolo, R. F., Greenbaum, R., den Hartog, D., & Folger, R. (2010). The relationship between ethical leadership and core job characteristics. Journal of Organizational Behavior, 31, 259–278.
Schippmann, J. S., Ash, R. A., Battista, M. A., Carr, L., Eyde, L. D., Hesketh, B., . . . Sanchez, J. I. (2000). The practice of competency modeling. Personnel Psychology, 53, 703–740.
Spencer, L. M., & Spencer, S. M. (1993). Competence at work: Models for superior performance. New York: Wiley.
Published online: May 11, 2012
Chiara Consiglio
Department of Psychology
Sapienza, University of Rome
Via dei Marsi 78
00185 Rome
Italy
Tel./Fax +39 64991-7723
E-mail chiara.consiglio@uniroma1.it
T. G. Baudson & F. Preckel: T(H)INK: Development and Validation of an Intelligence Test
European Journal of Psychological Assessment 2013; Vol. 29(3):171–181
© 2012 Hogrefe Publishing
Original Article
Development and Validation of the German Test for (Highly) Intelligent Kids – T(H)INK
Tanja Gabriele Baudson and Franzis Preckel
Department of Psychology, Giftedness Research and Education, University of Trier, Trier, Germany
Abstract. Background: Currently, no recently normed group intelligence test for primary school children is available in Germany to cover verbal, numerical, and figural reasoning abilities. Furthermore, no test for this age group is difficult enough to be able to differentiate especially at the upper end of the intelligence distribution. Aims: The Test for (Highly) Intelligent Kids – T(H)INK was developed to fill this gap. Methods: A total of 1,629 children from grades 1 to 3 of primary school took the 36-item test (one version per class level). Results: At all three class levels, the test proved sufficiently reliable (Cronbach's α = .73–.80 for the full scale). Retest reliabilities after 1 year, assessed for a random subsample, ranged from .61 to .73. Factorial validity was established by exploratory and confirmatory factor analyses, with a g factor explaining 57.61–63.67% of the variance. Furthermore, convergent validity (correlations with a nonverbal intelligence test, teacher and parent ratings, grades, and the probability for the child to be nominated for a gifted and talented program) was satisfying, as was the predictive validity for average marks a year later. Discussion: Implications of the findings and ideas for future research and development of the test are discussed.
Keywords: intelligence, intelligence test, cognitive ability, cognitive assessment, primary school, giftedness
Although the conception of cognitive ability and its development is still a matter of debate, intelligence can be considered one of the best predictors of academic and professional attainment, explaining a larger amount of variance than all other predictors (e.g., Gottfredson, 1997; Neisser et al., 1996). It can be assessed objectively, reliably, and validly by psychometric tests.1 In children, intelligence tests have proved their usefulness in developmental and educational counseling and interventions (e.g., identification of underachievement or specific learning disabilities). Anastasi and Urbina (1997) outlined the advantages of individual testing (e.g., closer rapport between child and administrator, less boredom or frustration due to more flexible and adaptive procedures). These issues are of particular importance when testing children (e.g., Flanagan & Kaufman, 2009). However, group intelligence tests have specific advantages: They allow for an economic assessment of children's ability level (e.g., Ford & Dahinten, 2005) and are therefore highly suitable for research or screening purposes. In general, the choice of the assessment method depends on its purpose.
In Germany, no recently normed group intelligence test for primary school children comprising crucial content domains is currently available. Furthermore, no test for this age group is difficult enough to allow for differentiation at the right tail of the intelligence distribution. The newly developed and validated group intelligence test presented here aims to fill this gap.
Structure and Development of Children's Intelligence
The early observation that different cognitive abilities tend to correlate (Spearman, 1904) led to the now well-supported assumption of an underlying general cognitive ability, or g (Carroll, 1993). g has consistently been found to account for the greatest share of interindividual differences in cognitive ability (Jensen, 1998). As to intellectual development, investment theory (Cattell, 1963; revision Schweizer & Koch, 2001) posits that, during childhood and adolescence, fluid intelligence (Gf, the innate ability to solve novel problems, e.g., by drawing inferences) is invested into the acquisition of crystallized intelligence (Gc, the
1 Other factors influencing success (e.g., other ability domains, or person or environmental factors moderating the relationship between ability
and its successful actualization, in itself a fuzzy term), have been addressed, for instance, in the Munich Model of Giftedness (e.g., Heller
& Perleth, 2008), which conceives of ability as a multidimensional construct requiring beneficial person and environmental conditions to
fully develop. (The authors thank an anonymous reviewer for bringing forward this complementary aspect.)
DOI: 10.1027/1015-5759/a000142
knowledge that has been gained, e.g., Horn, 1991). Because children have had fewer opportunities to acquire knowledge and because "schools tend to standardize the growth of the knowledge base during the time of schooling" (Schweizer & Koch, 2001, p. 78), Gf can be assumed to be the best estimate for children's g.2 Both Gf-Gc and g factor theory can be subsumed under the widely acknowledged CHC model of intelligence (McGrew, 2005).
Measurement of Fluid Intelligence in Children
To assess Gf in children, one must take verbal (VA), numerical (NA), and figural abilities (FA) into account (Marshalek, Lohman, & Snow, 1983); especially reasoning in the three domains is the best indicator of Gf and g (Lohman, 2002). VA and NA are valid predictors of scholastic and professional achievement (Wai, Lubinski, & Benbow, 2005), whereas FA are far less important in curriculum development (Wai, Lubinski, & Benbow, 2009) and, hence, to academic success. However, uneven ability profiles in favor of FA have even been reported to be detrimental (Smedler & Törestad, 1996) or related to underachievement (Gohm, Humphreys, & Yao, 1998), underlining the importance of their early identification. FA also play a role in many disciplines of the fields of science, technology, engineering and mathematics (STEM). Furthermore, FA tests use nonverbal materials and may therefore be a better estimate of g for children with verbal difficulties (e.g., nonnative or otherwise disadvantaged speakers), which is also one reason why they have frequently been used in so-called culture-fair testing (CFT, see Ortiz & Dynda, 2005, for a critical discussion). Often, such abstract materials are equally unfamiliar to all testtakers, reducing potential influences of Gc (as recommended by Beauducel & Kersting, 2002). Reasoning in all three domains (verbal, numerical, and figural) should thus be considered in the assessment of Gf in children.
Rationale for Developing a New Test
Several German tests have been proposed to assess primary school children's cognitive abilities, e.g., the WISC-IV as the most frequently used individual test (Petermann & Petermann, 2011); the CFT 1 (Cattell, Weiß, & Osterland, 1997) and the CFT 20-R (Weiß, 2008) are German adaptations of Cattell's Culture Fair Tests and are group tests that assess figural abilities; the KFT 1–3 (Heller & Geisler, 1983) and the KFT 4–12+R (Heller & Perleth, 2000; the German equivalents to Lohman's CogAT) and the MHBT-P (Heller & Perleth, 2007) derived from the latter as multidimensional assessment procedures to be used from year 4 and 3 on, respectively. All tests have their shortcomings, though:
1. They may not include all the content areas described above (verbal, numerical, and figural, e.g., CFT 13 and CFT 20-R),
2. They may provide outdated norms (e.g., KFT 1–3),
3. They may allow for individual testing only (e.g., WISC-IV),
4. They can be used from year 3 on only (e.g., MHBT-P, KFT 4–12+, CFT 20-R), or
5. They do not have sufficiently high ceilings to allow for differentiated assessment at the right tail of the IQ distribution (applies to all tests mentioned except for the MHBT-P).
The Test for (Highly) Intelligent Kids – T(H)INK was developed to fill this gap: T(H)INK targets Gf by tapping verbal, numerical, and figural reasoning, accounting for the theoretical and empirical findings outlined above.
Test Development: Task Selection, Construction Principles, and Pilot Testing
Besides representing the three targeted content areas, test tasks should be suitable for group testing (which excluded, e.g., working memory tasks such as the WISC digit span subtest). Because not all pupils are able to read and write fluently by the end of the first grade, mainly multiple-choice items (paper and pencil) using images or numbers as solutions were used. For the numerical and figural domain, many such tasks exist (e.g., number series or figural matrices) and could thus be adapted to the requirements of the present test (see Materials section for a detailed description of the final tasks). However, only few adequate verbal tasks were available, and only for children in grade 3 and above. Thus, new types of items were developed for the present test. The construction principles will be described in the following.
Development of Verbal Tasks: Construction Principles
Carroll (1993) differentiates two dimensions of verbal abilities: (1) receptive vs. productive skills, and (2) oral vs. written language. During early primary school, children do not yet master writing, which excluded written production. Because the test was intended as a group test, oral production had to be excluded as well. Therefore, T(H)INK focuses on receptive skills, using a written instruction (to be read aloud to the younger pupils) and pictures.
Language is structured into different interacting levels (Weinert, 2010). T(H)INK consists of a grammatical understanding and a vocabulary part, thus including both syntactic (deep structure) and semantic (surface structure) elements. Grammatical understanding was designed to tap verbal rea-
2 Results from a recent study (Schweizer, Troche, & Rammsayer, 2011) examining university students suggest this applies to older populations
as well.
soning as one aspect of Gf. The child has to decode which picture represents a given sentence consisting of familiar words, but varying in syntactic complexity. Item construction followed theoretical and empirical findings about language development, especially acquisition of grammatical structures. Understanding of surface structures (semantics) always precedes the development of increasingly complex deep structures (syntax/morphology; Grimm & Schöler, 1991). For instance, negative clauses are more complex than affirmative clauses; compound clauses (two joint main clauses) more so than single main clauses; and neither-nor constructions, combining negation and compound clauses, represent an even higher difficulty level. Vocabulary, on the other hand, is one of the most important predictors of reading acquisition and, as such, of scholastic achievement (e.g., Cunningham & Stanovich, 1997). Although vocabulary can be assumed to represent Gc rather than Gf, it was included as an important predictor of both understanding of grammatical structures and later achievement (Grimm & Schöler, 1991). Items were derived from two sources: first, the most comprehensive empirical study providing extensive information on children's active use of words (Pregel & Rickheit, 1987) as a proxy for receptive (passive) vocabulary in the sense of Carroll (1993); and second, by informally asking parents of primary-school children about words their children had recently learned and used (productive, or active vocabulary). Common group vocabulary tests use written words; yet even if all answer options are read aloud, it may still be difficult for younger children to keep all solutions in mind, due to limited working memory capacities. T(H)INK therefore uses pictures (of concrete words mostly, as it is difficult to represent abstract terms graphically). Altogether, 12 words of increasing difficulty were thus selected.
Pilot Testing
We assessed the understanding of instructions with 16 pupils from year 1 (Y1). The items were piloted in two phases. First, a pool of 45 items was administered to pupils from Y1 through Y4 at two schools (Ns for Y1/Y2/Y3/Y4: 31/68/39/49). Variance, difficulty, and corrected item-total correlations were evaluated, whereupon items were dropped (11), modified (1), or newly designed (10 verbal, 14 numerical, 4 figural). The four resulting test versions for Y1 to Y4 (36 items each) were then administered at another school (Ns for Y1/Y2/Y3/Y4: 18/12/16/12) to fine-tune instruction wording and timing. Unlike in other group intelligence tests, time limits were avoided. Schweizer and Koch (2001) assume untimed testing to be better suited to assess the limits of reasoning abilities (or, in other terms, capacity). This requires tasks of sufficient difficulty, which was the case with the present test. In primary-school pupils in particular, the time required to accomplish a task may not be the best indicator of level of ability (Lavergne & Vigneau, 1997). One possible reason lies in changes in metacognitive development occurring during childhood, especially the acquisition and use of strategies (Krajewski, Kron, & Schneider, 2004; Lockl & Schneider, 2010). A more efficient strategy usually results in time and resource gains only after it has been automatized. During this automatization phase, however, children may actually require more time than with their prior (inferior) strategy. Because many such metacognitive abilities are acquired during primary-school age, the test was thus constructed as a power test (150–250% of the time the slowest child had needed in pilot testing) so that each child had the chance to work on every single task (see Materials section for a detailed test description). Finally, although the test was constructed according to classical test theory, we did not want to preclude the possibility of applying item-response theoretic models (see Discussion section), which are preferably applied to unspeeded tests (de Ayala, 2009).
Method
Participants and Procedure
Sample
Data collection was carried out in two waves. In Phase I, data were gathered during the last 8 weeks of the school year (May to early July 2009) at 20 primary schools in Rhineland-Palatinate. In Phase I, only pupils from Y1 (35 classes), Y2 (36 classes), and Y3 (36 classes) took part. Phase II took place 1 year later (June 2010) at two out of the 20 schools; 5 first, 4 second, 5 third, and 4 fourth classes participated. Data from both phases were collapsed for students attending the same year. Thus, e.g., Y1 students who were tested again 1 year later entered both the Y1 and the Y2 sample. For the factor analyses, where data from different class levels were related to each other, the second cohort was excluded from the Y2–3 subsamples to prevent effects of repeated measurements of the same child. Six schools were located in communities with more than 5,000 inhabitants (urban), yielding slightly less than half the sample (46.33–48.14%) for Y1 to Y3. This is comparable to overall Rhineland-Palatinate, where 45.1% of the population live in communities of 5,000 or fewer inhabitants (Statistisches Bundesamt, 2010). The final sample consisted of 40 classes from Y1 (N = 533; 52.2% girls, 7.3% nonnative speakers [NN]), 40 from Y2 (N = 531; 49.7% girls, 9.4% NN), 41 from Y3 (N = 565; 54.7% girls, 7.8% NN), and 4 from Y4 (N = 76; 46.1% girls, 9.2% NN). For a part of the sample, parent reports of the children's age were available (Y1: 7 years 3 months, SD = 4.98 months, N = 444; Y2: 8 years 4 months, SD = 5.34 months, N = 481; Y3: 9 years 5 months, SD = 5.45 months, N = 497). The percentage of non-German native speakers was about representative for Rhineland-Palatinate (overall rate: 7.6%; Statistisches Landesamt Rheinland-Pfalz, 2011). However, because the goal of the present study was to develop and validate the test and not yet to standardize it, drawing a representative sample was not one of our primary issues.
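The item statistics evaluated during pilot testing (variance, difficulty, and corrected item-total correlation) can be sketched as follows. This is only an illustration under stated assumptions: the function names and the toy response matrix are hypothetical, items are assumed to be scored 0/1, and difficulty is taken as the proportion of correct answers. It is not the authors' analysis script, which used a statistics package.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5


def item_statistics(responses):
    """Classical item statistics for a matrix of 0/1 scored responses.

    responses: list of rows (one per child), each a list of item scores.
    """
    n_items = len(responses[0])
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        # "Corrected" total: sum of all other items, excluding item j itself.
        rest = [sum(row) - row[j] for row in responses]
        stats.append({
            "item": j + 1,
            "difficulty": mean(item),  # proportion of correct answers
            "variance": pvariance(item),
            "r_itc": pearson(item, rest),
        })
    return stats


# Hypothetical toy data: 6 children x 4 items (1 = correct, 0 = incorrect).
toy = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

for s in item_statistics(toy):
    print(s)
```

Excluding the item from its own total avoids inflating the correlation, which matters most for short scales such as the 12-item content areas.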
Procedure
Prior to testing, the study was approved by the School Supervisory Board and the Rhineland-Palatinate data protection commissioner; parental approval was obtained. In all parts of the test, students participated voluntarily and were rewarded with a small toy. Children were tested in their classrooms. The main examiner and his/her assistant (in Y1 or if otherwise deemed necessary) introduced themselves, handed out the test booklets, and explained the tasks to come and the procedure. Instructions and items were also shown on overhead transparencies. Each subtest was preceded by an example, which was solved in the group and then explained again to ensure all children had understood the idea. In Y1 and Y2, all text-based items were read aloud. The test took about 90 min in Y1 (including two breaks), 75 min in Y2 (one break), and 45/40 min in Y3/Y4 (no break required). In Phase II, children were tested with an additional nonverbal IQ test (Y1/Y2: CFT 1, Cattell et al., 1997; administered 12 days later; Y3/Y4: CFT 20-R, Weiß, 2008; administered after a break). Both CFT versions are partial adaptations of Cattell's Culture Fair Test and measure fluid intelligence using figural tasks. The tests were administered under speeded power conditions, as outlined in the manual.
Materials
T(H)INK
The final test consisted of four versions (Y1–Y4), each comprising the 36 items with the best item characteristics for this class level from the pilot phase (12 figural, 12 verbal, and 12 numerical items), with a partial overlap between the four class levels (see Appendix for an overview of the items shared by the versions for Y1–Y3; taken together, T(H)INK 1–4 comprises 64 different items). The test consisted of 8 subtests (3 figural [F], 2 verbal [V], and 3 numerical [N]) presented in the following sequence:
1. F-Picture concepts (similar to the German WISC-IV; Petermann & Petermann, 2011): 2–3 pictures are presented in 2–3 rows. Subjects are to select one item per row so that all selected items go together.
2. F-Graphical series completion (similar to the Charkow tasks used in the Berlin Test of Intelligence Structure BIS-4; Jäger, Süß, & Beauducel, 1997): A series of simple graphics following a rule (e.g., a zig-zag line alternately going up and down) is to be continued by two items.
3. F-Matrices: Subjects are to select the solution out of 5 distractors which completes a 2 × 2 matrix.
4. V-Understanding of grammar: One out of 4–5 images corresponding to a complex sentence (e.g., "The triangle is not white, but in front of the square") has to be identified.
5. V-Vocabulary: Subjects are to identify an object described in the instruction out of 5 pictures (e.g., "Which one is the reptile?" Solution: alligator).
6. N-Arithmetic problems: Complex arithmetic problems including conversions between scales (e.g., from minutes to hours) are to be solved.
7. N-Number series: The missing item from a number series is to be identified.
8. N-Routes on a map (in the style of SAGES-2; Johnsen & Corn, 2001): A route is described by the instructor and has to be pursued on a simplified map by the subject, who then computes the length of the way.
Culture-Fair Tests
Both the CFT 1 (Culture-Fair Test, Scale 1; Cattell et al., 1997; age- and class-based standardized IQ norms sampled in 1995) and the CFT 20-R (Weiß, 2008; age- and class-based norms sampled in 2003/2004), as partial adaptations of Cattell's original tests, measure fluid intelligence with 5 or 4 types of figural tasks, respectively (CFT 1: substitutions, labyrinths, classifications, similarities, matrices; CFT 20-R: series, classifications, matrices, topologies). Items are presented under speeded power conditions in a multiple-choice format (paper and pencil). In the present study, the complete CFT 1 and the short form of the CFT 20-R (part 1, 4 subtests) were administered.3
Teacher and Parent Questionnaires
Teacher and parent ratings of cognitive abilities, creativity, motivation, social behavior, and work attitude were assessed with a self-developed questionnaire based on existing measures (e.g., Gifted Rating Scale, School Form; Pfeiffer & Jarosevich, 2003; Child Behavior Check List 4–18/Teacher's Report Form; Achenbach & Rescorla, 2001; Fragebogen zur Erfassung emotionaler und sozialer Schulerfahrungen FEESS; Rauer & Schuck, 2003, 2004). Teachers rated each child on 19 items (7-point Likert scale from 0 = "shows this characteristic less than all other pupils in class" to 6 = "shows this characteristic more than all other pupils in class"). The teacher rating scale consisted of 6 reliable scales (sample items and Cronbach's α ranges for Y1 to Y3 in brackets): cognitive ability ("understands new contents quickly," 6 items; α = .96–.97); creativity ("has many ideas," 3 items; α = .92 in all cases), motivation ("tries to solve really difficult tasks," 3 items; α = .85–.86), social behavior ("gets along well with his/her classmates," 2 items; α = .77–.80), and work attitude ("works in a neat, clean and organized way," 2
3 We opted for the CFT, a nonverbal test, because it measures Gf (as intended with T(H)INK) but focuses on a different content factor. Both
the CFT 13 and the CFT 20-R are shorter than other group tests (e.g., the KFT suite), thus providing less strain, and are easy to administer
to groups. In addition, the CFT 20-R provides recent norms.
items; α = .78–.83). Scale sums were averaged if at least 80% of the items had been answered and were then z-standardized for each class. Furthermore, the probability (0–100%) with which the teacher would recommend the child for a gifted and talented program was assessed.
The parent rating scale comprised 34 Likert-type items (ranging from 0 = "is not at all true for my child" to 5 = "is perfectly true for my child"). Only those items teacher and parent questionnaire had in common were considered for the present study. As above, averaged scale sums were used if at least 80% of the items had been answered. Except for social behavior, standardized alphas were acceptable across all age groups (cognitive ability: .88–.91; creativity: .77–.79; motivation: .70–.73; social behavior: .36–.55; work attitude: .71–.80). Furthermore, parents were asked about demographical data, their child's age and scholastic achievement. Because no marks are given in year 1 of the German school system, the Y1 pupils' parents were asked to make an estimation of their child's achievement. For the parent ratings, only questionnaires that had been answered by mothers, fathers, or both were considered, because these groups did not differ with respect to their assessment. Other raters' judgment (e.g., grandparents, older siblings, foster parents, or neighbors; Y1: N = 16; Y2: N = 15; Y3: N = 13) differed significantly, though; these cases were therefore excluded.
Data Analyses
Except for the multiple group comparisons, which used data from Phase I only, the following analyses are based on the complete sample, including data from both Phase I and II. Incomplete data sets and other problematic cases (e.g., obvious noncompliance with instructions) were eliminated; the numbers of invalid cases thus determined for Y1/Y2/Y3 were 32 (5.7%)/23 (4.2%)/19 (3.3%). Prior to analysis, data were parceled by averaging the raw score sums for each content area (figural, verbal, numerical). A content-based differentiation at a medium aggregation level makes sense from both theoretical and practical points of view. Theoretically (e.g., in the BIS, Jäger, 1984; or in the Radex model, Marshalek et al., 1983) and empirically (e.g., considering findings by Benbow and colleagues), the three ability domains are well established. Practically speaking, greater measurement error is to be expected with the younger children, caused by factors such as unfamiliarity with test-like situations, fluctuating attention, test anxiety, etc. (Ford & Dahinten, 2005); their results are therefore less reliable than those of older children. To minimize measurement error, it makes sense to aggregate data at a level the researcher considers meaningful (see Little, Cunningham, Shahar, & Widaman, 2002, for a critical discussion of parceling). The procedure has also been used in other intelligence tests, e.g., the SAGES-2 (Johnsen & Corn, 2001).
sums for figural, verbal, and numerical reasoning) were analyzed for difficulty and corrected item-total correlations according to classical test theory. Third, internal consistency of the total scale was examined (Cronbach's α). Test-retest reliability was established by correlating pupils' results with what they achieved 1 year later. In a fourth step, extensive validity examinations were conducted. Factorial validity was assessed using both exploratory (EFA) and confirmatory factor analyses (CFA). Construct validity was established by correlations with teacher and parent ratings, school marks, and the nonverbal IQ measures. Because the German grading scale ranges from 1 = "high distinction" to 6 = "insufficient," marks were inverted such that high values indicate higher achievement. All data were analyzed using PASW 18.0; the CFA was computed with AMOS 18.0.
Results
Group Comparisons
Regarding full scale sums, sex had an influence in Y2 only (M_girls = 18.61, SD = 5.66; M_boys = 19.93, SD = 5.73; t(529) = 2.67, p < .01, Cohen's d = .23). Significant differences between language groups (German native speakers/bilingual pupils with German as a first language [GB] versus nonnative speakers [NN]) were observed in Y1 to Y3 (Y1: M_GB = 17.51, SD = 4.85; M_NN = 15.28, SD = 4.50; t(527) = 2.78, p = .006, Cohen's d = .48; Y2: M_GB = 19.61, SD = 5.73; M_NN = 16.10, SD = 4.64; t(529) = 4.19, p < .001, Cohen's d = .67; Y3: M_GB = 21.40, SD = 5.09; M_NN = 17.68, SD = 5.40; t(560) = 4.63, p < .001; Cohen's d = .71).
Table 1. Difficulties, standard deviations, and corrected item-total correlations
Year  N    Parcel  p (SD)      r_itc  Percentiles (25/50/75)
1     533  F       .39 (0.18)  .47    14/17/21
           V       .57 (0.15)  .36
           N       .49 (0.20)  .51
           Mean    .48         .45
2     531  F       .48 (0.20)  .53    15/20/24
           V       .54 (0.18)  .50
           N       .58 (0.22)  .59
           Mean    .54         .54
3     565  F       .49 (0.18)  .46    18/22/25
           V       .59 (0.17)  .46
           N       .68 (0.21)  .55
           Mean    .59         .49
Note. Difficulties and item-total correlations were computed based on the content parcels (averaged sums of the three content areas figural (F), verbal (V), and numerical (N) ability). p = difficulty, SD = standard deviation, r_itc = corrected item-total correlation. Averaged difficulties and item-total correlations for each year are given in the bottom
First, the T(H)INK full scale sums were examined for dis- lines of each respective year (in italics). The last column shows the
tribution and mean differences between sex and language number of items children at the respective percentile level had solved
group using t-tests. Second, item parcels (averaged scale correctly. No child at any class level attained the maximum of 36.
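The effect sizes reported under Group Comparisons can be approximated from the published means and standard deviations alone. The following is a minimal sketch, not the authors' procedure (they used PASW 18.0): it assumes Cohen's d is computed with the root mean square of the two group SDs, which equals the pooled SD only when both groups are the same size. Group sizes are not reported in this section, so that equal-size simplification is an assumption, and the function name `cohens_d` is illustrative.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using the root mean square of both SDs.

    Assumption: equal group sizes, because the per-group Ns are not
    given in the text. With unequal Ns the pooled SD would be weighted.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Means and SDs as reported in the text (full scale sums).
d_sex_y2 = cohens_d(19.93, 5.73, 18.61, 5.66)  # Y2: boys vs. girls

d_language = {
    "Y1": cohens_d(17.51, 4.85, 15.28, 4.50),  # GB vs. NN
    "Y2": cohens_d(19.61, 5.73, 16.10, 4.64),
    "Y3": cohens_d(21.40, 5.09, 17.68, 5.40),
}

print(round(d_sex_y2, 2), {k: round(v, 2) for k, v in d_language.items()})
```

Under this simplification the rounded values agree with the reported effect sizes (.23 for sex in Y2; .48, .67, and .71 for language group), which suggests the published figures are internally consistent. A weighted pooled SD would be preferable if the group sizes were known.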
Item Analysis

Item difficulties and corrected item-total correlations for Y1 to Y3 (range and mean values across the three content parcels) are reported in Table 1. Data for individual items are given in the Appendix.

Reliability

Cronbach's α was acceptable for all three full scales (36 items each; Y1/Y2/Y3: Cronbach's α = .73/.80/.76). Stability indices across 1 year were satisfactory at all class levels (Y1/Y2: r = .73, N = 51; Y2/3: r = .70, N = 66; Y3/Y4: r = .61, N = 64; all ps < .001).

Validity

Factorial Validity

Factorial validity of the g-factor model was established using EFA and CFA. To identify g, we performed an unrotated principal components analysis using the three content parcels (figural, verbal, and numerical reasoning) for Y1 to Y3, respectively. Data were adequate for factor analysis (all KMOs > .60, all ps/Bartlett's test of sphericity < .01; Hair, Anderson, Tatham, & Black, 1998). One factor was extracted in all grades, explaining between 57.61 and 63.67% of the total variance. At all three class levels, component loadings and communalities exceeded .67 (highest for numerical and lowest for verbal tasks; see Table 2).

Table 2. Component loadings (CL), communalities (h²), eigenvalues (EV), and percent of variance explained – one-factor solution

      CL              h²              Eigen-   % variance
Year  F    V    N     F    V    N     values   explained
1     .78  .68  .81   .61  .46  .66   1.73     57.61
2     .79  .77  .83   .62  .59  .70   1.91     63.67
3     .75  .75  .83   .57  .56  .68   1.81     60.39

Notes. Method of extraction: principal components analysis, unrotated; one g factor extracted.

To check whether the g-factor model would hold simultaneously at all class levels, we conducted a CFA (multiple group comparison, MGC) for Y1 to Y3. Only data from Phase I were considered (Y1/Y2/Y3: N = 486/474/476). In this analysis also, the g-factor model was tested using the averaged content parcels (figural/FA, verbal/VA, and numerical ability/NA) as indicators. Because content areas were represented by different items in the three class levels, parceling had the further advantage that it allowed for comparisons despite these differences. The variance of g was set to 1 across all three groups. Each content parcel was assigned the same weight across Y1–Y3 (i.e., weights for VA in Y1, Y2, and Y3 were equated; same for NA and FA), but not within a year (i.e., VA, NA, and FA were not assigned the same weights in Y1; same for Y2 and Y3). In addition, error variances were equated within each year (i.e., error variances for VA, NA, and FA were set to be equal in Y1; same for Y2 and Y3), but not between years (i.e., error variances for VA in Y1, Y2, and Y3 were not equated; same for NA and FA). This makes sense from a developmental point of view, as factors external to the test itself have a stronger influence in younger children; yet there was no reason to assume different error variances for the three content areas within a year. The MGC model showed excellent fit (χ²/df = .73, p = .73; CFI = 1.00; RMSEA = .00, 90% confidence interval [.00; .02]; SRMR = .01). Again, standardized regression weights were highest for NA and lowest for VA for Y1 to Y3 (βs for Y1/Y2/Y3: VA .60/.58/.59, NA .76/.75/.76, FA .65/.64/.65).

Construct Validity

Table 3. Correlations between T(H)INK and Teacher (TR) and Parent Ratings (PR)

              Year 1           Year 2           Year 3
Domain        TR      PR       TR      PR       TR      PR
COG           .54***  .27***   .56***  .39***   .60***  .41***
CREA          .45***  .11*     .48***  .25***   .53***  .18***
MOT           .46***  .12*     .47***  .28***   .51***  .24***
SOC           .19***  .04ns    .19***  .00ns    .16***  .06ns
WORK          .36***  .15**    .31***  .11*     .40***  .25***
N (listwise)  508     435      492     458      485     482
G&T program   .47***  n/a      .48***  n/a      .44***  n/a
N             464     n/a      489     n/a      462     n/a

Notes. Teachers and parents rated children's abilities in the following domains: cognitive abilities (COG), creativity (CREA), motivation (MOT), social behavior (SOC), and work attitude (WORK). Teacher ratings were z-standardized classwise. G&T program = probability with which a teacher would recommend a child for a gifted and talented program. n/a = not applicable. *p < .05, **p < .01, ***p < .001.

Teacher and parent ratings of cognitive abilities, creativity, motivation, social behavior, and work attitude were used as criteria for construct validity; so was the probability (0–100%) with which the teacher would recommend the child for a gifted and talented program. All correlations are detailed in Table 3. For teachers, intelligence was related to all domains; for parents, only to achievement-related domains. In both cases, the highest correlations were found for cognitive ability ratings. Convergent validity was also established by correlations between T(H)INK and CFT results from Phase II (Y1: r = .44, N = 40; Y2: r = .45, N = 54; Y3: r = .61, N = 89; all ps < .01). Because the CFT measures nonverbal reasoning only, the results can be considered satisfactory. Correlations with marks for mathematics (rs for Y1/Y2/Y3: .36/.52/.62, all ps < .001) and German (rs for Y1/Y2/Y3: .31/.38/.49, all ps < .001) yield further evidence for the test's concurrent validity, while correlations with sports as a nonintellectual subject corroborate its discriminant
validity (rs for Y1/Y2/Y3: .05/.11/.06, p < .05 in Y2 only; in Y1/Y2/Y3, N = 383/440/464 parents provided information on all 3 subjects). Finally, predictive validity was established by correlating cognitive abilities from Phase I to GPA from Phase II (Y1/Y2: r = .48, N = 47; Y2/3: r = .56, N = 65; Y3/Y4: r = .64, N = 64; all ps < .01).

Discussion

The present study yields first evidence for the validity of a new intelligence test assessing Gf in primary school children across figural, verbal, and numerical reasoning as central content domains of intelligence. EFA and CFA results document its factorial validity. A g-factor model was identified to represent cognitive ability well, which is in line with prominent models of intelligence structure (e.g., Jensen, 1998; McGrew, 2005). A multiple-group comparison across class levels using CFA points to a comparable intelligence structure from Y1 to Y3 as assessed by the new test T(H)INK. Internal consistencies and retest reliabilities are acceptable to good. The test is of medium difficulty for Y1–Y3; yet for Y1, a sufficiently high ceiling can be safely assumed, thus allowing for differentiated gifted identification. Correlations with teacher/parent ratings, marks, and a nonverbal Gf test support the concurrent and divergent validity of the test: Those constructs that can be considered conceptually closest to Gf (e.g., teacher and parent ratings of cognitive ability; marks for math) showed the highest correlations, whereas more remote constructs (e.g., social behavior or marks for sports) showed substantially lower correlations. The test predicted students' GPA 1 year later well.

T(H)INK seems to work well as a group intelligence test. Whether its results are sufficient for individual case diagnostics as well remains to be examined. This question has hardly been addressed empirically, and, to our knowledge, no available group intelligence test provides differentiated norms for individual vs. group testing. In practice, group-based norms are often uncritically transferred to individual testing, although the different situations are likely to elicit different responses as well (e.g., Kaplan & Saccuzzo, 2008), especially in younger children. Which type of test is suited also depends on the goals of the diagnostics. For screening and research purposes, group tests like T(H)INK are well suited; for more complex issues such as placement decisions or interventions, individual testing may be the better alternative.

An interesting secondary finding of our study was that intelligence is related not only to teacher ratings of the child's cognitive abilities, but almost as strongly to their ratings of creativity, motivation, and work attitude. Although meta-analytic studies suggest that, for instance, divergent thinking and reasoning are only slightly related (r = .17; Kim, 2005), intelligence and creativity (and, furthermore, motivation and work attitude) seem to be conceptually closer to teachers than such empirical studies suggest. Teachers are trained to assess students' achievement, not their potential. Achievement, in turn, results from all these abilities combined; a recent study (Baudson, 2011) found that teachers' implicit theories comprise an intellect/achievement dimension integrating several of the characteristics described in the teacher rating scales. Alternatively, it is conceivable that the impression a teacher has of a child's intellectual ability impacts his/her judgment of other characteristics as well (halo effect).

A more differentiated view was observed in parents, where test results correlated highest with ratings of the child's cognitive abilities. Parents may thus perceive their children in a more differentiated way than teachers do, a route possibly worth exploring. In this context, the low reliability of the social behavior scale (parent rating) needs to be considered. Low internal consistency indicates a heterogeneous construct, which might explain the low correlations as well. Despite higher reliability, however, teacher ratings on this scale did not show much higher correlations. In the light of the possibly more differentiated view parents have on their children, it is conceivable that this is the case for social behavior as well. The two aspects addressed by the scale (behavior toward peers vs. adults) may thus be perceived as more distinct by parents than by teachers.

Limitations

Internal consistencies are acceptable for a group intelligence test but might be improved by replacing a few bad apples. Test items had been selected based on pilot test results; it was not foreseeable that some would not pass the main test. However, parceling the items by content groups for the reasons outlined above (Little et al., 2002) buffered some of the shortcomings. This issue will be addressed in future versions.

In addition, the validity of the teacher and parent rating scales needs to be further investigated. While being of high face validity and modeled on existing scales, correlations of the newly developed scales with other questionnaires and objective measures need to be assessed. However, much evidence speaks for the overall validity of the test. Thus, despite these minor shortcomings, T(H)INK can be considered a valid indicator of children's cognitive ability.

Future Research

Efforts to remedy the above shortcomings are being undertaken. Recently, new and more difficult items have been pilot tested in the context of a large-scale study in Baden-Württemberg, proving that, even in its preliminary version, T(H)INK can attract interest from the scientific community. The aim of more difficult items is to avoid possible ceiling effects beyond Y1, especially because the test is to be validated with gifted children. Clinical validation samples may include children with learning disabilities or specific learning issues (e.g., dyslexia).

Results for a Y4 version are promising, too: A preliminary version currently being extended has been successfully tested with a sample of N = 76, confirming the validity results for Y1–Y3 presented here.

Beyond the figural CFT tasks, correlations between T(H)INK and specific numerical or verbal tests should be examined. Although T(H)INK was developed to measure Gf, tests assessing verbal abilities in particular might yield evidence for its validity, considering the large impact of this domain during primary school age in particular.

Further sampling variables (e.g., parent SES) should be considered as well. Furthermore, the representative yet overall small quota of non-German native speakers is a possible issue, which may be overcome by oversampling underrepresented groups (e.g., Reynolds & Kamphaus, 2003). Considering the significant differences between German and non-German native speakers, separate norms might make sense, although the practice has not remained undisputed. In this line, future studies should also pay attention to how long a student has been learning German. For instance, in the BIS-HB norming sample (Jäger et al., 2006), group differences ceased to be observable only after students had learned German for more than 8 years. This points out the importance of attending to heterogeneity in groups that are less homogeneous than they appear at first sight.

A particularly important aim is to extend the data base to provide norm data. T(H)INK has been successfully tested only in Rhineland-Palatinate; a standardization sample (this was not the goal of the present study), which is currently being planned, will include other German Federal states and other German-speaking countries (Austria, Switzerland). The standardization will be performed using T(H)INK-2, the second version being based on the findings presented here as well as on successfully piloted new items.

Up to now, data have only been analyzed according to classical test theory, calling for application of item response theory models. Also, possibilities of test equating using shared items to make test scores comparable across different versions are to be explored, with a special focus on longitudinal research. Considering the good retest reliabilities and the comparable factor structure across age groups, we assume T(H)INK will be a promising route for economic testing of children's cognitive abilities in cross-sectional and longitudinal research.

Acknowledgments

The present research was funded by the Rhineland-Palatinate Ministry for Education, Science, Youth, and Culture, Germany. All statements expressed in this paper are the authors' own and do not reflect the official opinions or policies of the authors' host affiliations.

The authors wish to thank all participating students, teachers, and parents as well as the following student assistants for their invaluable help in administering the tests and digitizing the data (in alphabetical order): Max Auerswald, Hannah Frh, Johannes Gorges, Sven Gorholt, Lilia Held, Caroline Irmisch, Dorothee Katzenberger, Elisabeth Meier, Sarah Muno, and Vibeke Nordstrand.

References

Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA School-Age Forms and Profiles. Burlington, VT: University of Vermont.
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Baudson, T. G. (2011). The (mis-) measure of children's cognitive abilities. Unpublished doctoral dissertation, University of Trier, Trier.
Beauducel, A., & Kersting, M. (2002). Fluid and crystallized intelligence and the Berlin Model of Intelligence Structure (BIS). European Journal of Psychological Assessment, 18, 97–112.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York: Cambridge University Press.
Cattell, R. B. (1963). Theory of fluid and crystallized intelligence: A critical experiment. Journal of Educational Psychology, 54, 1–22.
Cattell, R. B., Weiß, R. H., & Osterland, J. (1997). Grundintelligenztest Skala 1 – CFT 1 [Test of Basic Intelligence, Scale 1] (5th ed.). Göttingen: Hogrefe.
Cunningham, A. E., & Stanovich, K. E. (1997). Early reading acquisition and its relation to reading experience and ability 10 years later. Developmental Psychology, 33, 934–945.
de Ayala, R. J. (2009). The theory and practice of item response theory. New York: Guilford.
Flanagan, D. P., & Kaufman, A. S. (2009). Essentials of WISC-IV assessment (2nd ed.). Hoboken, NJ: Wiley.
Ford, L., & Dahinten, V. S. (2005). Use of intelligence tests in the assessment of preschoolers. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (2nd ed., pp. 487–503). New York: Guilford.
Gohm, C. L., Humphreys, L. G., & Yao, G. (1998). Underachievement among spatially gifted students. American Educational Research Journal, 35, 515–531.
Gottfredson, L. S. (1997). Mainstream science on intelligence: An editorial with 52 signatories, history, and bibliography. Intelligence, 24, 13–23. [Reprinted from The Wall Street Journal, December 13, 1994.]
Grimm, H., & Schöler, H. (1991). Heidelberger Sprachentwicklungstest H-S-E-T II. Handbuch [Heidelberg Test of Language Development. Manual] (2nd ed.). Göttingen: Hogrefe.
Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate data analysis (5th ed.). Upper Saddle River, NJ: Prentice Hall.
Heller, K. A., & Geisler, H. J. (1983). KFT 1–3. Kognitiver Fähigkeitstest für 1. bis 3. Klassen [Cognitive Ability Test for Grades 1 to 3]. Göttingen: Hogrefe.
Heller, K. A., & Perleth, C. (2000). KFT 4–12+R. Kognitiver Fähigkeitstest für 4. bis 12. Klassen, Revision [Cognitive Ability Test for Grades 4 to 12, revised]. Göttingen: Hogrefe.
Heller, K. A., & Perleth, C. (2007). MHBT-P. Münchner Hochbegabungstestbatterie für die Primarstufe [Munich Test Battery for the assessment of giftedness – Primary school level]. Göttingen: Hogrefe.
Horn, J. L. (1991). Measurement of intellectual capabilities: A review of theory. In K. S. McGrew, J. K. Werder, & R. W. Woodcock (Eds.), WJ-R technical manual (pp. 197–232). Chicago, IL: Riverside.
Jäger, A. O. (1984). Intelligenzstrukturforschung: Konkurrierende Modelle, neue Entwicklungen, Perspektiven [Research on intelligence structure: Competing models, recent developments, perspectives]. Psychologische Rundschau, 35, 21–35.
Jäger, A. O., Holling, H., Preckel, F., Schulze, R., Vock, M., Süß, H.-M., & Beauducel, A. (2006). Berliner Intelligenzstruktur-Test für Jugendliche: Begabungs- und Hochbegabungsdiagnostik (BIS-HB) [Berlin Test of Intelligence Structure for youth: Assessment of giftedness and talent]. Göttingen: Hogrefe.
Jäger, A. O., Süß, H.-M., & Beauducel, A. (1997). Berliner Intelligenzstruktur-Test. BIS-Test, Form 4 [Berlin Test of Intelligence Structure, BIS test, form 4]. Göttingen: Hogrefe.
Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.
Johnsen, S. K., & Corn, A. L. (2001). SAGES-2. Screening assessment for gifted elementary and middle school students (2nd ed.). Austin, TX: ProEd.
Kaplan, R. M., & Saccuzzo, D. P. (2008). Psychological testing: Principles, applications, and issues. Stanford, CT: Cengage Learning.
Kim, K. H. (2005). Can only intelligent people be creative? A meta-analysis. The Journal of Secondary Gifted Education, 16, 57–66.
Krajewski, K., Kron, V., & Schneider, W. (2004). Entwicklungsveränderungen des strategischen Gedächtnisses beim Übergang vom Kindergarten in die Grundschule [Developmental changes in strategic memory during the transition from kindergarten to primary school]. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 36, 47–58.
Lavergne, C., & Vigneau, F. (1997). Response speed on aptitude tests as an index of intellectual performance: A developmental perspective. Personality and Individual Differences, 23, 283–290.
Little, T. D., Cunningham, W. A., Shahar, G., & Widaman, K. F. (2002). To parcel or not to parcel: Exploring the question, weighing the merits. Structural Equation Modeling, 9, 151–173.
Lockl, K., & Schneider, W. (2010). Kognitive Grundlagen: Denken, Gedächtnis und Metakognition [Cognitive foundations: Thinking, memory, and metacognition]. In E. Walther, F. Preckel, & S. Mecklenbräuker (Eds.), Befragung von Kindern und Jugendlichen (pp. 71–95). Göttingen: Hogrefe.
Lohman, D. F. (2002). Fluid intelligence, inductive reasoning, and working memory: Where the theory of multiple intelligences falls short. Paper presented at the annual meeting of the American Educational Research Association in Seattle, WA.
Marshalek, B., Lohman, D. F., & Snow, R. E. (1983). The complexity continuum in the radex and hierarchical models of intelligence. Intelligence, 7, 107–127.
McGrew, K. S. (2005). The Cattell-Horn-Carroll theory of cognitive abilities. Past, present and future. In D. P. Flanagan, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (2nd ed., pp. 136–181). New York: Guilford.
Neisser, U., Boodoo, G., Bouchard, T. J., Jr., Wade Boykin, A., Brody, N., Ceci, S. J., . . . Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77–101.
Ortiz, S. O., & Dynda, A. M. (2005). Use of intelligence tests with culturally and linguistically diverse populations. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment (pp. 545–556). New York: Guilford.
Petermann, F., & Petermann, U. (2011). WISC-IV. (German translation and adaptation of the WISC-IV by David Wechsler; 4th ed.). Frankfurt a. M., Germany: Pearson.
Pfeiffer, S. I., & Jarosevich, T. (2003). Gifted Rating Scales – School form. San Antonio, TX: Harcourt Assessment.
Pregel, D., & Rickheit, G. (1987). Der Wortschatz im Grundschulalter [Primary school children's vocabulary]. Hildesheim: Olms.
Rauer, W., & Schuck, K.-D. (2003). Fragebogen zur Erfassung emotionaler und sozialer Schulerfahrungen von Grundschulkindern dritter und vierter Klassen – FEESS 3–4 [Assessment of third- and fourth-grade pupils' emotional and social school experiences]. Göttingen: Hogrefe.
Rauer, W., & Schuck, K.-D. (2004). Fragebogen zur Erfassung emotionaler und sozialer Schulerfahrungen von Grundschulkindern erster und zweiter Klassen – FEESS 1–2 [Assessment of first- and second-grade pupils' emotional and social school experiences]. Göttingen: Hogrefe.
Reynolds, C. R., & Kamphaus, R. W. (2003). Handbook of psychological and educational assessment of children. New York: Guilford.
Schweizer, K., & Koch, W. (2001). A revision of Cattell's investment theory: Cognitive properties influencing learning. Learning and Individual Differences, 13, 57–82.
Schweizer, K., Troche, S., & Rammsayer, T. H. (2011). On the special relationship between fluid and general intelligence: New evidence obtained by considering the position effect. Personality and Individual Differences, 50, 1249–1254.
Smedler, A.-C., & Törestad, B. (1996). Verbal intelligence: A key to basic skills? Educational Studies, 22, 343–356.
Spearman, C. (1904). General intelligence objectively determined and measured. American Journal of Psychology, 15, 201–293.
Statistisches Bundesamt/Federal Statistical Office. (2010). Statistical yearbook 2010 for the Federal Republic of Germany including international tables. Wiesbaden: Author.
Statistisches Landesamt Rheinland-Pfalz/Statistical Office of Rhineland-Palatinate. (2011). Statistische Berichte: Haushalte und Familie 2010 [Statistical reports: Households and families]. Bad Ems: Author.
Wai, J., Lubinski, D., & Benbow, C. P. (2005). Creativity and occupational accomplishments among intellectually precocious youth: An age 13 to age 33 longitudinal study. Journal of Educational Psychology, 94, 785–794. doi 10.1037/0022-0663.97.3.484
Wai, J., Lubinski, D., & Benbow, C. P. (2009). Spatial ability for STEM domains: Aligning over 50 years of cumulative psychological knowledge solidifies its importance. Journal of Educational Psychology, 101, 817–835. doi 10.1037/a0016127
Weinert, S. (2010). Erfassung sprachlicher Fähigkeiten [Assessment of verbal abilities]. In E. Walther, F. Preckel, & S. Mecklenbräuker (Eds.), Befragung von Kindern und Jugendlichen (pp. 227–262). Göttingen: Hogrefe.
Weiß, R. H. (2008). Grundintelligenztest Skala 2 – Revision (CFT 20-R) [Test of Basic Intelligence, Scale 2 – Revised]. Göttingen: Hogrefe.

Published online: May 11, 2012

Tanja Gabriele Baudson
Department of Psychology, Gifted Research and Education
Universitätsring 15
54286 Trier
Germany
Tel. +49 651 201-4558 / Fax +49 651 201-4578
E-mail baudson@uni-trier.de
Appendix
Item characteristics and overview of overlapping items for years 1–3

Figural abilities
Grade level
1 (N = 533) 2 (N = 531) 3 (N = 565)
Item p SD ritc p SD ritc p SD ritc
PC1 .53 0.50 .20 .73 0.44 .20 .72 0.45 .21
PC2 .15 0.36 .12
PC3 .08 0.27 .14 .16 0.37 .22 .24 0.42 .25
PC4 .21 0.41 .14 .24 0.43 .06 .26 0.44 .11
PC5 .29 0.46 .31 .45 0.50 .35
Σ PC 4 4 4
GSC1 .56 0.50 .31
GSC2 .33 0.47 .30 .60 0.49 .34 .64 0.48 .22
GSC3 .39 0.49 .26 .58 0.49 .39
GSC4 .26 0.44 .35 .51 0.50 .36 .50 0.50 .30
GSC5 .66 0.48 .36 .79 0.41 .37
GSC6 .11 0.32 .20
Σ GSC 4 4 4
MA1 .55 0.50 .15
MA2 .53 0.50 .17 .76 0.42 .18
MA3 .54 0.50 .37 .71 0.45 .36
MA4 .50 0.50 .20 .54 0.50 .26 .62 0.49 .20
MA5 .48 0.50 .30
MA6 .24 0.43 .12 .21 0.41 .01
MA7 .58 0.49 .23
Σ MA 4 4 4
Verbal abilities
Grade level
1 (N = 533) 2 (N = 531) 3 (N = 565)
UG1 .85 0.36 .19
UG2 .77 0.42 .19 .86 0.35 .23
UG3 .63 0.48 .29 .72 0.45 .29 .77 0.42 .25
UG4 .43 0.50 .31 .60 0.49 .34 .70 0.46 .29
UG5 .24 0.43 .24 .42 0.49 .33 .35 0.48 .23
UG6 .11 0.31 .03
UG7 .89 0.31 .31
UG8 .84 0.37 .38
Σ UG 6 3 6
VO1 .89 0.31 .11
VO2 .61 0.49 .08 .69 0.46 .14
VO3 .90 0.29 .18 .94 0.22 .24
VO4 .83 0.37 .22
VO5 .27 0.45 .01 .36 0.48 .18 .29 0.46 .16
VO6 .24 0.43 .06 .31 0.46 .14 .33 0.47 .10
VO7 .86 0.35 .18
VO8 .44 0.50 .28 .55 0.50 .26

VO9 .55 0.50 .15 .64 0.48 .01
VO10 .40 0.49 .25
VO11 .24 0.43 .23 .38 0.48 .21
VO12 .51 0.50 .08
Σ VO 6 9 6
Numerical abilities
Grade level
1 (N = 533) 2 (N = 531) 3 (N = 565)
AP1 .40 0.49 .26 .74 0.44 .31 .76 0.42 .36
AP2 .15 0.36 .25 .31 0.46 .28 .28 0.45 .25
AP3 .30 0.46 .12
AP4 .19 0.38 .22 .49 0.50 .44
AP5 .63 0.48 .36
AP6 .76 0.42 .45
AP7 .70 0.46 .43
Σ AP 4 4 4
NS1 .68 0.47 .37
NS2 .59 0.49 .41
NS3 .44 0.50 .11 .64 0.48 .21
NS4 .36 0.48 .37 .78 0.41 .44 .87 0.34 .27
NS5 .75 0.43 .42 .87 0.34 .32
NS6 .41 0.49 .34 .50 0.50 .29
NS7 .17 0.37 .12
Σ NS 4 4 4
RM1 .73 0.45 .40 .85 0.35 .34 .90 0.30 .29
RM2 .80 0.40 .41
RM3 .50 0.50 .31 .70 0.46 .29 .66 0.48 .33
RM4 .75 0.44 .28
RM5 .57 0.50 .49 .56 0.50 .44
RM6 .60 0.49 .34 .61 0.49 .32
Σ RM 4 4 4
Notes. p = item difficulty, SD = standard deviation, ritc = corrected item-total correlation. The sequence of the subtests corresponds to their
sequence in the test. Abbreviations of subtests: PC = Picture Concepts; GSC = Graphical Series Completion; MA = Matrices; UG = Under-
standing of Grammar; VO = Vocabulary; AP = Arithmetic Problems; NS = Number Series; RM = Routes on a Map (see text for detailed
descriptions). The sum sign (Σ) indicates the number of items a particular subtest includes in the respective version.
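The two item statistics tabulated in this Appendix, item difficulty (p) and the corrected item-total correlation (ritc), can be illustrated with a small sketch. It assumes dichotomous 0/1 item scores and the usual classical-test-theory definitions: p is the proportion of correct answers, and ritc is the Pearson correlation between an item and the sum of the remaining items. The function names and the toy response matrix are hypothetical and are not taken from the study data.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_statistics(responses):
    """responses: list of rows (one per child), each a list of 0/1 item scores."""
    totals = [sum(row) for row in responses]
    results = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # "Corrected": the item itself is removed from the total score,
        # so it cannot inflate its own correlation.
        rest = [t - s for t, s in zip(totals, item)]
        results.append({
            "p": statistics.fmean(item),     # difficulty = proportion correct
            "sd": statistics.stdev(item),    # sample standard deviation
            "r_itc": pearson_r(item, rest),  # corrected item-total correlation
        })
    return results


# Hypothetical toy data: 4 children, 3 items.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
stats = item_statistics(toy)
for j, s in enumerate(stats, start=1):
    print(j, round(s["p"], 2), round(s["sd"], 2), round(s["r_itc"], 2))
```

Removing the item from the total matters most for short scales such as the four-item subtests listed above, where each item makes up a large share of the raw sum.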
P. T. van der Heijden et al.: Evaluation of the MMPI-2 RC Scales with the SCID
European Journal of Psychological Assessment 2013; Vol. 29(3):182–188
© 2012 Hogrefe Publishing
Original Article
The MMPI-2-Restructured Form and the Standard MMPI-2 Clinical Scales in Relation to DSM-IV
Paul T. van der Heijden1, Jos I. M. Egger2,3,4, Gina M. P. Rossi5,
Gitte Grundel6, and Jan J. L. Derksen7
1 Centre for Adolescent Psychiatry, Reinier van Arkel Mental Health Institute, 's-Hertogenbosch, The Netherlands
2 Behavioral Science Institute and Donders Institute for Brain, Cognition and Behavior, Centre for Cognition, Radboud University Nijmegen, The Netherlands
3 Centre of Excellence for Neuropsychiatry, Vincent van Gogh Institute for Psychiatry, Venray, The Netherlands
4 Pompe Institute for Forensic Psychiatry, Forum GGZ, Nijmegen, The Netherlands
5 Faculty of Psychology and Educational Sciences, Department of Clinical and Life Span Psychology, Vrije Universiteit Brussel, Belgium
6 Centre for Adolescent Psychiatry, Reinier van Arkel Mental Health Institute, 's-Hertogenbosch, The Netherlands
7 Behavioral Science Institute, Radboud University Nijmegen, The Netherlands
Abstract. In a Dutch sample of psychiatric outpatients (N = 94), we linked the Minnesota Multiphasic Personality Inventory-2 (MMPI-2;
Butcher et al., 2001) Clinical scales and MMPI-2-Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008) Higher-Order (H-O)
scales, Restructured Clinical (RC) scales and Specific Problem scales to the Structured Clinical Interview for the Diagnostic and Statis-
tical Manual of Mental Disorders (DSM-IV, 4th ed.; American Psychiatric Association, 1994) by First, Spitzer, Gibbon, and Williams
(1995). Overall, the H-O scales, RC scales, and Specific Problem scales display meaningful relationships to Axis I and Axis II disorders
conceptualized by the DSM-IV. In addition, the RC scales demonstrate a moderate improvement in validity over the standard Clinical
scales. Theoretical and clinical implications are considered.
Keywords: MMPI-2-RF, MMPI-2, international adaptation, predictive validity, DSM-IV
Within the field of clinical psychology, the Minnesota Multiphasic Personality Inventory-2 (MMPI-2; Butcher et al., 2001) is worldwide the most widely used and researched self-report measure of psychopathology and personality characteristics (Camara, Nathan, & Puente, 2000). The most recent update in the long history of the MMPI-2 is the development of the MMPI-2 Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008). The MMPI-2-RF, with its RC scales and three H-O scales based on the RC scales as its core components, is shorter than the MMPI-2 (i.e., 338 items versus 567 items) and contains one new and seven revised Validity scales, 23 Specific Problem Scales, two Interest Scales, and five revised PSY-5 scales. The current investigation presents a predictive validity study of the H-O scales, RC scales, and Specific Problem scales with respect to the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV, 4th ed.; American Psychiatric Association, 1994) Axis I and Axis II disorders by First, Spitzer, Gibbon, and Williams (1995) and First, Spitzer, Gibbon, Williams, and Benjamin (1997; e.g., SCID-I and SCID-II, respectively).

The RC scales were developed by Tellegen et al. (2003) to solve some major interpretive difficulties of the MMPI Clinical scales. Among these difficulties are the high intercorrelations due to item overlap and common variance related to psychopathology or emotional distress in general (but not to specific diagnoses) (Helmes & Reddon, 1993). Also, the Clinical scale item pool contains a vast amount of information regarding a broad range of psychopathology. In developing the RC scales, Tellegen et al. (2003) wanted to preserve this valuable information of the Clinical scales and simultaneously improve their uniqueness and discriminant validity.

Construction of the RC scales was performed in four steps. First, the conceptualization and labeling of the common distress factor in the Clinical scales was guided by the
DOI: 10.1027/1015-5759/a000140
P. T. van der Heijden et al.: Evaluation of the MMPI-2 RC Scales with the SCID 183
two-factor model of mood and affect (Tellegen, Watson, & Clark, 1999). This distress factor was labeled Demoralization and was constructed through factor analysis of items of Scale 2 (Depression) and Scale 7 (Psychasthenia), because these scales have the most demoralization-related items. Then, these demoralization items were added to each Clinical scale, and a combined item factor analysis was performed, which for each Clinical scale yielded a demoralization factor and a distinctive factor representing the core element of that particular Clinical scale. Items that defined these core elements of each Clinical scale were then correlated to the entire MMPI-2 item pool. Items from the MMPI-2 item pool with unique and strong correlations were added to these core elements to enhance internal consistency and content representativeness of the RC scales. Finally, the scales were further improved by correlating them with relevant extra-test criteria and by removing items that displayed poor validity. No item overlap between any RC scales was allowed. Nine RC scales were developed in this manner; no RC scales were developed for Clinical scales 5 and 0 as they are not measures of psychopathology. The restructuring of RC3 (Cynicism) was a bit different: A separate factor marked as naïveté was identified as the distinctive core component of Scale 3 (Hysteria) because items reflecting somatic complaints were already assigned to RC1 (Somatic Complaints). Items reflecting naïveté were then scored inversely, and the final scale RC3 was named Cynicism, which also contains items of the original Scale 6 (Paranoia).

Specific Problem scales were developed to measure distinctive components from the Clinical scales that were not captured by the RC scales, in addition to some facets of the RC scales that require separate assessment, for example, substance abuse, which is part of RC4 (Antisocial Behavior; Ben-Porath & Tellegen, 2008).

Factor analysis of the RC scales revealed three underlying dimensions representing internalizing problems, externalizing problems, and thought disorders. These three dimensions were the primary markers of three Higher Order (H-O) scales: Emotional/Internalizing Dysfunction (EID) [based mainly on the items from Demoralization (RCd), Low Positive Emotions (RC2) and Dysfunctional Negative Emotions (RC7)], Behavioral/Externalizing Dysfunction (BXD) [based on items from Antisocial Behavior (RC4) and Hypomanic Activation (RC9)] and Thought Dysfunction (THD) [consisting of items from the Ideas of Persecution (RC6) and Aberrant Experiences (RC8) scales]. These dimensions parallel the classical MMPI code types 2/7, 4/9, and 6/8 and are fairly consistent with the higher-order factors found among common mental disorders (Markon, 2010).

Simms, Casillas, Clark, Watson, and Doebbeling (2005) compared the RC scales with the Clinical scales with respect to the Schedule for Nonadaptive and Adaptive Personality (SNAP; Clark, 1993) and the SCID in a sample of psychology clinic clients (N = 285) and a sample of military veterans (N = 567). They found that RCd, RC1, RC2, and RC7 were related to current and lifetime diagnosis of depression from the SCID, with a medium effect size (i.e., .30 ≤ r < .50; Cohen, 1992). However, depressive disorders were better predicted by Scale 2 than by RC2. Ratings of anxiety disorders were correlated most strongly with the restructured and traditional versions of Scales 1, 2, 7, and 8 as well as with RCd. Anxiety disorders were better predicted by Scale 7 than by RC7. RC1 and Scale 1 displayed equal relations to ratings of somatoform disorders, whereas substance abuse disorders were better predicted by RC4 than by Scale 4. Simms and colleagues did not consider the MMPI-2-RF H-O scales, nor did they consider SCID-II classifications for personality disorders (PDs).

The study by Simms and colleagues (2005) is the only one so far to relate the RC scales to the SCID-I. To our knowledge, no study has investigated the relationship between the RC scales and the SCID-II. This is remarkable because the SCID-I is commonly seen as the gold standard in the assessment of clinical diagnoses (Shear et al., 2000). The SCID-I is therefore a valuable instrument for exploring correlates of the RC scales, although the RC scales were not explicitly designed to predict clinical diagnoses (as the Clinical scales were). Arbisi, Sellbom, and Ben-Porath (2008) suggest the use of structured diagnostic interviews in validation studies to allow for an investigation of the diagnostic implications of elevated scores on the RC scales. In addition, the SCID-II could be considered an appropriate measure for the evaluation of PDs because personality-disordered patients often lack self-insight (Westen, 1997), and a diagnostic interview provides the possibility to ask for elaborations or concrete examples for clarification.

Presently, there is a paucity of research on the RC scales outside the United States, while cross-cultural comparisons are needed to investigate whether U.S. validation studies on the RC scales may be generalized to other (Western) countries. For all of these reasons, we decided to investigate the relationships of the H-O scales, RC scales, Specific Problem scales, and standard Clinical scales with the SCID-I and SCID-II. Based on prior research (e.g., Simms et al., 2005), we hypothesized that overall the RC scales and Clinical scales would demonstrate comparable convergent and divergent validity with respect to the SCID-I. Based on research with the SNAP (Simms et al., 2005), we hypothesized that the RC scales would demonstrate better validity with respect to the SCID-II than their Clinical scale counterparts. In addition, we hypothesized that, for several diagnostic criteria, the Specific Problem scales would demonstrate strong correlations [e.g., Stress/Worry (STW) and Anxiety (AXY) for the anxiety-related disorders, Substance Abuse (SUB) for substance use disorders and the Interpersonal Scales for the schizoid, histrionic, narcissistic, avoidant, and dependent PDs; Tellegen & Ben-Porath, 2008].
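The effect-size labels used for correlations throughout this article follow Cohen (1992), with .30 ≤ r < .50 counted as medium and r ≥ .50 as strong. A minimal sketch of that convention is given below. The function name `label_effect_size` is hypothetical, the .10 lower bound for "small" is Cohen's conventional value rather than something stated in this article, and the "negligible" label is an assumption added only for illustration.

```python
def label_effect_size(r: float) -> str:
    """Label a correlation using Cohen's (1992) conventional thresholds.

    The .30 and .50 bounds are the ones quoted in the article; the .10
    bound and the 'negligible' label are illustrative assumptions.
    """
    size = abs(r)  # the sign of a correlation does not change its magnitude
    if size >= 0.50:
        return "strong"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"


if __name__ == "__main__":
    # Three correlations reported later in the article
    for r in (0.24, 0.33, 0.58):
        print(f"r = {r:.2f}: {label_effect_size(r)}")
```

Applied to values reported in the Results, this labels r = .24 as small, r = .33 as medium, and r = .58 as strong, which agrees with how the text describes them.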
Method

Participants

All participants (N = 101) were recruited in an outpatient mental health institute in The Netherlands. Data were collected as part of the clinical routine, and the confidentiality of participants' identities was maintained throughout the study process. Participants were excluded from the study if they had MMPI-2 profiles that met the following criteria: Cannot Say raw scores ≥ 30, VRIN and TRIN T score ≥ 80 and L > 80 (Butcher et al., 2001); or MMPI-2-RF profiles that met the following criteria: Cannot Say raw scores ≥ 15, VRIN-r and TRIN-r T score ≥ 80, Fp-r ≥ 100 and L-r ≥ 80 (Ben-Porath & Tellegen, 2008). Seven participants were excluded based on these criteria, one of whom demonstrated VRIN = 80 but no significant elevations on the F scale or Clinical scales. The other six excluded participants displayed high F scores (103 ≤ F T score ≤ 120) and Fp scores (102 ≤ Fp T score ≤ 117) and met criteria for several PDs (M = 3, range 2–8) including borderline PD. The final sample consisted of 94 participants (46 men and 48 women) with a mean age of 36.6 years at day of testing (SD = 8.99; range 21–57 years).

Measurements

The Dutch language version of the MMPI-2 was translated and standardized for Belgium and the Netherlands by Derksen, De Mey, Sloore, and Hellenbosch (1993). Detailed information about the translation process and the psychometric properties of the MMPI-2 and MMPI-2-RF scales in the Dutch normative sample is provided by Derksen et al. (1993) and Van der Heijden, Egger, and Derksen (2008, 2010). No commercial version of the Dutch MMPI-2-RF is available yet, and the Dutch language version of the MMPI-2-RF is available only for scientific research and validation studies. We used uncorrected raw scores in the analyses.

The SCID-I (First, Spitzer, Gibbon, & Williams, 1995; Groenestijn, Akkerhuis, Kupka, Schneider, & Nolen, 1998) and SCID-II (First, Spitzer, Gibbon, Williams, & Benjamin, 1997; Weertman, Arntz, & Kerkhofs, 2000) are widely used structured clinical interviews to measure clinical syndromes (CSs) and PDs based on criteria from the DSM-IV. Three interviewers accounted for all the interviews in the current research. These interviewers were adequately trained and experienced in administering the interview to guarantee good interrater reliability. When interviewers are sufficiently trained, the SCID-I and SCID-II show excellent interrater reliability and test-retest reliabilities in diverse studies (Weertman, Arntz, Dreessen, Van Velzen, & Vertommen, 2003; Zanarini et al., 2000).

In the analyses, we reduced all SCID-I classifications to four broad disorders in line with Simms and colleagues (2005). Participants were classified with a depressive disorder if they met criteria for one or more depression-related disorders (except bipolar disorders), an anxiety disorder if they met criteria for any of the anxiety disorders, and with a somatoform disorder if they met criteria for any of the somatoform disorders or for fibromyalgia, irritable bowel syndrome or chronic fatigue syndrome (Wessely & White, 2004). A category substance use disorder (SUD) was made for people with alcohol or drug abuse and/or dependency. We did not create a category for psychotic disorders as the frequency of this type of disorder was too small in our outpatient sample. Only current diagnoses were used in the analyses. For the PDs we calculated the number of criteria met, instead of diagnoses.

Statistical Analyses

First, we calculated that a minimum of 85 participants was needed to realize a power of .80, based on α = .05 and given an expected medium effect size (Cohen, 1992; Simms et al., 2005). Reliability coefficients and intercorrelations were calculated for the MMPI-2 and MMPI-2-RF scales. In addition, we investigated the diagnostic overlap in the current sample. Then, zero-order correlations were calculated for the H-O scales, RC scales, Specific Problem scales, and standard Clinical scales with the SCID-I and II classifications. We calculated Fisher's Z to test the differences in correlations between the Clinical scales and the RC scales with the SCID-I and II; Cohen's q was calculated as effect size for these comparisons. Because of the large number of comparisons and Z-tests, we set the p-value for significance conservatively to .01 for all the analyses.

Results

Detailed information about reliability statistics of the H-O scales, RC scales, Specific Problem scales and Clinical scales, intercorrelations among RC scales and Clinical scales, and differences in average T-scores for RC scales and corresponding Clinical scales are available on request from the first author. In terms of DSM-IV, there was considerable diagnostic overlap in the current sample. For example, the mean number of current Axis I disorders was 2.88 (SD = 2.45). Of the 50 participants with depression-related disorders, 35 (70%) also met criteria for at least one of the anxiety disorders. On the other hand, of the 54 patients with anxiety-related disorders, 35 (65%) patients also met criteria for one of the depression-related disorders. The mean number of PDs in the current sample was 1.54 (SD = 1.68; range 0–7). The mean number of PDs for participants who met criteria of at least one PD (N = 58) was 2.5 (range 1–7). The avoidant and borderline PD had the highest frequencies in the sample (i.e., 26 and 23, respectively). Table 1 presents the bivariate correlations for H-O scales,
Table 1. Correlations for H-O scales, RC scales, and Clinical scales with clinical syndromes and personality disorders (N = 94)

MMPI-2/MMPI-2-RF                            SCID-I classifications      SCID-II classifications for personality disorders
H-O scales, RC scales and Clinical scales   Depr   Anx    Som    SUD    Avoid  Depen  Obs    Pass   Depr   Par    Szt    Szd    His    Nar    Bor    Asp
Emotional/internalizing dysfunction (EID)   .51    .44    .27    .04    .54    .37    .31    .24    .66    .37    .28    .10    .07    .00    .44    .02
Thought dysfunction (THD)                   .05    .10    .13    .04    .11    .18    .09    .21    .29    .44    .35    .08    .02    .15    .36    .16
Behavioral/externalizing dysfunction (BXD)  .10    .01    .12    .21    .10    .07    .06    .27    .04    .31    .13    .11    .02    .16    .37    .65
Demoralization (RCd)                        .50    .41    .23    .03    .54    .38    .25    .22    .62    .35    .29    .07    .13    .03    .43    .02
Somatic complaints (RC1)                    .13    .25    .33    .06    .21    .29    .14    .19    .41    .43    .19    .04    .19    .03    .38    .06
Scale 1: Hypochondriasis (Hs)               .22    .32    .35    .04    .25    .33    .14    .21    .46    .49    .22    .09    .17    .01    .45    .08
Low positive emotions (RC2)                 .47    .30    .16    .01    .44    .26    .17    .04    .40    .10    .01    .18    .05    .04    .16    .16
Scale 2: Depression (D)                     .52    .45    .29    .02    .44    .33    .14    .16    .54    .27    .11    .01    .18    .03    .35    .17
Cynicism (RC3)                              .08    .20    .14    .03    .08    .01    .16    .33    .23    .39    .21    .18    .15    .22    .29    .20
Scale 3: Hysteria (Hy)                      .29    .19    .28    .11    .03    .23    .08    .02    .24    .16    .04    .21    .26    .05    .28    .07
Antisocial behavior (RC4)                   .07    .07    .21    .30    .02    .07    .09    .31    .16    .41    .22    .06    .00    .01    .45    .58
Scale 4: Psychopathic deviate (Pd)          .35    .30    .29    .11    .11    .17    .14    .40    .45    .47    .22    .09    .08    .04    .46    .19
Ideas of persecution (RC6)                  .02    .16    .18    .10    .09    .11    .03    .24    .28    .44    .27    .13    .01    .18    .25    .03
Scale 6: Paranoia (Pa)                      .33    .25    .09    .16    .19    .22    .14    .23    .42    .41    .19    .15    .04    .25    .32    .03
Dysfunctional negative emotions (RC7)       .24    .37    .23    .04    .41    .36    .19    .27    .56    .51    .31    .03    .11    .06    .52    .16
Scale 7: Psychasthenia (Pt)                 .40    .37    .27    .03    .44    .37    .26    .25    .61    .47    .29    .01    .17    .01    .54    .12
Aberrant experiences (RC8)                  .01    .07    .17    .06    .13    .27    .14    .17    .31    .40    .37    .01    .07    .08    .50    .21
Scale 8: Schizophrenia (Sc)                 .30    .32    .27    .01    .36    .34    .27    .33    .53    .53    .35    .07    .17    .03    .54    .18
Hypomanic activation (RC9)                  .17    .03    .01    .06    .19    .02    .23    .36    .02    .41    .18    .08    .16    .20    .40    .47
Scale 9: Hypomania (Ma)                     .10    .10    .06    .00    .23    .00    .10    .28    .03    .38    .14    .02    .19    .17    .40    .32

Notes. SCID-I columns: Depr = Depression; Anx = Anxiety; Som = Somatoform; SUD = Substance Use disorder. SCID-II columns: Avoid = Avoidant PD; Depen = Dependent PD; Obs = Obsessive-Compulsive PD; Pass = Passive Aggressive PD; Depr = Depressive PD; Par = Paranoid PD; Szt = Schizotypal PD; Szd = Schizoid PD; His = Histrionic PD; Nar = Narcissistic PD; Bor = Borderline PD; Asp = Antisocial PD (Personality Disorder). Correlations with a strong effect size (i.e., r ≥ .50) are in bold, correlations with a medium effect size are in italics. rs ≥ .27 are significant at p < .05.
RC scales, and corresponding Clinical scales with SCID-I and II classifications. Table 2 provides correlations for the Specific Problem scales with the same criteria. As Table 1 indicates, the RC scales demonstrate convergent and discriminant validity in relation to the SCID-I and II classifications: RCd (Demoralization) has a strong correlation with depressive disorders and a medium correlation with anxiety-related disorders. RC2 (Low Positive Emotions) demonstrates the strongest relationship with depressive disorders, whereas RC7 (Dysfunctional Negative Emotions) demonstrates its strongest relationship with anxiety-related disorders. RC1 (Somatic Complaints) displays a medium relationship with somatoform disorder (r = .33), and RC4 (Antisocial Behavior) is clearly related to SUD (i.e., medium effect, r = .30). The RC scales also show meaningful relationships with the PDs. For example, RC6 (Ideas of Persecution) shows a unique, medium correlation with the paranoid PD, and RC4 (Antisocial Behavior) demonstrates a strong correlation with the antisocial PD.

Concerning the Specific Problem scales, Anxiety (AXY) demonstrated a small correlation (r = .24) with the anxiety-related disorders, whereas Stress/Worry (STW) displayed a medium correlation (r = .43) with these disorders. As hypothesized, Substance Abuse (SUB) showed a medium correlation with SUD. Juvenile Conduct Problems (JCP) demonstrated a strong relationship with antisocial PD and Suicidal/Death ideation with borderline PD. Shyness (SHY) was strongly correlated with the avoidant PD and Activation (ACT; heightened excitation and energy level) with borderline PD. Disaffiliativeness (DSF; disliking people and being around them) showed medium correlations with depressive, schizoid and avoidant PD. The Interpersonal Scales were not related to the histrionic and narcissistic PDs.

We compared correlations for RC scales and Clinical scales with DSM-IV Axis I and Axis II disorders. Here, we found some significant differences in correlations, but none of these differences reached a large effect size (i.e., Cohen's q ≥ .50). For example, RC4 (Antisocial Behavior) displayed a stronger correlation with antisocial PD than Scale 4 did (i.e., Z = 3.10; p < .01; Cohen's q = .46). Similarly, RC3 (Cynicism) demonstrates stronger relationships with
Table 2. Correlations for Specific Problem scales with clinical syndromes and personality disorders (N = 94)

                                    SCID-I classifications      SCID-II classifications for personality disorders
MMPI-2-RF Specific Problem scales   Depr   Anx    Som    SUD    Avoid  Depen  Obs    Pass   Depr   Par    Szt    Szd    His    Nar    Bor    Asp
Somatic/Cognitive and Internalizing Scales
Malaise (MLS)                       .42    .33    .27    .06    .25    .31    .18    .22    .52    .44    .16    .04    .15    .05    .50    .12
Gastrointestinal complaints (GIC)   .16    .27    .41    .10    .25    .24    .16    .15    .37    .30    .21    .07    .04    .04    .32    .17
Head pain complaints (HPC)          .15    .21    .26    .02    .13    .31    .08    .09    .33    .31    .22    .07    .19    .03    .32    .06
Neurological complaints (NUC)       .01    .15    .19    .06    .10    .15    .06    .24    .27    .38    .07    .03    .20    .12    .30    .03
Cognitive complaints (COG)          .34    .33    .31    .06    .35    .38    .28    .21    .48    .32    .24    .07    .20    .01    .45    .11
Suicidal/death ideation (SUI)       .23    .07    .05    .09    .03    .23    .24    .18    .16    .29    .29    .01    .06    .11    .50    .21
Helplessness/hopelessness (HLP)     .41    .26    .17    .03    .46    .35    .22    .21    .53    .29    .30    .13    .03    .07    .35    .02
Self-doubt (SFD)                    .29    .31    .21    .03    .48    .28    .20    .08    .57    .21    .27    .00    .00    .14    .29    .08
Inefficacy (NFC)                    .19    .26    .18    .08    .47    .36    .25    .19    .45    .37    .33    .04    .05    .08    .41    .17
Stress/worry (STW)                  .38    .43    .24    .01    .39    .32    .23    .28    .60    .41    .25    .08    .19    .07    .49    .11
Anxiety (AXY)                       .15    .24    .25    .06    .22    .26    .01    .19    .34    .40    .26    .09    .25    .02    .47    .17
Anger proneness (ANP)               .18    .13    .21    .05    .01    .11    .35    .41    .36    .37    .07    .01    .15    .04    .42    .22
Behavior-restricting fears (BRF)    .06    .17    .15    .04    .19    .26    .15    .10    .26    .22    .23    .01    .21    .02    .31    .12
Multiple specific fears (MSF)       .001   .20    .10    .06    .19    .17    .09    .02    .21    .20    .18    .14    .04    .24    .21    .12
Externalizing and Interpersonal Scales
schizoid PD than Scale 3 (Z = 2.67; p < .01; Cohen's q = .40). We also compared our correlation table (RC scales with Axis I classifications) with correlations found by Simms and colleagues (2005) in a sample with 564 military veterans (88% male). None of 36 comparisons demonstrated a significant difference (p < .01).

Discussion

As hypothesized, the RC scales demonstrate convergent and discriminant validity with respect to the SCID-I and II. For example, our findings concerning the relationships between RC2 and RC7 and the DSM-IV measures for anxiety and depression were in line with Tellegen's model (1985). According to this model, low positive emotions (reflected by RC2) are distinctive for depressive disorders, whereas high negative affectivity (i.e., RC7) is related to anxiety. The common variance that is often seen in depression and anxiety can be considered as a higher order factor conceptualized as demoralization (Tellegen et al., 2003). In our sample, RC2 (Low Positive Emotions) had a stronger relationship with depression than RC7 (Dysfunctional Negative Emotions), whereas for anxiety the opposite was true. RCd (Demoralization) has equal correlations with both anxiety and depression.

RC3 (Cynicism) displayed medium correlations with the paranoid and passive-aggressive PDs. High scores on RC3 reflect cynical beliefs about others, believing others look out only for their own interests, and being distrustful of others, which are indeed core beliefs of the paranoid and passive-aggressive PDs (Millon & Davis, 1996). RC9 (Hypomanic Activation) shows medium-size correlations with the paranoid, borderline, antisocial, and passive-aggressive PDs. Elevated scores on RC9 reflect aggression, poor impulse control, mood instability, sensation seeking, and other forms of undercontrolled behavior (Ben-Porath & Tellegen, 2008). However, high scores on RC9 are also associated with narcissistic personality features, although the correlation of RC9 with the narcissistic PD is small in the current sample. This may be due to the fact that the
criteria for the narcissistic PD from the SCID-II are very obvious and transparent and therefore not easy to admit in an interview. In our sample only 4 persons met the criteria for a narcissistic PD. Simms et al. (2005) found a strong correlation for RC9 with the narcissistic PD in the SNAP.

The Specific Problem scales showed meaningful relationships with the DSM-IV classifications. The internalizing scales were related primarily to the internalizing disorders, whereas the externalizing scales were related primarily to SUD and antisocial PD. Contrary to our hypotheses, we did not find significant correlations for the Interpersonal scales and the narcissistic and histrionic PDs, probably because of low frequencies of these disorders in the current sample. Anxiety (AXY; pervasive anxiety, frights, frequent nightmares) showed only a small correlation with anxiety-related disorders, whereas it showed medium correlations with depressive, borderline, and paranoid PD. A possible explanation might be that the category of anxiety-related disorders in the current sample is a broad dimension, whereas the AXY scale is a narrowly defined construct.

Our hypotheses for comparisons of both sets of scales with the DSM-IV disorders were only partially confirmed. RC scales measuring externalizing behaviors and psychotic symptoms demonstrate some improvement in convergent and discriminant validity over their Clinical scale counterparts. For example, RC4, RC6, and RC8 are less saturated with depression-related variance and therefore display better divergent validity with respect to depression-related disorders. However, for RC scales measuring internalizing pathology this is not the case; these RC scales display no improvement compared to the Clinical scales with respect to internalizing disorders. That being said, the RC scales definitely do not perform worse than their Clinical scale counterparts with respect to the SCID, although these scales were not designed to predict these diagnoses. Moreover, the RC scales have fewer items and thus could be seen as more efficient. Overall, we agree with Simms and colleagues (2005), who stated that the RC scales "appear to represent a modest psychometric improvement over the standard clinical scales" (p. 357). Interestingly, our findings were very much in correspondence with findings by Simms and colleagues, offering a first indication for the generalizability of U.S. validation studies to the Dutch language version of the MMPI-2.

Potential limitations of the current study must also be considered. First of all, we had to reduce the DSM-IV Axis I disorders to four broad categories (without a category for psychotic disorders) because of the relatively small sample and the low frequency of some severe psychiatric disorders. In addition, the SCID classifications were not mutually exclusive; there is considerable diagnostic overlap in the current sample. This limits our ability to draw inferences about discriminant validity. On the other hand, one would need access to an overwhelmingly large sample with an extensive variety of problems and disorders to capture all 374 classifications in DSM-IV (APA, 1994). Also, comorbidity is widespread among common mental disorders (Krueger, 1999). Moreover, the categories we made are the most common ones in mental health practice, and we calculated that our sample should be big enough to reach a power of .80. Furthermore, we included patients with a variety of Axis I and Axis II disorders, which reduces the possibility of restriction-of-range effects. Another limitation might be that we did not consider some of the MMPI-2 content and PSY-5 scales that show considerable conceptual overlap with the RC scales (Nichols, 2006; Van der Heijden et al., 2008). That would make a fairer comparison if one wants to evaluate possible redundancy of the RC scales (Nichols, 2006). Yet, these particular scales do not appear in the MMPI-2-RF, so the possible redundancy is not an issue for clinicians who solely use the new MMPI-2-RF scales.

An interesting direction for future research could be the connections of the MMPI-2-RF conceptualizations with current proposals for the DSM-5 (personality) disorders. Furthermore, the treatment utility (Nelson-Gray, 2003) of the MMPI-2-RF scales should be considered in future research. Ultimately, clinicians are most interested in what these scales can add to the prediction of relevant issues such as treatment response and treatment outcome.

As mentioned above, there is no commercial version of the MMPI-2-RF available yet in the Netherlands, but publication is planned. In the United States, both MMPI-2 versions are available at the same time. So which one to choose? As far as the current research is concerned, the H-O scales, RC scales, and Specific Problem scales show meaningful correlations with the SCID-I and II classifications, although they were not explicitly designed to predict these diagnoses. Moreover, the RC scales show a moderate improvement compared to the more comprehensive Clinical scales. The RC scales show the same reliability statistics, lower intercorrelations (Tellegen et al., 2003; Van der Heijden et al., 2008), and slightly better discriminant validity than the Clinical scales. In addition, the MMPI-2-RF has 40% fewer items. The RC scales provide an efficient and homogeneous measure, whereas the Clinical scales may show more syndrome fidelity and can rest on an enormous empirical tradition. However, new research on the MMPI-2 is becoming more and more related to the RC scales and H-O scales instead of the Clinical scales. So the development of an equally impressive and extensive research base for the MMPI-2-RF scales to draw upon during test interpretation seems just a matter of time. That makes the MMPI-2-RF more promising for the future.

Acknowledgments

We thank Ben Hogerheijde, Ilvy Mayen, Marcel van de Langenberg, and Shakira Cornelissen for their participation in this research project.
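The sample-size calculation and the correlation comparisons described under Statistical Analyses can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual procedure. It assumes the common Fisher z approximation for the power of a two-sided test of a single correlation, and a Fisher z test that treats the two correlations as coming from independent samples. The article does not say which variant of the test was used, and correlations computed within one sample are in fact dependent, so the z value below is only indicative. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def required_n(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size needed to detect correlation r (two-sided test).

    Uses the Fisher z approximation: n = ((z_alpha + z_beta) / atanh(r))^2 + 3.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


def cohens_q(r1: float, r2: float) -> float:
    """Cohen's q: difference between two Fisher z-transformed correlations."""
    return math.atanh(r1) - math.atanh(r2)


def fisher_z_test(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Fisher z test for two correlations, ASSUMING independent samples.

    Returns the z statistic and the two-sided p-value.
    """
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = cohens_q(r1, r2) / se
    p = 2 * (1 - _STD_NORMAL.cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Medium effect size (r = .30), alpha = .05, power = .80
    print("required n:", required_n(0.30))

    # RC4 (r = .58) versus Scale 4 (r = .19) with antisocial PD, from Table 1
    print("Cohen's q:", round(cohens_q(0.58, 0.19), 2))
    z, p = fisher_z_test(0.58, 94, 0.19, 94)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

With r = .30, α = .05, and power = .80, `required_n` returns 85, the minimum reported in the article. For the rounded Table 1 values of RC4 and Scale 4 with antisocial PD, `cohens_q` gives about .47, close to the reported q = .46; the z statistic from this independent-samples sketch should not be expected to match the reported Z = 3.10 exactly.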
References

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: American Psychiatric Association.
Arbisi, P. A., Sellbom, M., & Ben-Porath, Y. S. (2008). Empirical correlates of the MMPI-2 restructured clinical (RC) scales in psychiatric inpatients. Journal of Personality Assessment, 90, 122–128.
Ben-Porath, Y. S., & Tellegen, A. (2008). MMPI-2-RF user's guide for reports. Minneapolis, MN: University of Minnesota Press.
Butcher, J. N., Graham, J. R., Ben-Porath, Y. S., Tellegen, A., Dahlstrom, W. G., & Kaemmer, B. (2001). Minnesota Multiphasic Personality Inventory-2: Manual for administration and scoring (revised ed.). Minneapolis, MN: University of Minnesota Press.
Camara, W. J., Nathan, J. S., & Puente, A. E. (2000). Psychological test usage: Implications in professional psychology. Professional Psychology: Research and Practice, 31, 141–154.
Clark, L. A. (1993). Schedule for nonadaptive and adaptive personality: Manual for administration, scoring, and interpretation. Minneapolis, MN: University of Minnesota Press.
Cohen, J. (1992). Quantitative methods in psychology: A power primer. Psychological Bulletin, 112, 155–159.
Derksen, J., De Mey, H., Sloore, H., & Hellenbosch, G. (1993). MMPI-2: Handleiding bij afname, scoring en interpretatie [Minnesota Multiphasic Personality Inventory-2: MMPI-2: Manual for administration, scoring, and interpretation]. Nijmegen, The Netherlands: PEN Test Publishers.
First, M. B., Spitzer, R. L., Gibbon, M., & Williams, G. B. W. (1995). Structured clinical interview for DSM-IV axis I disorders, patient edition (SCID-I/P), Version 2.0. New York: Biometrics Research Department, New York State Psychiatric Institute.
First, M. B., Spitzer, R. L., Gibbon, M., Williams, G. B. W., & Benjamin, L. (1997). Structured clinical interview for DSM-IV axis II personality disorders (SCID-II). Washington, DC: American Psychiatric Press.
Groenestijn, M. A. C., van, Akkerhuis, G. W., Kupka, R. W., Schneider, N., & Nolen, W. A. (1998). SCID I Gestructureerd Klinisch Interview voor DSM-IV As-I Stoornissen [Structured clinical interview for DSM-IV Axis I disorders]. Lisse: Swets & Zeitlinger.
Helmes, E., & Reddon, J. R. (1993). A perspective on developments in assessing psychopathology: A critical review of the MMPI and MMPI-2. Psychological Bulletin, 113, 453–471.
Krueger, R. F. (1999). The structure of common mental disorders. Archives of General Psychiatry, 56, 921–926.
Markon, K. E. (2010). Modeling psychopathology structure: A symptom-level analysis of Axis I and Axis II disorders. Psychological Medicine, 40, 273–288.
Millon, T., & Davis, R. D. (1996). Disorders of personality: DSM-IV and beyond (2nd ed.). New York: Wiley.
Nelson-Gray, R. O. (2003). Treatment utility of psychological assessment. Psychological Assessment, 15, 521–531.
Nichols, D. S. (2006). The trials of separating bath water from baby. A review and critique of the MMPI-2 Restructured Clinical Scales. Journal of Personality Assessment, 87, 121–138.
Shear, M. K., Greeno, C., Kang, J., Ludewig, D., Frank, E., Swartz, H. A., & Hanekamp, M. (2000). Diagnosis of nonpsychotic patients in community clinics. American Journal of Psychiatry, 157, 581–587.
Simms, L. J., Casillas, A., Clark, L. A., Watson, D., & Doebbeling, B. N. (2005). Psychometric evaluation of the restructured clinical scales of the MMPI-2. Psychological Assessment, 17, 345–358.
Tellegen, A. (1985). Structures of mood and personality and their relevance to assessing anxiety, with an emphasis on self-report. In A. H. Tuma & J. D. Maser (Eds.), Anxiety and the anxiety disorders (pp. 681–706). Hillsdale, NJ: Erlbaum.
Tellegen, A., & Ben-Porath, Y. S. (2008). MMPI-2-RF Technical manual. Minneapolis, MN: University of Minnesota Press.
Tellegen, A., Ben-Porath, Y. S., McNulty, J. L., Arbisi, P., Graham, J. R., & Kaemmer, B. (2003). MMPI-2 restructured clinical (RC) scales: Development, validation and interpretation. Minneapolis, MN: University of Minnesota Press.
Tellegen, A., Watson, D., & Clark, L. A. (1999). On the dimensional and hierarchical structure of affect. Psychological Science, 10, 297–303.
Van der Heijden, P. T., Egger, J. I. M., & Derksen, J. J. L. (2008). Psychometric evaluation of the MMPI-2 restructured clinical scales in two Dutch samples. Journal of Personality Assessment, 90, 456–464.
Van der Heijden, P. T., Egger, J. I. M., & Derksen, J. J. L. (2010). Comparability of scores on the MMPI-2-RF scales generated with the MMPI-2 and MMPI-2-RF Booklets. Journal of Personality Assessment, 93, 254–259.
Weertman, A., Arntz, A., Dreessen, L., Van Velzen, C., & Vertommen, S. (2003). Short-interval test-retest interrater reliability of the Dutch version of the Structured Clinical Interview for DSM-IV Personality Disorders (SCID-II). Journal of Personality Disorders, 17, 562–567.
Weertman, A., Arntz, A., & Kerkhofs, M. (2000). Handleiding Gestructureerd Klinisch Interview voor DSM-IV As-II Persoonlijkheidsstoornissen [Manual structured clinical interview for DSM personality disorders]. Lisse, The Netherlands: Swets Test Publishers.
Wessely, S., & White, P. D. (2004). There is only one functional somatic syndrome. British Journal of Psychiatry, 185, 95–96.
Westen, D. (1997). Divergences between clinical and research methods for assessing personality disorders: Implications for research and the evolution of Axis II. American Journal of Psychiatry, 154, 895–903.
Zanarini, M. C., Skodol, A. E., Bender, D., Dolan, R., Sanislow, C., & Schaefer, E. (2000). The Collaborative Longitudinal Personality Disorders Study: Reliability of axis I and II diagnoses. Journal of Personality Disorders, 14, 291–299.

Published online: May 11, 2012

Paul T. van der Heijden
Reinier van Arkel Groep
P.O. Box 70058
5201 DZ 's-Hertogenbosch
The Netherlands
Tel. +31 73 658-6651
Fax +31 73 658-6648
E-mail p.t.vander.heijden@rvagroep.nl
T. Giovazolias et al.: The EDE-Q in Greece
European Journal of Psychological Assessment 2013; Vol. 29(3):189–196
© 2012 Hogrefe Publishing
Original Article
The Factor Structure and Psychometric Properties of the Greek Version of the Eating Disorders Examination Questionnaire (EDE-Q)
Theodoros Giovazolias¹, Ioannis Tsaousis¹, and Christina Vallianatou²
¹Department of Psychology, School of Social Sciences, University of Crete, Crete, Greece
²Hellenic Air Force Academy, Athens, Greece
Abstract. This study investigates the latent structure and the psychometric characteristics of the attitudinal subscales of the Greek translation of the EDE-Q (G-EDE-Q) using two samples. The first sample consisted of 500 female university students and was used to test the latent structure of the G-EDE-Q. The second sample consisted of 164 female psychology students and was used to examine the internal consistency as well as the concurrent, convergent, and discriminant validity of the G-EDE-Q. The results from confirmatory factor analysis (CFA) showed that a three-factor solution fit the data better than the other models examined, providing evidence that contradicts the originally proposed structure of the instrument. The results support both the internal consistency and the concurrent, convergent, and discriminant validity of the G-EDE-Q global scale and its subscales. The discussion focuses on a critical review of the statistical procedures employed in previous studies and on the clinical and research implications of the present findings.
Keywords: EDE-Q, CFA, eating disorders, factorial validity, Greece
DOI: 10.1027/1015-5759/a000138

Eating disorders (EDs), a growing field of research and practice in a number of countries across the world, are heterogeneous in manifestation as well as in background. Therefore, the careful psychometric assessment thereof is essential to diagnosis, case formulation, and treatment planning. It is generally recognized that the best method of assessing and diagnosing EDs is the clinical interview conducted by trained clinicians. The Eating Disorders Examination (EDE; Cooper & Fairburn, 1987), a semistructured, investigator-based clinical interview, is widely considered the method of choice for that purpose (Wilson, 1993), although it has certain disadvantages (i.e., is time consuming, requires special training, and is not cost effective). For these reasons, self-report questionnaires are often used as an alternative method of evaluation.

Several self-report instruments are utilized for the identification of early signs of eating disturbances and likewise the tendency to develop eating psychopathology. One of the most widely used is the Eating Disorders Examination-Questionnaire (EDE-Q; Fairburn & Beglin, 1994), derived from the EDE interview. The EDE-Q provides frequency data on key behavioral aspects of eating disorder psychopathology and includes four attitudinal subscales measuring dietary restraint, eating concerns, shape concerns and weight concerns. Additionally, it yields a global score which is the average of the four subscale scores. Previous research demonstrated the EDE-Q global scale and subscales have sound psychometric properties (Hrabosky et al., 2008; Luce & Crowther, 1999; Peterson et al., 2007).

The increase in the prevalence of eating disorders across cultures points to the need for thorough investigation in the different cultural contexts. A growing body of research indicates that the nature of eating disorders may be culturally bound. It has been argued that different ethnic populations vary in their attitudes toward eating and body disturbances (Diaz, Mainous, & Pope, 2007). In Mediterranean cultures, for example, the changing role of women and the transition to a less patriarchal society may have contributed to a differing manifestation of particular aspects of eating disorders. Therefore, using measures developed in Anglo-Saxon populations may need careful examination before being administered in a different ethnic group.

A considerable amount of evidence suggests that eating disorders have their onset in late adolescence or early adulthood. Mond, Hay, Rodgers, and Owen (2006) provided
comprehensive normative data for the EDE-Q among young adult women, based on a large and representative Australian sample, showing the high rates of eating disorders among this specific group. Additionally, Luce, Crowther, and Pole (2008), using the EDE-Q in a group of college women in the United States, aged 18–25 years, presented similar norms. Further, the EDE-Q works better than most self-report measures when used to identify cases of clinically significant eating disorders among young adult women in the community (Mond, Hay, Rodgers, Owen, & Beumont, 2004).

So far, the EDE-Q has been translated into German, Swedish, Spanish, and Norwegian, but only for the first three versions is there any evidence regarding its factor structure. Hilbert, Tuschen-Caffier, Karwautz, Niederhofer, and Munsch (2007) examined the German version of the EDE-Q in a mixed sample of clinical, subclinical, and nonclinical individuals. Using exploratory factor analysis (EFA), they retained only 17 of the 22 items and suggested a three-factor solution with the Weight and Shape Concern items merging in one factor (apart from reaction to prescribed weighing), 4 items from Eating Concern, 1 item from Restraint (empty stomach) and 2 items from Weight Concern (preoccupation with shape/weight, reaction to prescribed weighing) forming a second factor, and 4 items from the Restraint subscale (except for empty stomach) loading on a third factor. Peterson et al. (2007), using EFA in a sample of Swedish bulimic women, proposed an alternative three-factor solution, in which the first factor was a combination of the Weight and Shape subscales (except for preoccupation with weight and shape), the second factor comprised all the Eating Concern subscale items along with empty stomach and preoccupation with weight and shape, and the third factor included all items from the Restraint subscale, apart from empty stomach.

One of the main criticisms of the above-mentioned studies is that they adopted EFA as a method for investigating the factor structure of the EDE-Q, although EFA is used mainly to describe, summarize, or reduce data and make them more easily understood. This is why it is often considered to be more appropriate in the early stages of scale development (Hurley et al., 1997). On the other hand, confirmatory factor analysis (CFA), as opposed to EFA, seems to be more appropriate in the investigation of the crossvalidation of the latent structure of a scale, since it allows the researcher, after specifying an a priori model, to optimally match the observed and theoretical factor structures for a given data set in order to determine the goodness of fit of the predetermined factor model.

To date, only two studies have employed CFA in the investigation of the latent structure of the EDE-Q. In the first study, Hrabosky et al. (2008) performed a CFA on a reduced version of the EDE-Q (12 items) derived from the use of an EFA and suggested the following four latent factors: Dietary Restraint, Eating Disturbance, Appearance Concerns, and Shape/Weight Overvaluation. In the second study, Villarroel, Panelo, Portell, and Raich (2009) found that the four-factor model, as proposed by Fairburn and Beglin (1994), provided a good fit to their data. This is the only study so far that has provided evidence regarding the robustness of Fairburn and Beglin's original four-factor model in a different cultural context.

Given the contradictory results, the main purpose of this study was to use CFA to investigate the latent structure of the four attitudinal subscales of the Greek translation of the EDE-Q. The different models under investigation were as follows:
1. A one-factor model that assumes that a single latent factor underlies all the EDE-Q items;
2. The original four-factor model proposed by Fairburn and Beglin (1994);
3. Hilbert's three-factor model;
4. Peterson's three-factor model.

Another aim of the study was to examine the psychometric properties of the G-EDE-Q. For that purpose, we examined the internal reliability and the item-to-scale homogeneity as well as the concurrent, the convergent, the divergent, and the discriminant validity of the G-EDE-Q global scale and subscales.

Given that previous research has confirmed that eating disorders are associated with individual characteristics, personality traits as well as self-esteem were used to investigate the convergent validity of the G-EDE-Q. For example, specific personality traits such as neuroticism and conscientiousness have been found to correlate with eating disorder symptoms (Diaz-Marza, Carrasco, & Saiz, 2000; Ghaderi & Scott, 2000; MacLaren & Best, 2009). Further, with regard to potential risk factors, self-esteem has been identified as an important etiological aspect in reviews of the relevant literature (e.g., Jacobi, Hayward, de Zwaan, Kraemer, & Agras, 2004; Stice, 2002) and has been shown to remain stable over time in nonclinical samples of undergraduate women (Berg, Frazier, & Sherr, 2009). On the other hand, since previous studies showed no relationship between eating disorder symptoms and the extraversion and openness to experience dimensions of personality (e.g., Ghaderi & Scott, 2000), nonsignificant correlations among these variables could be considered as evidence for the divergent validity of the G-EDE-Q.

Finally, since the majority of the previous studies investigated the psychometric properties of the EDE-Q using clinical samples, we chose to test its psychometric soundness in a nonclinical sample of undergraduate women, college women being acknowledged as a particular high-risk group for developing eating disturbances (Berg et al., 2009). Researchers have suggested that the unique social and academic stressors associated with the university environment may place female students at increased risk for developing disordered eating habits (Compas, Wagner, Slavin, & Vannatta, 1986). Therefore, careful assessment of eating attitudes and behaviors in these populations is of great importance.
Method

Participants

Two samples were used in this study. The first sample consisted of 500 undergraduate female students used to examine the latent structure of the G-EDE-Q. The mean age of the sample was 20.55 years (range 17–44, SD = 3.27); 145 (29%) were 1st-year students, 105 (21%) were 2nd-year students, 81 (16.2%) were 3rd-year students, and 164 (32.8%) were 4th-year or higher students. Regarding field of study, 359 (71.8%) were studying psychology, 106 (21.2%) were studying sociology, and 32 (6.4%) were studying literature; 6 (0.6%) participants did not report their field of study. Finally, with regard to place of residence, 190 (38%) were living in cities with population over 150,000, 162 (32.4%) were living in cities with population up to 100,000, 72 (14.4%) were living in small towns up to 10,000 and 76 (15.5%) were living in villages. The mean body mass index (BMI) for this sample was 21.72 (SD = 3.75), where 26.7% had a BMI below 20 and 11.2% had a BMI above 25. Using a cutoff score of 4 (Fairburn & Beglin, 1994), 8% of the participants obtained scores in the clinically significant range on the Global EDE-Q, 6.4% on the Restraint subscale, 2% on the Eating Concern, 16.6% on the Shape Concern and 13.4% on the Weight Concern subscale, respectively.

The second sample consisted of 164 undergraduate female psychology students used to examine the psychometric quality of the G-EDE-Q. The mean age was 20.90 years (range 18–45, SD = 3.29); 46 (28%) were 1st-year students, 50 (30.5%) were 2nd-year students, 24 (32%) were 3rd-year students, and 44 (26.8%) were 4th-year or higher students. The majority of the participants lived in urban areas: 54 (32.9%) were living in cities with population over 150,000, 52 (31.7%) were living in cities with population up to 150,000, 25 (15.2%) were living in small towns up to 10,000, and 33 (20.1%) were living in villages. The BMI for this second sample was 22.83 (SD = 4.38), where 17.1% had a BMI below 20 and 17.7% above 25. Regarding other clinical characteristics, 10.4% scored in the clinically significant range on the Global EDE-Q, 9.8% on the Restraint, 4.3% on the Eating Concern, 18.9% on the Shape Concern and 18.9% on the Weight Concern subscales.

Measures

Eating Disorder Examination-Questionnaire (EDE-Q; Fairburn & Beglin, 1994)

The EDE-Q is a 36-item self-report instrument assessing behavioral and attitudinal aspects of eating psychopathology. Responses to the questionnaire provide four subscale scores based only on the 22 attitudinal items. The four subscales are Restraint, Eating Concern, Shape Concern, and Weight Concern. An overall Global score is also obtained by summing the four subscale scores. The EDE-Q was translated into Greek in three consecutive stages. During the first stage, four bilingual psychologists, experts in eating disorders, individually translated the measure from English into Greek to obtain linguistic equivalence. In the second stage, a committee consisting of the same four translators examined the four translated versions, discussed any discrepancies, misunderstandings, mistranslations, or inaccuracies that emerged, and made proper adjustments to ensure conceptual equivalence. Finally, a pilot study (n = 15) was carried out to determine whether both the original and the translated versions were equivalent using as criteria the percent of missing data, floor and ceiling effects, and alpha reliabilities. The results from this analysis revealed that none of the adopted criteria revealed substantial differences between the two versions, suggesting that the Greek and the original versions of EDE-Q were indeed equivalent.

Eating Attitudes Test (EAT-26; Garner & Garfinkel, 1979)

The EAT-26 is a widely used self-report questionnaire that assesses behaviors related to disturbed eating and attitudes relating to body weight. It comprises 26 questions that refer to thoughts, feelings, and behaviors; these are organized into three subscales (Diet, Bulimia and Food Preoccupation, and Oral Control) as well as a global scale. In our study, the reliability of EAT-26 Global scale was considered satisfactory (α = .85). The EAT-26 has been standardized in Greek (Douka, Grammatopoulou, Skordilis, & Koutsouki, 2009) and possesses excellent psychometric properties for the Greek population.

Body Shape Questionnaire (BSQ-34; Cooper, Taylor, Cooper, & Fairburn, 1987)

The BSQ-34 is a 34-item self-report questionnaire measuring body dissatisfaction, including preoccupation with and distress about body shape. Using a 6-point scale (never to always) it provides a global score in which high scores indicate higher levels of body dissatisfaction. The reliability of the BSQ-34 was found to be α = .97 in our study. The measure has now been adapted to the Greek language (Koskina & Giovazolias, 2010) and demonstrates acceptable psychometric properties with the Greek population.

The Big Five Inventory (BFI; Benet-Martinez & John, 1998)

The BFI is a 44-item self-report measure designed to measure the Big Five dimensions (i.e., Extraversion, Neuroti-
Table 1. Goodness-of-fit indices of the hypothesized latent-factor models of the G-EDE-Q (N = 500)

Model                                 χ²         GFI   AGFI  NFI   SRMR  AIC
Model I (one-factor)                  10979.30   .97   .96   .96   .09   11067.30
Model II (Fairburn & Beglin, 1994)    Model is not admissible
Model III (Hilbert et al., 2007)      5604.15    .98   .97   .97   .07   5678.15
Model IV (Peterson et al., 2007)      7572.98    .98   .98   .97   .07   7666.98
Notes. GFI = goodness-of-fit index; AGFI = adjusted goodness-of-fit index; NFI = normed fit index; SRMR = standardized root-mean square residual; AIC = Akaike information criterion.
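
As a rough cross-check of Table 1, the sketch below screens each admissible model against the cutoffs that the Results section cites from Hu and Bentler (1999) (GFI, AGFI, and NFI close to .95; SRMR below .08) and ranks the models by AIC. It also backs out the number of free parameters q under the assumption that AIC = χ² + 2q. That convention, the reading of "close to .95" as "at least .95", and the names `FitResult`, `meets_cutoffs`, and `implied_free_parameters` are illustrative assumptions, not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitResult:
    name: str
    chi2: float
    gfi: float
    agfi: float
    nfi: float
    srmr: float
    aic: float


# Values copied from Table 1 (Model II is omitted because it was not admissible).
MODELS = [
    FitResult("Model I (one-factor)", 10979.30, .97, .96, .96, .09, 11067.30),
    FitResult("Model III (Hilbert et al., 2007)", 5604.15, .98, .97, .97, .07, 5678.15),
    FitResult("Model IV (Peterson et al., 2007)", 7572.98, .98, .98, .97, .07, 7666.98),
]


def meets_cutoffs(m, index_min=0.95, srmr_max=0.08):
    """True if GFI, AGFI, and NFI are all >= index_min and SRMR < srmr_max."""
    return min(m.gfi, m.agfi, m.nfi) >= index_min and m.srmr < srmr_max


def implied_free_parameters(m):
    """Back out q from AIC = chi2 + 2q (an assumed convention)."""
    return round((m.aic - m.chi2) / 2)


# Lower AIC is better, so sort ascending.
ranked = sorted(MODELS, key=lambda m: m.aic)
for m in ranked:
    print(f"{m.name}: cutoffs met = {meets_cutoffs(m)}, "
          f"AIC = {m.aic:.2f}, implied q = {implied_free_parameters(m)}")
```

Under these assumptions, Model I fails only the SRMR criterion, and the implied parameter counts differ across the three models, which is worth keeping in mind when AIC values are compared between models fitted to different numbers of items.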
cism, Openness to experience, Agreeableness, and Conscientiousness) using 5-point ratings (1 = strongly disagree to 5 = strongly agree). In previous research, its domain scales showed high reliability, clear factor structure, strong convergence with longer Big Five measures, and substantial self-peer agreement (Benet-Martinez & John, 1998). Alpha reliabilities for this study ranged between α = .70 and .74.

The Rosenberg Self-Esteem Scale (RSE; Rosenberg, 1965)

This 10-item self-report instrument purports to assess self-esteem using a four-point scale ranging from 1 to 4 (strongly agree to strongly disagree). The RSE is coded so that a high score indicates low self-esteem. Alpha reliability for this study was α = .80.

Procedure

The participants filled out the questionnaires during an ordinary lecture. Participation in the study was voluntary, and participants were informed that all data were held confidential. After completing a consent form, students filled out the questionnaires in the presence of the researchers, who were available for any clarifications needed. For the first sample there was no payment or other incentive to complete the study, whereas participants of the second sample received course credit for their participation.

Results

In order to study the latent structure of the G-EDE-Q, we specified and estimated alternative CFA models using AMOS 6 (Arbuckle, 2005). First, we examined whether the assumption of multivariate normality of the data was satisfied. Using the SPSS macro developed by DeCarlo (1997), we found that the majority of the skewness and kurtosis values were significant (p < .05) and the total Mardia's coefficient was 257.53; these results suggest deviations from normality. For that reason, we decided to use the unweighted least squares (ULS) method of estimation, since it is less sensitive to data that deviate from normality (Hu & Bentler, 1999). In addition, because of the particular method of estimation adopted, the GFI, AGFI, NFI, and SRMR goodness-of-fit indices were used. Generally, for GFI, AGFI, and NFI a value close to .95 indicates a model with a good fit; for the SRMR, values less than .08 indicate good fit (Hu & Bentler, 1999). Table 1 presents the fit statistics for the different proposed G-EDE-Q models.

The results from the analysis showed that model II, the original four-factor model, yielded a nonpositive definite matrix solution, which means that some variance estimates are negative, or that some exogenous variables have an estimated covariance matrix that is not positive definite. Thus, this model was not acceptable. Regarding model I, although some of the fit indices were within the acceptable range, others were not (χ² = 10979.30, GFI = .97, AGFI = .96, NFI = .96, SRMR = .09), suggesting an only marginally acceptable model. For model III (χ² = 5604.15, GFI = .98, AGFI = .97, NFI = .97, SRMR = .07) and model IV (χ² = 7572.98, GFI = .98, AGFI = .98, NFI = .97, SRMR = .07), all fit indices were within the standard recommended range, suggesting that both fit the data well.

In order to decide which of the three models yielding an acceptable fit better represents the underlying structure of the G-EDE-Q, we used the Akaike information criterion (AIC), the appropriate goodness-of-fit index for choosing between competing statistical models. The AIC index showed that model III (i.e., Hilbert's three-factor model) reproduced the data better than model IV (i.e., Peterson's three-factor model) and model I (i.e., the one-factor model); the AICs were 5,678.15, 7,666.98, and 11,067.30, respectively. However, one should bear in mind that Hilbert's model contains fewer items than the original version (17 rather than 22), which means that the estimated parameters in Hilbert's model are fewer, and that the chi-square and AIC values are inevitably lower compared to a model that contains all the items (and of course more estimated parameters), such as Peterson's model, which contains all 22 items. Thus, although the AIC statistic suggests that Hilbert's model fits the data better, Peterson's model seems more inclusive and more cohesive, since it is closer to the original version. For that reason, it seems preferable.

Internal Consistency of the G-EDE-Q

To evaluate the internal reliability and item-to-scale homogeneity of the G-EDE-Q, we calculated alpha coeffi-
Table 2. Pearson's correlations between G-EDE-Q and validity measures (N = 164)

Measures                             Mean    SD      1      2      3      4
1  Restraint                         6.68    6.16    1      .54**  .63**  .76**
2  Eating concern                    6.63    8.31           1      .72**  .85**
3  Shape and weight concern          24.39   18.03                 1      .96**
4  Global EDE-Q                      37.70   29.04                        1
5  EAT-26 Global Scale               13.79   10.47   .45**  .65**  .62**  .67**
6  Dieting                           7.23    7.20    .57**  .65**  .70**  .74**
7  Bulimia and food preoccupation    2.57    3.17    .20**  .58**  .52**  .53**
8  BSQ Global Scale                  85.38   34.90   .58    .72**  .89**  .88**
9  Global Self-Esteem                19.59   5.52    .10    .31**  .37**  .34**
10 Extraversion                      26.40   4.55    .05    .07    .15    .13
11 Neuroticism                       25.93   5.46    .15    .16    .30**  .27**
12 Openness to experience            36.24   6.22    .01    .06    .08    .07
13 Agreeableness                     34.73   5.71    .12   −.25**  .17   −.21*
14 Conscientiousness                 31.32   5.64    .05   −.25**  .17    .17
Note. *p < .05, **p < .001.
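
Table 2 reports point estimates with significance stars only. As a minimal sketch of how an approximate 95% confidence interval could be attached to any of these correlations, the example below applies the Fisher z transformation with N = 164. The function name `fisher_ci` and the choice of the Fisher approximation are assumptions made for illustration; the original analysis does not report intervals.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson r via the Fisher z transformation."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r)             # Fisher z of the sample correlation
    se = 1.0 / math.sqrt(n - 3)   # approximate standard error of z
    # Build the interval on the z scale, then map back to the r scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Example: Global EDE-Q with the EAT-26 Global Scale (r = .67, N = 164).
low, high = fisher_ci(0.67, 164)
print(f"r = .67, approximate 95% CI: [{low:.2f}, {high:.2f}]")
```

The interval is asymmetric around r because it is built on the z scale and then transformed back, and it narrows as the sample size grows.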
cients and mean interitem correlations. The results showed that Cronbach's α coefficients were .92 for the G-EDE-Q global scale, .78 for the Restraint subscale, .74 for the Eating Concern subscale, and .91 for the combined Shape and Weight Concern subscale, indicating adequate internal consistency for all G-EDE-Q subscales. The mean interitem correlation coefficient for the G-EDE-Q global scale was .42, while the corresponding values for the Restraint, Eating Concern, and Shape and Weight Concern subscales were .47, .29, and .49, respectively. All values were within the recommended range for multifactor scales (Briggs & Cheek, 1986).

Concurrent Validity of the G-EDE-Q

Concurrent validity was assessed by correlating the G-EDE-Q Global scale and subscales with two instruments: the EAT-26, a measure assessing behavioral and attitudinal aspects of eating psychopathology; and the BSQ, a measure assessing dissatisfaction with shape and preoccupation with body shape. The results from the analysis (Table 2, upper part) showed that the G-EDE-Q is strongly correlated with both measures, providing evidence to support the concurrent validity of the Greek translation of the EDE-Q. In particular, the majority of the G-EDE-Q global scale and subscales were positively correlated with all EAT-26 scales, with correlations ranging from .45 to .74. Only the Restraint subscale was weakly correlated with the Bulimia and Food Preoccupation scale (r = .20). Additionally, the G-EDE-Q global scale and subscales were positively and highly correlated with the global BSQ scale, with correlations ranging from .58 to .89 (Table 2, upper part).

Convergent and Divergent Validity of the G-EDE-Q

Since previous research showed that personality characteristics (Ghaderi & Scott, 2000; MacLaren & Best, 2009) and self-esteem (Stice, 2002) are to some extent related to eating behavior, we expected that the G-EDE-Q global scale and subscales would be related to both constructs. The results in Table 2 (lower part) showed that the G-EDE-Q global scale was positively associated with Neuroticism (r = .27) and negatively related to Agreeableness (r = −.21). The Eating Concern subscale was negatively related to Agreeableness (r = −.25) and Conscientiousness (r = −.25). The Shape and Weight Concern subscale was positively related to Neuroticism (r = .30). The vast majority of these correlations were significant (p < .001), although weak in magnitude. The Restraint subscale was not significantly related to any personality dimension. Finally, the Global Self-esteem scale was positively correlated with the Global G-EDE-Q (r = .34) as well as with the Eating Concern (r = .31) and Shape and Weight Concern (r = .37) subscales. The Restraint subscale was not significantly related to self-esteem. All of these correlations were significant (p < .001), although moderate in magnitude.

Discriminant Validity of the G-EDE-Q

Discriminant validity indicates the ability of a scale to distinguish different groups of individuals. An instrument for assessing the severity of a psychopathological behavior such as eating psychopathology should clearly be able to distinguish between individuals with and without the tendency to develop eating disordered attitudes. For that rea-
son, two separate logistic regression analyses were performed to investigate whether individual G-EDE-Q scores could predict the risk to develop eating disordered attitudes. The variable to be predicted was the tendency to develop a type of eating disorder (yes vs. no) as indicated by the EAT-26 global score (≥ 20). According to the literature, respondents who score at or above the cutoff score of 20 belong to the group of people who have the tendency to develop disordered eating (Garner, Olmsted, Bohr, & Garfinkel, 1982).

In the first analysis, the Global G-EDE-Q scale was entered as predictor variable. A significant regression model was produced using a forced entry method of estimation [model χ²(1) = 55.93; p < .001]. This model correctly predicts the overall presence or absence of disturbed eating attitudes in 86.6% of the cases. Test sensitivity and specificity of this regression model were 51.5% and 94.7%, respectively, while positive and negative predictive values were 71% and 89%, respectively. In the second analysis, Restraint, Eating Concern, and Shape and Weight Concern were entered as predictor variables. A test of the full model with the three predictors was statistically reliable, χ²(3) = 59.71, p < .001, indicating that, as a set, the predictors reliably distinguished between high-risk (HR) and no-risk (NR) individuals in developing eating disorder problems. The variance in the HR status accounted for was large, with Nagelkerke R² = .48. Prediction was good, with 54.5% of high-risk individuals and 94.7% of no-risk individuals correctly being predicted, for an overall success rate of 86.6%, while positive and negative predictive values were 72% and 89%, respectively. Table 3 shows the regression coefficient, Wald statistic, and odds ratio for each of the three predictors. According to the Wald criterion, only the Shape and Weight Concern score predicted HR status. The odds ratio of this predictor (1.10) implies that each one-unit increase in the Shape and Weight Concern score raises the odds of belonging to the HR group by about 10%.

Table 3. Logistic regression predicting presence of disordered eating attitudes (N = 164)

Scales                      B      SE    Odds ratio (95% CI)   Wald statistic
Restraint                   .001   .05   1.00 (0.91–1.09)      .001
Eating concern              .030   .03   1.03 (0.96–1.10)      .760
Weight and shape concern    .094   .02   1.10 (1.05–1.15)      17.52**
Global EDE-Q                .060   .01   1.06 (1.04–1.08)      32.97**
Note. **p < .01.

Discussion

This study investigates the factor structure of the Greek version of the Eating Disorders Examination Questionnaire (G-EDE-Q) in a nonclinical sample, using the statistical technique of CFA. This statistical technique appears to be more appropriate and robust than EFA, which has been used in most of the previous studies investigating the factorial structure of the EDE-Q in different crosscultural settings. The results from the CFA suggested that the original four-factor model (Fairburn & Beglin, 1994) was not acceptable, since the correlation matrix was not positive definite. This could be either because the model was wrong or because the sample size was too small (Jöreskog & Sörbom, 1996). However, in our study, the sample size was adequate, since according to Muthén and Muthén (2002) the minimum sample size for nonnormally distributed data should be at least 315.

The analysis also revealed that both Hilbert et al.'s (2007) and Peterson et al.'s (2007) three-factor models had acceptable fit indices, suggesting that the Shape and Weight Concern subscales should be considered as a single factor, instead of two separate factors as they were introduced in the original four-factor model. This result is in line with findings from other studies in eating disorders, which already consider weight and shape concern as a single component and relate it to different manifestations of eating disturbances (e.g., Allen, Byrne, McLean, & Davis, 2008). The present study provides further evidence and supports this perspective empirically, since, according to the logistic regression results, the combined scale of Shape and Weight Concern is the only subscale that can successfully discriminate between high- and low-risk individuals. A possible theoretical explanation for the emergence of shape and weight concern as a single factor is that individuals expressing concerns about their shape are likely to also present weight-related concerns, as a means of controlling changes in their body figure.

Although Hilbert's and Peterson's three-factor solutions provide acceptable data fit, the latter seems preferable, since it is closer to Fairburn and Beglin's original model. Hilbert's model appears less suitable, both because it contains fewer items, as a result of the inappropriate use of EFA, and because many items are misplaced compared to Fairburn and Beglin's (1994) and Peterson et al.'s (2007) models. Finally, the one-factor model also marginally fit the data. This was not an unexpected result, since the Global EDE-Q factor has often been used as a broad indicator of eating disturbances (Masheb & Grilo, 2008).

Another aim of this study was to investigate the internal consistency of the G-EDE-Q global scale and the three subscales. The alpha coefficients obtained are within the minimum acceptable range (α > .70), and support the findings from previous studies (Hilbert et al., 2007; Luce & Crowther, 1999; Peterson et al., 2007). Furthermore, the homogeneity of the global scale and the three subscales was confirmed according to the results from the mean interitem coefficients.

The results of the present study add to existing evidence for the concurrent, convergent, and divergent validity of the translated version of the scale. In terms of concurrent validity, it was found that there is a substantial relationship
among two measures of eating psychopathology (EAT-26 and BSQ) and G-EDE-Q subscales. Furthermore, in line with previous findings (e.g., Diaz-Marsa et al., 2000; Ghaderi & Scott, 2000; MacLaren & Best, 2009), it was found that there is a relationship between attitudinal aspects of eating disorders and specific personality traits, such as Neuroticism (positive) and Agreeableness and Conscientiousness (negative), all of which support the convergent validity of the scale. The nonsignificant relationship of Extraversion and Openness with most of the G-EDE-Q scales found in this study provides evidence regarding the divergent validity of the measure, supporting previous findings (e.g., Ghaderi & Scott, 2000). Finally, the positive relationship found between self-esteem and the G-EDE-Q Global scale and subscales was also in line with findings from previous studies, which demonstrated that self-esteem is an important etiological factor in eating disorders (e.g., Stice, 2002), since it has been shown to remain stable over time in nonclinical samples of undergraduate women (Berg et al., 2009).

An important limitation of the present study is the nature of the sample used (nonclinical student sample). One could argue that EDE-Q is a clinical instrument and therefore should be used solely with clinical samples. However, based on the continuum hypothesis proposed by Nylander (1971) and as evidenced by its expansion by Mintz and Betz (1988), eating disturbances occur along a continuum, constituting three different groups (asymptomatic, symptomatic, and eating disordered). In that sense, eating disorders occur when people simply display extreme manifestations of common behavioral, attitudinal, and psychological dimensions, so that group differences are a matter of degree and not kind. Gleaves, Lowe, Green, Cororve, and Williams (2000) also concluded that eating disturbances appear to fall along a continuum of symptom severity, lending partial support to the continuum model. To this end, it has been suggested that research should be directed toward the investigation of the psychometric properties of existing measures in nonclinical samples (Engelsen & Laberg, 2001). Moreover, female university students were reported to present high rates of eating disorders and eating disorder symptoms (Berg et al., 2009; Compas et al., 1986). Future research should investigate the factorial invariance of the translated version in a more diverse population in terms of age, educational level, and socioeconomic background, since student samples are quite homogeneous. Finally, another limitation concerns the use of EAT-26 as the only criterion for discriminating participants with or without eating disorders: Future research should employ a more clinical methodology (e.g., clinical interview) to identify clinically diagnosed cases.

The present study showed that the three-factor version of the G-EDE-Q may be a useful measure in the assessment of attitudinal features of disturbed eating in female university students. Given that college women are a high-risk group for eating disorders and present high rates of subclinical eating disturbances, this questionnaire may prove an important screening tool for university counselors and/or other clinicians in similar contexts. Additionally, the availability of normative data for community samples would be an advantage for using the G-EDE-Q in primary-care settings, where it may help practitioners (e.g., psychologists, counselors) to detect cases of eating disturbances and thus choose an appropriate referral route.

References

Allen, K. L., Byrne, S. M., McLean, N. J., & Davis, E. A. (2008). Overconcern with weight and shape is not the same as body dissatisfaction: Evidence from a prospective study of preadolescent boys and girls. Body Image, 5, 261–270.
Arbuckle, J. L. (2005). AMOS 6.0 user's guide. Spring House, PA: Amos Development Corporation.
Benet-Martinez, V., & John, O. P. (1998). Los Cinco Grandes across cultures and ethnic groups: Multitrait-multimethod analyses of the Big Five in Spanish and English. Journal of Personality and Social Psychology, 75, 729–750.
Berg, K. C., Frazier, P., & Sherr, L. (2009). Change in eating disorder attitudes and behavior in college women: Prevalence and predictors. Eating Behaviors, 10, 137–142.
Briggs, S. R., & Cheek, J. M. (1986). The role of factor analysis in the development and evaluation of personality scales. Journal of Personality, 54, 106–148.
Compas, B. E., Wagner, B. M., Slavin, L. A., & Vannatta, K. (1986). A prospective study of life events, social support, and psychological symptomatology during the transition from high school to college. American Journal of Community Psychology, 14, 241–257.
Cooper, P. J., Taylor, M. J., Cooper, Z., & Fairburn, C. G. (1986). The development and validation of the Body Shape Questionnaire. International Journal of Eating Disorders, 6, 485–494.
Cooper, Z., & Fairburn, C. G. (1987). The Eating Disorders Examination: A semistructured interview for the assessment of the specific psychopathology of eating disorders. International Journal of Eating Disorders, 6, 1–8.
DeCarlo, L. T. (1997). On the meaning and use of kurtosis. Psychological Methods, 2, 292–307.
Diaz, V. A., Mainous III, A. G., & Pope, C. (2007). Cultural conflicts in the weight loss experience of overweight Latinos. International Journal of Obesity, 31, 328–333.
Diaz-Marsa, M., Carrasco, J. L., & Saiz, J. (2000). A study of temperament and personality in anorexia nervosa and bulimia nervosa. Journal of Personality Disorders, 14, 352–359.
Douka, A., Grammatopoulou, E., Skordilis, E., & Koutsouki, D. (2009). Factor analysis and cut-off score of the 26-item eating attitudes test in a Greek sample. Biology of Exercise, 5, 51–67.
Engelsen, B. K., & Laberg, J. K. (2001). A comparison of three questionnaires (EAT-12, EDI, and EDE-Q) for assessment of eating problems in healthy female adolescents. Nordic Journal of Psychiatry, 55, 129–135.
Fairburn, C. G., & Beglin, S. J. (1994). Assessment of eating disorders: Interview or self-report questionnaire? International Journal of Eating Disorders, 16, 363–370.
Garner, D. M., & Garfinkel, P. E. (1979). The Eating Attitudes Test: An index of the symptoms of anorexia nervosa. Psychological Medicine, 9, 273–279.
Garner, D. M., Olmsted, M. P., Bohr, Y., & Garfinkel, P. E. (1982). The Eating Attitudes Test: Psychometric features and clinical correlates. Psychological Medicine, 12, 871–878.
Ghaderi, A., & Scott, B. (2000). The big five and eating disorders: A prospective study in the general population. European Journal of Personality, 14, 311–323.
Gleaves, D. H., Lowe, M. R., Green, B. A., Cororve, M. B., & Williams, T. L. (2000). Do anorexia and bulimia nervosa occur on a continuum? A taxometric analysis. Behavior Therapy, 31, 195–219.
Hilbert, A., Tuschen-Caffier, B., Karwautz, A., Niederhofer, H., & Munsch, S. (2007). Eating disorder examination-questionnaire: Evaluation der deutschsprachigen Übersetzung. Diagnostica, 53, 144–154.
Hrabosky, J. I., White, M. A., Masheb, R. M., Rothschild, B. S., Burke-Martindale, C. H., & Grilo, C. M. (2008). Psychometric evaluation of the eating disorder examination-questionnaire for bariatric surgery candidates. Obesity, 16, 763–769.
Hu, L., & Bentler, P. M. (1999). Cut off criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
Hurley, A. E., Scandura, T. A., Schriesheim, C. A., Brannick, M. T., Seers, A., Vandenberg, R. J., & Williams, L. J. (1997). Exploratory and confirmatory factor analysis: Guidelines, issues, and alternatives. Journal of Organizational Behavior, 18, 667–683.
Jacobi, C., Hayward, C., de Zwaan, M., Kraemer, H. C., & Agras, W. S. (2004). Coming to terms with risk factors for eating disorders: Application of risk terminology and suggestions for a general taxonomy. Psychological Bulletin, 130, 19–65.
Jöreskog, K. G., & Sörbom, D. (1996). Lisrel 8: Structural equation modeling with the SIMPLIS command language. Chicago, IL: Scientific Software International.
Koskina, N., & Giovazolias, T. (2010). The effect of attachment insecurity in the development of eating disturbances across gender: The role of body dissatisfaction. Journal of Psychology, 144, 449–471.
Luce, K. H., & Crowther, J. H. (1999). The reliability of the eating disorder examination-Self-report questionnaire version (EDE-Q). International Journal of Eating Disorders, 25, 349–351.
Luce, K. H., Crowther, J. H., & Pole, M. (2008). Eating Disorder Examination Questionnaire (EDE-Q): Norms for undergraduate women. International Journal of Eating Disorders, 41, 273–276.
MacLaren, V. V., & Best, L. A. (2009). Female students' disordered eating and the big five personality facets. Eating Behaviors, 10, 192–195.
Masheb, R. M., & Grilo, C. M. (2008). Prognostic significance of two subcategorization methods for the treatment of binge eating disorder: Negative affect and overvaluation predict, but do not moderate, specific outcomes. Behavior Research and Therapy, 46, 428–437.
Mintz, L. B., & Betz, N. E. (1988). Prevalence and correlates of eating disordered behavior among undergraduate women. Journal of Counseling Psychology, 35, 463–471.
Mond, J. M., Hay, P. J., Rodgers, B., & Owen, C. (2006). Eating Disorder Examination Questionnaire (EDE-Q): Norms for young adult women. Behavior Research and Therapy, 44, 53–62.
Mond, J. M., Hay, P. J., Rodgers, B., Owen, C., & Beumont, P. J. (2004). Validity of the Eating Disorder Examination Questionnaire (EDE-Q) in screening for eating disorders in community samples. Behavior Research and Therapy, 42, 551–567.
Muthén, L. K., & Muthén, B. O. (2002). Teacher's corner: How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9, 599–620.
Nylander, J. (1971). The feeling of being fat and dieting in a school population: Epidemiological interview investigation. Acta Sociomedica Scandinavica, 3, 17–26.
Peterson, C. B., Crosby, R. D., Wonderlich, S. A., Joiner, T., Crow, S. J., Mitchell, J. E., . . . le Grange, D. (2007). Psychometric properties of the Eating Disorder Evaluation-Questionnaire: Factor structure and internal consistency. International Journal of Eating Disorders, 40, 386–389.
Rosenberg, M. (1965). Society and the adolescent self-image. Princeton, NJ: Princeton University Press.
Stice, E. (2002). Risk and maintenance factors for eating pathology: A meta-analytic review. Psychological Bulletin, 128, 825–848.
Villarroel, A. M., Panelo, E., Portell, M., & Raich, R. M. (2009). Screening for eating disorders in undergraduate women: Norms and validity of the Spanish version of the Eating Disorder Examination Questionnaire (EDE-Q). Journal of Psychopathology and Behavioral Assessment. doi 10.1007/s10862-009-9177-6
Wilson, G. T. (1993). Assessment of binge eating. In C. G. Fairburn & G. T. Wilson (Eds.), Binge eating: Nature, assessment, and treatment (12th ed., pp. 227–249). New York: Guilford.

Published online: May 11, 2012

Theodoros Giovazolias
Department of Psychology
School of Social Sciences
University of Crete
Crete, 74100
Greece
Tel. +30 28 3-107-7529
Fax +30 28 3-107-7578
E-mail giovazolias@psy.soc.uoc.gr
European Journal of Psychological Assessment 2013; Vol. 29(3):197–204
© 2012 Hogrefe Publishing
C. Gunzenhauser et al.: Self-Efficacy in Regulating Emotions
Original Article
Self-Efficacy in Regulating
Positive and Negative Emotions
A Validation Study in Germany
Catherine Gunzenhauser (1), Tobias Heikamp (2), Maria Gerbino (3), Guido Alessandri (3), Antje von Suchodoletz (1), Laura Di Giunta (3), Gian Vittorio Caprara (3), and Gisela Trommsdorff (2)
(1) University of Freiburg, Germany; (2) University of Konstanz, Germany; (3) University of Rome "La Sapienza", Italy
Abstract. Perceived self-efficacy in emotion regulation facilitates various aspects of psychosocial adjustment. The Regulatory Emotional
Self-Efficacy scale (RESE) by Caprara and Gerbino (2001) measures perceived capabilities to express positive emotions (POS) and to
manage negative emotions, namely, despondency/distress (DES) and anger/irritation (ANG). The present research investigated the va-
lidity of the RESE scale in Germany. Study 1 investigated the factor structure and convergent validity of the scale in a sample of university
students. In order to test the generalizability of findings from Study 1, in Study 2 we studied the factor structure, cross-gender invariance,
and convergent validity of a slightly revised version of the scale in a sample of parents. The previously found factor structure was
successfully replicated in both samples. Partial invariance on the scalar level was confirmed across gender. All self-efficacy subscales
were positively correlated with life satisfaction and with reappraisal (a cognitive emotion regulation strategy). Suppression, a strategy of
regulating emotional expression, was negatively related to POS. Findings suggest that the RESE scale is a valid instrument to assess
emotion regulation self-efficacy in German-speaking samples.
Keywords: self-efficacy beliefs, emotion regulation, measurement invariance
DOI: 10.1027/1015-5759/a000151

Self-efficacy beliefs are a central component of human agency. They are defined as perceived capabilities to act in the way necessary to attain desired outcomes in specific situations (Bandura, 1997). Self-efficacy beliefs increase the ability and flexibility to adjust to social and situational demands by motivating people to set higher goals, to invest greater effort, and to persist when facing difficulties (see Bandura, 1997). Domain-specific self-efficacy beliefs are also the basis for successful emotion regulation (Caprara et al., 2008; Caprara & Gerbino, 2001).

Regulation of Positive and Negative Emotions

Emotion regulation refers to attempts individuals make to influence which emotions they have, when they have them, and how these emotions are experienced and expressed (Gross, Richards, & John, 2006, p. 14). The regulation of positive emotions and the regulation of negative emotions are distinctly related to emotional and social adjustment. For instance, experience and expression of positive emotions are associated with rewarding social relationships, health, and occupational success (Davidov & Grusec, 2006; Lyubomirsky, King, & Diener, 2005). Difficulties in downregulating strong negative emotions are associated with problematic interpersonal behavior (Eisenberg et al., 2001).

The Regulatory Emotional Self-Efficacy Scale

The Regulatory Emotional Self-Efficacy scale (RESE; Caprara et al., 2008; Caprara & Gerbino, 2001) was developed to assess self-efficacy beliefs in the domain of emotion regulation. The RESE scale assesses self-efficacy in expressing positive emotions (POS) and self-efficacy in managing negative emotions (NEG). POS is defined as the perceived capability to experience and to allow oneself to express positive emotions such as joy, enthusiasm and pride in response to success or pleasant events (Caprara et al., 2008, p. 228). NEG refers to the perceived capability to ameliorate negative emotional states once they are aroused in response to adversity or frustrating
events and to avoid being overcome by emotions such as anger, irritation, despondency, and discouragement (Caprara et al., 2008, p. 228). POS constitutes a first-order factor, whereas NEG constitutes a second-order factor represented by two first-order factors: self-efficacy in managing despondency/distress (DES) and self-efficacy in managing anger/irritation (ANG). Confirmatory factor analyses have supported this structure in Italian, Bolivian, and U.S. samples (Caprara et al., 2008). Moreover, Caprara et al. (2008) documented positive associations of the RESE subscales with indicators of well-being in an Italian sample.

Importance of Cross-Gender Invariance

In general, women engage in emotion regulation more frequently than men and use a greater variety of emotion-regulation strategies (Nolen-Hoeksema & Aldao, 2011; Tamres, Janicki, & Helgeson, 2002). At the same time, women are more susceptible to depressive symptoms, whereas men tend to be more likely to show aggressive reactions to emotional arousal (Knight, Guthrie, Page, & Fabes, 2002; Nolen-Hoeksema & Aldao, 2011). Emotion regulation has been linked to gender differences in both depressive symptoms and aggressive reactions (Knight et al., 2002; Nolen-Hoeksema & Aldao, 2011; Sontag & Graber, 2010). For example, Nolen-Hoeksema and Aldao (2011) found that maladaptive emotion-regulation strategies, which are used more frequently by women (Tamres et al., 2002), are associated with depression across gender. Sontag and Graber (2010) found that emotional disengagement was related to aggression in boys but not in girls. Thus, cross-gender invariance of emotion-related measurement scales has relevance for further research in the areas of developmental and clinical psychology. The study by Caprara et al. (2008) revealed measurement invariance of RESE across gender in Italian, Bolivian, and U.S. samples.

Study Aims

The goals of the present research were
1. to examine the reliability and the factor structure of the RESE scale in a German-speaking sample,
2. to further investigate the convergent validity of the German scale, and
3. to examine the cross-gender invariance of the German scale.

Study 1

In Study 1, we studied the factor structure of a preliminary version of the German RESE scale and replicated the multidimensional latent structure of the RESE scale. Additionally, Study 1 examined convergent validity by investigating the relations between self-efficacy beliefs in regulating emotions, life satisfaction, and two emotion-regulation strategies. Life satisfaction is defined as "a conscious cognitive judgment of one's life in which the criteria for judgment are up to the person" (Pavot & Diener, 1993, p. 164). RESE beliefs have been shown in other countries to be positively correlated with life satisfaction (Caprara & Steca, 2005). Therefore, we assumed that all RESE subscales would be positively related to life satisfaction.

We further examined the relations of RESE with two specific emotion-regulation strategies, reappraisal and suppression (Gross & John, 2003). Reappraisal refers to a cognitive reframing of an emotion-eliciting event; it modulates emotional experience, which is mirrored in emotional expression. For example, a woman who receives an ugly necklace for her birthday could reappraise the situation by focusing on how sweet it is that the giver tried so hard to please her. This reappraisal would make her feel and look happier. Individuals skilled at reappraisal can be expected to find themselves capable of downregulating negative emotions and of pointing out the positive features of a situation to themselves. Thus, we assumed positive correlations of reappraisal with POS, DES, and ANG.

Suppression is an attempt to hide external signs of emotion, without changing the emotional experience. For instance, another woman who receives very beautiful earrings for her birthday might restrain herself from jumping for joy because she feels it might be inappropriate or embarrassing. She might still feel happy, but her suppression strategy would make her seem less so. In contrast to the woman in our example, individuals with high self-efficacy in expressing positive emotions feel capable of expressing their positive feelings. Thus, we assumed that POS should be negatively associated with suppression. Since DES and ANG emphasize the management of emotional experience rather than emotional expression, we did not expect DES and ANG to be associated with suppression.

Materials and Methods

Participants

Participants were 192 male and 307 female undergraduate students from southern Germany. Students were between 18 and 26 years old (M = 21.44, SD = 1.46). Participants could participate in a lottery of EUR 20 gift certificates or receive course credit.

Measures

Regulatory Emotional Self-Efficacy Scale

Study 1 used a version of the RESE scale that consists of 12 items (Caprara et al., 2008). POS, DES, and ANG were
measured with four items each. The scale was translated from English by native German speakers fluent in both languages. Participants rated their self-efficacy beliefs on a 5-point Likert scale from 1 (= not at all well) to 5 (= very well). Due to a translation error, the word "upset" had been translated into the German word for "sad" in one ANG item ("How well can you avoid getting upset when others keep giving you a hard time?"). Thus, the translated item was not adequate in terms of content validity to measure self-efficacy in managing anger. This item was excluded from the analyses.

Emotion Regulation Strategies

Reappraisal (six items) and suppression (four items) were assessed using the Emotion Regulation Questionnaire (ERQ; Gross & John, 2003). All items were translated from English to German by a psychologist fluent in both languages. Participants rated their agreement on a 7-point Likert scale from 1 (= strongly disagree) to 7 (= strongly agree). Reliability and validity of the ERQ have been demonstrated in several samples of college students (Gross & John, 2003). Cronbach's αs were .80 for reappraisal and .78 for suppression.

Life Satisfaction

Life satisfaction was assessed using a German version of the Satisfaction with Life Scale (SWLS; Sölva, Baumann, & Lettner, 1995) by Diener, Emmons, Larsen, and Griffin (1985). The unidimensional nature, reliability, and validity of the SWLS have been confirmed in several studies (see Pavot & Diener, 1993). Participants answered five items on a 7-point Likert scale ranging from 1 (= strongly disagree) to 7 (= strongly agree). Cronbach's α was .81.

Analytic Approach

We conducted confirmatory factor analyses using Mplus 5.21 (Muthén & Muthén, 2009). Since the estimates of multivariate skewness (Srivastava's b1p = 79.35 (11); p < .001) and kurtosis (Mardia's b2p = 161.46, p < .001) revealed that the data deviated from multivariate normal distribution, we employed maximum-likelihood estimation with robust standard errors (MLR) and used Satorra-Bentler (S.-B.) scaled difference tests to compare nested models (Satorra, 2000). Global model fit was evaluated using root mean square error of approximation (RMSEA), comparative fit index (CFI) and standardized root-mean-square residual (SRMR). Models were accepted when RMSEA was lower than .08 (Browne & Cudeck, 1993). For SRMR, a cutoff value of < .08 was used (Kelloway, 1998). Following O'Boyle and Williams (2011), we used a two-stage cutoff criterion for CFI, with CFI > .90 indicating an acceptable fit, and CFI > .95 indicating a good fit.

Missing Data

The rate of missing data was below 1% for each variable. However, missing data were handled with MLR estimation in the confirmatory factor analyses. Before creating mean scores for each scale to investigate convergent validity, missing values were replaced using the regression procedure in SPSS (IBM SPSS Statistics, 2010).

Results

Means and standard deviations for POS, DES, and ANG were 4.16 (0.71), 3.09 (0.66), and 3.04 (0.70), respectively. Cronbach's αs were α(POS) = .76, α(DES) = .65 and α(ANG) = .50, while reliability for the summated scale of self-efficacy in managing negative emotions was α = .72.

Confirmatory Factor Analyses

Three models were tested. Model 1 was a one-factor model. Model 2 was a two-factor oblique model with NEG and POS. Model 3 was a model which treated NEG as a second-order factor represented by ANG and DES. As negative emotions encompass both high-activation negative emotions and low-activation negative emotions (Lang, 1995), loadings of DES and ANG on NEG were constrained to be equal. For information on the anchor items used see Table 3.

Model 1 did not fit the data well (see Table 1). Model 2 and Model 3 both showed a satisfying fit. However, the Satorra-Bentler test comparing Model 2 and Model 3 was significant, indicating that Model 3 provided a better fit to the data than Model 2, SB-Δχ²(1) = 3.95, p < .05. A modification index of 35.32 suggested a covariance between the errors of items 3 and 4. As those items both refer to the open expression of joy unrelated to gratification due to personal achievement, this covariance was specified (Model 3a). Standardized factor loadings of the 11 items on their respective factors ranged between .30 and .82.

Table 1. Fit indices of the confirmatory factor analyses of the RESE scale

Model       SB-χ²    df(a)  CFI   RMSEA  90% CI       SRMR
Study 1
  Model 1   595.49   44     .42   .16    [.15, .17]   .14
  Model 2   135.26   43     .90   .07    [.05, .08]   .05
  Model 3   130.15   42     .91   .07    [.05, .08]   .05
  Model 3a   96.67   41     .94   .05    [.04, .07]   .05
Study 2
  Model 3a   54.66   32     .96   .05    [.03, .08]   .06
Note. (a) Each model indicated a significant χ² with p < .001 due to the large sample size.
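The cutoff rules described under Analytic Approach can be applied mechanically to the fit indices in Table 1. The following is a minimal Python sketch rather than the authors' procedure: the function name `classify_fit` and the labels "good", "acceptable", and "poor" are illustrative assumptions. Because Table 1 reports rounded values, borderline cases (e.g., Model 2 with CFI = .90) cannot be resolved from the table alone and are left out.

```python
def classify_fit(cfi, rmsea, srmr):
    """Apply the cutoffs stated in the text to one model's fit indices.

    RMSEA < .08 and SRMR < .08 are required; CFI > .90 counts as
    acceptable and CFI > .95 as good. The labels are illustrative.
    """
    if rmsea >= 0.08 or srmr >= 0.08 or cfi <= 0.90:
        return "poor"
    return "good" if cfi > 0.95 else "acceptable"


# Rounded values copied from Table 1: (CFI, RMSEA, SRMR).
table1 = {
    "Study 1, Model 1": (0.42, 0.16, 0.14),
    "Study 1, Model 3a": (0.94, 0.05, 0.05),
    "Study 2, Model 3a": (0.96, 0.05, 0.06),
}

for name, (cfi, rmsea, srmr) in table1.items():
    print(name, classify_fit(cfi, rmsea, srmr))
```

Requiring all three criteria at once is a design choice of this sketch; the text states the cutoffs per index and does not say how disagreements between indices were weighed.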
Convergent Validity

For an evaluation of the convergent validity in terms of relations to other constructs, we correlated POS, DES, ANG, and NEG with life satisfaction, reappraisal, and suppression. Log transformation was performed on all RESE subscales to approach normal distribution. Results are presented in Table 2.

Table 2. Correlations of POS, DES, and ANG (log-transformed scores) with Life Satisfaction, Self-Esteem, and Emotion Regulation Strategies (male and female participants combined)

Indicator           M      SD     POS      DES     ANG
Life satisfaction
  Study 1           4.92   1.06   .36**    .41**   .20**
  Study 2           7.51   1.67   .21**    .24**   .14*
Reappraisal
  Study 1           4.55   0.98   .19**    .24**   .16**
  Study 2           4.45   1.16   .26**    .14*    .22**
Suppression
  Study 1           3.28   1.27   −.38**   .03     .16**
  Study 2           3.07   1.10   −.23**   .03     .07
Notes. Pearson's correlation coefficients. *p < .05, **p < .01.

Conclusion

In Study 1 the factor structure previously found in Italy, Bolivia, and the United States was successfully replicated in a German-speaking sample. Positive correlations of all RESE subscales with life satisfaction and reappraisal as well as the negative correlation between POS and suppression were in line with our expectations. Unexpectedly, however, ANG was positively correlated with suppression. Moreover, internal consistencies were not entirely satisfying. In particular, the reliability of the ANG subscale was low.

Study 2

In Study 2, self-efficacy beliefs in regulating emotions were assessed in parents as a part of a larger study on child development. Here, we used a revised version of the German RESE scale (RESE-R) for two reasons. First, given the unsatisfactory reliabilities found in Study 1, the German wording of ANG and DES items had to be revised. In contrast to Study 1, items were translated from Italian into German. Second, recent research had led to a revision of the original RESE scale (Caprara, Di Giunta, Pastorelli, & Eisenberg, 2011). The revised RESE scale comprises 10 of the 12 items of the original scale (POS: 4 items, DES: 3 items, ANG: 3 items), which we included in the German RESE-R (see Table 3). We aimed to document that improved item wording would provide satisfactory reliabilities, while preserving the factor structure and the correlations to life satisfaction and emotion-regulation strategies. Moreover, we aimed to test for cross-gender invariance of the revised scale.
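Since the revision of the scale is motivated by internal consistency, it may help to recall how Cronbach's α is computed from item scores: α = k/(k − 1) · (1 − Σ s²(item) / s²(total)), where k is the number of items, s²(item) is the variance of each item, and s²(total) is the variance of the summed score. The sketch below uses only the Python standard library; the function name `cronbach_alpha` and the toy responses are hypothetical and are not data from either study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of respondent scores."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sum score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical 5-point ratings from five respondents on three items.
toy_items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 3],
]
print(round(cronbach_alpha(toy_items), 2))
```

With only three or four items per subscale, as in the RESE, α tends to be lower than for longer scales with comparable interitem correlations, which is worth keeping in mind when reading the subscale reliabilities.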
Table 3. Scale items of the German RESE-R (German text after the slash)

Item stem: How well can you . . . / Wie gut gelingt es Ihnen, . . .

Self-efficacy in expressing positive emotions
(1) . . . rejoice over your successes? / . . . sich über Ihre Erfolge zu freuen?
(2) . . . feel gratified over achieving what you set out to do? / . . . zufrieden zu sein, wenn Sie das erreicht haben, was Sie sich vorgenommen hatten?
(3) . . . express joy when good things happen to you? / . . . Freude zu zeigen, wenn Ihnen etwas Gutes passiert?
(4) . . . express enjoyment freely at parties? / . . . auf Partys offen zu zeigen, dass Sie Spaß haben?

Self-efficacy in regulating despondency/distress
(1) . . . keep from getting dejected when you are lonely? / . . . sich vor Niedergeschlagenheit zu schützen, wenn die Personen, die Ihnen wichtig sind, Ihnen nicht beistehen können, wenn Sie sie brauchen?
(2) . . . keep from getting discouraged in the face of difficulties? / . . . sich von Schwierigkeiten nicht entmutigen zu lassen?
(3) . . . keep from getting discouraged by strong criticism? / . . . nicht den Mut zu verlieren, wenn Sie stark kritisiert werden?

Self-efficacy in regulating anger/irritation
(1) . . . avoid flying off the handle when you get angry? / . . . zu verhindern, dass Sie die Kontrolle über Ihr Handeln verlieren, wenn Sie wütend sind?
(2) . . . get over irritation quickly for wrongs you have experienced? / . . . über Ärger hinwegzukommen, wenn Ihnen Unrecht getan wurde?
(3) . . . avoid getting upset when others keep giving you a hard time? / . . . Ihre Wut im Zaum zu halten, wenn andere Menschen Ihnen das Leben schwer machen?

Notes. German items refer to items used in Study 2 (German RESE-R). English items refer to corresponding items published by Caprara et al. (2008). Anchor items in Study 2 were items 1 of each subscale. In Study 1, ANG and DES subscales contained one additional item respectively (ANG: manage negative feelings when reprimanded by your parents or significant others; DES: reduce your upset when you don't get the appreciation you feel you deserve?). Anchor items in Study 1 were items 1 of the POS and DES subscales and the ANG item not used in Study 2.
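The p value attached to a 1-df chi-square difference test can be checked by hand, because a chi-square variable with one degree of freedom is a squared standard normal deviate, so P(χ² > x) = erfc(√(x/2)). The sketch below is a plausibility check in plain Python, not the Mplus computation; the function name `chi2_sf_df1` is an assumption, and the input 3.95 is the 1-df Satorra-Bentler difference value reported for the comparison of Model 2 and Model 3 in Study 1.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square statistic with 1 df."""
    if x < 0:
        raise ValueError("chi-square statistics are non-negative")
    # For 1 df, chi-square is Z**2, so P(Z**2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2))


p = chi2_sf_df1(3.95)
print(f"p = {p:.3f}")  # falls just below the .05 threshold
```

The shortcut only holds for one degree of freedom; tests with more degrees of freedom need a general chi-square distribution function from a statistics library.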
Materials and Method

Participants

Participants were N = 144 mothers and N = 120 fathers (including 118 couples) of preschool children in Southern Germany. Participants were between 22 and 59 years old (women: MAge = 38, SD = 5.44; men: MAge = 41, SD = 5.43; based on information on 138 women and 117 men). Some 52% of the women and 64% of the men held at least a college degree (based on information on 126 women and 110 men). Participants received a EUR 5 gift certificate from a book store.

Measures

Self-Efficacy in Regulating Emotions

POS items were adopted from Study 1. DES and ANG items were translated from Italian to German by a native German speaker trained in psychology and backtranslated by a professional translator.

Life Satisfaction

A single-item measure was preferred because participants were required to complete several questionnaires for the larger study. Therefore, life satisfaction was assessed with the item "How satisfied are you currently with your life as a whole?" rated on an 11-point Likert scale from 0 (= totally dissatisfied) to 10 (= totally satisfied). This item has proved to be a valid measure (Fujita & Diener, 2005).

Emotion Regulation Strategies

Study 2 used a recently validated German version of the ERQ (Abler & Kessler, 2009). Again, participants used a 7-point Likert scale from 1 (= strongly disagree) to 7 (= strongly agree). Cronbach's αs were .82 for reappraisal and .61 for suppression.

Analytic Approach

We tested the best-fitting model from Study 1 (i.e., model 3a) with the data from our total sample. Again, we used MLR estimation to account for multivariate nonnormality (skewness: Srivastava's b1p = 27.48 (10), p < .001; kurtosis: Mardia's b2p = 131.08, p < .001). Since our sample contained parent couples, we used the clustering procedure available in Mplus 5.21 (Muthén & Muthén, 2009). Cross-gender measurement invariance was examined using multigroup analyses. Following the stepwise procedure suggested by Chen, Sousa, and West (2005), we tested a series of nested models to assess configural, metric, and scalar invariance. According to Steenkamp and Baumgartner (1998), at least one item besides the anchor item needs to be invariant on each factor to allow for comparisons of means between groups (partial invariance). Evaluation of model fit and treatment of missing data followed the procedures described in Study 1. The rate of missing data was below 4% per variable.

Results

Means and standard deviations for DES, ANG, and POS were 3.40 (0.68), 3.15 (0.75) and 4.25 (0.58), respectively. The internal consistencies of the summated scales were α(POS) = .79, α(DES) = .69, α(ANG) = .68, and α(NEG) (6 items) = .72.

Factor Structure and Cross-Gender Invariance

Model 3a from Study 1 represented a good model-data fit (see Table 1 for fit indices and Table 3 for information on anchor items). We then tested this model in a multiple-group approach with gender as a group variable. Fit indices were acceptable, SB-χ²(64) = 100.12, CFI = .94, RMSEA = .07, 90% CI [.04, .09], SRMR = .07. Thus, configural invariance was confirmed. When item loadings were constrained to be equal across gender, the change in Satorra-Bentler χ² was not significant, SB-Δχ²(7) = 13.28, p = .07. Also, after imposing additional equality constraints for the loadings of ANG and DES on the second-order factor NEG, the model fit did not change significantly. Next, item intercepts were constrained to be equal across gender. Here, the test indicated that the model fit decreased significantly, SB-Δχ²(7) = 18.85, p < .01. When equality constraints were lifted from item POS/3, the change in χ² became nonsignificant, SB-Δχ²(6) = 11.26, p = .08. Last, equality constraints were imposed on the intercepts of ANG and DES, which led to a significant change in model fit, SB-Δχ²(2) = 23.85, p < .001. When we tested partial scalar invariance on this level by lifting the equality constraint from the intercept of DES, the change in χ² became nonsignificant, SB-Δχ²(1) = 1.47, p = .23. The fit indices for the second-order partial scalar invariance model were SB-χ²(79) = 127.70, CFI = .92, RMSEA = .07, 90% CI [.05, .09], and SRMR = .11.

Convergent Validity

As in Study 1, we correlated log-transformed RESE-R subscales with life satisfaction, reappraisal, and suppression. The results are shown in Table 2. Correlation coefficients refer to the total sample.

Conclusion

Study 2 examined the German RESE-R in a nonstudent sample. The factor structure found in Study 1 was success-
fully replicated. Internal consistencies were satisfying for Study 1, caution is warranted when interpreting this result.
all subscales. Full scalar invariance was supported for the In Study 2, the association between ANG and suppression
ANG and DES factors. POS and the second-order factor was not significant.
NEG were found to be partially invariant. In line with our
suggestions, we found positive associations of the RESE
subscales with life satisfaction and reappraisal. Moreover,
POS was negatively related to suppression. Limitations and Next Steps
Whereas participants in Study 1 as well as participants in
previous studies in other countries (Caprara et al., 2008)
Discussion were young adults in their early twenties, Study 2 covered
a sample of adults who were the parents of preschool chil-
The goals of this research were to investigate the reliability, dren. Basically, the general factor structure and the pattern
factor structure, and convergent validity of the German of correlations found in Study 1 were replicated in Study
RESE scale, and to examine cross-gender invariance of the 2. These findings indicate that the results concerning relia-
scale for German speakers. In Studies 1 and 2, CFAs re- bilities and cross-gender invariance of the German RESE-
vealed that self-efficacy in regulating emotions can be R found in Study 2 might also apply to a population of
meaningfully described as consisting of two factors, self- younger adults. However, future studies should further ex-
efficacy in managing negative emotions (NEG) and self- plore the validity of the German RESE-R across samples
efficacy in expressing positive emotions (POS). NEG was varying in age and demographic background. In particular,
represented by two first order factors: self-efficacy in man- caution is necessary when administering the scale to per-
aging despondency/distress (DES) and self-efficacy in sons from different cultural backgrounds (Trommsdorff &
managing anger/irritation (ANG). These findings are con- Rothbaum, 2008). Moreover, the present research did not
sistent with those reported by Caprara et al. (2008) for Bo- uncover the sources of the multivariate skewness of the
livian, U. S., and Italian samples. For statistical reasons, a RESE item distributions. Further studies should investigate
model containing POS, DES, and ANG as three correlated whether scores might be biased by social desirability. Fi-
factors would provide a similar fit as the model with a sec- nally, even though the factor structure found in Italy, Bo-
ond-order factor NEG. However, the latter model includes livia, and the United States was replicated, systematic
the theoretically meaningful distinction between positive crosscountry investigations are needed.
and negative emotions which is in line with previous theo-
rizing and findings (Caprara et al., 2008). Internal consis-
tencies could be substantially improved by the revised Ger-
man RESE scale (RESE-R) used in Study 2. They were Conclusions
comparable to the internal consistencies of the Italian,
Spanish, and English versions of the scales, which range Findings suggest that the RESE scale is a valid instrument
from .64 to .85 (Caprara et al., 2008). Partial scalar cross- to assess self-efficacy in expressing positive emotions and
gender invariance of the German RESE-R was demonstrat- regulating negative ones in both men and women in Ger-
ed in Study 2. Equality constraints could be maintained for man-speaking samples. As the revised version of the Ger-
all items of the DES and ANG subscales, and for all but man scale showed better internal consistencies than the ver-
one item of the POS subscale. According to Steenkamp and sion of the scale used in Study 1, we recommend using the
Baumgartner (1998), cross-gender comparisons of mean German RESE-R.
values of the POS, DES, and ANG factors are thus mean-
ingful. Nevertheless, caution is necessary because estimat-
ed factor mean differences may differ depending on the Acknowledgments
anchor indicators chosen for the factor models (Vanden-
berg, 2002). Convergent validity of the RESE scale was The authors contributions are acknowledged in alphabet-
investigated and confirmed in both studies. The significant ical order: Development of the study design and selection
positive correlations of POS, DES, and ANG with life sat- of instruments were carried out by Gian Vittorio Caprara,
isfaction and reappraisal were consistent with our hypoth- Maria Gerbino, Catherine Gunzenhauser, Tobias Heikamp,
eses. Furthermore, the suggested negative correlation be- Antje von Suchodoletz, and Gisela Trommsdorff. Data col-
tween POS and suppression was supported by the data. In lection was conducted by Catherine Gunzenhauser, Tobias
Study 1, we found an unexpected positive correlation be- Heikamp, and Antje von Suchodoletz. Statistical analyses
tween ANG and suppression. This might be due to the item were performed by Guido Alessandri and Catherine Gun-
that was excluded in Study 2 (managing negative feelings zenhauser. The article was composed and revised by Guido
when reprimanded by parents), which may be related to Alessandri, Gian Vittorio Caprara, Laura Di Giunta, Maria
suppression of anger in asymmetric relationships. Howev- Gerbino, Catherine Gunzenhauser, Tobias Heikamp, Antje
er, considering the low internal consistency of ANG in von Suchodoletz, and Gisela Trommsdorff. Study 1 was
financed by a grant from the German Research Foundation (DFG) as a part of the project "Developmental Conditions of Intentionality" (Principal Investigator: Prof. Dr. Gisela Trommsdorff, DFG GZ, TR 169/14-2) within the interdisciplinary research group "Limits of Intentionality" (DFG Research Unit 582) at the University of Konstanz, Germany. Study 2 was financed by grants given to the Research Group "The Empirics of Education: Economic and Behavioral Perspectives" in the context of the German Initiative of Excellence at the University of Freiburg, Germany. We thank Eva Grunenberg, Natalie Hoffmann, and Antonia Skock for their great help with data collection and data preparation.

References

Abler, B., & Kessler, H. (2009). Emotion Regulation Questionnaire – Eine deutschsprachige Fassung des ERQ von Gross und John [Emotion Regulation Questionnaire – A German version of the ERQ by Gross and John]. Diagnostica, 55, 144–152. doi 10.1026/0012-1924.55.3.144
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & S. J. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage. doi 10.1177/0049124192021002005
Caprara, G. V., Di Giunta, L., Eisenberg, N., Gerbino, M., Pastorelli, C., & Tramontano, C. (2008). Assessing regulatory emotional self-efficacy in three countries. Psychological Assessment, 20, 227–237. doi 10.1037/1040-3590.20.3.227
Caprara, G. V., Di Giunta, L., Pastorelli, C., & Eisenberg, N. (2011). Mastery of negative affect: The hierarchical structure of self-efficacy beliefs related to discrete negative emotions. Manuscript submitted for publication.
Caprara, G. V., & Gerbino, M. (2001). Autoefficacia Emotiva Percepita: La capacità di regolare l'affettività negativa e di esprimere quella positiva [Affective perceived self-efficacy: The perceived capacity to regulate negative affect and to express positive affect]. In G. V. Caprara (Ed.), La valutazione dell'autoefficacia (pp. 35–50). Trento, Italy: Edizioni Erickson.
Caprara, G., & Steca, P. (2005). Self-efficacy beliefs as determinants of prosocial behavior conducive to life satisfaction across ages. Journal of Social and Clinical Psychology, 24, 191–217. doi 10.1521/jscp.24.2.191.62271
Chen, F. F., Sousa, K. H., & West, S. G. (2005). Testing measurement invariance of second-order factor models. Structural Equation Modeling, 12, 471–492. doi 10.1207/s15328007sem1203_7
Davidov, M., & Grusec, J. E. (2006). Untangling the links of parental responsiveness to distress and warmth to child outcomes. Child Development, 77, 44–58. doi 10.1111/j.1467-8624.2006.00855.x
Diener, E., Emmons, R. A., Larsen, R. J., & Griffin, S. (1985). The Satisfaction with Life Scale. Journal of Personality Assessment, 49, 71–75. doi 10.1207/s15327752jpa4901_13
Eisenberg, N., Cumberland, A., Spinrad, T. L., Fabes, R. A., Shepard, S. A., Reiser, M., . . . Guthrie, I. K. (2001). The relations of regulation and emotionality to children's externalizing and internalizing problem behavior. Child Development, 72, 1112–1134. doi 10.1111/1467-8624.00337
Fujita, F., & Diener, E. (2005). Life satisfaction set point: Stability and change. Journal of Personality and Social Psychology, 88, 158–164. doi 10.1037/0022-3514.88.1.158
Gross, J. J., & John, O. P. (2003). Individual differences in two emotion regulation processes: Implications for affect, relationships, and well-being. Journal of Personality and Social Psychology, 85, 348–362. doi 10.1037/0022-3514.85.2.348
Gross, J. J., Richards, J. M., & John, O. P. (2006). Emotion regulation in everyday life. In D. K. Snyder, J. Simpson, & J. N. Hughes (Eds.), Emotion regulation in couples and families: Pathways to dysfunction and health (pp. 13–35). Washington, DC: American Psychological Association. doi 10.1037/11468-001
IBM SPSS Statistics. (2010). SPSS Statistics version 19.0 [Computer software]. Somers, NY: IBM Corporation.
Kelloway, E. K. (1998). Using LISREL for structural equation modeling: A researcher's guide. Thousand Oaks, CA: Sage.
Knight, G. P., Guthrie, I. K., Page, M. C., & Fabes, R. A. (2002). Emotional arousal and gender differences in aggression: A meta-analysis. Aggressive Behavior, 28, 366–393. doi 10.1002/ab.80011
Lang, P. L. (1995). The emotion probe. American Psychologist, 50, 372–385. doi 10.1037//0003-066X.50.5.372
Lyubomirsky, S., King, L., & Diener, E. (2005). The benefits of frequent positive affect: Does happiness lead to success? Psychological Bulletin, 131, 803–855. doi 10.1037/0033-2909.131.6.803
Muthén, B., & Muthén, L. (2009). Mplus (Version 5.21) [Computer software]. Los Angeles, CA: Authors.
Nolen-Hoeksema, S., & Aldao, A. (2011). Gender and age differences in emotion-regulation strategies and their relationship to depressive symptoms. Personality and Individual Differences, 51, 704–708. doi 10.1016/j.paid.2011.06.012
O'Boyle, E. H., & Williams, L. J. (2011). Decomposing model fit: Measurement vs. theory in organizational research using latent variables. Journal of Applied Psychology, 96, 1–12. doi 10.1037/a0020539
Pavot, W., & Diener, E. (1993). Review of the Satisfaction with Life Scale. Psychological Assessment, 5, 164–172. doi 10.1037/1040-3590.5.2.164
Satorra, A. (2000). Scaled and adjusted restricted tests in multi-sample analysis of moment structures. In R. D. H. Heijmans, D. S. G. Pollock, & A. Satorra (Eds.), Innovations in multivariate statistical analysis: A Festschrift for Heinz Neudecker (pp. 233–247). London: Kluwer. doi 10.2139/ssrn.189431
Sölva, M., Baumann, U., & Lettner, K. (1995). Wohlbefinden: Definitionen, Operationalisierungen, empirische Befunde [Well-being: Definitions, operationalizations, empirical results]. Zeitschrift für Gesundheitspsychologie, 3, 292–309. doi 10.1024/1421-0185/a000003
Sontag, L. M., & Graber, J. A. (2010). Coping with perceived peer stress: Gender-specific and common pathways to symptoms of psychopathology. Developmental Psychology, 46, 1605–1620. doi 10.1037/a0020617
Steenkamp, J. E. N., & Baumgartner, H. (1998). Assessing measurement invariance in cross-national consumer research. Journal of Consumer Research, 25, 78–90. doi 10.1086/209528
Tamres, L. K., Janicki, D., & Helgeson, V. S. (2002). Sex differences in coping behavior: A meta-analytic review and an examination of relative coping. Personality and Social Psychology Review, 6, 2–30. doi 10.1207/S15327957PSPR0601_1
Trommsdorff, G., & Rothbaum, F. (2008). Development of emotion regulation in cultural context. In M. Vandekerckhove, C. v. Scheve, S. Ismer, S. Jung, & S. Kronast (Eds.), Regulating emotions: Culture, social necessity, and biological inheritance (pp. 85–120). Malden, MA: Blackwell. doi 10.1002/9781444301786.ch4
Vandenberg, R. J. (2002). Toward a further understanding of improvement in measurement invariance methods and procedures. Organizational Research Methods, 5, 139–158. doi 10.1177/1094428102005002001

Published online: July 20, 2012

Catherine Gunzenhauser
Research Group The Empirics of Education
University of Freiburg
Starkenstr. 44
79085 Freiburg
Germany
Tel. +49 761 2039-7589
Fax +49 761 2039-7592
E-mail catherine.gunzenhauser@psychologie.uni-freiburg.de
L. Kriston et al.: Reliability and Validity of the German YSQ-S3
European Journal of Psychological Assessment 2013; Vol. 29(3):205–212
© 2012 Hogrefe Publishing
Original Article
Reliability and Validity of the German Version of the Young Schema Questionnaire – Short Form 3 (YSQ-S3)
Levente Kriston (1), Janina Schäfer (2), Gitta A. Jacob (2, 3), Martin Härter (1), and Lars P. Hölzel (2)
(1) Department of Medical Psychology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
(2) Department of Psychiatry and Psychotherapy, University Medical Center Freiburg, Freiburg i. Br., Germany
(3) Institute of Psychology, University of Freiburg, Freiburg i. Br., Germany
Abstract. To date, the psychometric properties of the German version of the Young Schema Questionnaire Short Form 3 (YSQ-S3)
have not yet been examined. We investigated the reliability and validity of the German version of the YSQ-S3. In a community sample
of 1,150 participants and a clinical sample of 30 psychiatric inpatients, we performed reliability analysis, confirmatory factor analysis,
and correlation analyses, and tested for group differences using analysis of variance. The YSQ-S3 proved to be reliable and corresponded
to the theoretically proposed 18-dimensional structure. Schema scores were positively associated with measures of psychopathology and
personality disorder, indicating convergent validity. Furthermore, the YSQ-S3 differentiated between participant subgroups defined by
level of health-care utilization, supporting discriminant validity. We conclude that the YSQ-S3 is a psychometrically sound instrument
that can be used in German-speaking countries in research on early maladaptive schemas. Further research is necessary particularly in
larger clinical samples.
Keywords: early maladaptive schema, Young Schema Questionnaire, factor structure, reliability, validity
DOI: 10.1027/1015-5759/a000143

Introduction

Schema therapy is a psychotherapy approach combining traditional cognitive-behavioral therapy with elements of psychodynamic approaches, Gestalt therapy, and humanistic therapies (Young, Klosko, & Weishaar, 2003). It recently gained increased attention, since outcome studies demonstrated its efficacy and effectiveness in patients with personality disorders, mainly borderline personality disorder (Farrell, Shaw, & Webber, 2009; Giessen-Bloo et al., 2006; Gude & Hoffart, 2008; Nadort et al., 2009; Nordahl & Nysaeter, 2005).

Central to schema therapy is the concept of early maladaptive schemas (EMS), thought to develop in childhood when essential needs of the child remain unmet. EMS comprise feelings, cognitions, bodily sensations, and coping behaviors (Young et al., 2003). Young et al. defined 18 schemas, clustered in five schema domains, each of which is considered to be connected with specific unmet childhood needs (see Table 1).

EMS are usually assessed with self-report questionnaires, mainly the Young Schema Questionnaire (YSQ). The original version of this instrument was developed by Young (1990) to assess 16 schemas. It consists of 205 items, and it was shown to be reliable and valid in large clinical and student samples (Schmidt, Joiner, Young, & Telch, 1995). Based on the findings of Schmidt et al. (1995), Young and Brown (1994) developed a short form of the YSQ comprising five items for each of the 16 EMS. The psychometric properties of various translations of both YSQ versions have been investigated in several studies (overview in Oei & Baranoff, 2007). In their review, Oei and Baranoff (2007) conclude that internal consistency, test-retest reliability, convergent validity, and discriminant validity of the YSQ are sufficient to warrant the use of the YSQ in research settings. However, since findings on the factorial structure are conflicting, the relevance of further research was highlighted (Oei & Baranoff, 2007).

The short form of the YSQ has been revised repeatedly. In its latest form, the YSQ-S3, it comprises 90 items assessing 18 schemas with five items each (Young, 2005). To date, four studies have tested the psychometric qualities of the YSQ-S3 in four different languages (Portuguese: Rijo & Gouveia, 2008; Finnish: Saariaho, Saariaho, Karila, & Joukamaa, 2009; Turkish: Soygüt, Karaosmanoğlu, & Çakır, 2009; Romanian: Trip, 2006). All studies found good discriminant validity with regard to group differences (patient groups always scored higher than psychologically healthy groups) and correlations with other measures of psychopathology, but results regarding the factorial structure were less conclusive. Trip (2006) found high internal consistency of all subscales in a Romanian version. Rijo and Gouveia (2008) confirmed the proposed factorial structure in a Portuguese sample by confirmatory factor analysis; however, six items had to be removed due to substantial crossloadings. With a Turkish version, Soygüt et al. (2009) found 14 of the proposed 18 factors in a principal components analysis. Saariaho et al. (2009) tested a Finnish translation of the YSQ-S3 and verified the 18-factor structure using confirmatory factor analysis. Furthermore, internal consistency of all subscales was high. Taken together, the findings from these studies suggest that the YSQ-S3 is a psychometrically sound instrument.

So far, the German translation of the YSQ-S3 (Young, 2006; German translation: Berbalk, Grutschpalk, Parfy, & Zarbock) has yet to be tested for psychometric properties. An increasing number of both schema therapy studies in the German-speaking region and international schema therapy studies with German-speaking centers, however, call for a psychometric test of the German YSQ-S3. With this study we examined the reliability as well as factorial, convergent, and discriminant validity of the German YSQ-S3.

Materials and Methods

Sampling Procedure

Cross-sectional data were collected in a convenience community sample and a clinical sample. Data in the community sample were collected via internet from July 14 to August 13, 2009. The survey link was distributed via email to psychology faculties at German universities, several internet forums, and the authors' surroundings. Participants did not get incentives for study participation, and no personal data (e.g., IP-address) were recorded.

Data from the clinical sample were collected among inpatients of the psychiatric department of a university medical center between August 14 and September 25, 2009. Patients gave written informed consent and were then asked to fill out a paper-and-pencil version of the instruments used in the online survey. Completed questionnaires were collected in envelopes that were closed by the patients themselves to ensure anonymity.

The study protocol was approved by the local ethics committee.

Instruments

The YSQ-S3 (Young, 2006) is a self-report instrument. People are asked to describe themselves by rating descriptive statements through a 6-step Likert-type response format ranging from "completely untrue of me" to "describes me perfectly." Higher values indicate a stronger presence of the respective schema. The YSQ-S3 assesses 18 EMS (see Table 1) with five items per scale, resulting in a total of 90 items.

Table 1. Schemas, schema domains, and associated needs
Schema domains                        Associated needs                                    Schemas
Disconnection                         Safe attachment, acceptance, nurturing, protection  Emotional deprivation; Abandonment/Instability; Mistrust/Abuse; Social isolation/Alienation; Defectiveness/Shame
Impaired autonomy and achievement     Autonomy, competency, identity                      Failure to achieve; Dependence/Incompetence; Vulnerability to harm or illness; Enmeshment/Undeveloped self
Impaired limits                       Realistic limits and self-control                   Entitlement/Grandiosity; Insufficient self-control
Other-directedness                    Free expression of needs and emotions               Subjugation; Self-sacrifice; Approval-seeking
Exaggerated vigilance and inhibition  Spontaneity and play                                Emotional inhibition; Unrelenting standards; Negativity/Pessimism; Punitiveness

A short form of the Symptom Checklist 90-R (SCL-90-R; Derogatis, 1992), the SCL-K-9 (Klaghofer & Brähler, 2001), was administered as a global measure of psychological distress. Participants rated the intensity of psychopathological symptoms within the past week on nine 5-step Likert-scaled items. The SCL-K-9 measures a single dimension of general symptom severity and correlates with the General Severity Index (GSI) of the SCL-90-R to a degree of 0.93 with a Cronbach's α of 0.87 (Klaghofer & Brähler, 2001).

Furthermore, a brief self-rating instrument to assess symptoms of personality disorders was used. The Standardized Assessment of Personality – Abbreviated Scale (SAPAS; Moran et al., 2003) contains eight items with a binary response format. Germans, van Heck, Moran, and Hodiamont (2008) used it as a self-rating instrument and found a good test-retest reliability of 0.89 and a Cronbach's α of 0.45. Discriminant validity was sufficient with a correct classification rate of 81% in subjects with and without personality disorders.

Additionally, self-reported demographic (sex, age, family status, level of education, employment status) and clinical data (self-reported diagnosis of a mental disorder and past health care utilization due to a mental disorder) were collected.
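
Cronbach's α, cited above for the SCL-K-9 and the SAPAS, and the corrected item-total correlation, a common item discrimination statistic, can be illustrated with a short sketch. This is a minimal Python example under stated assumptions and not the procedure the authors ran in their statistics software: the function names (cronbach_alpha, pearson, corrected_item_total) are hypothetical, the input is assumed to be one list of scores per item with respondents in the same order, and complete data are assumed.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item (hypothetical layout)."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Scale sum score of each respondent.
    totals = [sum(row) for row in zip(*items)]
    # Sum of the item variances (sample variances, n - 1 in the denominator).
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def pearson(x, y):
    """Pearson product-moment correlation of two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def corrected_item_total(items, index):
    """Correlation of one item with the sum of the remaining items of its scale."""
    rest = [
        sum(score for j, score in enumerate(row) if j != index)
        for row in zip(*items)
    ]
    return pearson(items[index], rest)
```

For a five-item scale, items would contain five lists, and corrected_item_total(items, 0) would give the discrimination of the first item. The "corrected" variant leaves the item itself out of the total so that the item does not correlate with its own contribution.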
Statistical Analysis

Due to the limited size of the clinical sample, analysis of reliability as well as factorial and convergent validity was calculated only with data from the community sample. Reliability was estimated by calculating item discrimination statistics and Cronbach's α for each scale. Factorial validity was analyzed with confirmatory factor analysis. The predefined structure consisted of 18 oblique (correlated) factors (representing schemas) and uncorrelated errors corresponding to Young's model. The covariance matrix of the items was analyzed by means of maximum likelihood estimation. We preferred to analyze the covariance rather than the correlation matrix because the latter is likely to lead to statistical and generalizability problems (e.g., Cudeck, 1989; Hair, Black, Babin, & Anderson, 2009). Although the responses to the items of the YSQ-S3 should be considered ordered categorical rather than continuous, we applied the maximum likelihood estimator, since it was shown to be robust in similar situations. Furthermore, alternative estimation methods would have required a considerably larger sample size (e.g., Green, Akey, Fleming, Hershberger, & Marquis, 1997; Muthén & Kaplan, 1985; Tabachnick & Fidell, 2007). Local fit was investigated by examining factor loadings, factor reliabilities, average extracted variance in items, and congruence (correlation) between factor scores and corresponding scale sum scores. According to expert recommendations (e.g., Hair et al., 2009; Kline, 2010; Schweizer, 2010), we assessed global model fit using the χ² discrepancy statistic, the normed χ² statistic, the Bentler comparative fit index (CFI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR). A more detailed evaluation of the latent factorial structure of the YSQ-S3 was reported by Kriston, Schäfer, Härter, and Hölzel (2010). Convergent validity was estimated via Pearson correlation coefficients between schema scales and general distress (SCL-K-9) as well as symptoms of personality disorders (SAPAS), respectively.

Discriminant validity was examined by comparing schema scale scores of the clinical and the community sample and by comparing subjects with different intensity of health care utilization within the community sample (former counseling, former outpatient treatment, former inpatient treatment due to a mental disorder). Mean schema scale scores were compared across groups by univariate analysis of variance (ANOVA). All analyses were performed using SPSS 15.0 (SPSS Inc., Chicago, IL) and Mplus 5.1 (Muthén & Muthén, 1998).

Results

Participant Characteristics

Data were collected from 1,150 online survey participants (community sample) and 30 psychiatric patients (clinical sample). Sample characteristics are given in Table 2.

Community sample participants were mostly female, rather young, and highly educated. One fifth reported a lifetime diagnosis of a mental disorder. The most frequent entries were affective disorder (n = 134), anxiety disorder (n = 43), eating disorder (n = 32), and borderline personality disorder (n = 25).

The majority of the clinical sample was female. Their mean age was 40 years. Most patients had a medium to high level of education and around one third were employed. The reasons for inpatient treatment (including comorbid diseases) were affective disorder (n = 17), psychotic disorder (n = 6), anxiety disorder (n = 4), borderline personality disorder (n = 4), obsessive-compulsive disorder (n = 2), posttraumatic stress disorder (n = 2), attention deficit hyperactivity disorder (n = 2), substance dependence (n = 1), and somatoform disorder (n = 1).

Table 2. Sample characteristics
                                                            Community sample  Clinical sample
                                                            n = 1,150         n = 30
Sex; n (%)
  female                                                    899 (78.2)        19 (67.9)
  male                                                      251 (21.8)        9 (32.1)
Age; years
  mean (SD)                                                 29.3 (10.1)       40.5 (11.4)
Family status; n (%)
  single                                                    896 (77.9)        17 (56.7)
  married                                                   188 (16.3)        9 (30.0)
  separated                                                 15 (1.3)          2 (6.7)
  divorced                                                  48 (4.2)          2 (6.7)
  widowed                                                   3 (0.3)           0 (0)
Level of education; n (%)
  low                                                       29 (2.5)          5 (16.7)
  middle                                                    223 (19.4)        12 (40.0)
  high                                                      898 (78.1)        13 (43.3)
Occupational status; n (%)
  employed                                                  451 (39.2)        9 (33.3)
  homemaker                                                 23 (2.0)          1 (3.7)
  pensioner                                                 23 (2.0)          5 (18.5)
  student                                                   557 (48.4)        3 (11.1)
  unemployed                                                35 (3.0)          9 (33.3)
  other                                                     61 (5.3)          0 (0)
Lifetime diagnosis of a mental health disorder; n (%)
  yes                                                       244 (21.2)
  no                                                        906 (78.8)
Health care utilization due to mental health disorder; n (%) (a)
  none                                                      755 (65.7)
  counseling                                                220 (19.1)
  outpatient treatment                                      229 (19.9)
  inpatient treatment                                       81 (7.0)
Notes. Missing data were not present in the community sample. Valid sample size varies between 27 and 30 in the clinical sample because of missing data (valid percentages are reported). Data on lifetime mental health-disorder diagnosis and health-care utilization were not collected in the clinical sample. (a) Multiple selections possible.

Reliability

Results of the reliability analysis are shown in Table 3. Item discriminations (corrected item-total correlations) exceeded the recommended threshold of 0.40 in all but four cases (one item of the scale Unrelenting Standards, two items of the scale Entitlement/Grandiosity, and one item of the scale Punitiveness). Internal consistency of 17 of the 18 scales was sufficient (Cronbach's α > 0.70). Internal consistency of the scale Entitlement/Grandiosity was 0.67.

Table 3. Results of the reliability analysis
                                  Corrected item-total correlations
Scale (schema)                    1st item  2nd item  3rd item  4th item  5th item  α
Emotional deprivation             0.70      0.74      0.70      0.80      0.60      0.88
Abandonment/Instability           0.67      0.71      0.76      0.63      0.67      0.87
Mistrust/Abuse                    0.52      0.62      0.66      0.72      0.58      0.82
Social isolation/Alienation       0.77      0.61      0.82      0.79      0.81      0.90
Defectiveness/Shame               0.75      0.84      0.79      0.79      0.74      0.91
Failure to achieve                0.73      0.50      0.79      0.78      0.74      0.87
Dependence/Incompetence           0.63      0.61      0.58      0.55      0.71      0.82
Vulnerability to harm or illness  0.61      0.70      0.44      0.47      0.54      0.77
Enmeshment/Undeveloped self       0.57      0.57      0.43      0.60      0.59      0.78
Subjugation                       0.57      0.66      0.53      0.61      0.68      0.82
Self-sacrifice                    0.53      0.51      0.67      0.55      0.63      0.80
Emotional inhibition              0.71      0.73      0.58      0.68      0.77      0.87
Unrelenting standards             0.52      0.57      0.51      0.54      0.36      0.74
Entitlement/Grandiosity           0.34      0.55      0.30      0.51      0.43      0.67
Insufficient self-control         0.61      0.57      0.52      0.61      0.56      0.80
Approval-seeking                  0.42      0.62      0.62      0.60      0.58      0.79
Negativity/Pessimism              0.71      0.72      0.71      0.58      0.65      0.86
Punitiveness                      0.54      0.44      0.45      0.47      0.39      0.70
Note. α refers to Cronbach's α.

Factorial Validity

Factorial validity results are reported in Table 4. Loadings of all but three items exceeded the required threshold of 0.40 (one item of the scale Entitlement/Grandiosity, two items of the scale Punitiveness). Factor reliability was satisfactory (> 0.70) in all scales except Entitlement/Grandiosity. Average extracted variance failed to reach the desired threshold of 0.50 but was still above 0.40 for the scales Vulnerability to Harm or Illness, Enmeshment/Undeveloped Self, Self-Sacrifice, Unrelenting Standards, Insufficient Self-Control, and Approval-Seeking. Average extracted variance in items of the scales Entitlement/Grandiosity and Punitiveness was substantially lower than required (0.34 and 0.37, respectively). Factor-scale congruence (correlation between factor scores and corresponding scale sum scores) was high (at least 0.95) for 17 scales. Congruence of the scale Punitiveness yielded 0.85. All factors were positively and statistically significantly (p < .001) interrelated, with correlations ranging from 0.21 to 0.87.

The χ² discrepancy test indicated a statistically significant misfit between defined and observed variance-covariance matrix (χ² = 11,516.1; df = 3,762; p < .001). The normed χ² statistic (3.06) marginally exceeded the threshold for an acceptable fit (below 3.00). The CFI (0.847) missed the required threshold for an acceptable fit (above 0.90). The RMSEA (0.042; 95% CI = 0.041–0.043) was less than 0.050, indicating a good fit. The SRMR (0.060) stayed below 0.100, signifying an acceptable model fit.

Convergent Validity

Pearson correlation coefficients (r) between YSQ-S3 scales and general psychopathology as measured by the SCL-K-9 ranged from 0.28 (scale Entitlement/Grandiosity) to 0.65 (scales Abandonment/Instability and Negativity/Pessimism) as reported in Table 4. Twelve of the 18 scales showed a medium or strong association (r > 0.5) with the SCL-K-9.

Associations of the YSQ-S3 scales with symptoms of
Table 4. Results of the factorial and convergent validity analysis

Columns, in order: Scale (schema); factor loadings of the 1st, 2nd, 3rd, 4th, and 5th item; factor reliability; variance extracted; factor congruence; correlation with SCL-K-9; correlation with SAPAS
Emotional deprivation 0.74 0.81 0.78 0.87 0.64 0.88 0.62 0.98 0.50 0.48
Abandonment/Instability 0.70 0.74 0.85 0.69 0.77 0.87 0.61 0.98 0.65 0.56
Mistrust/Abuse 0.58 0.73 0.76 0.78 0.63 0.83 0.51 0.95 0.60 0.59
Social isolation/Alienation 0.81 0.65 0.86 0.86 0.86 0.91 0.70 0.99 0.57 0.51
Defectiveness/Shame 0.80 0.87 0.84 0.84 0.80 0.92 0.72 0.99 0.59 0.54
Failure to achieve 0.78 0.58 0.85 0.85 0.80 0.89 0.65 0.99 0.55 0.41
Dependence/Incompetence 0.69 0.69 0.66 0.62 0.81 0.83 0.54 0.95 0.61 0.51
Vulnerability to harm or illness 0.79 0.82 0.48 0.56 0.60 0.78 0.47 0.95 0.60 0.47
Enmeshment/Undeveloped self 0.65 0.62 0.51 0.65 0.76 0.78 0.47 0.95 0.50 0.45
Subjugation 0.70 0.73 0.55 0.68 0.76 0.82 0.53 0.95 0.57 0.50
Self-sacrifice 0.60 0.57 0.79 0.62 0.73 0.80 0.48 0.98 0.34 0.22
Emotional inhibition 0.77 0.80 0.67 0.75 0.81 0.87 0.62 0.99 0.44 0.42
Unrelenting standards 0.64 0.66 0.61 0.67 0.45 0.74 0.42 0.96 0.37 0.41
Entitlement/Grandiosity 0.44 0.72 0.38 0.67 0.56 0.69 0.34 0.96 0.28 0.28
Insufficient self-control 0.67 0.68 0.61 0.67 0.67 0.80 0.49 0.97 0.51 0.40
Approval-seeking 0.48 0.72 0.79 0.65 0.63 0.79 0.46 0.96 0.38 0.34
Negativity/Pessimism 0.79 0.80 0.78 0.64 0.70 0.86 0.61 0.97 0.65 0.57
Punitiveness 0.72 0.57 0.34 0.39 0.68 0.67 0.37 0.85 0.42 0.39
Notes. All reported parameters are statistically significantly different from zero at p < .001. SCL-K-9 = Nine-item short-form of the Symptom
Checklist 90-R (SCL-90-R). SAPAS = Standardized Assessment of Personality Abbreviated Scale.
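The factor reliability and variance extracted columns of Table 4 are quantities that are commonly derived from standardized factor loadings. The following Python sketch assumes the usual composite-reliability and average-variance-extracted formulas; the article does not state which formulas were used, and the function names are illustrative only, so small deviations from the tabled values are to be expected.

```python
def composite_reliability(loadings):
    """Composite reliability from standardized loadings (uncorrelated errors assumed)."""
    loading_sum_sq = sum(loadings) ** 2
    error_variance = sum(1.0 - l ** 2 for l in loadings)
    return loading_sum_sq / (loading_sum_sq + error_variance)


def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Rounded loadings copied from two rows of Table 4.
table4_loadings = {
    "Emotional deprivation": [0.74, 0.81, 0.78, 0.87, 0.64],
    "Entitlement/Grandiosity": [0.44, 0.72, 0.38, 0.67, 0.56],
}

for scale, loadings in table4_loadings.items():
    cr = composite_reliability(loadings)
    ave = average_variance_extracted(loadings)
    print(f"{scale}: reliability = {cr:.2f}, variance extracted = {ave:.2f}")
```

With the rounded loadings, the reliabilities come out close to the tabled 0.88 and 0.69, whereas the variance extracted is slightly lower than tabled. This would be consistent with rounding or with a somewhat different computation in the original analysis.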
Figure 1. Mean schema profiles of subgroups according to health-care utilization. The overall difference between group
means in one-factorial analysis of variance is statistically significant for each scale at p < .001.
personality disorder as measured by the SAPAS were (with the exception of the scale Unrelenting Standards) weaker than associations with psychopathological symptoms, but still statistically significant (see Table 4). Seven YSQ-S3 scales showed an at least medium-sized association (r > 0.5) with the SAPAS, with the scales Mistrust/Abuse, Negativity/Pessimism, and Abandonment/Instability yielding the strongest interrelations.

Discriminant Validity

Statistically significant (p < .001) group differences in all YSQ-S3 scales were found for groups defined by health-care utilization (see Figure 1). Most schemas were most pronounced in the clinical sample, followed by former inpatients from the community sample. Similarly, outpatients scored lower than inpatients but higher than subjects reporting counseling sessions. Participants reporting no mental-health related health-care utilization scored lowest on all scales. Overall differences were most pronounced for schemas in the domains Disconnection and Impaired Autonomy and Achievement (see Table 1).

Discussion

The present study revealed strong empirical support for the psychometric soundness of the German version of the YSQ-S3. The instrument proved to be reliable and showed acceptable factorial validity. Schema scores were positively associated with measures of psychopathology and personality disorder, indicating convergent validity. The YSQ-S3 differentiated between subgroups with different levels of health-care utilization, supporting discriminant validity. Although the scales Entitlement/Grandiosity and Punitiveness showed some problems with regard to reliability and factorial validity, we still consider them sufficiently good to be used without serious revision.

All YSQ-S3 scales are associated with self-rated general psychopathology, personality disorder severity, and health care utilization. Furthermore, YSQ-S3 scales are highly interrelated. These results raise questions regarding the specificity of the schema constructs, i.e., whether different schemas can indeed be regarded as different constructs (Kriston et al., 2010), and/or whether they can be grouped in a hierarchical structure (Rijo & Gouveia, 2008). Schema theory suggests five higher-order schema domains (see Table 1; Young et al., 2003); however, the validity of schema domains is currently being discussed (Lockwood, 2011).

At first glance, global fit indexes are inconclusive in confirmatory factor analysis. While some (discrepancy chi-square test, normed χ², CFI) indicate poor fit, others (RMSEA, SRMR) support the hypothesized structure. However, Hair et al. (2009) note that the discrepancy χ² test is oversensitive (i.e., almost always statistically significant) in sample sizes over 300, and that the CFI tends to indicate poorer fit with increasing model complexity. Schweizer (2010) also points out that the recommended cutoff limits for the normed χ² statistic may need to be enlarged in large samples. Since our study was conducted with a large sample size (more than 1,000 participants) and a rather complex model (18 correlated factors), we relied on RMSEA and SRMR more strongly than on the χ² test and the CFI. It should be noted, however, that our confirmatory factor-analytic model support is still slightly weaker than in the study by Rijo and Gouveia (2008) and considerably inferior to the findings reported by Saariaho et al. (2009).

Surprisingly, YSQ-S3 scores are seemingly more strongly related to general psychopathology than to personality disorder symptoms. Based on this finding one might argue that the association between personality disorder features and reported schema scores is moderated by schema modes that may weaken the intercorrelation when avoidant/detached coping styles are present. However, Lobbestael, van Vreeswijk, and Arntz (2008) argue that schema modes, particularly the so-called detached protector mode, which covers avoidant/detached coping, can reflect general psychopathology. Further investigation of the multiple associations between personality psychopathology, global psychopathology, schemas, and schema modes is clearly necessary. However, a detailed clinical discussion of this issue would go beyond the scope of the present psychometric study. It should also be noted that the instrument used for assessing personality disorder symptoms (SAPAS) uses binary items that have higher response thresholds than items of the SCL. This is likely to lead to a decreased variance of the SAPAS score, which in turn may deteriorate the Pearson correlation coefficients. In summary, no definite conclusion on the comparison of these interrelations should be drawn.

A major limitation of our study is the use of a convenience community sample. The online survey format allowed us to collect complete data from a large sample, but it did not allow for controlling the selection of participants. However, a substantial proportion of the community sample reported current or earlier health-care treatment, and variance in YSQ-S3 scores was satisfyingly high. Demographic and clinical characteristics ranged broadly, leading to a heterogeneous sample without a clear population to which the findings could be generalized. Although we examined an additional clinical sample, its size was rather small because of limited resources. Thus, further investigations in large clinical samples on the psychometric properties of the YSQ-S3 are necessary.

To our knowledge this study is the first to test the German version of the YSQ-S3. Largely in agreement with previous findings in other countries and with former versions of the instrument, we conclude that the YSQ-S3
provides a valid and reliable measurement of Young's early maladaptive schemas.

Acknowledgments

We thank all persons who participated in the study either as inpatients or by filling out the online survey. We are grateful to the two anonymous reviewers for their helpful comments on a previous version of the manuscript.

References

Cudeck, R. (1989). Analysis of correlation matrices using covariance structure models. Psychological Bulletin, 105, 317-327.
Derogatis, L. R. (1992). The SCL-90-R, administration, scoring and procedures manual (2nd ed.). Towson, MD: Clinical Psychometric Research.
Farrell, J. M., Shaw, I. A., & Webber, M. A. (2009). A schema-focused approach to group psychotherapy for outpatients with borderline personality disorder: A randomized controlled trial. Journal of Behavior Therapy and Experimental Psychiatry, 40, 317-328.
Germans, S., van Heck, G. L., Moran, P., & Hodiamont, P. P. G. (2008). The self-report Standardized Assessment of Personality-Abbreviated Scale: Preliminary results of a brief screening test for personality disorders. Personality and Mental Health, 2, 70-76.
Giessen-Bloo, J., van Dyck, R., Spinhoven, P., van Tilburg, W., Dirksen, C., van Asselt, T., . . . Arntz, A. (2006). Outpatient psychotherapy for borderline personality disorder: randomized trial of schema-focused therapy vs. transference-focused psychotherapy. Archives of General Psychiatry, 63, 649-658.
Green, S. B., Akey, T. M., Fleming, K. K., Hershberger, S. L., & Marquis, J. G. (1997). Effect of the number of scale points on chi-square fit indices in confirmatory factor analysis. Structural Equation Modeling: A Multidisciplinary Journal, 4, 108-120.
Gude, T., & Hoffart, A. (2008). Change in interpersonal problems after cognitive agoraphobia and schema-focused therapy versus psychodynamic treatment as usual of inpatients with agoraphobia and cluster C personality disorders. Scandinavian Journal of Psychology, 49, 195-199.
Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2009). Multivariate data analysis (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Klaghofer, R., & Brähler, E. (2001). Konstruktion und teststatistische Prüfung einer Kurzform der SCL-90-R [Construction and psychometric evaluation of a short-form of the SCL-90-R]. Zeitschrift für Klinische Psychologie, Psychiatrie und Psychotherapie, 49, 115-124.
Kline, R. B. (2010). Principles and practices of structural equation modeling (3rd ed.). New York: Guilford.
Kriston, L., Schäfer, J., Härter, M., & Hölzel, L. (2010, July). All the same? Factorial structure of the Young Schema Questionnaire Short Form 3 (YSQ-S3). Paper presented at the 4th Annual International Society of Schema Therapy Meeting, Berlin, Germany.
Lobbestael, J., van Vreeswijk, M., & Arntz, A. (2008). An empirical test of schema mode conceptualizations in personality disorders. Behavior Research and Therapy, 46, 854-860.
Lockwood, G. (2011, February 12). Do schema domains exist? [Online blog]. Retrieved from http://www.isst-online.com/node/226
Moran, P., Leese, M., Lee, T., Walters, P., Thornicroft, G., & Mann, A. (2003). Standardized Assessment of Personality Abbreviated Scale (SAPAS): Preliminary validation of a brief screen for personality disorder. British Journal of Psychiatry, 183, 228-232.
Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of nonnormal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171-189.
Muthén, L. K., & Muthén, B. O. (1998). Mplus user's guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
Nadort, M., Arntz, A., Smit, J. H., Giessen-Bloo, J., Eikelenboom, M., Spinhoven, P., . . . van Dyck, R. (2009). Implementation of outpatient schema therapy for borderline personality disorder with versus without crisis support by the therapist outside office hours: A randomized trial. Behavior Research and Therapy, 47, 961-973.
Nordahl, H. M., & Nysaeter, T. E. (2005). Schema therapy for patients with borderline personality disorder: A single case series. Journal of Behavior Therapy and Experimental Psychiatry, 36, 254-264.
Oei, T. P. S., & Baranoff, J. (2007). Young schema questionnaire: Review of psychometric and measurement issues. Australian Journal of Psychology, 59, 78-86.
Rijo, D., & Gouveia, P. (2008, October). A confirmatory factor analysis study of the YSQ-S3 in a large Portuguese sample. Paper presented at the 3rd Annual International Society of Schema Therapy Meeting, Coimbra, Portugal.
Saariaho, T., Saariaho, A., Karila, I., & Joukamaa, M. (2009). The psychometric properties of the Finnish Young Schema Questionnaire in chronic pain patients and a nonclinical sample. Journal of Behavior Therapy and Experimental Psychiatry, 40, 158-168.
Schmidt, N. B., Joiner, T. E., Young, J. E., & Telch, M. J. (1995). The Schema Questionnaire: Investigation of psychometric properties and the hierarchical structure of maladaptive schemas. Cognitive Therapy and Research, 19, 295-321.
Schweizer, K. (2010). Some guidelines concerning the modeling of traits and abilities in test construction. European Journal of Psychological Assessment, 26, 1-2.
Soygüt, G., Karaosmanoğlu, A., & Cakir, Z. (2009). Assessment of early maladaptive schemas: A psychometric study of the Turkish young schema questionnaire-short form-3. Türk Psikiyatri Dergisi, 20, 75-84.
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Pearson.
Trip, S. (2006). The Romanian version of Young Schema Questionnaire Short Form 3 (YSQ-S3). Journal of Cognitive and Behavioral Psychotherapies, 6, 173-181.
Young, J. E. (1990). Cognitive therapy for personality disorders: A schema-focused approach. Sarasota, FL: Professional Resource Exchange.
Young, J. E. (2005). Young Schema Questionnaire Short Form 3 (YSQ-S3). New York: Cognitive Therapy Center.
Young, J. E. (2006). Young Schema Questionnaire (German version, YSQ-S3) (H. Berbalk, J. Grutschpalk, E. Parfy, & G. Zarbock, Trans.). Eckernförde: Institut für Schematherapie.
Young, J. E., & Brown, G. (1994). Young Schema Questionnaire Short Form. New York: Cognitive Therapy Center.
Young, J. E., Klosko, J. S., & Weishaar, M. (2003). Schema therapy: A practitioner's guide. New York: Guilford.

Published online: July 20, 2012

Levente Kriston
Department of Medical Psychology
University Medical Center Hamburg-Eppendorf
Martinistr. 52
20246 Hamburg
Germany
Tel. +49 40 7410-56849
Fax +49 40 7410-54965
E-mail l.kriston@uke.de
A. Müller et al.: Validation of the German Version of the Exercise Dependence Scale
European Journal of Psychological Assessment 2013; Vol. 29(3):213-219
2012 Hogrefe Publishing
Original Article
Validation of the German Version of the Exercise Dependence Scale

Astrid Müller1, Laurence Claes2, Dirk Smits2,3, Olaf Gefeller4, Anja Hilbert5, Antonia Herberg1, Verena Müller1, Dirk Hofmeister6, and Martina de Zwaan1

1 Department of Psychosomatic Medicine and Psychotherapy, Medical School Hannover, Germany
2 Department of Psychology, University of Leuven, Belgium
3 HUB, Brussels, Belgium
4 Department of Medical Informatics, Biometry, and Epidemiology, University of Erlangen-Nuremberg, Erlangen, Germany
5 Integrated Research and Treatment Center (IFB) Adiposity Diseases, University of Leipzig, Germany
6 Department of Medical Psychology and Medical Sociology, University of Leipzig, Germany

Abstract. This study evaluated the psychometric properties of the German version of the 21-item Exercise Dependence Scale (EDS-G),
which was administered in a large representative German sample (N = 1,611). In order to examine convergent validity, we also asked
participants to answer the Eating Disorder Examination-Questionnaire (EDE-Q). To investigate the discriminant validity, clients of fitness
centers (N = 129), sport students (N = 85), and medical students (N = 129) filled out the EDS-G. In this German population-based sample,
the seven-factor structure suggested by the original authors as well as a higher-order model (7 + 1 factor model) were supported. The
total EDS-G scale showed an excellent internal reliability and was positively related to the EDE-Q total and subscale scores. The EDS-G
scores differentiated between samples with varying degrees of exercise. In summary, the results indicate that the EDS-G is a psychomet-
rically sound and potentially valid brief measure for the assessment of exercise dependence symptoms.
Keywords: exercise dependence, eating disorder symptoms, measurement, validity
Introduction

Exercise dependence is characterized by a preoccupation with exercise and extreme physical activity; it is relatively resistant to change and is associated with impaired psychological, social, physical, and behavioral functioning (Bamber, Cockerill, Rodgers, & Carroll, 2003). Typically, individuals with exercise dependence continue exercising despite recurrent physical or psychological problems (such as injury, illness, fatigue, or personal demands) that are likely to have been caused, or exacerbated, by exercise. Previous research indicated a link between exercise dependence and eating disorders since many patients with eating disorders exercise compulsively in order to control weight and shape (e.g., Bewell-Weiss & Carter, 2010; Cook & Hausenblas, 2008).

Hausenblas and Symons Downs (2002a) suggested that exercise dependence should be conceptualized as a maladaptive pattern of exercise that causes clinically significant impairment and is operationalized based on three or more diagnostic criteria adopted from the DSM-IV criteria for substance dependence (APA, 1994). Based on these criteria Hausenblas and Symons Downs (2002c) developed the Exercise Dependence Scale (EDS-21), which comprises the following subscales: tolerance, withdrawal effects, continuance, lack of control, reductions in other activities, time, and intention. The final version consists of 21 items to be rated on a 6-point Likert scale ranging from 1 (= never) to 6 (= always), with three items per subscale (i.e., seven factors or subscales) (Symons Downs, Hausenblas, & Nigg, 2004). Higher mean scores indicate a higher incidence of exercise dependence symptoms. The scale is a widely used instrument (e.g., Cook & Hausenblas, 2008; Edmunds, Ntoumanis, & Duda, 2006) and has shown good convergent and discriminant validity as well as excellent internal and test-retest reliability (e.g., Symons Downs et al., 2004). To distinguish among individuals who are at risk, have some symptoms, or have no symptoms for exercise dependence, the authors described the following three categories:
1) In order to be classified as at risk for exercise dependence, three or more of the DSM-IV diagnostic criteria for substance dependence should be present (i.e., a total score of 15 or more on at least three subscales).
2) Nondependent-symptomatic refers to heightened scores on a minimum of three EDS-21 subscales (i.e., for all who have not been classified as at risk for exercise dependence, a total score of 7 or more on at least three subscales).
DOI: 10.1027/1015-5759/a000144
3) Nondependent-asymptomatic is a categorization whereby one of the three categories obtained using the EDS-21 is generated by a scoring algorithm consisting of flowchart decision rules, in which items or combinations of items determine whether an individual would be classified in the at-risk, symptomatic, or asymptomatic category (Hausenblas & Symons Downs, 2002b).

Allegre and Therme (2008) validated the French version of the EDS-21 in a sample of 516 half-marathon runners aged between 17 and 74 years. They found the same 7-factor structure as the original version. Recently, Sicilia and González-Cutre (2011) validated the Spanish version in a sample of 531 sport center users aged between 16 and 60 years. Their findings supported the first-order seven-factor model as well as a higher-order factor model (seven first-order factors and one second-order factor). To date, no validated German version is available.

The aim of this study was to validate the German version of the EDS-21 (EDS-G) in a large representative German sample. In order to examine convergent validity, we additionally administered a self-rating instrument for eating disorder symptoms. Given the association between eating disorders and excessive exercising, we expected a significant positive correlation between the EDS-G and the eating disorder measure. Because the decision rules for categorizing someone as at risk for exercise dependence proposed by Hausenblas and Symons Downs (2002b) are rather difficult to handle, we looked for an alternative way to differentiate individuals who would be at risk from the nondependent group by using a cutoff value on the total EDS-G score. We further attempted to estimate the prevalence of exercise dependence in the general population based on this cutoff. To investigate the discriminant validity, we asked clients of fitness centers, sport students as well as medical students to fill out the EDS-G. We hypothesized that sport students and clients of fitness centers would report higher EDS-G scores than medical students. Additionally, medical students also answered questions concerning their weekly physical activity, which we expected to positively correlate with their EDS-G total scores.

Method

Participants

All individuals participated on a voluntary basis and gave their written informed consent before entering the study. The study was approved by the Institutional Ethics Committee of the University of Erlangen-Nuremberg, Germany. The population-based survey met the ethical guidelines of the international Code of Marketing and Social Research Practice by the International Chamber of Commerce and the European Society for Opinion and Marketing Research.

Population-Based Sample

A random sample of the German general population older than 14 years of age was selected with the assistance of a demographic consulting company. The sampling procedure was designed to yield random samples representative of the German population in terms of age, sex, and education. It followed the established guidelines on how to construct a random population sample in Germany when no access to a population roster is possible. This sampling design involves three consecutive steps: First, a grid of 258 regional sampling areas was randomly selected from a roster of such nonoverlapping grids centrally assembled to enhance representativeness in stratified regional sampling in Germany. As a second step, a random procedure to select households of the respective area was implemented within all sampling areas. In a third and final step, one member of the selected household fulfilling the inclusion criteria (age 14 or older, able to read and understand the German language) was sampled randomly in a prespecified manner. A first attempt was made for 4,091 addresses, of which 4,069 were valid. If the person was not at home, a maximum of three attempts was made to contact him or her. All subjects were visited by a study assistant, who informed them about the investigation, obtained written informed consent, and presented them with a questionnaire package including the EDS-G. The survey was conducted between November 27 and December 16, 2009. A total sample of 2,520 individuals from 14 to 95 years of age agreed to participate (participation rate: 61.9% of the valid addresses). Similar to the Spanish validation study (Sicilia & González-Cutre, 2011), respondents younger than 16 (N = 47) and older than 60 (N = 842) were excluded as we did not expect a high prevalence of excessive exercising in individuals aged below 16 years and above 60 years. Additionally, cases were dropped when data were missing (N = 20). This provided a final sample of 1,611 individuals for analysis with a mean age of 40.8 years (SD = 12.5, range 16-60); 53.6% were female, 58.3% lived with a partner, and 9% were unemployed. The mean body mass index (BMI) was 24.9 kg/m² (SD = 4.3, range 14.2-55.4).

Additional Samples

The EDS-G was additionally administered to clients of fitness centers (GYM-group), sports students (SP-group), and medical students (MD-group). The GYM-group included 129 individuals with a mean age of 24.5 years (SD = 8.4, range 16-50). The SP-group consisted of 85 sports students with a mean age of 22.5 years (SD = 1.9, range 19-29). In the MD-group there were 129 students with a mean age of 23.2 years (SD = 2.8, range 20-36).

Instruments

The authorized German translation and backtranslation (Brislin, 1970) of the original American EDS-21 was performed by a licensed translator. The backtranslation was
verified for discrepancies against the original form by the American author of the original scale. The EDS-G was answered by the population-based sample as well as by the other groups.

The validated German version of the Eating Disorder Examination-Questionnaire (EDE-Q; Hilbert & Tuschen-Caffier, 2006) was filled out by the population-based sample. The EDE-Q consists of the four subscales: restraint (Cronbach's α = 0.83 in the present sample), eating concern (α = 0.80), shape concern (α = 0.89), and weight concern (α = 0.80). The 22 items are rated on a 6-point scale ranging from 1 (no days/not at all) to 5 (every day/markedly).

Data on physical activity were available only from the medical students. They were asked to indicate the number of days per week from 0 (no days) to 7 (seven days) with intensive physical activity.

Data Analysis

As we had a clear hypothesis about the 7-factor structure of the EDS, we performed confirmatory factor analyses (CFAs) with LISREL 8.71 (Jöreskog & Sörbom, 2004). To test whether the power of the seven-factor model of Hausenblas and Symons Downs (2000b) was large enough to reject a more parsimonious one-factor alternative, we fitted an additional one-factor model (Bentler, 2007), assuming one general exercise dependence factor. A second motivation for the one-factor model was the use of the total EDS-G score as a measure of exercise dependence, in the absence of scores on at least one of the seven subscales. For the 7-factor model, we determined that every item loaded on its corresponding factor, and latent factors were allowed to correlate.

The distributions of responses to some items were rather skewed. Mean skewness for all items equaled 2.40 (SD = 0.44). However, the joint distribution of the items at the total scale was less skewed (1.88 for the total score). Given that the data were ordinal and due to the skewness of the response distribution of several items, a robust weighted least square estimation method on the polychoric correlations, weighted by the asymptotic variances, was preferred for estimating the parameters of the CFA models, as suggested by Flora and Curran (2004). To evaluate model fit, multiple criteria were used: Satorra-Bentler scaled chi-square values for comparing nested models (χ² test), comparative fit index (CFI) for fit relative to a null model, the root mean squared error of approximation (RMSEA), and the standardized root mean square residual (SRMR) as the standardized difference between the observed and the predicted correlation. Stringent cutoff values were used as the skewness of the response distribution can lead to slightly overestimated χ² values and fit indices. The criteria for fit were defined according to Hu and Bentler (1999) as CFI > 0.97, RMSEA < 0.05, and SRMR < 0.09.

A cutoff value for being at risk for exercise dependence on the total EDS-G score was determined in the population-based sample by using a receiver operating characteristic (ROC) curve that was based on the diagnostic criteria for being at risk for exercise dependence reported by Hausenblas and Symons Downs (2002b). Congruence of this new cutoff value and the cutoff algorithm proposed by the original authors was then assessed using Cohen's coefficient kappa (κ) among clients of fitness centers, sport students, and medical students. Cohen's coefficient κ measures chance corrected agreement between diagnoses. In nondegenerated situations it ranges from 0 to +1.00, where κ < 0.20 indicates only slight agreement, 0.21 < κ < 0.40 fair agreement, 0.41 < κ < 0.60 moderate agreement, 0.61 < κ < 0.80 substantial agreement, and 0.81 < κ < 1.00 reflects almost perfect agreement (Landis & Koch, 1977).

Cronbach's αs were calculated in all samples to investigate the internal consistency. Convergent validity was determined by using the Spearman rank correlation coefficient to calculate the associations between the EDS-G and the EDE-Q scales in the population-based sample. In the MD-group the correlation between EDS-G scores and daily physical activity was investigated.

To explore group differences in terms of age and sex, we used t-tests for age and χ² tests for sex, respectively. The Exercise Dependence Scale scores were compared across samples using one-way analysis of covariance (ANCOVA, adjusted for age and sex) in order to investigate discriminant validity. Post hoc tests (Tukey B) were conducted to evaluate pairwise group differences in age. Overall and pairwise group differences in the ANCOVA adjusted means for EDS-G total and subscale scores were also assessed by post hoc tests with a simple Bonferroni correction to control the type I error across pairwise comparisons.

Results

Factor Structure

Table 1 summarizes the results of the CFAs. The fit of the one-factor solution was reasonable, but the RMSEA was somewhat too high. In comparison, the 7-factor model obtained a substantially better fit to the data (Δχ² = 1713.72;

Table 1. Indices describing the goodness of fit of the confirmatory factor analyses on the items of the Exercise Dependence Scale (EDS-G) based on the German population-based sample

Model df Satorra-Bentler χ² RMSEA CFI SRMR
One-factor model 189 2395.52 0.09 0.99 0.05
Seven-factor model 168 681.80 0.04 0.99 0.02
Seven + one-factor model 182 888.37 0.05 0.99 0.03
Notes. n = 1,611. CFI = comparative fit index, RMSEA = root mean squared error of approximation, SRMR = standardized root mean square residual.
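The comparison of the nested one-factor and seven-factor models can be retraced from the entries of Table 1. The following Python sketch takes the plain difference of the tabled chi-square values and degrees of freedom; the dictionary keys and function name are illustrative. A strict Satorra-Bentler scaled difference test would additionally require the scaling correction factors, which are not reported in the table, so this is only a check of the arithmetic, not a reanalysis.

```python
# Satorra-Bentler chi-square values and degrees of freedom copied from Table 1.
models = {
    "one_factor": {"chi2": 2395.52, "df": 189},
    "seven_factor": {"chi2": 681.80, "df": 168},
    "seven_plus_one": {"chi2": 888.37, "df": 182},
}


def chi2_difference(restricted, general):
    """Plain difference in chi-square and df between two nested models."""
    delta_chi2 = round(restricted["chi2"] - general["chi2"], 2)
    delta_df = restricted["df"] - general["df"]
    return delta_chi2, delta_df


delta_chi2, delta_df = chi2_difference(models["one_factor"], models["seven_factor"])
print(f"delta chi2 = {delta_chi2}, delta df = {delta_df}")
```

The result matches the chi-square difference and the 21 degrees of freedom reported for the comparison of the one-factor and seven-factor models.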
df = 21; p < .001). However, all factors were seriously correlated (correlations ranging from .79 to .94), suggesting a superordinate factor. Therefore, we fitted a CFA with seven factors, corresponding to Hausenblas and Symons Downs (2000b), complemented with a single second-order factor on which all first-order factors were assumed to load positively. The nature of this higher-order factor may reflect the unifying feature that comprehensively defines exercise dependence. The second-order factor explained a large amount of the variance in each of the first-level factors (i.e., R² varies from 0.79 to 0.94).

First-factor loadings of the final 7 + 1 factor model are presented in Table 2. Only factor loadings of items to their corresponding first-order factor are mentioned (factors are printed in italics above the items), as all other first-factor loadings are fixed to zero. The loadings of the first-order factors to the second order factor ranged from .89 to .96.

Table 2. Factor loadings of the items of the Exercise Dependence Scale (EDS-G) on the seven first order factors in the German population-based sample

Item English version/German translation Factor loading
Withdrawal effects/Entzugssymptome
1 I exercise to avoid feeling irritable./Ich treibe Sport, weil ich das Gefühl der Gereiztheit vermeiden möchte. 0.87
8 I exercise to avoid feeling anxious./Ich treibe Sport, um Angstgefühle zu vermeiden. 0.87
15 I exercise to avoid feeling tense./Ich treibe Sport, um Spannungsgefühle zu vermeiden. 0.89
Continuance/Fortführen
2 I exercise despite recurring physical problems./Ich treibe Sport, obwohl ich immer wieder körperliche Beschwerden habe. 0.88
9 I exercise when injured./Ich treibe Sport, wenn ich Verletzungen habe. 0.89
16 I exercise despite persistent physical problems./Ich bin trotz anhaltender körperlicher Beschwerden sportlich aktiv. 0.89

Item English version/German translation Factor loading
11 I am unable to reduce how often I exercise./Es gelingt mir nicht, die Häufigkeit meiner sportlichen Aktivitäten zu reduzieren. 0.91
18 I am unable to reduce how intense I exercise./Es gelingt mir nicht, die Intensität meiner sportlichen Aktivitäten zu verringern. 0.94
Reduction in other activities/Verringern anderer Aktivitäten
5 I would rather exercise than spend time with family/friends./Ich treibe lieber Sport, als Zeit mit meiner Familie/meinen Freunden zu verbringen. 0.91
12 I think about exercise when I should be concentrating on school/work./Ich denke an Sport, wenn ich mich eigentlich auf die Schule/Arbeit konzentrieren sollte. 0.90
19 I choose to exercise so that I can get out of spending time with family/friends./Ich entscheide mich dafür, Sport zu treiben, damit ich keine Zeit mit der Familie/mit Freunden verbringen muss. 0.88
Time/Zeit
6 I spend a lot of time exercising./Ich verbringe viel Zeit mit Sport. 0.92
13 I spend most of my free time exercising./Ich verbringe den größten Teil meiner Freizeit mit Sport. 0.96
20 A great deal of my time is spent exercising./Ich verbringe den Großteil meiner Zeit mit Sport. 0.95
Intention effects/Intention
7 I exercise longer than I intend./Ich treibe länger Sport als ich mir vorgenommen habe. 0.94
14 I exercise longer than I expect./Ich treibe länger Sport als erwartet. 0.96
21 I exercise longer than I plan./Ich treibe länger Sport als geplant. 0.96
Note. n = 1,611.

Normative Data and Internal Consistency

Means, standard deviations, and Cronbach's αs of the EDS-G scales can be found in Table 3.

Tolerance/Toleranzentwicklung
3 I continually increase my exercise intensity to 0.92 Table 3. Means, standard deviations, and Cronbachs for
achieve the desired effects/benefits./Meine sportli- the subscales of the Exercise Dependence Scale
chen Aktivitten werden immer intensiver, damit (EDS-G) in the population-based sample
ich die gewnschten Effekte/Vorteile erziele.
Mean SD Cron-
10 I continually increase my exercise frequency to 0.92
bachs
achieve the desired effects/benefits./Ich treibe im-
mer hufiger Sport, um die gewnschten Effek- EDS Total 32.78 17.29 0.96
te/Vorteile zu erzielen. Withdrawal effects 4.72 2.76 0.81
17 I continually increase my exercise duration to 0.94 Continuance 4.33 2.45 0.83
achieve the desired effects/benefits./Ich verlngere
Tolerance 4.97 3.18 0.89
stndig die Dauer meiner sportlichen Aktivitten,
um die gewnschten Effekte/Vorteile zu erzielen. Lack of control 4.85 3.18 0.88
Loss of control/Kontrollverlust Reduction in other activities 4.09 2.20 0.83
4 I am unable to reduce how long I exercise./Es ge- 0.92 Time 5.12 3.31 0.91
lingt mir nicht, die Dauer meiner sportlichen Aktivi- Intention effects 4.71 3.01 0.93
tten zu verkrzen.
Note. n = 1,611.
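
As a reminder of what the internal-consistency coefficients in Table 3 express, Cronbach's α for a scale of k items is k/(k - 1) multiplied by (1 - sum of the item variances / variance of the sum score). The following Python sketch illustrates that formula only; the function name and the toy ratings are hypothetical and are not the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the sum score across respondents.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings of five respondents on a three-item subscale.
toy = [
    [1, 2, 1],
    [2, 2, 3],
    [4, 3, 4],
    [5, 5, 4],
    [6, 5, 6],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Values close to 1 arise when the items rise and fall together across respondents, as they do in the toy data.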
Table 4. Comparison of Exercise Dependence Scale (EDS-G) scores across samples

Groups: (1) Population-based sample, N = 1,611; (2) Clients of fitness centers, N = 129; (3) Sport students, N = 85; (4) Medical students, N = 129. Group columns show Mean (SD).

Variable                      (1)             (2)             (3)             (4)             Overall group comparison   Pairwise group comparisons
                                                                                              ANOVA                      Tukey-B post hoc test
Age                           40.83 (12.46)   25.51 (8.37)    22.46 (1.94)    23.25 (2.82)    F(3, 1949) = 211.57*       1 > 2, 4, 3
EDS subscales                                                                                 ANCOVA(a,b)                ANCOVA(a,b)
EDS-G Total                   32.78 (17.29)   58.17 (17.30)   65.82 (13.70)   43.76 (13.77)   F(3, 1946) = 106.12*       3 > 2 > 4 > 1
Withdrawal effects            4.72 (2.76)     7.08 (3.10)     6.96 (2.27)     7.12 (2.99)     F(3, 1947) = 32.40*        4, 2, 3 > 1
Continuance                   4.33 (2.45)     7.41 (3.84)     8.73 (3.33)     5.23 (2.49)     F(3, 1947) = 84.98*        3 > 2 > 4 > 1
Tolerance                     4.97 (3.18)     10.23 (4.01)    10.51 (3.70)    8.01 (3.50)     F(3, 1947) = 105.14*       3, 2 > 4 > 1
Lack of control               4.85 (3.18)     7.95 (4.16)     8.48 (4.53)     5.89 (3.43)     F(3, 1946) = 37.31*        3, 2 > 4, 1
Reduction in other activities 4.09 (2.20)     6.08 (2.49)     7.53 (2.63)     4.83 (1.99)     F(3, 1947) = 47.75*        3 > 2, 4, 1
Time                          5.12 (3.31)     11.08 (4.06)    14.00 (2.76)    6.83 (2.85)     F(3, 1947) = 189.17*       3 > 2 > 4, 1
Intention effects             4.71 (3.01)     8.33 (3.91)     9.61 (3.57)     6.11 (2.87)     F(3, 1947) = 61.78*        3, 2 > 4 > 1

Notes. Means presented in the table are unadjusted means. (a) Adjusted for age and gender. (b) Bonferroni-adjusted p = .05/8 = .006. *p < .001.
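
The Bonferroni threshold in the note to Table 4 comes from dividing the nominal significance level of .05 by the number of tests, which here is presumably the eight EDS-G comparisons (total score plus seven subscales). A minimal Python sketch of that arithmetic follows; the variable and function names are illustrative only.

```python
# Bonferroni adjustment as stated in the note to Table 4:
# the nominal alpha is divided by the number of tests.
nominal_alpha = 0.05
n_tests = 8  # assumed: EDS-G total score plus seven subscales
adjusted_alpha = nominal_alpha / n_tests
print(f"{adjusted_alpha:.5f}")  # .00625, which the note reports as .006


def is_significant(p_value, threshold=adjusted_alpha):
    """True if a p value falls below the Bonferroni-adjusted threshold."""
    return p_value < threshold
```

Under this threshold a result such as p = .026 would not count as significant, whereas every test marked *p < .001 in the table would.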
Cutoff Value for Being at Risk of Exercise Dependence

The ROC analysis showed that a total EDS-G score above 77 was indicative of exercise dependence risk with a sensitivity of 1 and a specificity of 0.97. Based on this cutoff value, 56 persons were classified as at risk. By using the scoring guidelines of Hausenblas and Symons Downs (2002b), 15 of them (0.9% of the total sample) were also classified as being at risk for suffering from exercise dependence, and 41 persons were symptomatic. As the EDS-21 is used as a screening instrument, we decided to include those who were symptomatic in the ROC analysis in order to avoid underdiagnosis based on a heightened cutoff.
The analysis of congruence between at-risk diagnoses for exercise dependence based on the aforementioned ROC analysis and the algorithm proposed by Hausenblas and Symons Downs (2002b) revealed kappa coefficients of 0.73 for the GYM-group, 0.69 for sport students, and 0.66 for medical students.
By using a cutoff EDS-G score above 77, the point prevalence of exercise dependence at risk in the present German representative sample was estimated to be 3.5%. Although men reported significantly higher EDS-G scores than women (M = 35.32, SD = 18.92 and M = 30.57, SD = 15.41, F(1, 1609) = 30.87, p < .001), gender differences in prevalence estimates for exercise dependence did not reach statistical significance (4.4% and 2.7%, χ²(1) = 3.64, p = .056, respectively). The highest estimated point prevalence was found among sport students (16.5%) and in the GYM-group (12.4%). In the MD-group the prevalence rate was substantially lower (1.6%).

Validity

In the population-based sample, the EDS-G total score was positively, albeit only slightly, correlated with the EDE-Q total score (r = 0.20, p < .001), and the EDE-Q subscales restraint (r = 0.21, p < .001), eating concern (r = 0.22, p < .001), shape concern (r = 0.16, p < .001), and weight concern (r = 0.17, p < .001).
The mean number of days per week with intensive physical activity among medical students was 2.81 days (SD = 1.76, range 0-7). This variable was significantly positively related to EDS-G scores (r = 0.42, p < .01).
As can be seen in Table 4, we found significant age differences between groups. Post hoc comparisons showed that the population-based sample was significantly older than the three other groups. The samples also differed with respect to sex (χ²(3) = 54.89, p < .001). The rate of female participants varied across groups with 75% women in the MD-group, 54% in the population-based sample, 41% in the SP-group, and 31% in the GYM-group. The gender differences between medical students and the population-based sample (χ²(1) = 22.02, p < .001) and between the population-based sample and sport students (χ²(1) = 4.98, p = .026) were significant. In terms of EDS-G scores, analyses revealed significant group differences controlling for age and sex, with sport students reporting the highest mean scores followed by clients of fitness centers, medical students, and the population-based sample.

Discussion

In a representative, population-based German sample and in samples with varying degrees of exercise, the study results offer support for the EDS-G as a reliable and valid measure of exercise dependence.
The results of the CFA confirm the 7-factor structure of the EDS suggested by Hausenblas and Symons Downs (2002b) in our German population-based sample. This finding is in line with the results of the French (Allegre &
Therme, 2008) and Spanish (Sicilia & González-Cutre, 2011) validation studies. Like the Spanish study, our findings also support a higher-order factor model. Compared with the Spanish study, the correlations between the factors in our German representative sample were high (present study: 0.79 to 0.94, Spanish study: 0.32 to 0.84). Therefore, one could argue that the EDS-G is a one-dimensional scale, which is why we suggest using the total score for the assessment of exercise dependence symptoms.
The total EDS-G scale showed an excellent internal reliability in line with previous studies that used the original EDS-21 (e.g., Symons Downs et al., 2004). Furthermore, the positive relationship between the total EDS-G score and the physical activity among medical students offers support for the validity of the questionnaire. The scores of the eating disorder symptoms questionnaire (EDE-Q) were positively, albeit moderately, related to EDS-G scores. In accordance with previous research concerning the link between excessive exercising and eating disorder symptoms, we had expected a stronger correlation between the two scales. Additional analyses by sex, age, and BMI groups (not reported here but available by request) did not yield any explanations for the modest correlation.
The between-group comparison of the EDS-G scores suggests that the scale may differentiate between persons who regularly exercise and those who exercise less. Unfortunately, except for the MD-group, information about physical activity is lacking, which limits the informational value of this finding.
Based on a ROC analysis in the German population-based sample we proposed a cutoff on the total EDS-G score with high sensitivity and specificity in order to distinguish people who are at risk for exercise dependence from nondependent persons. Our findings indicated a substantial agreement between this EDS-G total score threshold and the cutoff value calculated in accordance with the scoring guidelines of Hausenblas and Symons Downs (2002b). Thus, the use of the EDS-G cutoff score indeed may represent an alternative approach to identify individuals who are at risk for exercise dependence. By using this cutoff score, we estimated the point prevalence of exercise dependence in the German population-based sample to be 3.5%. The prevalence estimates in the general German population were close to those in the group of medical students. Both prevalence rates of exercise dependence were in accordance with results of previous studies among U.S. students participating in fitness classes (Symons Downs et al., 2004). As hypothesized, the highest percentage of at-risk individuals was found among sport students followed by clients of fitness centers. The rates, however, were much lower than those indicated in a French study (Lejoyeux, Avril, Richoux, Embouazza, & Nivoli, 2008) that investigated 300 clients of a Parisian fitness studio by utilizing a self-designed questionnaire. Lejoyeux et al. (2008) reported that 42% of clients of the fitness studio were exercise dependent. This discrepancy could be due to the different screening instruments used.
Besides the strengths of our study, some limitations need to be kept in mind when interpreting its results. As mentioned above, a substantial portion of sport students was at risk for exercise dependence according to their EDS-G scores. It is important to note, however, that the EDS-G does not facilitate a diagnosis of exercise dependence because the EDS-G is solely a screening instrument and not a diagnostic tool. Thus, the data cannot be used to determine whether or not these at-risk participants clinically represent an exercise-dependent group. Furthermore, the current findings are limited by the fact that participants were not assessed with an additional clinical interview to confirm the classification. Therefore, we are conducting another study to investigate the consistency between EDS-G data and a diagnostic interview for exercise dependence. Another shortcoming of the present study is the absence of test-retest reliability evidence for the EDS-G. The present study also lacks detailed information about the amount and forms of exercise (e.g., running, triathlon, weightlifting, cycling, swimming, team sports) displayed by sport students and clients of fitness centers. This information could give insight as to whether athletes who engage in specific sport forms would be more prone to exercise compulsively.
The strengths of the present study include the use of a large population-based representative sample and the investigation of additional samples with varying degrees of exercising. Future research should address whether exercise dependence overlaps not only with eating disorder symptoms, but also with other excessive behaviors such as compulsive buying or excessive internet use. For example, Lejoyeux et al. (2008) found that exercise dependence was strongly associated with fitness-studio clients' compulsive buying. Both exercise dependence and compulsive buying may be considered socially accepted forms of excessive behaviors that might explain the link between both diagnoses. The investigation of possible similar underlying factors (e.g., personality features) of different behavioral addictions seems to be a topic worth pursuing.
In conclusion, the current study provides psychometric support for the German version of the EDS-21. The scale developed using U.S. subjects can be transferred successfully to the adult German population. Our findings indicate that the total score reflects a comprehensive feature of exercise dependence, and that the EDS-G can be used for screening purposes in a German context as well as for further research improving the knowledge about this behavioral excess. In summary, our results demonstrate that the EDS-G is a brief, psychometrically sound, and valid measure for the assessment of exercise dependence symptoms.

Acknowledgments

We kindly thank Heather Hausenblas for her support during the course of this project.

References

Allegre, B., & Therme, P. (2008). Confirmative study of a French version of the Exercise Dependence Scale-revised with a French population. Encephale, 2008, 490-495.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Bamber, D. J., Cockerill, I. M., Rodgers, S., & Carroll, D. (2003). Diagnostic criteria for exercise dependence in women. British Journal of Sports Medicine, 37, 393-400.
Bentler, P. M. (2007). On tests and indices for evaluating structural models. Personality and Individual Differences, 42, 825-829.
Bewell-Weiss, C. V., & Carter, J. C. (2010). Predictors of excessive exercise in anorexia nervosa. Comprehensive Psychiatry, 51, 566-571.
Brislin, R. W. (1970). Back translation for cross-sectional research. Journal of Cross-Cultural Psychology, 1, 185-216.
Cook, B., & Hausenblas, H. (2008). The role of exercise dependence for the relationship between exercise behavior and eating pathology: Mediator or moderator? Journal of Health Psychology, 13, 495-502.
Edmunds, J., Ntoumanis, N., & Duda, J. L. (2006). Examining exercise dependence symptomatology from a self-determination perspective. Journal of Health Psychology, 11, 887-902.
Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9, 466-491.
Hausenblas, H. A., & Symons Downs, D. (2002a). Exercise dependence: A systematic review. Psychology of Sport and Exercise, 3, 89-123.
Hausenblas, H. A., & Symons Downs, D. (2002b). Exercise Dependence Scale-21 manual. Retrieved from http://www.personal.psu.edu/dsd11/EDS/EDS21Manual.pdf
Hausenblas, H. A., & Symons Downs, D. (2002c). How much is too much? The development and validation of the exercise dependence scale. Psychology and Health, 17, 387-404.
Hilbert, A., & Tuschen-Caffier, B. (2006). Eating Disorder Examination-Questionnaire. Deutsche Übersetzung [Eating Disorder Examination-Questionnaire. German translation]. Münster: Verlag für Psychotherapie.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
Jöreskog, K., & Sörbom, D. (2004). LISREL 8.71 [computer program]. Lincolnwood, IL: Scientific Software International.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.
Lejoyeux, M., Avril, M., Richoux, C., Embouazza, H., & Nivoli, F. (2008). Prevalence of exercise dependence and other behavioral addictions among clients of a Parisian fitness room. Comprehensive Psychiatry, 49, 353-358.
Sicilia, Á., & González-Cutre, D. (2011). Dependence and physical exercise: Spanish validation of the Exercise Dependence Scale-Revised (EDS-R). The Spanish Journal of Psychology, 14, 421-431.
Symons Downs, D., Hausenblas, H. A., & Nigg, C. R. (2004). Factorial validity and psychometric examination of the Exercise Dependence Scale-Revised. Measurement in Physical Education and Exercise Science, 8, 183-201.

Published online: July 20, 2012

Astrid Müller
Department of Psychosomatic Medicine and Psychotherapy
Hannover Medical School
Carl-Neuberg-Straße 1
30625 Hannover
Germany
Tel. +49 511 532-9179
Fax +49 511 532-3190
E-mail mueller.astrid@mh-hannover.de

K. Michalczyk et al.: Age Differences and Measurement Invariance of Working Memory
European Journal of Psychological Assessment 2013; Vol. 29(3):220-229
© 2012 Hogrefe Publishing
Original Article
Age Differences and Measurement Invariance of Working Memory in 5- to 12-Year-Old Children
K. Michalczyk¹,², N. Malstädt¹,², M. Worgt¹,², T. Könen¹,², and M. Hasselhorn¹,²
¹ German Institute for International Educational Research (DIPF), Frankfurt a. M., Germany, ² Center for Individual Development and Adaptive Education of Children at Risk (IDeA), Frankfurt a. M., Germany
Abstract. In this study, we investigated working memory structure and measurement invariance of working memory in children between
5 and 12 years. In a sample of 1,669 children, we administered a battery of 12 computer-based working memory subtests. We found the
same tripartite structure of working memory with a phonological loop, a visual-spatial sketchpad, and a central executive in the age
groups 5-6 years, 7-9 years, and 10-12 years; interrelations of the latent factors were invariant across the age groups. Only the relationships
between the phonological loop and the central executive were weaker in the 5- to 6-year-olds than in the older age groups. The
results corroborate the tripartite working memory model developed by Baddeley (1986) and show that the functional relationships between
working memory components are mostly invariant through childhood. In addition, the findings indicate that future research should focus
on extending the model by a fourth factor representing inhibition.
Keywords: working memory, structure, assessment, children, change, measurement invariance
DOI: 10.1027/1015-5759/a000149

Introduction

In 1986 Baddeley presented one of the most prominent approaches to working memory research (e.g., Conway, Jarrold, Kane, Miyake, & Towse, 2007; Jarrold & Towse, 2006). His model postulates a central executive and two subsystems: the phonological loop and the visual-spatial sketchpad. The central executive works as a flexible master unit that controls and allocates resources and monitors information processing across informational domains (see Baddeley, Emslie, Kolodny, & Duncan, 1998; Baddeley & Hitch, 1974). The phonological loop stores verbal information temporarily, whereas the visual-spatial sketchpad is specialized in processing visual and spatial information. In adult populations, the components of this working memory model have been validated by a vast body of research including experimental and neuropsychological studies (see Baddeley & Logie, 1999) as well as neuroimaging and neuropsychological studies (e.g., Henson, 2001; Vallar & Papagno, 2002). Evidence for this model is also provided by the individual differences approach, which is characterized by the use of confirmatory factor analysis (for a review, see Jarrold & Towse, 2006).

Evidence for the Three-Factor Model in Children

The number of studies using the individual differences approach to examine samples of children is by far smaller than the number studying adults. Nevertheless, first findings seem to suggest a multicomponent model that includes the phonological loop, the visual-spatial sketchpad, and the central executive as functionally related but separable factors. For example, findings by Jarvis and Gathercole (2003) suggest that the visual-spatial sketchpad and the phonological loop represent independent entities in 11- and 14-year-olds and in 5- to 8-year-olds (Pickering, Gathercole, & Peaker, 1998). In 4- to 5-year-olds, Roebers and Zoelch (2005) found a clear dissociation of the phonological and visual-spatial subsystem. In the case of 4-year-olds, however, the working memory tasks of the phonological loop and the visual-spatial sketchpad commonly used to assess working memory subcapacities had to be adapted. As the authors point out, assessing central executive working memory capacity before the age of 5 requires the further development of age-adequate tasks. In confirmatory factor analyses, this leads to poor model fit and low factor loadings. However, with regard to the assessment of the three subsystems, some findings suggest that a tripartite working memory structure (the phonological loop, the visual-spatial sketchpad, and the central executive) exists in young children. For example, Gathercole, Pickering, Ambridge, and Wearing (2004) reported data corroborating the assumption of a tripartite structure in children between 6 and 15 years. They point out that this structure might be found in younger age groups as well. Subsequently, Alloway, Gathercole, and Pickering (2006) indeed found that the tripartite model fit the working memory functioning of 4- to 6-year-olds well. Furthermore, Alloway, Gathercole, Willis, and Adams (2004, see also Alloway et al., 2005) suggested a model for 4- to 6-year-olds in which the central executive and the phonological loop represent entities that are separable from other cognitive abilities (e.g., nonverbal intelligence, phonological awareness).

Relationship of the Visual-Spatial Sketchpad and the Central Executive

However, not all findings indicate such clear separability of the different working memory components. For instance, in 6- and 7-year-olds the visual-spatial sketchpad was not clearly dissociable from the central executive (Gathercole & Pickering, 2000). Alloway et al. (2006) found that the correlation of the visual-spatial component with the central executive was higher in 4- to 6-year-olds than in older age groups (i.e., 7-8 and 9-11 years). Nevertheless, Gathercole et al. (2004) argue in favor of the separability of these two constructs in the course of development. This raises the question whether the visual-spatial component and the central executive are separate functional working memory structures in young children.

Functional Differentiation of the Central Executive

Earlier conceptions of the Baddeley model (e.g., Baddeley, 1986; Baddeley & Hitch, 1974) did not describe the functions of the central executive precisely. However, in later conceptions (see Baddeley, 1986, 2003) the role of the central executive was specified more clearly by postulating
1. processes of focusing attention (switching),
2. coordination of the subsystems,
3. controlling of encoding and decoding strategies, and
4. retrieval of information from long-term memory.

There are, however, alternative or complementary conceptions of central executive functioning (e.g., Miyake et al., 2000; see also St Clair-Thompson & Gathercole, 2006). Miyake et al. (2000) presented a theoretical framework in which three executive functions (mental set shifting, updating, and inhibition) play different roles in executive tasks. Miyake et al. (2000) showed that these three factors represent related, but separable constructs. It is important to note, however, that their theoretical framework was derived from studying adult populations. Nevertheless, in children, some recent findings suggest that at least inhibition may represent a factor that can be dissociated from the central executive: For instance, findings by St Clair-Thompson and Gathercole (2006) suggested that at least two executive factors exist in 11- to 12-year-old children, one being associated with updating functions and the other with inhibition. Furthermore, Lambek and Shevlin (2011) reported that, in children aged 7-12 years and 13-16 years, verbal working memory, visual-spatial memory, and inhibition represent related but separable constructs, as proposed by Miyake et al. (2000). However, other studies failed to identify inhibition as a factor separable from working memory and shifting (e.g., Huizinga, Dolan, & van der Molen, 2006; Van der Sluis, De Jong, & Van der Leij, 2007). A recent study by Shing, Lindenberger, Diamond, and Davidson (2010) indicated that, although inhibition may not be a factor separable from memory maintenance in children aged 4-7 years or 7-9.5 years, in children aged 9.5-14.5 years it is. Therefore, this finding might indicate that the differentiation of inhibition emerges in the course of development. To date, the organization of central executive processes in young children is far from being fully understood. In particular, it remains unclear whether inhibition represents a separate component of working memory in children, and if so, at what age this can be observed.

Developmental Change in the Structure of the Three-Factor Model

To our knowledge, only a few studies have as yet assessed the change in the structure of working memory in children. Some first findings, however, provided support for the idea of a three-factor model as described by Baddeley (1986), which is mostly invariant across different age groups from 6 to 15 years (Gathercole et al., 2004) or even in younger children aged 4-11 years (Alloway et al., 2006). Both studies found high latent factor correlations of the central executive with the phonological loop and the visual-spatial sketchpad, whereas the correlations between the subsystems were lower, albeit of similar strength throughout development. These findings might be taken as empirical indications of the adequacy of the assumption of an invariant tripartite working memory structure in the age range 4 to 15 years. However, more evidence is certainly required in order to corroborate the current findings, especially with regard to children younger than 6 years.
In sum, the structure of working memory functioning remains to be further validated in children. The number of working memory components in children is unclear, as is the extent to which they are correlated. Moreover, whether the relationship of working memory components, that is, their functional interdependence, changes over childhood requires further research. Accordingly, we decided to address the following assumptions in more detail:
1. As outlined above, one should expect to find Baddeley's tripartite structure of working memory (1986) in the age range between 5 and 12 years (see Alloway et al., 2004, 2005, 2006; Roebers & Zoelch, 2005). A hierarchical structure across all age groups should be found within this framework. In particular, the phonological loop is likely to be only moderately related to the visual-spatial
sketchpad. The central executive, in contrast, should be Procedure

more strongly related to both subsystems and be separa-
ble from the visual-spatial component (e.g., Alloway et Subjects completed 12 tests from the German Working
al., 2006; Baddeley, 1986; Gathercole et al., 2004). Memory Test Battery for Children, 512 years (AGTB
2. Although alternative conceptions of working memory in 512; Hasselhorn et al., 2012). They were tested individ-
children exist (see Miyake et al., 2000), we assume Bad- ually in a quiet area of the nursery school or the school
deleys model to be superior to a single-factor model. with a break after six tasks, which resulted in approxi-
Nevertheless, within the theoretical framework pro- mately 2 35 minutes of testing.
posed by Miyake et al. (2000), it is conceivable that ad-
ditional dissociable central executive functions can be
differentiated (e.g., Lambek & Shevlin, 2011; St Clair-
Working Memory Measures1
Thompson & Gathercole, 2006). Accordingly, a second
aim of the present study was to contribute to the inves- Phonological Loop
tigation of whether it is more appropriate to separate in-
hibition from other central executive functions. Four of the working memory subtests assessed the phono-
logical loops efficiency. In the word span forward
3. A third aim of our study is to explore whether both the
(one/three syllables) and the digit span task the child was
structure and the functional interrelations of the phono-
required to repeat one/three syllable words or digits in the
logical loop, the visual-spatial sketchpad, and the central
presented order. In the nonword repetition task, the child
executive are invariant throughout the course of chil-
was required to correctly recall a word-like sound that did
drens development (e.g., Alloway et al., 2006; Gather-
not occur in the German language and has no meaning in
cole et al., 2004). We expect the relation of the central
German either (e.g., limparett).
executive and the phonological loop to be weaker in
children younger than 7 years. This assumption is based
on the finding that spontaneous subvocal rehearsal with-
Visual-Spatial Working Memory
in the phonological loop, which is moderated by the cen-
tral executive, does not reliably occur before the age of 7 (see Gathercole & Hitch, 1993).

Method

Sample/Participants

Data were selected from the norming sample of the Working Memory Test Battery for Children (AGTB 5–12; Hasselhorn et al., 2012). 1,669 children (856 boys and 813 girls) from nursery schools, primary and secondary schools in Frankfurt/Main, Göttingen, Eichstätt and surrounding areas in Germany participated in this study. The children were divided into three age groups: 5–6 years (n = 284), 7–9 years (n = 690), and 10–12 years (n = 695). These age ranges were chosen because they mark key stages in young children's cognitive development related to working memory functions. First, rehearsal strategies associated with the central executive do not reliably occur automatically before the age of 7 (e.g., Gathercole & Hitch, 1993). Second, abstract self-reflection emerges at the age of approximately 10 years (Hasselhorn, 2008; Piaget, 1971), at which age children begin to think about their knowledge and their learning processes. This, in turn, enhances the development of strategies which relate to central executive functioning.

Two tasks addressed the visual-spatial working memory system. In the matrix task, the child was required to recall black squares in a 4 × 4 matrix by pointing at a touchscreen. In the Corsi-block task, a yellow smiley appeared sequentially over nine empty blocks. The child was then required to indicate on the touchscreen the serial order of the blocks in which the smiley had previously appeared.

Central Executive

Six tasks assessed the efficiency of the central executive and/or inhibition. In the digit span backwards task, the child was asked to repeat the given sequence of digits in reverse order. In color span backwards, the child was required to identify the colored dots presented in reverse order. In the object span task, the child was required to say whether the object was edible or nonedible. At the end of the tasks, the child was asked to recall the items in the presented order. In the counting span task, the child was presented a series of pictures depicting circles and asked to count the number of circles shown as well as to say the respective number aloud (e.g., "four"). At the end of the task, the child was required to recall the numbers she or he had said aloud (e.g., "four, nine"). In the go/nogo task, the child was asked to press a key whenever she or he saw a specified item (go trial), for example, a yellow balloon. In a nogo trial, the item was similar but distinct (e.g., a red balloon). In the Stroop task, a stylized picture of a man
1 A more detailed description of the working memory tasks is available at http://www.idea-frankfurt.eu/journal/ejpa/appendix_M_et_al
or woman was presented on the upper half of the touchscreen and two other stylized pictures of a man and woman appeared on the lower half. Simultaneously, the child was given the verbal cue of the word "man" or "woman." The child was asked to react to the congruent visual stimulus only (man–man; woman–woman) by tapping onto the congruent figure in the lower half of the touchscreen while ignoring the verbal cue.

Testing Details and Scoring

Every child was required to complete ten trials of each of the adaptive tests (i.e., word span tasks, digit span task, matrix, Corsi-block, digit span backwards, color span backwards, counting span, object span). Within each trial the length of the list varied, depending on the child's performance. The mean number of correctly repeated items over these ten trials was then scored. The number of trials in the nonword repetition task was 24, and the number of correctly repeated trials was scored. In the go/nogo task, the number of trials was 24. Because signal detection theory applies to the go/nogo task, we calculated the ability to discriminate (d′) over 24 trials according to the procedure described by Stanislaw and Todorov (1999). The Stroop task encompassed 24 trials and the number of correctly repeated congruent trials was scored. Table 1 shows the retest reliabilities for children aged 5–6 years, 7–9 years, and 10–12 years (AGTB 5–12; Hasselhorn et al., 2012).

Table 1. Retest reliabilities for the age groups 5–6, 7–9, and 10–12

                                       Retest reliability (a)
Measure                                5–6 (b)   7–9 (c)   10–12 (d)
Word span forward (one syllable)       .54       .69       .77
Word span forward (three syllables)    .51       .68       .56
Digit span forward                     .62       .72       .83
Nonword repetition                     .65       .80       .85
Matrix span                            .35       .63       .34
Corsi-block                            .61       .62       .54
Digit span backwards                   .42       .70       .66
Color span backwards                   .37       .57       .66
Object span                            .53       .53       .38
Counting span                          .68       .60       .59
Go/no-go                               .46       .37       .40
Stroop                                 .66       .74       .62
Subsystem
Phonological loop                      .82       .86       .90
Visual-spatial sketchpad               .62       .69       .57
Central executive (e)                  .74       .83       .83

Notes. Data were drawn from the AGTB 5–12 norming sample (Hasselhorn et al., 2012).
(a) The test-retest interval was 12 weeks.
(b) n = 66; mean age = 6;1; SD = 8 months, range = 5;0 to 6;11
(c) n = 125; mean age = 8;4; SD = 9 months, range = 7;0 to 9;11
(d) n = 69; mean age = 11;5; SD = 11 months, range = 10;0 to 12;11
(e) In the AGTB 5–12 norming sample (Hasselhorn et al., 2012) retest reliabilities are shown for the central executive including the go/nogo task and the Stroop task.

Statistical Modeling Procedures

In order to model the structure of working memory, confirmatory factor analysis was carried out with Mplus 6.1 (Muthén & Muthén, 1998–2010) using maximum likelihood estimation with robust standard errors (MLR). Model fit was assessed by the following fit indices:
1. The χ² statistic, with a nonsignificant value indicating an exact model fit. However, the χ² statistic is sensitive to sample size and may therefore become significant in large samples despite a good model fit (Kline, 1998),
2. the comparative fit index (CFI; Bentler, 1990), where a value equal to or greater than .90 represents an adequate fit,
3. the Bayesian information criterion (BIC),
4. the standardized root mean square residual (SRMR), where a value equal to or lower than .10 indicates a good model fit, and
5. the root mean squared error of approximation (RMSEA; Hu & Bentler, 1999), adjusted for multiple groups.

The RMSEA is less affected by sample size than other fit indices, and values equal to or lower than .08 are acceptable, whereas values equal to or lower than .05 represent a good fit (McDonald & Ho, 2002). In addition, confidence intervals can be calculated even if data deviate from normal distribution. Changes in model fit in nested model comparisons were assessed using a Satorra-Bentler scaled χ²-difference test (SB-Δχ²; Satorra & Bentler, 2001), where a nonsignificant value indicates that a nested model can describe the data just as well as the comparison model. Following recommendations by Cheung and Rensvold (2002), measurement invariance was additionally assessed with the ΔCFI, where values smaller than or equal to .01 indicate no difference between the restricted and the unrestricted model, so that the null hypothesis of invariance should not be rejected. This indicator is independent of model complexity and sample size. It does not correlate with overall fit measures.

Descriptive Statistics

The mean scores and standard deviations of each working memory measure are shown as a function of age group in Table 2.
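As a rough illustration of the RMSEA cutoffs named under Statistical Modeling Procedures, the point estimate can be approximated from a model's χ², its degrees of freedom, and the sample size. The sketch below uses the common single-group formula sqrt(max(χ² − df, 0) / (df · (N − 1))). The function names are hypothetical, and values produced by Mplus under MLR estimation, or with the multiple-group adjustment mentioned above, may differ slightly from this approximation.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Single-group RMSEA point estimate (textbook formula, assumed here)."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def describe_fit(value: float) -> str:
    """Apply the cutoffs quoted in the text: <= .05 good, <= .08 acceptable."""
    if value <= 0.05:
        return "good"
    if value <= 0.08:
        return "acceptable"
    return "inadequate"


if __name__ == "__main__":
    # Illustrative inputs: chi-square = 64.800 with df = 51 and n = 284.
    estimate = rmsea(64.800, 51, 284)
    print(f"RMSEA = {estimate:.3f} ({describe_fit(estimate)})")
```

With the illustrative inputs, which correspond to the fit reported for the three-factor model in the youngest age group, the approximation rounds to .031.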
Table 2. Means and standard deviations as functions of age group

                                            Age (in years)
Measure                                     5–6          7–9          10–12
                                            (n = 284)    (n = 690)    (n = 695)
Word span forward (one syllable)      M     3.04         3.56         4.13
                                      SD    0.58         0.68         0.73
Word span forward (three syllables)   M     2.62         2.91         3.20
                                      SD    0.40         0.44         0.50
Digit span forward                    M     3.22         4.08         4.83
                                      SD    0.72         0.75         0.78
Nonword repetition                    M     14.69*       14.5*        13.52*
                                      SD    4.44         4.38         4.39
Matrix span                           M     2.76         4.18         5.57
                                      SD    0.78         1.24         1.37
Corsi-block                           M     2.90         3.88         4.71
                                      SD    0.81         0.82         0.79
Digit span backward                   M     2.13         2.99         3.69
                                      SD    0.62         0.67         0.75
Color span backward                   M     1.93         2.71         3.50
                                      SD    0.60         0.79         0.86
Object span                           M     2.06         2.84         3.37
                                      SD    0.59         0.72         0.76
Counting span                         M     1.80         2.90         3.63
                                      SD    0.63         0.81         0.85
Go/no-go                              M     1.22         1.81         2.34
                                      SD    0.84         0.78         0.70
Stroop                                M     49.99        49.99        50.00
                                      SD    9.94         9.96         9.97

Note. *The different age groups received wordlists with varying degrees of difficulty, which explains the stable level of points received in all age groups.

Results

In all three age groups, the distributions of the raw data departed from univariate normality in both skewness and kurtosis. Multivariate normality with respect to skewness and kurtosis was likewise not given. After the raw data had been transformed to age-specific t-values (see Footnote 2), the univariate tests no longer indicated violations of the normality assumption. For the t-values, Mardia's test indicated multivariate normality with regard to multivariate skewness in the age group 7 to 9 years and with regard to multivariate kurtosis in all three age groups. However, multivariate skewness indicated that multivariate normality was not given in the age groups 5–6 years and 10–12 years. Therefore, while we continued using t-values, we estimated all model parameters, standard errors, and fit indices using the robust maximum likelihood rescaling procedure, which is appropriate when data do not fulfill the criterion of normal distribution.

Structural Equation Modeling

The Tripartite Model

First, we addressed the question of working memory structure in the different age groups. In accordance with our assumptions, we used the tripartite model as a starting point for our analyses. We found that the same working memory model provided a good fit for the data in all three age groups; Figure 1 shows the fit indices, residual variances, factor loadings, and latent factor correlations for the three-factor model for the age groups 5–6 years, 7–9 years, and 10–12 years. It is important to note that the go/nogo task and the Stroop task loaded poorly on the central executive factor in all age groups. However, the tripartite structure of working memory turned out to be adequate in all three age groups. In addition, across all three age groups, correlations of the central executive with the two subsystems were higher than the interrelations between the two subsystems.

Alternative Models

Single-Factor Model

We also tested a single-factor model for the three age groups. In the age group 5–6 years, the model fit was inadequate with χ²(54, n = 284) = 179.362, p < .01, CFI = .804; BIC = 24464; RMSEA = .090 (90% CI: .076–.105). In the age group from 7 to 9 years, the model fit was slightly better than in the group of younger children with χ²(54, n = 690) = 318.635, p < .01; CFI = .878; BIC = 58531; RMSEA =
2 The correlations of the age-range-specific t-values of the working memory tasks are available at http://www.idea-frankfurt.eu/journal/ejpa/appendix_M_et_al
Table 3. Comparison of fit in the nested models for whole sample

MI (a)   df    χ²        Compare with   Δdf   SB-Δχ²   p     CFI    ΔCFI   BIC      SRMR   RMSEA (90% CI)
COI      153   289.296                                       .974          141584   .035   .040 (.033–.047)
MEI      171   312.312   COI            18    22.92    .19   .973   .001   141473   .047   .039 (.032–.045)
SC       189   313.263   MEI            18    .64      .99   .976   .003   141340   .047   .034 (.027–.041)
SII      213   344.946   SC             24    31.68    .14   .975   .001   141194   .046   .033 (.027–.040)
EV       219   352.086   SII            6     6.84     .34   .975   .000   141156   .048   .033 (.027–.039)
ECOV_a   224   352.947   EV             5     .86      .97   .976   .001   141119   .048   .032 (.026–.038)
ECOV     225   368.498   ECOV_a         1     19.55    .00   .973   .003   141124   .052   .034 (.028–.040)

Notes. (a) MI-Model: COI = all parameters estimated freely (configural invariance); MEI = factor loadings set as equal across groups (metric invariance); SC = factor loadings and intercepts set as equal across groups (scalar invariance/strong measurement invariance); SII = factor loadings, intercepts, residuals set as equal across groups (strict invariance); EV = factor loadings, intercepts, residuals, and latent factor variances set as equal across groups; ECOV_a = factor loadings, intercepts, residuals, variances, and covariances set as equal across groups, with covariance of the phonological loop and the central executive in age group 5–6 estimated freely; ECOV = factor loadings, intercepts, residuals, and covariances set as equal across groups.
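The df column of Table 3 can be reproduced by counting observed moments and free parameters at each level of constraint. The sketch below is an illustration under stated assumptions, not the authors' model specification: it assumes 12 indicators on 3 factors in each of 3 groups, marker-variable identification (one loading per factor fixed), and latent means that are freed in the two non-reference groups once intercepts are constrained. All names in the code are hypothetical.

```python
GROUPS = 3       # age groups 5-6, 7-9, 10-12
INDICATORS = 12  # working memory tasks
FACTORS = 3      # phonological loop, visual-spatial sketchpad, central executive

# Free parameters per group in the configural model (assumed identification).
BLOCKS = {
    "loadings": INDICATORS - FACTORS,  # one marker loading per factor is fixed
    "intercepts": INDICATORS,
    "residuals": INDICATORS,
    "variances": FACTORS,
    "covariances": FACTORS * (FACTORS - 1) // 2,
}

# Parameter blocks held equal across groups in each model of the hierarchy.
MODELS = {
    "COI": (),
    "MEI": ("loadings",),
    "SC": ("loadings", "intercepts"),
    "SII": ("loadings", "intercepts", "residuals"),
    "EV": ("loadings", "intercepts", "residuals", "variances"),
    "ECOV": ("loadings", "intercepts", "residuals", "variances", "covariances"),
}


def observed_moments() -> int:
    """Variances, covariances, and means available across all groups."""
    per_group = INDICATORS * (INDICATORS + 1) // 2 + INDICATORS
    return GROUPS * per_group


def free_parameters(equal_blocks, extra_free: int = 0) -> int:
    """Count free parameters when the named blocks are equal across groups."""
    total = 0
    for name, size in BLOCKS.items():
        total += size if name in equal_blocks else size * GROUPS
    if "intercepts" in equal_blocks:
        # Assumption: latent means are freed in the non-reference groups.
        total += FACTORS * (GROUPS - 1)
    return total + extra_free


def degrees_of_freedom(model: str) -> int:
    if model == "ECOV_a":
        # ECOV with one factor covariance released in one group.
        return observed_moments() - free_parameters(MODELS["ECOV"], extra_free=1)
    return observed_moments() - free_parameters(MODELS[model])


if __name__ == "__main__":
    for name in ("COI", "MEI", "SC", "SII", "EV", "ECOV_a", "ECOV"):
        print(name, degrees_of_freedom(name))
```

Under these assumptions the counts come out as 153, 171, 189, 213, 219, 224, and 225, which agrees with the df column of the table.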
Figure 1. Three-factor model for the age groups 5–6, 7–9, and 10–12. Read from left to right, the digits represent error terms, factor loadings, and latent factor correlations, respectively, for the age groups 5–6/7–9/10–12. The model fit for the different age groups was as follows:
5–6: χ²(51, n = 284) = 64.800, p > .05; CFI = .978; BIC = 24664; RMSEA = .031 (90% CI: 0–.052);
7–9: χ²(51, n = 690) = 123.844, p < .01; CFI = .966; BIC = 58353; RMSEA = .045 (90% CI: .035–.056);
10–12: χ²(51, n = 695) = 101.471, p < .01; CFI = .980; BIC = 58728; RMSEA = .038 (90% CI: .027–.048).
.084 (90% CI: .075–.093). The same was true for the age group 10–12 years with χ²(54, n = 695) = 318.296, p < .01; CFI = .895; BIC = 58925; RMSEA = .084 (90% CI: .075–.093). The three-factor model showed better fit in all age groups, and furthermore, represented a more parsimonious model than the single-factor model.

A Four-Factor Model: Inhibition as a Factor Separable from the Central Executive

We additionally tested for a model in which inhibition processes were represented by an inhibition factor distinguishable from the central executive. This factor comprised the Stroop task and the go/nogo task. The model fit well in all three age groups:
5- to 6-year-olds: χ²(48, n = 284) = 57.311; p > .05; CFI = .985; BIC = 24373; RMSEA = .026 (90% CI: 0–.049);
7- to 9-year-olds: χ²(48, n = 690) = 110.311; p < .01; CFI = .971; BIC = 58358; RMSEA = .043 (90% CI: .033–.054);
10- to 12-year-olds: χ²(48, n = 690) = 90.189; p < .01; CFI = .983; BIC = 58736; RMSEA = .036 (90% CI: .024–.047).

In sum, the model that assumes a separate inhibition factor fitted the data just as well as the model in which inhibition was part of a common central executive factor. Notably, the factor loadings of the Stroop and go/nogo tasks on the latent inhibition factor were low across all age groups (Stroop: .280 to .396; go/nogo: .325 to .407).

Measurement Invariance of the Tripartite Model

In the next step of analysis, we addressed the issue of the age invariance of the functional interrelations between the different working memory components across the age groups 5–6 years, 7–9 years, and 10–12 years. To do so, we compared the latent factor correlations across the three age groups (e.g., is the correlation between the phonological loop and the central executive lower in the 5–6-year-olds than in the 7–9-year-olds?). However, this comparison requires establishing measurement invariance (MI), which involves testing for a hierarchy of configural invariance, metric invariance, scalar invariance, and strict factor invariance: Configural invariance implies that the same pattern of nonzero factor loadings fits all age groups, which constrains construct dimensionality to be equivalent. Metric invariance requires unstandardized factor loadings to be invariant across age groups. Scalar measurement invariance implies that, in addition, specific intercepts of the indicator variables are equal across groups. Strict factorial invariance implies that the factor loadings, intercepts, and the residual variances are equivalent across groups. In a subsequent step, variances can be set as equal across groups, enabling the restriction of factor covariance. When comparing this sequence of nested models to one another, no changes in model fit mean that the more parsimonious model is to be preferred. In the more parsimonious model, a higher number of parameters are fixed, which results in a higher number of degrees of freedom (df). Table 3 summarizes the results of these nested model comparisons.

Comparing the model of metric invariance (MEI) with that of configural invariance (COI) revealed no reliable difference between the two models. Therefore, we chose the more parsimonious model MEI. Then, the model assuming scalar invariance (SC) was compared with MEI, resulting in a preference for the more parsimonious model SC. In the next step, the model of strict invariance (SII) was preferred over SC and, in turn, the model of equal latent factor variances (EV) over SII, because the added restrictions did not reliably worsen fit. This permitted testing for change or invariance in latent factor covariances: In model ECOV_a all factor loadings, residuals, intercepts, latent factor variances, and latent factor covariances were set to be equal (except for the covariance of the phonological loop and the central executive in the age group 5–6). ECOV_a fitted the data no worse than EV and was therefore preferred as the more parsimonious model. In the last step of model comparisons, a model of equal covariances in all three age groups (ECOV) was tested against ECOV_a. The restrictions in ECOV were the same as in ECOV_a, except that all latent factor covariances were set as equal across age groups. ECOV fitted the data significantly worse, so ECOV_a was retained as the most parsimonious model to describe the data. In order to formally complete the analyses, we tested whether any other model, in which one latent factor covariance was estimated freely while the remaining latent factor covariances were set as equal across the three age groups, fitted the data better than EV. ECOV_a fit the data best.

Discussion

This study investigated the working memory structure and the development of relations in working memory components in 5- to 12-year-old children. Our results revealed a good fit of a model consisting of the phonological loop, the visual-spatial sketchpad, and the central executive in children aged 5–6 years, 7–9 years, and 10–12 years. This model showed better fit than a single-factor solution. A four-factor solution, with inhibition as an entity dissociable from the central executive, showed a fit comparable to that of the three-factor solution. However, the validity of the four-factor solution needs to be interpreted with caution. Furthermore, our results indicate measurement invariance of the three-factor model across the age groups. This finding suggests stable relations of the phonological loop, the visual-spatial sketchpad, and the central executive throughout development. Only in the 5- to 6-year-olds was the correlation of the phonological loop and the central executive lower than in the older age groups.
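The nested-model decisions reported in Table 3 follow a two-part rule: a constraint step is retained when the scaled difference test is nonsignificant and, following Cheung and Rensvold (2002), when the CFI changes by no more than .01. A minimal sketch of that rule, taking the reported SB-Δχ², Δdf, and ΔCFI values as inputs, might look as follows. The function names and the .05 significance level are assumptions for illustration, and the SB-Δχ² values themselves cannot be recomputed here because the scaling correction factors are not reported.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df.

    Uses the closed-form finite sums for even and odd df, so no external
    statistics package is required.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_j x^(j-1) / (2j-1)!!
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


def constraint_tenable(sb_delta_chi2, delta_df, delta_cfi,
                       alpha=0.05, cfi_cutoff=0.01):
    """Return (decision, p) for one step of the invariance hierarchy."""
    p = chi2_sf(sb_delta_chi2, delta_df)
    return (p > alpha and abs(delta_cfi) <= cfi_cutoff), p


if __name__ == "__main__":
    # (comparison, SB-delta-chi-square, delta-df, delta-CFI) as listed in Table 3
    steps = [
        ("MEI vs. COI", 22.92, 18, 0.001),
        ("SC vs. MEI", 0.64, 18, 0.003),
        ("SII vs. SC", 31.68, 24, 0.001),
        ("EV vs. SII", 6.84, 6, 0.000),
        ("ECOV_a vs. EV", 0.86, 5, 0.001),
        ("ECOV vs. ECOV_a", 19.55, 1, 0.003),
    ]
    for label, stat, ddf, dcfi in steps:
        ok, p = constraint_tenable(stat, ddf, dcfi)
        print(f"{label}: p = {p:.2f}, retain constraints = {ok}")
```

Only the last step fails the rule, which mirrors the finding that the covariance of the phonological loop and the central executive could not be constrained to equality in the youngest group.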
Does Baddeley's Model Hold for Children 5 to 12 Years Old?

Baddeley's model (1986) is characterized by three components and their hierarchic interrelation. Our results indicate that modeling of working memory structure with the phonological loop, the visual-spatial sketchpad, and the central executive (Baddeley, 1986) is adequate even in the age range from 5 to 12 years. As such, our findings are in line with previous studies that found the three-factor model to hold true in children (e.g., Alloway et al., 2006; Gathercole et al., 2004).

Hierarchy

Our findings are consistent with the hierarchical conception of working memory. In this conception, the central executive coordinates the flow of information through working memory and is responsible for the transmission and retrieval of information from the slave systems (Baddeley, 1986). In our study, this assumption is corroborated by a higher latent factor correlation of the central executive with the two subsystems than the correlation of the latent factors of the two subsystems. Although this pattern has been reported before (e.g., Alloway et al., 2006; Gathercole et al., 2004), it still merits more detailed discussion:

Central Executive and the Phonological Loop

As mentioned above, in the present study the relationship of the phonological loop and the central executive was weaker in the 5- to 6-year-olds with r = .66 than in the other age groups with r = .86 (7- to 9-year-olds) and r = .83 (10- to 12-year-olds). A similar increase of latent factor correlations was reported by Gathercole et al. (2004) where the phonological loop correlated with the central executive with r = .73 in 6- to 7-year-olds and r = .74 in 8- to 9-year-olds, which then increased to r = .92 in 10- to 12-year-olds, and r = .90 in 13- to 15-year-olds. In our study, this increase in correlation occurred at an earlier point in time, namely, after the age of 6 years. This finding corresponds to the assumption that the spontaneous use of rehearsal strategies occurs in children from the age of 7 years onward (for a review, see Gathercole, 1998; Gathercole & Hitch, 1993). However, an alternative explanation for this finding might be that children in older age groups process central executive tasks more efficiently. This would imply that, in older children, these tasks require less central executive effort, so that the variance in these tasks appears to be related to the capacity of the phonological loop to a larger extent. This, in turn, might be interpreted to support the position that, in older children, the central executive should be eliminated as a separate factor, at least with regard to the tasks we administered in this study. However, the latent factor of the central executive correlates consistently with the visual-spatial sketchpad.

The Central Executive and the Visual-Spatial Sketchpad

In our study, we found a remarkably high correlation between the visual-spatial sketchpad and the central executive. This is in line with previous findings (Alloway et al., 2006; Gathercole et al., 2004). In these studies, the visual-spatial sketchpad and the central executive correlated particularly highly in the younger age groups, in children from 4 to 6 years and from 6 to 7 years. However, our study differs in two regards with respect to the latent factor correlation of the central executive and the visual-spatial component. First, latent factor correlations were lower in our study: In Alloway et al. (2006) the latent factor correlation was .97 in the youngest age group, whereas it was .85 in Gathercole et al. (2004). Second, we could not confirm a drop in latent factor correlations from the younger to the older age groups to the same degree as these studies. Notably, in our study the retest reliability of the matrix task was low in the youngest and the oldest age groups, which weakens factor interpretation. Nevertheless, our findings suggest a high degree of central executive involvement in children's visual-spatial component. It has been argued that especially young children recruit controlled attention when they perform visual-spatial tasks (e.g., Cowan, 2005). However, controlled attention might particularly occur in dynamic visual-spatial tasks (e.g., Duff & Logie, 1999; Logie, 1995). This is supported in our study by moderate correlations between the Corsi-block task and the central executive tasks (digit span backwards, color span backwards, object span, and counting span).

Alternative Conceptions to Baddeley's Model

In adult populations, various conceptions of working memory exist with regard to cognitive processes and structure; describing them all would go far beyond the scope of this study. We therefore limited our study to investigating the validity of two further models with different factor structures.

First, we tested for a single-factor model, corresponding to the idea of a g-factor model in the assessment of intelligence. The results indicated that a multicomponent solution is superior to a single-factor solution, indicating that a more differentiated account of working memory structure is more adequate than a single-factor model. This finding supports the possibility of functional interrelationships of latent factors in young children.

Second, against the background of the framework proposed by Miyake et al. (2000) we also evaluated a four-factor model in which inhibition represented a factor separable from other central executive processes (see Lambek &
Shevlin, 2011; Shing et al., 2010; St Clair-Thompson & Gathercole, 2006). We tested whether inhibition may represent a factor separable from the central executive and found a good fit for this four-factor model. However, these results have to be viewed with caution. On the one hand, the theoretical framework by Miyake et al. (2000) relates to adult populations and the frontal lobe, which is generally associated with central executive functions and continues to develop into adolescence (Nelson, 1995). As pointed out above, there are not only few findings regarding the functional differentiation of the central executive based on children, but these are also rather diverse (see Lambek & Shevlin, 2011). On the other hand, in our study, inhibition was represented by two tasks only, one of which (go/nogo task) showed low reliability in all three age groups. Therefore, although our results might support the view that inhibition is separable from the central executive, further evidence is needed to clarify the factor structure of central executive processes with regard to inhibition.

Does Working Memory Structure Change During Childhood?

Another main goal of our study was to assess whether the functional interrelations of the working memory components are age invariant. In particular, across all age groups, the central executive was always strongly related to both the phonological loop and the visual-spatial sketchpad, whereas the phonological loop was moderately associated with the visual-spatial sketchpad. This finding suggests functional invariance of the three working memory components throughout development. Only the relation between the central executive and the phonological loop seems to be weaker in 5- to 6-year-olds than in 7- to 9-year-olds and 10- to 12-year-olds, which, as mentioned above, can be explained by the automated activation of subvocal rehearsal processes from the age of 7 years onward. Developmental variance or invariance of factor structure might also be conceivable for the four-factor model with inhibition as a separate factor (see Lambek & Shevlin, 2011; Shing et al., 2010). However, in our study, the go/nogo task had low retest reliability, and this task also showed low factor loadings across all age groups. The interpretation of a separate inhibition factor was thus affected, and we refrained from testing this four-factor model in nested model comparison.

Limitations and Outlook

In this study, we addressed the working memory structure and its invariance, taking Baddeley's (1986) model as a starting point, and tested alternative conceptions such as a single-factor model and a model in which inhibition may represent a separate entity (see Miyake et al., 2000). Notably, different theoretical accounts on working memory exist beyond the models tested here (for an overview, see Miyake & Shah, 1999; or Conway et al., 2007). In the theoretical framework proposed by Miyake et al. (2000), other possibilities of within-factor differentiation of central executive functions may play a role in children as well. Apart from inhibition, these may include "shifting" and "updating." Therefore, it may well be possible that shifting and updating as central executive functions mark separable but related factors in children (e.g., Letho, Juujärvi, Kooistra, & Pulkkinnen, 2003). As in the case of inhibition, investigating the existence of such processes and their relations to other central executive tasks in young children and the course of their development certainly presents challenges for future research.

Acknowledgments

This research was funded by the Hessian Initiative for the Development of Scientific and Economic Excellence (LOEWE).

References

Alloway, T. P., Gathercole, S. E., Adams, A.-M., Willis, C., Eaglen, R., & Lamont, E. (2005). Working memory and phonological awareness as predictors of progress toward early learning goals at school entry. British Journal of Developmental Psychology, 23, 417–426.
Alloway, T. P., Gathercole, S. E., & Pickering, S. J. (2006). Verbal and visuospatial short-term and working memory in children: Are they separable? Child Development, 77, 1698–1716.
Alloway, T. P., Gathercole, S. E., Willis, C., & Adams, A.-M. (2004). A structural analysis of working memory and related cognitive skills in young children. Journal of Experimental Child Psychology, 87, 85–106.
Baddeley, A. (2003). Working memory: Looking back and looking forward. Nature Reviews. Neuroscience, 4, 829–839.
Baddeley, A. D. (1986). Working memory. Oxford, UK: Oxford University Press.
Baddeley, A. D., Emslie, H., Kolodny, J., & Duncan, J. (1998). Random generation and the executive control of working memory. The Quarterly Journal of Experimental Psychology. A, Human Experimental Psychology, 51, 819–852.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. Bower (Ed.), The psychology of learning and motivation (pp. 47–89). New York, NY: Academic Press.
Baddeley, A. D., & Logie, R. (1999). Working memory: The multiple component model. In A. Miyake & P. Shah (Eds.), Models of working memory: Mechanisms of active maintenance and executive control (pp. 28–61). Cambridge, UK: Cambridge University Press.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255.
Conway, R. A., Jarrold, C., Kane, M. J., Miyake, A., & Towse, J. N.
(2007). Variation in working memory. New York: Oxford University Press.
Cowan, N. (2005). Working memory capacity. East Sussex, UK: Psychology Press.
Duff, S. C., & Logie, R. H. (1999). Storage and processing in visuospatial working memory. Scandinavian Journal of Psychology, 40, 251–259.
Gathercole, S. E. (1998). The development of memory. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 39(1), 3–27.
Gathercole, S. E., & Hitch, G. J. (1993). Developmental changes in short-term memory: A revised working memory perspective. In A. Collins, S. E. Gathercole, A. M. Conway, & P. E. Morris (Eds.), Theories of memory (pp. 189–210). Hove, UK: Erlbaum.
Gathercole, S. E., & Pickering, S. J. (2000). Assessment of working memory in six- and seven-year-old children. Journal of Educational Psychology, 92, 377–390.
Gathercole, S. E., Pickering, S. J., Ambridge, B., & Wearing, H. (2004). The structure of working memory from 4 to 15 years of age. Developmental Psychology, 40, 177–190.
Hasselhorn, M. (2008). Competencies for successful learning: Developmental changes and constraints. In J. Hartig, E. Klieme, & D. Leutner (Eds.), Assessment of competencies in educational settings (pp. 23–43). Göttingen: Hogrefe.
Hasselhorn, M., Schumann-Hengsteler, R., Gronauer, J. C., Grube, D., Mähler, C., Schmid, I., . . . Zoelch, C. (2012). Arbeitsgedächtnistestbatterie für Kinder von 5 bis 12 Jahren (AGTB 5–12) [Working memory test battery for children, 5 to 12 years]. Göttingen: Hogrefe.
Henson, R. (2001). Neural working memory. In J. Andrade (Ed.), Working memory in perspective (pp. 151–173). Hove, UK: Psychology Press.
Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
Huizinga, M., Dolan, C. V., & van der Molen, M. V. (2006). Age-related change in executive function: Developmental trends and a latent variables analysis. Neuropsychologia, 44, 2017–2036.
Jarrold, C., & Towse, J. N. (2006). Individual differences in working memory. Neuroscience, 139, 39–50.
Jarvis, H. L., & Gathercole, S. E. (2003). Verbal and nonverbal working memory and achievements on national curriculum tests at 11 and 14 years of age. Educational and Child Psychology, 20, 123–140.
Kline, R. B. (1998). Principles and practice of structural equation modeling. New York: Guilford.
Lambek, R., & Shevlin, M. (2011). Working memory and response inhibition in children and adolescents: Age and organization issues. Scandinavian Journal of Psychology, 52, 427–432.
Letho, J. E., Juujärvi, P., Kooistra, L., & Pulkkinnen, L. (2003). Dimensions of executive functioning: Evidence from children. British Journal of Developmental Psychology, 21, 59–80.
Logie, R. H. (1995). Visuo-spatial working memory. Hove, UK: Erlbaum.
McDonald, R. P., & Ho, M.-H. R. (2002). Principles and practice in reporting statistical equation analyses. Psychological Methods, 7(1), 64–82.
Miyake, A., Friedman, N. P., Emerson, M. J., Witzki, A. H., Howerter, A., & Wager, T. D. (2000). The unity and diversity of executive functions and their contributions to complex "frontal lobe" tasks: A latent variable analysis. Cognitive Psychology, 41, 49–100.
Miyake, A., & Shah, P. (Eds.). (1999). Models of working memory: Mechanisms of active maintenance and executive control. New York: Cambridge University Press.
Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus user's guide (6th ed.). Los Angeles, CA: Muthén & Muthén.
Nelson, C. A. (1995). The ontogeny of human memory: A cognitive neuroscience perspective. Developmental Psychology, 31, 723–738.
Piaget, J. (1971). Biology and knowledge. Chicago, IL: Chicago University Press.
Pickering, S. J., Gathercole, S. E., & Peaker, M. (1998). Verbal and visuo-spatial short-term memory in children: Evidence for common and distinct mechanisms. Memory and Cognition, 26, 1117–1130.
Roebers, C. M., & Zoelch, C. (2005). Erfassung und Struktur des phonologischen und visuell-räumlichen Arbeitsgedächtnisses bei 4-jährigen Kindern [The assessment and structure of phonological and visual-spatial working memory in 4-year-old children]. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 37, 113–121.
Satorra, A., & Bentler, P. M. (2001). A scaled difference chi-square test statistic for moment structure analysis. Psychometrika, 66, 507–514.
Shing, Y. L., Lindenberger, U., Diamond, A., & Davidson, M. C. (2010). Memory maintenance and inhibitory control differentiate from early childhood to adolescence. Developmental Neuropsychology, 35, 679–697.
St Clair-Thompson, H. L., & Gathercole, S. E. (2006). Executive functions and achievements in school: Shifting, updating, inhibition, and working memory. Quarterly Journal of Experimental Psychology, 59, 745–759.
Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, and Computers, 31, 137–149.
Vallar, G., & Papagno, C. (2002). Neuropsychological impairments of verbal short-term memory. In M. Kopelman, A. D. Baddeley, & B. Wilson (Eds.), Handbook of neuropsychology (2nd ed., pp. 249–270). Chichester, UK: Wiley.
Van der Sluis, S., De Jong, P. F., & Van der Leij, A. (2007). Executive functioning in children, and its relations with reasoning, reading, and arithmetic. Intelligence, 35, 427–449.

Published online: July 20, 2012

Kurt Michalczyk
German Institute for International Educational Research (DIPF)
Schlossstr. 29
60486 Frankfurt a. M.
Germany
Fax +49 69 24708-216
E-mail michalczyk@dipf.de
Instructions to Authors European Journal of Psychological Assessment
May 1, 2013. 201 Hogrefe Publishing
The European Journal of Psychological Assessment publishes papers on all domains of psychological assessment as well as on the assessment process, provided they have not been submitted for publication elsewhere. Of special interest are papers reporting the construction of new measures or a considerable advancement of an existing measure. Papers published are original and preferably empirical studies. Review papers and case studies will also be considered if they result in innovative research questions in assessment and decision-making. The Journal especially welcomes papers in which the data have been sampled across two or more countries and if applicable with different methods (e.g., questionnaire, interview, observation, performance). Test reviews will not be considered. Papers that propose or discuss alternative measurement models for existing scales should also discuss and preferably demonstrate the consequences for the construct and criterion validity of the scales. Papers on psychometric methods will be considered only if they have a clear bearing on assessment research or practice. All manuscripts will be subject to masked peer review. The Journal also publishes information on upcoming meetings, congresses, and other professional events. The Journal appears four times a year.
Language: All contributions must be in English. It is recommended that authors who are not native speakers of English have their manuscripts checked by a native-speaker colleague before submission.
Manuscripts: Authors should follow the Publication Manual of the American Psychological Association (6th ed.) for general style and layout. Please arrange and type manuscripts accordingly. Nonstandard abbreviations should be defined in the text upon first use. Footnotes should generally be avoided. All manuscripts should be submitted online at http://www.editorialmanager.com/ejpa. Please follow the online instructions for submission. Should you have any technical queries regarding this process, please contact Juliane Munson, Hogrefe Publishing (E-mail juliane.munson@hogrefe.com, Tel. +49 551 99950-422, Fax +49 551 99950-425). Please direct any editorial questions to the editorial office, E-mail ejpaeditor@gmail.de. The title page should, in addition to the title, contain the name and complete address of the corresponding author and names and affiliations of all authors. The text should be written double-spaced with a margin of 2.5 cm (1 inch) on all sides.
Regular Full-Length Papers may exceed neither 22 manuscript pages (double-spaced, pitch 12) nor 5,000 words, including figures, tables, and references. Papers exceeding this limit will immediately be rejected. They should normally be divided into Summary (not exceeding 200 words), Keywords (no more than five), Introduction, Materials and Methods, Results, Discussion, Acknowledgments (optional), and References.
Statistical and Mathematical Copy: Abbreviations and styling of statistical and mathematical symbols, Greek letters, etc., as well as spacing, alignment, punctuation, equations in text, displayed equations, and tabular presentation or mathematical copy, should follow the Publication Manual of the APA (Sections 4.41–4.49).
Figures and Tables: Figures and tables should be numbered using Arabic numerals. The number of figures and tables should be kept to a minimum and only be included to facilitate understanding of the text. The same information should not appear in both a figure and a table. Each table and figure must be cited in the text and should be accompanied by a legend on a separate sheet. Please note that online submission via the Editorial Manager allows text, figures, and tables to be submitted as separate files. Figures must be supplied in a form suitable for reproduction: preferably high-resolution bitmaps (e.g., jpg, 300 dpi) or as vector graphics files. Figures will normally be reproduced in black and white only. While it is possible to reproduce color illustrations, authors are reminded that they will be invoiced for the extra costs involved.
References: References must include all and only works cited in the text and are to be arranged alphabetically. Works cited in the text within the same parentheses should be listed alphabetically. APA style is to be used for text citations and the reference list throughout (for details, see Publication Manual of the APA, 6th edition):
Journal article: Crisp, A.H., Palmer, R.L., & Kalucy, R.S. (1976). How common is anorexia nervosa? A prevalence study. British Journal of Psychiatry, 128, 549–554.
Entire book: Author (date). Book title. City (State and/or Country if city is relatively unknown): Publisher.
Article or chapter in edited book: Corbett, J.A., & Turpin, G. (1985). Tics and Tourette's syndrome. In M. Rutter & L. Hersov (Eds.), Child and adolescent psychiatry. Modern approaches (2nd ed., pp. 516–525). Oxford: Blackwell.
Galley Proofs and Offprints: The corresponding author receives galley proofs of the paper in due course, and these must be sent back to the publisher by return mail. Changes in the style or content of the galleys can be made only in exceptional instances. Hogrefe will send the corresponding author of each accepted paper free of charge an e-offprint (PDF) of the published version of the paper when it is first released online. This e-offprint is provided for the author's personal use, including for sharing with coauthors.
Copyright Agreement: By submitting an article, the author confirms and guarantees on behalf of him-/herself and any coauthors that he or she holds all copyright in and titles to the submitted contribution, including any figures, photographs, line drawings, plans, maps, sketches, and tables, and that the article and its contents do not infringe in any way on the rights of third parties. The author indemnifies and holds harmless the publisher from any third-party claims.
The author agrees, upon acceptance of the article for publication, to transfer to the publisher the exclusive right to reproduce and distribute the article and its contents, both physically and in nonphysical, electronic, or other form, in the journal to which it has been submitted and in other independent publications, with no limitations on the number of copies or on the form or the extent of distribution. These rights are transferred for the duration of copyright as defined by international law. Furthermore, the author transfers to the publisher the following exclusive rights to the article and its contents:
1. The rights to produce advance copies, reprints or offprints of the article, in full or in part, to undertake or allow translations into other languages, to distribute other forms or modified versions of the article, and to produce and distribute summaries or abstracts.
2. The rights to microfilm and microfiche editions or similar, to the use of the article and its contents in videotext, teletext, and similar systems, to recordings or reproduction using other media, digital or analog, including electronic, magnetic, and optical media, and in multimedia form, as well as for public broadcasting in radio, television, or other forms of broadcast.
3. The rights to store the article and its contents in machine-readable or electronic form on all media (such as computer disks, compact disks, magnetic tape), to store the article and its contents in online databases belonging to the publisher or to third parties for viewing or for downloading by third parties, and to present or reproduce the article or its contents on visual display screens, monitors, and similar devices, either directly or via data transmission.
4. The rights to reproduce and distribute the article and its contents by all other means, including photomechanical and similar processes (such as photocopying or facsimile), and as part of so-called document delivery services.
5. The right to transfer any or all of the rights mentioned in this agreement, as well as the rights retained by the relevant copyright clearing centers, including the corresponding royalty rights to third parties.
Online Rights for Journal Articles: Guidelines on authors' rights to archive electronic versions of their manuscripts online are given in the Advice for Authors on the journal's web page at www.hogrefe.com.