
Author’s personal copy (e-offprint)

Review Article

Beyond Dichotomies
Competence Viewed as a Continuum
Sigrid Blömeke (1), Jan-Eric Gustafsson (1, 2), and Richard J. Shavelson (3, 4)

1 Centre for Educational Measurement (CEMO), University of Oslo, Norway; 2 Department of Education and Special Education, University of Gothenburg, Sweden; 3 SK Partners LLC, Menlo Park, CA, USA; 4 Graduate School of Education, Stanford University, CA, USA

Abstract. In this paper, the state of research on the assessment of competencies in higher education is reviewed. Fundamental conceptual and methodological issues are clarified by showing that current controversies are built on misleading dichotomies. By systematically sketching conceptual controversies, competing competence definitions are unpacked (analytic/trait vs. holistic/real-world performance) and commonplaces are identified. Disagreements are also highlighted. Similarly, competing statistical approaches to assessing competencies, namely item-response theory (latent trait) versus generalizability theory (sampling error variance), are unpacked. The resulting framework moves beyond dichotomies and shows how the different approaches complement each other. Competence is viewed along a continuum from traits that underlie perception, interpretation, and decision-making skills, which in turn give rise to observed behavior in real-world situations. Statistical approaches are also viewed along a continuum from linear to nonlinear models that serve different purposes. Item response theory (IRT) models may be used for scaling item responses and modeling structural relations, and generalizability theory (GT) models pinpoint sources of measurement error variance, thereby enabling the design of reliable measurements. The proposed framework suggests multiple new research studies and may serve as a "grand" structural model.

Keywords: competencies, ability, competence assessment, cognition, modeling

In our Call for Papers for this topical issue of the Zeitschrift für Psychologie (ZfP) we blithely said that the "assessment of competence development during the course of higher education presents a substantive and methodological challenge. The challenge is to define, and model competence – as the latent cognitive and affective-motivational underpinning of domain-specific performance in varying situations – in a reliable and valid way" (Blömeke, Gustafsson, & Shavelson, 2013). We now say "blithely" because at the time we thought that defining "what was meant by the term 'competencies' seemed . . . to be an easier task [than measuring competencies]." We should have known better. For it is well known that definitions are important and contested. Moreover, once defined, constraints are placed on the measurement of competence – what constitutes a task eliciting competence and what doesn't, what is an allowable measurement procedure and what isn't, what is a reasonable approach to scaling and what isn't, and so on. This reality has been brought home not only in the diversity of definitions and measurement approaches represented in this topical issue of ZfP but also in the debates and deliberations in the literature – and among the editors!

The call actually set us up for controversy with a phrase so commonly seen in the measurement literature – "Competencies are conceptualized as complex ability constructs that are context-specific, . . . and closely related to real life" (Koeppen, Hartig, Klieme, & Leutner, 2008, p. 61) – that we editors left it unchallenged until we tried to unpack it and rephrased it as "the latent cognitive and affective-motivational underpinning of domain-specific performance in varying situations." To be sure, cognition and affect-motivation are latent traits (i.e., human constructions of unobserved processes); they cannot be directly observed but have to be inferred from observable behavior. However, this definition only provides a starting point to address the conceptual and methodological challenges involved in assessing competencies acquired in higher education.

Conceptually, in a first interpretation, the "complex ability" part of the definition is stressed and competence is analytically divided into several cognitive and affective-motivational traits (or resources; Schoenfeld, 2010), each to be measured reliably and validly. The validity of an interpretation that such a measurement taps competence could then be established by, for example, testing whether the trait structure was as hypothesized, or whether the measurement predicted performance in a "criterion situation." The correlation between competence and performance might vary across different situations, but we would expect it to be positive and of substantial magnitude. Many of the papers included in this topical issue fit well within this analytic tradition. Zlatkin-Troitschanskaia et al. (2015), for example, examine the content knowledge in micro- and macroeconomics acquired

© 2015 Hogrefe Publishing · Zeitschrift für Psychologie 2015; Vol. 223(1):3–13

DOI: 10.1027/2151-2604/a000194

during higher education in Germany and Japan with a paper-and-pencil test validated for cross-country comparisons.

A second interpretation focuses on the "real-life" part of the definition and thus on observed behavior in context. Competence itself, then, is assumed to involve a multitude of cognitive abilities and affect-motivation states that are ever changing throughout the duration of the performance. In this case, the goal is to get measures as "closely related" to criterion performance as possible. Perhaps the closest a measurement can get to criterion performance is to sample real-world tasks and observe performance on them. What is to be measured, then, is behavior in real-life situations, recognizing that no two people might use the exact same competence profile to carry out the behavior. Some of the papers included in this topical issue fit within this more holistic tradition.

Methodologically, we note that the long-standing measurement traditions based on classical test theory (CTT) certainly provide useful tools for approaching technical issues in the assessment of competences. But factor analysis and other classical methods were developed to solve other measurement problems than those encountered when assessing domain-specific performance. Thus, with only few exceptions (e.g., generalizability theory [GT] as developed by Cronbach, Gleser, Nanda, & Rajaratnam, 1972; see also Shavelson & Webb, 1991; Brennan, 2001), much of CTT focuses on reliable assessment of individual differences on single characteristics in norm-referenced contexts. But assessment of competences often requires criterion-referenced decisions, such as whether particular levels of competence have been reached (Berry, Clark, & McClure, 2015).

Furthermore, as pointed out above, in competence assessments a multitude of characteristics is to be taken into account at the same time, and the profile – how these characteristics are related to each other within a person – is often of strong interest. For such purposes latent trait and mixed models – of which item-response theory (IRT) models are the most prominent – seem to hold promise, in particular because they make it possible to investigate the nature of scales (Rijmen, Tuerlinckx, De Boeck, & Kuppens, 2003).

If the latent variables are, in addition, categorical (mixture models; McLachlan & Peel, 2000), a person-oriented approach to competence profiles can be explored. The intent is to capture unobserved heterogeneity of profiles in subpopulations. These approaches therefore open up a wide range of possibilities in the field of educational measurement.

A specific methodological challenge in the context of competence assessments, though, is that reliability requirements typically imply a large number of items, which leads to selected-response assessments that can be quickly administered and scored. However, assessment of domain-specific competence in higher education does not necessarily lend itself to such approaches because validity considerations call for tapping "real-life" performance at some point. Achieving sufficient reliability and generalizability in the assessments is challenging given the complexities of higher education competencies.

Overview

We have purposely characterized the definition and measurement of competence by strong and opposing positions. Pragmatically, reality in both respects lies somewhere in between. At either extreme, there is a chance of forgetting either observable behavior or cognitive abilities. That is, our notion of competence includes "criterion behavior" as well as the knowledge, cognitive skills, and affective-motivational dispositions that underlie that behavior. Statistically, we believe that both CTT – in particular GT and other approaches that are based on the decomposition of variance – and more recent latent trait, mixed, and mixture models in the IRT tradition have a role to play in examining the quality of competence measurements.

This paper tries to tidy up "this messy construct." We do not intend to find "the" one definition and assessment-of-competence measurement. Rather, by systematically sketching conceptual controversies and assessment approaches, we attempt to clarify the construct and its measurement. Our discussion of "messy" challenges confronting the definition and measurement of competence begins with definitional issues. We unpack competing definitions and identify commonplaces where there seems to be a modicum of agreement. We also highlight disagreements and suggest how some might be resolved. We then provide examples of how competence is defined in several professions. Next we discuss methodological issues, focusing on how we can move beyond dichotomies by balancing and making the best use of both CTT and IRT. Finally, we conclude by tying key points and issues together.

Conceptual Framework: Definitions of Competence

The notion of competence was first discussed in the US during the 1970s (Grant, Elbow, & Ewens, 1979). The discussion focused on performance on "criterion tasks" sampled from real-life situations. McClelland (1973) contrasted the "criterion-sampling" approach with testing for aptitude and intelligence. In McClelland's view, "intelligence and aptitude tests are used nearly everywhere by schools, colleges, and employers. . . The games people are required to play on aptitude tests are similar to the games teachers require in the classroom. . . So it is scarcely surprising that aptitude test scores are correlated highly with grades in school" (1973, p. 1). He argued that we instead should be testing for competence – successful behavior in real-life situations: "If someone wants to know who will make a good teacher, they will have to get videotapes of classrooms, as Kounin (1970) did, and find out how the behaviors of good and poor teachers differ. To pick future businessmen, research scientists, political leaders, prospects for a happy marriage, they will have to make careful behavioral analyses of these outcomes and then find ways of sampling the adaptive behavior in advance" (p. 8).
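McClelland's criterion-sampling logic ties directly to the generalizability-theory machinery mentioned earlier: sampled real-world tasks form a facet of measurement whose error contribution can be estimated. As a minimal numerical sketch (pure Python with simulated data; the function name `g_study` and all numbers are our own invention, not taken from the sources cited above), a crossed persons × tasks G study estimates variance components from ANOVA mean squares and yields a generalizability coefficient that rises as more tasks are sampled:

```python
import random

def g_study(scores):
    """Crossed persons x tasks G study: estimate random-effects variance
    components from ANOVA mean squares and return them together with the
    generalizability coefficient for relative decisions."""
    n_p = len(scores)      # persons
    n_t = len(scores[0])   # tasks
    grand = sum(map(sum, scores)) / (n_p * n_t)
    p_means = [sum(row) / n_t for row in scores]
    t_means = [sum(row[t] for row in scores) / n_p for t in range(n_t)]

    ss_p = n_t * sum((m - grand) ** 2 for m in p_means)
    ss_t = n_p * sum((m - grand) ** 2 for m in t_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_t  # person x task interaction, confounded with error

    ms_p = ss_p / (n_p - 1)
    ms_t = ss_t / (n_t - 1)
    ms_res = ss_res / ((n_p - 1) * (n_t - 1))

    # Expected mean squares, e.g., E[MS_p] = var_res + n_t * var_p,
    # give the variance-component estimates:
    var_res = ms_res
    var_p = max(0.0, (ms_p - ms_res) / n_t)
    var_t = max(0.0, (ms_t - ms_res) / n_p)

    # Relative G coefficient: person variance relative to person variance
    # plus interaction/error averaged over the n_t sampled tasks.
    g = var_p / (var_p + var_res / n_t)
    return var_p, var_t, var_res, g

# Simulated scores: person ability + task difficulty + noise.
random.seed(1)
tasks = [random.gauss(0, 0.5) for _ in range(8)]
persons = [random.gauss(0, 1) for _ in range(200)]
scores = [[p + t + random.gauss(0, 1) for t in tasks] for p in persons]

var_p, var_t, var_res, g = g_study(scores)
# A decision study projects G for other numbers of sampled tasks:
projected = {n: var_p / (var_p + var_res / n) for n in (4, 8, 16)}
```

The projection in the last line is the GT answer to the task-sampling question raised in the text: it shows how many real-world tasks one would have to sample to reach an acceptable level of generalizability.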


A contrasting perspective stressed competence's dispositional, and in particular its cognitive, nature: either generic competence, which is often synonymous with intelligence or information-processing abilities, or domain-specific competence, often referred to as expertise. Boyatzis (1982) carried out one of the first empirical studies in this perspective. Based on top managers' definitions of their competence, he defined it as an "underlying characteristic of a person which results in effective and/or superior performance in a job" (p. 97). Spencer and Spencer (1993, p. 9) were more precise:

"A competency is an underlying characteristic of an individual that is causally related to criterion-referenced effective and/or superior performance in a job or situation. Underlying characteristic means the competency is a fairly deep and enduring part of a person's personality. [...] Causally related means that a competency causes or predicts behavior and performance. Criterion-referenced means that the competency actually predicts who does something well or poorly, as measured on a specific criterion or standard."

So, as we see, a variety of definitions has existed and still exists. The respective representatives criticize each other fiercely for misconceiving the construct, reducing its complexity, ignoring important aspects, and so on (e.g., McMullan et al., 2003). The value added by each of the perspectives is rarely acknowledged.

The dichotomy of a behavioral assessment in real-life situations versus an analytical assessment of dispositions underlying such behavior has much to do with the origins of these different models. The first approach stems from industrial/organizational psychology, which has the selection of the candidates best suited for a job as its main purpose. Naturally, underlying dispositions are then not the focus, because they are not as close to the criterion as observed performance in context. Rather, predicting future job performance by sampling typical job tasks and assessing how well a candidate does represents a reliable and valid approach to identifying job-person fit (Arthur, Day, McNelly, & Edens, 2003). Many large employers carry out such assessments as part of their recruitment process. It is not important how a candidate has come to his or her competence. What matters is that he or she shows it in situations relevant for the job (Sparrow & Bognanno, 1993). But also in the context of professional certification and licensure, performance criteria and their assessment according to the standards of a profession are foregrounded. Which opportunities to learn a candidate had during his or her training, or which traits contribute to performance, is not the focus. The license is only awarded if a teacher, nurse, or psychologist is able to do what is required.

In contrast to this selection approach, the second approach stems from educational research and intends to find ways to foster the development of competence. Identifying a person's characteristics (resources) underlying her or his behavior, and how these can best be developed, is essential in this approach. An implicit assumption is that these characteristics are amenable to external interventions (Koeppen et al., 2008; Sternberg & Grigorenko, 2003) such as opportunities to learn and systematic training, so that the relationship between educational inputs and competence outcomes is foregrounded and a frequent research topic. In the long run, the purpose is not to identify job-person fit but to identify those opportunities to learn at the individual, classroom, and system level best suited to foster competence development. The German research program "Modeling and Measuring Competencies in Higher Education" is an example (Blömeke, Zlatkin-Troitschanskaia, Kuhn, & Fege, 2013). The program responds to the increasing discussion about instructional quality in higher education and the new wave of competence-based curricula resulting from the requirements of the Bologna process.

Overcoming Disagreements Due to Oversimplified Dichotomies

The industrial/organizational selection and the educational training approaches to the definition of competence and competence assessments are in some respects distinct. In the following, we unpack the disagreements and suggest how to overcome them. However, we also see substantial commonalities in the various notions of competence – a "framework" of sorts. We highlight these commonalities.

Agreements in the Definition of Competence

There is some agreement in the two contrasting perspectives laid out above that "competence" (plural "competences") is the broader term, whereas "competency" (plural "competencies") refers to the different constituents of competence. The first term describes a complex characteristic from a holistic viewpoint, whereas the latter takes an analytic stance. The constituents (or resources) may be cognitive, conative, affective, or motivational. In contrast to common views of intelligence as a less malleable trait, competence and competency are regarded as learnable and can thus be improved through deliberate practice (Epstein & Hundert, 2002; Shavelson, 2010; Weinert, 2001).

Furthermore, agreement exists in both perspectives that a competence framework recognizes the importance of real-world situations typical for performance demands in a field as "the" point of reference. The definition of competence therefore has to start from an analysis of authentic job or societal situations and enumerate the tasks as well as the cognition, conation, affect, and motivation involved. And no matter whether one follows the behavioral or the dispositional perspective – such real-world situations should be


sampled in measures of competence or in measures of criteria. In both cases, the underlying competencies inferred from such a framework do not necessarily have to be in line with those inferred from a curriculum in school or university.

Beyond Dichotomies: Competence as a Multidimensional Construct

If we agree that competence ultimately refers to real-world performance, either as a constituent of the construct or as a validity criterion, several disagreements are resolved. It is then no longer a question whether competence is a set of cognitive abilities only or a combination of cognition, conation, affect, and motivation. To the degree that conation, affect, and motivation are involved in that performance besides cognition, so too should the definition of competence include them for that domain. Competence thus involves complex intellectual characteristics along with affect-motivation that underlie observable performance. Evidence exists that for long-term job success, such subjective indicators have to be taken into account (Brief & Weiss, 2001). Job satisfaction predicts productivity and performance (Judge, Thoresen, Bono, & Patton, 2001). Work engagement also predicts performance and, in addition, organizational commitment (Bakker, 2011) and health (Hakanen & Schaufeli, 2012).

This argument leads back to Snow's (1994) idea of two pathways that contribute to achievement, namely a cognitive and a commitment pathway. Thus, he included motivational-conative processes in his new concept of aptitude. Lau and Roeser (2002) confirmed this framework empirically with respect to science achievement: whereas students' cognitive abilities were the strongest predictors of achievement, motivational characteristics increased the predictive validity and were also the strongest predictors of commitment.

A priori, it is impossible to specify which specific facets enter into a definition of competence. For example, what does a competent physicist know and believe, and what is he or she able to do? Only from detailed observation and other information can particular profiles of cognition, motivation, and so on be specified. Not only is subject-matter knowledge required to solve force and motion problems, so too are problem-solving strategies, analytic reasoning, critical thinking, and the like. Moreover, if competent performance involves working successfully as a team member, this competency would be included in the definition of competence. Thus, any definition of competence should entertain the possibility that competence involves complex cognitive abilities along with affective and volitional dispositions to work in particular situations.

Beyond Dichotomies: Competence as a Horizontal Continuum

Currently, the dichotomy of disposition versus performance comes down to, and gets stuck with, the question of whether (a) competence is performance in real-world situations – more specifically, whether behavior is the focus of competence – or (b) behavior is the criterion against which cognition and affect-motivation are validated as measures of competence. As we will see, such a dichotomy overlooks an essential question, namely how knowledge, skills, and affect are put together to arrive at performance.

The first position (a) takes a holistic view in which cognition, affect-motivation, and performance are complexly linked together, changing during the course of performance (Corno et al., 2002). A competence assessment, then, involves successfully carrying out concrete tasks in real-world criterion situations; a definition of competence, then, should be based on a thorough analysis of the demands of, and variations in, these situations. To be sure, knowledge, skill, and affective-motivation components underlie performance, but they change during the in-situation performance as the situation moves along. Cognition, affect-motivation, and performance are linked as a system, cobbled together in response to task demands, somewhat differently for each person. This observation is what Oser (2013) had in mind when he pointed out that competence involves a process dimension, which he calls a "competence profile" – a set of resources enacted in practice. One important research question in this context is how precisely the different resources are cobbled together, what this interplay depends on, and how the resources can be built up (i.e., how they should look, e.g., at the end of higher education).

The second position (b) restricts the term "competence" to the sum of cognitive and motivational resources. This approach assumes that the whole is the sum of its (weighted) parts and divides competence into multiple constituents (latent abilities, skills) needed for competent performance. Competencies, then, are used to predict behavior in criterion situations (e.g., Spencer & Spencer, 1993). From this perspective, among others, measures of both declarative and procedural "knowing" tap underlying competencies such that they are applicable to multiple "real-world" situations in which doing is the end game. If this reasoning holds, we should seek a model of competence featuring cost-efficient selected-response measures of declarative and procedural knowledge in a domain. Note that this definition of competence would also lead to a measurement model that accounted for task/response sampling, in addition to scaling scores. Since real-world behavior is the core validity criterion in this case, again a careful analysis of the demands of, and the variations in, these situations would be crucial. One important research question concerns the relation of competence to its constituents (Sadler, 2013): Is it possible to decompose competence exhaustively, as is often done in technology and science? The decomposition reduces complexity and aids understanding – but is the result still the same construct?

In both perspectives, the behavioral and the dispositional, the question arises as to whether and how persons who possess all of the resources belonging to a competence construct are able to integrate them, such that the underlying competence emerges in performance. This might be an empirical question but would require assessments for each competency.
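The "sum of its (weighted) parts" assumption of position (b), and the integration question just raised, can be made concrete with a toy sketch. All constituent names, weights, and profiles below are invented for illustration: under a compensatory (additive) rule, strength on one constituent offsets weakness on another; under a non-compensatory (here, multiplicative) rule, a near-zero constituent caps overall competence no matter how strong the rest are:

```python
# Hypothetical constituents and weights, invented for illustration only.

def compensatory(knowledge, skill, motivation, w=(0.5, 0.3, 0.2)):
    """Additive rule: strength on one constituent offsets weakness on another."""
    return w[0] * knowledge + w[1] * skill + w[2] * motivation

def noncompensatory(knowledge, skill, motivation):
    """Multiplicative rule (geometric mean): a near-zero constituent caps
    overall competence regardless of the other constituents."""
    return (knowledge * skill * motivation) ** (1 / 3)

balanced = (0.8, 0.8, 0.8)
lopsided = (1.0, 1.0, 0.05)   # strong knowledge and skill, almost no motivation

# Both rules agree on the balanced profile; the additive rule still rates
# the lopsided profile highly, while the multiplicative rule lets the
# weak constituent drag the whole score down.
```

Which of the two rules better describes real performance is exactly the additive-versus-multiplicative research question taken up in the next section.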


Figure 1. Modeling competence as a continuum.
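The continuum of Figure 1 can be caricatured as a simple mediation chain. In this simulation sketch the three variables stand in for a disposition, a situation-specific skill, and observed performance; the path coefficients and sample size are arbitrary choices of ours, not estimates from any study. Because the distal trait reaches performance only through the mediating skill, its correlation with observed behavior is attenuated relative to the proximal skill measure:

```python
import math
import random

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(7)
n = 4000
# Disposition -> skill -> performance, each standardized path set to 0.7;
# residual variances scaled so every variable has unit variance.
disposition = [random.gauss(0, 1) for _ in range(n)]
skill = [0.7 * d + math.sqrt(1 - 0.49) * random.gauss(0, 1) for d in disposition]
performance = [0.7 * s + math.sqrt(1 - 0.49) * random.gauss(0, 1) for s in skill]

r_ds = corr(disposition, skill)        # close to the 0.7 path
r_sp = corr(skill, performance)        # close to the 0.7 path
r_dp = corr(disposition, performance)  # attenuated: roughly 0.7 * 0.7
```

Under full mediation the distal correlation is the product of the two paths, which is why dispositional measures alone tend to underpredict real-world behavior.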

Conceptually, this question leads us to point out an important gap in the current dichotomized discussion: Which processes connect cognition and volition-affect-motivation on the one hand and performance on the other? Different facets have to be integrated, perhaps transformed and/or restructured through practical experience. Processes such as the perception and interpretation of a specific job situation, together with decision-making (Schoenfeld, 2010), may mediate between disposition and performance (see Figure 1).

Thus, instead of insisting on an unproductive dichotomous view of competence – in particular, knowledge versus performance – competence should be regarded as a process, a continuum with many steps in between. We therefore suggest that trait approaches recognize the necessity to measure behaviorally, and that behavioral approaches recognize the role of cognitive, affective, and conative resources. At this time, we encourage research on competence in higher education emanating from either perspective, paying attention particularly to the steps in between. Our model may help thinking about these.

Competence Is Also a Continuum in Other Respects

Before we consider particular fields of research on competence, it is worth noting at least briefly that competence is also a vertical continuum in terms of performance levels and of developmental stages. More specifically, one interpretation is that competence is a continuous characteristic with higher and lower levels (more or less competent). Additionally, as competence is a multidimensional construct, a person's profile might include stronger estimates in one dimension and weaker ones in another. So, the definition of competence includes the notion of how much is enough to be called "competent." Furthermore, an important research question is whether the different dimensions of competence can compensate for each other (i.e., are additive by nature) or whether strength on one cannot compensate for weakness on another dimension (i.e., a multiplicative nature of competence dimensions; Koeppen et al., 2008). In the latter case, an interesting follow-up research question would be which minimum threshold has to be in place before someone can show a certain behavior.

Taking a longitudinal, developmental perspective on competence adds complexity. The model might be similar to Figure 1 in that, firstly, some dispositions have to be in place before situation-specific skills can be acquired. Many higher education programs (e.g., teaching or medicine) are built on such an implicit assumption by delivering basic knowledge first, before students undergo practical training. But it might as well be that a developmental model would look completely different, in that growth or loss continuously happens on all dimensions at the same time (Baltes, Reese, & Lipsitt, 1980; Cattell, 1971). An interesting research question is whether competence changes are then best characterized by linear increase (or decrease), by differentiation processes from more general and basic expressions to more specialized ones, or by qualitative changes, as is assumed in the novice-expert paradigm. In the two latter cases, developmental trajectories would imply structural changes in the nature of competence.

Particular Fields of Research on Competence

Many professions are concerned about the nature and assessment of competence. They have to train the next generations of professionals on the one hand and to award licenses or to select candidates on the other. Here we look specifically at medicine, teaching, and vocational education to see how each deals with the definition of competence and its measurement.

In medicine, the debate about the meaning of competence has a long tradition and has included both perspectives from the beginning: competence development through medical training but also selection at its end in terms of licensing. The debates resulted in Miller's (1990) widely used pyramid of clinical competence. The pyramid provides a framework for how to think about the different transformation processes that link factual knowledge and job-related behavior by distinguishing between knowledge, competence, performance, and behavior. The level of each category and the relation between categories (e.g., knowledge is regarded as an antecedent to competence) are assumed to be influenced by other characteristics such as beliefs,


opportunities to learn, practical experiences, or situational affordances.

The Accreditation Council for Graduate Medical Education (http://www.acgme.org) strives to include both the educational and the selection perspectives in its accreditation procedure for medical programs in the US by requesting assessments of different dimensions of clinical competence. Epstein and Hundert (2002) summarize these as a cognitive function – knowledge to solve real-life problems; an integrative function – using biomedical and psychosocial information in clinical reasoning; a relational function – communication with patients and colleagues; and an affective/moral function – the willingness, patience, and emotional awareness to use these skills judiciously and humanely. For each category, specific assessment formats have been developed (Wass, Van der Vleuten, Shatzer, & Jones, 2001): traditional paper-and-pencil tests, standardized performance assessments using laboratory experiments or simulations, and unstandardized performance assessments at the workplace.

Teaching is another field with extensive research on what it means to be competent. Outstanding teacher performance is regarded to involve different types of resources, in particular knowledge, skills, beliefs, values, motivation, and metacognition (Shulman, 1987; Schoenfeld, 2010). The corresponding research is mostly driven by the objective of long-run improvement of teacher education. A study by Blömeke et al. (2014), for example, found that mathematics teachers' perception accuracy of classroom situations and speedy recognition of students' errors are influenced by the knowledge they acquired during teacher education (see also König et al., 2014). Gold, Förster, and Holodynski (2013) showed that it is possible to train perception abilities with respect to classroom management through guided video analysis. Correspondingly, Stürmer, Könings, and Seidel (2012) confirmed a positive effect of classes in teaching and learning on professional vision.

However, selection also plays an important role in teacher education and is addressed differently. The German teacher education system, for example, requires two comprehensive examinations before a license is awarded: a first one after university, with typical written and oral knowledge tests, and a second one on-site in schools, where student teachers have to demonstrate their teaching skills. If one regards these exams as indicators of what is meant to constitute teacher competencies, the German system combines a dispositional and a behavioral perspective.

Finally, in the field of Vocational Education and Training (VET), competence is discussed intensely. Although many different definitions exist here as well (Biemans, Nieuwenhuis, Poell, Mulder, & Wesselink, 2004), some agreement exists with respect to core concepts (Mulder, Gulikers, Biemans, & Wesselink, 2009). Competence is regarded as an integrated set of knowledge, skills, and attitudes. It is regarded as a necessary condition for task performance and for being able to function effectively in a certain situation. [...] combines dispositional and behavioral approaches by partly taking place in school – delivering traditional knowledge and completed with theoretical examinations – and partly at the workplace – delivering practical experience in an occupation and completed with examinations in which students are supposed to master real-world challenges.

Importantly, all three fields are heavily affected by the debate about the theory-practice gap between schools or universities and the workplace. A major research focus is therefore on competence-based school and university education, which has become increasingly popular in Western Europe. Instead of following a disciplinary curriculum, defining cognitive outcomes related to typical situations in an occupation and examining them in a performance-based way is regarded as a promising way to raise the quality of the workforce (Handley, 2003). After some euphoria, implementation turned out to be more difficult, and less related to higher quality, than expected (Eraut, 2003).

Methodological Framework: Assessing Competence

Assessments developed to measure competence, by nature, have to differ from traditional knowledge tests (Bennett, 1993; Birenbaum, 2007). For example, frequent or central real-world situations typical for performance demands in a domain play a crucial role, either for determining constituents of competence or as validity criteria. Thus, the sampling of these situations is crucial, and their representativeness for the universe of tasks has to be ensured (Shavelson, 2012). Moreover, whereas reliability and (construct) validity as classical criteria of test quality remain important, the range of quality criteria has been expanded to address specific characteristics of competence assessments such as authenticity, fairness, transparency, consequences for student achievement and motivation, and cost efficiency (Kane, 2013; Messick, 1995). These requirements impose challenges for competence assessments that currently are often given too limited attention.

Challenges and Issues

The analytic view of competence assessment focuses on measuring different latent traits (cognitive, conative, affective, motivational) with different instruments. Assessing the resources one by one has the advantage that it identifies specific preconditions for performing well in real life. The approach also has the advantage of diagnostic accuracy, because what is measured within reasonable time and cost constraints by a particular scale is a constituent of the broader competence, thereby pinpointing particular
tion. Shavelson (2010) presented a definition of compe- strengths and limitations. Because such measures include
tence in VET from the holistic behavioral perspective; large numbers of observations, the approach often leads
included in the assessment are also probes of knowledge to high reliability. Nevertheless, serious validity concerns
and skills though. The German dual VET system again exist, most notably construct underrepresentation.
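The reliability advantage of assessing resources one by one, noted above, can be made concrete with a small sketch. The data below are simulated and purely illustrative (the loading, error scale, sample size, and item counts are our assumptions, not values from any study cited here); Cronbach's alpha stands in for the classical internal-consistency estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "analytic" scale: 200 persons answer k items that are all
# noisy indicators of a single latent trait (illustrative numbers only).
n_persons, k_items = 200, 20
trait = rng.normal(size=(n_persons, 1))
items = 0.6 * trait + rng.normal(scale=0.8, size=(n_persons, k_items))

def cronbach_alpha(x):
    """Classical internal-consistency estimate for a persons-by-items matrix."""
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

print(cronbach_alpha(items))         # 20 narrow items: high reliability
print(cronbach_alpha(items[:, :5]))  # same trait, 5 items: noticeably lower
```

Aggregating many narrow observations is what drives the high reliability of analytic measures; the sketch says nothing, of course, about whether those items represent the full construct, which is exactly the underrepresentation concern raised above.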

Zeitschrift für Psychologie 2015; Vol. 223(1):3–13 © 2015 Hogrefe Publishing
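The idea that items, raters, and situations are samples from a universe of tasks (Shavelson, 2012) is what generalizability theory formalizes. A minimal one-facet G-study can be sketched on simulated persons-by-tasks data; all variance components below are hypothetical, and simple mean-squares estimation is used for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated persons x tasks scores (hypothetical variance components:
# persons 1.0, tasks 0.3, interaction/error 0.9).
n_p, n_t = 150, 6
scores = (rng.normal(0.0, 1.0, (n_p, 1))                 # person effect
          + rng.normal(0.0, np.sqrt(0.3), (1, n_t))      # task difficulty
          + rng.normal(0.0, np.sqrt(0.9), (n_p, n_t)))   # interaction + error

# G-study: estimate variance components from the usual mean squares.
grand = scores.mean()
ms_p = n_t * np.sum((scores.mean(axis=1) - grand) ** 2) / (n_p - 1)
ms_t = n_p * np.sum((scores.mean(axis=0) - grand) ** 2) / (n_t - 1)
ss_total = np.sum((scores - grand) ** 2)
ms_res = (ss_total - (n_p - 1) * ms_p - (n_t - 1) * ms_t) / ((n_p - 1) * (n_t - 1))

var_res = ms_res                # sigma^2(pt,e): person-by-task noise
var_p = (ms_p - ms_res) / n_t   # sigma^2(p): the variance of interest

def g_coefficient(n_tasks):
    """D-study: projected relative generalizability with n_tasks tasks."""
    return var_p / (var_p + var_res / n_tasks)

print(g_coefficient(2), g_coefficient(10))  # more tasks, higher dependability
```

The D-study step mirrors the GT logic discussed under Psychometric Pluralism below: once the error variance due to task sampling is quantified, the effect of administering more (or fewer) tasks can be projected before any new data are collected.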

From the holistic view of competence (performance in complex, messy real-life situations), assessments have been developed to estimate real-life performance without accounting for the contribution of specific dispositional resources. Assuming the whole is greater than the sum of its parts, it is argued that assessing them one by one might distort the actual underlying traits needed for successful performance. The Collegiate Learning Assessment provides a holistic example, sampling tasks from newspapers and other common sources and constructing an assessment around them to tap critical thinking, analytic reasoning, problem solving, and communication (Benjamin, 2013; Shavelson, 2010).

However, there are several challenges in this approach, too (Kane, 1992). The first is that there is a tradeoff between testing time and the number of independent samples of behavior that can be collected. Performance tasks are complex and take considerably more time than selected-response or short-answer tasks. Hence, only a limited sample of behavior can be collected in a given amount of time, which imposes limits on generalizability. A second issue is that assessment of the complex student responses which typically are produced in performance assessments introduces considerable amounts of measurement error, because it is harder to define and assess the quality of responses in complex situations than with respect to clearly defined items. Yet another issue is that different components of extended performance tasks tend to depend on one another, thereby violating the assumption of local independence which is central in most measurement models. This raises questions about how to model the item responses appropriately. In some situations solutions may be found by creating testlets (Wainer, Bradlow, & Wang, 2007), but development of specialized models to deal with this issue may also be needed.

Overcoming Disagreements Due to Oversimplified Dichotomies

Thus both the analytic and the holistic approaches to assessment are afflicted by issues of validity and reliability. These issues need attention in further work on modeling and measuring competence. The issue space is not primarily a matter of dichotomy and choice between the analytic or holistic approaches. Rather, the space involves how the different approaches may be developed and combined in fruitful ways to improve the reliability and validity of competence assessments. This involves many conceptual and empirical questions, and data rather than opinion are needed to inform future measurement methods. Below we discuss future work in three areas which we see as promising for methodological development, namely assessment formats, conceptual frameworks and dimensionality, as well as modeling techniques.

Beyond Dichotomies: Tapping Into a Broader Range of Assessment Formats

One gets the impression that the unproductive dichotomy of dispositions (analytic) versus performance (holistic) in assessments translates into the use of a limited range of assessment formats, with either multiple-choice items or very complex tasks dominating. It is obvious that knowledge and personality tests as well as performance assessments have important functions to fulfill in a competence assessment. The limitation to either-or should be of concern because they each only tap into parts of the construct definition.

Using combinations of approaches, we may also be able to cover the processes mediating the transformation of dispositions into performance. Wass et al. (2001) demonstrated the richness of available formats in building competence measurements in medicine that capture different levels of proximity to real-life situations: Besides multiple-choice and constructed-response items or performance assessments in real life or laboratories, they suggested video-based assessments using representative job situations so that the perception of real-life, that is, unstructured situations can be included. Also the speed of performance, which provides information not available from accuracy alone (Stanovich, 2009), has increasingly been examined with the advent of computer-based testing. Blömeke et al. (2014) developed different assessment formats to capture teacher competence in terms of different knowledge facets as well as perceptual, interpretation, and decision-making skills, including speedy reactions to student errors. And the Comparative Judgment procedure, based on Thurstone's early work, represents an interesting implementation in assessments of authentic Design and Technology tasks (Kimbell, 2006).

This challenges us to make productive, integrative use of performance assessments, traditional discrete items, and other innovative formats in competence measurement. One potential consequence of combining formats, though, is that when selected-response and performance tasks are scaled together, unless specific weights are assigned to performance data, selected-response data may "swamp" the signal provided by the performance tasks. Additional challenges then arise, among them the (multi)trait-(multi)method issue of distinguishing constructs from methods (Campbell & Fiske, 1959). Do differences between the results of analytic and holistic instruments reflect differences in the methods used, or are we talking about different constructs which should consequently be labeled differently? This problem can be thought of as a sampling problem, that is, of defining the sampling frame for constructing performance assessments – does the frame involve sampling of assessment methods in addition to the sampling of items, raters, and test takers?

Beyond Dichotomies: Essential Unidimensionality/Multidimensionality

One of the fundamental challenges in competence assessment is to reduce a large amount of observational complexity into scores which maintain meaningfulness and interpretability. To this end, one of the classic principles upon which measurement is based is the principle of unidimensionality. This principle fundamentally states that the different components (e.g., items, tasks, ratings) of an
assessment should reflect one and the same underlying dimension. It should be noted that the principle of unidimensionality does not imply any requirement that the different components should in themselves be simple; they can, for example, be complex authentic tasks as used in performance assessments (Gustafsson & Åberg-Bengtsson, 2010). However, a strict application of the principle of unidimensionality is rarely possible in competence assessments, because conceptually it typically is not expected (e.g., the analytic approach includes cognition and affect) and empirically it is violated by the presence of method variance and multiple expected dimensions. Such challenges have typically been met by splitting the construct into more narrow sub-constructs, each of which satisfies the assumption of unidimensionality. While such approaches typically are successful in the sense that statistical criteria of unidimensionality are met, the approach in itself is self-defeating because the construct itself is splintered into pieces (Gustafsson, 2002).

An alternative approach is to focus instead on "essential unidimensionality," which preserves the construct while allowing for additional minor dimensions and different sources of method variance. Models for essential unidimensionality can be implemented in different ways, for example with so-called bi-factor models or hierarchical measurement models (Gustafsson & Åberg-Bengtsson, 2010; Reise, 2012). Such models identify a general factor hypothesized to represent the construct while also allowing for minor dimensions. Given that competence dimensions may be assumed to be multidimensional while at the same time a common underlying dimension is expected, this approach may be particularly useful in developing and understanding competence assessments. An extended version of this approach is Multidimensional Item Response Theory (MIRT), which is able to model several latent traits simultaneously and thus provides a promising approach to competence assessments. However, it may be argued that the ideas of essential unidimensionality or multidimensionality still do not solve the fundamental dimensionality issue, because there are limits to how far these approaches may be stretched.

Beyond Dichotomies: Psychometric Pluralism

For way too long, CTT and IRT have been regarded as another allegedly incompatible dichotomy. We see a continuum from linear CTT models to nonlinear IRT models and beyond. Each theory has something to contribute to our understanding of competence measurement with respect to item/task functioning, scalability, reliability, and validity of assessment scores. Of course, different models have been developed to solve different problems, so models should be carefully selected to suit the particular problem at hand.

For example, IRT is useful for forming scales, examining the dimensionality of competence (as pointed out above), estimating persons' scores, and typifying levels of competence to provide criterion-referenced interpretation. IRT makes two important contributions, especially within the context of criterion-referenced testing: First, IRT produces interval-scale measurements and, second, it links individual performance to levels of performance that can be exemplified by items an individual at a particular ability (theta) has some (e.g., .5) probability of performing – anchoring the interpretation of the score in the items and not in rank order. This link between performance on items and scale levels is one of the main approaches for investigating the meaning and characteristics of a scale.

CTT, in particular GT, is in contrast useful for assessing the impact of inconsistencies due to tasks, raters, and their combinations with persons, on the basis of which an optimal assessment design can be set forth. This strength is particularly important in the field of competence assessments because rater effects and temporal instability tend to be large in more complex studies. GT can thus be helpful in estimating the extent of measurement error as a first step and then in estimating the effects of redesigning a study by using more or better-trained raters or more tasks.

For example, Shavelson (2012) suggests an assessment approach based on criterion sampling and shows the close link to GT – a mixed-model sampling theory of measurement. The variance of a score is split up so that the error variance resulting from inconsistencies between raters, task difficulty, and their interactions with each other and with test takers can be partialed out, and only the variance of interest remains. This approach can be extended by taking measurement methods into account, because a particular competence test can be regarded as one instrument out of a broad range of possible instruments.

However, we also believe that the psychometric theories can and should be used in combination, as is sometimes done. For example, GT provides an initial step in that once reliable scores are produced, they can be IRT scaled with a number of different approaches such as partial credit or rater models. Vice versa, generalized linear mixed models and generalized latent variable modeling (e.g., Muthén, 2002; Skrondal & Rabe-Hesketh, 2004) provide ways to analyze typical "GT questions" by explicitly stating hypotheses and testing statistical models, and by offering flexible frameworks in which to deal with measurements from virtually any assessment format, data structures, and a multitude of fixed or random effects (e.g., time, rater, task).

Particular Applications of Interesting Assessment Approaches

Combinations of GT and IRT have been successfully applied and their usefulness demonstrated. Raudenbush, Martinez, Bloom, Zhu, and Lin (2010) integrated GT and IRT in the assessment of group-level quality measures. Characteristics assumed to influence competence development, such as classroom quality or opportunities to learn, can then be measured in a reliable and valid way. Based on quantifying various sources of error, for example rater inconsistency, temporal instability, and item inconsistencies, Raudenbush et al. (2010) developed a six-step paradigm that systematically integrates GT and IRT for the design
of measurements of social settings that minimizes measurement error and thus maximizes statistical power.

We also encourage the use of specialized models to approach specific research questions, such as, for example, the stability-change issue of competence. Performance can be regarded as an interaction of competence (latent abilities and dispositions) and situation. A person has to integrate several cognitive and motivational resources in order to master situational demands. Latent State-Trait Theory (LST) has been developed to deal with this challenge. LST is methodologically similar to the (multi)trait-(multi)method approach. It emphasizes that, besides the person's characteristics, effects of the situation and of the interaction of person and situation also contribute to the variance of a variable (Steyer, Schmitt, & Eid, 1999). Situational aspects can be distinguished into systematic variation of the context, such as teaching different classes, and into similar contexts but differential situational reactions, due for example to working memory or exhaustion (for more details see Eid & Diener, 1999; Jenßen et al., 2015).

Beyond Dichotomies

This paper tried to tidy up the "messy construct," competence, that has been plagued by misleading dichotomies (e.g., analytic vs. holistic, IRT vs. GT, trait vs. behavior). We did not expect to find "the" one definition and statistical model for competence assessment. Rather, by systematically sketching conceptual and statistical controversies and assessment approaches, we attempted to clarify the construct and its measurement.

We unpacked competing competence definitions (analytic/traits vs. holistic/real-world performance) and identified commonplaces. This led to the construction of a framework for moving beyond dichotomies to show how the analytic and holistic approaches complement one another (Figure 1). The measurement of competence, then, may be viewed along a continuum from traits (cognitive, affective, motivational) that underlie the perception, interpretation, and decision-making that give rise to observed behavior in a particular real-world situation. Dichotomies arise because one position looks at only one part of the continuum (e.g., underlying traits) while another position looks at a different part (behavior in the criterion situation). We hope that the proposed integrated perspective moves us beyond dichotomies.

We unpacked competing statistical approaches to modeling competence-assessment scores, namely IRT (latent trait) versus GT (sampling error variance). Once again we viewed these models not as dichotomies but as arrayed along a continuum of linear to nonlinear models. Rather than competing, the various statistical models serve different purposes. IRT models may be used for scaling item responses and modeling structural relations, and GT models for pinpointing sources of measurement error variance and thereby enabling the design of reliable measurements.

Finally, we would like to point out that the proposed framework (Figure 1) is heuristic not only in suggesting multiple new research studies but also in that it can be viewed as a "grand" structural model. The analytic (latent trait) side of the model (left side of Figure 1) includes indicators for cognitive, affective, and motivational traits demanded in particular contexts/situations. Such competencies are structurally related to real-world performance (right side) through a set of perceptual, interpretive, and decision-making processes (middle). Research on competence measurement, then, might work on various parts of the model and even attempt to test the entire model conceptually and statistically.

Viewing competence as a continuum and applying the corresponding range of assessment formats required by the framework is conceptually and methodologically challenging. But we believe that solutions exist or can be developed to deal with these challenges, and we have tried to sketch out possible approaches to trustworthy competence assessments that overcome the risk of forgetting either observable behavior or cognitive abilities. If our reasoning holds, it opens up a great range of research questions.

With the proposed integrated approach and the improvement of the measurement of competence, the field of higher education will be in a position to address important, substantive questions. For example, we should be able to examine the developmental trajectories of competence, identify groups of students with differential developmental patterns, and determine effective educational strategies for development. We should be able to go beyond immediate measurement of behavior in situ to longer-term measurements of life outcomes beyond earnings, including health, family, and civic and social engagement. We should also be able to study the interaction of perception, interpretation, and decision-making in the education and training of students for particular life outcomes.

Higher education is certainly a field with huge research gaps. By providing this overview and by editing this special ZfP issue, we hope to inspire and encourage many colleagues to look into this field and to take up the challenge of what it means to define and assess competence acquired in higher education.

References

Arthur, W., Day, E. A., McNelly, T. L., & Edens, P. S. (2003). A meta-analysis of the criterion-related validity of assessment center dimensions. Personnel Psychology, 56,
Bakker, A. B. (2011). An evidence-based model of work engagement. Current Directions in Psychological Science, 20, 265–269.
Baltes, P. B., Reese, H. W., & Lipsitt, L. P. (1980). Life-span developmental psychology. Annual Review of Psychology, 31, 65–110.
Benjamin, R. (2013). The principles and logic of competency testing in higher education. In S. Blömeke, O. Zlatkin-Troitschanskaia, C. Kuhn, & J. Fege (Eds.), Modeling and measuring competencies in higher education: Tasks and challenges (pp. 127–136). Boston, MA: Sense.
Bennett, Y. (1993). The validity and reliability of assessments and self-assessments of work-based learning. Assessment & Evaluation in Higher Education, 18, 83–94.
Berry, C. M., Clark, M. A., & McClure, T. (2011). Black-white differences in the criterion-related validity of cognitive ability tests: A qualitative and quantitative review. Journal of Applied Psychology, 96, 881–906.
Biemans, H., Nieuwenhuis, L., Poell, R., Mulder, M., & Wesselink, R. (2004). Competence-based VET in The Netherlands: Backgrounds and pitfalls. Journal of Vocational Education and Training, 56, 523–538.
Birenbaum, M. (2007). Evaluating the assessment: Sources of evidence for quality assurance. Studies in Educational Evaluation, 33, 29–49.
Blömeke, S., Busse, A., Suhl, U., Kaiser, G., Benthien, J., Döhrmann, M., & König, J. (2014). Entwicklung von Lehrpersonen in den ersten Berufsjahren: Längsschnittliche Vorhersage von Unterrichtswahrnehmung und Lehrerreaktionen durch Ausbildungsergebnisse [Teacher development during the initial years in the profession: Longitudinal prediction of lesson perception and teacher reactions by means of training results]. Zeitschrift für Erziehungswissenschaft, 17, 509–542.
Blömeke, S., Gustafsson, J.-E., & Shavelson, R. (2013). Call for papers: Assessment of competencies in higher education – a topical issue of the Zeitschrift für Psychologie. Zeitschrift für Psychologie, 221, 202.
Blömeke, S., Zlatkin-Troitschanskaia, O., Kuhn, Ch., & Fege, J. (Eds.). (2013). Modeling and measuring competencies in higher education: Tasks and challenges. Rotterdam, The Netherlands: Sense.
Boyatzis, R. E. (1982). The competent manager. New York, NY: Wiley.
Brennan, R. L. (2001). Generalizability theory. New York, NY: Springer.
Brief, A. P., & Weiss, H. M. (2001). Organizational behavior: Affect in the workplace. Annual Review of Psychology, 53, 279–307.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Cattell, R. B. (1971). Abilities: Their structure, growth, and action. New York, NY: Houghton Mifflin.
Corno, L., Cronbach, L. J., Kupermintz, H., Lohman, D. F., Mandinach, E. B., Porteus, A. W., . . . Talbert, J. E. (2002). Remaking the concept of aptitude: Extending the legacy of R. E. Snow. Mahwah, NJ: Erlbaum.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York, NY: Wiley.
Eid, M., & Diener, E. (1999). Intraindividual variability in affect: Reliability, validity, and personality correlates. Journal of Personality and Social Psychology, 76, 662–676.
Epstein, R. M., & Hundert, E. M. (2002). Defining and assessing professional competence. JAMA, 287, 226–235.
Eraut, M. (2003). National vocational qualifications in England: Description and analysis of an alternative qualification system. In G. Straka (Ed.), Zertifizierung non-formell und informell erworbener beruflicher Kompetenzen [Certification of non-formally and informally acquired vocational competencies]. Münster, Germany: Waxmann.
Förster, M., Zlatkin-Troitschanskaia, O., Brückner, S., Happ, R., Hambleton, R., Walstad, W. B., . . . Yamaoka, M. (2015). Validating test score interpretations by cross-national comparison: Comparing the results of students from Japan and Germany on an American test of economic knowledge in higher education. Zeitschrift für Psychologie, 223, 14–23. doi: 10.1027/2151-2604/a000195
Gold, B., Förster, St., & Holodynski, M. (2013). Evaluation eines videobasierten Trainingsseminars zur Förderung der professionellen Wahrnehmung von Klassenführung im Grundschulunterricht [Evaluation of a video-based training program to enhance professional perception of classroom leadership in primary school education]. Zeitschrift für Pädagogische Psychologie, 27, 141–155.
Grant, G., Elbow, P., & Ewens, T. (1979). On competence: A critical analysis of competence-based reforms in higher education. San Francisco, CA: Jossey-Bass.
Gustafsson, J.-E. (2002). Measurement from a hierarchical point of view. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp. 73–95). London, UK: Erlbaum.
Gustafsson, J.-E., & Åberg-Bengtsson, L. (2010). Unidimensionality and interpretability of psychological instruments. In S. E. Embretson (Ed.), Measuring psychological constructs: Advances in model-based approaches. Washington, DC: American Psychological Association.
Hakanen, J. J., & Schaufeli, W. B. (2012). Do burnout and work engagement predict depressive symptoms and life satisfaction? A three-wave seven-year prospective study. Journal of Affective Disorders, 141, 415–424.
Handley, D. (2003). Assessment of competencies in England's National Vocational Qualification system. In G. Straka (Ed.), Zertifizierung non-formell und informell erworbener beruflicher Kompetenzen [Certification of non-formally and informally acquired vocational competencies]. Münster, Germany: Waxmann.
Jenßen, L., Dunekacke, S., Eid, M., & Blömeke, S. (2015). The relationship of mathematical competence and mathematics anxiety: An application of latent state-trait theory. Zeitschrift für Psychologie, 223, 31–38. doi: 10.1027/2151-2604/a000197
Judge, T. A., Thoresen, C. J., Bono, J. E., & Patton, G. K. (2001). The job satisfaction-job performance relationship: A qualitative and quantitative review. Psychological Bulletin, 127, 376–407.
Kane, M. (1992). An argument-based approach to validation. Psychological Bulletin, 112, 527–535.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1–73.
Kimbell, R. A. (2006). Innovative technological performance. In J. Dakers (Ed.), Defining technological literacy: Towards an epistemological framework (pp. 159–179). Basingstoke, UK: Palgrave.
König, J., Blömeke, S., Klein, P., Suhl, U., Busse, A., & Kaiser, G. (2014). Is teachers' general pedagogical knowledge a premise for noticing and interpreting classroom situations? A video-based assessment approach. Teaching and Teacher Education, 38, 76–88.
Koeppen, K., Hartig, J., Klieme, E., & Leutner, D. (2008). Current issues in competence modeling and assessment. Zeitschrift für Psychologie, 216, 61–73. doi: 10.1027/0044-3409.216.2.61
Kounin, J. S. (1970). Discipline and group management in classrooms. New York, NY: Holt, Rinehart, & Winston.
Lau, S., & Roeser, R. W. (2002). Cognitive abilities and motivational processes in high school students' situational engagement and achievement in science. Educational Assessment, 8, 139–162.
McClelland, D. C. (1973). Testing for competence rather than testing for "intelligence". American Psychologist, 28, 1–14.
McLachlan, G., & Peel, D. A. (2000). Finite mixture models. New York, NY: Wiley.
McMullan, M., Endacott, R., Gray, M. A., Jasper, M., Miller, C. M. L., Scholes, J., & Webb, C. (2003). Portfolios and assessment of competence: A review of the literature. Journal of Advanced Nursing, 41, 283–294.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and
performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Miller, G. E. (1990). The assessment of clinical skills/competence/performance. Academic Medicine: Journal of the Association of American Medical Colleges, 65, 63–67.
Mulder, M., Gulikers, J., Biemans, H., & Wesselink, R. (2009). The new competence concept in higher education: Error or enrichment? Journal of European Industrial Training, 33, 755–770.
Muthén, B. O. (2002). Beyond SEM: General latent variable modeling. Behaviormetrika, 29, 81–117.
Oser, F. (2013). "I know how to do it, but I can't do it": Modeling competence profiles for future teachers and trainers. In S. Blömeke, O. Zlatkin-Troitschanskaia, C. Kuhn, & J. Fege (Eds.), Modeling and measuring competencies in higher education: Tasks and challenges (pp. 45–60). Rotterdam, The Netherlands: Sense.
Raudenbush, S. W., Martinez, A., Bloom, H., Zhu, P., & Lin, F. (2010). Studying the reliability of group-level measures with implications for statistical power: A six-step paradigm [Working paper]. Chicago, IL: University of Chicago.
Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47, 667–696.
Rijmen, F., Tuerlinckx, F., De Boeck, P., & Kuppens, P. (2003). A nonlinear mixed model framework for item response theory. Psychological Methods, 8, 185–205.
Sadler, R. (2013). Making competent judgments of competence. In S. Blömeke, O. Zlatkin-Troitschanskaia, C. Kuhn, & J. Fege (Eds.), Modeling and measuring competencies in higher education: Tasks and challenges (pp. 13–27). Rotterdam, The Netherlands: Sense.
Schoenfeld, A. H. (2010). How we think: A theory of goal-oriented decision making and its educational applications. New York, NY: Routledge.
Shavelson, R. J. (2010). On the measurement of competency. Empirical Research in Vocational Education and Training, 1, 43–65.
Shavelson, R. J. (2012). An approach to testing and modeling competencies. In S. Blömeke, O. Zlatkin-Troitschanskaia, C. Kuhn, & J. Fege (Eds.), Modeling and measuring competencies in higher education: Tasks and challenges. Rotterdam, The Netherlands: Sense.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.
Shulman, L. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57, 1–22.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. London, UK: Chapman and Hall/CRC.
Snow, R. E. (1994). Abilities in academic tasks. In R. J. Sternberg & R. K. Wagner (Eds.), Mind in context: Interactionist perspectives on human intelligence. New York, NY: Cambridge University Press.
Sparrow, P. R., & Bognanno, M. (1993). Competency requirement forecasting: Issues for international selection and assessment. International Journal of Selection and Assessment, 1, 50–58.
Spencer, L. M., Jr., & Spencer, S. M. (1993). Competence at work: Models for superior performance. New York, NY: Wiley.
Stanovich, K. E. (2009). What intelligence tests miss: The psychology of rational thought. New Haven, CT: Yale University Press.
Sternberg, R. J., & Grigorenko, E. L. (Eds.). (2003). The psychology of abilities, competencies, and expertise. Cambridge, MA: Cambridge University Press.
Steyer, R., Schmitt, M., & Eid, M. (1999). Latent state-trait theory and research in personality and individual differences. European Journal of Personality, 13, 389–408.
Stürmer, K., Könings, K. D., & Seidel, T. (2012). Declarative knowledge and professional vision in teacher education: Effect of courses in teaching and learning. British Journal of Educational Psychology, 83, 467–483.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. Cambridge, UK: Cambridge University Press.
Wass, V., Van der Vleuten, C., Shatzer, J., & Jones, R. (2001). Assessment of clinical competence. Lancet, 357, 945–949.
Weinert, F. E. (2001). Concept of competence: A conceptual clarification. In D. S. Rychen & L. H. Salganik (Eds.), Defining and selecting key competencies (pp. 45–66). Göttingen, Germany: Hogrefe.

Sigrid Blömeke
University of Oslo
Faculty of Education
Centre for Educational Measurement (CEMO)
Niels Henrik Abels hus
Moltke Moes vei 35
0318 Oslo
Norway
Tel. +47 464 18755
E-mail sigribl@cemo.uio.no