Gordon Joughin
Editor
Assessment, Learning and Judgement in Higher Education
Editor
Gordon Joughin
University of Wollongong
Centre for Educational Development & Interactive Resources (CEDIR)
Wollongong NSW 2522
Australia
gordonj@uow.edu.au
springer.com
In memory of Peter Knight
Preface
1. Carless, D., Joughin, G., Liu, N-F., & Associates. (2006). How assessment supports learning: Learning-oriented assessment in action. Hong Kong: Hong Kong University Press.
2. Leung, P., & Berry, R. (Eds). (2007). Learning-oriented assessment: Useful practices. Hong Kong: Hong Kong University Press. (Published in Chinese)
higher education. The initial outcome of this was a Special Issue of the journal,
Assessment and Evaluation in Higher Education, one of the leading scholarly
publications in this field (Vol 31, Issue 4: ‘‘Learning-oriented assessment:
Principles and practice’’).
In the final phase of the project, eight experts on assessment and learning in
higher education visited Hong Kong to deliver a series of invited lectures at The
Hong Kong Institute of Education. These experts, Tim Riordan from the USA,
David Boud and Royce Sadler from Australia, Filip Dochy from Belgium, and
Jude Carroll, Kathryn Ecclestone, Ranald Macdonald and Peter Knight from
the UK, also agreed to contribute a chapter to this book, following the themes
established in their lectures. Georgine Loacker subsequently joined Tim
Riordan as a co-author of his chapter, while Linda Suskie agreed to provide
an additional contribution from the USA. This phase of the project saw its
scope expand to include innovative thinking about the nature of judgement in
assessment, a theme particularly addressed in the contributions of Boud, Sadler
and Knight.
The sudden untimely death of Peter Knight in April 2007 was a great shock
to his many colleagues and friends around the world and a great loss to all
concerned with the improvement of assessment in higher education. Peter had
been a prolific and stimulating writer and a generous colleague. His colleague
and sometime collaborator, Mantz Yorke, generously agreed to provide the
chapter which Peter would have written. Mantz’s chapter appropriately draws
strongly on Peter’s work and effectively conveys much of the spirit of Peter’s
thinking, while providing Mantz’s own unique perspective.
My former colleagues at the Hong Kong Institute of Education, David
Carless and Paul Morris, the then President of the Institute, while not directly
involved in the development of this book, provided the fertile ground for its
development through their initial leadership of LOAP. I am also indebted to
Julie Joughin who patiently completed the original preparation and formatting
of the manuscript.
Contributors
Fellow at Otago University, New Zealand in 2007. Ranald’s current research and
development interests include scholarship and identity in academic development,
enquiry-focused learning and teaching, assessment and plagiarism, and the
nature of change in complex organisations. He is a keen orienteer, gardener
and traveller, and combines these with work as much as possible!
Tim Riordan is Associate Vice President for Academic Affairs and Professor
of Philosophy at Alverno College. He has been at Alverno since 1976 and in
addition to his teaching has been heavily involved in developing programs and
processes for teaching improvement and curriculum development. In addition
to his work at the college, he has participated in initiatives on the scholarship
of teaching, including the American Association for Higher Education Forum
on Exemplary Teaching and the Association of American Colleges and
Universities’ Preparing Future Faculty Project. He has regularly presented at
national and international conferences, consulted with a wide variety of
institutions, and written extensively on teaching and learning. He is co-editor
of the 2004 Stylus publication, Disciplines as Frameworks for Student Learning:
Teaching the Practice of the Disciplines. He was named the Marquette
University School of Education Alumnus of the Year in 2002, and he received
the 2001 Virginia B. Smith Leadership Award sponsored by the Council for
Adult and Experiential Learning and the National Center for Public Policy and
Higher Education.
D. Royce Sadler is a Professor of Higher Education at the Griffith Institute for
Higher Education, Griffith University in Brisbane, Australia. He researches the
assessment of student achievement, particularly in higher education. Specific
interests include assessment theory, methodology and policy; university grading
policies and practice; formative assessment; academic achievement standards
and standards-referenced assessment; testing and measurement; and assessment
ethics. His research on formative assessment and the nature of criteria and
achievement standards is widely cited. He engages in consultancies for
Australian and other universities on assessment and related issues. A member
of the Editorial Advisory Boards for the journals Assessment in Education, and
Assessment and Evaluation in Higher Education, he also reviews manuscripts on
assessment for other journals.
Linda Suskie is a Vice President at the Middle States Commission on Higher
Education, an accreditor of colleges and universities in the mid-Atlantic region
of the United States. Prior positions include serving as Director of the American
Association for Higher Education’s Assessment Forum. Her over 30 years of
experience in college and university administration include work in assessment,
institutional research, strategic planning, and quality management.
Ms. Suskie is an internationally recognized speaker, writer, and consultant
on a broad variety of higher education assessment topics. Her latest book is
Assessment of Student Learning: A Common Sense Guide (Jossey-Bass Anker
Series, 2004).
Gordon Joughin
G. Joughin
Centre for Educational Development and Interactive Resources,
University of Wollongong, NSW 2522, Australia
e-mail: gordonj@uow.edu.au
1. Brown, Bull and Pendlebury (1997), for example, list 17.
or discipline for which students are being prepared. Each of these is important,
with each having particular imperatives and each giving rise to particular issues
of conceptualisation and implementation.
We talk and write about assessment because we seek change, yet the process of change is rarely addressed in work on assessment. Projects to improve assessment in universities will typically include references to dissemination or embedding of project outcomes, yet real improvements are often limited to the work of enthusiastic individuals or to pioneering courses with energetic and highly motivated staff. Examples of large scale successful initiatives in assessment are difficult to find. On the other hand, the exemplary assessment processes of Alverno College continue to inspire (and senior Alverno staff have contributed to this book), and the developing conceptualisation of universities as complex adaptive systems promises new ways of understanding and bringing about realistic change across a university system.
Assessment is far from being a technical matter, and the improvement of
assessment requires more than a redevelopment of curricula. Assessment occurs
in the context of complex intra- and interpersonal factors, including teachers’
conceptions of and approaches to teaching; students’ conceptions of and
approaches to learning; students’ and teachers’ past experiences of assessment;
and the varying conceptions of assessment held by teachers and students
alike. Assessment is also impacted by students’ varying motives for entering
higher education and the varying demands and priorities of teachers who
typically juggle research, administration and teaching within increasingly
demanding workloads. Pervading assessment are issues of power and complex social relationships.
1 Introduction: Refocusing Assessment
Re-Focusing Assessment
This book does not propose a radical re-thinking of assessment, but it does call
for a re-focusing on educationally critical aspects of assessment in response to
the problematic issues raised across these themes by the contributors to this
book. The following propositions, which emerge from the book's various chapters, illustrate the nature and scope of this refocusing task:
- Frequently cited research linking assessment and learning is often misunderstood and may not provide the sure foundations for action that are frequently claimed.
- Assessment is principally a matter of judgement, not measurement.
- The locus of assessment should lie beyond the course and in the world of practice.
- Criteria-based approaches can distort judgement; the use of holistic approaches to expert judgement needs to be reconsidered.
- Students need to become responsible for more aspects of their own assessment, from evaluating and improving assignments to making claims for their acquisition of complex generic skills across their whole degree.
- Traditional standards of validity, reliability and fairness break down in the context of new modes of assessment that support learning; new approaches to quality standards for assessment are required.
- Plagiarism is a learning issue; assessment tasks that can be completed without active engagement in learning cannot demonstrate that learning has occurred.
- Students entering higher education come from distinctive and powerful learning and assessment cultures in their prior schools and colleges. These need to be acknowledged and accommodated as students are introduced to the new learning and assessment cultures of higher education.
- Universities are complex adaptive systems and need to be understood as such if change is to be effective. Moreover, large scale change depends on generating consensus on principles rather than prescribing specific practices.
The themes of assessment, learning and judgement noted above, together with the imperatives for change associated with them, are addressed in different ways throughout the book, sometimes directly and sometimes through related issues. Moreover,
these themes are closely intertwined, so that, while some chapters of this book
focus on a single theme, many chapters address several of them, often in quite
complex ways.
In Chapter 2, Joughin provides a short review of four matters that are at the
heart of our understanding of assessment, learning and judgement and which
for various reasons are seen as highly problematic. The first of these concerns
how assessment is often defined in ways that either under-represent or conflate
students to develop a capacity to judge the quality of their work similar to that of their teacher, using this capacity as a basis to improve their work while it is under development. Sadler's support for this parallels Boud's argument regarding the need for practice to inform assessment, namely that this attribute is essential for students to perform adequately in the world of their future work. Sadler builds on his earlier argument that this entails students developing a conception of quality as a generalized attribute, since this is how experts, including experienced teachers, make judgements involving multiple criteria. Consequently he argues here that students need to learn to judge work holistically rather than analytically. This argument is based in part on the limitations of analytic approaches, in part on the nature of expert judgement, and it certainly requires students to see their work as an integrated whole. Fortunately, Sadler not only argues the case for students' learning to make holistic judgements but presents a detailed proposal for how this can be facilitated.
Yorke’s chapter, Faulty Signals? Inadequacies of Grading Systems and a
Possible Response, is a provocative critique of one of the bases of assessment
practices around the world, namely the use of grades to summarise student
achievement in assessment tasks. Yorke raises important questions about the
nature of judgement involved in assessment, the nature of the inferences made
on the basis of this judgement, the distortion which occurs when these inferences come to be expressed in terms of grades, and the limited capacity of grades
to convey to others the complexity of student achievement.
Yorke points to a number of problems with grades, including the basing of grades on students' performance in relation to a selective sampling of the curriculum, the wide disparity across disciplines and universities in types of grading scales and how they are used, the often marked differences in distributions of grades under criterion- and norm-referenced regimes, and the questionable legitimacy of combining grades derived from disparate types of assessment in which marks are wrongly treated as lying on an interval scale. Grading becomes even more problematic when what Knight (2007a, b) termed wicked competencies are involved – when real-life problems are the focus of assessment and generic abilities or broad-based learning outcomes are being demonstrated. Yorke proposes a radical solution – making students responsible for preparing a claim for their award based on evidence of their achievements from a variety of sources, including achievement in various curricular components, work placements and other learning experiences. Such a process would inevitably make students active players in learning to judge their own achievements and regulate their own learning accordingly.
When assessment practices are designed to promote learning, certain
consequences ensue. Dochy, Carroll and Suskie examine three of these.
Dochy, in The Edumetric Quality of New Modes of Assessment: Some Issues
and Prospects, notes that when assessment focuses on learning and moves away
from a measurement paradigm, the criteria used to evaluate the quality of the
assessment practices need to change. Carroll links learning directly to the need
to design assessment that will deter plagiarism. Suskie describes how the careful
References
American Educational Research Association, the American Psychological Association, & the
National Council on Measurement in Education (1999). Standards for Educational and
Psychological Testing. Washington DC: American Educational Research Association.
Angelo, T. (1999). Doing assessment as if learning matters most. AAHE Bulletin, 51(9), 3–6.
Boud, D. (2007). Reframing assessment as if learning were important. In D. Boud &
N. Falchikov (Eds), Rethinking assessment for higher education: Learning for the longer
term (pp. 14–25). London: Routledge.
Brown, G., Bull, J., & Pendlebury, M. (1997). Assessing student learning in higher education.
London: Routledge.
Carless, D., Joughin, G., Liu, N.F., & Associates (2006). How assessment supports learning:
Learning-oriented assessment in action. Hong Kong: Hong Kong University Press.
Gibbs, G., & Simpson, C. (2004–5). Conditions under which assessment supports students’
learning. Learning and Teaching in Higher Education, 1, 3–31.
Gordon Joughin
Introduction
G. Joughin
Centre for Educational Development and Interactive Resources,
University of Wollongong, NSW 2522, Australia
e-mail: gordonj@uow.edu.au
feature of higher education’’ (Boud, 2006, p. xvii), while Falchikov and Boud’s
recent work on assessment and emotion has provided graphic descriptions of the
impact of prior assessment experiences on a group of teachers completing a
masters degree in adult education (Falchikov & Boud, 2007). In biographical
essays, these students reported few instances of positive experiences of assessment
in their past education, but many examples of bad experiences which had major
impacts on their learning and self-esteem. Teachers in higher education will be
affected not only by their own past experience of assessment as students, but
also by their perceptions of assessment as part of their current teaching role.
Assessment in the latter case is often associated with high marking loads, anxiety-
inducing deadlines as examiners board meetings approach, and the stress of
dealing with disappointed and sometimes irate students.
There is a second difficulty in coming to a clear understanding of assessment.
As Rowntree has noted, ‘‘it is easy to get fixated on the trappings and outcomes of
assessment – the tests and exams, the questions and marking criteria, the grades
and degree results – and lose sight of what lies at the heart of it all’’ (Rowntree,
2007, para. 6). Thus at an individual, course team, department or even university
level, we come to see assessment in terms of our own immediate contexts,
including how we view the purpose of assessment, the roles we and our students
play, institutional requirements, and particular strategies that have become part
of our taken-for-granted practice. Our understanding of assessment, and thus
how we come to define it, may be skewed by our particular contexts.
Space does not permit the exploration of this through a survey of the
literature and usage of the term across universities. Two definitions will suffice
to illustrate the issue and highlight the need to move towards a simple definition
of assessment.
Firstly, an example from a university policy document. The University of
Queensland, Australia, in its statement of assessment policy and practice,
defines assessment thus:
Assessment means work (e.g., examination, assignment, practical, performance) that a
student is required to complete for any one or a combination of the following reasons:
the fulfillment of educational purposes (for example, to motivate learning, to provide
feedback); to provide a basis for an official record of achievement or certification of
competence; and/or to permit grading of the student. (The University of Queensland,
2007)
Here we might note that (a) assessment is equated with the student’s work; (b) there
is no reference to the role of the assessor or to what is done in relation to the work;
and (c) the various purposes of assessment are incorporated into its definition.
Secondly, a definition from a frequently cited text. Rowntree defines assessment thus:
... assessment can be thought of as occurring whenever one person, in some kind of
interaction, direct or indirect, with another, is conscious of obtaining and interpreting
information about the knowledge and understanding, or abilities and attitudes of that
other person. To some extent or other it is an attempt to know that person. (Rowntree,
1987, p. 4)
In this instance we might note the reference to ‘‘interaction’’ with its implied
reciprocity, and that assessment is depicted as the act of one person in relation
to another, thereby unintentionally excluding self-assessment and the student’s
monitoring of the quality of his or her own work.
Both of these definitions have been developed thoughtfully, reflect impor-
tant aspects of the assessment process, and have been rightly influential in their
respective contexts of institutional policy and international scholarship. Each,
however, is problematic as a general definition of assessment, either by going
beyond the meaning of assessment per se, or by not going far enough to specify
the nature of the act of assessment. Like many, perhaps most, definitions of
assessment, they incorporate or omit elements in a way that reflects particular
contextual perspectives, giving rise to the following question: Is it possible to
posit a definition of assessment that encapsulates the essential components
of assessment without introducing superfluous or ancillary concepts? The
remainder of this section attempts to do this.
One difficulty with assessment as a term in educational contexts is that its
usage often departs from how the term is understood in everyday usage.
The Oxford English Dictionary (2002-) is instructive here through its location
of educational assessment within the broader usage of the term. Thus it
defines ‘‘to assess’’ as ‘‘to evaluate (a person or thing); to estimate (the quality,
value, or extent of), to gauge or judge’’ and it defines assessment in education
as ‘‘the process or means of evaluating academic work’’. These two definitions
neatly encompass two principal models of assessment: where assessment is
conceived of quantitatively in terms of ‘‘gauging’’ the ‘‘extent of’’ learning,
assessment follows a measurement model; where it is construed in terms of
‘‘evaluation’’, ‘‘quality’’, and ‘‘judgement’’, it follows a judgement model.
Hager and Butler (1996) have explicated the distinctions between these paradigms very clearly (see also Boud in Chapter 3). They describe a scientific
measurement model in which knowledge is seen as objective and context-free
and in which assessment tests well-established knowledge that stands
apart from practice. In this measurement model, assessment utilises closed
problems with definite answers. In contrast, the judgement model integrates
theory and practice, sees knowledge as provisional, subjective and context-
dependent, and uses practice-like assessment which includes open problems
with indefinite answers. Knight (2007) more recently highlighted the impor-
tance of the distinction between measurement and judgement by pointing out
the common mistake of applying measurement to achievements that are not,
in an epistemological sense, measurable and noting that different kinds of
judgement are required once we move beyond the simplest forms of knowledge. Boud (2007) has taken the further step of arguing for assessment that not
merely applies judgement to students’ work but serves actively to inform
students’ own judgement of their work, a skill seen to be essential in their
future practice.
Judgement therefore seems to lie at the core of assessment, and its immediate object is a student's work. However, one further step seems
needed. Is assessment merely about particular pieces of work or does the object
of assessment go beyond the work? Two other definitions are instructive.
Firstly, in the highly influential work of the Committee on the Foundations
of Assessment, Knowing What Students Know: The Science and Design of
Educational Assessment, assessment is defined as ‘‘a process by which educators
use students’ responses to specially created or naturally occurring stimuli to
draw inferences about the students’ knowledge and skills’’ (Committee on the
Foundations of Assessment, 2001, p. 20).
Secondly, Sadler (in a private communication) has incorporated these
elements in a simple, three-stage definition: ‘‘The act of assessment consists of
appraising the quality of what students have done in response to a set task so
that we can infer what students can do,¹ from which we can draw an inference
about what students know.’’
From these definitions, the irreducible core of assessment can be limited to
(a) students’ work, (b) judgements about the quality of this work, and
(c) inferences drawn from this about what students know. Judgement and
inference are thus at the core of assessment, leading to this simple definition:
To assess is to make judgements about students’ work, inferring from this what
they have the capacity to do in the assessed domain, and thus what they know,
value, or are capable of doing. This definition does not assume the purpose(s) of
assessment, who assesses, when assessment occurs or how it is done. It does,
however, provide a basis for considering these matters clearly and aids the
discussion of the relationship between assessment and learning.
¹ Since what they have done is just one of many possible responses, and the task itself was also just one of many possible tasks that could have been set.
The belief that students focus their study on what they believe will be assessed is
embedded in the literature of teaching and learning in higher education. Derek
Rowntree begins his frequently cited text on assessment in higher education by
stating that ‘‘if we wish to discover the truth about an educational system, we
must look into its assessment procedures’’ (Rowntree, 1977, p. 1). This dictum
has been echoed by some of the most prominent writers on assessment in recent
times. Ramsden, for example, states that ‘‘from our students’ point of view,
assessment always defines the actual curriculum’’ (Ramsden, 2003, p. 182);
Biggs notes that ‘‘students learn what they think they will be tested on’’
(Biggs, 2003, p. 182); Gibbs asserts that ‘‘assessment frames learning, creates
learning activity and orients all aspects of learning behaviour’’ (Gibbs, 2006,
p. 23); and Bryan and Clegg begin their work on innovative assessment in higher
education with the unqualified statement that ‘‘research of the last twenty years
provides evidence that students adopt strategic, cue-seeking tactics in relation
to assessed work’’ (Bryan & Clegg, 2006, p. 1).
Three works from the late 1960s and early 1970s are regularly cited to
support this view that assessment determines the direction of student learning:
Miller and Parlett’s Up to the Mark: A Study of the Examination Game (1974),
Snyder’s The Hidden Curriculum (1971) and Becker, Geer and Hughes’ Making
the Grade (1968; 1995). So influential has this view become, and so frequently
are these works cited to support it, that a revisiting of these studies seems
essential if we are to understand this position correctly.
Up to the Mark: A Study of the Examination Game is well known for
introducing the terms cue-conscious and cue-seeking into the higher education
vocabulary. Cue-conscious students, in the study reported in this book,
‘‘talked about a need to be perceptive and receptive to ‘cues’ sent out by
staff – things like picking up hints about exam topics, noticing which aspects
of the subject the staff favoured, noticing whether they were making a good
impression in a tutorial and so on’’ (Miller & Parlett, 1974, p. 52). Cue-seekers
took a more active approach, seeking out staff about exam questions and
discovering the interests of their oral examiners. The third group in this study,
termed cue-deaf, simply worked hard to succeed without seeking hints on
examinations. While the terms ‘‘cue-conscious’’ and ‘‘cue-seeking’’ have
entered our collective consciousness, we may be less aware of other aspects
of Miller and Parlett’s study. Firstly, the study was based on a very particular kind of student and context – final year honours students in a single department of a Scottish university. Secondly, the study’s sample was small, comprising just 30 students. Thirdly, and perhaps most importantly, the study’s quantitative findings do not support the conclusion that students’ learning is dominated by assessment: only five of the sample were categorized as cue-seekers, 11 were cue-conscious, while just under half (14) of the sample, and therefore constituting the largest group of students, were cue-deaf.
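A quick back-of-the-envelope tally makes the distribution plain (a sketch only; the category counts and labels are those reported from the study above):

```python
# Miller and Parlett's reported categories (n = 30, final-year honours students)
counts = {"cue-seekers": 5, "cue-conscious": 11, "cue-deaf": 14}

total = sum(counts.values())  # 30 students in all
# Each group's share of the sample, as a rounded percentage
shares = {group: round(100 * n / total) for group, n in counts.items()}

# The cue-deaf group, not the strategic cue-seekers, is the largest
largest = max(counts, key=counts.get)
```

On these figures cue-seekers make up roughly 17% of the sample, cue-conscious students 37%, and cue-deaf students 47%, which is why the study sits uneasily as evidence that assessment dominates all students' learning.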
One point of departure for Miller and Parlett’s study was Snyder’s equally
influential study, The Hidden Curriculum, based on first-year students at
the Massachusetts Institute of Technology and Wellesley College. Snyder
concluded that students were dominated by the desire to achieve high grades,
and that ‘‘each student figures out what is actually expected as opposed to
what is formally required’’ (Snyder, 1971, p. 9). In short, Snyder concluded
that all students in his study were cue-seekers. Given Miller and Parlett’s
finding that only some students were cue-seekers while more were cue-conscious and even more were cue-deaf, Snyder’s conclusion that all students
adopted the same stance regarding assessment seems unlikely. The credibility
of this conclusion also suffers in light of what we now know about variation in
how students can view and respond to the same context, making the singular
pursuit of grades as a universal phenomenon unconvincing. Snyder’s methodology is not articulated, but it does not seem to have included a search for
counter-examples.
The third classic work in this vein is Making the Grade (Becker, Geer &
Hughes, 1968; 1995), a participant observation study of students at the
University of Kansas. The authors described student life as characterized by ‘‘the grade point perspective’’, according to which ‘‘grades are the major institutionalized valuable of the campus’’ (1995, p. 55) and thus the focus of attention for almost all students. Unlike Snyder, these researchers did seek contrary
evidence, though with little success – only a small minority of students ignored
the grade point perspective and valued learning for its own sake. However, the
authors are adamant that this perspective represents an institutionalization
process characteristic of the particular university studied and that they would
not expect all campuses to be like it. Moreover, they state that ‘‘on a campus
where something other than grades was the major institutionalized form of
value, we would not expect the GPA perspective to exist at all’’ (Becker, Geer &
Hughes, 1995, p. 122).
Let us be clear ... about what we mean. We do not mean that all students invariably
undertake the actions we have described, or that students never have any other motive
than getting good grades. We particularly do not mean that the perspective is the only
possible way for students to deal with the academic side of campus life. (Becker, Geer &
Hughes, 1995, p. 121)
Yet many who have cited this study have ignored this emphatic qualification by
its authors, preferring to cite this work in support of the proposition that
students in all contexts, at all times, are driven in their academic endeavours
by assessment tasks and the desire to achieve well in them.
When Gibbs (2006, p. 23) states that ‘‘students are strategic as never before’’,
we should be aware (as Gibbs himself is) that students could hardly be more strategic than Miller and Parlett’s cue-seekers, Snyder’s freshmen in hot pursuit
of grades, or Becker, Geer and Hughes’ students seeking to optimize their
GPAs. However, we must also be acutely aware that we do not know the extent
of this behaviour, the forms it may take in different contexts, or the extent to
which these findings can be applied across cultures, disciplines and the thirty or
more years since they were first articulated.
One strand of research has not simply accepted the studies cited in the
previous section but has sought to extend this research through detailed
empirical investigation. This work has been reported by Gibbs and his associates in relation to what they have termed ‘‘the conditions under which assessment supports student learning’’ (Gibbs & Simpson, 2004). The Assessment
Experience Questionnaire (AEQ) developed as part of their research is
designed to measure, amongst other things, the consistency of student effort
over a semester and whether assessment serves to focus students’ attention on
particular parts of the syllabus (Gibbs & Simpson, 2003). Initial testing of the
AEQ (Dunbar-Goddett, Gibbs, Law & Rust, 2006; Gibbs, 2006) supports the
proposition that assessment does influence students’ distribution of effort and
their coverage of the syllabus, though the strength of this influence and
whether it applies to some students more than to others is unclear. One
research study, using a Chinese version of the AEQ and involving 108 education students in Hong Kong, produced contradictory findings, leaving it unclear whether students tended to believe that assessment allowed them to be selective in what they studied or required them to cover the entire syllabus
(Joughin, 2006). For example, while 58% of the students surveyed agreed or
strongly agreed with the statement that ‘‘it was possible to be quite strategic
about which topics you could afford not to study’’, 62% agreed or strongly
agreed with the apparently contradictory position that ‘‘you had to study the
entire syllabus to do well in the assessment’’.
While the AEQ-based research and associated studies are still embryonic and
not widely reported, they do promise to offer useful insights into how students’
patterns and foci of study develop in light of their assessment. At present,
however, findings are equivocal and do not permit us to make blanket state-
ments – the influence of assessment on students’ study patterns and foci may
well vary significantly from student to student.
learning is to change the assessment system’’ (Elton & Laurillard, 1979, p. 100).
Boud, as noted previously, took a more circumspect view, stating that ‘‘Assess-
ment activities . . . influence approaches to learning that students take’’ (Boud,
2006, p. 21, emphasis added), rather than determining such approaches. How-
ever, it remains a widespread view that the process of student learning can be
powerfully and positively influenced by assessment. A review of the evidence is
called for.
The tenor of the case in favour of assessment directing students’ learning
processes was set in an early study by Terry (1933), who compared how students
studied for essays and for ‘‘objective tests’’ (including true/false, multiple choice,
completion and simple recall tests). He found that students preparing for
objective tests tended to focus on small units of content – words, phrases and
sentences – thereby risking what he termed ‘‘the vice of shallowness or
superficiality’’ (Terry, 1933, p. 597). On the other hand, students preparing for essay-
based tests emphasized the importance of focusing on large units of content –
main ideas, summaries and related ideas. He noted that while some students
discriminated between test types in their preparation, others reported no
difference in how they prepared for essay and objective tests.
Meyer (1934) compared what students remembered after studying for an
essay-type test and for an examination which included multiple-choice, true/
false and completion questions, concluding that the former led to more com-
plete mastery and that other forms of testing should be used only in exceptional
circumstances. He subsequently asked these students to describe how they
studied for the type of exam they were expecting, and how this differed from
how they would have studied for a different type of exam. He concluded that
students expecting an essay exam attempted to develop a general view of the
material, while students expecting the other types of exam focused on detail.
Students certainly reported being selective in how they studied, rather than
simply following a particular way of studying regardless of assessment format
(Meyer, 1935).
The conclusion to which these two studies inevitably lead is summarized by
Meyer as follows:
The kind of test to be given, if the students know it in advance, determines in large
measure both what and how they study. The behaviour of students in this habitual way
places greater powers in the teacher’s hands than many realize. By the selection of
suitable types of tests, the teacher can cause large numbers of his students to study, to a
considerable extent at least, in the ways he deems best for a given unit of subject-matter.
Whether he interests himself in the question or not, most of his students will probably
use the methods of study which they consider best adapted to the particular types of
tests customarily employed. (Meyer, 1934, pp. 642–643)
of being able to design assessment tasks that will induce students to adopt deep
approaches to learning. Thirty years of studies have yielded equivocal results.
Early studies by Marton and Säljö (and summarized in Marton & Säljö,
1997) sought to see if students’ approaches to learning could be changed by
manipulating assessment tasks. In one study, Marton sought to induce a deep
approach by getting students to respond to questions embedded within a text
they were reading and which invited them to engage in the sort of internal
dialogue associated with a deep approach. The result was the opposite, with
students adapting to the task by responding to the questions without engaging
deeply with them. In another study, different groups of students were asked
different kinds of questions after reading the same text. The set of questions for
one group of students focused on facts and listing ideas while those for another
group focused on reasoning. The first group adopted a surface approach – the
outcome that was expected. In the second group, however, approximately half
of the sample interpreted the task as intended while the other half technified the
task, responding in a superficial way that they believed would yet meet the
requirements. Marton and Säljö’s conclusion more than 20 years after their
original experiments is a sobering one:
It is obviously quite easy to induce a surface approach and enhance the tendency to take a
reproductive attitude when learning from texts. However, when attempting to induce a
deep approach the difficulties seem quite profound. (Marton & Säljö, 1997, p. 53)
More than 30 years after these experiments, Struyven, Dochy, Janssens and
Gielen (in press) have come to the same conclusion. In a quantitative study of
790 first year education students subjected to different assessment methods
(and, admittedly, different teaching methods), they concluded that ‘‘students’
approaches to learning were not deepened as expected by the student-activating
teaching/learning environment, nor by the new assessment methods such as
case based evaluation, peer and portfolio assessment’’ (pp. 9–10). They saw this
large-scale study, based in genuine teaching contexts, as confirming the experi-
mental findings of Marton and Säljö (1997) and concluded that ‘‘although it
seems relatively easy to influence the approach students adopt when learning, it
also appears very difficult’’ (p. 13).
One strand of studies appears to challenge this finding, or at least to make us
think twice about it. Several comparative studies, in contexts of authentic
learning in actual subjects, have considered the proposition that different
forms of assessment can lead to different kinds of studying. Silvey (1951)
compared essay tests and objective tests, finding that students studied general
principles for the former and focused on details for the latter. Scouller (1998)
found that students were more likely to employ surface strategies when
preparing for a multiple choice test than when preparing for an assignment
essay. Tang (1992, 1994) found students applied ‘‘low level’’ strategies such as
memorization when preparing for a short answer test but both low level and
high level strategies were found amongst students preparing for the assign-
ments. Thomas and Bain (1982) compared essays and objective tests, finding
own work. Feedback is at the centre of both of these processes and has received
considerable attention in the assessment literature over the past two decades.
There is widespread agreement that effective feedback is central to learning.
Feedback figures prominently in innumerable theories of effective teaching.
Ramsden (2003) includes appropriate assessment and feedback as one of his six
key principles of effective teaching, noting strong research evidence that the quality of
feedback is the most salient factor in differentiating between the best and worst
courses. Using feedback (whether extrinsic or intrinsic) and reflecting on the
goals-action-feedback process are central to the frequently cited ‘‘conversa-
tional framework’’ of learning as presented by Laurillard (2002). Rowntree
(1987, p. 24) refers to feedback as ‘‘the lifeblood of learning’’. ‘‘Providing feed-
back about performance’’ constitutes one of Gagne’s equally well known con-
ditions of learning (Gagne, 1985).
Black and Wiliam, in their definitive meta-analysis of assessment and class-
room learning research (Black & Wiliam, 1998), clearly established that feed-
back can have a powerful effect on learning, though noting that this effect can
sometimes be negative and that positive effects depend on the quality of the
feedback. Importantly, they followed Sadler’s definition of feedback (Sadler,
1989), noting that feedback only serves a formative function when it indicates
how the gap between actual and desired levels of performance can be bridged
and leads to some closure of this gap. Their study suggested a number of aspects
of feedback associated with learning, including feedback that focuses on the
task and not the student. While their study focused on school level studies along
with a small number of tertiary level studies, their findings have been widely
accepted within the higher education literature.
The work of Gibbs and Simpson noted earlier is located exclusively in the
context of higher education and nominates seven conditions required for feed-
back to be effective, including its quantity and timing; its quality in focusing on
learning, being linked to the assignment criteria, and being able to be under-
stood by students; and the requirement that students take notice of the feedback
and act on it to improve their work and learning (Gibbs & Simpson, 2004).
Nicol and Macfarlane-Dick (2006) also posit seven principles of feedback,
highlighting feedback as a moderately complex learning process centred on
self-regulation. Based on the work of Sadler, Black and Wiliam and others,
their principles include encouraging teacher and peer dialogue and self-esteem,
along with the expected notions of clarifying the nature of good performance,
self-assessment, and opportunities to close the gap between current and desired
performance.
Most recently, Hounsell and his colleagues (Hounsell, McCune, Hounsell, &
Litjens, 2008) have proposed a six-step guidance and feedback loop, based on
surveys and interviews of students in first and final year bioscience courses. The
loop begins with students’ prior experiences of assessment, moves through
preliminary guidance and ongoing clarifications through feedback on perfor-
mance to supplementary support and feed-forward as enhanced understanding
is applied in subsequent work.
If theory attests to the critical role of feedback in learning and recent work
has suggested principles for ensuring that this role is made effective, what does
empirical research tell us about actual educational practice? Four studies
suggest that the provision of feedback in higher education is problematic.
Glover and Brown (2006), in a small interview study of science students,
reported that students attended to feedback but often did not act on it, usually
because feedback was specific to the topic covered and was not relevant to
forthcoming work. The authors subsequently analysed feedback provided on
147 written assignments, noting the relative absence of ‘‘feed-forward’’, or
suggestions on how to improve, and an excessive attention to grammar. Chanock
(2000) highlighted a more basic problem – that students often simply do not
understand their tutors’ comments, a point supported by Ivanič, Clark and
Rimmershaw (2000) in their evocatively titled paper, ‘‘What am I supposed to
make of this? The messages conveyed to students by tutors’ written comments’’.
Higgins, Hartley and Skelton (2002), in an interview and survey study in
business and humanities, found that 82% of students surveyed agreed that
they paid close attention to feedback, while 80% disagreed with the statement
that ‘‘Feedback comments are not that useful’’, though this study did not report
if students actually utilised feedback in further work. Like the students in
Hyland’s study (Hyland, 2000) these students may have appreciated the feed-
back they received without actually using it.
It appears from the literature and research cited in this section that the
conceptualisation of feedback may be considerably in advance of its application,
with three kinds of problems being evident: problems that arise from the com-
plexity of feedback as a learning process; ‘‘structural problems’’ related to the
timing of feedback, its focus and quality; and what Higgins, Hartley and Skelton
(2001, p. 273) refer to as ‘‘issues of power, identity, emotion, discourse and
subjectivity’’. Clearly, assertions about the importance of feedback to learning
stand in contrast to the findings of empirical research into students’ experience
of assessment, raising questions regarding both the theoretical assumptions
about the centrality of feedback to learning and the frequent failure to bring
feedback effectively into play as part of teaching and learning processes.
Conclusion
The literature on assessment and learning is beset with difficulties. Four of these
have been noted in this chapter, beginning with problems associated with
conflated definitions of assessment. Of greater concern is the reliance by
many writers on foundational research that is not fully understood and fre-
quently misinterpreted. Certainly the assertions that assessment drives learning
and that students’ approaches to learning can be improved simply by changing
assessment methods must be treated cautiously in light of the nuanced research
which is often associated with these claims. Finally, the failure of feedback in
Acknowledgment I am grateful to Royce Sadler for his critical reading of the first draft of this
chapter and his insightful suggestions for its improvement.
References
Becker, H. S., Geer, B., & Hughes, E. C. (1968/1995). Making the grade: The academic side of
college life. New Brunswick: Transaction.
Biggs, J. B. (2003). Teaching for quality learning at university (2nd ed.). Maidenhead: Open
University Press.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education,
5(1), 7–74.
Boud, D. (2006). Foreword. In C. Bryan & K. Clegg (Eds.), Innovative assessment in higher
education (pp. xvii–xix). London and New York: Routledge.
Boud, D. (2007). Reframing assessment as if learning were important. In D. Boud &
N. Falchikov (Eds.), Rethinking assessment in higher education: Learning for the longer
term (pp. 14–25). London and New York: Routledge.
Bryan, C., & Clegg, K. (2006). Introduction. In C. Bryan & K. Clegg (Eds.), Innovative
assessment in higher education (pp. 1–7). London and New York: Routledge.
Carless, D., Joughin, G., Liu, N-F., & Associates (2006). How assessment supports learning:
Learning-oriented assessment in action. Hong Kong: Hong Kong University Press.
Chanock, K. (2000). ‘Comments on essays: Do students understand what tutors write?’
Teaching in Higher Education, 5(1), 95–105.
Committee on the Foundations of Assessment; Pellegrino, J.W., Chudowsky, N., & Glaser, R.,
(Eds.). (2001). Knowing what students know: The science and design of educational assessment.
Washington: National Academy Press.
Dunbar-Goddett, H., Gibbs, G., Law, S., & Rust, C. (2006, August–September). A methodology
for evaluating the effects of programme assessment environments on student learning. Paper
presented at the Third Biennial Joint Northumbria/EARLI SIG Assessment Conference,
Northumbria University.
Elton, L., & Laurillard, D. M. (1979). Trends in research on student learning. Studies in
Higher Education, 4, 87–102.
Falchikov, N., & Boud, D. (2007). Assessment and emotion: the impact of being assessed. In
D. Boud, & N. Falchikov, (Eds.), Rethinking Assessment for Higher Education: Learning
for the Longer Term (pp. 144–155). London: Routledge.
Gagne, R. M. (1985). The conditions of learning and theory of instruction. New York: CBS
College Publishing.
Gibbs, G. (2006). How assessment frames student learning. In C. Bryan & K. Clegg (Eds.),
Innovative assessment in higher education (pp. 23–36). London: Routledge.
Gibbs, G., & Simpson, C. (2003, September). Measuring the response of students to assessment:
The Assessment Experience Questionnaire. Paper presented at the 11th International
Improving Student Learning Symposium, Hinckley.
Gibbs, G., & Simpson, C. (2004). Conditions under which assessment supports students’
learning. Learning and Teaching in Higher Education, 1, 3–31.
Glover, C., & Brown, E. (2006). Written feedback for students: Too much, too detailed or too
incomprehensible to be effective? Bioscience Education ejournal, 7. Retrieved 5 November,
2007, from http://www.bioscience.heacademy.ac.uk/journal/vol7/beej-7-3.htm
Hager, P., & Butler, J. (1996). Two models of educational assessment. Assessment and
Evaluation in Higher Education, 21(4), 367–378.
Haggis, T. (2003). Constructing images of ourselves? A critical investigation into ‘approaches
to learning’ research in higher education. British Educational Research Journal, 29(1),
89–104.
Higgins, R., Hartley, P., & Skelton, A. (2001). Getting the message across: The problem of
communicating assessment feedback. Teaching in Higher Education, 6(2), 269–274.
Higgins, R., Hartley, P., & Skelton, A. (2002). The conscientious consumer: Reconsidering
the role of assessment feedback in student learning. Studies in Higher Education, 27(1),
53–64.
Hounsell, D., McCune, V., Hounsell, J., & Litjens, J. (2008). The quality of guidance and
feedback to students. Higher Education Research and Development, 27(1), 55–67.
Hyland, P. (2000). Learning from feedback on assessment. In P. Hyland & A. Booth (Eds.),
The practice of university history teaching (pp. 233–247). Manchester, UK: Manchester
University Press.
Ivanič, R., Clark, R., & Rimmershaw, R. (2000). What am I supposed to make of this? The
messages conveyed to students by tutors’ written comments. In M. R. Lea & B. Stierer
(Eds.), Student writing in higher education: New contexts (pp. 47–65). Buckingham, UK:
SRHE & Open University Press.
Joughin, G. (2006, August–September). Students’ experience of assessment in Hong Kong
higher education: Some cultural considerations. Paper presented at the Third Biennial
Joint Northumbria/EARLI SIG Assessment Conference, Northumbria University.
Joughin, G. (2007). Student conceptions of oral presentations. Studies in Higher Education,
32(3), 323–336.
Knight, P. (2007). Grading, classifying and future learning. In D. Boud & N. Falchikov (Eds.),
Rethinking assessment in higher education (pp. 72–86). Abingdon and New York:
Routledge.
Laurillard, D. (2002). Rethinking university teaching (2nd ed.). London: Routledge.
Marton, F., & Säljö, R. (1997). Approaches to learning. In F. Marton, D. Hounsell, &
N. Entwistle (Eds.), The experience of learning (2nd ed., pp. 39–58). Edinburgh: Scottish
Academic Press.
Meyer, G. (1934). An experimental study of the old and new types of examination: 1. The
effect of the examination set on memory. The Journal of Educational Psychology, 25,
641–661.
Meyer, G. (1935). An experimental study of the old and new types of examination: II.
Methods of study. The Journal of Educational Psychology, 26, 30–40.
Miller, C. M. L., & Parlett, M. (1974). Up to the mark: A study of the examination game.
London: Society for Research into Higher Education.
Nicol, D. J., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learn-
ing: A model and seven principles of good feedback practice. Studies in Higher Education,
31(2), 199–218.
Nightingale, P., & O’Neil, M. (1994). Achieving quality in learning in higher education.
London: Kogan Page.
Nightingale, P., Wiata, I. T., Toohey, S., Ryan, G., Hughes, C., & Magin, D. (1996). Assessing
learning in universities. Sydney, Australia: University of New South Wales Press.
Oxford English Dictionary [electronic resource] (2002-). Oxford & New York: Oxford
University Press, updated quarterly.
Ramsden, P. (1997). The context of learning in academic departments. In F. Marton,
D. Hounsell, & N. Entwistle (Eds.), The experience of learning (2nd ed., pp. 198–216).
Edinburgh: Scottish Academic Press.
Ramsden, P. (2003). Learning to teach in higher education (2nd ed.). London: Routledge.
Rowntree, D. (1977). Assessing students: How shall we know them? (1st ed.). London: Kogan
Page.
Rowntree, D. (1987). Assessing students: How shall we know them? (2nd ed.). London: Kogan
Page.
Rowntree, D. (2007). Designing an assessment system. Retrieved 5 November, 2007, from
http://iet.open.ac.uk/pp/D.G.F.Rowntree/Assessment.html
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instruc-
tional Science, 18, 119–144.
Sambell, K., & McDowell, L. (1998). The construction of the hidden curriculum. Assessment
and Evaluation in Higher Education, 23(4), 391–402.
Scouller, K. (1998). The influence of assessment method on students’ learning approaches:
Multiple choice question examination versus assignment essay. Higher Education, 35,
453–472.
Silvey, G. (1951). Student reaction to the objective and essay test. School and Society, 73,
377–378.
Snyder, B. R. (1971). The hidden curriculum. New York: Knopf.
Struyven, K., Dochy, F., Janssens, S., & Gielen, S. (in press). On the dynamics of students’
approaches to learning: The effects of the teaching/learning environment. Learning and
Instruction.
Tang, K. C. C. (1992). Perception of task demand, strategy attributions and student learning.
Eighteenth Annual Conference of the Higher Education Research and Development
Society of Australasia, Monash University Gippsland Campus, Churchill, Victoria,
474–480.
Tang, K. C. C. (1994). Effects of modes of assessment on students’ preparation strategies. In
G. Gibbs (Ed.), Improving student learning: Theory and practice (pp. 151–170). Oxford,
England: Oxford Centre for Staff Development.
Terry, P. W. (1933). How students review for objective and essay tests. The Elementary School
Journal, April, 592–603.
The University of Queensland. (2007). Assessment Policy and Practices. Retrieved 5 November,
2007, from http://www.uq.edu.au/hupp/index.html?page=25109
Thomas, P. R., & Bain, J. D. (1982). Consistency in learning strategies. Higher Education, 11,
249–259.
Thomas, P. R., & Bain, J. D. (1984). Contextual dependence of learning approaches: The
effects of assessments. Human Learning, 3, 227–240.
Chapter 3
How Can Practice Reshape Assessment?
David Boud
Introduction
D. Boud
University Graduate School, University of Technology, Sydney, Australia
e-mail: David.Boud@uts.edu.au
in scientific knowledge traditions and broadly describes the views that underpin
common traditional programs. It is instrumental and based on means-end
rationalities. Practice is seen as an array of techniques that can be changed,
improved or learned independently of the ‘‘contingent and temporal circum-
stances’’ (p. 316) in which practices are embedded. The kind of knowledge
generated about practice ought to be ‘‘explicit, general, universal and systematic’’
(p. 318). To achieve this, such knowledge must by definition eliminate the
inherent complexity of the everyday thinking that actually occurs in practice.
The second, Model 2, draws from practical knowledge traditions. In it practice is
a ‘‘purposeful, variable engagement with the world’’ (p. 321). Practices are fluid,
changeable and dynamic, characterised by their ‘alterability, indeterminacy and
particularity’ (p. 322). What is important is the specific situation in which
particular instances of practice occur and hence the context-relativity of prac-
tical knowledge. In this model, knowledge must be a flexible concept, capable of
attending to the important features of specific situations. Practice is understood
as situated action.
While Schwandt (2005) presents these models in apparent opposition to each
other, for our present purposes both need to be considered; however, because the first
has been overly emphasised in discussions of courses and assessment, we will
pay particular attention to Model 2. As introduced earlier, the key features of a practice
view from the practical knowledge traditions are that, firstly, practice is neces-
sarily contextualised, that is, it cannot be discussed independently of the settings
in which it occurs. It always occurs in particular locations and at particular
times; it is not meaningful to consider it in isolation from the actual sites of
practice. Secondly, practice is necessarily embodied, that is, it involves whole
persons, including their motives, feelings and intentions. To discuss it in
isolation from the persons who practice is to misunderstand practice.
Furthermore, we must also consider the changing context of professional
practice (Boud, 2006). This involves, firstly, a collective rather than an indivi-
dual focus on practice, that is, a greater emphasis on the performance of teams
and groups of practitioners. Secondly, it involves a multidisciplinary and,
increasingly, a transdisciplinary focus of practice. In this, practitioners of
different specialisations come together to address problems that do not fall
exclusively in the practice domain of any one discipline. It is not conducted by
isolated individuals, but in a social and cultural context in which what one
professional does has necessarily to link with what others do. This is even the
case in those professions in which past cultural practice has been isolationist.
Thirdly, there is a new emphasis on the co-production of practice and the
co-construction of knowledge within it. Professionals are increasingly required
to engage clients, patients and customers as colleagues who co-produce solu-
tions to problems and necessarily co-produce the practices in which they are
engaged.
Practice and practice theory point to a number of features we need to
consider in assessment. The first is the notion of context: the knowledge and skills
used in a particular practice setting. The kinds of knowledge and skills utilised
depend on the setting. The second is the bringing together of knowledge and
skills to operate in a particular context for a particular purpose. Practice involves
these together, not each operating separately. Thirdly, knowledge and skills
require a disposition on the part of the practitioner, a willingness to use these
for the practice purpose. Fourthly, there is a need in many settings to work with
other people who might have different knowledge and skills to undertake prac-
tice. And, finally, there is the need to recognise that practice needs to take account
of and often involve those people who are the focus of the practice.
How well does existing assessment address these features of practice? In
general, it meets these requirements very poorly. Even when there is an element
of authentic practice in a work setting as part of a course, the experience is
separated from other activities and assessment of course units typically occurs
separately from placements, practical work and other located activities. Also,
the proportion of assessment activities based upon practice work, either on
campus or in a placement, or indeed any kind of working with others, is
commonly quite small and is rarely the major part of assessment in higher
education. When the vast bulk of assessment activities are considered, they
may use illustrations and examples from the world of practice, but they do not
engage with the kinds of practice features we have discussed here. Significantly,
assessment in educational institutions is essentially individualistic (notwith-
standing some small moves towards group assessment in some courses). All
assessment is recorded against individuals and group assessments are uncom-
mon and are often adjusted to allow for individual marks. If university learning
and assessment is to provide the foundation for students to subsequently engage
in practice, then it needs to respond to the characteristics identified as being
central to practice and find ways of incorporating them into common assess-
ment events. This needs to occur whether or not the course uses placements at
all, and whether or not it is explicitly vocational.
Of course, in many cases we do not know the specific social and organisational
contexts in which students will subsequently practice. Does this mean we must
reject the practice perspective? Not at all: while the specifics of any given context
may not be known, it is known that the exercise of knowledge and skill will not
normally take place in isolation from others, both other practitioners and people
for whom a service or product is provided. It will take place in an organisational
context. It will involve emotional investments on the part of the practitioner and
it will involve them in planning and monitoring their own learning. These
elements therefore will need to be considered in thinking about how students
are assessed and examples of contexts will need to be assumed for these purposes.
programs. It could be argued that structuring occasions for this is the most
important educational feature of any course. Fostering reflexivity and self-
regulation is not something that can take place in one course unit and be expected
to benefit others. It is a fundamental attribute of programs that needs to be
developed and sustained throughout. Assessment must be integrated with learn-
ing and integrated within all elements of a program over time. As soon as one unit
is seen to promote dependency, it can detract from the overall goal of the
program.
These are more demanding requirements than they appear at first sight. They
require vigilance for all assessment acts, even the most apparently innocuous.
For example, tests at the start of a course unit which assess simple recall of
terminology as a prerequisite for what follows can easily give the message
that remembering the facts is what is required in this course. This does not
mean that all such tests are inappropriate, but it does mean that the total
experience of students needs to be considered, and perhaps tests of one kind
need to be balanced with quite different activities in order to create a program
that has the ultimate desired outcomes.
This points to the need to examine the consequences of all assessment acts
for learning. Do they or do they not improve students’ judgement, and if so,
how do they do it? If they do not, then they are insufficiently connected with
the main raison d’être of education to be justified – if assessment does not
actively support the kinds of learning that are sought, then it is at least a missed
opportunity, if not an act that undermines the learning outcome.
Apprenticeship as a Prototype
What more can we learn from the apprenticeship tradition? Firstly, it has a
strong base in a community of practice of which the apprentice is gradually
becoming a part. That is, the apprentice sees the practices of which he or she is
becoming a part. They can imagine their involvement and, vicariously and
sometimes directly, experience the joys and frustrations of competent work.
The apprentice is a normal part of the work community and is accepted as a part
of it. There are clear external points of reference for judgements that are made
about work itself. These are readily available and are used on a regular basis.
Practice is reinforced by repetition and skill is developed. In this, assessment
and feedback are integrated into everyday work activities. They are not isolated
from it and conducted elsewhere. Final judgements are based on what a person
can do in a real setting, not on a task abstracted from it. Grades are typically not
used. There is no need to extract an artificial measure when competence, or what
is ‘fit-for-purpose’, is the yardstick.
Of course, we can only go so far in higher education in taking such an
example. Higher education is not an apprenticeship and it is not being suggested
that it should become like one. In apprenticeship, the skills developed are of a
very high order, but the range of activity over which they are deployed can be
more limited than in the professions and occupations for which higher education
is a preparation. The challenge for higher education is to develop knowledge and
skills in a context in which the opportunities for practice and the opportunities to
see others practise are more restricted but where high-level competencies are
required. This does not suggest that practice be neglected; rather, without the
apparent luxury of the apprenticeship’s everyday practice setting, it is even
more important to create deliberate opportunities for it.
What else can we take from a practice view for considering how assessment in
higher education should be conducted? The first, and most obvious, consideration
is that assessment tasks must be contextualized in practice. Learning occurs
for a purpose, and while not all ultimate purposes are known at the time of
learning, it is clear that they will all involve applications in sites of practice. To
leave this consideration out of assessment then is to denature assessment and
turn it into an artificial construct that does not connect with the world. Test
items that are referential only to ideas and events that occur in the world of
exposition and education may be suitable as intermediate steps towards later
assessment processes, but they are not realistic on their own. While there may be
scope in applied physics, for example, for working out the forces on a weightless
pulley or on an object on a frictionless surface, to stop there is to operate in a
world where there is a lack of consequence. To make this assumption is to deny
that real decisions will have to be made and to create the need for students to
unlearn something in order to operate in the world.
3 How Can Practice Reshape Assessment? 39
involving students in making judgements so that they and others can take a view
about whether learning suitable for life after courses has been achieved and
what more needs to be engaged in.
Implications
Why then should we consider starting with practice as a key organiser for
assessment? As we have seen, it is anchored in the professional world, not the
world of educational institutions. This means that there are multiple views of
practice that are available external to the educational enterprise. Practice
focuses attention on work outside the artifacts of the course – there is a point
of reference for decision-making beyond course-determined assessment criteria,
and actions that take place have consequences beyond those of formal
assessment requirements. These create possibilities for new approaches to assessment.
In addition to this, judgements of those in a practice situation (professional or
client) make a difference to those involved. That is, there are consequences
beyond the learning of the student that frame and constrain actions. These
provide a reality-check not available in an internally-referenced assessment
context. These considerations raise the stakes, intensify the experience and
embody the learner more thoroughly in situations that anticipate engagement
as a full professional. As we see beyond our course into the world of profes-
sional practice, assessment becomes necessarily authentic: authenticity does not
need to be contrived.
To sum up, a ‘practice’ perspective helps us to focus on a number of issues
for assessment tasks within the mainstream of university courses. These include:
1. Locating assessment tasks in authentic contexts.
These need not necessarily involve students being placed in external work
settings, but involve the greater use of features of authentic contexts to frame
assessment tasks. They could model or simulate key elements of authentic
contexts.
2. Establishing holistic tasks rather than fragmented ones.
The least authentic of assessment tasks are those taken in isolation and
disembodied from the settings in which they are likely to occur. While
tasks may need to be disaggregated for purposes of exposition and rehearsal
of the separate elements, they need to be put back together again if students
are to see knowledge as a whole.
3. Focusing on the processes required for a task rather than the product or
outcome per se.
Processes and ways of approaching tasks can often be applied from one
situation to another whereas the particularities of products may vary mark-
edly. Involving students in ways of framing tasks in assessment is often
neglected in conventional assessment.
4. Learning from the task, not just demonstrating learning through the task.
A key element of learning from assessment is the ability to identify cues from
tasks themselves which indicate how they should be approached, the criteria to
be used in judging performance and what constitutes successful completion.
5. Having consciousness of the need for refining the judgements of students, not
just the judgement of students by others.
Learning in practice involves the ability to continuously learn from the
tasks that are encountered. This requires progressive refinement of judgements
by the learner which may be inhibited by the inappropriate deployment of the
judgements of others when learners do not see themselves as active agents.
6. Involving others in assessment activities, away from an exclusive focus on the
individual.
Given that practice occurs in a social context, it is necessary that the skill of
involving others is an intrinsic part of learning and assessment. Assessment
with and for others needs far greater emphasis in courses.
7. Using standards appropriate to the task rather than comparisons with other
students.
While most educational institutions have long moved from inappropriate
norm-referenced assessment regimes, residues from them still exist. The
most common is the use of generic rather than task-specific standards
and criteria that use statements of quality not connected to the task in
hand (e.g. abstract levels using terms such as adequate or superior
performance, without a task-oriented anchor).
8. Moving away from an exclusive emphasis on independent assessment in each
course unit towards development of assessment tasks throughout a program
and linking activities from different courses.
The greatest fragmentation often occurs through the separate treatment of
individual course units for assessment purposes. Generic student attributes
can only be achieved through coordination and integration of assessment
tasks across units. Most of the skills of practice discussed here develop over
time and need practice over longer periods than a semester and cannot be
relegated to parts of an overall program.
9. Acknowledging student agency and initiation rather than having students
always respond to the prompts of others.
The design of assessment so that it always responds to the need to build
student agency in learning and development is a fundamental challenge
for assessment activities. This does not mean that students have to choose
assessment tasks, but that tasks are constructed in ways that maximise active
student involvement in them.
10. Building in an awareness of co-production of outcomes with others.
Practitioners not only work with others, but they co-produce with them.
This implies that there need to be assessment tasks in which students
co-construct outcomes. While this does not necessarily require group
assessment as such, it does mean designing activities with multi-participant
outcomes into an overall regime.
42 D. Boud
The challenge the practice perspective creates is to find ways of making some
of these shifts in assessment activities in a higher education context that is
moving rapidly in an outcomes-oriented direction, but which embodies the
cultural practices of an era deeply sceptical of the excessively vocational. The
implication of taking such a perspective is not that more resources are required
or that we need to scrap what we are doing and start again. It does however
require a profound change of perspective. We need to move from privileging
our own academic content, and from assessing students as if our part of the course
were more important than anything else, to a position that is more respectful of the
use of knowledge, of the program as a whole and the need to build the capacity of
students to learn and assess for themselves once they are out of our hands. Some
of the changes required are incremental and involve no more than altering
existing assessment tasks by giving them stronger contextual features. However,
others create a new agenda for assessment and provoke us to find potentially
quite new assessment modes that involve making judgements in co-production of
knowledge. If we are to pursue this agenda, we need to take up the challenges and
operate in ways in which our graduates are being increasingly required to operate
in the emerging kinds of work of the twenty-first century.
References
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education,
5(1), 7–74.
Boud, D. (1995). Enhancing learning through self assessment. London: Kogan Page.
Boud, D. (2006, July). Relocating reflection in the context of practice: Rehabilitation or
rejection? Keynote address presented at Professional Lifelong Learning: Beyond Reflec-
tive Practice, a conference held at Trinity and All Saints College. Leeds: Institute for
Lifelong Learning, University of Leeds. Retrieved 20 October, 2007, from http://www.
leeds.ac.uk/educol/documents/155666.pdf
Boud, D. (2007). Reframing assessment as if learning was important. In D. Boud &
N. Falchikov (Eds.), Rethinking assessment in higher education: Learning for the longer
term (pp. 14–25). London: Routledge.
Boud, D., & Falchikov, N. (2006). Aligning assessment with long term learning. Assessment
and Evaluation in Higher Education, 31(4), 399–413.
Boud, D., & Falchikov, N. (2007). Developing assessment for informing judgement. In
D. Boud & N. Falchikov (Eds.), Rethinking assessment in higher education: Learning
for the longer term (pp. 181–197). London: Routledge.
Gibbs, G. (2006). How assessment frames student learning. In K. Clegg & C. Bryan (Eds.),
Innovative assessment in higher education. London: Routledge.
Hager, P., & Butler, J. (1996). Two models of educational assessment. Assessment and
Evaluation in Higher Education, 21(4), 367–378.
Knight, P. (2007). Grading, classifying and future learning. In D. Boud & N. Falchikov (Eds.),
Rethinking assessment in higher education: Learning for the longer term (pp. 72–86). London:
Routledge.
Kvale, S. (2008). A workplace perspective on school assessment. In A. Havnes & L. McDowell
(Eds.), Balancing dilemmas in assessment and learning in contemporary education
(pp. 197–208). New York: Routledge.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation.
Cambridge, UK: Cambridge University Press.
Nicol, D. J., & MacFarlane-Dick, D. (2006). Formative assessment and self-regulated learn-
ing: A model and seven principles of good feedback practice. Studies in Higher Education,
31(2), 199–218.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instruc-
tional Science, 18, 119–144.
Sadler, D. R. (1998). Formative assessment: Revisiting the territory. Assessment in Education,
5(1), 77–84.
Schatzki, T. R. (2001). Introduction: Practice theory. In T. Schatzki, K. Knorr Cetina, &
E. von Savigny (Eds.), The practice turn in contemporary theory (pp. 1–14). London:
Routledge.
Schwandt, T. (2005). On modelling our understanding of the practice fields. Pedagogy,
Culture & Society, 13(3), 313–332.
Tanggaard, L., & Elmholdt, C. (2008). Assessment in practice: An inspiration from appren-
ticeship. Scandinavian Journal of Educational Research, 52(1), 97–116.
Chapter 4
Transforming Holistic Assessment and Grading
into a Vehicle for Complex Learning
D. Royce Sadler
Introduction
One of the themes running through my work since 1980 has been that students
need to develop the capacity to monitor the quality of their own work during its
actual production. For this to occur, students need to appreciate what consti-
tutes work of higher quality; to compare the quality of their emerging work with
the higher quality; and to draw on a store of tactics to modify their work as
necessary. In this chapter, this theme is extended in two ways. The first is an
analysis of the fundamental validity of using preset criteria as a general
approach to appraising quality. The second is a teaching design that enables
holistic appraisals to align pedagogy with assessment.
For the purposes of this chapter, a course refers to a unit of study that forms a
relatively self-contained component of a degree program. A student response to
an assessment task is referred to as a work. The assessed quality of each work is
represented by a numerical, literal or verbal mark or grade. Detailed feedback
from the teacher may accompany the grade. For the types of works of interest in
this chapter, grades are mostly produced in one of two ways.
In analytic grading, the teacher makes separate qualitative judgments on a
limited number of properties or criteria. These are usually preset, that is, they
are nominated in advance. Each criterion is used for appraising each student’s
work. The teacher may prescribe the criteria, or students and teachers may
negotiate them. Alternatively, the teacher may require that students develop
their own criteria as a means of deepening their involvement in the assessment
process. In this chapter, how the criteria are decided is not important. After the
separate judgments on the criteria are made, they are combined using a rule or
formula, and converted to a grade. Analytic grading is overtly systematic. By
identifying the specific elements that contribute to the final grade, analytic
grading provides the student with explicit feedback. The template used in
D.R. Sadler
Griffith Institute for Higher Education, Mt Gravatt Campus, Griffith University,
Nathan, Qld, Australia
e-mail: r.sadler@griffith.edu.au
implementing the process may be called a rubric, or any one of scoring, marking
or grading paired with scheme, guide, matrix or grid. As a group, these models
are sometimes referred to as criterion-based assessment or primary trait analysis.
In holistic or global grading, the teacher responds to a student’s work as a
whole, then directly maps its quality to a notional point on the grade scale.
Although the teacher may note specific features that stand out while appraising,
arriving directly at a global judgment is foremost. Reflection on that judgment
gives rise to an explanation, which necessarily refers to criteria. Holistic grading
is sometimes characterised as impressionistic or intuitive.
The relative merits of analytic and holistic grading have been debated for
many years, at all levels of education. The most commonly used criterion for
comparison has been scorer reliability. This statistic measures the degree of
consistency with which grades are assigned to the same set of works by different
teachers (inter-grader reliability), or by the same teacher on separate occasions
(temporal reliability). Scorer reliability is undoubtedly a useful criterion, but is
too narrow on its own. It does not take into account other factors such as the
skills of the markers in each method, or the extent to which each method is able
to capture all the dimensions that matter.
The use of analytic grading schemes and templates is now firmly established
in higher education. Internationally, rapid growth in popularity has occurred
since about 1995. Nevertheless, the basic ideas are not new. Inductively decom-
posing holistic appraisals goes back at least to 1759, when Edmund Burke set
out to identify the properties that characterise beautiful objects in general. In
the forward direction, the theory and practice of assembling overall judgments
from discrete appraisals on separate criteria has developed mostly over the last
50 years. It has given rise to an extensive literature touching many fields. Key
research areas have been clinical decision making (Meehl, 1954/1996) and
human expertise of various types (Chi, Glaser & Farr, 1988; Ericsson &
Smith, 1991). The terminology used is diverse, and includes ‘policy capturing’
and ‘actuarial methods’.
Specifically in educational assessment, Braddock, Lloyd-Jones, and Schoer
(1963) reported early developmental work on analytic approaches to grading
English composition, and the rationale for it; primary trait scoring is described
in Lloyd-Jones (1977). Researchers in higher education assessment have
explored in recent years the use of criteria and rubrics, specifically involving
students in self- and peer-assessment activities (Bloxham & West, 2004;
Orsmond, Merry & Reiling, 2000; Rust, Price & O’Donovan, 2003; Woolf,
2004). Many books on assessment in higher education advocate analytic
grading, and provide practitioners with detailed operational guidelines.
Examples are Freeman and Lewis (1998), Huba and Freed (2000), Morgan,
Dunn, Parry, and O’Reilly (2004), Stevens and Levi (2004), Suskie (2004), and
Walvoord and Anderson (1998).
For the most part, both the underlying principles and the various methods of
implementation have been accepted uncritically. In this chapter, the sufficiency
of analytic grading as a general approach for relevant classes of student works is
called into question, on both theoretical and practical grounds. The basic
reason is that it sets up appraisal frameworks that are, in principle, sub-optimal.
Although they work adequately for some grading decisions, they do a disservice
to others by unnecessarily constraining the scope of appraisals. The assumption
that using preset criteria is unproblematic has had two inhibiting effects. First,
teachers typically have not felt free to acknowledge, especially to students, the
existence or nature of certain limitations they encounter. Second, there has been
little or no imperative to explore and develop alternative ways forward.
The theme of this chapter is developed around five propositions. The first
four are dealt with relatively briefly; the fifth is assigned a section of its own. The
driving principle is that if students are to achieve consistently high levels of
performance, they need to develop a conceptualisation of what constitutes
quality as a generalised attribute (Sadler, 1983). They also need to be inducted
into evaluating quality, without necessarily being bound by tightly specified
criteria. This approach mirrors the way multi-criterion judgments are typically
made by experienced teachers. It is also an authentic representation of the ways
many appraisals are made in a host of everyday contexts by experts and
non-experts alike. Equipping students with evaluative insights and skills
therefore builds an important graduate skill. All five propositions are taken into
account in the second half of the chapter, which outlines an approach to the
assessment of complex student productions.
stage in the development. Otherwise the work cannot be improved. This in turn
necessitates that the student becomes sensitive to where those ‘pertinent points’
are, as they arise, during construction.
Many of the students whose usual levels of performance are mediocre are
hampered by not knowing what constitutes work of high quality. This sets an
upper bound on their ability to monitor the quality of their own developing
work (Sadler, 1983). Raising students’ knowledge about high quality and their
capability in self-monitoring can lead to a positive chain of events. These are
improved grades, increased intrinsic satisfaction, enhanced motivation and, as
a consequence, higher levels of achievement. Within the scope of a single course,
it is obviously not realistic to expect learners to become full connoisseurs. But as
a course proceeds, the learners’ judgments about the quality of their own works
should show progressively smaller margins of error. Self-monitoring raises self-
awareness and increases the learner’s metacognition of what is going on.
Attaining this goal is not intrinsically difficult, but it does require that a number
of specific conditions be met. Not to achieve the goal, however, represents a
considerable opportunity loss.
Proposition 2 is that students can develop evaluative expertise in much the
same way as they develop other knowledge and skills, including the substantive
content of a course. Skilled appraisal is just one of many types of expertise,
although it seldom features explicitly among course objectives. Initially, a key
tool for developing it is credible feedback, primarily from the teacher and peers.
Feedback usually takes the form of descriptions, explanations or advice
expressed in words. Preset criteria coupled with verbal feedback stem from a
desire to tell or inform students. It might be thought that the act of telling would
serve to raise the performance ceiling for learners, but just being told is rarely an
adequate remedy for ignorance. The height of the ceiling depends on what
students make of what is ‘told’ to them.
The next step along the path can be taken when relevant examples
complement the verbal descriptions (Sadler 1983, 1987, 1989, 2002). Examples
provide students with concrete referents. Without them, explanatory
comments remain more or less abstract, and students cannot interpret them
with certainty. If the number of examples is small, they need to be chosen as
judiciously as examples are for teaching. If examples are plentiful, careful selection
is not as critical, provided they cover a considerable range of the quality spectrum.
Even more progress can be made if teachers and learners actively discuss the
descriptions and exemplars together. Reading verbal explanations, seeing perti-
nent exemplars, and engaging in discourse provide categorically different cogni-
tive inputs. But the best combination of all three still does not go far enough.
The remaining element is that students engage in making evaluative
decisions themselves, and justifying those decisions. No amount of telling,
showing or discussing is a substitute for one’s own experience (Sadler 1980,
1989). The student must learn how to perceive works essentially through the
eyes of an informed critic, eventually becoming ‘calibrated’. Learning
environments are self-limiting to the extent that they fail to make appropriate
provision for students to make, and be accountable for, serious appraisals.
Proposition 3 is that students’ direct evaluative experience should be relevant
to their current context, not translated from another. The focus for their
experience must therefore be works of a genre that is substantially similar to
the one in which they are producing. Apart from learning about quality, closely
and critically examining what others have produced in addressing assessment
tasks expands the student’s inventory of possible moves. These then become
available for drawing upon to improve students’ own work. This is one of the
reasons peer assessment is so important.
However, merely having students engage in peer appraisal in order to make
assessment more participatory or democratic is not enough. Neither is treating
students as if they were already competent assessors whose appraisals deserve
equal standing with those of the teacher, and should therefore contribute to
their peers’ grades. The way peer assessment is implemented should reflect the
reasons for doing it. Learners need to become reasonably competent not only at
assessing other students’ works but also at applying that knowledge to their
own works.
Proposition 4 is that the pedagogical design must function not only effectively
but also efficiently for both teachers and students. There is obviously little point
in advocating changes to assessment practices that are more labour intensive than
prevailing procedures. Providing students with sufficient direct evaluative experi-
ence can be time consuming unless compensating changes are made in other
aspects of the teaching. This aspect is taken up later in the chapter.
These arguments appear sound and fair. Who would argue against an
assessment system that provides more and better feedback, increases transpar-
ency, improves accountability, and achieves greater objectivity, all with no
increase in workload? On the other hand, the use of preset criteria accompanied
by a rule for combining judgments is not the only way to go. In the critique
below, a number of issues that form the basis of Proposition 5 are set out. The
claim is that no matter how comprehensive and precise the procedures are, or
how meticulously they are followed, they can, and for some student works do,
lead to deficient or distorted grading decisions. This is patently unfair to those
students.
In the rest of this section, the case for holistic judgments is presented in
considerable detail. Although any proposal to advocate holistic rather than
analytic assessments might be viewed initially as taking a backward step, this
chapter is underwritten by a strong commitment to students and their learning.
Specifically, the aim is to equip students to work routinely with holistic apprai-
sals; to appreciate their validity; and to use them in improving their own work.
The five clauses in the rationale above are framed specifically in terms of criteria
and standards. This wording therefore presupposes an analytic model for grad-
ing. By the end of this chapter, it should become apparent how the ethical
principles behind this rationale can be honoured in full through alternative
means. The wording of the rationale that corresponds to this alternative would
then be different.
use, the appraiser makes a qualitative judgment about the ‘strength’ or level of
the work on each criterion, and marks the corresponding point on the scale
line. If a total score is required, the relative importance of each criterion is
typically given a numerical weighting. The sub-scores on the scales are multi-
plied by their respective weightings. Then all the weighted scores are summed,
and the aggregate either reported directly or turned into a grade using a
conversion table.
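The weighting, summing and conversion steps described above can be sketched as follows; the criterion names, weights, scale maximum and conversion table are invented for illustration, not drawn from the chapter:

```python
# Sketch of analytic grading: weighted sub-scores are summed and the
# aggregate is converted to a grade. All specifics are hypothetical.

def analytic_grade(sub_scores, weights, conversion, scale_max=10):
    """Combine per-criterion sub-scores into a letter grade.

    sub_scores and weights are dicts keyed by criterion name;
    conversion maps minimum aggregate percentages to grades.
    """
    total_weight = sum(weights.values())
    aggregate = sum(sub_scores[c] * weights[c] for c in weights)
    percent = 100 * aggregate / (scale_max * total_weight)
    # Pick the highest cutoff the aggregate percentage meets.
    for cutoff, grade in sorted(conversion.items(), reverse=True):
        if percent >= cutoff:
            return grade

weights = {"relevance": 3, "coherence": 2, "presentation": 1}
scores = {"relevance": 8, "coherence": 7, "presentation": 9}
conversion = {85: "A", 70: "B", 50: "C", 0: "D"}
print(analytic_grade(scores, weights, conversion))  # prints B
```

The mechanical nature of this rule is exactly what the chapter's later anomalies turn on: the formula's output can diverge from a teacher's holistic judgment of the same work.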
An analytic rubric has a different format. Using the terminology in Sadler
(1987, 2005), a rubric is essentially a matrix of cross-tabulated criteria and
standards or levels. (This nomenclature is not uniform. Some rubrics use
qualities/criteria instead of criteria/standards as the headings.) Each standard
represents a particular level on one criterion. Common practice is for the
number of standards to be the same for all criteria, but this is not strictly
necessary. A rubric with five criteria, each of which has four standards, has a
total of 20 cells. Each cell contains a short verbal description that sets out a
particular strength of the work on the corresponding criterion. This is usually
expressed either as a verbal quantifier (how much), or in terms of sub-attributes
of the main criterion. For each student work, the assessor identifies the single
cell for each criterion that best seems to characterise the work. The rubric may
also include provision for the various cell standards to carry nominated ranges
of numerical values that reflect weightings. A total score can then be calculated
and converted to a grade.
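The cell-selection procedure for such a rubric might be sketched like this; the criteria, standard labels and point values are invented for the example:

```python
# Sketch of an analytic rubric as a matrix of criteria crossed with
# standards, where each cell carries a point value. The assessor
# selects one cell per criterion; cell values are then summed.
# All names and values here are hypothetical.

RUBRIC = {
    "relevance":    {"limited": 1, "adequate": 2, "strong": 3, "exemplary": 4},
    "coherence":    {"limited": 1, "adequate": 2, "strong": 3, "exemplary": 4},
    "presentation": {"limited": 1, "adequate": 2, "strong": 3, "exemplary": 4},
}

def score_work(selected_cells):
    """Sum the point values of the one cell chosen per criterion."""
    return sum(RUBRIC[c][std] for c, std in selected_cells.items())

total = score_work({"relevance": "strong",
                    "coherence": "exemplary",
                    "presentation": "adequate"})
print(total)  # prints 9
```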
Holistic rubrics form a less commonly used category than the two above.
They associate each grade level with a reasonably full verbal description,
which is intended as indicative rather than definitive or prescriptive. These
descriptions do not necessarily refer to the same criteria for all grade levels.
Holistic rubrics are essentially different from the other two and have a differ-
ent set of limitations, particularly in relation to feedback. They are not
considered further here.
In most analytic schemes, there is no role for distinctly global primary
appraisals. At most, there may be a criterion labelled overall assessment that
enjoys essentially the same status as all the other criteria. Apart from the
technical redundancy this often involves, such a concession to valuing overall
quality fails to address what is required. Students therefore learn that global
judgments are normally compounded from smaller-scale judgments. The learn-
ing environment offers regular reinforcement, and the stakes for the student
are high.
Grading methods that use preset criteria with mechanical combination
produce anomalies. The two particular anomalies outlined below represent
recurring patterns, and form part of a larger set that is the subject of ongoing
research. These anomalies are detected by a wide range of university teachers, in
a wide range of disciplines and fields, for a wide range of assessment task types
and student works. The same anomalies are, however, invisible to learners, and
the design of the appraisal framework keeps them that way. Therein lies the
problem.
Anomaly 1
Teachers routinely discover some works for which global impressions of their
quality are categorically at odds with the outcomes produced by conscientious
implementation of the analytic grading scheme. Furthermore, the teacher is at a
loss to explain why. A work which the teacher would rate as “brilliant” overall
may not be outstanding on all the preset criteria. The whole actually amounts
to more than the sum of its parts. Conversely, a work the teacher would rate as
mediocre may come out extremely well on the separate criteria. For it, the whole
is less than the sum of its parts. This type of mismatch is not confined to
educational contexts.
Whenever a discrepancy of this type is detected, teachers who accept as
authoritative the formula-based grade simply ignore their informal holistic
appraisal without further ado. Other teachers react differently. They question
why the analytic grade, which was painstakingly built from component judg-
ments, fails to deliver the true assessment, which they regard as the holistic
judgment. In so doing, they implicitly express more confidence in the holistic
grade than in the analytic. To reconcile the two, they may adjust the reported
levels on the criteria until the analytic judgment tells the story it “should”. At the
same time, these teachers remain perplexed about what causes such anomalies.
They feel especially troubled about the validity and ethics of what could be
interpreted by others as fudging. For these reasons, they are generally reluctant
to talk about the occurrence of these anomalies to teaching colleagues or to
their students. However, many are prepared to discuss them in secure research
environments.
What accounts for this anomaly? There are a number of possible contributing
factors. One is that not all the knowledge a person has, or that a group of people
share, can necessarily be expressed in words (Polanyi, 1962). There exists no
theorem to the effect that the domain of experiential or tacit knowledge, which
includes various forms of expertise, is co-extensive and isomorphic with the
domain of propositional knowledge (Sadler, 1980). Another possible factor is
the manner in which experts intuitively process information from multiple
sources to arrive at complex decisions, as summarised in Sadler (1981). Certain
holistic appraisals do not necessarily map neatly onto explicit sets of specified
criteria, or simple rules for combination. Further exploration of these aspects
lies outside the scope of this chapter. It would require tapping into the extensive
literature on the processes of human judgment, and into the philosophy of
so-called ineffable knowledge.
Anomaly 2
This anomaly is similar to the first in that a discrepancy occurs between the
analytically derived grade and the assessor’s informal holistic appraisal. In this
case, however, the teacher knows that the source of the problem is with a
particular criterion that is missing from the preset list. That criterion may be
important enough to set the work apart, almost in a class of its own. To simplify
the analysis, assume there is just one such criterion. Also assume that the
teacher has checked that this criterion is not included on the specified list in
some disguised form, such as an extreme level on a criterion that typically goes
under another name, or some blend of several criteria. Strict adherence to the
analytic grading rule would disallow this criterion from entering, formally or
informally, into either the process of grading or any subsequent explanation. To
admit it would breach the implicit contract between teacher and student that
only specified criteria will be used.
Why do sets of criteria seem comprehensive enough for adequately apprais-
ing some works but not others? Part of the explanation is that, for complex
works, a specified set of criteria is almost always a selection from a larger pool
(or population) of criteria. By definition, a sample does not fully represent a
population. Therefore, applying a fixed sample of criteria to all student works
leaves open the possibility that some works may stand out as exceptional on
the basis of unspecified criteria. This applies particularly to highly divergent,
creative or innovative responses. Arbitrarily restricting the set of criteria for
these works introduces distortion into the grading process, lowering its
validity.
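The sampling argument can be made concrete with a small sketch. This is a hypothetical construction, not drawn from the chapter: the criterion names, the 0–5 levels and the averaging rule are all invented for illustration. When a rubric fixes a handful of criteria from a much larger pool, a work that is outstanding on an unsampled criterion earns no credit for that strength under strict analytic grading.

```python
# Hypothetical illustration of criteria sampling; all names and scales are
# assumptions made for this sketch, not taken from the chapter.
POOL = [f"criterion_{i}" for i in range(50)]   # the larger pool of valid criteria
RUBRIC = POOL[:4]                              # the preset, fixed sample

def analytic_grade(scores, rubric):
    """Average the levels awarded on the preset criteria only (0-5 scale)."""
    return sum(scores.get(c, 0) for c in rubric) / len(rubric)

# A work that is middling on every rubric criterion but exceptional on one
# criterion that happens to fall outside the sample.
scores = {c: 3 for c in RUBRIC}
scores["criterion_42"] = 5  # e.g. striking originality: unsampled, so invisible

print(analytic_grade(scores, RUBRIC))  # 3.0: the exceptional strength never enters the grade
```

Under the analytic rule the grade is identical to that of a merely competent work; only a holistic appraisal, free to draw on the wider pool, can register the difference.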
To illustrate this sampling phenomenon, consider a piece of written work
such as an essay or term paper, for which rubrics containing at least four criteria
are readily available. Suppose the teacher’s rubric prescribes the following
criteria:
relevance
coherence
presentation
support for assertions.
Behind these sits a much larger pool of potentially valid criteria. An idea of the
size of this pool can be obtained by analysing a large number of published sets.
One such (incomplete) collection was published in Sadler (1989). In alphabetical
order, the criteria were:
accuracy (of facts, evidence, explanations); audience (sense of); authenticity; clarity;
coherence; cohesion; completeness; compliance (with conventions of the genre);
comprehensiveness; conciseness (succinctness); consistency (internal); content
(substance); craftsmanship; depth (of analysis, treatment); elaboration; engagement;
exemplification (use of examples or illustrations); expression; figures of speech; flair;
flavour; flexibility; fluency (or smoothness); focus; global (or overall) development;
grammar; handwriting (legibility); ideas; logical (or chronological) ordering (or
control of ideas); mechanics; novelty; objectivity (or subjectivity, as appropriate);
organization; originality (creativity, imaginativeness); paragraphing; persuasiveness;
presentation (including layout); punctuation (including capitalization); readability;
referencing; register; relevance (to task or topic); rhetoric (or rhetorical effectiveness);
sentence structure; spelling; style; support for assertions; syntax; tone; transitions;
usage; vocabulary; voice; wording.
4 Transforming Holistic Assessment and Grading 55
This goal can be achieved by shifting the agenda beyond telling, showing and
discussing how appraisals of students’ works are to be made. Instead, the
process is designed to induct students into the art of making appraisals in a
substantive and comprehensive way. Such a pedagogical approach is integral to
the development of a valid alternative not only to analytic grading but also to
holistic grading as done traditionally.
This induction process can function as a fundamental principle for learning-
oriented assessment. Properly implemented, it recognises fully the responsibility of
the teacher to bring students into a deep knowledge of how criteria actually
function in making a complex appraisal, and of the need for the assessor to supply
adequate grounds for the judgment. In the process, it sets learners up
for developing the capability to monitor the quality of their own work during
its development. In particular, it provides for an explicit focus on the quality of how
the student work is coming together – as a whole – at any stage of development.
The approach outlined below draws upon strategies that have been trialled
successfully in higher education classes along with others that are worthy of
further exploration and experimentation. Major reform of teaching and learn-
ing environments requires significant changes to approaches that have become
deeply embedded in practice over many years. Because the use of preset criteria
has gained considerable momentum in many academic settings, transition
problems are to be expected. Some ideas for identifying and managing these
are also included.
Developing Expertise
The general approach draws from the work of Polanyi (1962). It could be
described as starting learners on the path towards becoming connoisseurs.
There is a common saying: ''I do not know how to define quality, but I know it
when I see it''. In this statement, to know quality is to recognise, seize on,
or apprehend it in particular cases. The concrete instance first gives rise to
perception, then to recognition. Formally defining quality is a different
matter altogether. Whether a particular person can construct a definition
depends on three factors. The first is personal: the person's powers of
abstraction and articulation. The second is more general, not limited to any
particular person: are there enough similarly classified cases, sharing enough
key characteristics, to allow identification by inductive inference of what
they have in common? The third is touched on in the account of the first
anomaly described above: some concepts appear to be, in principle, beyond the
reach of formal definition. Many such concepts form essential elements of our
everyday language and communication, and are by no means esoteric. Given these
three factors, there is no logical reason to assume that creating a formal
definition is always worth the effort, or even possible.
On the other hand, it is known empirically that experts can recognise quality
(or some other complex characteristic) independently of knowing any
definition. This is evidence that recognition can function as a fundamentally
valid, primary act in its own right (Dewey, 1939). It is also what makes
connoisseurship a valuable phenomenon, rather than one that is just intriguing.
Basically, connoisseurship is a highly developed form of competence in quali-
tative appraisal. In many situations, the expert is able to give a comprehensive,
valid and carefully reasoned explanation for a particular appraisal, yet is unable
to do so for the general case. In other situations, an explanation for even a
particular case is at best partial. To accept recognition as a primary evaluative
act opens the door to the development of appraisal explanations that are
specifically crafted for particular cases without being constrained by predeter-
mined criteria that apply to all cases. Holistic recognition means that the
appraiser reacts or responds (Sadler, 1985), whereas building a judgment up
from discrete decisions on the criteria is rational and stepwise. That is the key
distinction in intellectual processing. The idea of recognising quality when it
is observed immediately strikes a familiar chord with many people who work
in both educational and non-educational contexts where multi-criterion
judgments are required. Giving primacy to an overall assessment has its coun-
terparts in many other fields and professions.
This does not imply that grading judgments should be entirely holistic or
entirely analytic, as if these were mutually exclusive categories. The two
approaches are by no means incompatible, and how teachers use them is a
matter of choice. To advocate that a teacher should grade solely by making
global judgments without reference to any criteria is as inappropriate as
requiring all grades to be compiled from components according to set rules.
Experienced assessors routinely alternate between the two approaches in order
to produce what they consider to be the most valid grade. This is how they
detect the anomalies above, even without consciously trying. In doing this, they
tend to focus initially on the overall quality of a work, rather than on its
separate qualities. Among other things, these assessors switch focus between
global and specific characteristics, just as the eye switches effortlessly between
foreground (which is more localised and criterion-bound) and background
(which is more holistic and open). Broader views allow things to be seen in
perspective, often with greater realism. They also counter the atomism that
arises from breaking judgments up into progressively smaller elements in a bid
to attain greater precision.
Inducting students into the processes of making multi-criterion judgments
holistically, and only afterwards formulating valid reasons for them, requires a
distinctive pedagogical environment. The end goal is to bring learners at least
partly into the guild of professionals who are able to make valid and reliable
appraisals of complex works using all the tools at their disposal. Among other
things, students need to learn to run with dual evaluative agendas. The first
involves scoping the work as a whole to get a feel for its overall quality; the
second is to pay attention to its particular qualities.
In some learning contexts, students typically see only their own works. Such
limited samples cannot provide a sufficient basis for generalisations about
quality. For students to develop evaluative expertise requires that three inter-
connected conditions be satisfied. First, students need exposure to a wide
variety of authentic works that are notionally within the same genre. This is
specifically the same genre students are working with at the time. The most
readily available source of suitable works consists of responses from other
students attempting the same assessment task. As students examine other
students’ works through the lens of critique, they find that peer responses can
be constructed quite differently from one another, even those that the students
themselves would judge to be worth the same grade. They also discover, often
to their surprise, works that do not address the assessment task as it was
specified. They see how some of these works could hardly be classified as
valid ‘‘responses’’ to the assessment task at all, if the task specifications were
to be taken literally. This phenomenon is common knowledge among higher
education teachers, and is a source of considerable frustration to them.
The second condition is that students need access to works that range across
the full spectrum of quality. Otherwise learners have difficulty developing the
concept of quality at all. To achieve sufficient variety, the teacher may need to
supplement student works from the class with others from outside. These may
come from different times or classes, or be specially created by the teacher or
teaching colleagues. The third condition is that students need exposure to
responses from a variety of assessment tasks. In a fundamental sense, a develop-
ing concept of quality cannot be entirely specific to a particular assessment task.
In the process of making evaluations of successive works and constructing
justifications for qualitative judgments, fresh criteria emerge naturally. The
subtlety and sophistication of these criteria typically increase as evaluative
expertise develops. A newly relevant criterion is drawn – on demand – from
the larger pool of background or latent criteria. It is triggered or activated by
some property of a particular work that is noteworthy, and then added
temporarily to the working set of manifest criteria (Sadler, 1983). The work
may exhibit this characteristic to a marked degree, or to a negligible degree. On
either count, its relevance signals that it needs to be brought into the working
set. As latent criteria come to the fore, they are used in providing feedback on
the grade awarded.
Latent criteria may also be shared with other students in the context of the
particular work involved. Starting with a small initial pool of criteria, extending
it, and becoming familiar with needs-based tapping into the growing pool,
constitute key parts of the pedagogical design. This practice expands students’
personal repertoires of available criteria. It also reinforces the way criteria are
translated from latent to manifest, through appraising specific works. As addi-
tional criteria need to be brought into play, a class record of them may be kept
for interest, but not with a view to assembling a master list. This is because the
intention is to provide students with experience in the latent-to-manifest trans-
lation process, and the limitations inherent in using fixed sets of criteria.
Students should then come to understand why this apparently fluid approach to
judging quality is not unfair or some sort of aberration. It is, in a profound
sense, rational, normal and professional.
Although it is important that students appreciate why it is not always possible
to specify all the criteria in advance, certain criteria may always be relevant. In
the context of written works, for example, some criteria are nearly always
important, even when not stated explicitly. Examples are grammar, punctuation,
referencing style, paragraphing, and logical development, sometimes grouped as
mechanics. These relate to basic communicative and structural features. They
facilitate the reader’s access to the creative or substantive aspects, and form part
of the craft or technique side of creative work. Handled competently, they act as
enablers for an appraisal of the substance of the work. A written work is difficult
to appraise if the vocabulary or textual structure is seriously deficient.
In stark contrast to the student’s situation, teachers are typically equipped with
personal appraisal resources that extend across all of the aspects above. Their
experience is constantly refreshed by renewed exposure to a wide range of student
works, in various forms and at different levels of quality (Sadler, 1998). This is so
naturally accepted as a normal part of what being a teacher involves that it hardly
ever calls for comment. It is nevertheless easy to overlook the importance of
comparable experience for students grappling with the concept of quality.
Providing direct evaluative experience efficiently is a major design element.
It is labour intensive, but offsets can be made in the way teaching time is
deployed. Evaluative activity can be configured to be the primary pedagogical
vehicle for teaching a considerable proportion of the substantive content of a
course. For example, in teaching that is structured around a lecture-tutorial
format, students may create, in their own time, responses to a task that requires
them to employ specific higher-order intellectual skills such as extrapolating,
making structural comparisons, identifying underlying assumptions, mount-
ing counter-arguments, or integrating elements. These assessment tasks should
be strictly formative, and designed so that students can respond to them
successfully only as they master the basic content. Tutorial time is then spent
having students make appraisals about the quality of, and providing informed
feedback on, multiple works of their peers, and entering into discussions about
the process. In this way, student engagement with the substance of the course
takes place through a sequence of produce and appraise rather than study and
learn activities. Adaptations are possible for other modes of teaching, such as
studio and online.
Challenges of Transition
the knowledge and experience to assign grades. Student peers are simply not
qualified; they may judge too harshly or too leniently, or give feedback that is
superficial and lacking in credibility.
Obstacle 5 has to do with the students’ perceptions of themselves. They may
feel ill-equipped to grade the works of peers. This is true initially, of course, but
changing the actuality and changing the self-perception are two of the course goals.
The rationale for using other students’ works is not just one of convenience. It
ensures that the student responses, which are used as raw material, are as
authentic as it is possible to get. This case may be easier to establish with
students if the Course Outline includes an objective specifically to that effect,
and a complementary statement along the following lines:
The learning activities in this course involve self- and peer-assessment. This is an
important part of how the course is taught, and how one of its key objectives will be
attained. Other students will routinely be making judgments about the quality of your
work, and you will make judgments about theirs. The students whose work you
appraise will not necessarily be those who appraise yours. If this aspect of the teaching
is likely, in principle, to cause you personal difficulty, please discuss your concern with
the person in charge of the course within the first two weeks of term.
Conclusion
The practice of providing students with the assessment criteria and their weight-
ings before they respond to an assessment task is now entrenched in higher
education. The rationale contains both ethical and practical elements. Most
analytic rubrics and similar templates fix the criteria and the rule for combining
separate judgments on those criteria to produce the grade. This practice is rarely
examined closely, but in this chapter it is shown to be problematic, in principle
and in practice.
To address this problem, the unique value of holistic judgments needs to be
appreciated, with openness to incorporating criteria that are not on a fixed list.
To maintain technical and personal integrity, a number of significant shifts in
the assessment environment are necessary. The commitment to mechanistic use
of preset criteria needs to be abandoned. Teachers and students need to be
inducted into more open ways of making grading decisions and justifying them.
Students need to be provided with extensive guided experience in making global
judgments of works of the same types they produce themselves. Ultimately, the
aim is for learners to become better able to engage in self-monitoring of the
development of their own works.
Shifting practice in this direction requires a substantially different alignment
of pedagogical priorities and processes. It also requires specific strategies to
overcome traditions and sources of potential resistance. The aim is to turn the
processes of making and explaining holistic judgments into positive enablers for
student learning.
Acknowledgment I am grateful to Gordon Joughin for his constant support and critical
readings of different versions of this chapter during its development. His many suggestions
for improvement have been invaluable.
References
Bloxham, S., & West, A. (2004). Understanding the rules of the game: marking peer assess-
ment as a medium for developing students’ conceptions of assessment. Assessment &
Evaluation in Higher Education, 29, 721–733.
Braddock, R., Lloyd-Jones, R., & Schoer, L. (1963). Research in written composition.
Urbana, Ill.: National Council of Teachers of English.
Burke, E. (1759). A philosophical enquiry into the origin of our ideas of the sublime and beautiful,
2nd ed. London: Dodsley. (Facsimile edition 1971. New York: Garland).
Chi, M. T. H., Glaser, R., & Farr, M. J. (Eds.), (1988). The nature of expertise. Hillsdale,
NJ: Lawrence Erlbaum.
Dewey, J. (1939). Theory of valuation. (International Encyclopedia of Unified Science, 2 (4)).
Chicago: University of Chicago Press.
Ericsson, K. A., & Smith, J. (Eds.), (1991). Toward a general theory of expertise: Prospects and
limits. New York: Cambridge University Press.
Freeman, R., & Lewis, R. (1998). Planning and implementing assessment. London: Kogan Page.
Huba, M. E., & Freed, J. E. (2000). Learner-centered assessment on college campuses: Shifting
the focus from teaching to learning. Needham Heights, Mass: Allyn & Bacon.
Lloyd-Jones, R. (1977). Primary trait scoring. In C. R. Cooper & L. Odell (Eds.), Evaluating
writing: Describing, measuring, judging. Urbana, Ill.: National Council of Teachers of
English.
Meehl, P. E. (1996). Clinical versus statistical prediction: A theoretical analysis and a review of
the evidence (New Preface). Lanham, Md: Rowman & Littlefield/Jason Aronson. (Original
work published 1954).
Morgan, C., Dunn, L., Parry, S., & O’Reilly, M. (2004). The student assessment handbook:
New directions in traditional and online assessment. London: RoutledgeFalmer.
Orsmond, P., Merry, S., & Reiling, K. (2000). The use of student derived marking criteria in
peer and self-assessment. Assessment & Evaluation in Higher Education, 25, 23–38.
Polanyi, M. (1962). Personal knowledge. London: Routledge and Kegan Paul.
Rust, C., Price, M., & O’Donovan, R. (2003). Improving students’ learning by developing
their understanding of assessment criteria and processes. Assessment & Evaluation in
Higher Education, 28, 147–164.
Sadler, D. R. (1980). Conveying the findings of evaluative inquiry. Educational Evaluation and
Policy Analysis, 2(2), 53–57.
Sadler, D. R. (1981). Intuitive data processing as a potential source of bias in naturalistic
evaluations. Educational Evaluation and Policy Analysis, 3(4), 25–31.
Sadler, D. R. (1983). Evaluation and the improvement of academic learning. Journal of
Higher Education, 54, 60–79.
Sadler, D. R. (1985). The origins and functions of evaluative criteria. Educational Theory, 35,
285–297.
Sadler, D. R. (1987). Specifying and promulgating achievement standards. Oxford Review of
Education, 13, 191–209.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instruc-
tional Science, 18, 119–144.
Sadler, D. R. (1998). Formative assessment: Revisiting the territory. Assessment in Education:
Principles, Policy & Practice, 5, 77–84.
Sadler, D. R. (2002). Ah! ... So that's 'Quality'. In P. Schwartz & G. Webb (Eds.), Assessment:
case studies, experience and practice from higher education (pp. 130–136). London: Kogan
Page.
Sadler, D. R. (2005). Interpretations of criteria-based assessment and grading in higher
education. Assessment and Evaluation in Higher Education, 30, 175–194.
Stevens, D. D., & Levi, A. J. (2004). Introduction to rubrics: an assessment tool to save grading
time, convey effective feedback and promote student learning. Sterling, Va: Stylus
Publishing.
Suskie, L. (2004). Assessing student learning: A common sense approach. Boston, Mass: Anker
Publishing.
Walvoord, B. E., & Anderson, V. J. (1998). Effective grading: A tool for learning and assessment.
Etobicoke, Ontario: John Wiley.
Woolf, H. (2004). Assessment criteria: Reflections on current practices. Assessment & Evalua-
tion in Higher Education, 29, 479–493.
Chapter 5
Faulty Signals? Inadequacies of Grading Systems
and a Possible Response
Mantz Yorke
Criticism of grading is not new. Milton, Pollio and Eison (1986) made a fairly
forceful case that grading was suspect, but to little avail – not surprisingly, since
a long-established approach is difficult to dislodge. There is now more evidence
that grading does not do what many believe it does (especially in respect of
providing trustworthy indexes of student achievement), so the time may be ripe
for a further challenge to existing grading practices.
Grade is inherently ambiguous as a term, in that it can be used in respect of
raw marks or scores, derivatives of raw marks (in the US, for example, the letter
grades determined by conversion of raw percentages), and overall indexes of
achievement. In this chapter the ambiguity is not entirely avoided: however, an
appreciation of the context in which the term ‘grade’ is used should mitigate the
problem of multiple meaning.
Grades are signals of achievement which serve a number of functions includ-
ing the following:
informing students of their strengths and weaknesses;
informing academic staff of the success or otherwise of their teaching;
providing institutions with data that can be used in quality assurance and
enhancement; and
providing employers and others with information germane to recruitment.
For summative purposes, grading needs to satisfy a number of technical
criteria, amongst which validity and reliability are prominent. (For formative
purposes, the stringency of the criteria can be relaxed – a matter that will not be
pursued here.)
A detailed study of grading in Australia, the UK and the US (Yorke, 2008)
suggests that the signals from grading are not always clear – indeed, the stance
taken in this chapter is that they are generally fuzzy and lack robustness.
M. Yorke
Department of Educational Research, Lancaster University, LA1 4YD, UK
e-mail: mantzyorke@mantzyorke.plus.com
Sampling
Grading Scales
[Figure: histograms of percentage marks (horizontal axes running 1–100, vertical axes showing frequencies from 0 to 12) for several course units; the surviving panel labels include Business, Statistics, Sociology and Healthcare.]
1. It is instructive to look at the statistics produced by the Higher Education Statistics Agency
[HESA] regarding the profiles of honours degree classifications for different subject areas in the
UK (see raw data for ‘Qualifications obtained’ at www.hesa.ac.uk/holisdocs/pubinfo/stud.htm
or, more revealingly, the expression of these data in percentage format in Yorke, 2008, p. 118).
Approaches to Grading
objectives or expected outcomes specified for the work being marked do not
cover everything that they would like to reward (see Sadler, Chapter 4 and, for
empirical evidence, Webster, Pepper & Jenkins, 2000, who found assessors using
unarticulated criteria when assessing dissertations). Unbounded tasks, such as
creative work of varying kinds, are particularly problematic in this respect.
Some would like to add ‘bonus marks’ for unspecified or collateral aspects of
achievement. Walvoord and Anderson (1998), for example, suggest holding back
a ‘‘fudge factor’’ of 10 percent or so that you can award to students whose work shows a
major improvement over the semester. Or you may simply announce in the syllabus and
orally to the class that you reserve the right to raise a grade when the student’s work
shows great improvement over the course of the semester. (p. 99)
The relationship between the grade and its meaning is unclear. In general, a mark
or grade summarises a multidimensional performance, as Exhibit 5.1 indicates.
First Class
It is recognised in all marking schemes that there are several different ways of
obtaining a first class mark. First class answers are ones that are exceptionally
good for an undergraduate, and which excel in at least one and probably several
of the following criteria:
comprehensive and accurate coverage of area;
critical evaluation;
clarity of argument and expression;
integration of a range of materials;
depth of insight into theoretical issues;
originality of exposition or treatment.
It is noticeable that the expansiveness of the criteria tails off as the level of
performance decreases, reflecting an excellence minus approach to grading that
often appears in lists of this type. The criteria for the lower levels of passing are
inflected with negativities: for example, the criteria for third class in this
example are all stated in terms of inadequacy, such as ‘‘misses key points of
information’’. An outsider looking at these criteria could be forgiven for assum-
ing that third class was equivalent to failure. In contrast, what might a threshold
plus approach look like?
Performances tend not to fit neatly into the prescribed ‘boxes’ of criteria,
often exceeding the expectations established in terms of some criteria and
failing to reach others. Whilst labelling some criteria as essential and others as
desirable helps, it does not solve all the problems of assigning a grade to a piece
of work. Some have thought to apply fuzzy set theory to assessment, on the
grounds that this allows a piece of work to be treated as having membership,
with differing levels of intensity, of the various grading levels available
(see relatively small scale research studies by Biswas, 1995; and Echauz &
Vachtsevanos, 1995, for example). Whilst this has conceptual attractiveness,
the implementation of the approach in practical settings offers challenges that
are very unlikely to be met.
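The fuzzy-set idea can be sketched in a few lines. This is a hypothetical illustration only: the grade bands, the 0–100 mark scale and the triangular membership functions are assumptions made for the sketch, not a proposal drawn from the studies cited above.

```python
# Hypothetical sketch of fuzzy grade membership: a piece of work holds partial
# membership of several grade levels at once, rather than falling cleanly into
# one 'box'. The band boundaries below are invented for illustration.
def triangular(x, left, peak, right):
    """Membership rises linearly from `left` to `peak`, then falls to `right`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

GRADE_BANDS = {                # (left, peak, right) on a 0-100 mark scale
    "third":        (35, 45, 55),
    "lower second": (45, 55, 65),
    "upper second": (55, 65, 75),
    "first":        (65, 80, 101),
}

def memberships(mark):
    """Degree of membership of each grade band, rounded for readability."""
    return {g: round(triangular(mark, *band), 2) for g, band in GRADE_BANDS.items()}

# A mark of 63 belongs mostly to upper second, partly to lower second.
print(memberships(63))
```

A mark near a band boundary thus carries non-zero membership of two adjacent classifications, which captures the intuition behind the approach; the practical difficulty noted above lies in defending any particular choice of membership functions.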
A further issue has echoes of Humpty Dumpty's claim regarding language in
Through the Looking-Glass – words varying in meaning according to the user.
Webster et al. (2000) found evidence of the different meanings given to terms
in general assessment usage, exemplifying the variation in academics' use of
''analysis'' and ''evaluation''. There is a further point which has an obvious relevance
to student learning – that the recipients of the words may not appreciate the
meaning that is intended by an assessor (e.g., Chanock, 2000). If one of the aims
of higher education is to assist students to internalise standards, feedback com-
ments may not be as successful as the assessor may imagine.
More generally, a number of writers (e.g. Sadler, 1987, 2005; Tan & Prosser,
2004; Webster et al., 2000; Woolf, 2004) have pointed to the fuzziness inherent
in constructs relating to criteria and standards, which suggests that, collectively,
conceptions of standards of achievement may be less secure than many would
prefer.
A grade by itself says nothing about the kind of performance for which it was
awarded – an examination, an essay-type assignment, a presentation, and so on.
Bridges et al. (2002) presented evidence to support Elton’s (1998) contention
that rising grades in the UK were to some extent associated with a shift in
curricular demand from examinations towards coursework. Further, as Knight
and Yorke (2003, pp. 70–1) observe, a grade tells nothing of the conditions
under which the student’s performance was achieved. For some, the perfor-
mance might have been structured and ‘scaffolded’ in such a way that the
student was given a lot of guidance as to how to approach the task. Other
students might have reached a similar level of achievement, but without much
in the way of support. Without an appreciation of the circumstances under
which the performance was achieved, the meaning – and hence the predictive
validity – of the grade is problematic.
The problem for the receiver of the signal is that they have little or no
information about how the grading was performed, and hence are unable to
do other than come to an appreciation of a student’s performance that is
influenced by their own – possibly faulty – presuppositions about grading.
Grade Inflation
2. If the comparison involves a poor and a middling performance, then correspondingly the
subject with the wider spread will exert the stronger effect on the overall classification.
3. See AVCC (2002) in the case of Australia and also Yorke (2008) for details of the UK and a
summary of practice in Australia and the US.
4. The relevance of these depends on the system being considered: not all apply to all
systems. Brumfield (2004) demonstrates in some detail the variation within higher education
in the US.
higher education) are much less amenable to the charge of grade inflation.
A similar conclusion was reached by Yorke (2008) for institutions in England,
Wales and Northern Ireland. Whatever the merits of the argument regarding
grade inflation, the problem is that those outside the system find difficulty in
interpreting the signals that grading sends.
Rising grades are not necessarily an indication of grade inflation. Space
considerations preclude a detailed discussion of this issue which is treated at
length in Yorke (2008). Table 5.1 summarises distinctions between causes of
rising grades that may or may not be inflationary, some of which have already
been flagged. Note that student choice appears in both columns, depending on
the view one takes regarding students’ exercise of choice.
5. This is not new, since the point was made in the Robbins Report of 1963 (Committee on Higher Education, 1963).
5 Faulty Signals? Inadequacies of Grading Systems and a Possible Response 75
This blurring of the distinction between academic study and the workplace
implies a rebalancing of the demands on students, especially in respect of
bachelors degrees but also in the more overtly work-based foundation degrees7
recently developed in some institutions in the UK. The late Peter Knight
referred to the work arena8 as demanding wicked competences which are
‘‘achievements that cannot be neatly pre-specified, take time to develop and
resist measurement-based approaches to assessment’’ (Knight, 2007a, p. 2).
As Knight pointed out, a number of aspects of employability could be
described as wicked competences (see the list in Yorke & Knight, 2004/06,
p. 8), as could the components of ‘graduateness’ listed by the Higher Education
Quality Council (1997b, p. 86).
Whilst the rebalancing of content and process is contentious to some, the
greater (but perhaps less recognised) challenge lies in the area of assessment.
Assessment of performance in the workplace is particularly demanding since it
is typically multidimensional, involving the integration of performances. Eraut
(2004) writes, in the context of medical education but clearly with wider
relevance:
treating [required competences] as separate bundles of knowledge and skills for assess-
ment purposes fails to recognize that complex professional actions require more than
several different areas of knowledge and skills. They all have to be integrated together
in larger, more complex chunks of behaviour. (p. 804)
6. There is a developing body of literature on this theme, including the collection of resources available on the website of the Higher Education Academy (see www.heacademy.ac.uk/resources/publications/learningandemployability) and Knight and Yorke (2003, 2004).
7. Two years full-time equivalent, with a substantial amount of the programme being based in the workplace (see Higher Education Funding Council for England, 2000).
8. He also pressed the relevance of 'wicked' competences to studying in higher education, but this will not be pursued here.
Knight (2006) made a powerful case that summative assessments were ‘local’ in
character, meaning that they related to the particular circumstances under
which the student’s performance was achieved. Hence any warranting by the
institution would not necessarily be generalisable (pp. 443–4). He also argued
that there were some aspects of performance that the institution would not be in
any position to warrant, though it might be able to attest that the student did
undertake the relevant activity (for example, a placement).9
Part of the problem of warranting achievement stems from a widely held, but
often unacknowledged, misperception that assessments are tantamount to
measurements of student achievement. This is particularly detectable in the
insouciant way in which marks or grades are treated as steps on interval scales
when overall indices of achievement are being constructed. However, as Knight
(2006, p. 438) tartly observes, ‘‘True measurement carries invariant meanings’’.
The local character of grades and marks, on his analysis, demolishes any
pretence that assessments are measurements.
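Knight's distinction between measurement and judgement can be made concrete with a small arithmetic sketch. The letter grades and numeric mappings below are hypothetical, invented purely for illustration and not drawn from the chapter: two mappings that agree completely on the rank order of individual grades can still reverse the rank order of two students' averages, which a true interval-scale measurement could never do.

```python
# Hypothetical sketch: averaging grades is sensitive to the numeric values
# arbitrarily assigned to grade labels, so grade averages are not invariant
# measurements. All names and numbers here are invented for illustration.

def mean_mark(grades, mapping):
    """Average the numeric values assigned to a list of letter grades."""
    return sum(mapping[g] for g in grades) / len(grades)

student_x = ["A", "C"]  # one strong and one weak performance
student_y = ["B", "B"]  # two middling performances

# Two equally defensible mappings; both preserve the order A > B > C.
map_1 = {"A": 75, "B": 65, "C": 58}
map_2 = {"A": 72, "B": 68, "C": 55}

print(mean_mark(student_x, map_1), mean_mark(student_y, map_1))  # 66.5 65.0 -> X ranked first
print(mean_mark(student_x, map_2), mean_mark(student_y, map_2))  # 63.5 68.0 -> Y ranked first
```

Because the ranking flips with an arbitrary choice that any examiner could defensibly make, the averaged figure behaves as a judgement encoded numerically rather than as a measurement carrying invariant meaning.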
Knight is prepared to concede that, in subjects based on science or mathe-
matics, it is possible to determine with accuracy some aspects of student
performance (the kinds of achievement at the lower end of the Bloom, 1956;
and Anderson & Krathwohl, 2001, taxonomies, for instance). However, as soon
as these achievements become applied to real-life problems, wicked compe-
tences enter the frame, and the assessment of achievement moves from what
Knight characterises as ‘quasi-measurement’ to judgement. Where the subject
matter is less susceptible to quasi-measurement, judgement is perforce to the
fore, even though there may be some weak quasi-measurement based on mark-
ing schemes or weighted criteria.
9. See also Knight and Yorke (2003, p. 56) on what can be warranted by an institution, and what not.
1. Marks and grades are, for a variety of reasons, fuzzier signals than many
believe them to be.
2. They signal judgements more than they act as measures.
3. Whilst judgements may have validity, they will often lack precision.
4. Judgements may be the best that assessors can achieve in practical situations.
5. If assessments generally cannot be precise, then it is necessary to rethink the
approach to summative assessment.
Imagining an Alternative
Over the past couple of years, debate has taken place in the UK about the utility
of the honours classification system that has long been in use for bachelors
degrees (Universities UK & GuildHE, 2006; Universities UK & Standing Con-
ference of Principals, 2004, 2005). This has led to suggestions that the honours
degree classification is no longer appropriate to a massified system of higher
education, and that it might be replaced by a pass/fail dichotomisation, with the
addition of a transcript of the student’s achievement in curricular components
(this is broadly similar to the Diploma Supplement adopted across the Eur-
opean Union).10 However, the variously constituted groups considering the
matter have not concentrated their attention on the assessments that are cumu-
lated into the classification.
10. 'Pass/not pass' is probably a better distinction, since students who do not gain the required number of credits for an honours degree would in all probability have gained a lesser number of credits (which is a more positive outcome than would be signified by 'fail'). There has subsequently been a retreat from the conviction that the classification was no longer fit for purpose (see Universities UK & GuildHE, 2007).
11. Some of these have been mentioned above. A fuller analysis can be found in Yorke (2008).
Claims-Making
12. See the tables available by navigating from www.wes.org/gradeconversionguide/ (retrieved August 8, 2007). Haug (1997) argues, rather as Knight (2006) does in respect of 'local' assessment, that a grade has to be understood in the context of the original assessment system and the understanding has to be carried into the receiving system. This requires more than a 'reading off' of a grade awarded on a scale in one system against the scale of the other, since simplistic mathematical conversions may mislead.
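Haug's caution against simply 'reading off' grades between scales can be illustrated with a short sketch. The conversion thresholds below are hypothetical, chosen only to show the shape of the problem and not taken from any official equivalence table: a proportional rescaling of a UK percentage mark onto a 4.0 GPA scale can badly misrepresent what the mark means at source.

```python
# Hypothetical sketch: why a linear 'reading off' between grading scales
# can mislead. The thresholds below are invented for illustration; they
# are not an official conversion table.

def linear_to_gpa(uk_mark):
    """Naive proportional conversion of a UK percentage mark to a 4.0 GPA."""
    return round(uk_mark / 100 * 4.0, 2)

def table_to_gpa(uk_mark):
    """Band-based conversion in the spirit of published equivalence tables,
    using invented thresholds."""
    if uk_mark >= 70:   # UK first class
        return 4.0
    if uk_mark >= 60:   # upper second
        return 3.3
    if uk_mark >= 50:   # lower second
        return 2.7
    if uk_mark >= 40:   # third / pass
        return 2.0
    return 0.0

mark = 70  # a first-class UK mark
print(linear_to_gpa(mark))  # 2.8 -- looks mediocre in the receiving system
print(table_to_gpa(mark))   # 4.0 -- reflects the mark's meaning at source
```

A first-class UK mark of 70 looks mediocre under the linear conversion but excellent under a band-based one, which is why conversion requires carrying an understanding of the originating system into the receiving one rather than applying simple arithmetic.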
13. The argument is elaborated in Knight and Yorke (2003, pp. 159ff).
It is only now that this tributary of argument joins the main flow of this
book. Summative assessment is very often focused on telling students how
well they have achieved in respect of curricular intentions – in effect, this is a
judgement from on high, usually summarised in a grade or profile of grades.
14. In the interests of making progress, a number of quite significant reservations regarding the robustness of summative assessment of academic achievement would need to be set aside.
Claims-making would bring students more into the picture, in that they would
have to consider, on an ongoing basis and at the end of their programmes, the
range of their achievements (some – perhaps many – of which will as a matter of
course be grades awarded by academics). They would have to reflect on what
they have learned (or not), the levels of achievement that they have reached, and
on what these might imply for their futures. Knowing that an activity of this
sort was ‘on the curricular agenda’ would prompt the exercise of reflectiveness
and self-regulation which are of enduring value.
In a context of lifelong learning, would not the involvement of students in
claims-making be more advantageous to all concerned than their merely being
recipients of ex cathedra summative judgements?
References
Adelman, C. (forthcoming). Undergraduate grades: A more complex story than ‘inflation’. In
L. H. Hunt (Ed.), Grade inflation and academic standards. Albany: State University of
New York Press.
Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching and assess-
ment. New York: Addison Wesley Longman.
AVCC [Australian Vice Chancellors’ Committee]. (2002). Grades for honours programs (con-
current with pass degree), 2002. Retrieved October 10, 2007, from http://www.avcc.edu.au/
documents/universities/key_survey_summaries/Grades_for_Degree_Subjects_Jun02.xls
Baume, D., & Yorke, M., with Coffey, M. (2004). What is happening when we assess, and how
can we use our understanding of this to improve assessment? Assessment and Evaluation in
Higher Education, 29(4), 451–77.
Bekhradnia, B. (2006). Demand for higher education to 2020. Retrieved October 14, 2007, from
http://www.hepi.ac.uk/downloads/22DemandforHEto2020.pdf
Biswas, R. (1995). An application of fuzzy sets in students’ evaluation. Fuzzy sets and systems,
74(2), 187–94.
Bloom, B. S. (1956). Taxonomy of educational objectives, Handbook 1: Cognitive domain.
London: Longman.
Brandon, J., & Davies, M. (1979). The limits of competence in social work: The assessment of
marginal work in social work education. British Journal of Social Work, 9(3), 295–347.
Bridges, P., Cooper, A., Evanson, P., Haines, C., Jenkins, D., Scurry, D., Woolf, H., & Yorke,
M. (2002). Coursework marks high, examination marks low: Discuss. Assessment and
Evaluation in Higher Education, 27(1), 35–48.
Brumfield, C. (2004). Current trends in grades and grading practices in higher education:
Results of the 2004 AACRAO survey. Washington, DC: American Association of Collegi-
ate Registrars and Admissions Officers.
Chanock, K. (2000). Comments on essays: Do students understand what tutors write?
Teaching in Higher Education, 5(1), 95–105.
Committee on Higher Education. (1963). Higher education [Report of the Committee
appointed by the Prime Minister under the chairmanship of Lord Robbins, 1961–63].
London: Her Majesty’s Stationery Office.
Dalziel, J. (1998). Using marks to assess student performance: Some problems and alterna-
tives. Assessment and Evaluation in Higher Education, 23(4), 351–66.
Echauz, J. R., & Vachtsevanos, G. J. (1995). Fuzzy grading system. IEEE Transactions on
Education, 38(2), 158–65.
Eisner, E. W. (1979). The educational imagination: On the design and evaluation of school
programs. New York: Macmillan.
Ekstrom, R. B., & Villegas, A. M. (1994). College grades: An exploratory study of policies and
practices. New York: College Entrance Examination Board.
Elton, L. (1998). Are UK degree standards going up, down or sideways? Studies in Higher
Education, 23(1), 35–42.
Eraut, M. (2004). A wider perspective on assessment. Medical Education, 38(8), 803–4.
Felton, J., & Koper, P. T. (2005). Nominal GPA and real GPA: A simple adjustment that
compensates for grade inflation. Assessment and Evaluation in Higher Education, 30(6),
561–69.
Gibbons, M., Limoges, C., Nowotny, H., Schwartzman, S., Scott, P., & Trow, M. (1994). The
new production of knowledge: The dynamics of science and research in contemporary
societies. London: Sage.
Haug, G. (1997). Capturing the message conveyed by grades: Interpreting foreign grades.
World Education News and Reviews, 10(2), 12–17.
Hawe, E. (2003). ‘It’s pretty difficult to fail’: The reluctance of lecturers to award a failing
grade. Assessment and Evaluation in Higher Education, 28(4), 371–82.
Higher Education Funding Council for England. (2000). Foundation degree prospectus.
Bristol: Higher Education Funding Council for England. Retrieved October 14, 2007,
from http://www.hefce.ac.uk/pubs/hefce/2000/00_27.pdf
Higher Education Quality Council. (1997a). Assessment in higher education and the role of
‘graduateness’. London: Higher Education Quality Council.
Higher Education Quality Council. (1997b). Graduate standards programme: Final report
(2 vols). London: Higher Education Quality Council.
Hornby, W. (2003). Assessing using grade-related criteria: A single currency for universities?
Assessment and Evaluation in Higher Education, 28(4), 435–54.
Jessup, G. (1991). Outcomes: NVQs and the emerging model of education and training. London:
Falmer.
Karran, T. (2005). Pan-European grading scales: Lessons from national systems and the
ECTS. Higher Education in Europe, 30(1), 5–22.
Knight, P. (2006). The local practices of assessment. Assessment and Evaluation in Higher
Education, 31(4), 435–52.
Knight, P. (2007a). Fostering and assessing ‘wicked’ competences. Retrieved October 10, 2007,
from http://www.open.ac.uk/cetl-workspace/cetlcontent/documents/460d1d1481d0f.pdf
Knight, P. T. (2007b). Grading, classifying and future learning. In D. Boud & N. Falchikov
(Eds.), Rethinking assessment in higher education: Learning for the longer term (pp. 72–86).
Abingdon, UK: Routledge.
Knight, P. T., & Yorke, M. (2003). Assessment, learning and employability. Maidenhead, UK:
Society for Research in Higher Education and the Open University Press.
Knight, P., & Yorke, M. (2004). Learning, curriculum and employability in higher education.
London: RoutledgeFalmer.
Mager, R. F. (1962). Preparing objectives for programmed instruction. Belmont, CA: Fearon.
Milton, O., Pollio, H. R., & Eison, J. (1986). Making sense of college grades. San Francisco,
CA: Jossey-Bass.
National Committee of Inquiry into Higher Education. (1997). Higher education in the learning
society. Middlesex: NCIHE Publications.
Rothblatt, S. (1991). The American modular system. In R. O. Berdahl, G., C. Moodie, &
I. J. Spitzberg, Jr. (Eds.), Quality and access in higher education: Comparing Britain and the
United States (pp. 129–141). Buckingham, England: SRHE and Open University Press.
Sadler, D. R. (1987). Specifying and promulgating achievement standards. Oxford Review of
Education, 13(2), 191–209.
Sadler, D. R. (2005). Interpretations of criteria-based assessment and grading in higher
education. Assessment and Evaluation in Higher Education, 30(2), 176–94.
Tan, K. H. K., & Prosser, M. (2004). Qualitatively different ways of differentiating student
achievement: A phenomenographic study of academics’ conceptions of grade descriptors.
Assessment and Evaluation in Higher Education, 29(3), 267–81.
Universities UK & GuildHE. (2006). The UK honours degree: provision of information – second
consultation. London: Universities UK & GuildHE. Retrieved October 6, 2007, from
http://www.universitiesuk.ac.uk/consultations/universitiesuk/
Universities UK & GuildHE. (2007). Beyond the honours degree classification: The Burgess
Group final report. London: Universities UK and GuildHE. Retrieved 17 October, 2007,
from http://bookshop.universitiesuk.ac.uk/downloads/Burgess_final.pdf
Universities UK & Standing Conference of Principals. (2004). Measuring and recording
student achievement. London: Universities UK & Standing Conference of Principals.
Retrieved October 5, 2007, from http://bookshop.universitiesuk.ac.uk/downloads/
measuringachievement.pdf
Universities UK & Standing Conference of Principals. (2005). The UK honours degree: provision
of information. London: Universities UK & Standing Conference of Principals. Retrieved
October 5, 2007, from http://www.universitiesuk.ac.uk/consultations/universitiesuk/
Walvoord, B. E., & Anderson, V. J. (1998). Effective grading: A tool for learning and assess-
ment. San Francisco: Jossey-Bass.
Webster, F., Pepper, D., & Jenkins, A. (2000). Assessing the undergraduate dissertation.
Assessment and Evaluation in Higher Education, 25(1), 71–80.
Woolf, H. (2004). Assessment criteria: Reflections on current practices. Assessment and
Evaluation in Higher Education, 29(4), 479–93.
Yorke, M. (1998). Assessing capability. In J. Stephenson & M. Yorke (Eds.), Capability and
quality in higher education (pp. 174–191). London: Kogan Page.
Yorke, M. (2003). Going with the flow? First cycle higher education in a lifelong learning
context. Tertiary Education and Management, 9(2), 117–30.
Yorke, M. (2008). Grading student achievement: Signals and shortcomings. Abingdon, UK:
RoutledgeFalmer.
Yorke, M., Barnett, G., Bridges, P., Evanson, P., Haines, C., Jenkins, D., Knight, P.,
Scurry, D., Stowell, M., & Woolf, H. (2002). Does grading method influence honours
degree classification? Assessment and Evaluation in Higher Education, 27(3), 269–79.
Yorke, M., Bridges, P., & Woolf, H. (2000). Mark distributions and marking practices in UK
higher education. Active Learning in Higher Education, 1(1), 7–27.
Yorke, M., Cooper, A., Fox, W., Haines, C., McHugh, P., Turner, D., & Woolf, H. (1996).
Module mark distributions in eight subject areas and some issues they raise. In N. Jackson
(Ed.), Modular higher education in the UK (pp. 105–7). London: Higher Education Quality
Council.
Yorke, M., & Knight, P. T. (2004/06). Embedding employability into the curriculum. York,
England: The Higher Education Academy. Retrieved October 14, 2007, from http://www.
heacademy.ac.uk/assets/York/documents/ourwork/tla/employability/id460_embedding_
employability_into_the_curriculum_338.pdf
Chapter 6
The Edumetric Quality of New Modes
of Assessment: Some Issues and Prospects
Filip Dochy
Introduction
Assessment has played a crucial role in education and training since formal
education commenced. Certainly, assessment of learning has been seen as the
cornerstone of the learning process since it reveals whether the learning process
results in success or not. For many decades, teachers, trainers and assessment
institutes were the only partners seen as crucial in the assessment event.
Students were seen as subjects who were to be tested without having any
influence on any other aspect of the assessment process. Recently, several
authors have called our attention to what is often termed ‘new modes of
assessment’ and ‘assessment for learning’. They stress that assessment can be
used as a means to reinforce learning, to drive learning and to support learning –
preferably when assessment is not perceived by students as a threat, an event
they have to fear, the sword of Damocles. These authors also emphasize that
the way we assess students should be congruent with the way we teach and the
way students learn within a specific learning environment. As such, the ‘new
assessment culture’ makes a plea for integrating instruction and assessment.
Some go even further: students can play a role in the construction of assessment
tasks, the development of assessment criteria, and the scoring of performance
can be shared amongst students and teachers. New modes of assessment that
arise from such thinking are, for example, 90° or 180° feedback, writing samples, exhibitions, portfolio assessments, peer- and co-assessment, project and
product assessments, observations, text- and curriculum-embedded questions,
interviews, and performance assessments. It is widely accepted that these new
modes of assessment lead to a number of benefits in terms of the learning
process: encouraging thinking, increasing learning and increasing students’
confidence (Falchikov, 1986, 1995).
F. Dochy
Centre for Educational Research on Lifelong Learning and Participation, Centre
for Research on Teaching and Training, University of Leuven, Leuven, Belgium
e-mail: filip.dochy@ped.kuleuven.be
A quick review of the research findings related to traditional testing and its
effects shows the following.
First of all, a focus on small scale classroom assessment is needed instead of a
focus on large scale high stakes testing. Gulliksen (1985), after a long career in
measurement, stated that this differentiation is essential: ‘‘I am beginning to
believe that the failure to make this distinction is responsible for there having
been no improvement, and perhaps even a decline, in the quality of teacher-
made classroom tests. . .’’ (p. 4). Many investigations have now pointed to the
massive disadvantages of large scale testing (Amrein & Berliner, 2002; Rigsby &
DeMulder, 2003): students learn to answer the questions better, but may not
learn more; a lot of time goes into preparing such tests; learning experiences are
reduced; it leads to de-professionalisation of teachers; changes in content are
not accompanied by changes in teaching strategies; teachers do not reflect on
their teaching to develop more effective practices; effort and motivation of
teachers are decreasing; teachers question whether this is a fair assessment
for all students; and the standards don’t make sense to committed teachers or
skilled practitioners. Such external tests also influence teachers’ assessments.
They often emulate large scale tests on the assumption that this represents good
assessment practice. As a consequence, the effect of feedback is to teach the
weaker student that he lacks ability, and so he loses confidence in his own
learning (Black & Wiliam, 1998). One of my British colleagues wryly remarked: ''A-levels are
made for those who can’t reach it, so they realise how stupid they are’’.
In the past decade, research evidence has shown that the use of summative
tests squeezes out assessment for learning and has a negative impact on motiva-
tion for learning. Moreover, the latter effect is greater for the less successful
students and widens the gap between high and low achievers (Harlen & Deakin
Crick, 2003; Leonard & Davey, 2001). Our research also shows convincingly
that students’ perceptions of the causes of success and failure are of central
importance in the development of motivation for learning (Struyven, Dochy, &
Janssens, 2005; Struyven, Gielen, & Dochy, 2003). Moreover, too much sum-
mative testing does not only affect students’ motivation, but also that of their
teachers. High stakes tests result in educational activities directed towards the
content of the tests. As a consequence, the diversity of learning experiences for
students is reduced and teachers use a small range of instructional strategies.
The latter leads to the deprofessionalisation of teachers (Rigsby & DeMulder, 2003).
Firestone and Mayrowitz (2000) state: ‘‘What was missing . . . was the structures
and opportunities to help teachers reflect on their teaching and develop more
effective practices’’ (p. 745).
Recently, the finding that assessment steers learning has gained a lot of
attention within educational research (Dochy, 2005; Dochy, Segers, Gijbels, &
Struyven, 2006; Segers et al., 2003). But motivation has also been investigated as
one of the factors that is in many cases strongly influenced by the assessment or
the assessment system being used.
tasks. In this case, the student is then capable of drawing conclusions – after
or even during assessment – about the quality of his learning behaviour
(self-generated or internal feedback). He can then resolve to do something
about it.
The influence of summative assessment is less obvious since the post-
assessment effects of summative assessment are often minor. Moreover, a
student often does not know after the assessment what he did wrong and
why. A student who passes is usually not interested in how he performed and
in what ways he could possibly improve. The influence of summative assess-
ment on learning behaviour can, instead, be called pro-active. These are pre-
assessment effects. Teachers are often not aware of these effects. There is
evidence that these pre-assessment effects on learning behaviour outweigh
the post-assessment effects of feedback (Biggs, 1996). An important difference between the pre- and post-assessment effects is that the latter are
intentional, whilst the former are more in the nature of side effects, because
summative assessment intends to orientate, to select or to certify. Nevo
(1995), however, points out the existence of a third effect of assessment on
learning. Students also learn during assessment because, at that moment,
they often have to reorganize their acquired knowledge and they have to
make links and discover relationships between ideas that they had not pre-
viously discovered while studying. When assessment incites students to
thought processes of a higher cognitive nature, it is possible that assessment
becomes a rich learning experience for them. This goes for both formative
and summative assessment. We call this the plain assessment effect. This
effect can spur students to learn (with respect to content), but it does not
really affect learning behaviour, except for the self-generated feedback that
we discussed before.
A further question that can be put is which elements of assessment could be
responsible for these pre- and post-assessment effects: for example, the content
of assessment, the type, the frequency, the standards that are used, the content
of feedback, and the way the feedback is provided (Crooks, 1988; Gielen,
Dochy, & Dierick, 2003, 2007).
In the literature, pre-assessment effects are also ascribed to the way in which an
assessment is carried out. Sometimes, the described negative influence of tests is
immediately linked to a certain mode of assessment, namely multiple choice
examinations. Nevertheless, it is important to pay attention to the content of
assessment, apart from the type, since this test bias can also occur with other
types of assessment.
The frequency of assessment can also influence learning behaviour. Tan
(1992) revealed that the learning environment's influence on learning behaviour can be so strong that it cuts across the students' intentions. His research
showed that the majority of students turned to a superficial learning
strategy in a system of continuous summative assessment. They did not do this
because they preferred it and had the intention to do it, but because the
situation gave them incentives to do so. As a result of the incongruity between
learning behaviour and intention, it turned out that many students were really
dissatisfied with the way in which they learnt. They had the feeling that they
did not have time to study thoroughly and to pay attention to their interests.
In this system, students were constantly under pressure. As a result they
worked harder but their intrinsic motivation disappeared and they were
encouraged to learn by heart. Crooks (1988) also drew attention to the fact
that, in relation to the frequency of assessment, it is possible that frequent
assessment is not an aid – it could even be a hindrance – when learning results
of a higher order are concerned, even when assessment explicitly focuses on it.
According to him, a possible explanation could be that students need more
breathing space in order to achieve in-depth learning and to reach learning
results of a higher order. We should note that Crooks’ discussion is of
summative, and not formative, assessment.
In what way does formative assessment play its role with regard to learning
behaviour and are there also differences in the effect depending on the type of
formative assessment? Assessment for learning, or formative assessment, does
not always have positive effects on learning behaviour. Askham (1997) points
The content and type of tasks can, in practice, have a negative effect, as can
formative assessment itself, and thus undo the positive effect of feedback.
Conversely, if formative assessment suits the learning goals and the instruction,
it can help students refine less effective learning strategies and confirm the
results of students who are doing well.
Based upon their review of 250 research studies, Black and Wiliam (1998)
conclude that formative assessment can be effective in improving students’
learning. In addition, when the responsibility for assessment is handed over to
the student, the formative assessment process seems to be more effective.
The most important component of formative assessment, as far as influence
on learning behaviour is concerned, is feedback. Although the ways in which
feedback can be given and received are endless, we can differentiate between
internal (self-generated) and external feedback (Butler & Winne, 1995). The
latter form can be further split up into immediate or delayed feedback, global
feedback or feedback per criterion, with or without suggestions for improve-
ment. The characteristics of effective feedback, as described by Crooks (1988),
have already been discussed. If we want to know why feedback works, we have
to look into the functions of this feedback.
Martens and Dochy (1997) indicate the two main functions of feedback which
can influence students’ learning behaviour. The first, the cognitive function,
implies that information is provided about the learning process and the thorough
command of the learning goals. This feedback has a direct influence on the
knowledge and views of students (confirmation of correct views, improvement
or restructuring of incorrect ones), but can also influence the cognitive and meta-
cognitive strategies that students make use of during their learning process.
Secondly, feedback can be interpreted as a positive or a negative confirmation.
The impact of feedback as confirmation of the student’s behaviour depends on
the student’s interpretation (e.g. his attribution style). A third function of for-
mative assessment that has nothing to do with feedback, but that supports the
learning process, is the activation of prior knowledge through e.g. prior knowl-
edge tests. This is also a post-assessment effect of assessment.
Martens and Dochy (1997) investigated the effects of prior knowledge tests
and progress tests with feedback for students in higher education. Their results
revealed that the influence of feedback on learning behaviour is limited. An
effect that was apparent was that 24% of the students reported that they studied
certain parts of the course again, or over a longer period, as a result of the
feedback. Students who were kept informed of the progress they made seemed
to study longer in this self study context. As far as method of study and
motivation are concerned there were, in general, no differences from the control
group. They conclude that adult students are less influenced by feedback from
one or more assessments. Negative feedback did not seem to have dramatically
negative consequences for motivation. There is, however, a tendency towards
interaction with perceived study time. Possible explanations for the unconfirmed
hypotheses are that the content and timing of the feedback were not optimal
and that the learning materials were self-supporting.
Here again we note that the contents, the type/mode and the frequency of
assessment are important factors influencing the impact of formative assess-
ment on learning behaviour. Dochy, Segers, and Sluijsmans (1999) also point
out the importance of the student’s involvement in assessment. The most
important factor, that is unique to formative assessment, is the feedback com-
ponent. This, however, does not always seem to produce the desired effects.
Feedback contents, timing, and the fact that it is attuned to the student’s needs,
seem again to be important. We refer again to the characteristics of effective
feedback from Crooks.
Biggs (1998) asks us, however, not to overlook summative assessment in this
euphoria about formative assessment. First of all, the effects of summative
assessment on learning behaviour are also significant (see above). Secondly,
research from Butler (1988) reveals that formative and summative assessment
can also interact in their effects on learning behaviour and can only be separated
artificially because students always feel the influence of both and because
instruments can also fulfil both functions.
Crooks (1988) and Tan (1992), on the other hand, quote various investigations that point out the importance of separating opportunities for feedback from summative assessment. When an assessment also counts towards a grade, students seem to pay less attention to the feedback and learn less from it. On
the basis of research by McDowell (1995), we have already mentioned a dis-
advantage of the combination of both. Black (1995) also has reservations about
the combination of formative and summative assessment. When summative
assessment is done externally, it has to be separated from the formative aspect
because its negative backwash effects can be detrimental to learning and can hinder good support of the learning process. But even when summative assessment is done internally, completely or partially by the teacher, the relationship between the two functions has to be managed carefully.
First, the function of the assessment must be established; only then can decisions about its form and method be made. Struyf, Vandenberghe, and Lens (2001), however, point out that in practice the two functions will be mixed anyway, and that formative assessment will form the basis for a summative judgement of what someone has achieved.
What matters to us in this discussion is that the teacher or lecturer eventually
pays attention to the consequential validity of assessment, whether it is for-
mative, summative, or even both.
94 F. Dochy
In the textile world in which I spent most of my working life, many of the consumer
tests have been designed with reliability in mind but not validity. One simple example of
this is a crease test which purports to test the tendency of a material to get creased
during use. The problem is that the test is carried out under standard conditions at a
temperature of twenty degrees and a relative humidity of sixty-five percent. When worn
the clothing is usually at a higher temperature and a much higher humidity and textile
fabrics are very sensitive to changes in these variables but still they insist on measuring
under standard conditions, thus making the results less valid for the user.
Various authors have proposed ways to extend the criteria, techniques and
methods used in traditional psychometrics (Cronbach, 1989; Frederiksen &
Collins, 1989; Haertel, 1991; Kane, 1992; Linn, 1993; Messick, 1989). In this
section, an integrated overview is given.
Within the literature on quality criteria for evaluating assessment, a distinction can be made between authors who present an expanded vision of validity and reliability (Cronbach, 1989; Kane, 1992; Messick, 1989) and those
who propose specific criteria, sensitive to the characteristics of new assessment
modes (Baartman, Bastiaens, Kirschner, & Van der Vleuten, 2007; Dierick &
Dochy, 2001; Frederiksen & Collins, 1989; Haertel, 1991; Linn, Baker, &
Dunbar, 1991).
Several aspects of validity can be distinguished in educational assessment. The content aspect of validity means that the range
and type of tasks used in assessment must be an appropriate reflection (content
relevance, representativeness) of the construct domain. Increasing achieve-
ment levels in assessment tasks should reflect increases in expertise in the
construct domain. The substantive aspect emphasizes the consistency between
the processes required for solving the tasks in an assessment, and the processes
used by domain experts in solving tasks (problems). Furthermore, the internal
structure of assessment – reflected in the criteria used in assessment tasks, the
interrelations between these criteria, and the relative weight placed on scoring
these criteria – should be consistent with the internal structure of the construct
domain. If the content aspect (relevance, representativeness of content and
performance standards) and the substantive aspect of validity are guaranteed, score interpretation based on one assessment task should be generalisable to
other tasks which assess the same construct. The external aspect of validity
refers to the extent to which the assessment scores’ relationships with other
measures and non-assessment behaviours reflect the expected high, low, and
interactive relationships. The consequential aspect of validity includes evidence
and rationales for evaluating the intended and unintended consequences of
score interpretation and use.
The view that the three traditional validity aspects are integrated within one
concept for evaluating assessment was also suggested by Messick (1989). The
aspect most stressed recently, however, is evaluating the influence of assessment
on education. Additionally, the concept of reliability becomes part of the
construct validating process. Indeed, for new modes of assessment, it is not important to rank students' scores along a normal curve. The most important question is: how reliable is the judgement that a student is, or is not, competent?
The generalisability aspect of validity investigates the extent to which the decision that a student is competent can be generalised to other tasks, and thus is reliable. Measuring reliability can then be interpreted as a question of the
accuracy of the generalisation of assessment results to a broader domain of
competence.
In the following section we discuss whether generalisability theory, rather than classical reliability theory, can better express the reliability of such assessment.
In classical test theory, reliability can be interpreted in two different ways. On the
one hand reliability can be understood as the extent to which agreement between
raters is achieved. In the case of performance assessment, reliability can be
described in terms of agreement between raters on one specified task and on
one specified occasion. The reliability can be raised, according to this definition,
by using detailed procedures and standards to structure the judgements obtained.
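This first, inter-rater sense of reliability is commonly quantified with Cohen's kappa, a standard chance-corrected agreement index from the psychometric literature. The chapter itself does not compute one; the sketch below, with invented ratings, only illustrates the computation:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same tasks.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected if both raters scored at random
    according to their own marginal category frequencies.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of tasks on which the raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal distribution.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Two raters classifying ten portfolios as competent (1) or not (0).
a = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
b = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]
print(round(cohens_kappa(a, b), 2))  # → 0.52
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance; detailed scoring procedures and standards raise kappa precisely by reducing the disagreement component.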
Heller, Sheingold, and Myford (1998), however, propose that measurements
of inter-rater reliability in authentic assessment do not necessarily indicate
whether raters are making sound judgements and also do not provide bases
for improving the technical quality of a test. Differences between ratings some-
times represent more accurate and meaningful measurement than absolute
agreement would. Suen, Logan, Neisworth and Bagnato (in press) argue that
the focus on objectivity among raters, as a desirable characteristic of an assess-
ment procedure, leads to a loss of relevant information. In high-stakes decisions,
procedures which include ways of weighing high quality information from
multiple perspectives may lead to a better decision than those in which informa-
tion from a single perspective is taken into account.
On the other hand, reliability refers to the consistency of results obtained
from a test when students are re-examined with the same test and the same rater
on a different occasion. According to this concept, reliability can be improved
by using tasks which are similar in format and context. This definition of
reliability in terms of consistency over time is, in the opinion of Bennet
(1993), problematic: between the test and the retest the world of a student will
change and that will affect the re-examination. Therefore, it is difficult to see
how an assessor can assure that the same aspects of the same task can be
assessed in the same way on different occasions. Assessors will look primarily
for developments rather than consistency in scores.
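Both interpretations rest on the same classical identity. In standard notation (the chapter itself gives no formulas), an observed score $X$ is decomposed into a true score $T$ and an error term $E$, and reliability is the proportion of observed-score variance attributable to true scores:

```latex
X = T + E, \qquad \operatorname{Cov}(T, E) = 0,
\qquad \rho_{XX'} \;=\; \frac{\sigma^2_T}{\sigma^2_X}
\;=\; \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}.
```

Inter-rater agreement and test-retest consistency are simply two different choices of what counts as "error" $E$; it is exactly this ambiguity that generalisability theory, discussed below, makes explicit.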
Taking the above definitions into account, it looks as though reliability and the intention of assessment are opposed to each other: achieving high reliability by making concessions in how students' knowledge and skills are measured leads to a decrease in content validity. In the new test culture,
the vision of weighing information from multiple perspectives may result in a
better decision than using information taken from a single perspective. In that
case, different test modes can be used and the resulting information from these
different tests can be used to reach accurate decisions.
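The claim that pooling information from several assessment modes supports a better decision can be made quantitative with the classical Spearman-Brown relation, a standard psychometric result not derived in the chapter: if $k$ comparable measures each have reliability $\rho_1$, their combination has reliability

```latex
\rho_k \;=\; \frac{k\,\rho_1}{1 + (k - 1)\,\rho_1}.
```

For example, three independent assessments each of modest reliability $0.60$ combine to $\rho_3 = 1.8/2.2 \approx 0.82$, which is why a set of different tests can underpin a more accurate decision than any single one.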
Classical reliability theory does not fit this vision of assessment, because assessment is about determining the real competence of students through a set of tests and tasks, not about achieving a normally distributed set of results. Additionally, classical reliability theory can only examine the consistency of a test on a particular occasion, or the level of agreement between two raters.
Assessment involves measuring the competencies of students on different
occasions and in different ways, possibly by different raters. Assessment has to
be considered as a whole and not as a set of separate parts. Thus, the reliability
of the assessment as a whole is far more interesting than the reliability of the
separate parts. It is far more useful to examine in what way and to what extent
the students’ behaviours can be generalised (or transferred) to, for example,
professional reality and the necessary tasks, occasions and raters required for
that purpose.
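Generalisability theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972) makes this question explicit by decomposing score variance over the facets just mentioned: tasks, occasions, and raters. As one standard illustration not spelled out in the chapter, for a fully crossed persons × tasks × raters design, the generalisability coefficient for a decision based on $n_t$ tasks and $n_r$ raters is

```latex
E\rho^2 \;=\; \frac{\sigma^2_p}
  {\sigma^2_p + \dfrac{\sigma^2_{pt}}{n_t} + \dfrac{\sigma^2_{pr}}{n_r}
   + \dfrac{\sigma^2_{ptr,e}}{n_t\, n_r}}.
```

Here $\sigma^2_p$ is the variance among persons (the real differences in competence we want to detect), while the interaction terms capture how much a student's standing shifts across tasks and raters; adding tasks or raters shrinks those terms, which is how generalisability theory expresses the reliability of the assessment as a whole rather than of its separate parts.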
Apart from an expansion of the traditional validity and reliability criteria, new
criteria can be suggested for evaluating the quality of assessment: transparency,
fairness, cognitive complexity and authenticity of tasks, and directness of assessment
(Dierick, Dochy, & Van de Watering, 2001; Dierick, Van de Watering, &
Muijtjens, 2002; Frederiksen & Collins, 1989; Gielen et al., 2003; Haertel, 1991;
Linn, et al., 1991). These criteria were developed to highlight the unique char-
acteristics of new assessment modes.
An important characteristic of new assessment modes is the kind of tasks
that are used. Authentic assessment tasks call on higher-order skills and thereby stimulate a deeper approach to learning by students.
The first criterion that distinguishes new assessment modes from tradi-
tional tests is the extent to which assessment tasks are used to measure
problem solving, critical thinking and reasoning. This criterion is called
cognitive complexity (Linn et al., 1991). To judge whether assessment tasks
meet this criterion, we can analyse whether there is consistency between the
processes required for solving the tasks and those used by experts in solving
such problems. Next, it is necessary to take into account students’ familiarity
with the problems and the ways in which students attempt to solve them
(Bateson, 1994).
Another criterion for evaluating assessment tasks is authenticity. Shepard
(1991) describes authentic tasks as the best indicators of attainment of learning
goals. Indeed, traditional tests always require an inference from student answers to competence in a specific domain. When using authentic assessment, such an inference is not needed, because the performance itself is a direct indication of competence.
The criterion authenticity of tasks is closely related to the directness of
assessment. Powers, Fowles, and Willard (1994) argue that the extent to
which teachers can judge competence directly is relevant evidence of the direct-
ness of assessment. In their research, teachers were asked to give a global
judgement about the competence ‘‘general writing skill’’ for writing tasks, with-
out scoring them. Thereafter, trained assessors scored these works, following
predetermined standards. Results indicate that there was a clear correlation
between the global judgement of competence by the teachers and the marks
awarded by the assessors.
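The "clear correlation" reported by Powers, Fowles, and Willard can be illustrated with a Pearson correlation between the two sets of scores. The data below are hypothetical, invented only to show the computation:

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical data: teachers' global 1-5 judgements of writing
# competence vs trained assessors' rubric-based marks for 8 scripts.
global_judgement = [4, 5, 3, 2, 4, 1, 5, 3]
rubric_marks     = [4, 4, 3, 2, 5, 2, 5, 3]
print(round(pearson_r(global_judgement, rubric_marks), 2))  # → 0.89
```

A correlation of this size would indicate that quick global judgements of competence track the detailed rubric scores closely, which is the evidence of directness the authors appeal to.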
When scoring assessment, the criterion of fairness (Linn, et al. 1991) plays an
important role. The central question is whether students have had a fair chance
to demonstrate their real ability. Bias can occur for several reasons. Firstly,
because tasks are not congruent with the received instruction/education.
Secondly, because students do not have equal opportunities to demonstrate
their real capabilities on the basis of the selected tasks (e.g. because they are
not accustomed to the cultural content that is asked for), and thirdly, because of
prejudgment in scoring. Additionally, it is also important that students under-
stand the criteria that are used in assessment. ‘‘Meeting criteria improves
learning’’: communicating these criteria to students when the assignments are
given improves their performances, as they can develop clear goals to strive for
in learning (Dochy, 1999).
A final criterion that is important when evaluating assessment is the trans-
parency of the scoring criteria that are used. Following Frederiksen and Collins
(1989), the extent to which students can assess their own work and that of others as reliably as trained assessors do provides a good indicator for this criterion.
Though these criteria seem to be good indicators for evaluating the quality of
new assessments, they cannot be seen as completely different from those
formulated by Messick (1994). The difference is that these criteria are more
concrete and more sensitive to the unique characteristics of new assessment
modes. They specify how validity can be evaluated in an edumetric, instead of a
psychometric, way.
The criteria authenticity of tasks and cognitive complexity can be seen as
further specifications of Messick’s (1994) content validity aspect. Authenticity
of tasks means that the content and the level of tasks need to be an adequate
representation of the real problems that occur within the construct/competence
domain that is being measured. To investigate the criterion of cognitive com-
plexity, we need to judge whether solving assessment tasks requires the same
thinking processes that experts use for solving domain-specific problems. This
criterion corresponds to what Messick calls substantive validity.
The criteria directness and transparency are relevant in the context of the
consequential validity of assessment. The way that competence is assessed,
directly or from an interpretation of student’s answers, has an immediate effect
on the nature and content of education and the learning processes of students. In
addition, the transparency of the assessment methods used influences the learning
process of students, as their performance will improve if they know exactly which
assessment criteria will be used (see the above-mentioned argument).
The criterion fairness forms part of both Messick’s (1994) content validity
aspect and his internal structural aspect. To give students a fair chance to
demonstrate what their real capabilities are, the tasks offered need to be
varied so that they contain the whole spectrum of knowledge and skills
needed for the competence measured. It is also important that the criteria used for assessing a task, and the weight given to each, should adequately reflect the criteria used by experts for assessing competence in a specific domain.
On the question of whether there is really a gap between the old and the new
assessment stream in formulating criteria for evaluating the quality of assess-
ment, it can be argued that there is a clear difference in approach and background: within psychometrics, the criteria of validity and reliability are interpreted differently than they are within the newer, edumetric approach.
If we integrate the most important changes within the assessment field with
regard to the criteria for evaluating assessment, conducting a quality assessment
inquiry involves a comprehensive strategy that addresses the evaluation of:
Table 6.1 A framework for collecting supporting evidence for, and examining possible threats to, construct validity

Procedure 1. Establish an explicit conceptual framework for the assessment (= construct definition: content and cognitive specifications) together with its purpose; this is the frame of reference for reviewing assessment tasks or items with regard to the purported construct/competence.

Review questions regarding the validity of the assessment tasks (judging how well the assessment matches the content and cognitive specifications of the construct/competency that is measured):
A. Does the assessment consist of a representative set of tasks that cover the spectrum of knowledge and skills needed for the construct/competence being measured?
B. Are the tasks authentic, in that they are representative of the real-life problems that occur within the construct/competence domain that is being measured?
C. Do the tasks assess complex abilities, in that the same thinking processes are required for solving the tasks that experts use for solving domain-specific problems?

Procedure 2. Identify rival explanations for the observed performance, and collect multiple types of evidence on these rival explanations of assessment performance.

Review questions regarding the validity of assessment scoring (searching for evidence of the appropriateness of the inference from the performance to an observed score):
A. Is the task congruent with the received instruction/education?
B. Do all students have equal opportunities to demonstrate their capabilities on the basis of the selected tasks?
What are the arguments in support of the construct validity of new assessment
modes?
Assessment development begins with establishing an explicit conceptual frame-
work that describes the construct domain being assessed: content and cognitive
specifications should be identified. During the first stage, validity inquiry judges
how well assessment matches the content and cognitive specifications of the con-
struct being measured. The defined framework can then be used as a guideline to
select assessment tasks. The following aspects are important to consider. Firstly,
the tasks used must be an appropriate reflection of the construct or, according to
new assessment modes, the competence that needs to be measured. Next, with
regard to the content, the tasks should be authentic in that they are representative
of real life problems that occur in the knowledge domain being measured. Finally,
the cognitive level needs to be complex, so that the same thinking processes are
required that experts use for solving domain-specific problems.
New assessment modes score better on these criteria than standardised tests,
precisely because of their authentic and complex problem characteristics.
The next aspect that needs to be investigated is whether the scoring of the assessment is valid. The fairness criterion plays an important role here. This requires, on the
one hand that the assessment criteria fit and are appropriately used, so that they
are an appropriate reflection of the criteria used by experts and that appropriate
weightings are given for assessing different competences. On the other hand, it
requires that students need to have a fair chance to demonstrate their real abilities.
Possible problems that can occur are, firstly, that relevant assessment criteria
could be lacking, so certain competence aspects do not get enough attention.
Secondly, irrelevant, personal assessment criteria could be used. Because assess-
ment measures the ability of students at different times, in different ways, by
different judges, there is less chance that these problems will occur with new modes of assessment, since any potential bias in judgement will tend to be balanced out. As a result
of this, the totality of all the assessments will give a more precise picture of the real
competence of a person than standardised assessment, where the decision of
whether a student is competent is reduced to one judgement at one time.
Generalisability of Assessment
This step in the validating process investigates to what extent assessment can be
generalised to other tasks that measure the same construct. This indicates that
score interpretation is reliable and supplies evidence that assessment really
measures the intended construct.
Consequences of Assessment
The last question that needs to be asked is what the consequences are of using a
particular assessment form for instruction, and for the learning process of
students. Linn et al. (1991) and Messick (1995) stressed not only the importance
of the intended consequences, but also the unintended consequences, positive
and negative. Examples of intended educational consequences concern the
increased involvement and quality of the learning of students (Boud, 1995),
improvements in reflection (Brown, 1999), improvements and changes in teach-
ing method (Dancer & Kamvounias, 2005; Sluijsmans, 2002), increased feelings
of ownership and higher performances (Dierick et al., 2001; Dierick et al., 2002;
Farmer & Eastcott, 1995; Gielen et al., 2003; Orsmond, Merry, & Reiling,
2000), increased motivation and more direction in learning (Frederiksen & Col-
lins, 1989), better reflection skills and more effective use of feedback (Brown,
1999; Gielen et al., 2003) and the development of self-assessment skills and
lifelong learning (Baartman, Bastiaens, Kirschner, & Van der Vleuten, 2005;
Boud, 1995). Examples of unintended educational consequences are, among
other things, surface learning approaches and rote learning (e.g., Nijhuis, Segers,
& Gijselaers, 2005; Segers, Dierick, & Dochy, 2001), increased test anxiety
(e.g., Norton et al., 2001) and stress (e.g., Evans, McKenna, & Oliver, 2005;
Pope, 2001, 2005), cheating and tactics to impress teachers (e.g., Norton et al.,
2001), gender bias (e.g., Langan et al., 2005), prejudice and unfair marking (e.g.,
Pond, Ui-Haq, & Wade, 1995), lower performances (e.g., Segers & Dochy, 2001),
and resistance towards innovations (e.g., McDowell & Sambell, 1999).
Consequential validity investigates whether the actual consequences of
assessment are also the expected consequences. This is a very important ques-
tion, as each form of assessment also steers the learning process. This can be
made clear by presenting statements of expected (and unexpected) conse-
quences of assessment to the student population, or by holding semi-structured
key group interviews. Using this last method, unexpected effects also become
clear. When evaluating consequential validity, the following aspects are impor-
tant to consider: what students understand the requirements for assessment to
be; how students prepare themselves for education; what kind of learning
strategies are used by students; whether assessment is related to authentic
contexts; whether assessment stimulates students to apply their knowledge to
realistic situations; whether assessment stimulates the development of various
skills; whether long-term effects are perceived; whether effort, instead of mere chance, is actively rewarded; whether breadth and depth in learning are rewarded;
whether independence is stimulated by making expectations and criteria expli-
cit; whether relevant feedback is provided for progress; and whether competen-
cies are measured, rather than just the memorising of facts.
From this chapter, we can derive the following practical guidelines to be kept in
mind:
The use of summative tests squeezes out assessment for learning and has a
negative impact on motivation for learning.
Some research reveals that the majority of students turn to a superficial
learning strategy in a system of continued summative assessment.
Using more new modes of assessment supports the recent views on student
learning.
Assessment steers learning.
The consequential validity of assessments should be taken into account when
designing exams. Consequential validity asks what the consequences are of
the use of a certain type of assessment on education and on the students’
learning processes, and whether the consequences that are found are the
same as the intended effects.
The influence of formative assessment is mainly due to the fact that the results, and the learning processes on which they are based, are reviewed after the assessment.
Conclusion
In this chapter, it is argued that student centred education implies a new way of
assessing students. Assessment no longer means ‘‘testing’’ students. New assess-
ment modes differ in various aspects from the characteristics of traditional tests.
Students are judged on their actual performance in using knowledge in a
creative way to demonstrate competence. The assessment tasks used are authen-
tic representations of real-life problems. Often, students are given responsibility
for the assessment process, and both knowledge and skills are measured.
Furthermore, in contrast to traditional tests, assessment modes are not stan-
dardised. Finally, assessment implies that at different times, different knowl-
edge and skills are measured by different raters. For new assessment modes the
most important question, with regard to quality, is: How justified is the decision
that a person is competent?
Because of these specific characteristics, the application of traditional theoretical quality criteria is not straightforward. Therefore, in this chapter attention has been paid to the consequences of these developments for screening the edumetric quality of assessment. It can be concluded that the testing of the theoretical meaning of validity and reliability is no longer tenable but needs, at least,
to be expanded and changed. For new assessment modes, it is important to
portray the following quality aspects: cognitive complexity, authenticity of
tasks, fairness, transparency of the assessment procedure, and the influence of
assessment on education.
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1985). Standards for educational and psychological
testing. Washington, DC: Author.
Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10(18). Retrieved August 15, 2007, from http://epaa.
asu.edu/epaa/v10n18/.
Askham, P. (1997). An instrumental response to the instrumental student: assessment for
learning. Studies in Educational Evaluation, 23(4), 299–317.
Baartman, L. K. J., Bastiaens, T. J., Kirschner, P. A., & Van der Vleuten, C. P. M. (2007).
Teachers’ opinions on quality criteria for competency assessment programmes. Teaching
and Teacher Education. Retrieved August 15, 2007, from http://www.fss.uu.nl/edsci/
images/stories/pdffiles/Baartman/baartman%20et%20al_2006_teachers%20opinions%
20on%20quality%20criteria%20for%20caps.pdf
Baartman, L. K. J., Bastiaens, T. J., Kirschner, P. A., & Van der Vleuten, C. P. M. (2005). The
wheel of competency assessment. Presenting quality criteria for competency assessment
programmes. Paper presented at the 11th biennial Conference for the European Associa-
tion for Research on Learning and Instruction (EARLI), Nicosia, Cyprus.
Bateson, D. (1994). Psychometric and philosophic problems in ‘authentic’ assessment,
performance tasks and portfolios. The Alberta Journal of Educational Research, 11(2),
233–245.
Bennet, Y. (1993). Validity and reliability of assessments and self-assessments of work-based
learning assessment. Assessment & Evaluation in Higher Education, 18(2), 83–94.
Biggs, J. (1996). Assessing learning quality: Reconciling institutional, staff and educational
demands. Assessment & Evaluation in Higher Education, 21(1), 5–16.
Biggs, J. (1998). Assessment and classroom learning: A role for summative assessment?
Assessment in Education: Principles, Policy & Practices, 5, 103–110.
Birenbaum, M. (1994). Toward adaptive assessment – The student’s angle. Studies in Educa-
tional Evaluation, 20, 239–255.
Birenbaum, M. (1996). Assessment 2000. In: M. Birenbaum & F. Dochy, (Eds.). Alternatives
in assessment of achievement, learning processes and prior knowledge. Boston: Kluwer
Academic.
Birenbaum, M., & Dochy, F. (Eds.) (1996). Alternatives in assessment of achievement, learning
processes and prior knowledge. Boston: Kluwer Academic.
Black, P. (1995). Curriculum and assessment in science education: The policy interface.
International Journal of Science Education, 17(4), 453–469.
Black, P. & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education:
Principles, Policy & Practices, 5(1), 7–74.
Boud, D. (1990). Assessment and the promotion of academic values. Studies in Higher
Education, 15(1), 101–111.
Boud, D. (1995). Assessment and learning: Contradictory or complementary? In P. Knight
(Ed.), Assessment for learning in higher education (pp. 35–48). London: Kogan Page.
Brennan, R. L., & Johnson, E. G. (1995). Generalisability of performance assessments.
Educational Measurement: Issues and Practice, 11(4), 9–12.
Brown, S. (1999). Institutional strategies for assessment. In S. Brown, & A. Glasner (Eds.),
Assessment matters in higher education: Choosing and using diverse approaches (pp. 3–13).
Buckingham: The Society of Research into Higher Education/Open University Press.
Butler, D. L. (1988). A critical evaluation of software for experiment development in research and teaching. Behavior Research Methods, Instruments, & Computers, 20, 218–220.
Butler, D. L., & Winne, P. H. (1995). Feedback and self-regulated learning: A theoretical
synthesis. Review of Educational Research, 65(3), 245–281.
Cronbach, L. J. (1989). Construct validation after thirty years. In R. L. Linn (Ed.), Intelli-
gence: Measurement, theory and public policy (pp. 147–171). Chicago: University of Illinois
Press.
Cronbach, L. J., Gleser, G. C., Nanda, H. & Rajaratnam, N. (1972). The dependability of
behavioral measurements: Theory of generalizability for scores and profiles. New York:
Wiley.
Cronbach, L. J., Linn, R. L., Brennan, R. L. & Haertel, E. H. (1997). Generalizability analysis
for performance assessments of students’ achievement or school effectiveness. Educational
and Psychological Measurement, 57(3), 373–399.
Crooks, T. (1988). The impact of classroom evaluation practices on students. Review of
Educational Research, 58(4), 438–481.
Dancer, D., & Kamvounias, P. (2005). Student involvement in assessment: A project designed
to assess class participation fairly and reliably. Assessment and Evaluation in Higher
Education, 30, 445–454.
Dierick, S., & Dochy, F. (2001). New lines in edumetrics: New forms of assessment lead to new
assessment criteria. Studies in Educational Evaluation, 27, 307–329.
Dierick, S., Dochy, F., & Van de Watering, G. (2001). Assessment in het hoger onderwijs.
Over de implicaties van nieuwe toetsvormen voor de edumetrie [Assessment in higher
education. About the implications of new test forms for edumetrics]. Tijdschrift voor Hoger
Onderwijs, 19, 2–18.
Dierick, S., Van de Watering, G., & Muijtjens, A. (2002). De actuele kwaliteit van assessment:
Ontwikkelingen in de edumetrie [The actual quality of assessment: Developments in edu-
metrics]. In F. Dochy, L. Heylen, & H. Van de Mosselaer (Eds.) Assessment in onderwijs:
Nieuwe toetsvormen en examinering in studentgericht onderwijs en competentiegericht onder-
wijs [Assessment in education: New testing formats and examinations in student-centred
education and competence based education] (pp. 91–122). Utrecht: Lemma BV.
Dochy, F. (1999). Instructietechnologie en innovatie van probleemoplossen: over constructie-
gericht academisch onderwijs. Utrecht: Lemma.
Dochy, F. (2001). A new assessment era: Different needs, new challenges. Research Dialogue
in Learning and Instruction, 2(1), 11–20.
Dochy, F. (2005). Learning lasting for life and assessment: How far did we progress? Presidential
address at the EARLI conference 2005, Nicosia, Cyprus. Retrieved October 18, 2007
from http://perswww.kuleuven.be/u0015308/Publications/EARLI2005%20presidential%
20address%20FINAL.pdf
Dochy, F., & Gijbels, D. (2006). New learning, assessment engineering and edumetrics. In
L. Verschaffel, F. Dochy, M. Boekaerts, & S. Vosniadou (Eds.), Instructional psychology:
Past, present and future trends. Sixteen essays in honour of Erik De Corte. New York:
Elsevier.
Dochy, F., & McDowell, L. (1997). Assessment as a tool for learning. Studies in Educational
Evaluation, 23, 279–298.
Dochy, F. & Moerkerke, G. (1997). The present, the past and the future of achievement
testing and performance assessment. International Journal of Educational Research, 27(5),
415–432.
Dochy F., Moerkerke G., & Martens R. (1996). Integrating assessment, learning and instruc-
tion: Assessment of domain-specific and domain transcending prior knowledge and
progress. Studies in Educational Evaluation, 22(4), 309–339.
Dochy, F., Segers, M., Gijbels, D., & Struyven, K. (2006). Breaking down barriers
between teaching and learning, and assessment: Assessment engineering. In D. Boud &
N. Falchikov (Eds.), Rethinking assessment for future learning. London: RoutledgeFalmer.
Dochy, F., Segers, M., & Sluijsmans, D. (1999). The use of self-, peer- and co-assessment in
higher education: A review. Studies in Higher Education, 24(3), 331–350.
Elton L. R. B., & Laurillard D. M. (1979). Trends in research on student learning. Studies in
Higher Education, 4(1), 87–102.
Evans, A. W., McKenna, C., & Oliver, M. (2005). Trainees’ perspectives on the assessment and
self-assessment of surgical skills. Assessment and Evaluation in Higher Education, 30, 163–174.
Falchikov, N. (1986). Product comparisons and process benefits of collaborative peer group
and self-assessments. Assessment and Evaluation in Higher Education, 11(2), 146–166.
Falchikov, N. (1995). Peer feedback marking: Developing peer assessment. Innovations in
Education and Training International, 32(2), 395–430.
Fan, X., & Chen, M. (2000). Published studies of interrater reliability often overestimate
reliability: computing the correct coefficient. Educational and Psychological Measurement,
60(4), 532–542.
Farmer, B., & Eastcott, D. (1995). Making assessment a positive experience. In P. Knight (Ed.),
Assessment for learning in higher education (pp. 87–93). London: Kogan Page.
Feltovich, P. J., Spiro, R. J. & Coulson, R. L. (1993). Learning, teaching, and testing for
complex conceptual understanding. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.),
Test theory for a new generation of tests. Hillsdale, NJ: Lawrence Erlbaum.
Firestone, W. A., & Mayrowitz, D. (2000). Rethinking ‘‘high stakes’’: Lessons from the
United States and England and Wales. Teachers College Record, 102, 724–749.
112 F. Dochy
Frederiksen, J. R., & Collins, A. (1989). A system approach to educational testing. Educa-
tional Researcher, 18(9), 27–32.
Frederiksen, N. (1984). The real test bias: Influences of testing on teaching and learning.
American Psychologist, 39(3), 193–202.
Gibbs, G. (1999). Using assessment strategically to change the way students learn, In
S. Brown & A. Glasner (Eds.), Assessment matters in higher education: Choosing and
using diverse approaches, Buckingham: Open University Press.
Gielen, S., Dochy, F., & Dierick, S. (2003). Evaluating the consequential validity of new
modes of assessment: The influence of assessment on learning, including pre-, post-, and
true assessment effects. In M. S. R. Segers, F. Dochy, & E. Cascallar (Eds.), Optimising
new modes of assessment: In search of qualities and standards (pp. 37–54). Dordrecht/
Boston: Kluwer Academic Publishers.
Gielen, S., Dochy, F., & Dierick, S. (2007). The impact of peer assessment on the consequential
validity of assessment. Manuscript submitted for publication.
Gielen, S., Tops, L., Dochy, F., Onghena, P., & Smeets, S. (2007). Peer feedback as a substitute
for teacher feedback. Manuscript submitted for publication.
Gielen, S., Dochy, F., Onghena, P., Janssens, S., Schelfhout, W., & Decuyper, S. (2007).
A complementary role for peer feedback and staff feedback in powerful learning environ-
ments. Manuscript submitted for publication.
Glaser, R. (1990). Testing and assessment; O Tempora! O Mores! Horace Mann Lecture,
University of Pittsburgh, LRDC, Pittsburgh, Pennsylvania.
Gulliksen H. (1985). Creating Better Classroom Tests. Educational Testing Service. Opinion
papers, Reports – evaluative.
Haertel, E. H. (1991). New forms of teacher assessment. Review of Research in Education,
17, 3–29.
Harlen, W., & Deakin Crick, R. (2003). Testing and motivation for learning. Assessment in
Education, 10(2), 169–207.
Heller, J. I., Sheingold, K., & Mayford, C. M. (1998). Reasoning about evidence in portfolios:
Cognitive foundations for valid and reliable assessment. Educational Assessment, 5(1), 5–40.
Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.
Langan, A. M., Wheater, C. P., Shaw, E. M., Haines, B. J., Cullen, W. R., Boyle, J. C., et al.
(2005). Peer assessment of oral presentations: Effects of student gender, university affilia-
tion and participation in the development of assessment criteria. Assessment and Evalua-
tion in Higher Education, 30, 21–34.
Leonard, M, & Davey, C. (2001). Thoughts on the 11 plus. Belfast: Save the Children Fund.
Linn, R. L. (1993). Educational assessment: Expanded expectations and challenges. Educa-
tional Evaluation and Policy Analysis, 15(1), 1–16.
Linn, R. L., Baker, E., & Dunbar, S. (1991). Complex, performance-based assessment:
Expectations and validation criteria. Educational Researcher, 20(8), 15–21.
Martens, R., & Dochy, F. (1997). Assessment and feedback as student support devices.
Studies in Educational Evaluation, 23(3), 257–273.
Marton, F. & Säljö, R. (1976). On qualitative differences in learning. Outcomes and process.
British Journal of Educational Psychology, 46, 4–11, 115–127.
McDowell, L. (1995). The impact of innovative assessment on student learning. Innovations in
Education and Training International, 32(4), 302–313.
McDowell, L., & Sambell, K. (1999). The experience of innovative assessment: Student
perspectives. In S. Brown, & A. Glasner (Eds.), Assessment matters in higher education:
Choosing and using diverse approaches (pp. 71–82). Buckingham: The Society of Research
into Higher Education/Open University Press.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assess-
ment. Educational Researcher, 18(2), 5–11.
Messick, S. (1994). The interplay of evidence and consequences in the validation performance
assessments. Educational Researcher, 23(2), 13–22.
6 The Edumetric Quality of New Modes of Assessment: Some Issues and Prospects 113
Struyven, K., Dochy, F., & Janssens, S. (2005). Students’ perceptions about evaluation and
assessment in higher education: a review. Assessment and Evaluation in Higher Education,
30(4), 331–347.
Struyven, K., Gielen, S., & Dochy, F. (2003). Students’ perceptions on new modes of assess-
ment and their influence on student learning: the portfolio case. European Journal of
School Psychology, 1(2), 199–226.
Suen, H. K., Logan, C. R., Neisworth J. T. & Bagnato, S. (1995). Parent-professional
congruence. Is it necessary? Journal of Early Intervention, 19(3), 243–252.
Tan, C. M. (1992). An evaluation of continuous assessment in the teaching of physiology.
Higher Education, 23(3), 255–272.
Thomas, P., & Bain, J. (1984). Contextual dependence of learning approaches: The effects of
assessments. Human Learning, 3, 227–240.
Thomson, K., & Falchikov, N. (1998). Full on until the sun comes out: The effects of
assessment on student approaches to studying. Assessment & Evaluation in Higher Educa-
tion, 23(4), 379–390.
Topping, K. (1998). Peer-assessment between students in colleges and universities. Review of
Educational Research, 68(3), 249–276.
Trigwell, K., & Prosser, M. (1991a). Improving the quality of student learning: The influence
of learning context and student approaches to learning on learning outcomes. Higher
Education, 22(3), 251–266.
Trigwell, K., & Prosser, M. (1991b). Relating approaches to study and quality of learning
outcomes at the course level. British Journal of Educational Psychology, 61(3), 265–275.
Vermunt, J. D. H. M. (1992). Qualitative-analysis of the interplay between internal and
external regulation of learning in two different learning environments. International
Journal of Psychology, 27, 574.
Chapter 7
Plagiarism as a Threat to Learning:
An Educational Response
Jude Carroll
Introduction
J. Carroll
The Oxford Centre for Staff and Learning Development, Oxford Brookes University,
Oxford, UK
e-mail: jrcarroll@brookes.ac.uk
Even a cursory review of media coverage post-2000 would show that plagiar-
ism, both within and without higher education, has a prominent position in
discussions of contemporary behavior and misbehavior. Macfarlane (2007)
refers to current interest as ''almost prurient curiosity'' and a modern ''obsession'',
then, like many other commentators, lists recent examples ranging from
the UK Prime Minister's case for the Iraq war in 2002, which included unauthorized
copying, to Dan Brown's 2006 high-profile court case over The Da Vinci
Code. Such stories are common, and the fallout from high-profile cases is significant.
Journalists have been sacked, Vice Chancellors prompted to resign, novelists
have had to withdraw books and musicians have gone to court to protect their
work. Students are commonly referred to these cases when they are introduced
to academic writing requirements and urged not to follow these examples and to
attend to the consequences. For example, the 2007 WriteNow site in the UK
(http://www.writenow.ac.uk/student_authorship.html) primarily encourages
students to be authors rather than copiers and explains why this is important,
but it also warns them about plagiarism and showcases such instances.
Resources from the site designed to encourage authorship include the statement,
''Some people in universities hate students who plagiarize!'', and suggest
that teachers warn their students:
There are quite a lot of people in universities who are fed up with reading student essays
that have been pasted in from web sites... [S]tudents have to be extremely careful not
to get themselves in a position where they are suddenly accused of cheating and risk
having disciplinary action taken against them.
Connections between cheating and plagiarism such as these are common, but
often they are not warranted.1 Uncoupling plagiarism and integrity is not an
argument for rule breaking to be overlooked. When students plagiarise, they
have not met some important requirements and therefore should not be
awarded academic credit for the learning they have bypassed through plagiar-
ism. Moreover, in some instances, additional penalties are warranted because
the plagiarism also included serious cheating, a point which will be discussed in
more detail below.
By keeping the central focus on learning (rather than student cheating), it is
possible to develop ways other than warning students about the dangers to
deter them from adopting unacceptable methods for generating coursework.
When cases of student plagiarism do occur, learning-centred procedures for
dealing with them make it more likely that decisions and penalties are fair and
proportionate. However, only some ways of looking at learning are useful in
explaining to students why their plagiarism is important and only some ways
in which students themselves conceive of their responsibilities as learners help
1 The authorship presentation also mentions ''carelessness'' and ''hurry'' and the more neutral
''pressure'' to explain why plagiarism might occur.
students’ lack of initiative for reading outside of the set text and their
preference for spoonfeeding. Nevertheless, universities in Anglophone coun-
tries and in what are often termed Western universities base their assessment
criteria and judgements about whether or not the student has learned
on assumptions that are more akin to constructivist than to reproductive
thinking. This is especially true of coursework. Teachers’ underpinning
constructivist assumptions explain why they are so exercised about student
plagiarism.
Other studies have been more specific to plagiarism and they too look
worrying. McCabe (2006) found that four in ten students regularly cut-and-
paste without citation but did not investigate the extent of the practice in
individual pieces of work. Narrowing the focus to deliberate cheating involving
plagiarism, even lower rates emerge. Two per cent of students in one Canadian
study admitted to commissioning others to write their coursework – but we do
not know how often this happened. Actual experience is lower still, as Lambert,
Ellen, and Taylor (2006) found when investigating cases in New Zealand. They
estimated that five per cent of students had a case dealt with informally by their
teachers in the previous academic year and fewer than one per cent were
managed formally, which they presumed constituted the more serious exam-
ples. In 2007, the official figures for cases in the UK, even for very large
universities, were fewer than 100 for upwards of 20,000 students.
In my own institution, we found deliberate cheating involving plagiarism in
0.015% of coursework submitted in one academic year, 2005/2006. Students in
these cases had paid ghost writers, submitted work which had been done by
someone at another university, paid professional programmers to create their
code, or had copied-and-pasted substantial amounts from others’ work into
their own without attribution. Serious cases revealed students cheating by
manipulating their references; some had altered texts using the ‘‘find and
replace’’ button in ways that were intended to create a false assumption in the
assessor as to whose work was being judged. They had stolen others’ work or
coerced others into sharing it. One student had gone to the library, borrowed
someone's thesis, then ripped out the text, returned the cover, scanned the
contents and submitted the work under his own name. (Note: we caught this
particular student because he used his real name when taking out the original!)
These are all unacceptable acts. Some are incompatible with a qualification
and where this was proven, the students did not graduate. All of them are
incompatible with awarding academic credit. However, the number of stu-
dents involved in this type of deliberate cheating is very small – one case in
1500 submissions. Even if, in all these examples, the percentage of cheating
cases were increased in some substantial way to adjust for under-detection
(which undoubtedly occurred), this level of deliberate cheating does not
threaten the overall value of a degree from our university nor does it support
the view that our graduates’ skills and knowledge cannot be trusted. It is not
cause for panic.
It is not just the relatively small number of cases that should reassure those
who might panic and dissuade those who focus too strongly on cheating. Most
of the cases in the previous examples will be one-off instances. Only a small
number of students use cheating as a regular or even exclusive means of
gaining academic credit; for most, as studies show, it is a one-off mishandling
of assessment requirements. Szabo and Underwood (2004) found six per
cent of students who admitted to deliberate plagiarism said they did so regu-
larly. The two per cent of Canadian students purchasing essays probably did so
only once, although sites where students can commission others to do their
coursework usually include a section for users’ comments and in these, satisfied
customers sometimes explicitly state they will use the service again. A few sites
actively promote regular use as, for example, one which offers to produce PhD
dissertations chapter by chapter to allow discussion with supervisors. In every
institution, there are undoubtedly instances where all (or almost all) of a
student’s credit has been acquired by fraudulent means. However, in my own
university, with a student cohort of 18,000, we deal with only a handful of cases
per year which involve repeated acts by the same student.
Students do usually move from the dualist position to the next level of
cognitive development, which Perry terms the 'multiplistic' stage. Studies on
how rapidly students move to multiplistic thinking show wide variation. One
study by Baxter Magolda (2004) which followed the cognitive development
of a group of US undergraduate students over 16 years found some required
several years to move to the next stage, a few never did so, and most
managed to shift their thinking within their undergraduate study. At the
multiplistic stage, students recognise multiple perspectives on an issue,
situation or problem; however, they do not necessarily develop strategies for
evaluating them. Students at this stage say, ‘‘Well, that’s your opinion and
we are all entitled to our own views’’. A multiplistic thinker is likely to
regard an essay that requires him or her to structure an argument as an
invitation to document a disagreement. A student operating at this level is
likely to find choosing between sources difficult and to view the concept
of authority as clashing with their ‘‘it is all relative’’ thinking. Citation
becomes a stumbling block to the free-flowing expression of the student’s
own ideas and opinions. Again, as for dualist students, teaching about
academic writing and avoiding plagiarism needs to focus on rules and
acceptable behaviours whilst ensuring students know there are value-driven
reasons for academic integrity policies.
Only when students reach the stage Perry terms relativistic are requirements
around plagiarism likely to be genuinely understood and acted upon.
(Note: the previously mentioned longitudinal study found that not all stu-
dents reach this point, even after several years of university study.) Relati-
vistic students, as the term implies, see knowledge as linked to particular
frames of reference and see their relationship to knowledge as detached and
evaluative. Once students see learning in this way, they are able to review
their own thinking and see alternative ways of making decisions. When they
eventually reach a conclusion (and they see this as difficult), relativistic
students are willing to accept that the position is personal, chosen and/or
even defensible. Perry describes students at this level as recognising that those
in authority (i.e. experts, authors of research publications and, indeed, their
own teachers) generally arrive at their positions by similar routes to the
student’s own and therefore are also open to review and questioning. For
relativistic students, using others’ authority to support an argument, citing
information gathered through research and reading, and paraphrasing as a
way of showing that points have been understood and personalised seem
almost second nature (albeit still tricky to do well, as I myself find after many
decades of practice). Work produced by relativistic students will be their own
even if the resulting document resembles many which have been written
before.
The challenge comes when it is the teacher’s task to ensure that all students
are able to comply with academic regulations that specify students must do their
own work and avoid plagiarism. These rules apply to all students, including
those with a dualistic view of knowledge, those who have studied previously in
When students are presented with an assignment, they report asking themselves
a series of questions about the task, such as:
Can I do this?
Can I do it well (or well enough)?
Has someone else already done it and if so, can I find the result?
If I do it myself, what else in my life will suffer?
Is it worth putting in the time and effort to do it or should I spend my time
on other assignments?
If I copy or find someone else’s answer, will I be caught?
If I’m caught, what will happen?
and so on.
By asking themselves a similar series of questions when setting assessment
tasks, teachers can create courses and assessment tasks that encourage learning
and discourage plagiarism. In practice, reviewing one’s own assessments seems
more difficult than doing this with and for one’s colleagues. Often the person who
sets the assignment sees it as a valid and plausible task whereas someone else can
identify opportunities for copying and collusion. Equally, interrogating assess-
ment tasks becomes more likely if it is part of the normal programme/course
design process rather than the responsibility of individual course designers or
teachers. The following section suggests ways to interrogate course designs and
assessment tasks, either individually or as a programme team.
1. Are you sure the students will have been taught the skills they will need
to display in the assessment?
Answering this question positively requires you to be able to list the skills
students will need, especially those usually classed as study skills or linked to
academic writing. Necessary skills extend beyond the use of a specific referen-
cing system to include those for locating and using information, for structuring
an argument, and for using others’ authority to underpin the student’s own
ideas and views.
As well as generic skills, students will also need explicit teaching of discipline-
specific ways to handle an assessment. Some discipline-specific skills will be
linked to the previous generic list, which is largely concerned with writing.
Writing an essay as a historian, a biologist or a paediatric nurse will have
specific requirements which must be learned. Whilst some students are espe-
cially good at spotting cues and picking up these matters implicitly, most only
become skilful through explicit instruction, practice and acting on feedback.
For all students, learning to complete assessments in the discipline is part of
learning the discipline itself.
Students who lack the skills to complete assignments often feel they have no
alternative but to plagiarise.
2. Have you designed a task that encourages students to invest time and effort?
Have you designed ways to dissuade them from postponing assessment tasks to
‘the last minute’?
One of the conditions named by Gibbs and Simpson (2004–2005) when they
looked at ways in which assessment supports learning was time on task. Of
course, time on its own will not always lead to learning valued by the teacher.
Students could be distracted by unfocused reading, or side-tracked into creating
an over-elaborate presentation. They could be staring at notes or spending
hours transcribing lectures. However, unless students put in the time necessary,
they cannot develop the understanding and personal development required for
tertiary learning. Course design solutions which counter student procrastination
and focus time on task include early peer review of drafts, where students
read each other's work in progress and then comment in writing, with the
requirement that the student include the peer feedback in the final submission by
either taking the comments into account or explaining why they did not do so.
Other ways to capture students’ time include asking them to log their efforts in
on-line threaded discussion or to chunk tasks and monitor their completion.
Ensuring activity is occurring is not the same as assessing it. Formative
assessment is time-consuming and unrealistic, given teachers’ workloads. How-
ever, verifying that work has started by asking to see it and then signing and
dating the student’s hard copy for later submission alongside the finished article
can be relatively undemanding on teacher time and hugely encouraging to
students to ‘get going’.
Students who delay work until the last minute often see little alternative to
plagiarism.
3. Are there opportunities for others to discuss and interact with the student’s
assessment artefact during its production?
This is unlikely to happen unless you specifically build into the course ways
in which students can share and comment on each other's work. You
might design in peer review and/or feedback as mentioned in the last section;
alternatively, you could set aside some face-to-face time to observe and support
the assessment work as part of scheduled class meetings. Evidence of activity
from students’ peers, workplace mentors, or tutors can enrich the student’s own
understanding as well as authenticate their efforts.
4. Turning to the task itself, can the student find the answer somewhere? Does the
answer already exist in some form?
Verbs are often crucial here. Asking students to rank, plan, alter, invent or be
ready to debate signals that they must do the work, whereas asking the student to
show his or her knowledge or to demonstrate understanding of a theory or
idea (‘‘What is the role of smoking cessation in the overall goal of improving
public health?’’) usually needs a few on-line searches, some cut-and-paste and
a bit of text-smoothing. Students tell me they can do this ‘‘work’’ in a short
time – 30 minutes was the guess from a UK student who reported being given
exactly the title involving smoking. Those with reasonable writing skills
engaging in cut-paste-smooth ‘‘authorship’’ (sic) will rarely if ever be identified,
leaving the non-native English speakers and poor writers to trigger the asses-
sor’s suspicions.
In general, all tasks that ask the student to discuss, describe or explain
anything are more appropriately judged through examination, if at all, rather
than through coursework.
5. Can the student copy someone else’s answer?
In general, if the students sense that the answer already exists, either because
the problem is well-known or the issue has been well-covered, then they see
finding the answer as a sensible way to allocate their time rather than as
plagiarism. Students report no difficulty finding answers if they suspect
(or know) that the questions have not changed since the last time the course was
run, even if the work was not returned to students. They describe easy ways to
access others’ coursework or locate informal collections of past student work
(and some collections are now located outside of particular universities). Copy-
ing between students is common if the assessment task/problem has one answer
or a small number of ways in which the problem can be solved.
6. Is the brief clear about what will be assessed and which aspects of the work must
be done by the student?
Students often sub-contract parts of the work if they consider it to be less
important or not relevant to the assessment decision. This means that the brief
may need to specify whether or not proofreading is acceptable; who may help
with locating and selecting sources; how much help from fellow students is
encouraged; and conversely, where such help should stop. If these are not
stated, a student must guess or make an effort to ask. Students say they shy
away from taking such matters to teachers, assuming they should use tutorial
time for ‘‘interesting’’ questions instead.
Assessment criteria often state which aspects will be judged for a grade and
students can usefully have these drawn to their attention as a way of clarifying
which work will be assessed.
7. Does the question, in the way that it is posed, push students towards asking
themselves, ‘‘How and where do I find that?’’ or ‘‘How do I make that?’’
Some of the factors that prompt one answer or the other have already been
mentioned, such as novelty and clarity of the assessment brief. Here, the issue
turns more to the nature of the task itself.
Assessments that encourage making, and therefore learning, include those
where students must include any of the following:
their own experience or data;
recent and specific information such as legislation, current events or recently
published material;
application of generic theories in specific, local settings or with particular
individuals such as patients the student has cared for or experiments/projects
the student has carried out;
dualist views of learning, unable to see their responsibilities as being other than
providing answers to teachers’ questions. However, for most students most of
the time, using strategies that support and encourage learning and which
discourage or remove opportunities for copying will tip them into doing their
own work, whether they want to or not. Rethinking assessment and course
design can only be effective if it operates in conjunction with other actions
designed to deal with student plagiarism, such as good induction, well-resourced
skills teaching, written guidance, and procedures that are used and trusted by
teaching staff. Thus, the institution as a whole needs an integrated series of
actions to ensure its students are capable of meeting the requirement that they
do their own work because it is only in this way that they do their own learning.
References
Angelo, T. (2006, June). Managing change in institutional plagiarism practice and policy. Key-
note address presented at the JISC International Plagiarism Conference, Newcastle, UK.
Atherton, J. (2005). Learning and teaching: Piaget's developmental theory. Retrieved August
11, 2007, from http://www.learningandteaching.info/learning/piaget.htm
Au, C., & Entwistle, N. (1999, August). ‘Memorisation with understanding’ in approaches
to studying: cultural variant or response to assessment demands? Paper presented at
the European Association for Research on Learning and Instruction Conference,
Gothenburg, Sweden.
Baxter Magolda, M. (2004). Evolution of a constructivist conceptualization of epistemological
reflection. Educational Psychologist, 30, 31–42.
Bruner, J. (1960). The process of education. Cambridge, MA: Harvard University Press.
Carroll, J. (2007). A handbook for deterring plagiarism in higher education. Oxford: Oxford
Brookes University.
Dewey, J. (1938). Experience and education. New York: Macmillan.
Gibbs, G., & Simpson, C. (2004–2005). Conditions under which assessment supports
students’ learning. Learning and Teaching in Higher Education, 1, 3–31.
Handa, N., & Power, C. (2005). Land and discover! A case study investigating the cultural
context of plagiarism. Journal of University Teaching and Learning Practice, 2(3b), 64–84.
Hayes, N., & Introna, L. (2005). Cultural values, plagiarism, and fairness: When plagiarism
gets in the way of learning. Ethics & Behavior, 15(3), 213–231.
Howard, R. M. (2000). Sexuality, textuality: The cultural work of plagiarism. College English,
62(4), 473–491.
JISC-iPAS Internet Plagiarism Advisory Service. Retrieved October 11, 2007, from www.
jiscpas.ac.uk/index.
Lambert, K., Ellen, N., & Taylor, L. (2006). Chalkface challenges: A study of academic
dishonesty amongst students in New Zealand tertiary institutions. Assessment and Evalua-
tion in Higher Education, 31(5), 485–503.
McCabe, D. (2006, June). Ethics in teaching, learning and assessment. Keynote address at the
JISC International Plagiarism Conference, Newcastle, UK.
Macdonald, R., & Carroll, J. (2006). Plagiarism: A complex issue requiring a holistic
approach. Assessment and Evaluation in Higher Education, 31(2), 233–245.
Macfarlane, R. (2007, March 16). There’s nothing original in a case of purloined letters. Times
Higher Education Supplement, p. 17.
Maslen, G. (2003, January 23). 80% admit to cheating. Times Higher Education Supplement.
Norton, L.S., Newstead, S.E., Franklyn-Stokes, A., & Tilley, A. (2001). The pressure of
assessment in undergraduate courses and their effect on student behaviours. Assessment
and Evaluation in Higher Education, 26(3), 269–284.
Park, C. (2003). In other (people’s) words: Plagiarism by university students – literature and
lessons. Assessment and Evaluation in Higher Education, 28(5), 471–488.
Pecorari, D. (2003). Good and original: Plagiarism and patch writing in academic second-
language writing. Journal of Second Language Writing, 12, 317–345.
Perry, W. (1970). Forms of intellectual and ethical development in the college years: A scheme.
New York: Holt.
Piaget, J. (1928). The child’s conception of the world. London: Routledge and Kegan Paul.
Robinson, V., & Kuin, L. (1999). The explanation of practice: Why Chinese students copy
assignments. Qualitative Studies in Education, 12(2), 193–210.
Rust, C., O’Donovan, B., & Price, M. (2005). A social constructivist assessment process
model: How the research literature shows us this could be best practice. Assessment and
Evaluation in Higher Education, 30(3), 231–240.
Szabo, A., & Underwood, J. (2004). Cybercheats: Is information and communication tech-
nology fuelling academic dishonesty? Active Learning in Higher Education, 5(2), 180–199.
Vygotsky, L. (1962). Thought and language. Cambridge, MA: MIT Press.
Write Now CETL. (2007). Retrieved September 8, 2007, from www.writenow.ac.uk/index.
html.
Chapter 8
Using Assessment Results to Inform Teaching
Practice and Promote Lasting Learning
Linda Suskie
Introduction
While some may view systematic strategies to assess student learning as mere
chores to satisfy quality assurance agencies and other external stakeholders, for
faculty who want to foster lasting learning, assessment is an indispensable tool:
it informs teaching practice and thereby promotes that learning. Suskie
(2004c) frames this relationship by characterizing assessment as part of a
continual four-step teaching-learning-assessment cycle.
The first step of this cycle is articulating expected learning outcomes. The
teaching-learning process is like taking a trip – one cannot plot out the route
(curriculum and pedagogies) without knowing the destination (what students
are to learn). Identifying expected student learning outcomes is thus the first
step in the process. The more clearly the expected outcomes are articulated
(i.e., articulating that one plans to visit San Francisco rather than simply the
western United States), the easier it is to assess whether the outcome has been
achieved (i.e., whether the destination has been reached).
The second step of the teaching-learning-assessment cycle is providing suffi-
cient learning opportunities, through curricula and pedagogies, for students to
achieve expected outcomes. Students will not learn how to make an effective
oral presentation, for example, if they are not given sufficient opportunities to
learn about the characteristics of effective oral presentations, to practice deli-
vering oral presentations, and to receive constructive feedback on them.
The third step of the teaching-learning-assessment cycle is assessing how well
students have achieved expected learning outcomes. If expected learning out-
comes are clearly articulated and if students are given sufficient opportunity to
achieve those outcomes, often this step is not particularly difficult – students’
learning opportunities become assessment opportunities as well. Assignments
in which students prepare and deliver oral presentations, for example, are not
just opportunities for them to hone their oral presentation skills but also
opportunities to assess those skills.

L. Suskie
Middle States Commission on Higher Education, Philadelphia, PA, USA
e-mail: LSuskie@msche.org
Today’s faculty are, in many ways, living in a golden age of education: their
teaching practices can be informed by several decades of extensive research
and publications documenting teaching practices that promote deep, lasting
learning (e.g., Angelo, 1993; Association of American Colleges and Universities,
2002; Astin, 1993; Barr & Tagg, 1995; Chickering & Gamson, 1987, 1991; Ewell &
Jones, 1996; Huba & Freed, 2000; Kuh, 2001; Kuh, Schuh, Whitt, & Associates,
1991; Light, 2001; McKeachie, 2002; Mentkowski & Associates, 2000; Palmer,
1998; Pascarella, 2001; Pascarella & Terenzini, 2005; Romer & Education
Commission of the States, 1996). Suskie (2004b) has aggregated this work into
a list of 13 conditions under which students learn most effectively (p. 311):
1. Students understand course and program goals and the characteristics of
excellent work.
2. They are academically challenged and given high but attainable
expectations.
3. They spend more time actively involved in learning and less time listening to
lectures.
4. They engage in multidimensional real-world tasks in which they explore,
analyze, justify, evaluate, use other thinking skills, and arrive at multiple
solutions. Such tasks may include realistic class assignments, field experi-
ences, and service-learning opportunities.
5. The diversity of their learning styles is respected; they are given a variety of
ways to learn and to demonstrate what they’ve learned.
6. They have positive interactions with faculty and work collaboratively with
fellow students.
7. They spend significant time studying and practicing.
8. They receive prompt, concrete feedback on their work.
9. They have opportunities to revise their work.
10. They participate in co-curricular activities that build on what they are
learning in the classroom.
11. They reflect on what and how they have learned and see coherence in their
learning.
12. They have a synthesizing experience such as a capstone course, independent
study, or research project.
13. Assessments focus on the most important course and program goals and
are learning activities in their own right.
These principles elucidate two ways that assessment activities play a central
role in promoting deep, lasting learning. First, as Dochy has noted in Chapter 6,
providing a broad array of activities in which students learn and are assessed
(Principle 5) promotes deep, lasting learning (Biggs, 2001; Entwistle, 2001). This
rich array of evidence also provides a more complete and more meaningful
picture of student learning, making assessment evidence more usable and useful
in understanding and improving student learning. At its best, this broad array
of learning/assessment activities is designed to incorporate three other princi-
ples for promoting lasting learning:
Principle 5: Students learn more effectively when the diversity of their learning
styles is respected and they are given a variety of ways to learn and to demonstrate
what they’ve learned. Suskie (2000) has noted that, because every assessment is
inherently imperfect, any decisions related to student learning should be based
on multiple sources of evidence. Furthermore, because every assessment favors
some learning styles over others, students should have ‘‘a variety of ways to
demonstrate what they’ve learned’’ (p. 8).
Principle 11: Students learn more effectively when they reflect on what and how
they have learned and see coherence in their learning. In Chapter 4, Sadler has
argued extensively for the educational value of students’ practicing and developing
skill in critical self-appraisal. Self-reflection fosters metacognition: learning
how to learn by understanding how one learns (Suskie, 2004a).
Principle 12: Students learn more effectively when they have a synthesizing
experience such as a capstone course, independent study, or research project. Such
experiences give students opportunities to engage in synthesis – integrating what
they have learned over their academic careers into a new whole (Henscheid, 2000).
The second way that assessment activities play a central role in promoting
deep, lasting learning is through the time and energy students spend learning
what they will be graded on, as Dochy has discussed extensively in Chapter 6.
Indeed, Entwistle (2001) asserts, ‘‘It is not the teaching-learning environment
itself that determines approaches to studying, but rather what students believe
to be required’’ (p. 16). Biggs (2001) concurs, noting, ‘‘[Students] see the assess-
ment first, then learn accordingly, then at last see the outcomes we are trying to
impart’’ (p. 66). The assessments that are used to grade students can thus be a
powerful force influencing what and how students learn. This idea is captured in
the final principle: ‘‘Assessments focus on the most important course and
program goals and are learning activities in their own right.’’ Four of the
principles are keys to making this happen:
Principle 1: Students learn more effectively when they understand course and
program goals and the characteristics of excellent work. As Sadler has discussed
in Chapter 4, students who want to do well learn more effectively when they
understand clearly why they have been given a particular assignment, what they
are expected to learn from it, and how they will be graded on it. If students
know, for example, that their papers will be graded in part on how effective the
introduction is, some will try their best to write an effective introduction.
Principle 2: Students learn more effectively when they are academically chal-
lenged and given high but attainable expectations. While there is a limit to what
students can achieve in a given amount of time (even the best writing faculty, for
example, cannot in a few weeks enable first-year students to write at the level
expected of doctoral dissertations), many students respond remarkably well to
high standards, provided that they are given a clear roadmap on how to achieve
them (De Sousa, 2005). First-year students, for example, may be able to write
quite competent library research papers if the research and writing process is
broken down into relatively small, manageable steps (identifying the research
topic, finding appropriate library resources, reading and analyzing them, out-
lining the paper, etc.) and students are guided through each step.
Principle 7: Students learn more effectively when they spend significant time
studying and practicing. While this is an obvious truism, there is evidence that
students in some parts of the world spend relatively little time on out-of-class
studies. Those who attended college a generation or two ago may recall the
adage to study two hours outside of class for every hour spent in class. For a
full-time student spending 15 hours in class, this would mean spending about 30
hours on out-of-class studies. But according to the National Survey of Student
Engagement (2006), about 45% of today’s American college freshmen and
seniors report spending less than 11 hours a week preparing for class (studying,
reading, writing, doing homework or lab work, analyzing data, rehearsing, and
other academic activities). Only about ten percent spend more than 25 hours a
week preparing for class. While these results reflect part-time as well as full-time
students, it is nonetheless clear that, at least in the United States, much more
could be done to have students take responsibility for their learning.
Principle 8: Students learn more effectively when they receive prompt, concrete
feedback on their work. As Dochy has discussed in Chapter 6, we all benefit from
constructive feedback on our work, and students are no different (Ovando,
1992). The key is to get feedback to students quickly enough that they can use
the feedback to improve their learning. Faculty know all too well that final
exams graded after classes end are often not retrieved by students and, if they
are retrieved, are checked only for the final grade. The problem is that providing
constructive, timely feedback can take a great deal of time. Here are some
practical suggestions on ways to minimize that time:
– Return ungraded any assignments that show little or no effort, with a request
that the paper be redone and resubmitted within 24 hours. As Walvoord and
Anderson (1998) point out, if students make no effort to do an assignment
well, why should the professor make any effort to offer feedback?
– When appropriate (see Sadler, Chapter 4 of this book), use rubrics (rating
scales or scoring guides) to evaluate student work. They speed up the process
because much feedback can be provided simply by checking or circling
appropriate boxes rather than writing comments.
– Use Haswell’s (1983) minimal marking method: rather than correct grammatical
errors, simply place a check in the margin next to the error, and require
the student to identify and correct errors in that line.
– Provide less feedback on minor assignments. Some short homework assignments,
for example, can be marked simply with a plus symbol for outstanding
work, a checkmark for adequate work, and a minus symbol for minimal
effort.
MacGregor, Tinto, and Lindblad (2001) note, ‘‘If you’re not clear on the goals
of an assessment and the audiences to which that assessment will be directed, it’s
hard to do the assessment well. So your first task is to ask yourself why, with
whom, and for what purpose you are assessing. . .’’ (p. 48). Assessment results
can inform decisions in at least five areas:
Learning outcomes. Assessment results might help faculty decide, for exam-
ple, if their statements of expected learning outcomes are sufficiently clear and
focused or if they have too many intended learning outcomes to cover in the
allotted instructional time.
Curriculum. Assessment results might help faculty decide, for example, if
classes or modules taught by several faculty have sufficient uniformity across
sections or whether a service-learning component is achieving its goal.
Pedagogy. Assessment results might help faculty decide, for example, whether
online instruction is as effective as traditional classroom-based instruction or
whether collaborative learning is more effective than traditional lectures.
Assessment. Assessment results can, of course, help faculty decide how useful
their assessment strategies have been and what changes are needed to improve
their effectiveness.
Resource allocations. Assessment results can provide a powerful argument
for resource allocations. Disappointing evidence of student writing skills, for
example, can lead to fact-based arguments for, say, more writing tutors or more
online writing software. Poor student performance on a technology examina-
tion required for licensure in a profession can be a compelling argument for
upgrading technologies in laboratories.
Understanding and articulating the decisions that a particular assessment
will inform helps to ensure that the assessment results will indeed help enlighten
those decisions. Suppose, for example, that a professor is assessing student
learning in a biology laboratory. If the results are to inform decisions about
the laboratory’s learning outcomes, for example, the assessments will need to be
designed to assess each expected outcome. If the results are to inform decisions
about curriculum, the assessments will need to be designed to assess each aspect
of the curriculum. And if the results are to inform decisions about pedagogy, the
assessments may need to be designed to provide comparable information on
different pedagogical approaches.
The key is to ensure that whatever steps are taken build upon
practices that promote deep, lasting learning.
What follows are practical suggestions for summarizing, interpreting, and
using assessment results for three different assessment
practices: the use of rubrics (rating scales or scoring guides), multiple choice
tests, and reflective writing exercises.
Rubrics
As Sadler has discussed in Chapter 4, rubrics are a list of the criteria used to
evaluate student work (papers, projects, performances, portfolios, and the like)
accompanied by a rating scale. Using rubrics can be a good pedagogical
practice for several reasons:
– Creating a rubric before the corresponding assignment is developed, rather
than vice versa, helps to ensure that the assignment will address what the
professor wants students to learn.
– Giving students the rubric along with the assignment is an excellent way to help
them understand the purpose of the assignment and how it will be evaluated.
– Using a rubric to grade student work ensures consistency and fairness.
– Returning the marked rubric to students with their graded assignment gives
them valuable feedback on their strengths and weaknesses and helps them
understand the basis of their grade.
In order to understand rubric results and use them to inform teaching
practice, it is helpful to tally students’ individual ratings in a simple chart.
Table 8.1 provides a hypothetical example for the results of a rubric used to
evaluate the portfolios of 30 students studying journalism.
It is somewhat difficult to understand what Table 8.1 is saying. While it
seems obvious that students performed best on the fourth criterion (‘‘under-
standing professional ethical principles and working ethically’’), their other
relative strengths and weaknesses are not as readily apparent. Table 8.1
would be more useful if the results were somehow sorted. Let us suppose that,
in this hypothetical example, the faculty’s goal is that all students earn at least
‘‘very good’’ on all criteria. Table 8.2 sorts the results based on the number of
students who score either ‘‘excellent’’ or ‘‘very good’’ on each criterion. Table 8.2
also converts the raw numbers into percentages, because this allows faculty to
compare student cohorts of different sizes. The percentages are rounded to the
nearest whole percentage, to reduce the volume of information to be digested
and to keep the reader from focusing on trivial differences.
Now the results jump out at the reader. It is immediately apparent that
students did relatively well not only on the fourth criterion but also on the
first, third and fifth. It is equally apparent that students’ weakest area (of those
evaluated here) is the tenth criterion (‘‘applying basic numerical and statistical
concepts’’).
Table 8.1 An example of tallied results of a rubric used to evaluate the portfolios of 30
students studying journalism (counts of students rated Excellent / Very Good / Adequate /
Inadequate)

The student:
1. Understands and applies the principles and laws of freedom of speech and press. 15 / 14 / 1 / 0
2. Understands the history and role of professionals and institutions in shaping communications. 18 / 8 / 4 / 0
3. Understands the diversity of groups in a global society in relationship to communications. 12 / 17 / 1 / 0
4. Understands professional ethical principles and works ethically in pursuit of truth, accuracy, fairness, and diversity. 27 / 3 / 0 / 0
5. Understands concepts and applies theories in the use and presentation of images and information. 12 / 17 / 1 / 0
6. Thinks critically, creatively, and independently. 2 / 20 / 5 / 3
7. Conducts research and evaluates information by methods appropriate to the communications profession(s) studied. 6 / 21 / 3 / 0
8. Writes correctly and clearly in forms and styles appropriate for the communications profession(s) studied and the audiences and purposes they serve. 9 / 14 / 6 / 1
9. Critically evaluates own work and that of others for accuracy and fairness, clarity, appropriate style and grammatical correctness. 6 / 19 / 3 / 2
10. Applies basic numerical and statistical concepts. 10 / 8 / 9 / 3
11. Applies tools and technologies appropriate for the communications profession(s) studied. 8 / 19 / 3 / 0
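The tallying, percentage conversion, rounding, and sorting described above are mechanical enough to script. The following is a minimal Python sketch, not the author's actual procedure; the criterion labels are abbreviated by me, and only four of the eleven Table 8.1 rows are included for brevity:

```python
# Sketch of the summarization behind Tables 8.1 and 8.2: for each criterion,
# compute the percentage of students rated "excellent" or "very good",
# round it, and sort from highest to lowest.
counts = {  # criterion: (excellent, very good, adequate, inadequate)
    "4. Professional ethics": (27, 3, 0, 0),
    "1. Freedom of speech and press": (15, 14, 1, 0),
    "6. Thinks critically": (2, 20, 5, 3),
    "10. Numerical/statistical concepts": (10, 8, 9, 3),
}

def summarize(counts, total=30):
    """Percent of students at 'very good' or better, sorted high to low.

    Percentages (rather than raw counts) let faculty compare cohorts of
    different sizes; rounding to whole percentages keeps readers from
    dwelling on trivial differences.
    """
    rows = [
        (criterion, round(100 * (excellent + very_good) / total))
        for criterion, (excellent, very_good, _adeq, _inadeq) in counts.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)

for criterion, pct in summarize(counts):
    print(f"{pct:3d}%  {criterion}")
```

With the Table 8.1 counts, the fourth criterion comes out on top at 100% and the tenth at the bottom at 60%, mirroring the discussion in the text.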
The validity and usefulness of multiple choice tests are greatly enhanced if they
are developed with the aid of a test blueprint or outline of the knowledge and
skills being tested. Table 8.3 is an example of a simple test blueprint.
In this example, the six listed objectives represent the professor’s key objec-
tives for this course or module, and the third, fourth, and sixth objectives are
considered the most important objectives. This blueprint is thus powerful
evidence of the content validity of the examination and a good framework for
summarizing the examination results, as shown in Table 8.4. Again, the results
in Table 8.4 have been sorted from highest to lowest to help readers grasp the
results more quickly and easily.
Table 8.4 makes clear that students overall did quite well in determining the
value of t needed to find a confidence interval, but a relatively high proportion
were unable to choose the appropriate statistical analysis for a given research
problem. Table 8.4 provides another clear roadmap for faculty reflection on
students’ relative strengths and weaknesses. The professor might consider how
to address the weakest area – choosing appropriate statistical analyses – by again
reflecting on the practices that promote deep, lasting learning discussed earlier in
this chapter. The professor might, for example, decide to address this skill by
revising or expanding lectures on the topic, giving students additional homework
on the topic, and having students work collaboratively on problems in this area.
Another useful way to review the results of multiple choice tests is to
calculate what testing professionals (e.g., Gronlund, 2005; Haladyna, 2004;
Kubiszyn & Borich, 2002) call the discrimination of each item (Suskie,
2004d). This metric, a measure of the internal reliability or internal consistency
of the test, is predicated on the assumption that students who do relatively well
on a test overall will be more likely to get a particular item correct than those
who do relatively poorly. Table 8.5 provides a hypothetical example of discri-
mination results for a 5-item quiz. In this example, responses of the ten students
with the highest overall scores on this quiz are compared against those of the ten
students with the lowest overall quiz scores.
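The upper/lower comparison behind Table 8.5 can be sketched in a few lines; the item counts below are invented for illustration and are not the Table 8.5 data:

```python
# Classic upper-lower item discrimination: the proportion of the ten
# highest-scoring students answering an item correctly minus the proportion
# of the ten lowest-scoring students doing so.
def discrimination(upper_correct, lower_correct, group_size=10):
    """Upper-lower discrimination index, ranging from -1.0 to +1.0."""
    return (upper_correct - lower_correct) / group_size

items = {  # item: (correct answers in top 10, correct answers in bottom 10)
    "Item 1": (10, 4),
    "Item 2": (9, 8),
    "Item 3": (7, 2),
    "Item 4": (6, 7),    # negative: weaker students outperformed stronger ones
    "Item 5": (10, 10),  # zero: the item does not discriminate at all
}

for item, (upper, lower) in items.items():
    d = discrimination(upper, lower)
    note = "  <- worth reviewing" if d <= 0 else ""
    print(f"{item}: D = {d:+.1f}{note}")
```

Items with zero or negative discrimination (Items 4 and 5 in this invented example) are the ones a professor would flag for review, since they fail the assumption that stronger students are more likely to answer correctly.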
Reflective Writing
Costa and Kallick (2000) note the value of reflection in ‘‘documenting learning
and providing a rich base of shared knowledge’’ (p. 60),
while the Conference on College Composition and Communication notes that
‘‘reflection by the writer on her or his own writing processes and performances
holds particular promise as a way of generating knowledge about writing’’
(2006). Reflective writing may be especially valuable for assessing ineffable
outcomes such as attitudes, values, and habits of mind. An intended student
learning outcome to ‘‘be open to diverse viewpoints’’ would be difficult to assess
through a traditional multiple choice test or essay assignment, because students
would be tempted to provide what they perceive to be the ‘‘correct’’ answer
rather than accurate information on their true beliefs and views.
Because reflective writing seeks to elicit honest answers rather than ‘‘best’’
responses, reflective writing assignments may be assessed and the results used
differently than other assessment strategies. While the structure of a student’s
reflective writing response can be evaluated using a rubric, the thoughts and
ideas expressed may be so wide-ranging that qualitative rather than quantita-
tive assessment strategies may be more appropriate.
Qualitative assessment techniques are drawn from qualitative research
approaches, which Marshall and Rossman (2006) describe as ‘‘naturalistic,’’
‘‘fundamentally interpretive,’’ relying on ‘‘complex reasoning that moves
dialectically between deduction and induction,’’ and drawing on ‘‘multiple
methods of inquiry’’ (p. 2). Qualitative assessment results may thus be
summarized differently than quantitative results such as those from rubrics
and multiple choice tests, which typically yield ratings or scores that can be
summarized using descriptive and inferential statistics. Qualitative research
techniques aim for naturalistic interpretations rather than, say, an average
score.
Qualitative assessment techniques typically include sorting the results into
categories (e.g., Patton, 2002). Tables 8.6 and 8.7 provide an example of a
summary of qualitative assessment results from a day-long workshop on the
assessment of student learning, conducted by the author. The workshop
addressed four topics: principles of good practice for assessment, promoting
an institutional culture of assessment, the articulation of learning outcomes,
and assessment strategies including rubrics. At the end of the day, participants
were asked two questions, adapted from the minute paper suggested by Angelo
Table 8.6 Responses to ‘‘What was the most useful or meaningful thing you
learned today?’’ by Participants at a one-day workshop on assessing student
learning
Percent of respondents (%) Category of response
40 Assessment strategies (e.g., rubrics)
20 Culture of assessment
16 Principles of good practice
10 Articulating learning outcomes
13 Miscellaneous
and Cross (1993): ‘‘What was the most useful or meaningful thing you learned
today?’’ and ‘‘What one question is uppermost on your mind as we end this
workshop?’’
For the first question, ‘‘What was the most useful or meaningful thing you
learned today?’’ (Table 8.6), the author sorted comments into five fairly obvious
categories: the four topics of the workshop plus a ‘‘miscellaneous’’ category. For
the second question, ‘‘What one question is uppermost on your mind as we end
this workshop?’’ (Table 8.7), potential categories were not as readily evident.
After reviewing the responses, the author settled on the categories shown in
Table 8.7, then sorted the comments into the identified categories. Qualitative
analysis software is available to assist with this sorting – such programs search
responses for particular keywords provided by the professor.
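Such keyword-based sorting can be sketched in a few lines of Python; the categories, keywords, and responses below are invented for illustration, and real qualitative-analysis software is considerably more sophisticated:

```python
# Minimal keyword-based categorization of minute-paper responses, in the
# spirit of the qualitative-analysis software mentioned above.
from collections import Counter

categories = {
    "Assessment strategies": ("rubric", "test", "blueprint"),
    "Culture of assessment": ("culture", "buy-in", "resistance"),
    "Articulating learning outcomes": ("outcome", "goal", "objective"),
}

def categorize(response, categories):
    """Return the first category whose keywords appear in the response."""
    text = response.lower()
    for category, keywords in categories.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Miscellaneous"

responses = [
    "Rubrics will save me hours of grading time.",
    "How do I build a culture of assessment in my department?",
    "Writing measurable outcomes for my own module.",
    "The catering was excellent.",
]
tally = Counter(categorize(r, categories) for r in responses)
for category, n in tally.most_common():
    print(f"{round(100 * n / len(responses))}%  {category}")
```

Keyword matching of this kind is crude – a response mentioning both rubrics and culture lands in whichever category is checked first – which is one reason human review of the resulting categorizations remains essential.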
The analysis of qualitative assessment results – identifying potential cate-
gories for results and then deciding the category into which a particular
response is placed – is, of course, inherently subjective. In the workshop
example described here, the question, ‘‘Would it be helpful to establish an
assessment steering committee composed of faculty?’’ might be placed into the
‘‘culture of assessment’’ category by one person and into the ‘‘organizing assess-
ment’’ category by another. But while qualitative assessment is a subjective
process, open to inconsistencies in categorizations, it is important to note that
any kind of assessment of student learning has an element of subjectivity, as the
questions that faculty choose to ask of students and the criteria used to evaluate
student work are a matter of professional judgment that is inherently subjective,
however well-informed. Inconsistencies in categorizing results can be mini-
mized by having two readers perform independent categorizations, then review-
ing and reconciling differences, perhaps with the introduction of a third reader
for areas of disagreement.
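The two-reader check described above can be made concrete with a small sketch that computes simple percent agreement and flags the responses needing reconciliation; the category labels and data are invented:

```python
# Compare two readers' independent categorizations of the same responses:
# report percent agreement and list the responses to reconcile
# (or to pass to a third reader).
reader_a = ["Culture", "Strategies", "Organizing", "Outcomes", "Culture"]
reader_b = ["Culture", "Strategies", "Culture", "Outcomes", "Culture"]

def agreement(a, b):
    """Percent agreement and the indices of responses the readers disagree on."""
    disagreements = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    percent = 100 * (len(a) - len(disagreements)) / len(a)
    return percent, disagreements

percent, to_reconcile = agreement(reader_a, reader_b)
print(f"Agreement: {percent:.0f}%; reconcile responses {to_reconcile}")
```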
Qualitative assessment results can be extraordinarily valuable in helping
faculty understand and improve their teaching practices and thereby improve
student learning. The ‘‘Minute Paper’’ responses to this workshop (Tables 8.6
and 8.7) provided a number of useful insights to the author:
– The portion of the workshop addressing assessment strategies was clearly
very successful in conveying useful, meaningful ideas; the author could take
satisfaction in this and leave it as is in future workshops.
– The portion of the workshop addressing learning outcomes was not especially
successful; while there were few if any questions about this topic, few participants
cited it as especially useful or meaningful. (A background knowledge
probe (Angelo & Cross, 1993) would have revealed that most participants
arrived at the workshop with a good working knowledge of this topic.) The
author used this information to modify her workshop curriculum to limit
coverage of this topic to a shorter review.
– Roughly one in eight participants had questions about organizing assessment
activities across their institution, a topic not addressed in the workshop.
The author used this information to modify her workshop curriculum
to incorporate this topic. (Reducing time spent on learning outcomes
allowed her to do this.)
– The portion of the workshop addressing the promotion of a culture of assessment
was clearly the most problematic. While a fifth of all respondents found it useful
or meaningful, two-fifths had questions about this topic when the workshop
concluded. Upon reflection, the author realized that she had placed this topic at
the end of the workshop curriculum, addressing it late in the day when she
was rushed and participants were tired. She modified her curriculum to move
this topic to the beginning of the workshop and spend more time on it.
As a result of this reflective writing assignment and the subsequent changes
made in curriculum and pedagogy, participant learning increased significantly
in subsequent workshops, as evidenced by the increased proportion of com-
ments citing organizing assessment activities and promoting a culture of assess-
ment as the most useful or meaningful things learned and the smaller
proportion of participants with questions about these two areas.
Conclusion
References
Angelo, T. A. (1993, April). A ‘‘teacher’s dozen’’: Fourteen general, research-based principles
for improving higher learning in our classrooms. AAHE Bulletin, 45(8), 3–7, 13.
Angelo, T. A., & Cross, K. P. (1993). Classroom assessment techniques: A handbook for college
teachers (2nd ed.). San Francisco: Jossey-Bass.
Association of American Colleges and Universities. (2002). Greater expectations: A new
vision for learning as a nation goes to college. Washington, DC: Author.
Astin, A. W. (1993). What matters in college: Four critical years revisited. San Francisco:
Jossey-Bass.
Banta, T. W., & Pike, G. R. (2007). Revisiting the blind alley of value added. Assessment
Update, 19(1), 1–2, 14–15.
Barr, R. B., & Tagg, J. (1995). From teaching to learning: A new paradigm for undergraduate
education. Change, 27(6), 12–25.
Biggs, J. (2001). Assessing for quality in learning. In L. Suskie (Ed.), Assessment to promote
deep learning: Insight from AAHE’s 2000 and 1999 assessment conferences (pp. 65–68).
Branch, W. T., & Paranjape, A. (2002). Feedback and reflection: Teaching methods for
clinical settings. Academic Medicine, 77, 1185–1188.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for
research. Chicago: Rand McNally.
Chickering, A. W., & Gamson, Z. (1987). Seven principles for good practice in undergraduate
education. AAHE Bulletin, 39(7), 5–10.
Chickering, A. W., & Gamson, Z. (1991). Applying the seven principles for good practice in
undergraduate education. New directions for teaching and learning, No. 47. San Francisco:
Jossey-Bass.
Conference on College Composition and Communication. (2006). Writing assessment: A
position statement. Urbana, IL: National Council of Teachers of English. Retrieved
September 4, 2007, from http://www.ncte.org/cccc/resources/123784.htm
Costa, A., & Kallick, B. (2000). Getting into the habit of reflection. Educational Leadership,
57(7), 60–62.
De Sousa, D. J. (2005). Promoting student success: What advisors can do (Occasional Paper
No. 11). Bloomington, Indiana: Indiana University Center for Postsecondary Research.
Entwistle, N. (2001). Promoting deep learning through teaching and assessment. In L. Suskie
(Ed.), Assessment to promote deep learning: Insight from AAHE’s 2000 and 1999 assessment
conferences (pp. 9–20).
Ewell, P. T., & Jones, D. P. (1996). Indicators of ‘‘good practice’’ in undergraduate education:
A handbook for development and implementation. Boulder, CO: National Center for Higher
Education Management Systems.
Gronlund, N. E. (2005). Assessment of student achievement (8th ed.). Boston: Allyn & Bacon.
Haladyna, T. M. (2004). Developing and validating multiple choice items. Boston, MA: Allyn &
Bacon.
Haswell, R. (1983). Minimal marking. College English, 45(6), 600–604.
Henscheid, J. M. (2000). Professing the disciplines: An analysis of senior seminars and
capstone courses (Monograph No. 30). Columbia, South Carolina: National Resource
Center for the First-Year Experience and Students in Transition.
Huba, M. E., & Freed, J. E. (2000). Learner-centered assessment on college campuses: Shifting
the focus from teaching to learning (pp. 32–64). Needham Heights, MA: Allyn & Bacon.
Kubiszyn, T., & Borich, G. D. (2002). Educational testing and measurement: Classroom
application and management (7th ed.) San Francisco: Jossey-Bass.
Kuh, G. (2001). Assessing what really matters to student learning: Inside the National Survey
of Student Engagement. Change, 33(3), 10–17, 66.
Kuh, G. D., Schuh, J. H., Whitt, E. J., & Associates. (1991). Involving colleges: Successful
approaches to fostering student learning and development outside the classroom. San Francisco:
Jossey-Bass.
Light, R. (2001). Making the most of college: Students speak their minds. Cambridge, MA:
Harvard University Press.
Linn, R., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and
validation criteria. Los Angeles: University of California Center for Research on Evalua-
tion, Standards, and Student Testing.
Livingston, S. A., & Zieky, M. J. (1982). Passing scores: A manual for setting standards on
performance on educational and occupational tests. Princeton: Educational Testing
Service.
MacGregor, J., Tinto, V., & Lindblad, J. H. (2001). Assessment of innovative efforts: Lessons
from the learning community movement. In L. Suskie (Ed.), Assessment to promote deep
learning: Insight from AAHE’s 2000 and 1999 assessment conferences (pp. 41–48).
McKeachie, W. J. (2002). Teaching tips: Strategies, research, and theory for colleges and
university teachers (11th ed.). Boston: Houghton Mifflin.
Marshall, C., & Rossman, G. B. (2006). Designing qualitative research (4th ed.). Thousand
Oaks, CA: Sage.
Mentkowski, M., & Associates. (2000). Learning that lasts: Integrating learning, development,
and performance in college and beyond. San Francisco, CA: Jossey-Bass.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement. New York:
Macmillan.
Messick, S. (1994). The interplay of evidence and consequences in the validation of perfor-
mance assessments. Educational Researcher, 23(2), 13–23.
152 L. Suskie
Moon, J. (2001). Reflection in learning and professional development: Theory and practice.
London: Routledge.
Moran, D. J., & Malott, R. W. (Eds.). (2004). Evidence-based educational methods. San Diego:
Elsevier Academic Press.
National Survey of Student Engagement. (2006). Engaged learning: Fostering success for all
students: Annual report 2006. Bloomington, IN: Author.
Ovando, M. N. (1992). Constructive feedback: A key to successful teaching and learning.
Austin, TX: University of Texas at Austin, College of Education, Department of Educa-
tion Administration. ERIC Document Reproductive Services No. ED 404 291.
Palmer, P. J. (1998). The courage to teach: Exploring the inner landscape of a teacher’s life.
San Francisco: Jossey-Bass.
Pascarella, E. T. (2001). Identifying excellence in undergraduate education: Are we even close?
Change, 33(3), 19–23.
Pascarella, E. T., & Terenzini, P.T. (2005). How college affects students: A third decade of
research. San Francisco: Jossey-Bass.
Patton, M. Q. (2002). Qualitative research & evaluation methods (3rd ed.). Thousand Oaks,
CA: Sage.
Pike, G. R. (2006). Assessment measures: Value-added models and the Collegiate Learning
Assessment. Assessment Update, 18(4), 5–7.
Romer, R., & Education Commission of the States. (1996, April). What research says about
improving undergraduate education. AAHE Bulletin, 48(8), 5–8.
Suskie, L. (2000, May). Fair assessment practices: Giving students equitable opportunities to
demonstrate learning. AAHE Bulletin, 52(9), 7–9.
Suskie, L. (2004a). Encouraging student reflection. In Assessing student learning: A common
sense guide (pp. 168–184). San Francisco: Jossey-Bass Anker Series.
Suskie, L. (2004b). Using assessment findings effectively and appropriately. In Assessing
student learning: A common sense guide (pp. 300–317). San Francisco: Jossey-Bass Anker
Series.
Suskie, L. (2004c). What is assessment? Why assess? In Assessing student learning: A common
sense guide (pp. 3–17). San Francisco: Jossey-Bass Anker Series.
Suskie, L. (2004d). Writing a traditional objective test. In Assessing student learning:
A common sense guide (pp. 200–221). San Francisco: Jossey-Bass Anker Series.
Suskie, L. (2007). Answering the complex question of ‘‘How good is good enough?’’ Assess-
ment Update, 19(4), 1–2, 14–15.
Walvoord, B., & Anderson, V. J. (1998). Effective grading: A tool for learning and assessment.
San Francisco: Jossey-Bass.
Chapter 9
Instrumental or Sustainable Learning?
The Impact of Learning Cultures on Formative
Assessment in Vocational Education
Kathryn Ecclestone

K. Ecclestone
Westminster Institute of Education, Oxford Brookes University, Oxford, UK
e-mail: kecclestone@brookes.ac.uk

Introduction
A Divided System
The past 30 years have seen a huge shift in British post-school qualifications,
from methods and processes based on competitive, norm-referenced
examinations for selection towards assessment that encourages greater
achievement.
There is now a strong emphasis on raising standards of achievement for every-
one through better feedback, sharing the outcomes and criteria with students
and recording achievement and progress in portfolios. One effect of this shift is
a harmonisation of assessment methods and processes between long-running,
high status academic qualifications in subjects such as history, sociology and
psychology and newer, lower status general vocational qualifications such as
leisure and tourism, health and social care. General academic and general
vocational qualifications now combine external examinations set by an award-
ing body, assignments or course work assessed by an awarding body and course
work or assignments assessed by teachers. Some general academic qualifica-
tions are still assessed entirely by external examinations.
A series of ad hoc policy-based and professional initiatives has encouraged
alternative approaches to both formative and summative assessment, such as
outcome and competence-based assessment, teacher and work-place assess-
ment, and portfolios of achievement. These have blurred the distinction
between summative and formative assessment and emphasised outcome-
based assessment as the main way to raise levels of participation, achievement,
confidence and motivation amongst young people and adults who have not
succeeded in school assessment (see, for example, Jessup, 1991; McNair, 1995;
Unit for Development of Adult and Continuing Education, 1989).
One effect has been to institutionalise processes for diagnostic assessment,
the setting and reviewing of targets, processes to engage students with assess-
ment specifications and criteria, methods of support and feedback to raise grade
attainment or improve competence, and ways of recording achievement. In
vocational and adult education, these processes tend to be seen as a generic
tool for motivation and achievement and for widening participation. In
contrast, similar practices in schools and higher education are more strongly
located in the demands of subject disciplines as the basis for higher achievement
and better student engagement.
The need to be clearer and more precise about the purposes of formative
assessment is confirmed by research which shows that the same assessment
activities or methods can lead to very different kinds of learning in different
contexts. In the Learning How to Learn Project in the Economic and Social
Research Council’s Teaching and Learning Research Programme (TLRP),
Marshall and Drummond use the evocative terms ‘spirit’ and ‘letter’ of
assessment for learning (AfL) to capture how formative assessment was
practised in the classroom:
The ‘spirit’ of AfL . . . we have characterized as ‘high organisation based on ideas’,
where the underpinning principle is promoting pupil autonomy. . . . This contrasts
with those lessons where only the procedures, or ‘letter’ of AfL seem in place. We use
these headings – the ‘spirit’ and ‘letter’ – to describe the types of lessons we watched,
because they have a colloquial resonance which captures the essence of the differences
we observed. In common usage adhering to the spirit implies an underlying principle
which does not allow a simple application of rigid technique. In contrast, sticking to the
letter of a particular rule is likely to lose the underlying spirit it was intended to
embody. (Marshall & Drummond, 2006, p. 137)
They found that teachers working in the spirit of AfL encouraged students to
become more independent and critical learners in contrast to those working in
the letter of AfL, where formative assessment activities were teacher-centred
and teacher-led in order to transmit knowledge and skills. Researchers and
teachers in the Improving Formative Assessment project have found the terms
‘letter’ and ‘spirit’ helpful in characterising the formative assessment practices
emerging from our research, and we connect them in this chapter to instrumental
and sustainable formative assessment.
This useful distinction of spirit and letter illuminates the ways in which
formative assessment might enable students to go beyond extrinsic success in
Motivation
Amotivated
Amotivated learners lack any direction for learning and are, variously,
indifferent or apathetic. Sometimes this state is an almost permanent response
to formal education or assessment and therefore hard to shift; sometimes it
appears at points during a course. There is a sense that amotivated learners are
drifting or hanging on until something better appears. However, it is important
to recognise the obvious point that, for all of us at different times, our deepest,
most intrinsic motivation can revert to states where we are barely motivated or
not motivated at all.
External
Learning takes place largely in association with reinforcement, reward, or to
avoid threat or punishment, including: short-term targets, prescriptive out-
comes and criteria, frequent feedback (this might be about the task, the person’s
ego or feelings or the overall goal) and reviews of progress, deadlines, sanctions
and grades. In post-compulsory education, external motivation sometimes
takes the form of financial incentives (payment to attend classes or rewards
for getting a particular grade) or sanctions (money deducted for non-attendance
on courses).
External motives are sometimes essential at the beginning of a course, or at
low points during it. They are not, therefore, negative and can be used strate-
gically as a springboard for other forms of motivation or to get people through
difficult times in their learning. Yet, if left unchecked, external motives can
dominate learning all the way through a course and lead to instrumental
compliance rather than deep engagement.
Introjected/internalised
Introjected is a therapeutic term describing someone who has internalised an
external supportive structure and can articulate it as her or his own: in a
qualification, this might comprise the vocabulary and procedures of criteria,
targets and overall assessment requirements. Good specifications of grade
criteria and learning outcomes, and processes or tasks broken into small
steps, enable learners to use the official specifications independently of
teachers. Nevertheless, although introjected motivation enables students to
articulate the official requirements and criteria almost by rote, it is not self-
determined.
For learners disaffected by assessment in the past, introjected motivation is
powerful and, initially, empowering. However, like external motivation, it can
become a straitjacket by restricting learners and teachers to prioritising the formal
requirements, especially in contexts where contact time and resources are restricted.
Identified
Learning occurs when students accept content or activities that may hold no
incentive in terms of processes or content (they might even see them as a
burden) but which are necessary for attaining a pre-defined goal, such as a
qualification or short-term targets. Identified motivation links closely to
introjected motivation, and the goals that students identify with can be
course-related, personal or social, or all three, as a means to a desirable end.
Intrinsic
Learners perceive any incentives as intrinsic to the content or processes of a
formal learning or assessment activity, such as enjoyment of learning something
for its own sake, helping someone else towards mastery or being committed to
others outside or inside the learning group. It is often more prevalent amongst
learners than teachers might assume and takes idiosyncratic, deeply personal
and sometimes fleeting forms. Intrinsic motivation is context-specific: someone
can therefore show high levels of intrinsic motivation in one task or context and
not in another. Learning is highly self-determined and independent of external
contingencies.
Interested
Interested motivation is characterised by learners recognising the intrinsic value
of particular activities and goals and then assigning their own subjective criteria
for what makes something important: these include introjected and identified
motives. Like intrinsic motivation, it relies on students assigning deeply perso-
nal meanings of relevance to content, activities and contexts. It is accompanied
by feelings of curiosity and perhaps risk or challenge, and encouraged by a
sense of flow, connection or continuity between different elements of a task or
situation.
High levels of self-determination, a positive identity or sense of self and the
ability to attribute achievement to factors within one’s own control are inte-
grated in a self-image associated with being a successful learner. Interested
motivation relates closely to Maslow’s well-known notion of self-actualisation,
where identity, learning activities, feelings of social and civic responsibility and
personal development are fused together. It is therefore often correlated with
good peer and social dynamics in a learning group.
Interested motivation characterised ideas about learning and personal
identity amongst some young people in Ball, Maguire and Macrae’s study of
transitions into post-compulsory education, training or work. The processes
and experiences of becoming somebody and of having an imagined future were
not only rooted strongly in formal education but also in their sense of having
positive opportunities in the local labour market and in their social lives.
Formal education therefore played an important and positive, although not
dominant, part in their evolving personal identity (Ball, Maguire, & Macrae,
2000).
Despite the descriptive and analytical appeal of these types, it is, of course,
crucial to bear in mind that they are not stable or neatly separated categories.
Nor can motivation be isolated from structural factors such as class, gender and
race, opportunities for work and education, and students’ and teachers’ percep-
tions of these factors. In formal education, students might combine aspects of
more than one type, they might change from day to day, and they might show
interested motivation strongly in one context (such as a hobby) but not at all in
the activities required of them at school, college or university.
The factors that make someone an interested or barely motivated learner are
therefore idiosyncratic, very personal and changeable: the most motivated,
enthusiastic young person can show high levels of interested motivation in
year one of a two-year vocational course but be barely hanging on through
external and introjected motivation by the end of year two, simply because they
are tired of the course and want to move on.
external rewards and sanctions and to internalise the official demands and
support structures as a springboard to develop deeper forms of motivation
(see Davies & Ecclestone, 2007; Ecclestone, 2002).
Nevertheless, these caveats do not detract from a strong empirical connection,
shown in Prenzel’s studies and my own, between intrinsic and interested
motivation, high levels of self-determination and positive evidence of the
conditions listed below, and, conversely, between amotivation or external
motivation and poor evidence of these conditions:
- support for students’ autonomy, e.g., choices for self-determined discovery,
planning and acting;
- support for competence, e.g., effective feedback about knowledge and skills
in particular tasks and how to improve them;
- social relations, e.g., cooperative working, a relaxed and friendly working
atmosphere;
- relevance of content, e.g., applicability of content, proximity to reality,
connections to other subjects (here it is important to note that ‘‘relevance’’
is not limited to personal relevance or application to everyday life);
- quality of teaching and assessment, e.g., situated in authentic, meaningful
problem contexts, adapted to students’ starting points; and
- teachers’ interest, e.g., expression of commitment to students.
A learning culture is therefore not the same as a list of features that comprise a
course or programme, nor is it a list of factors that affect what goes on in a
learning programme; rather, it is a particular way of understanding the effects
of any course or programme by emphasising the significance of the interactions
and practices that take place within and through it. These interactions and
practices are part of a dynamic, iterative process in which participants (and
environments) shape cultures at the same time as cultures shape participants.
Learning cultures are therefore relational and their participants go beyond the
students and teachers to include parents, college managers at various levels,
policy makers and national awarding bodies.
Learning culture is not, then, synonymous with learning environment since the
environment is only part of the learning culture:
a learning culture should not be understood as the context or environment within
which learning takes place. Rather, ‘learning culture’ stands for the social practices
through which people learn. A cultural understanding of learning implies, in other
words, that learning is not simply occurring in a cultural context, but is itself to be
understood as a cultural practice. (James & Biesta, 2007, p. 18, original emphases)
Instead, the TLC project shows that learning cultures are characterised by the
interactions between a number of dimensions:
- the positions, dispositions and actions of the students;
- the positions, dispositions and actions of the tutors;
- the location and resources of the learning site, which are not neutral, but
enable some approaches and attitudes, and constrain or prevent others;
- the syllabus or course specification, and the assessment and qualification
specifications;
- the time tutors and students spend together, their interrelationships, and the
range of other learning sites students are engaged with;
- issues of college management and procedures, together with funding and
inspection body procedures and regulations, and government policy;
- wider vocational and academic cultures, of which any learning site is
part; and
- wider social and cultural values and practices, for example around issues of
social class, gender and ethnicity, the nature of employment opportunities,
social and family life, and the perceived status of Further Education as a sector.

1. In the TLC project, the term ‘learning site’ was used, rather than ‘course’, to
denote more than classroom learning. In the IFA project we use the more usual
terms ‘course’ and ‘programme’.
Learning culture is intrinsically linked to a cultural theory of learning ‘‘[that]
aims to understand how people learn through their participation in learning
cultures, [and] we see learning cultures themselves as the practices through
which people learn’’ (James & Biesta, 2007, p. 26).
A cultural understanding illuminates the subtle ways in which students and
teachers act upon the learning and assessment opportunities they encounter and
the assessment systems they participate in. Teachers, students, institutional man-
agers, inspectors and awarding bodies all have implicit and explicit values and
beliefs about the purposes of a course or qualification, together with certain
expectations of students’ abilities and motivation. Such expectations are explicit
or implicit and can be realistic or inaccurate. Other influential factors in learning
cultures are the nature of relationships with other students and teachers, their
lives outside college and the resources available to them during the course (such
as class contact time): all these make students active agents in shaping expecta-
tions and practices in relation to the formal demands of a qualification (see, for
example, Ecclestone, 2002; Torrance et al., 2005).
Data and analysis in this section are taken from a paper by Davies and
Ecclestone (2007). The student group at Moorview College, a school in a
rural area of south west England, comprised 16 Year 13 students aged 17–18,
with roughly equal numbers of boys and girls, and three teachers (teaching
the physics, chemistry and biology elements of the course respectively).
The learning culture was marked by a high level of synergy and expansiveness,
of which teachers’ formative assessment practices were a part. Teachers
regarded formative assessment as integral to good learning, ‘‘part of what we
do’’ rather than separate practices, where formative assessment was a subtle
combination of helping students to gain realistic grades, alongside developing
their enthusiasm for and knowledge of scientific principles and issues, and their
skills of self-assessment. It encouraged them to become more independent and
self-critical learners. Strong cohesion between teachers’ expectations, attitudes
to their subject and aspirations for their students was highly significant in the
way they practised formative assessment.
Synergy stemmed from close convergence between teachers and students
regarding both expectations and dispositions to learning. The AVCE teachers
expected students to achieve, while accepting that they did not usually arrive on
the course with such high GCSE grades as those in A-level science subjects.
There was also a general consensus between teachers and students that science
answer, because the answer is – no, we should be getting them to get grades. But that’s
never as I’ve seen it and it never will be. (Derek, 3rd interview)
Despite its collaborative nature, this learning culture was rooted in a strong
belief on both sides that the teacher is the most crucial factor in learning:
‘‘I believe they all know they can’t do it without me’’ (Derek, 3rd interview).
When asked what motivated his students on the AVCE, Derek’s answer was
unequivocal: ‘‘I’m going to put me, me, me, me’’ (1st interview).
learning culture. Although the course was not highly practical and did not
include work placements, it did include relevant trips and experimental work.
It was vocational above all, though, in the way the teachers related the knowl-
edge they taught to real-life experience. As Derek summed up:
I think the real life concepts that we try and pull out in everything works very well.
I think we’re incredibly fortunate to have time to teach learning, as opposed to time to
teach content. (1st interview, original emphases)
Students generally had begun the course expecting the work to be reasonably
easy and therefore restrictive and ‘‘safe’’, rather than challenging. In fact, the
teachers’ pedagogy and formative assessment enabled students to accept chal-
lenge and risk, not in what they were learning, but in how they were learning.
Our observations and interviews showed that they found themselves explaining
work to their fellow students, joining in Derek’s explanations, negotiating how
they might go about tasks and losing any initial inhibitions about asking
questions of Derek and of one another.
Data and analysis here are drawn from a different project that observed and
explored students’ and teachers’ attitudes to assessment, and which observed
formative assessment practices (Torrance et al., 2005). I focus here on students
in a further education college, Western Counties, where students progressed
from their local school to the college. The group comprised eleven students,
three boys and eight girls, all of whom had done Intermediate Level Business at
school and who were therefore continuing a vocational track. Students in this
group saw clear differences between academic and vocational qualifications:
academic ones were ‘‘higher status, more well-known’’. Yet, despite the lower
status of the vocational course, students valued its relevance.
Learning and achievement were synonymous and assessment became the
delivery of achievement, mirroring the language of inspection reports and policy
texts. Vocational tutors reconciled official targets for delivery with educational
goals and concerns about students’ prospects. They saw achievement as largely
about growing confidence and ability to overcome previous fears and failures.
Personal development was therefore more important than the acquisition of
skills or subject knowledge, as this typical comment illustrates:
[students] develop such a lot in the two years that we have them... the course gives
them an overall understanding of business studies really, it develops their under-
standing and develops them as people, and hopefully sets them up for employment.
It doesn’t train them for a job; it’s much more to develop the student than the content...
that’s my personal view, anyway.... (Vocational course leader, quoted in Torrance
et al., 2005, p. 43)
Some, but not all, students had a strong sense of identity as second chance
learners, an image that their tutors also empathised with from their own
educational experience and often used as a label. This led to particular ideas
about what students liked and wanted. One tutor expressed the widely held view
that ‘‘good assessment’’ comprised practical activities, work-experience and
field trips: ‘‘all the things these kids love... to move away from all this written
assessment’’. For her, assessment should reflect ‘‘the way that [vocational]
students prefer to learn. . .they are often less secure and enjoy being part of
one group with a small team of staff...[assessment is] more supported, it’s to do
with comfort zones – being in a more protected environment’’. Beliefs about
‘‘comfort zones’’ and ‘‘protecting’’ students meant that teachers aimed to mini-
mize assessment stress or pressure. Teachers and students liked working in a
lively, relaxed atmosphere that combined group work, teacher input, time to
work on assignments individually or in small friendship-based groups, and
feedback to the whole group about completed assignments.
Motivation
There was a high level of synergy between students’ goals and the official policy
goals their teachers had to aim for, namely to raise attainment of grades,
maintain retention on courses and encourage progression to formal education
at the next level.
Three students were high achievers, gaining peer status from consistently
good work, conscientious studying and high grades. They referred to them-
selves in the same language as teachers, as ‘‘strong A-grade students’’,
approaching all their assignments with confidence and certainty and unable
to imagine getting low grades. They combined identified and interested motiva-
tion from the typology discussed above. Crucial to their positive identity as
successful students was that less confident or lower achieving students saw them
as a source of help and expertise. In an earlier study of a similar advanced level
business group in a further education college, successful students drew upon
this contrast to create a new, successful learning identity (see Ecclestone, 2004).
The majority of students combined the introjected motivation of being
familiar with the detail of the assessment specifications and the identified
motivation of working towards external targets, namely achieving the qualifi-
cation. They worked strategically in a comfort zone, adopting a different grade
identity from the high achievers. Most did not aim for A-grades but for Cs or
perhaps Bs. They were unconcerned about outright failure, since, as teachers
and students knew, not submitting work was the only cause of failure.
Goals for retention and achievement, together with teachers’ goals for
personal development and students’ desire to work in a conducive atmosphere
without too much pressure, encouraged external, introjected and identified
motivation and much-valued procedural autonomy. This enabled students to
use the specifications to aim for acceptable levels of achievement, with freedom
to hunt and gather information to meet the criteria, to escape from ‘‘boring’’
classrooms and to work without supervision in friendship groups (see also
Bates, 1998). Synergy between teachers and students in relation to these features
of the learning culture encouraged comfortable, safe goals below students’
potential capacity, together with instrumental compliance. The official specifi-
cations enabled students to put pressure on teachers to ‘‘cover’’ only relevant
knowledge to pass the assignment, or to pressurise teachers in difficult subjects
to ‘‘make it easy’’.
Tutors spent a great deal of time marking draft and final work, starting with
grade criteria from the Es through the Cs to the As (the grade descriptors were
changed from ‘P’, ‘M’, ‘D’ to ‘A’, ‘B’, ‘C’ in 2000). Where students had not done
enough to get an E, teachers offered advice about how to plug and cover gaps,
cross-referenced to the assessment specifications. Students had strong expecta-
tions that teachers would offer advice and guidance to improve their work.
There was a strong sense that assessment feedback and allowing time to do
work on assessed assignments in lessons had replaced older notions of lesson
planning and preparation.
Assignments were meticulously organised, and staff and students in both
programmes had internalised the language of ‘‘evidencing the criteria’’,
‘‘signposting’’, ‘‘cross-referencing’’ and generating ‘‘moderatable evidence’’.
Demands from the awarding body that teachers generate ‘‘evidence’’ enabling
moderation processes to ensure parity and consistency between the different
centres offering the qualification around the country led to student assignments
with similar formats. Teachers in the Advanced
Business course felt that this made idiosyncratic interpretation or genuine local
design of content impossible. Yet, as we saw in the case of AVCE Science above,
this sterile approach is not inevitable: the learning culture of this college, with
its strong conformity to official targets, and teachers’ beliefs about students
and about what the qualification was primarily for, were powerful factors
shaping formative assessment.
In contrast, the learning culture of AVCE Science was shaped by the qualification
design, the subject enthusiasm of the teachers and a clear sense of vocational
knowledge, together with a system of selection on entry from compulsory
schooling that guaranteed a certain level of achievement and
motivation. Selection was not available for the Advanced Business course: in
order to meet recruitment targets, the college took almost every student who
applied. These features, and the practices and expectations of teachers and
students, combined to produce a much more expansive learning culture, includ-
ing the way formative assessment was conceptualised and practised.
Formative assessment practices as part of the learning culture of a vocational
course are, to some extent, linked to the ways in which managers, practitioners,
parents and students perceive such courses. Differences in the learning cultures
of these two courses have also raised the question of what students and educators expect from a vocational course. Images and expectations of what counted as vocational, and the kind of status attached to it, were key factors in shaping the learning culture. Its meaning in Advanced Science stemmed strongly from the way teachers linked scientific knowledge to real-life situations. In Advanced Business, it seemed to be synonymous with the greater ratio of coursework to exams and with activities that enabled students to gain the qualification without too much difficulty. Vocational courses were generally accepted by
students, their parents and certain teachers as being of lower status than the
academic single subject courses at GCSE or A-level.
These two case studies therefore illuminate the subtle ways in which some
learning cultures foster a predominantly instrumental approach to learning and
formative assessment, while others encourage sustainable forms of formative
assessment. It seems that the high level of synergy and the expansive nature of
the learning culture of AVCE Science both encouraged and encompassed
practices in the spirit of formative assessment. In contrast, the more restrictive
learning culture of Advanced Business encouraged and perpetuated practices
that are essentially in the letter of it. From these differences, formative
As the analysis here shows, students progressing from learning cultures that have already socialised them into instrumental attitudes to formative assessment are likely to expect similar approaches and to resist those that are more challenging. Yet instrumentalism is not entirely negative; rather, it has contradictory effects: instrumentalism in pursuit of meaningful, challenging tasks cannot be said to be uneducational per se. In a socio-cultural context of serious concern about disengagement, and of a professed need to keep young people in formal education as long as possible, instrumentalism enables some students in the British education system to achieve when they would not otherwise have done so and to go beyond their previous levels of skill and insight. Yet it also encourages others to work within the comfortable confines of self-defined expectations that fall below their potential.
These conclusions raise questions about the purpose and effects of formative
assessment used primarily to increase motivation and participation rather than
to develop subject-based knowledge and skills. Assessment systems can privilege broad or narrow learning outcomes, and external, introjected, identified, intrinsic or interested motivation. They can also reinforce old images and stereotypes
9 Instrumental or Sustainable Learning? 173
of learning and assessment or encourage new ones, and therefore offer comfortable, familiar approaches or risky, challenging ones. However, in the British education system, socio-political concerns about disengagement from formal education amongst particular groups have institutionalised formative assessment practices designed to raise formal levels of achievement rather than to develop deep engagement with subject knowledge and skills.
Vocational students are undoubtedly ‘‘achieving’’ in some learning cultures
but we have to question what they are really ‘‘learning’’. A tendency to reinforce comfortable, instrumental motivation and procedural autonomy, in a segregated, pre-determined track, raises serious questions about the quality of education offered to young people still widely seen as second-best, second-chance learners. Nevertheless, as the learning culture of Advanced Vocational Science
shows, these features are not inevitable.
Introduction
What can be learned from an institution with more than thirty years’ experience in the teaching and assessment of student learning outcomes? Much depends on what its members have learned and what they can tell us. This chapter aims to find out in one particular case. In doing so, we focus on the process of moving from principles to practice, rather than merely on a description of our practice.
Much has been written about the ability-based curriculum and assessment at Alverno College (Arenson, 2000; Bollag, 2006; Levine, 2006; Mentkowski & Associates, 2000; Twohey, 2006; Who needs Harvard?, 2006; see also www.alverno.edu), but this particular chapter emphasizes how certain shared principles have informed the practice of the Alverno faculty and how the practice of the faculty has led to emerging principles. Our reasoning is that the principles are more apt to be something that Alverno faculty can hold in common with faculty from other institutions, and the process can suggest how an institution can come to its unique practice from a set of shared principles. The instances of Alverno practice that we do include are examples of the kind of practice the faculty at one institution have constructed as expressions of their shared principles.
Since 1973, the faculty of Alverno College have implemented and refined a
curriculum that has at its core the teaching and assessment of explicitly articu-
lated learning outcomes that are grounded in eight core abilities integrated with
disciplinary concepts and methods. In this chapter we explore the shared
principles and common practices that are integral to teaching, learning, and
assessment at Alverno. Indeed, one of the fundamental principles at the college
is that the primary focus of the faculty is on how to teach and assess in ways that
most effectively enhance student learning. This principle has permeated and
shaped the culture of the college and informed the scholarly work of the faculty,
and it serves as the impetus for our ongoing reflection and discussion with
colleagues throughout higher education.
T. Riordan
Alverno College, Milwaukee, WI, USA
e-mail: Tim.Riordan@alverno.edu
We also consider in this chapter what Alverno faculty have learned as they
have worked on their campus and with other higher education institutions to
make assessment a meaningful and vital way to enhance student learning. We
draw from the experience of our faculty, from their work with many institutions
interested in becoming more steadily focused on student learning, and from
research grounded in practice across higher education. We hope that what we
have to say here will be a stimulus for further reflection on issues we all face as
educators in the service of student learning.
Although, as our title implies, we emphasize in this chapter the collaborative and systemic dimensions of assessment, some important shifts in the thinking of Alverno faculty happened more in fits and starts than in an immediate, shared sense of direction; so we begin this introduction with a couple of ideas that emerged both in individual reflection by Alverno faculty and in larger discussions.
A key insight many of us gradually came to was that we had been teaching as
scholars rather than as scholar teachers: we had the conviction that, if we did
well in conveying our scholarship, learning was the student’s job. We did care
whether the students were learning, though, and many of us were beginning to
raise questions because of that. Then we began to learn that our caring had to
take on more responsibility. Gradually we came to learn what that responsi-
bility is. We knew we needed to integrate our scholarship and our teaching, and
consequently our assessment, at the level of the individual student as well as of
the program and institution. Our ideas about assessment did not come to us
fully-formed; we simply kept asking ourselves questions about what would be
necessary to put student learning at the center of our scholarly work. We have
since come to learn much about what that entails.
For a start, we decided we were responsible for determining and expressing what we thought our students should learn. We made that a foundational assumption, which turned into an abiding first principle.
The need to clearly and publicly articulate learning outcomes was and remains a
critical first principle for all of us at Alverno. Doing so provides a common
framework and language for faculty in developing learning and assessment experi-
ences. It helps students internalize what they are learning because they are more
conscious of the learning that is expected.
The process of articulating learning outcomes can begin in many ways. For
us, it grew out of faculty discussions in response to questions posed by the
president of the college. More than thirty years ago, we engaged in the process
of articulating student learning outcomes when the president challenged us to
consider what ‘‘students could not afford to miss’’ in our respective fields of
10 Collaborative and Systemic Assessment of Student Learning 177
study and to make public our answers to one another. Although we entered
those discussions confident that we knew what we wanted our students to learn,
we sometimes struggled to be as clear as we hoped. That is when we began to realize that, if we found it hard to articulate what students should learn, it must be so much harder for students to figure it out. We also discovered in those
discussions that, although each discipline emphasizes its unique concepts and
methods, there are some learning outcomes that seem to be common across all
fields and that faculty want students to learn no matter what their major area of
study is. The faculty agreed that all students would be required to demonstrate
eight abilities that emerged from the learning outcomes discussions. Since that
time, we have taught and assessed for the following abilities in the contexts of
study in the disciplines or in interdisciplinary areas, both in the general educa-
tion curriculum and the majors:
Communication
Analysis
Problem Solving
Valuing in Decision-Making
Social Interaction
Developing a Global Perspective
Effective Citizenship
Aesthetic Engagement.
The faculty have revised and refined the meaning of the abilities many times
over the years, but our commitment to the teaching and assessment of the
abilities in the context of disciplinary study has remained steadfast. In order
to graduate from the college, all students must demonstrate the abilities as part
of their curriculum, so faculty design of teaching and assessment processes is
informed by that requirement. We have in the course of our practice learned
more about the nature of what is involved in this kind of learning.
For instance, we were initially thinking of these abilities in terms of what we
would give the students; however, the more we thought about the abilities we
had identified and about our disciplines and the more we attended to the
research on the meaning of understanding, the more we realized what a student
would have to do to develop the abilities inherent in our disciplines. We moved
from questions like, ‘‘What is important for students to remember or know?’’ to
‘‘Are they more likely to remember or know something if they do something
with it?’’ This led us to another key principle at the foundation of our practice.
assess what they have learned. We are also committed to the notion that students
who graduate from our college should be able to apply their learning in the variety
of contexts they will face in life, whether it be to solve a life problem or to
appreciate a work of art. In all of these senses, we believe that education involves
not only knowing but also being able to use or do what one knows.
This principle has been confirmed not only by our own practice at Alverno
but also by scholars like Gardner (1999) and Wiggins & McTighe (2005) who
have demonstrated through their research that understanding is further devel-
oped and usually best determined in performance. They argue that students
grow in their understanding as they use what they have studied, and that
students’ understanding is most evident when they are required to use or
apply a theory or concept, preferably in a context that most closely resembles
situations they are likely to face in the future. For example, having engineering
students solve actual engineering problems using the theories they have studied
is usually a more effective way of assessing and improving their learning than
merely asking them to take a test on those theories. This principle has been
important to us from the start, and its implications for our practice continue to
emerge.
The nature of the eight abilities themselves reflects an emphasis on how
students are able to think and what they are able to do, so design of teaching
and assessment in the curriculum has increasingly been more about engaging
students in the practice of the disciplines they are studying, not just learning
about the subject by listening to the teacher. In most class sessions Alverno students are, for example, communicating, interacting with one another, carrying out active analysis based on assignments, arguing for a position, exploring different perspectives on issues, or engaging in other processes that entail practice of the abilities in the context of the disciplines. In fact, we often
make the point that design of teaching has much more to do with what the
students will do than what the teacher will say or do. Even our physical facilities
have reflected our growing commitment to this kind of learning. Our class-
rooms are furnished not with rows of desks, but tables with chairs, to create a
more interactive learning environment, one in which students are engaged in the
analytic processes they are studying as opposed to one in which they are
watching the teacher do the analysis for them. This is not to suggest that there
is no room for input and guidance from the teacher, but, in the end, we have
found that it is the student’s ability to practice the discipline that matters.
In the same spirit, our assessment design is guided by the principle that
students not only improve and demonstrate their understanding most reliably
when they are using it, but also that what they do with their understanding
will make them most effective in the variety of contexts they face in their futures.
In professional areas like nursing and education there has always been an emphasis on practice of the field: students must demonstrate their understanding and ability by what they can do in clinicals or in student teaching.
We take the same approach to assessment in disciplines across the curriculum.
Students in philosophy are required to give presentations and engage in
group interactions in which they use theories they have studied to develop a
philosophical argument on a significant issue; English students draw from their
work in literary criticism to write an editorial in which they address a simulated
controversy over censorship of a novel in a school district; chemistry students
develop their scientific understanding by presenting a panel on dangers to the
environment.
In all of these instances faculty have designed assessments that require
students to demonstrate understanding and capability in performance that
involves using what they have learned. But what makes for effective assessment
design in addition to the emphasis on engaging students in using what they have
studied? Essential to effective assessment of student learning, we have found, is
the use of performance criteria to provide feedback to students – another
critical principle guiding our educational practice.
As the authors throughout this book insist, assessment should not only evaluate
student learning but enhance it as well. Our own practice has taught us that
effective and timely feedback to students on their performance is critical to making
assessment an opportunity for improvement in their learning. In addition, we have
learned how important specific criteria are as a basis for feedback – criteria made
known to the student when an assignment or assessment is given.
A question many ask of Alverno faculty is, ‘‘What has been the greatest
change for you as a result of your practice, as teachers, in your ability-based
curriculum?’’ Most of us point to the amount of time and attention we give to
feedback. We now realize that merely evaluating the work of students by
documenting it with a letter or number does little to inform students of what
they have learned and what they need to do to improve. How many of us as students ourselves received grades on an assignment or test with little clue as to what we had done well or badly? We see giving timely and meaningful feedback as
essential to teaching and assessment so that our students do not have a similar
clueless experience. In fact, our students have come to expect such feedback
from us and are quick to call us to task if we do not provide it.
The challenge, we have learned, is to develop the most meaningful and
productive ways of giving feedback. Sometimes a few words to a student in a
conversation can be enough; other times extensive written feedback is neces-
sary. There are times when giving feedback to an entire class of students is
important in order to point out patterns for improvement. Increasingly the
question of how to communicate with students about their performance from a
distance is on our minds.
Perhaps most important in giving feedback, however, is being able to use
criteria that give students a picture of what their learning should look like in
performance. This is a principle we have come to value even more over time.
180 T. Riordan and G. Loacker
Although we could say with confidence from the start that we wanted our
students to be able to ‘‘communicate effectively,’’ to ‘‘analyze effectively,’’ to
‘‘problem-solve effectively,’’ we also knew that such broad statements would
not be of much help to students in their learning. Therefore we have given more
attention as faculty to developing and using criteria that make our expectations
more explicit. It is helpful to students, for example, when they understand that
effective analysis requires them to ‘‘use evidence to support their conclusions,’’
and effective problem-solving includes the ability to ‘‘use data to test a hypoth-
esis.’’ The process of breaking open general learning outcomes into specific
performance components provides a language faculty can use to give feedback
to students about their progress in relation to the broader learning outcomes.
As we have emphasized, students learn the abilities in our curriculum in the
context of study in the disciplines, and this has important implications for
developing criteria as well. Criteria must not only be more specific indicators
of learning, but must also reflect an integration of abilities with the disciplines.
In other words, we have found that another significant dimension of developing
criteria for assessment and feedback is integrating the abilities of our curricu-
lum with the disciplinary contexts in which students are learning them. This
disciplinary dimension involves, in this sense, a different form of specificity. In a
literature course the generic ‘‘using evidence to support conclusions’’ becomes
‘‘using examples from the text to support interpretations’’; while in psychology
the use of evidence might be ‘‘uses behavioral observations to diagnose a
disorder’’. It has been our experience that giving feedback that integrates generic language with the language of our disciplines helps students internalize both the learning outcomes of the curriculum in general and the uniqueness of, and similarities across, disciplinary boundaries. The following example illustrates how our faculty create criteria informed by the disciplinary context and how this enhances their ability to provide helpful feedback.
Two of the criteria for an assigned analysis of Macbeth’s motivation in
Shakespeare’s play might be ‘‘Identifies potentially credible motives for Mac-
beth’’ and ‘‘Provides evidence that makes the identified motives credible.’’ These
criteria can enable an instructor, when giving feedback, to affirm a good
performance by a student in an assessment and still be precise about limitations
that a student should be aware of in order to improve future performance: ‘‘You
did an excellent job of providing a variety of possible credible motives for
Macbeth with evidence that makes them credible. The only one I’d seriously
question is ‘‘remorse’’. Do you think that Macbeth’s guilt and fear moved into
remorse? Where, in your evidence of his seeing Banquo’s ghost and finally
confronting death, do you see him being sorry for what he did?’’ In such a
case, the instructor lets the student know that the evidence she provided for
some of the motives made them credible, and she realizes that she has demon-
strated effective analysis. The question about ‘‘remorse’’ gives her an opportu-
nity to examine what she had in mind in that case and either discover where her
thinking went askew or find what she considers sufficient evidence to open a
conversation about it with the instructor.
We have tried to emphasize here how vital using criteria to give instructor
feedback is to student learning as part of the assessment process, but just as
critical is the students’ ability to assess their own performance. How can they
best learn to self assess? First, we have found, by practicing with criteria; then by our modeling, in our own feedback to them, what it means to judge on the basis of criteria and to plan for continued improvement of learning. After all, the
teacher will not be there to give feedback forever; students must be able to
determine for themselves what they do well and what they need to improve.
Criteria are important, then, not only for effective instructor feedback but also
as the basis for student self assessment.
In our work with students and one another we have been continually reminded of
the paradox in education that the most effective teaching eventually makes the
teacher unnecessary. Put another way, students will succeed to the extent that they
become independent life-long learners who have learned from us but no longer
depend on us to learn. We have found that a key element in helping students
develop as independent learners is to actively engage them in self assessment
throughout their studies.
Self assessment is such a critical aspect of our educational approach that
some have asked why we have not made it the ninth ability in our curriculum.
The fact that we have not included self assessment as a ninth ability required for graduation reflects its uniqueness: it underlies the learning and assessment of all of the abilities. Our belief in its importance has prompted us to make
self assessment an essential part of the assessment process to be done regularly
and systematically. We include self assessment in the design of our courses and
work with one another to develop new and more effective ways of engaging
students in self assessment processes. We also record key performances of self
assessment in a diagnostic digital portfolio where students can trace their own
development of it along with their development of the eight abilities.
We have learned from our students that their practice of self assessment
teaches them its importance. Our research shows that beginning students ‘‘make
judgments on their own behavior when someone else points out concrete
evidence to them’’ and that they ‘‘expect the teacher to take the initiative in
recognizing their problems’’ (Alverno College Assessment Committee/Office of
Educational Research and Evaluation, 1982). Then, gradually, in the midst of
constant practice, they have an ‘‘aha’’ experience. Of what sort? Of having judged accurately, of realizing that they understand the criteria and can use them meaningfully, of trusting their own judgment enough to release themselves from complete dependence on the teacher.
The words of one student in a panel responding to the questions of a group of
educators provide an example of such an experience of self assessment. The
182 T. Riordan and G. Loacker
student said, ‘‘I think that’s one of the big things that self assessment gives you, a
lot of confidence. When you know that you have the evidence for what you’ve
done, it makes you feel confident about what you can and can’t do, not just
looking at what somebody else says about what you did, but looking at that and
seeing why or why not and back up what you said about what you did. It’s really
confidence-building.’’
Implicit in all that we have said here about elements of the assessment
process, including the importance of feedback and self assessment, is the vision
of learning as a process of development. We emphasize with students the idea
that they are developing as learners, and that this process never ends; but what
else have we done to build on this principle of development in our practice?
Faculty recognize that, although students may not progress in a strictly linear
fashion as learners, design of teaching and assessment must take into account the
importance of creating the conditions that assist students to develop in each stage
of their learning. This has led us to approach both curriculum and teaching design
with a decidedly developmental character.
After committing to the eight abilities as learning outcomes in the curricu-
lum, we asked ourselves what each of these abilities would look like at different
stages of a student’s program of study. What would be the difference, for
example, between the kind of analysis we would expect of a student in her
first year as opposed to what we would expect when she is about to graduate? As
a result of these discussions, the faculty articulated six generic developmental
levels for each of the eight abilities. For instance, the levels for the ability of
Analysis are:
Level 1 Observes accurately
Level 2 Draws reasonable inferences from observations
Level 3 Perceives and makes relationships
Level 4 Analyzes structures and relationships
Level 5 Refines understanding of disciplinary frameworks
Level 6 Applies disciplinary frameworks independently.
We would not argue that the levels we have articulated are the final word on
the meaning and stages of analysis or any of the abilities; indeed, we continually
refine and revise their meaning. On the other hand, we recognize the value of
providing a framework and language that faculty and students share in the learning process. All students are required to demonstrate all eight abilities through level four in their general education curriculum and to specialize in some of them at levels five and six, depending on which abilities faculty decide are inherent in, and thus most important for, their respective majors. For example, nursing majors and philosophy majors, like every other major, are required to demonstrate all eight abilities through level four. At levels five and six, however,
The faculty at Alverno take seriously the idea that teaching and assessment
are not individual enterprises, but collaborative ones. In addition to the com-
mon commitment to the teaching and assessment of shared learning outcomes,
we believe that we are responsible for assisting one another to improve the quality of student learning across the college, and that this requires organizing our structures to allow for the collaboration necessary to achieve it. There is no question that this challenges the image of the individual scholar. We do recognize that keeping current in one’s own field is part of our professional responsibility, but we have also noted that a consequence of specialization, and of a limited view of research in higher education, is that faculty are often more connected with colleagues in their disciplinary specialization than with faculty members on the same campus, even in the same department. As recent discussions in higher education about the nature of academic freedom have suggested,
the autonomy of individual faculty members to pursue scholarship and to have
their teaching unfettered by forces that restrict free and open inquiry presumes a
responsibility to foster learning of all kinds. We believe that if the learning of
our students is a primary responsibility, then our professional lives should
reflect that, including how and with whom we spend our time. There are several
ways in which we have formalized this as a priority in institutional structures
and processes.
One important structural change, given the significance of the eight abilities at the heart of the curriculum, was the Alverno faculty’s decision to create a department for each of the abilities. In practice this means that faculty at the college serve in both a discipline department and an ability department. Depending on their interest and expertise, faculty join ability departments,
the role of which is to do ongoing research on each respective ability, refine and
revise the meaning of that ability, provide workshops for faculty, and regularly
publish materials about the teaching and assessment of that ability. These depart-
ments provide a powerful context for cross-disciplinary discourse and they are
just as significant as the discipline departments in taking responsibility for the
quality and coherence of the curriculum.
But how is this collaboration supported? In order to facilitate it, the aca-
demic schedule has been organized to make time for this kind of work. No
classes are scheduled on Friday afternoons, and faculty use that time to meet in
discipline or ability departments or to provide workshops for one another on
important developments in curriculum, teaching, and assessment. In addition,
the faculty hold three Institutes a year – at the beginning of each semester and at
the end of the academic year – and those are also devoted to collaborative
inquiry into issues of teaching, learning, and assessment. As a faculty we expect
one another to participate in the Friday afternoon work and the College
Institutes as part of our responsibility to develop as educators because of
what we can learn from one another and can create together in the service of
student learning.
The value placed on this work together is strongly reinforced by criteria for
the academic ranks, which include very close attention to the quality and
10 Collaborative and Systemic Assessment of Student Learning 185
(Bransford et al., 2000)? What can developmental theories tell us about the
complexities of learning (Perry, 1970)? We regularly share with one another the
literature on these kinds of questions and explore implications for our teaching
practice.
The research of others on learning has been a great help to us, but we have
also come to realize that each student and group of students is unique, and this
has led us to see that assessment is an opportunity to understand more about
our students as learners and to modify our pedagogical strategies accordingly.
In this sense, our faculty design and implement assignments and assessments
with an eye to how each one will assist them to know who their students are. We
are consistently asking ourselves how we have assisted students to make con-
nections between their experience and the disciplines they are studying, how we
have drawn on the previous background and learning of students, and how we
can challenge their strengths and address areas for improvement.
This focus on scholarly inquiry into the learning of our students is a supple-
ment to, not a substitute for, the ongoing inquiry of faculty in their disciplines,
but our thinking about what this disciplinary study involves and what it means
to keep current in our fields has evolved. Increasingly we have approached our
disciplines not only as objects of our own study but also as frameworks for
student learning. What has this meant for us in practice?
Perhaps most importantly we have tried to move beyond the debate in higher
education about the extent to which research into one’s field informs one’s
teaching or even whether research or teaching should take precedence in a
hierarchy of professional responsibilities. At Alverno we have taken a position
and explored its implications in order to act on them. We have asked ourselves
how our role as educators affects the kind of inquiry we pursue in our disciplines
and how we engage students in the practice of those disciplines in ways that are
meaningful and appropriate to the level at which they are practicing them.
From this perspective there is less emphasis on sharing what we know about
our disciplines and more on figuring out how to assist students to use the
discipline themselves. This is not to suggest that the content of disciplinary
study is unimportant for the faculty member or the student, but it does mean
that what faculty study and what students study should be considered in light of
how we want our students to be able to think and act as a result of their study. In
this view faculty scholarship emerges from questions about student learning,
and disciplines are used as frameworks for student learning.
Clearly essential to any scholarly work is making one’s ideas public to
colleagues for critical review and dialogue. This is an expectation we have for
our faculty as well, but we have embraced a view of ‘‘making one’s ideas public’’
that is not restricted to traditional forms of publication. As we have suggested
throughout this chapter, the first audience for our ideas on teaching, learning,
and assessment is usually our colleagues here at the college. Because our first
responsibility is to the learning of our students, our Friday afternoon sessions
and College Institutes provide opportunities for us to share and critically
consider the insights and issues we are encountering in our teaching and in
the research on teaching we have studied. Our faculty also regularly create
written publications on various aspects of our scholarly work on teaching,
learning, and assessment. For example, Learning that Lasts (Mentkowski and
Associates, 2000) is a comprehensive look at the learning culture of Alverno
College based on two decades of longitudinal studies and on leading educa-
tional theories. Assessment at Alverno College: Student, Program, and Institu-
tional (Loacker & Rogers, 2005) explores the conceptual framework for
assessment at Alverno with specific examples from across disciplines and pro-
grams, while Self Assessment at Alverno College (Alverno College Faculty, 2000)
provides an analysis of the theory and practice of Alverno faculty in assisting
students to develop self assessment as a powerful means of learning. Ability-based
Learning Outcomes: Teaching and Assessment at Alverno College (Alverno Col-
lege Faculty, 2005) articulates the meaning of each of the eight abilities in the
Alverno curriculum and gives examples of teaching and assessment processes for
each. And in Disciplines as Frameworks for Student Learning: Teaching the
Practice of the Disciplines (Riordan & Roth, 2005), Alverno faculty from a variety
of disciplines consider how approaching disciplines as frameworks for learning
transforms the way they think about their respective fields. In most instances
these are collaborative publications based on the work of different groups of
faculty, like members of ability departments or representatives from the assess-
ment council or faculty who have pursued a topic of interest together and have
insights to share with the higher education community. The collective wisdom of
the faculty on educational issues also serves as the basis for regular workshops we
offer for faculty and staff from around the world, both by hosting them on campus
and through consulting opportunities outside the college.
Our remarks about scholarly inquiry are not intended to diminish the value
of significant scholarly inquiry that advances understanding of disciplinary
fields and the connections among them. On the other hand, we have come to
value the fertile ground for inquiry that lies at the intersection of student
learning and the disciplines. We believe that it deserves the same critical and
substantive analysis because it has the potential not only to enhance student
learning but also to help us re-imagine the meaning and practice of the dis-
ciplines themselves.
ultimately the success of that work will depend on whether faculty embrace it
because they see it as enhancing how students learn what is important.
We have also learned that the effectiveness of learning outcomes in promoting
student learning depends heavily on the extent to which the outcomes are actually
required. From the start of our ability-based curriculum, student success through-
out the curriculum has depended on demonstration of the eight abilities in the
context of disciplinary study. Our students know that graduation from the college,
based on performance in their courses in general education and their majors, is
directly related to their achievement of learning outcomes. They know this
because the faculty have agreed to hold them to those standards. We realize that
learning outcomes that are seen by faculty as aspirations at best and distractions at
worst will not be taken seriously by students and, thus, will not become vital
dimensions of the curriculum, no matter what assessment initiatives emerge.
We have learned as well how important it is to make a focus on student learning
a significant priority in our scholarship. When most of us in higher education
complete our graduate work we have as our primary focus scholarly inquiry
into the discipline we have chosen. Indeed, for many the focus is a very
specialized aspect of a discipline. Another way of saying this is that our
primary identity is based on our responsibility to the discipline. But what
happens when we make student learning the kind of priority we have suggested
throughout this chapter, indeed throughout this book? The reflections here
attempt to indicate how the lens of student learning has affected both our work
as a community of educators and our individual professional identities.
Finally, we have learned that, although our collaboration as Alverno colleagues
has been essential to a focus on student learning, connections with colleagues
beyond the institution are critical to our ongoing inquiry. These colleagues include
the more than four hundred volunteer assessors from the business and profes-
sional community who contribute their time and insight to our students’ learn-
ing. They also include the colleagues we connect with in workshops,
consultations, grant projects, conferences, and other venues, as well as those who are
contributing to the growing body of literature on teaching, learning, and
assessment we have found so valuable. We owe them a great debt and are
certain our students have been the beneficiaries as well. It is in this spirit that
this chapter aims to reflect and foster the kind of continuing conversation we
need in order to make assessment of student learning the powerful and dynamic
process it can be.
Looking Forward
What does the future hold for us, both at Alverno College and in the higher
education community at large? How can we continue to improve in our efforts
to optimize student learning? Drawing on our own principles and practices and
on our work with hundreds of institutions around the world, we propose a few
final thoughts.
190 T. Riordan and G. Loacker
References
Alverno College Assessment Committee/Office of Research and Evaluation. (1982). Alverno
students’ developing perspectives on self assessment. Milwaukee, WI: Alverno College
Institute.
Alverno College Faculty. (Georgine Loacker, Ed.). (2000). Self assessment at Alverno College.
Milwaukee, WI: Alverno College Institute.
Alverno College Faculty. (2005). Ability-based learning outcomes: Teaching and assessment at
Alverno College. Milwaukee, WI: Alverno College Institute.
Arenson, K. W. (2000, January 1). Going higher tech degree by degree. The New York Times,
p. E29.
Bollag, B. (2006, October 27). Making an art form of assessment. The Chronicle of Higher
Education, pp. A1–A4.
Boyer, E. (1997). Scholarship reconsidered: Priorities of the professoriate. New York: John
Wiley & Sons.
Bransford, J. D., Brown, A. L., & Cocking, R. R. (2000). How people learn: Brain, mind,
experience, and school (Expanded ed.). Washington, DC: National Academy Press.
Gardner, H. (1999). The disciplined mind: What all students should understand. New York:
Simon & Schuster.
Kolb, D. (1976). The learning style inventory: Technical manual. Boston: McBer.
Levine, A. (2006). Educating school teachers. The Education Schools Project Report #2.
Washington, DC: The Education Schools Project.
Loacker, G., & Rogers, G. (2005). Assessment at Alverno College: Student, program, and
institutional. Milwaukee, WI: Alverno College Institute.
Mentkowski, M. and Associates. (2000). Learning that lasts: Integrating learning, perfor-
mance, and development. San Francisco: Jossey-Bass.
Perry, W. G., Jr. (1970). Forms of intellectual and ethical development in the college years:
A scheme. New York: Holt, Rinehart & Winston.
Riordan, T., & Roth, J. (Eds.). (2005). Disciplines as frameworks for student learning: Teaching
the practice of the disciplines. Sterling, VA: Stylus Publications.
Twohey, M. (2006, August 6). Tiny college earns big reputation. The Milwaukee Journal
Sentinel, pp. A1, A19.
Who needs Harvard? Forget the Ivy League. (2006, August 21). Time, pp. 37–44.
Wiggins, G. & McTighe, J. (2005). Understanding by design (Expanded 2nd ed.). Alexandria,
VA: Association for Supervision and Curriculum Development (ASCD).
Chapter 11
Changing Assessment in Higher Education:
A Model in Support of Institution-Wide
Improvement
R. Macdonald
Learning and Teaching Institute, Sheffield Hallam University, Sheffield, UK
e-mail: R.Macdonald@shu.ac.uk

Introduction
Each level within the model has a number of elements. At the level of the
module, where learning and its assessment occurs, we might see module design,
teachers’ experience of teaching and assessment, and students’ previous and
current experiences of being assessed. This is where the individual academic or
the teaching team designs assessment tasks and where students are typically
most actively engaged in their studies. This is also the level at which most
research into and writing about assessment, at least until recently, has been
directed, ranging from the classic studies into how assessment directs students’
learning (e.g., Miller & Parlett, 1974; Snyder, 1971) to more recent studies into
student responses to feedback, and including comprehensive textbooks on
assessment in higher education (e.g., Boud & Falchikov, 2007; Heywood,
2000; Miller, Imrie & Cox, 1998), not to mention a plethora of how-to books
and collections of illuminating case studies (e.g., Brown & Glasner, 1999;
Bryan & Clegg, 2006; Knight, 1995; Schwarz & Gibbs, 2002).
196 R. Macdonald and G. Joughin
The institutional level elements may include the overall framework for setting
principles, policies and regulations around learning, teaching and assessment,
the allocation of resources to learning and teaching as against other activities,
and rewards and recognition for learning and teaching in comparison to
rewards and recognition for research, consultancy or management. At this
level, overall resourcing decisions are made regarding teaching and learning,
probation and promotion policies are determined, assessment principles are
formulated and institutional responses to external demands are devised.
1 Where ‘faculty’ is used to refer to an organisational unit rather than, as is common in the US
and some other countries, to individual academic staff.
11 Changing Assessment in Higher Education 197
Finally, the external level will have a large number of elements, including the
regulatory requirements of governments, the role of government funding bodies
in supporting quality enhancement and innovation in teaching and learning,
national and international developments within the higher education sector,
requirements of professional accrediting bodies, media perception and por-
trayal of higher education, and the external research environment. With respect
to learning and assessment, this level may well result in institutional responses
to curriculum design and content as well as overall principles such as the
requirement for general graduate attributes to be reflected in assessed student
learning outcomes. In addition to this, students’ expectations can be strongly
influenced by external forces, including their experience of other levels of
education, their engagement in work, and the operation of market forces in
relation to their chosen field of study.
Relationships
For some time students have been complaining that they are not all being
treated the same when it comes to their assessment. There are different
arrangements for where and when to hand in written assignments; the time
taken to provide feedback varies enormously, despite university guidelines;
different practices exist for late work and student illness; some staff only
seem to mark between 35 and 75 per cent whereas others seem prepared to
give anything between zero and 100; in some courses different tutors put
varying degrees of effort into identifying plagiarism; procedures for handing
back work vary with the result that some students do not even bother to
collect it. All this in spite of the fact that this (UK) university has policies and
guidelines in place which fully meet the Quality Assurance Agency’s Code of
Practice on Assessment (The Quality Assurance Agency, 2006a).
Systems Approaches
So far we have not used the term system, but it is clear that our model represents
a systems approach to understanding universities and change. An exploration of
systems thinking is therefore essential if we are to understand higher education
institutions adequately and to use that understanding to adopt change strategies
that have a reasonable chance of success. In this section we consider
some fundamental concepts in systems theory.
The essential feature of a systems approach is to view the university as a
whole, rather than focusing on one or even several of its parts. A systems
approach therefore complements the expanding body of work more narrowly
focused on specific aspects of the university such as assessment strategies,
students’ experience of assessment in their courses, plagiarism or departmental
responses to assessment issues. A systems approach is also a reaction to the
reductionism implicit in much thinking about learning, teaching and assess-
ment, according to which change depends on improvements in one or two key
areas of academic practice.
working towards its goals. Importantly, the system is open to direct manipulation
by human intervention – things can be made to happen.
In contrast to this, soft systems methodologies come into play where objec-
tives are less clear, where assumptions about elements of the system cannot be
taken for granted, and where there is a challenging issue needing to be
addressed, in short, in situations where organisational, management and lea-
dership issues are involved. Checkland’s acronym, CATWOE, indicates the
flavour of soft systems thinking (Checkland, 1993). It refers to the elements
involved in defining a situation, where there are customers (the beneficiaries of
the system); actors (the agents who undertake the various activities within the
system); a transformation process (which results, in the case of higher education
institutions, in graduates and research outputs); a Weltanschauung, or world-
view; ownership (those who fund the system); and environmental or contextual
constraints on the system. Checkland distinguishes the move towards soft
systems as a move away from working with obvious problems towards working
with situations which some people may regard as problematical, which are not
easily defined, and which involve approaches characterised by inquiry and
learning (Checkland, 1999).
When the university is construed as a hard system, it is considered to have
agreed goals of good practice, clearly defined structures designed to achieve
these goals, and processes that are controlled and somewhat predictable. This is
a view of universities which may appeal to some, particularly senior managers
keen to introduce change and move their institution in particular directions.
However, hard systems thinking is inappropriate where ends are contested,
where stakeholders have differing (and often competing) perspectives and
needs, and where understanding of the system is necessarily partial. Inappro-
priate as it may be, the temptation to see the university as a hard system
remains, especially in the minds of some administrators or leaders seeking to
achieve specific ends through imposing non-consensual means. When this
occurs, change becomes problematic, since change in a soft system involves
not merely working on the right elements of the system, but understanding
transformational processes in particular ways, including understanding one’s
own role as an actor within the system.
The difficulties in effecting change within universities as systems are often
attributed to the ‘loose-coupling’ between units. Loose-coupling is well
described by Eckel, Green, Hill and Mallon as characteristic of an organisation
in which activity in one unit may be only loosely connected with what happens in
another unit. This compares with more hierarchical or ‘tightly-coupled’ organizations
where units impact more directly on each other. As a result, change within a university
is likely to be small, improvisational, accommodating, and local, rather than large,
planned, and organization-wide. Change in higher education is generally incremental
and uneven because a change in one area may not affect a second area, and it may affect
a third area only after significant time has passed. If institutional leaders want to
achieve comprehensive, widespread change, they must create strategies to compensate
for this decentralization. (Eckel, Green, Hill & Mallon, 1999, p. 4)
Loose coupling can apply to both hard and soft systems thinking. Both assume
a degree of causal relationship between elements or units, and in both cases
loose coupling highlights that these connections are not necessarily direct and
immediate. In our earlier scenario, we might consider the loose couplings
involved as we follow the impact of the university’s assessment policies as
they move from the university council down to faculties and departments and
into the design of course assessment and its implementation by individual
academic staff and students, noting how original intentions become diluted,
distorted or completely lost at various stages.
Loose coupling is an evocative term and, until recently, few would have
contested its use to describe the university. Loose coupling, however, assumes a
degree of connectivity or linearity between university units. This view has been
challenged by a quite different understanding which sees the university as a non-
linear system, where interactions between elements are far from direct and
where notions of causality in change need to be radically rethought, in short,
the university is seen as a complex adaptive system.
Agents
Stacey describes a complex adaptive system as consisting of ‘‘a large number of
agents, each of which behaves according to some set of rules. These rules require
the agents to adjust their behaviors to that of other agents. In other words,
agents interact with, and adapt to, each other’’ (Stacey, 2003, p. 237). Each unit
in a university can be thought of as an agent, as can each individual within each
unit, with each agent operating according to its own set of rules while adjusting
to the behaviour of other individuals and units.
The concept of agents introduces considerable complexity in two ways.
Firstly, it draws our attention to the large number of actors involved in any
given university activity. At the level of the individual course or module, we
have the body of enrolled students, tutorial groups, and individual students, as
well as a course coordinator, individual tutors, and the course team as a whole,
including administrative staff. To these we could add the department which
interacts in various ways with each of the agents mentioned, and multiple agents
outside the department, including various agents of the university as well as
agents external to the university.
Secondly, each agent operates according to their own rules. This under-
standing is in contrast to the view that the various players in a university are
all operating within an agreed order, with certain values and parameters of
behaviour being determined at a high level of the organisation. Instead of
seeking to understand an organisation in terms of an overall blueprint, we
need to look at how each agent or unit behaves and the principles associated
with that behaviour. What appear to be cohesive patterns of behaviour asso-
ciated with the university as a whole are the result of the local activity of this
array of agents, each acting according to their own rules while also adapting to
the actions of other agents around them.
Many writers have noted the multiple sets of rules operating in universities,
often pointing out that the differing sets of values which drive action are
complex and frequently in conflict. While some values, such as the impor-
tance of research, may be shared across an institution, variation is inevitable,
with disciplinary cultures, administrators, academics and students all having
distinctive beliefs (Kezar, 2001). Complexity theory draws our attention to
these variations, while suggesting that further variation is likely to be present
across each of the many agents involved in the university system.
One implication for our model is that it may need to be seen in the light of
the different agents within a university, with the concept of agency brought to
bear on each of the model’s elements. Aligned with this, we would interpret the
model as incorporating the actions within each agent, interactions between
agents, and a complex of intra- and inter-agent feedback loops (Eoyang,
Yellowthunder, & Ward, 1998). The model thus becomes considerably more
complex, and understanding and appreciating the rules according to which the
various agents operate, and how they might adapt to the actions of other
agents, becomes critical to understanding the possibilities and limitations of
change.
Self-organisation
to its own rules and according to what it believes is in its best interests. Tosey
(2002) illustrates this well with reference to students adjusting their behaviour in
meeting, or not meeting, assignment deadlines – their patterns changed in direct
response to their tutors’ lack of consistency in applying rules. Multiply this
kind of scenario countless times and we have a picture of the university as a
collection of units and individuals constantly changing or re-organising, in
unpredictable ways, in response to change. The result is not chaos, but neither
is it predictable or easily controlled, since multifarious internal and external
factors are involved.
Emergence
The characteristics of agents and self-organisation give rise to the third core
construct of complex adaptive systems. Many of us are accustomed to thinking
in terms of actions and results, of implementing processes that will lead, more or
less, to intended outcomes. The notion of emergence in complex adaptive
systems is a significant variation on this. Emergence has been defined as ‘‘the
process by which patterns or global-level structures arise from interactive local-
level processes. This ‘structure’ or ‘pattern’ cannot be understood or predicted
from the behaviour or properties of the components alone’’ (Mihata, 1997,
p. 31, quoted by Seel, 2005). In other words, how universities function, what
they produce, and certainly the results of any kind of change process, emerge
from the activities of a multiplicity of agents and the interactions between them.
In such a context, causality becomes a problematic notion (Stacey, 2003);
certainly the idea of one unit, or one individual at one level of the organisation,
initiating a chain of events leading to a predicted outcome, is seen as misleading.
As Eckel et al. (1999, p. 4) note, ‘‘(e)ffects are difficult to attribute to causes’’,
thus making it hard for leaders to know where to focus attention. On the other
hand, and somewhat paradoxically, small actions in one part of a system may
lead to large and unforeseeable consequences.
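The interplay of agents, self-organisation and emergence described above can be made concrete with a toy simulation. The following sketch is entirely our own illustrative invention – the model, its parameters and the ‘compliance’ rule are assumptions for the purpose of illustration, not drawn from Stacey or Seel. Each agent simply copies the majority practice of a few randomly sampled peers, with occasional independent deviation; a university-wide pattern then emerges that no agent intended or controls:

```python
import random


def run_simulation(n_agents=100, n_rounds=50, sample_size=5,
                   noise=0.05, seed=1):
    """Toy complex adaptive system: each 'agent' (a unit or individual)
    follows a purely local rule -- adopt the majority practice of a few
    randomly chosen peers, with occasional independent deviation.
    No agent knows about, or aims at, any university-wide pattern."""
    rng = random.Random(seed)
    # 1 = follows the assessment policy, 0 = follows local custom instead
    states = [rng.randint(0, 1) for _ in range(n_agents)]
    shares = []  # fraction of policy-followers after each round
    for _ in range(n_rounds):
        new_states = []
        for _ in range(n_agents):
            peers = rng.sample(range(n_agents), sample_size)
            majority = sum(states[p] for p in peers) > sample_size / 2
            state = int(majority)
            if rng.random() < noise:  # self-organised deviation
                state = 1 - state
            new_states.append(state)
        states = new_states  # all agents adapt to each other's behaviour
        shares.append(sum(states) / n_agents)
    return shares


shares = run_simulation()
```

Runs with different seeds can settle into quite different norms – near-universal compliance or near-universal deviation – even though every agent follows exactly the same local rule, echoing the point that in such systems effects are difficult to attribute to causes.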
Emergence also suggests the need for what Seel (2005) refers to as ‘watchful
anticipation’ on the part of institutional managers, as they wait while change
works its way through different parts of a system.
Emergence implies unpredictability, and, for those of us in higher education,
it reflects the notion of ‘teaching at the edge of chaos’ (Tosey, 2002). Along with the
concept of agency, the complexity which emergence entails ‘‘challenges managers
to act in the knowledge that they have no control, only influence’’ (Tosey, 2002,
p. 10). Seel (2005) suggests a set of preconditions for emergence, or ways of
influencing emergence, including connectivity – emergence is unlikely in a frag-
mented organisation; diversity – an essential requirement if new patterns are to
emerge; an appropriate rate of information flow; the absence of inhibiting factors
such as power differences or threats to identity; some constraints to action or
effective boundaries; and a clear sense of purpose.
Let’s return to our earlier scenario. Does viewing the university as a complex
adaptive system help us to see this situation differently, and in a more useful
way? For a start, it leads us to focus more strongly on the agents involved –
the university’s policy unit; whatever committee and/or senior staff are
responsible for quality assurance; the department and its staff; and the
students as individuals and as a body. What drives these agents, and what
motivates them to comply with or resist university policy? How do they self-
organise when faced with new policies and procedures? What efforts have
been made across the university to communicate the Quality Assurance
Agency’s codes of practice, and what, apart from dissatisfied students, has
emerged from these efforts? Perhaps most importantly, how might Seel’s
preconditions for emergence influence attempts to improve the situation?
It is, at least in part, the failure of such policies and the frustrations experienced
in seeking system-wide improvements in teaching, learning and assessment that
have led to an interest in harnessing the notion of the university as a complex
adaptive system in the interests of change. While our model suggests that
change can be promoted by identifying critical elements of the university system
and key relationships between them, perhaps the facilitation of change in a
complex adaptive system requires something more.
Two management theorists have proposed principles of organisational
change in the context of complex adaptive systems. The first of these, Dooley
(1997), suggests seven principles for designing complex adaptive organisations.
We would argue that universities are complex adaptive organisations – they do
not need to be designed as such. We interpret Dooley’s suggestions as ways of
living with this complexity and taking the best advantage of the university’s
resulting organisational qualities. Dooley’s principles are as follows:
(a) create a shared purpose;
(b) cultivate inquiry, learning, experimentation, and divergent thinking;
(c) enhance external and internal interconnections via communication and
technology;
(d) instill rapid feedback loops for self-reference and self-control;
(e) cultivate diversity, specialisation, differentiation, and integration;
(f) create shared values and principles of action; and
(g) make explicit a few but essential structural and behavioral boundaries.
(Dooley, 1997, pp. 92–93)
Dooley’s list is prescriptive. It suggests how complexity theory could be
applied to an organisation. The second theorist, Stacey, in contrast, eschews
the notion of application, implying as it does the intentional control of a system
by a manager who stands outside it. Rather, ‘‘managers are understood to be
Sheffield Hallam University, no doubt like many others, has made changes
to assessment regulations and frameworks, processes and systems, and aca-
demic practices over the years without necessarily ensuring that the various
aspects are well integrated or that the whole is viewed holistically.
In 2005 the Assessment Working Group, which had been addressing assess-
ment regulations and procedures for a number of years, acknowledged some-
thing was not right. At the same time feedback from students was suggesting
that, whilst the quality of their learning experience overall was very good, there
were some aspects of assessment and feedback which needed addressing – a fact
reinforced in some subject areas by the results of the 2006 National Student
Survey.
It was in this context that the first author and a senior academic in one of the
university’s faculties produced a paper on Profile Assessment that addressed a
number of issues concerning assessment, feedback and re-assessment. This was
the outcome of the first of many coffee bar discussions! A further outcome was
a spin-off from one of the University’s Centres for Excellence in Teaching and
Learning (CETLs) with the proposal by the first author to establish TALI –
The Assessment for Learning Initiative.
At the same time, a growing misalignment was emerging between the practices
of academics, the regulations and procedures, and the systems used to manage
and report on the outcomes of assessment. Contributing factors included the
growing use of Blackboard as the university’s virtual learning environment, the
need for it to communicate effectively with the University’s central data
system, and the increasing complexity of the assessment regulations as changes
were made from year to year. There was also a perception that the
assessment regulations and procedures drove the academic practices, rather
than the other way around, producing what was commonly referred to as
‘the burden of assessment’.
The opportunity was taken to gather a group together, involving senior
staff from key departments and academics from the faculties at a local
conference centre, Ranmoor Hall, to discuss the issues widely and produce
suggestions from what became known as the Ranmoor Group. The Assess-
ment Project was established to focus on the deliberate actions required to
support the development of:
– assessment practices that are learner focused and promote student
engagement and attainment;
– regulations that are clear, consistent and student centred; and
– assessment processes that are efficient and effective and which enable the
delivery of a high quality experience for staff and students.
A lead was taken in each of these areas by an appropriate senior member
of the relevant department under the aegis of the Assessment Development
Group. This group comprised three separate working groups and interacted
with other key parts of the university. It reported to an Assessment Project
Board, chaired by the Pro Vice-Chancellor, Academic Development, to
ensure the appropriate governance of the project and the availability of
people and resources to make it work.
One aspect, The Assessment for Learning Initiative (TALI), illustrates
how change has been occurring. TALI’s aim has been to achieve large-scale
cultural change with regard to assessment practice across the university.
Conclusion
Early in this chapter we posed the question, ‘‘What does it take to improve
assessment in a university?’’ and suggested that part of the response lay in a
model of the university that would help us to see more clearly those aspects of
the university, and the relationships between them, that would require attention
if initiatives for change were to be effective.
It is possible, in light of the various matters canvassed in this chapter, that
the complexities of universities may defy the kind of simple depiction we have
presented in Fig. 11.1. However, we do not wish to propose a new model here.
We believe that a model which is true to the complexity of higher education
institutions still needs to identify the key elements of those institutions and the
connections between them as we have done in our model. On the other hand, a
useful model must emphasise the role of agents in relation to the elements,
including both individuals and units involved in or likely to be affected by change.
An Endnote
This chapter reflects a journey by the authors. It began when Ranald partici-
pated in a workshop run by Gordon at a conference in Coventry, UK in July
2003. This was followed by lengthy email exchanges which resulted in an article
for the Institute for Learning and Teaching (now the UK’s Higher Education
Academy) website (Joughin & Macdonald, 2004). The article introduced the
model and explored the various elements and relationships, though in little
depth and without locating it in any theoretical or conceptual literatures.
Two subsequent visits by Ranald to the Hong Kong Institute of Education,
where Gordon was then working, allowed us to develop our ideas further and
we began to draw on a variety of literatures, not least in the areas of systems and
complexity. The ideas for this chapter have developed whilst we have been
writing and are likely to continue to change from the point at which we are
writing in October 2007.
Our discussions and writing themselves reflect aspects of complexity, with
ourselves as the agents, self-organising our thoughts into emergent ideas
for influencing policy and practice in our roles as academic developers. Further,
we can identify with other areas of the literature such as communities of
practice, appreciative inquiry and the literature on educational change. Perhaps
the concern for us at the moment is to develop a highly pragmatic approach to
facilitating change, informed by the latest organisational development or stra-
tegic management thinking, but not dominated by this. We note with interest
that the subtitle of Stacey, Griffin and Shaw’s (2000) book applying complexity
theory to management asks whether complexity is a fad or a radical challenge
to systems thinking.
References
Beer, S., Strachey, C., Stone, R., & Torrance, J. (1999). Model. In A. Bullock & S. Trombley
(Eds.), The new Fontana dictionary of modern thought (3rd ed., pp. 536–537). London:
Harper Collins.
Boud, D., & Falchikov, N. (Eds.). (2007). Rethinking assessment in higher education.
Abingdon, UK: Routledge.
Brown, S., & Glasner, A. (Eds.). (1999). Assessment matters in higher education. Buckingham,
UK: SRHE/Open University Press.
Bryan, C., & Clegg, K. (2006). Innovative assessment in higher education. Abingdon, UK:
Routledge.
Checkland, P. (1993). Systems thinking, systems practice. Chichester: John Wiley and Sons.
Checkland, P. (1999). Soft systems methodology: A 30-year retrospective. Chichester: John
Wiley and Sons.
Chelimsky, E. (1997). Thoughts for a new evaluation society. Evaluation, 3(1), 97–118.
Cooperrider, D. L., Whitney, D., & Stavros, J. M. (2005). Appreciative inquiry handbook.
Brunswick, Ohio: Crown Custom Publishing.
Dooley, K. (1997). A complex adaptive systems model of organization change. Nonlinear
Dynamics, Psychology, and Life Sciences, 1(1), 69–97.
Eckel, P., Green, M., Hill, B., & Mallon, W. (1999). On change III: Taking charge of change:
A primer for colleges and universities. Washington, DC: American Council on Education.
Retrieved October 12, 2007, from http://www.acenet.edu/bookstore/pdf/on-change/on-
changeIII.pdf
Eoyang, G., Yellowthunder, L., & Ward, V. (1998). A complex adaptive systems (CAS)
approach to public policy making. Society for Chaos Theory in the Life Sciences. Retrieved
October 12, 2007, from http://www.chaos-limited.com/gstuff/SCTPLSPolicy.pdf
Fullan, M. (2001). The new meaning of educational change (3rd ed.). London: RoutledgeFalmer.
Gladwell, M. (2000). The tipping point. London: Abacus.
Heywood, J. (2000). Assessment in higher education: Student learning, teaching, programmes
and institutions. London: Jessica Kingsley.
Hopkins, D. (2002). The evolution of strategies for educational change – implications for higher
education. Retrieved October 10, 2007, from archive at The Higher Education Academy
Website: http://www.heacademy.ac.uk
Joughin, G., & Macdonald, R. (2004). A model of assessment in higher education institutions.
The Higher Education Academy. Retrieved September 11, 2007, from http://www.
heacademy.ac.uk/resources/detail/id588_model_of_assessment_in_heis
Keeves, J. P. (1994). Longitudinal research methods. In T. Husén & N. T. Postlethwaite
(Eds.), The international encyclopedia of education (2nd ed., Vol. 6, pp. 3512–3524).
Oxford: Pergamon Press.
Kezar, A. (2001). Understanding and facilitating organizational change in the 21st century:
Recent research and conceptualizations. San Francisco: Jossey-Bass.
Knight, P. (Ed.). (1995). Assessment for learning in higher education. London: Kogan
Page.
Knight, P. (2000). The value of a programme-wide approach to assessment. Assessment and
Evaluation in Higher Education, 25(3), 237–251.
Knight, P., & Trowler, P. (2000). Department-level cultures and the improvement of learning
and teaching. Studies in Higher Education, 25(1), 69–83.
Knight, P., & Yorke, M. (2003). Assessment, learning and employability. Buckingham, UK:
SRHE/Open University Press.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cam-
bridge, UK: Cambridge University Press.
Matthews, K. M., White, M. C., & Ling, R. G. (1999). Why study the complexity sciences in
the social sciences? Human Relations, 52(4), 439–462.
Meadows, D. (1999). Leverage points: Places to intervene in a system. Hartland: The
Sustainability Institute. Retrieved October 27, 2007, from http://www.sustainabilityinstitute.
org/pubs/Leverage_Points.pdf
Miller, A. H., Imrie, B. W., & Cox, K. (1998). Student assessment in higher education. London:
Kogan Page.
Miller, C. M. L., & Parlett, M. (1974). Up to the mark: A study of the examination game.
London: SRHE.
Nash, J., Plugge, L., & Eurlings, A. (2000). Defining and evaluating CSCL Projects. Unpub-
lished paper, Stanford, CA: Stanford University.
Owen, H. (1997). Open space technology: A user’s guide. San Francisco: Berrett-Koehler
Publishers.
Quality Assurance Agency. (2006a). Code of practice for the assurance of academic quality
and standards in higher education, Section 6: Assessment of students. Retrieved October
12, 2007, from http://www.qaa.ac.uk/academicinfrastructure/codeOfPractice/section6/
default.asp
Quality Assurance Agency for Higher Education. (2006b). Handbook for institutional audit:
England and Northern Ireland. Mansfield: The Quality Assurance Agency for Higher
Education.
Schwarz, P., & Gibbs, G. (Eds.). (2002). Assessment: Case studies, experience and practice
from higher education. London: Kogan Page.
Seel, R. (2005). Creativity in organisations: An emergent perspective. Retrieved October 3,
2007, from http://www.new-paradigm.co.uk/creativity-emergent.htm
Shaw, P. (2002). Changing conversations in organizations: A complexity approach to change.
Abingdon, UK: Routledge.
Snyder, B. R. (1971). The hidden curriculum. New York: Knopf.
Stacey, R. (2003). Strategic management and organizational dynamics: The challenge of
complexity (4th ed.). Harlow, England: Prentice-Hall.
Stacey, R. D. (2007). Strategic management and organizational dynamics (5th ed.). Harlow:
Pearson Educational Limited.
Stacey, R. D., Griffin, D., & Shaw, P. (2000). Complexity and management: Fad or radical
challenge to systems thinking? London: Routledge.
Tosey, P. (2002). Teaching at the edge of chaos. York: LTSN Generic Centre.
Retrieved October 4, 2007, from http://www.heacademy.ac.uk/resources.asp?
process=full_record&section=generic&id=111
Trowler, P., Saunders, M., & Knight, P. (2003). Changing thinking, changing practices. York:
Higher Education Academy. Available from enquiries@heacademy.ac.uk
Wenger, E., McDermott, R., & Snyder, W. M. (2002). Cultivating communities of practice.
Boston, MA: Harvard Business School Press.
Yorke, M. (2001). Assessment: A guide for senior managers. York: Higher Education Academy.
Chapter 12
Assessment, Learning and Judgement:
Emerging Directions
Gordon Joughin
G. Joughin
Centre for Educational Development and Interactive Resources,
University of Wollongong, Australia
e-mail: gordonj@uow.edu.au

Introduction
These movements raise a number of challenges, each far from trivial, which
could well set the agenda for theorising, researching, and acting to improve
assessment for the next decade and beyond.
Conceptual Challenges
The conceptual challenges posed here call for the ongoing exploration of
central tenets of assessment, subjecting them to critical scrutiny, and extending
our understanding of the emerging challenges to orthodoxy noted in this
book. Core concepts and issues that demand further scrutiny include the
following:
The purposes of assessment. While the multiple functions of assessment have
long been recognised, confusion continues regarding how these functions can be
held in creative tension. Terminology does not help – the very term formative
assessment, for example, suggests to many assessment of a certain type, rather
than assessment performing a certain function. The term learning-oriented
assessment is a step in the right direction, but only if it is seen as a way of
looking at assessment, rather than as a type of assessment. A single piece of
assessment may be, to varying extents, oriented towards learning, oriented
towards informing judgements about achievement, and oriented towards main-
taining the standards of a discipline or profession. Of course, according to the
definition of assessment proffered in Chapter 2, assessment will always entail
judgements about the quality of students’ work, irrespective of the purpose to
which these judgements are put. The challenge is to define assessment in its own
right, and to use this singular definition in understanding and defining each of
assessment’s multiple purposes.
The location of assessment. Boud has argued for locating assessment in
the context of the world of work which students will inhabit on graduation.
This context not only helps to define what kinds of assessment tasks will be
appropriate, but also emphasises the need for students to develop the capacity
to assess their own work, since this becomes a routine function in the workplace.
Once this perspective is adopted, other constructs fall into place, including the
use of complex, authentic tasks, the inclusion of generic attributes as integral
objects of assessment, and tasks that deter plagiarism and engage students in
learning through embedding assessment in current but constantly changing real
world contexts. The challenge is to unite the often necessarily abstracted nature
of learning in universities with its ultimate purpose – students need not only to
learn, but to learn in order to be able to act, and to be able to act in practice-like
contexts where they can experience themselves as responsible agents and be
aware of the consequences of their actions.
Assessment and judgement. Assessment as measurement represents a para-
digm that has been strongly articulated over decades. Assessment as judgement,
though underpinning what Dochy terms the new modes of assessment which
have been the focus of attention in the assessment literature for at least the past
twenty years, has received far less attention. Eisner’s work on connoisseurship
(Eisner, 1985) and Sadler and Boud’s work described in this book are notable
exceptions. The challenge is to continue the work of conceptualizing and
articulating assessment as judgement, drawing on both education and cognate
disciplines to illuminate our understanding of the nature of professional judge-
ment and how judgements are made in practice. The use of criteria has come to
be seen as central to judgement, to the extent that criterion-referenced assess-
ment has become an established orthodoxy in the assessment literature and the
policies, if not the practices, of many universities. Indeed, both Suskie and
Riordan and Loacker see criteria as central to learning. Sadler has challenged
us to reconsider the unthinking application of criteria as a basis of judgement,
to review our understanding of quality and its representation, and to legitimate
the role of holistic judgements.
Research Challenges
Practice Challenges
– Assessment tasks should require responses that students need to create for
themselves, and should be designed to avoid responses that can be simply
found, whether on the Internet, in the work of past students, or in prescribed
texts or readings.
– Assessment tasks should become the basis of learning rather than its result,
not only in the sense of students’ responses informing ongoing teaching as
proposed by Suskie in Chapter 8, but perhaps more importantly in Sadler’s
sense of assessment as a process whereby students produce and appraise
rather than study and learn (Sadler, Chapter 4).
Central to the argument of this book is the role of students as active agents
in the acts of judgement that are at the heart of assessment. This requires
recognising that assessment properly engages students, not just teachers, in
acts of appraisal or judgement, and therefore working to reshape the
conceptions of students who tend to place all authority for assessment in the
hands of teachers. In short, this entails making assessment an object of
learning: devoting time to helping students understand the nature of
assessment, learn about their role in it, especially in terms of self-monitoring,
and learn about assessment in ways that parallel how they go about the
substantive content of their discipline.
Professional Development
These challenges to practice clearly call for more than a simple change in
assessment methods. They require a highly professional approach to assessment
in a context where, as Ramsden has argued, ‘‘university teachers frequently
assess as amateurs’’ (Ramsden, 2003, p. 177). Developing such an approach
places considerable demands on the ongoing formation of university teachers,
though fortunately at a time when the professional development of academics
as teachers is being given increasing attention in many countries. This forma-
tion is undoubtedly a complex process, but it would seem to entail at least the
following: providing access to existing expertise in assessment; providing time
and appropriate contexts for professional development activities; motivating
staff to engage in professional development through recognising and rewarding
innovations in assessment; developing banks of exemplary practices; incorpor-
ating the expertise of practitioners outside the academy; and, perhaps critically,
making assessment a focus of scholarly activity for academics grappling with
its challenges.
References
Eisner, E. (1985). The art of educational evaluation: A personal view. London: Falmer Press.
Marton, F., & Säljö, R. (1997). Approaches to learning. In F. Marton, D. Hounsell, &
N. Entwistle (Eds.), The experience of learning (2nd ed., pp. 39–58). Edinburgh: Scottish
Academic Press.
Ramsden, P. (2003). Learning to teach in higher education (2nd ed.). London: RoutledgeFalmer.
Author Index
A
Adelman, C., 73
Amrein, A. L., 86
Anderson, L. W., 70, 77
Anderson, V. J., 46, 137
Angelo, T., 123
Angelo, T. A., 134, 148, 149
Arenson, K. W., 175
Askham, P., 88, 91, 92
Astin, A. W., 134
Atherton, J., 117
Au, C., 118

B
Baartman, L. K. J., 95, 106
Bagnato, S., 98
Bain, J., 88, 90
Bain, J. D., 21, 22
Baker, E., 95
Ball, S. J., 154, 160
Banta, T. W., 139
Barr, R. B., 134
Bastiaens, T. J., 95, 106
Bates, I., 168
Bateson, D., 100
Baume, D., 69
Baxter Magolda, M., 124
Becker, H. S., 17, 18
Beer, S., 194
Bekhradnia, B., 81
Bennet, Y., 98
Berliner, D. C., 86
Biesta, G., 162, 163
Biggs, J. B., 17
Biggs, J., 89, 91, 93, 135, 136
Birenbaum, M., 87, 88, 90, 94, 96
Biswas, R., 71
Black, P., 23, 35, 86, 92, 93, 153, 155, 156, 166
Bloom, B. S., 77
Bloxham, S., 46, 172
Bollag, B., 175
Borich, G. D., 145
Boud, D., 2, 3, 6, 7, 9, 13, 14, 15, 20, 29, 30, 35, 36, 88, 89, 106, 153, 195, 215, 217, 218
Boyer, E., 185
Braddock, R., 46
Branch, W. T., 146
Bransford, J. D., 186
Brandon, J., 70
Brennan, R. L., 99
Bridges, P., 69, 72
Brown, A. L., 121
Brown, E., 24
Brown, G., 1
Brown, S., 88, 106, 195
Brumfield, C., 73
Bruner, J., 117
Bryan, C., 17, 195
Bull, J., 1
Burke, E., 6, 46
Butler, D. L., 92, 93
Butler, J., 15, 35

C
Campbell, D. T., 139
Carless, D., 2, 13
Carroll, J., 7, 8, 115, 125, 216
Cascallar, E., 86
Chanock, K., 24, 71
Checkland, P., 199, 200
Chelimsky, E., 208
Chen, M., 99
Chi, M. T. H., 46
Chickering, A. W., 134
Clark, R., 24
Clegg, K., 17, 195
Silvey, G., 21
Simpson, C., 2, 19, 23, 127
Skelton, A., 27
Sluijsmans, D., 93, 106
Smith, J., 46
Snyder, B. R., 17, 18, 195
Snyder, W. M., 206
Spiro, R. J., 88
Stacey, R., 201, 203, 204, 206, 211, 212
Stanley, J. C., 139
Starren, H., 89
Stavros, J. M., 207, 208
Stevens, D. D., 46
Stone, R., 194
Strachey, C., 194
Struyf, E., 93
Struyven, K., 21, 87
Suen, H. K., 98
Suskie, L., 7, 8, 9, 46, 133, 134, 135, 138, 145, 216, 218, 220
Szabo, A., 120

T
Tagg, J., 134
Tan, C. M., 88, 90, 93
Tan, K. H. K., 72
Tang, K. C. C., 21
Tanggaard, L., 34, 37
Taylor, L., 120
Terenzini, P. T., 134
Terry, P. W., 20
Thomas, P., 21, 22, 88, 90
Thomson, K., 88
Tilley, A., 129
Tinto, V., 137
Topping, K., 88, 104
Torrance, H., 154, 157, 163, 167, 169, 171
Torrance, J., 194
Tosey, P., 203
Trigwell, K., 88
Trowler, P., 194, 196, 199
Twohey, M., 175

U
Ui-Haq, R., 107
Underwood, J., 120

V
Vachtsevanos, G. J., 71
Van de Watering, G., 100
Van der Vleuten, C. P. M., 95, 106
Vandenberghe, R., 93
Vermunt, J. D. H. M., 91
Villegas, A. M., 69
Vygotsky, L., 117

W
Wade, W., 107
Walvoord, B., 46, 70, 137
Ward, V., 202
Webb, N. M., 99
Webster, F., 70, 71, 72
Wenger, E., 34, 206
West, A., 46, 172
White, M. C., 202
Whitney, D., 207, 208
Whitt, E. J., 134
Wiggins, G., 178
Wiliam, D., 23, 35, 86, 92, 153, 155, 156
Willard, A., 100
Winne, P. H., 92
Woolf, H., 46, 69, 72

Y
Yellowthunder, L., 202
Yorke, M., 7, 9, 65, 68, 74, 75, 80, 196, 197, 216

Z
Zieky, M. J., 140
Subject Index
A
Abilities, 7, 47, 105, 163, 175, 177, 180, 183, 216, 219
Ability-based curriculum, 9, 175, 179, 189
Ability-based Learning Outcomes: Teaching and Assessment at Alverno College, 187
Aesthetic engagement, 177, 183
Agents, 36, 41, 201–202, 210
Alverno College, 4, 9, 175, 181, 187, 189, 216
Amotivated learners, 158
Analysis, 9, 71, 73, 79, 95, 144, 145, 163, 167, 172, 177
Analytic, 6, 7, 45, 46, 48, 50, 51, 52, 53, 54, 55, 56, 57, 62, 69, 160, 178, 183
Analytic rating scales, 51
Analytic rubrics, 51, 62
Appraising, 16, 45, 48, 54, 58
Appreciative inquiry, 207, 208, 211
Apprenticeship, 37–38
Approaches to learning, 13, 16, 19–22, 24, 158, 218
Assessment at Alverno College, 175, 187
Assessment-driven instruction, 90
Assessment Experience Questionnaire, 19
Assessment for learning, 34–37, 85, 91–93, 107, 155, 157, 209
Assessment of learning, 85, 155, 188
Assessment for practice, 38–40
Assessment Reform Group, 153, 156
Authentic assessment, 6, 8, 21, 33, 34, 40, 47, 54, 58, 61, 87, 91, 94, 98, 99, 100, 101, 105, 106, 107, 108, 109, 127, 129, 161, 217
Authenticity, 8, 40, 100, 101, 108, 109
Authentic practice, 33
Autonomy, 153, 156, 158, 161, 168, 170, 173, 184, 207

B
Backwash, 89, 91, 93
Bias, 90, 100, 105, 107
Bologna Declaration, 191

C
Carnegie Foundation for the Advancement of Teaching and Learning, 185
CATWOE, 200
Change, 1, 4, 74, 204–210, 221
Cheating, 107, 121, 122–123
Citation practices, 121
Claims-making, 80–82
Classification, 67, 78
Classroom assessment, 35, 86
Clinical decision making, 46
Co-assessment, 85, 88
Co-construction of knowledge, 32
Code of Practice on Assessment, 198
Cognitive complexity, 8, 99, 100, 101, 108, 109
Cognitive development, 123, 124
Committee on the Foundations of Assessment, 16
Communication, 76, 104, 139, 142–143, 177, 204
Community of judgement, 30, 39
Community of practice, 30, 34, 38
Competency grades, 37, 102
Complex adaptive systems, 4, 10, 201, 204–205, 208, 221
Complexity theory, 201, 202, 204, 208, 211, 215
Conditions of learning, 23
Conference on College Composition and Communication, 147
Connoisseurship, 57, 77, 218
Consequential validity, 88, 91, 93, 101, 102, 107, 108, 109, 134
D
Dearing Report, 74, 80
Deep approach, 16, 19, 21, 22, 218
Definition of assessment, 6, 13–16, 215, 217
Diagnostic assessment, 155
Directness, 8, 99, 100, 101, 104, 108
Disciplines as Frameworks for Student Learning: Teaching the Practice of the Disciplines, 187
Discrimination, 145, 146

E
Edumetric, 85–109
Effective citizenship, 177
Emergence, 201, 203–204
Emotion, 13, 14, 24, 33, 39, 76
Enhancing Student Employability Co-ordination Team, 75
Evaluative expertise, 49, 58, 60
Examination, 2, 14, 17, 20, 29, 30, 34, 68, 72, 81, 90, 98, 118, 128, 129, 138, 139, 140, 144, 145, 155, 157, 219, 221
Expectations, 9, 29, 48, 69, 70, 71, 81, 104, 107, 125, 134, 136, 149, 150, 153, 154, 163, 165, 169, 170, 171, 172, 180, 185, 186, 197

G
Generalisability, 77, 96, 97, 99, 102, 103, 105–106, 109
Generic abilities, 7, 216, 219
Ghost writers, 120
Global grading, 46
Global perspective, 177
Grade inflation, 69, 73–74
Grade point perspective, 18
Grades, 3, 65, 72–73, 74, 77, 78, 82, 125, 149, 159
Grading, 6, 7, 14, 35, 39, 42, 45–62, 65–82, 138, 139
Graduateness, 75, 81

H
Hard systems, 199–200
The Hidden Curriculum, 17, 18
Higher Education Academy, 75, 211
Higher Education Funding Council for England, 75
Higher Education Quality Council, 71, 75, 81
High stakes assessment, 4, 8, 86, 87
Holistic assessment, 3, 5, 6, 7, 8, 31, 40, 45–62, 69, 138, 157, 216, 218
I
Improving Formative Assessment, 157
Inference, 16, 56, 102, 103
Institutional impacts, 194–195
Integrity, 50, 62, 115, 116, 119, 121, 123, 124, 210
Internet Plagiarism Advisory Service, 115
Inter-rater reliability, 98, 99
Intrinsic motivation, 90, 158, 160, 165, 170
Introjected motivation, 159, 161, 168

J
Judgement/judgment, 1, 3–4, 5, 6, 7, 8, 10, 13–25, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 76, 77, 78, 79, 81, 82, 93, 96, 97, 98, 100, 105, 119, 215–221

L
Latent criteria, 58
Learning culture, 9, 161–173, 187
Learning to learn, 6, 153
Learning-oriented assessment, 216, 217
Learning outcomes, 3, 7, 9, 29, 30, 35, 37, 66, 76, 125, 133, 137, 138, 147, 149, 150, 159, 172, 175, 176, 177, 180, 182, 183, 184, 187, 188, 189, 190, 191, 193, 197, 218
Learning styles, 135, 185
Learning that Lasts, 187
Loose-coupling, 200

M
Making the Grade, 17, 18
Manifest criteria, 58
Marking, 14, 34, 46, 48, 68, 69, 70, 77, 107, 137, 169
Massachusetts Institute of Technology, 18
Measurement, 3, 4, 5, 7, 15, 29, 35, 36, 48, 75, 77, 86, 94, 95, 96, 98, 99, 103, 139, 216, 217, 219
Measurement model, 15, 35
Memorising, 91, 107
Menu marking, 69
Metacognition, 49, 135, 146
Minimal marking method, 137
Misconduct, 122
Models, 15, 31, 32, 34, 35, 46, 194, 201, 211
Monitor, 45, 48, 49, 56, 127
Motivation, 8, 9, 49, 74, 86, 87, 90, 93, 104, 106, 107, 125, 153, 155, 158–161, 163, 165–166, 168–170, 171, 172, 173, 180
Multi-criterion judgements, 57
Multidisciplinary, 32, 96
Multiple choice, 2, 20, 21, 22, 90, 94, 134, 141, 144–146, 147, 150, 219
Multiple choice tests/Multiple-choice tests, 94, 134, 141, 144, 145, 147, 150, 219

N
National Committee of Inquiry into Higher Education, 74, 80
New modes of assessment, 5, 7, 8, 85–109, 215, 217
Norm-referenced (assessment), 7, 34, 41, 155

O
Objectives, 3, 49, 69, 70, 134, 144, 146, 156, 200, 205
Objective test, 22
Open book examination, 22
Oral assessment, 22
Oral presentation, 133
Outcomes, 3, 4, 8, 14, 21, 37, 39, 40, 41, 42, 53, 60, 66, 68, 70, 78, 90, 96, 133, 136, 138, 153, 155, 156, 159, 169, 171, 188, 189, 190, 199, 203, 205, 209

P
Patch writing, 121
Pedagogical validity, 134
Pedagogy, 45, 61, 68, 138, 149, 166
Peer appraisal, 50, 60
Peer-assessment, 46, 61
Peer review, 127
Perceptions of assessment, 14
Performance, 2, 3, 7, 9, 10, 14, 23, 32, 34, 35, 39, 41, 47, 49, 51, 61, 66, 67, 68, 69, 70, 71, 72, 73, 75, 76, 77, 79, 85, 87, 94, 97, 100, 101, 102, 103, 104, 106, 107, 108, 109, 117, 138, 141, 147, 156, 158, 178, 179, 180, 181, 185, 188, 189, 190, 191
Performance assessment, 10, 97, 102, 109
Personal development planning, 80
Plagiarism, 5, 7, 8, 115–130, 198, 199, 215, 217
Portfolio, 21, 80, 81, 85, 88, 153, 181
Post-assessment effects, 89–91
Post-compulsory education, 154–160, 171, 206
Practice, 1, 2, 3, 4, 5, 6, 8, 9, 10, 13, 15, 22–24, 29–42, 46, 50, 51, 52, 56, 58, 59, 60, 65, 66, 68, 73, 76, 79, 86, 87, 88, 92, 120, 122, 123, 126, 129, 133–150, 155, 156, 161, 163, 166, 170, 175–191, 193, 198, 199, 200, 206, 208, 215, 216, 218, 219–220
S
Scholarship, 15, 176, 184, 185, 186, 189, 215
Scholarship Reconsidered, 185
Scholarship of teaching, 185
Selection, 29, 34, 49, 54, 79, 94, 117, 155, 171
Self-assessment, 10, 15, 23, 106, 156, 163, 166, 170, 216
Self Assessment at Alverno College: Student, Program, and Institutional, 187
Self-monitoring, 48, 49, 60, 61, 62, 80, 220

U
Understanding, 1, 4, 5, 6, 9, 10, 14, 19, 23, 39, 48, 71, 80, 103, 117, 118, 122, 123, 125, 129, 135, 138, 141, 146, 153, 154–163, 165, 166, 167, 170, 177, 178, 179, 182, 187, 191, 193, 194, 199, 200, 201, 202, 205, 207, 208, 215, 216, 217, 218, 219
Unseen exam, 22, 219
Up to the Mark, 17
V
Validity, 4, 5, 8, 45, 51, 53, 54, 55, 65, 66, 72, 73, 78, 86, 88, 89, 91, 93, 94, 95–97, 98, 99, 101, 102, 104, 105, 106, 107, 108, 109, 134, 144, 171
Valuing in decision-making, 177, 183
Vocational, 153–173
Vocational Qualifications, 66, 154, 155, 167

W
Warranting, 77, 78, 79
Watchful anticipation, 203
Wellesley College, 18
Who Needs Harvard?, 175
Wicked competencies, 3, 7
Work-based assessment, 74, 75, 76, 154
Write Now, 116, 121
Written assignment, 22, 24, 47, 198