
Developing a new conceptual framework for pre-university art examinations in Portugal

Maria Teresa Torres Pereira de Eça

Submitted in partial fulfilment
of the requirements for the degree of
Doctor of Philosophy

School of Education Studies, University of Surrey Roehampton
University of Surrey
2004

Abstract

The research topic was external assessment for art and design at pre-university level (age 17+). The research field was formal assessment in art education, especially in the visual arts at the end of secondary schooling (ages 17-18). In the literature, terms such as art, arts, and art and design are used in the context of art education; throughout the study such terms were used interchangeably according to the context. The principal audience for this study included key stakeholders in the national examinations in Portugal - government, government agencies, university admissions systems, and art teachers. It may also be relevant to the international community of researchers in art and arts education and related subjects, and to specialists in arts assessment in general.

The study was undertaken in two stages: (1) analysis of current art examinations at pre-university level in Portugal and England; and (2) the development of a conceptual framework for external assessment in art education in general, and a pilot and trial of a new instrument and assessment procedures for art examinations in Portugal. The first stage included three sections. (1) The literature on assessment was reviewed and current problems in assessing the arts were identified, including issues of quality such as reliability, validity, impact and practicality. It was established that valid and reliable assessment in art education required assessment instruments with clear instructions and criteria that allow multiple forms of evidence to be considered, and that such assessment procedures should include provision for in-service teacher training, standardisation and moderation. A conceptual framework was developed for evaluating art examinations. (2) The current Portuguese system of art examinations (age 18) was critiqued. Analysis of relevant documentation and questionnaires conducted with Portuguese art teachers and art students enabled identification of key issues, such as the lack of validity and reliability in the system, as well as the collection of proposals for alternative, improved forms of assessment. (3) The English system of art and design examinations (age 17+) was partially reviewed through direct observation of one example of examination procedures, analysis of documents and the recording of the views of selected stakeholders. The second stage included two sections that described and evaluated: (1) the design and piloting of a new framework for external assessment of art education in Portugal based on an extended portfolio assessment task, in-service teacher training and standardisation procedures; and (2) the trial of a new external assessment instrument and procedures in five schools in Portugal.

The new assessment instruments and procedures were compared with the current Portuguese art examination, establishing that greater validity and reliability were achievable. The study concluded with a discussion of the negative and positive aspects of the proposed conceptual framework and of the implications for assessment stakeholders in Portugal, including the need for reform in Portuguese art education. Further research into alternative forms and methods of assessment was also recommended.

Keywords: Art Education; Arts; Art and Design; Art Examinations; Assessment and
Evaluation; Authentic Assessment; Portfolio Assessment.

List of Contents

Pages

Abstract i
List of Contents ii
List of Figures ix
List of Tables x
Acknowledgements xii

Introduction 1

0.1. The problem area 1


0.1.1. Multiple viewpoints about art education 3
0.1.2. Assessment considerations 5
0.1.3. Key difficulties in art assessment 6
0.2. Contrasting systems of art examinations: Portugal and England 11
0.3. Summary problem statement 13
0.4. Research aims 13
0.5. Research questions 14
0.6. Outline of chapters 15

Chapter 1: Assessment in Art Education 18


1.1. Assessment 18
1.1.1. Roles of assessment 20
1.1.1.1. Feedback, motivation 22
1.1.1.2. Surveillance, control, accountability 24
1.1.2. Forms of assessment: formative/summative 25
1.1.2.1. Formative assessment 26
1.1.2.2. Summative assessment 26
1.1.2.3. Ipsative assessment and self-assessment 27
1.1.2.4. Diagnostic assessment 27
1.1.2.5. Negotiated assessment 28
1.1.2.6. External assessment 29
1.1.3. Assessment as a social product 29
1.1.4. External assessment and examinations 30
1.2. Assessment and the arts 31
1.2.1. Why assess the arts? 32
1.2.2. Assessment instruments and objects of assessment 33
1.2.2.1. Moving away from tests 34
1.2.2.2. New approaches for external assessment 37
1.2.2.3. Records of achievement 38
1.2.2.4. Portfolio 39
1.2.3. Standards, criteria, attainment targets 41
1.3. Assessing the arts: current problems 46
1.3.1. The problem of standards and criteria 46
1.3.2. The case for a consensual view of criteria 47
1.3.3. The problem of teacher in-service training 48

Summary 49

Chapter 2: Reliability, Validity, Impact, Practicality 51

2.1. Reliability 51
2.1.1. Problems of reliability in art examinations 52
2.1.1.1. Instrument 53
2.1.1.2. Assessor and examiners training 54
2.1.1.3. Judgements and moderation 54
2.1.1.4. Threats to reliability summarised 55
2.2. Validity 55
2.2.1. Some types of validity 56
2.2.1.1. Content validation 57
2.2.1.2. Face validation 57
2.2.1.3. Response validation 58
2.2.1.4. Washback validation 58
2.2.1.5. Criterion-related validation 59
2.2.1.6. Validation related to examination bias 60
2.2.1.7. Construct validation 61
2.2.2. Problems of validity in art examinations 62
2.2.2.1. Assessment instrument 62
2.2.2.2. The examination underlying model 63
2.3. Impact 65
2.3.1. Effects on society and individuals 66
2.3.2. Effects on students 67
2.3.3. Effects on teachers and teaching methods 67
2.3.4. Effects on schools 68
2.3.5. Effects on standards and curriculum 69
2.3.6. Effects on higher education 69
2.3.7. Ethical considerations 70
2.3.8. Evaluating the impact of examinations 70
2.4. Practicality 71
2.5. Conceptual framework for evaluating art examinations 72
Summary 75

Chapter 3: Design of the research 77

3.1. Research questions 77


3.2. Choice of method 77
3.3. Stages of research 79
3.3.1. Stage 1 80
3.3.2. Stage 2 80
3.3.2.1. Trial 1 80
3.3.2.2. Trial 2 81
3.4. Instruments and data collection 83
3.4.1. Qualitative data collection 83
3.4.1.1. Document sources 83
3.4.1.2. Observation 84
3.4.1.3. Interviews 85

3.4.2. Quantitative data 90
3.4.2.1. Questionnaires 90
3.4.2.2. Other data 93
3.5. Data analysis 93
3.5.1. Document analysis 93
3.5.2. Analysis of observation notes 94
3.5.3. Analysis of interviews 94
3.5.4. Descriptive statistics 98
3.5.5. Multi-faceted analysis 98
3.6. Reliability of data 101
3.7. Triangulation 102
3.8. Ethical considerations 103
Summary 105

Chapter 4: Portuguese art examinations 107

4.1. Historical background overview 107


4.2. Assessment in secondary education after 1996 111
4.2.1. General regulations for assessment 111
4.2.2. National examinations 113
4.2.2.1. Instruments 113
4.2.2.2. Procedures 115
4.3. The model of Portuguese secondary art curriculum 119
4.3.1. Origins of current art curriculum 119
4.3.2. Assessment instruments in Portuguese art examinations 122
4.3.3. Assessment procedures in Portuguese art examinations 125
4.4. Users’ perceptions of Portuguese art examination practices 131
4.4.1. Respondents 131
4.4.2. Description and analysis of questionnaire results 132
4.4.2.1. Syllabuses (Section 2) 132
4.4.2.2. Opinions about the validity of the examination (Sections 3 and 4) 133
4.4.2.3. The ideal form and structure of art and design examinations (Section 5) 137
4.4.2.4. Opinions about current procedures (Section 6) 140
4.4.2.5. Reliability (Section 7) 141
4.4.2.6. Opinions about ideal assessment procedures (Section 8) 142
4.4.2.7. Impact: consequences of the examination (Section 9) 144
4.4.2.8. Teachers' views about assessment procedures (Sections 10-12) 148
Summary 150

Chapter 5: Art and design examinations in England 153

5.1. Introduction 153


5.2. Art and design examinations before the 1988 Reform Act 156
5.3. The consequences of the Education Reform Act (1988) in England and Wales 161
5.4. AS and A level art examinations after curriculum 2000 163

5.4.1. The role of the QCA 164
5.4.2. The structure of awarding bodies 164
5.4.3. A modular structure of courses and examinations 166
5.4.4. Scheme of assessment 169
5.4.5. The specifications 169
5.4.6. Rationales for the specifications 170
5.4.7. Instructions for the conduct of examinations 172
5.4.8. Assessment instruments 176
5.4.8.1. Coursework 176
5.4.8.2. The ‘controlled test’ and question papers 176
5.4.8.3. Type and format of the assessment instruments 177
5.4.8.4. Process orientated instrument 179
5.4.8.5. Bias 181
5.4.8.6. Gender bias 181
5.4.8.7. Bias related to non-English students 182
5.4.8.8. Bias related to geographical location and economic background 182
5.4.8.9. Teachers’ influence upon students’ achievement 183
5.5. Assessment criteria 184
5.5.1. Mark schemes 186
5.5.2. Criterion referenced marking 188
5.6. Assessment procedures 191
5.6.1. In-service teacher training 191
5.6.2. Internal marking and internal moderation 192
5.6.3. Standardisation 194
5.6.3.1. Observation of standardisation procedures 194
5.6.4. Moderation 198
5.7. Conclusions in terms of validity summarised 199
5.7.1. Weaknesses 199
5.7.2. Strengths 200
5.8. Conclusions in terms of reliability summarised 201
5.8.1. Weaknesses 201
5.8.2. Strengths 201
5.9. Practicality 202
5.10. Impact 203
5.11. Suggestions for art and design examinations 205
Summary 206

Chapter 6: Design and piloting of a new external assessment instrument and procedures in Portugal 208

6.1. Introduction 208


6.2. Designing a new assessment instrument and procedures for art and design external assessment in Portugal 209
6.2.1. Rationales 209
6.2.2. Skills in art and design 210
6.2.2.1. Knowledge base 210
6.2.2.2. Thinking skills and metacognitive skills 211

6.2.2.3. Self-evaluation skills 212

6.2.3. Defining a content framework for the new art and design external assessment 213
6.2.3.1. Kinds of evidence 215
6.2.3.2. Portfolio 215
6.2.3.3. Portfolio evidence 215
6.2.3.4. Assessment criteria 217
6.2.3.5. Assessment procedures 217
6.3. Piloting the new assessment instrument 218
6.3.1. Pilot sample (trial 1) 219
6.3.2. Place, time and programme 220
6.3.3. Feedback from the pilot 221
6.3.3.1. Underlying model 223
6.3.3.2. Format 223
6.3.3.3. Bias 229
6.3.3.4. Tasks 230
6.3.3.5. Time and resources constraints 232
6.3.3.6. Criteria and weightings 234
6.3.3.7. Impact 235
6.3.3.8. Assessment procedures 236
6.4. Suggestions for improvement 238
Summary 239

Chapter 7: Main Study: Trial of the new external assessment instrument and procedures in five schools in Portugal 240

7.1. Trial sample 240


7.1.1. Data collection 242
7.1.2. Location, time and programme 242
7.2. Description of the activities 243
7.2.1. Standardisation meetings 243
7.2.2. Web page and on-line meetings 245
7.2.3. Implementing the portfolio with students 246
7.2.3.1. School P 247
7.2.3.2. School B 249
7.2.3.3. School A 251
7.2.3.4. School K 253
7.2.3.5. School V 256
7.2.4. Moderation 258
7.3. Feedback from Trial 2 260
7.3.1. Underlying concept 260
7.3.2. Preparation 260
7.3.3. Format 261
7.3.4. Instructions 262
7.3.5. Time 263
7.3.6. School resources 265
7.3.7. Tasks 266

7.3.8. Criteria and weightings 268
7.3.9. Bias 270
7.3.10. Assessment procedures 273
7.3.11. Consequences/Impact 274
7.3.11.1. Results 274
7.3.11.2. Effects upon students 275
7.3.11.3. Effects upon teachers 275
7.3.11.4. Effects upon schools and curriculum 277
7.3.12. Reliability of results 278
7.4. Comparing the current MTEP examination and new model 279
7.4.1. Validities 279
7.4.2. Reliability 279
7.4.2.1. Summary of the FACETS 281
7.4.2.2. Reliability index 284
7.4.2.3. Raters measurement reports 284
7.4.2.4. Probability curves 288
7.4.2.5. Expected ogives 290
7.4.3. Consequences/impact 292
7.4.4. Practicalities 292
Summary 293

Chapter 8: Conclusions 296


8.1. Current system of art examinations in Portugal (2000-2003) 296
8.2. Current system of art and design examinations in England (2000-2003) 299
8.3. Impact of the English and Portuguese art and design examinations 302
8.4. Increasing the validity of Portuguese art and design external assessment 303
8.5. Increasing the reliability of Portuguese art and design external assessment results 304
8.6. The proposed model for art examinations 304
8.6.1. The underlying model 304
8.6.2. The proposed instrument 305
8.6.3. The assessment procedures 311
8.6.4. Impact 312
8.6.5. Strengths and Weaknesses 313
8.6.5.1. Weaknesses 313
8.6.5.2. Strengths 314
8.7. Wider Implications 314
8.8. Further research 316
8.9. Reflections on the research methodology 319
8.10. Recommendations for assessment stakeholders 320
8.10.1. Students 320
8.10.2. Teachers and Assessors 322
8.10.3. Schools 322
8.10.4. Universities 323
8.10.5. Government 323

Glossary of terms 325
Bibliography 334

Appendices (Volume 2)

I: Portuguese education system 1


II: English education system 10
III: Scheme of Assessment England 19
IV: GCE AS and A level art and design areas of study in England 22
V: MTEP test 24
VI: Report 060 27
VII: Survey questionnaires 29
VIII: SPSS output survey Portugal 47
IX: Interviews in England 76
X: Observation of English art and design standardisation practices at Edexcel awarding body 116
XI: Trial 1 – Schedule group interviews 132
XII: Trial 1 – Sessions 133
XIII: Trial 1 – External observer report 167
XIV: Booklet (Instructions for an experimental examination in art and design) 169
XV: Trial 2 – Questionnaire students 193
XVI: Trial 2 – Questionnaire teachers 201
XVII: Trial 2 – SPSS questionnaires outputs 211
XVIII: Trial 2 – School P 240
XIX: Trial 2 – School B 250
XX: Trial 2 – School K 262
XXI: Trial 2 – School V 274
XXII: Trial 2 – School A 288
XXIII: Trial 2 – External observer report 292
XXIV: Facets output (current examination; trial 1 and trial 2) 297
XXV: Research journal, 4th May 2002 313
XXVI: On-line meeting (25-6-2003) 318
XXVII: Consent form 323
XXVIII: Diagram two countries 324

List of Figures

Figures Page

Figure 1: Conceptual framework for evaluation of art and design external assessment 76
Figure 2: Stages in research 79
Figure 3: Sample of art and design examination stakeholders for interviews in stage 1 87
Figure 4: Sample of students for interviews in stage 2 89
Figure 5: Response rates to the survey in stage 1 92
Figure 6: Diagram analysis of interviews in England stage 1 96
Figure 7: Diagram analysis of interviews in stage 2 97
Figure 8: MTEP question paper cardboard model (question paper September 2002) 125
Figure 9: Responses to question 5.7 139
Figure 10: Structure of awarding bodies 165
Figure 11: Controlled tests 177
Figure 12: Blueprint for the design of a new art examination 214
Figure 13: Types of evidence for portfolio 216
Figure 14: Assessment criteria 217
Figure 15: Differences between marks in Joana's and Luis' portfolios 259

List of Tables

Tables Page

Table 1: Plan of action 82


Table 2: Current examination differences of marks (MTEP) 128
Table 3: Questionnaire survey sample of students 131
Table 4: Questionnaire survey sample of teachers 131
Table 5: Responses to question 2.3 133
Table 6: Responses to question 4.4 135
Table 7: Responses to question 4.8 135
Table 8: Responses to question 4.10 136
Table 9: Responses to question 4.11 136
Table 10: Responses to question 4.12 136
Table 11: Responses to question 4.13 136
Table 12: Responses to question 5.1 137
Table 13: Responses to question 5.3 138
Table 14: Responses to question 6.2 141
Table 15: Responses to question 7.1 142
Table 16: Responses to question 7.2 142
Table 17: Responses to question 7.3 142
Table 18: Responses to question 8.3 143
Table 19: Responses to question 9.7 144
Table 20: Responses to question 9.1 146
Table 21: Responses to question 9.11 147
Table 22: Responses to question 9.12 147
Table 23: Interviews with English art and design examination stakeholders 156
Table 24: Trial 1 sample teachers 220
Table 25: Trial 1 sample students 220
Table 26: Trial 1 schedule 221
Table 27: Responses to questions about format 224
Table 28: Responses to questions about preparation 227
Table 29: Responses to questions about bias 229
Table 30: Responses to questions about tasks 230
Table 31: Responses to questions about time and resources 232
Table 32: Responses to questions about criteria 235
Table 33: Responses to questions about impact 235
Table 34: Responses to questions about assessment procedures 236
Table 35: Trial 2 sample 241
Table 36: Trial 2 schedule 243
Table 37: Trial 2: standardisation blind marking results 245
Table 38: Responses to questions about preparation 260
Table 39: Responses to questions about format 261
Table 40: Responses to questions about content 262
Table 41: Responses about instructions 262
Table 42: Responses to questions about tasks 266
Table 43: Questionnaire trial 2: one-sample T Test: sex/tasks 267
Table 44: Responses to questions about criteria 269

Table 45: Responses to questions about weightings 270
Table 46: Responses to questions about bias 271
Table 47: Responses about assessment procedures 273
Table 48: Responses about consistency of marking 274
Table 49: Responses about impact 277
Table 50: Comparing inter- and intra-rater reliability between the current examination and the new assessment instrument and procedures 280
Table 51: Measurable data summary: current examination 283
Table 52: Measurable data summary: trial 1 283
Table 53: Measurable data summary: trial 2 283
Table 54: Current examination raters measurement report 286
Table 55: Trial 1 raters measurement report 286
Table 56: Trial 2 raters measurement report 287
Table 57: Current examination probability curves 289
Table 58: Trial 1 probability curves 289
Table 59: Trial 2 probability curves 290
Table 60: Current examination expected score ogive 291
Table 61: Trial 1 expected score ogive 291
Table 62: Trial 2 expected score ogive 292

Acknowledgements

I wish to extend my grateful thanks to:

Dr John Steers, Director of Studies, for his knowledgeable advice, encouragement and support, and for having introduced me to the English system of examinations; Professor Cyril Weir, first co-supervisor, and Professor Rachel Mason, second co-supervisor, for their expert advice, guidance and support; Dr Barry O'Sullivan for his help with minifac files; the Portuguese government for their financial support, especially the Ministry of Education and the Fundação para a Ciência e a Tecnologia; the English awarding body, Edexcel Foundation, for letting me participate in the GCE/AS/A2 standardisation meetings in 2001 and 2002; the English teachers and students who generously gave me their hospitality and time; and the Centro de Formação Penalva e Azurara and the Associação de Professores de Expressão e Comunicação Visual for giving me the physical resources for the pilot and trial.

Particular thanks are due to the Portuguese art teachers and students who took part in
the research and to my fellow students in the Centre for Art Education and
International Research and the Centre for Research in Testing, Evaluation and
Curriculum at the University of Surrey Roehampton for their lively discussions and
companionship.

And finally my thanks are due to my family and friends for their support and
encouragement.

Introduction

This chapter establishes the research problem by defining issues and current trends in art and design curriculum and assessment.

0.1. The problem area

The field of 'art education' can be elusive to define with precision because a variety of terms are used in the literature, often interchangeably. For example, in the English National Curriculum the term 'Art' was used until 2000, when after much lobbying the subject title was changed to 'Art and Design'. Despite this change, the National Curriculum sub-text always stated that the subject title included 'Art, Craft and Design'. The English awarding bodies have also used 'Art' and 'Art and Design' inconsistently. North American literature on the subject usually refers to 'Art Education', although the term 'Visual Culture Education' is becoming more common; it is debatable whether these terms are synonymous. In Portugal the terms 'arte', 'artes', 'artes visuais', 'arte e design' and 'educação artística' are used, and these have been translated in this thesis as 'art'. When the term 'Arts Education' is used it covers a range of visual and performing arts including, for example, dance, drama and music education. Where any of these terms are used within quotations in this thesis they are the choice of the original author. To avoid confusion, discussion of these quotations uses the same terms as the original unless there is an obvious reason not to do so.

Assessment is a broad concept with multiple potential purposes. According to Brown (1994, p. 271), assessment should be closely integrated with the curriculum. Its principal functions include diagnosing the causes of the learner's success or failure, providing motivation for the learner, providing valid and meaningful accounts of what has been achieved, and providing a means of evaluating teaching programmes. Assessment can be used to provide feedback and motivation for teachers, parents and students; as an instrument of surveillance, control and accountability; and as an instrument for ranking and selecting students (Eisner, 1998; Robinson, 1982; Gipps & Stobart, 1997; Wiggins, 1993).

In the case of examinations at the end of secondary education, assessment (and especially external assessment in the form of formal examinations) has serious consequences for students because it is one of the indicators used to select them for further study or employment. The various procedures used in external examinations are frequently determined by the rationales that underpin different educational systems. Different procedures and instruments are used in different countries, often related to specific emphases in art education policy and practice.

Assessment in the context of this research refers to the process of arriving at a view or judgement about individual student achievement. It should be distinguished from reports or statements which result from that process, and from broader evaluation, which is taken to be the process of making judgemental statements about educational activities such as courses, programmes or some other larger aspect of the educational process (Eisner, 1985). Examinations and tests are techniques used to assess students' competencies through various forms of external assessment. External assessment refers to procedures developed and implemented by assessment experts from national, regional or other agencies outside the candidates' own schools.

0.1.1. Multiple viewpoints about art education

In art education it is usual to find that multiple viewpoints and value positions are celebrated (Boughton, 1996, p. 77). Boughton (1996) raised several problems for educators who have to judge student art works, particularly at the senior school level. For example:

How do we accommodate emerging postmodern forms within traditional criteria derived from modernism?
...Should the balance in emphasis of socially, critically theoretical analysis and studio practice be reconsidered?
...To what extent should formalist and modernist concepts be valued as artistic learning? (pp. 77-78).

Modernist and postmodernist views are two different underlying paradigms, which may problematise the achievement of consensus about the quality of art works. In modernist discourse, art is understood as a highly personal, individual expression, and formal criteria tend to be applied to student outcomes as measures of quality. In postmodern discourse, art works are viewed as social constructions; interdisciplinary and instrumentalist justifications are proposed for art; criteria emphasising critical analysis and questioning are used; and pluralist narratives of art are accepted (Efland, Freedman, & Stuhr, 1996).

Different viewpoints about art education present problems for assessment. For example, the criterion of 'originality' is not as important in a postmodernist as in a modernist perspective, and copies and reinterpretations of art and design exemplars by students are valued as much as 'original creations'. Finally, well-established criteria for judging drawing performance, such as faithful copying of reality, may not be valued so highly in postmodern conceptions of art and design assessment criteria.

In a modernist paradigm, the judging of artworks was often limited to formalist criteria. In the postmodern paradigm judgements are more complex; they always need moderation and a certain degree of consensus between teachers (Boughton, 1999; Freedman, 2003). The tension between modernist and postmodernist paradigms in art education was not resolved at the time of this research; however, it is important to consider its effects on assessment. Assessing the arts raises particular problems.

On the one hand, dynamic socio-cultural changes affect artistic expression of all kinds; debates about, for example, reconstructing or maintaining the cultural identity of minority groups or issues of national identity, rapid changes in technology, and the advent of postmodern philosophy are reshaping fundamental assumptions about art and education. On the other hand, the curriculum reforms currently being undertaken in countries that are members of the Organisation for Economic Cooperation and Development (OECD) seem to be demanding more prescriptive outcomes and clearer specifications of achievement standards for the arts. At a time when long-standing beliefs about the nature of art are under challenge, teachers are being asked to account for their students' learning in clearer and more precise ways (Boughton, 1999).

In Europe the subject of art education is not well defined (Schönau, 1999). The impact of postmodern thinking has created tensions between the expectations of centralised assessment and some of the traditional ways of defining art, raising the need to debate the impact of distinctions between modern and postmodern ways of defining art and art learning in assessment. Selecting the type of evidence of students' knowledge, understanding and skills in art assessment is a problematic task in the sense that political issues, the ideologies of curriculum planners and the practical constraints of art examinations always influence the design and implementation of assessment instruments and procedures. What is actually assessed in art and design examinations may not be congruent with what is actually learned in schools. It may be limited to a narrow domain of knowledge, raising questions about the validity of the system.

0.1.2. Assessment considerations

According to Broadfoot (1996, p. 175), different forms of assessment influence what is taught and how it is taught, and what and how students learn. Examinations organise and legitimate knowledge in 'testable form', always on the assumption that appropriate knowledge can be organised and tested. The choice of what is tested or examined, and the criteria used, influences curriculum practice. Large-scale external assessment often constrains the range of knowledge or competencies to be examined; in order to try to ensure reliability of examination results, certain competencies may be omitted from examination papers because they are difficult to assess or because they might require prohibitively expensive assessment procedures. Broadfoot (1996) also points out that '…systems of assessment have been criticised for putting a premium on the reproduction of knowledge and passivity of mind at the expense of critical judgement and creative thinking'. External assessment procedures such as examinations should have validity; in other words, assessment should serve its prescribed purposes, reflecting the aims and objectives of the courses and units taught (Eisner, 1998). Examination results should be reliable, minimising the effects of measurement error (Bachman, 1990, p. 161). Reliable assessment concerns the consistency of scores obtained by the same persons when re-examined with the same test on different occasions, or with different sets of equivalent items, or under variable examining conditions (Anastasi, 1988, p. 188). For example, two or three qualified judges (assessors or examiners) rating the same works and using the same criteria should rate 'answers' similarly, including the products or outcomes of practical work.
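The degree of agreement between judges described here can be quantified. The main study reports multi-faceted Rasch (FACETS) analyses; purely as an illustrative sketch, using invented judges and grades rather than any data from this research, chance-corrected agreement between two judges marking the same portfolios could be computed as Cohen's kappa in Python:

from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters marking the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance if each rater kept their own grade frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical grades awarded by two judges to the same ten portfolios
judge_1 = ["A", "B", "B", "C", "A", "D", "C", "B", "A", "C"]
judge_2 = ["A", "B", "C", "C", "A", "D", "C", "B", "B", "C"]
print(f"kappa = {cohens_kappa(judge_1, judge_2):.2f}")  # prints kappa = 0.72

A kappa close to 1 would indicate that the two judges can almost be treated as a single judge; a value near 0 would indicate agreement no better than chance.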

The main assessment characteristics of reliability, impact, validity and utility or practicality raise difficult problems for external assessment, since these qualities may be in conflict or tension. When constructing examination papers, examiners and test developers must make a choice of content, competencies and the specific knowledge to be assessed. These choices will be determined by their concept of what is most relevant and important in art education. They must define the criteria expressing the qualities to be valued in student artwork and provide guidelines for making judgements.

0.1.3. Key difficulties in art assessment

Assessing the arts has been considered an especially problematic field, and doubts have often been expressed about whether such assessment can be properly objective (Boughton, 1996). Key difficulties include the extraordinarily varied nature of art works themselves and the consequent limitations of expressing common assessment criteria in words (Boughton, 1996; Eisner, 1985). Criteria express the qualities thought to be desirable in students' works and are considered essential to reliable assessment (Boughton, 1996; Eisner, 1985; Blaikie, 1996; Schönau, 1996). The concepts that underpin the criteria need to be shared by the whole art education community that is subject to a particular model of assessment. However, the expression of a criterion in words can be reductionist. Often the most important qualities in art students' achievements are tacit in nature, and are difficult to express adequately in words. According to Boughton (1996, p. 72), the problem with the use of criteria is that they express inexact human values and are subject to various interpretations by judges who may hold different views about the significance, or precise meaning, of any given criterion. He raised important questions for assessment in art education, such as:

1. How specifically should criteria be written? Should sub-criteria (rubrics) be used to illustrate expected standards? Should visual exemplars be used?
2. How many criteria are required to describe the qualities expected in student work?
3. Should criteria be weighted, i.e. are some more significant than others?

Even if examination papers are well constructed in terms of validity, and the criteria are well defined, assessment of students' examination work may still lack reliability. Agreement between judges, assessors or examiners, involving some form of moderation, is a necessary condition of obtaining reliability of art examination results (Beattie, 1994; Blaikie, 1996; Boughton, 1996; Steers, 1996). An important aspect of examinations is the method or procedures used to judge students' artworks, and these procedures vary according to different systems of external assessment. In Portugal, art and design external assessment occurs at the end of secondary education in the 12th year of school (age 17-18), and a single judge assesses examination works. In England, students of the same age are usually assessed by at least two judges and their marks moderated by an external examiner or moderator.

A system of moderation is essential to the reliability of an art and design examination (Schönau, 1996; Steers, 1994). Moderation is the means by which the marks of different teachers in different centres/schools are equated with one another and through which the validity and reliability of assessment is confirmed. According to Blaikie (1996), moderation is a fundamental aspect of reliability in qualitative assessment in that it ensures the fairness of judgements through combining inter-subjective and informed opinions. Boughton (1997) refers to the importance of group discussion in assessing tasks. Beattie (1994) notes the necessity of a second judging for evaluation of the product, and she argues: 'This procedure is extremely important when reliability of scores is critical, as in the case where a norm-referenced interpretation is mandated (i.e., comparisons of pupils and schools)'. The common assumption is that if judges can agree about the merit of student work then the judgement process has demonstrated objectivity. If different judges, using the same criteria, can agree about the standard of work over a variety of cases, then a high measure of reliability has been achieved, and confidence can be placed in the judgements. This assumption is derived from the natural sciences and is based on the belief that a 'true' score for student work exists and may be approximated through the agreement of experts. Most effort in centralised systems is invested in developing efficient strategies to obtain consensus on judgements. In that sense, the practice of training moderators and examiners by English examining boards aims to create agreement between judges and, if they share the same assumptions about the value of the students' art works, high reliability is obtained. Moderation meetings and the exhibition and publication of exemplars for each level of achievement may serve the purpose of calibrating assessors.
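The statistical face of mark equating can also be illustrated with a minimal sketch, assuming a simple linear rescaling of marks; this is one common technique shown for illustration only, not a claim about any awarding body's actual method, and all marks below are invented:

from statistics import mean, stdev

def moderate_marks(all_marks, teacher_sample, moderator_sample):
    """Linearly rescale a school's marks so the sampled scripts match the
    moderator's mean and spread (one simple form of statistical moderation)."""
    scale = stdev(moderator_sample) / stdev(teacher_sample)
    centre = mean(teacher_sample)
    target = mean(moderator_sample)
    return [round(target + scale * (m - centre)) for m in all_marks]

# Invented data: a moderator re-marks five sampled scripts from one school
teacher_sample = [14, 16, 11, 18, 12]      # teacher's marks for the sample
moderator_sample = [12, 15, 10, 16, 11]    # moderator's marks, same scripts
school_marks = [14, 16, 11, 18, 12, 17, 13, 15]
print(moderate_marks(school_marks, teacher_sample, moderator_sample))

In practice, as the surrounding text notes, English moderation relies on trained judgement and the re-marking of samples rather than purely statistical adjustment; the sketch merely shows how a sample of scripts can anchor a whole school's marks to a common standard.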

The English system of moderation appeared to have advantages over the Portuguese system because classroom teachers are the principal assessors of students' work. Teachers mark students' work and those marks are 'moderated' by external assessors who verify the marking of a sample of candidates' work at each school. Moderation may take the form of a sample of all the candidates' work being sent to a moderator at an awarding body, or moderators may visit schools when, for example, the product or the process is bulky or ephemeral. Moderators are trained to assess to agreed common standards and their judgements are usually subject to various checks and balances.

According to Wiggins (1993, p. 344), '…if assessment occurs without debate, values cannot be examined, change cannot be institutionalised, and awareness of alternatives is generally not discussed. It is all too easy for teachers whose judgements are not challenged to resist change and judge their students' work inappropriately'. In examinations it is assumed that assessors or judges have the necessary knowledge, or 'connoisseurship' in Eisner's (1985) terms, to make judgements about the quality of students' work. However, questions need to be asked about the role and competence of judges. Boughton (1996, p. 300) suggests:

1. What experience and philosophy should the judges have?
2. How should they be selected?
3. How should they be initiated?

These questions are particularly important in Portugal, where assessors of national examinations at the end of secondary school (age 18) are teachers without any kind of assessment training (Martins, 1999, p. 152). By comparison, in England assessors are supposed to receive some measure of specific training for their work from examination awarding bodies. The adequacy of this training will be investigated in this study (see Chapter 5). The literature suggests that a major source of unreliability in art examinations concerns the interpretation of assessment criteria and assessment objectives. In theory, if assessors share the same views and values they can act as one single judge, thus improving the reliability of the examination results.

It is usual for the procedures used in external assessment to be related to the prevalent local or national ideology of art education, and especially to concepts associated with the measurement of art students' achievement. In countries such as Portugal, external art assessment at the end of secondary school (age 18) is limited to tests administered during a short period of controlled time. The question papers are not available to the students before the date of the examination; students are not allowed to use preliminary sketches or preparatory research; and there are limited guidelines available for assessors and teachers. In England, the assessment objectives, criteria and question papers are made available to students, teachers and assessors before a long, controlled period of external assessment. Students are expected to use preparatory research during an examination. These external examinations do not take the form of pencil-and-paper tests as in Portugal, but rather of practical art and design course tasks with, in some cases, a written component.

The assessment procedures used in art examinations raise specific problems. Ideally the techniques used to assess students' achievement should have the necessary characteristics of reliability, validity and practicality or utility, and should not have a negative impact on curriculum practice. But this is rarely the case (Shohamy, 2001). In some systems reliability might be the main requirement; in others it might be validity; in others practicality (Wiggins, 1993). It is necessary to analyse the impact or consequences of such procedures to facilitate further review and to bring about improvement of the system. In Portugal there are a significant number of publications on assessment in general (Boavida & Barreira, 1993; Estrela, 1994; Fernandes, 1992; Lemos, 1993; Pacheco, 1995; Ribeiro, 1997; Valadares & Graça, 1998), but very little research has been completed on assessment in art. In England and the United States some research in art education assessment has been carried out; however, the American art educator Zimmerman (1992) claims that there is a particular need for research and development on the interface between teachers and assessment procedures. The nature of the concepts involved in studio art makes the assessment task especially problematic. According to Boughton (1997): 'An enormous amount of research remains to be done with these forms of assessment. Although some research has already been undertaken, perhaps most notably by the Dutch Institute for Educational Measurement (Schönau, 1996), little in the way of systematic investigation underpins widely established practices of assessment of studio art learning' (p. 211).

0.2. Contrasting systems of art examinations: Portugal and England

During 2001, approximately 4,000 students in Portugal took national art examinations in general art courses and approximately 1,500 students took national art examinations in technological art courses at the end of secondary school (source: Departamento do Ensino Secundário, 2001). The most important Portuguese art examinations are Drawing and Descriptive Geometry; Materials and Techniques of Plastic Expression; Theory of Design and History of Art. The format is pencil-and-paper tests requiring short essays and drawing exercises. Despite the considerable number of hours of study allocated to them, Studio Art and Technologies are the only specialisms in the Portuguese art curriculum which are not assessed externally. This fact leads students to consider Studio Art and Technologies as less important in the curriculum (Eça, 1999). In the Portuguese system, therefore, students are not given an opportunity to reveal their true knowledge and ability, and this reduces the validity of the system. Important skills and capacities in students' learning and achievement, which are specific to art production, are left out of external assessment, raising doubts about the content validity of the assessment system. In the assessment guidelines for Portuguese Studio Art (Departamento do Ensino Secundário, 1992), assessment objectives and criteria are not clearly defined. Assessment by jury or moderation is seldom used in Portuguese art courses; teachers are left free to judge students' coursework and examination work according to their own, often very different, interpretations. In previous research (Eça, 1999), the techniques used in Portuguese art and design external assessment were judged to be inappropriate, and serious doubts about the validity and reliability of the system were raised. The empirical evidence showed that a significant number of Portuguese art educators were clearly dissatisfied with the national system of external assessment for art disciplines. It was concluded that there was an urgent need to reform and improve external assessment in art and design courses at pre-university level in Portugal.

In England, 37,380 students took Advanced Subsidiary General Certificate of Education (GCE) Art and Design examinations in 2001 (source: Joint Council for General Qualifications). Art and design examinations, in both studio art and associated critical and contextual studies, are examined through a combination of coursework and work completed during controlled time, known as externally set assignments, at the end of the course. The components of the examinations allow for the inclusion of process and product in coursework and externally set assignments, and students can use preparatory research during the examination. The external examinations are not pencil-and-paper tests: written essays, work journals, coursework/portfolios, a finished piece of work as part of the externally set assignment, tape/slides, video and web pages can be accepted as evidence.

0.3. Summary problem statement

It is widely recognised that there are fundamental difficulties in most forms of external assessment related to the validity, reliability, impact and practicality of examinations (Valadares & Graça, 1998; Anastasi, 1988; Broadfoot, 1996; Cardinet, 1986). These are particularly problematic in art and design, where a consensus about the interpretation and use of criteria is difficult to obtain: the works being judged are wide-ranging in nature, differentiation by outcome is a method frequently employed, there are no right or wrong answers, and examiners typically seek divergent and 'original' responses.

There is a need to reform Portuguese art and design examinations at age 18, since evidence exists that the current system is deficient in most aspects of validity (Afonso, 1998). At the beginning of the research it was hypothesised that a critical evaluation of art and design examination systems in Portugal and England should provide valuable insights into current problems in Portuguese art examinations. An analysis of the weaknesses and strengths of both systems might facilitate analysis of key issues and suggest possible alternative assessment instruments.

0.4. Research aims

Therefore, the aims of the research reported in this thesis were to understand the theory, policy and practice of art and design examinations in England and in Portugal by comparing, contrasting and evaluating the principles, processes and relative reliability, validity, impact and practicality or utility of the particular national art and design examinations at age 17/18. The principal audience for this study included key stakeholders in the national examinations in Portugal - government, government agencies, university admissions systems, and art teachers. It may also be relevant to the international community of researchers in art and arts education and related subjects, and to specialists in arts assessment in general.

The intention was to carry out a number of small empirical studies of assessment practices in Portugal and England, in order to establish a conceptual framework for external art assessment. It was intended that the final conclusions and recommendations would: (1) establish a firm basis for making proposals to the Portuguese Government about the development of new forms of art and design external assessment; and (2) contribute to the international debate on external assessment in art education. The specific aims of the research were:

1. To examine theories of assessment in the international literature on art education and evaluate their significance for art and design examinations in England and Portugal.
2. To analyse documents concerned with art and design examinations in Portugal and England and identify the critical issues for assessment practice.
3. To investigate Portuguese students', teachers' and assessors' views about the nature and problems of art and design examinations in Portugal and their views on possible means of improvement.
4. To compare the policy and practice of art and design examinations in England and Portugal and determine their relative strengths and weaknesses.
5. To develop a new conceptual framework for external art assessment at pre-university level.
6. To develop and trial an experimental assessment instrument and assessment procedures.

0.5. Research questions

Taking into account the problem statement and research aims, the following broad research questions were formulated:

1. To what extent do English and Portuguese art and design external assessment
systems at age 17+ have the necessary validity, reliability and practicability to
assess art students’ achievement fairly and accurately?
2. What is the impact of the English and Portuguese art and design examinations at
age 17+, on teaching and learning?
3. How can the validity and reliability of Portuguese art and design external
assessments at pre-university level be improved?
4. What kind of conceptual framework would attain more valid and reliable art and
design external assessment instruments and procedures at pre-university level in
Portugal?
5. How might the conceptual framework for such assessment be applicable in more
general international contexts of art external assessment at pre-university level?

In order to compare and contrast the two systems of external assessment, the following more specific questions were identified:

1. What types and models of art assessment instruments are currently in use
in England and Portugal at age 17+?
2. What are the similarities and differences between them?
3. What concepts and theories underpin art and design external assessment
in Portugal and England?
4. How are assessors and examiners trained in Portugal and England?
5. How is students' examination work judged in Portugal and England?
What criteria are applied? How is agreement reached about defining
criteria?

0.6. Outline of chapters

The structure of the thesis is as follows:

Chapter 1: presents a literature review of assessment issues and broad problems related to art examinations at pre-university level. Key terms are defined, and key problems associated with assessment, and with assessment within art, are identified and discussed.

Chapter 2: consists largely of detailed consideration of particular qualities of examinations, including reliability, validity, impact and practicality. The concept of reliability and different types of validation are described, and specific sources of invalidity are identified. The chapter ends with a proposal for an evaluative framework addressing a list of questions concerning potential sources of invalidity, unreliability and negative impact in relation to art and design examinations.

Chapter 3: describes the design of the research. The first part concerns the choice of a combined qualitative-quantitative method and the outcomes of the two research stages. The second part describes the data collection instruments, data sources and methods of data analysis.

Chapter 4: the first part is an overview of the history of art and design examinations in Portugal and considers the current examinations in terms of their validity, reliability, impact and practicality. The second part reports the findings of the analysis of the current assessment model and of users' perceptions of what might be an 'ideal' art and design examination, in order to present a list of key features that might improve the current model.

Chapter 5: reports the findings of an analysis of the strengths and weaknesses of the English art and design examinations at 17+ level, through analysis of documents relating to their evolution, underlying concepts, and instruments and procedures of assessment, and through analysis of the researcher's observations and interview data with key stakeholders.

Chapter 6: (1) proposes a possible new framework for art and design examinations together with a rationale; (2) explains the content framework for the experimental art and design examinations; and (3) reports on and evaluates a pilot experiment (Trial 1) at School Alves Martins.

Chapter 7: reports on and evaluates the larger-scale trial of the new instrument and external assessment procedures in five schools in Portugal.

Chapter 8: in the final chapter, findings from all the strands of the research are synthesised to answer the research questions, and conclusions are drawn about possible ways to improve the Portuguese art assessment system. Recommendations are made to Portuguese art assessment stakeholders, and international implications for further research are identified.

Chapter 1

Assessment in Art Education

This chapter reports on a literature review of assessment issues and broad problems related to art examinations at pre-university level. Assessment in art and design is often seen as especially problematic because of the nature of the work and knowledge to be judged, and also because of the difficulty of obtaining a consensual view of criteria and of how to interpret them. Key terms are defined and significant issues related to assessment, and to assessment within art education, are identified and discussed.

1.1. Assessment

Assessment in education is the process of gathering, interpreting, recording and using information about pupils' responses to an educational task. At one end of a dimension of formality, the task may be normal classroom work and the process of gathering information would be the teacher reading a pupil's work or listening to what he or she has to say. At the other end of the dimension of formality, the task may be a written, timed examination, which is read and marked according to certain rules and regulations. Thus assessment encompasses responses to regular work as well to specially devised tasks (Harlen et al., 1994, p. 273).

Assessment might be considered an integral aspect of the teaching and learning cycle and can include a range of methods for monitoring and evaluating student performance and attainment, from formal testing and examinations, through performance assessments (including practical and oral presentations) and teacher or classroom-based assessment, to portfolios (Klenowski, 2002, p. 42).

The word 'assess' is derived from the Latin ad + sedere, meaning 'to sit down together'. Wiggins (1993) points out that the etymology of the word alerts us to the clinical aspect, as when applied to psychological assessment, that it is a 'client-centred' act:

…in an assessment, one 'sits with' the learner. It is something we do with and for the student, not something we do to the student; this person who 'sits beside' is one who 'shares another's rank or dignity' and who is 'skilled to advise' on technical points (p. 14).

This implies that assessment is an interactive relationship between at least two persons: the student (or his/her work) and the assessor. MacGregor (1996, p. 29) points out: 'The teacher who undertakes assessment is inevitably committed to making decisions about values'. The assessor is viewed as someone who has expertise and competence and can be trusted as an authority to guide students' learning and establish the value of students' performance, through comments or by grading the performance. This implies a power relationship between the assessor and the person being assessed. Assessment can be constructive when it provides guidance for the student and self-reflection for teachers and schools. However, it can also be reduced merely to an act of quality measurement and classification of students' performance.

Wiggins' (1993) ideas about assessment give an important overview of the phenomenon because he views it as a contractual relationship with consequences for students and teachers. From this viewpoint, ethical and social dimensions have to be taken into account. Assessment, and especially assessment in art and design, cannot be reduced to a simple act of measurement: it is judgement based.

1.1.1. Roles of assessment

Assessment is a broad concept with the capacity to fulfil multiple purposes. It should be closely integrated with the curriculum (Broadfoot, 2000). Its principal functions include diagnosis of the causes of young people's success or failure, motivation for the learner and teacher, and the provision of valid and meaningful accounts of what has been achieved, and it may form part of the evaluation of teaching programmes (Brown, S., 1994, p. 271).

Sutherland (1996) argues that three uses of assessment have dominated its history in Britain so far: as a device for raising standards, as a device for measuring deviation or abnormality, and as a device for securing equitable treatment. Assessment used as a form of quality control is commonly seen as a way to improve educational standards. According to Gipps and Stobart (1997, p. vi), professional assessment and managerial assessment are two different functions that need to be taken into consideration. These are defined as: (1) professional assessment, where assessment primarily helps the teacher in the process of educating the pupil; and (2) managerial assessment, which uses test and assessment results to help manage the education system. Assessment for learning can be equated with the professional purpose: it takes place at any time, but is most useful during a period of teaching when the information produced is being used by the classroom teacher, and perhaps by other teachers in a school, to improve teaching-learning classroom strategies. Assessment for learning may also mean assessment of pupils' understanding and mental processes, rather than assessment of their performance. This is often referred to as 'formative assessment' (Ash et al., 2000).

According to Robinson (1982, p. 82), 'The principal function of assessment in schools is to provide information about pupils' abilities and levels of attainment'. However, assessment has other functions and needs to be viewed in the light of the specific educational purposes it is intended to perform. Eisner (1998) has identified and described five important functions of assessment as follows:

One function of assessment is a kind of educational temperature taking...
A second function of assessment is a gate keeping function...
A third function of assessment is to determine whether course objectives have been attained…
A fourth function of assessment is to provide feedback to teachers on the quality of their professional work. ...
A fifth function of assessment focuses on the quality of the programme that is being provided (p. 139).

Educational theorists recommend that the above functions of assessment should be treated separately (Cardinet, 1993; Rosales, 1992; Gipps and Stobart, 1993). The functions of assessment in providing feedback for the student or teacher, providing motivation for the student or teacher, providing a basis for selection, monitoring national or local standards, and controlling the curriculum may conflict (Lambert & Lines, 2000; Broadfoot, 2000). Although the same assessment instrument can provide data that is useful in different situations, there is a need to view assessment in the context of its utility. Gipps and Stobart (1993) argue:

...different purposes require different models of assessment, and different relationships between pupils and teacher. It may be possible to design one assessment system which measures performance for accountability and selection purposes, whilst at the same time supporting the teaching and learning process, but no one has yet done so (p. 3).

For the purposes of the research reported in this thesis, the main function of assessment is summative and concerns the judgement of students' performance or achievement for selection and certification purposes at age 17 or 18.

1.1.1.1. Feedback and motivation

Robinson (1982) argues that assessment and evaluation have important roles in

education because they can provide feedback for teachers and schools. He states:

Assessments of pupils may be used to decide on the course their work is


to take, or to summarise their achievements. Feedback and
encouragement are key elements in education. Providing these should be
a function of assessment. This should pervade the process of their
education and be as familiar to pupils as their lessons. This implies
something much broader and more integral to the work than periodic
testing and grading (p.84).

Feedback is information that provides performers with direct, usable insights, based on tangible differences between current and hoped-for performance. Few teachers, and even fewer tests, provide what students most need, namely information designed to enable them to self-assess and self-correct accurately, so that assessment, in Wolf's terms (Wolf et al, 1991, p.183), becomes 'an episode of learning'. Wiggins (1993)

argues that many classroom teachers believe that a grade and a short series of

comments constitute feedback. To provide effective feedback there is a need for

user-friendly information on how the student is doing and how, specifically, he or

she might improve his or her performance. Grades and scores are symbols, or

encoded information, and not always easy to understand. Wiggins (1993, p.195) recommends that feedback for students should be qualitative. Assessment is not a question of assigning symbols. Students must understand and share the meaning of assessment in order to improve their performances, so assessment is much more a constructive dialogue than a matter of marking work with numbers or letters.

Looking at assessment as a 'situational interaction' (Wiggins, 1993, p.195), communication between participants is essential, and a system based on codes externally defined by experts may not be effective. Feedback for students and teachers is complex and needs to be conveyed through sophisticated forms of communication (Wiggins, 1993, p.196). Detailed visual, verbal and/or written

information related to the assessment context should be used in order to comment on

students’ achievement. Strategies such as ‘conversation’ may be more appropriate for

assessment procedures than is usually thought:

…we now feel able to suggest ways forward. A pupil’s self-esteem and
confidence need not be at risk in a genuine conversation designed to
support her own reflection and help her make informed and constructive
judgements while at the same time hearing the opinions and advice of her
partner (Ross et al., 1993, p. 164).

According to Wiggins (1993, pp. 198-199) traditional assessment (such as tests)

provides ‘praise or blame, and non-situation specific advice or exhortation’, uses

‘language only experts can decode’ and ‘measures only easy-to-score variables’. This

is a kind of feedback that is ineffective. To obtain ‘authentic’ feedback teachers

should rely on several assessment strategies to assess students’ accomplishment,

specifying the degree of conformance with an exemplar, goal or standard.

Ross et al. (1993, p. 164) argued that grading in the arts 'must be criterion related and not based upon normative, positivistic structures'. As Vygotsky (1972) pointed out, grading should be accompanied by a rich descriptive profile of each student, based upon what they are able to achieve in a supportive environment.

A system of assessment that relies only upon quantitative data may not provide adequate information. To provide feedback and motivation for students and teachers, assessment should be communicated not only through statistical or numerical symbols but also through detailed verbal reports about performance. Furthermore, there is a risk of misuse of data and incorrect interpretation of results if the only information available is revealed through grades and examination or test scores, as, for example, in the English school 'league tables'.

1.1.1.2. Surveillance, control, accountability

Assessment may be used as a mechanism of control and, for governments, it may provide a form of accountability. Boughton (1999) criticises the

pervasive consequences:

The problem is that what both industry and governments want is


predictability, uniformity, and reliability of outcome. A quick fix for the
problems created by cultural change and economic leadership. What has
emerged from the frenzy of national curriculum reforms seen around the
world in the last decade has been uniformity in curriculum 'standards' or
'profiles' for each of the important curriculum subjects. There is nothing
wrong with the idea of standards, indeed it would be foolish to deny
attempts to define high quality benchmarks, but the manner of their
development and expression is critical if they are to be of constructive
use to any field. The problem is that uniformity and predictability are the
key elements of evaluation strategies useful to governments as a
surveillance and control mechanism to ensure schools and teachers
produce the goods (p. 337).

According to Wiggins (1993, p.289) '…accountability exists when the service

provider is obliged to respond to criticism from those the provider serves. The ability

to hold the service provider responsible depends upon a moral-legal-economic

framework in which the client has formal power to demand a response from the

provider, to influence the providing, and to change service providers if the responses

are deemed unacceptable'. In the case of education, the client may be a student,

parents, a community, employers, the government or a nation. The dictionary

definition of accountability, which includes a formal, legal element, makes it clearer.

The Oxford English Dictionary defines accountable as 'Liable to be called to

account, responsible, amenable'. Accountability is a moral (and sometimes, by

extension, contractual) obligation to be responsive to those with whom one has a

formal relationship. And as the word's roots in European legal history suggest,

accountability presumes moral equality: no one is above the law. Wiggins (1993)

argues that teachers and administrators are obliged to answer questions about how

their students are doing, irrespective of what they choose to report (p. 257).

However, students and teachers are ill served by having complex performances reduced to a single, arbitrary number (p. 260). Furthermore,

The absence of credible tests makes summary judgements about school


performance almost impossible. And whatever tests we use, if union
contracts and custom prevent teachers from being 'traded' or acting as
'free agents' it is far less likely that schools can be changed. And if they
cannot be changed, they are not accountable (Wiggins 1993, p.261).

Accountability may also be framed as 'quality control'. Wiggins (1993) states that 'if the schools set clear, public targets; if the parents have a clear and present voice; if the students, former students, and institutional customers have a voice then we will have accountability that will improve schools' (p.277). Accountability is a political issue, and Hermans (1996) makes the point that:

As art educators, we should not only question the educational purposes of


assessment in the arts, it is also vital to consider some of the political
issues relating to the position of the arts in education, the quality of art
education and the use of data and results. The results will provide a
partial and distorted view of the educational reality, if test scores are used
to paint a picture of the quality of the curriculum, art teaching or art
education in general. Many large-scale assessments serve only one
purpose: ranking students. Improving education might be an important
effect but it is not the prime goal (p.106).

1.1.2. Forms of assessment: formative/summative

Assessment may take many forms. Informally, teachers are assessing pupils all the

time, as indeed pupils are assessing them. For Ash et al. (2000):

Assessment is a general term embracing all methods customarily used to


appraise the performance of an individual or a group. It may refer to a

broad appraisal including many sources of evidence and aspects of
pupils' knowledge, understanding, skills and attitudes, or a particular
occasion or instrument. An assessment instrument may be any method or
procedure, formal or informal, for providing information about pupils'
learning: individual discussion, homework, project outcomes, mock
examination work, group critiques (p.135).

1.1.2.1. Formative assessment

Assessment is formative ‘when the evidence is used to adapt the teaching work to

meet learning needs’ (Black et al, 2003, p. 2). Formative assessment can occur many

times in every lesson. It can involve several different methods to encourage students

to express what they are thinking and several ways of acting on such evidence (Black

et al, 2003, p. 2). The results may enable a teacher to give remedial help at an early

stage of learning, or change the emphasis of a course if required. They may also help

a student to identify and focus on areas of weakness. Ash et al (2000) point out that

formative assessment is usually continuous throughout the process of a particular

learning activity. ‘It has been recognised by most art and design educators that

formative assessment is particularly constructive’. It is concerned with providing

ongoing feedback during the process of making, rather than assessing and grading an

isolated, finished product. ‘Formative assessment is an integral part in the

development of projects, which support learning; it encourages and guides pupils’

work forward. If pupils do not receive regular feedback on their work, they quickly

lose motivation and become unsure of their own assessment of success or failure’.

1.1.2.2. Summative assessment

Summative assessment can be a culmination of evaluations made over a period of time. Usually summative assessment occurs at the end of a project, scheme of work or course of study and therefore focuses on finished outcomes or, more holistically, on a body of work.

1.1.2.3. Ipsative assessment and self-assessment

Another type of assessment is described as ipsative (Ash et al., 2000): ‘Ipsative

assessment gauges the development of an individual from one moment in time to

another (usually the present). It is concerned with the evaluation of personal

achievement rather than an individual's relationship to national or local norms’. It

often takes the form of pupil self-assessment providing an opportunity for pupils to

appraise themselves in a non-competitive climate. Additionally pupils' self-

assessment provides teachers with insights into pupils' understanding of their own

progress. This can be used for diagnostic assessment.

In theory at least, self-assessment is gaining more and more importance in education,

not only as a form of assessment but also as a form of learning. As Ross et al (1993,

p.164) point out: ‘Pupils must first know what they have done before they are in any

position to judge [and understand] how they have done’. Furthermore, the process of

self-assessment can motivate students. The intent of self-assessment is to teach students to monitor their own learning continuously and to help them reflect on their abilities and performance in relation to specified content and skills, and on their effectiveness as learners, using specific performance criteria, assessment standards and personal goal setting.

1.1.2.4. Diagnostic assessment

Diagnostic assessment approaches pupils' work and behaviour as evidence for the

analysis of their ability in a given field (it is often used to discover learning needs). It

can be used constructively as a vehicle for discussion between teacher and pupil

where both parties consider progress and set targets for future development (Ash et

al., 2000, p. 136). Diagnostic assessment can be a powerful motivating factor for

pupils.

1.1.2.5. Negotiated assessment

It is widely agreed that assessment should be negotiated (Wiggins, 1993; Boughton,

1996; Hannan, 1985). For Ash et al. (2000) negotiated assessment in the form of constructive criticism promotes learning and a degree of pupil ownership in the assessment process. Two main aspects of negotiated assessment can be identified: (1) in the relationship between student and teacher/assessor, and (2) in the relationship between the teacher and the system.

Students and teachers can work on formative assessment together by sharing criteria,

discussing levels of achievement and looking for strategies to improve learning

(Shoamy, 2000). 'Assessment done properly should begin with conversations about

performance’ (Wiggins, 1993, p.13) and this only happens by transparent means of

communication; in other words, it depends on the clarity of instructions, directives,

format, criteria and procedures for marking.

Teacher ownership of assessment is also important (Steers, 1994, p.289). Negotiation

occurs when all the users of the system understand and agree with the applied

procedures. In that sense, teachers should not be marginalised from the development

and application of external assessment. Debate and consensus should form part of the

process of standard setting in the assessment of student art products (Boughton,

1997). Assessment procedures in the arts should involve co-operation:

Judgements in art are fundamentally holistic, and it is the total patterning


of a work, which exhibits criteria appropriate for its evaluation ...
Assessment in art, at least in the context of its production, must
encourage co-operation and the active participation between teacher and
taught in negotiations about meaning and quality of artwork in exemplary
ways (Heyfron, 1986, p. 203).

1.1.2.6. External assessment

This research focused on external assessment, which is often associated with external

examinations. Assessment is external when experts working for an external

organisation, e.g. an awarding body or the Ministry of Education, develop the

instrument of assessment and it is applied in a large number of schools in a country

or a region. In external assessments, examiners appointed by the organisation that designs the instrument mark the work submitted by students. External

assessment in the form of national examinations is not always synonymous with

summative forms of assessment. For example, English art and design examinations

such as the GCSE taken at age 16+ can be considered both formative and summative

in nature (Steers, 1996).

1.1.3. Assessment as a social product

From its beginnings in the universities of the eighteenth century and the school

systems of the nineteenth century, educational assessment has developed rapidly to

become the unquestioned arbitrator of value, whether of pupils' achievements,

institutional quality or national educational competitiveness (Broadfoot, 2000, p. ix).

Educational assessment can be seen as the representation of the desire to discipline

an irrational social world in order to establish rationality and efficiency. According to

Broadfoot (2000):

…the engine that drove [assessment’s] rapid development was the


aspiration that merit and competence should define access to power and
privilege; that investment in education could be tailored to identified
potential; that value for money could be convincingly demonstrated. All
these are worthy aspirations, and they remain the dominant agenda of
examinations and test agencies around the world who even today
continue what is often a heroic struggle to provide equitable and
defensible accreditation and selection mechanisms, to hold back the tide
of corruption and nepotism that threatens to engulf the whole enterprise
(p. x).

Assessment and examinations are often viewed as an equitable method of selecting

individuals for preferment. However, this is not always the case since education and

assessment systems tend to favour specific social classes, genders and races.

Assessment systems do not necessarily introduce more equality and fairness into the system; they may offer less, with more streaming and stratification by family income, wealth and race (Hilliard, 1991; Pullin, 1994; Madaus, 1994; Darling-Hammond, 1994; Miller, 1995).

1.1.4. External assessment and examinations

Examinations are a social construct rather than a neutral technological instrument

and may function to maintain social stability. Foucault (1977) argues that

examinations contribute to the accomplishment of social control, particularly with

respect to understanding and acknowledgement of authority and self-discipline. For

Atkinson (2002) assessment of art practice invokes processes of surveillance,

normalisation, discipline and regulation of students’ artwork. With regard to social

control Torrance (2000) argues:

Educational assessment can no longer claim to be naive about such
matters or uninterested in them. Future theory and practice must be more
self-conscious about these implications of the process; in particular,
research and development in assessment must take more seriously the
ethical issues of what counts as acting in the 'best interests' of the
candidate, and take more trouble to ascertain candidates' views of the
process and its outcomes (p. 178).

Foucault (1977) documents the move from the direct, explicit, feudal exercise of

power through physical coercion, to the modern social practices of self-control and

professional management whereby discourses of care, training, rehabilitation and

treatment construe and construct discipline/individual behaviour in relation to

normative expectations. The examination is the exemplary manifestation of

normative disciplinary power:

Power is not possessed as a thing, or transferred as a property; it


functions like a piece of machinery... it is the apparatus as a whole that
produces 'power' and distributes individuals in this permanent and
continuous field... this network 'holds' the whole together and traverses it
in its entirety with effects of power that derive from one another:
supervisors perpetually supervised (Foucault, 1977, p. 176).

Foucault's argument is that the power of examinations derives from the use of

comparison and differentiation tools:

The art of punishing, in the regime of disciplinary power, is aimed


neither at expiation, nor even precisely at repression... it refers
individuals to a whole that is at once a field of comparison, a space of
differentiation... the constraint of a conformity that must be achieved
(pp. 182-183).

For Foucault the school becomes a sort of apparatus of uninterrupted examination:

... [examinations] become increasingly a perpetual comparison of each


and all that made it possible both to measure and to judge... a constantly
repeated ritual of power... it is the examination which, by combining
hierarchical surveillance and normalising judgement, assumes the great
disciplinary functions of distribution and classification (pp. 186-192).

1.2. Assessment and the arts

According to Barrett (1990), assessment refers to the judgement of a process or

product, in terms of a spectrum of explicitly stated criteria. Specifically, the degree

of successful achievement is assessed by these criteria. In other words studio art

assessment is the appreciation of the qualities of students' achievement through the

judgement of the qualities of the work (performance, product or process) submitted to the judges (teachers, assessors or examiners) by the student in response to an assessment task (examination question papers, for example). The judgement is based on a list of known and previously stated criteria and is expressed by qualitative and/or

quantitative means. In the case of studio art examinations, the judgement is

dependent on appreciation and interpretation of the qualities that are anticipated

(attainment targets or descriptors of achievement) by the judges. Usually assessors

express their judgements using scores or marks. Using mark schemes they attribute a

grade to a student's examination work. The grade symbolises a level of attainment in

terms of the variable balance of knowledge, skills and understanding about art

revealed in the student's examination work.

1.2.1. Why assess the arts?

Armstrong (1994) states that arguments for assessing art emanate from three arenas:

(a) professional education, (b) the government, and (c) the local community.

Professional education maintains that a sound education links instruction and

feedback about the effect of that instruction. According to Armstrong (1994)

governments maintain that a common, general education is in keeping with the

democratic tradition, and evidence of progress toward that end is expected.

Assessment and especially examinations in the arts subjects may also be viewed as a

form of legitimisation of their role in the curriculum. 'More and more art teachers see

examinations as guaranteeing survival in hard times' (Vickerman, 1986). Ross (1978)

argues:

If we do participate in public examinations we run the risk of allowing


our work to be wrested from its legitimate roots; and if we do not, we
seem to push the arts even farther out on a limb. The more the arts
become exceptions to the rules of schooling the less relevant they are
likely to appear (p. 261).

Armstrong (1994) points out:

Art educators cherish their uniqueness. At the same time, that uniqueness
contributes to a precarious position in which art teachers too frequently
find themselves. Few persons understand and value the uniqueness of art
education as an important part of general education for all students. Art
educators need to demonstrate that the learning unique to art contributes
to the education of all students and that students do produce evidence of
learning in art. Without assessing what students learn in art, art education
will remain in a peripheral value position in formal education.
Assessment provides an objective basis for the claims that arise from art
teachers' hunches and chance, casual observations made daily at the
scene of action in the art classroom (p.9).

1.2.2. Assessment instruments and objects of assessment

According to Cronbach, assessment '…involves the use of a variety of techniques,

has a primary reliance on observations [of performance], and it involves an

integration of [diverse] information in a summary judgement'. Wiggins (1993) argues

that assessment in the arts must be flexible and include several outcomes:

No assessment system worthy of the liberal arts assesses mere knowledge


or skill; it assesses intellectual character, good judgement – the wise use
of know-how in context. We are therefore derelict if we construe
assessment as only the act of finding out whether students learned what
we taught. On the contrary, one could argue that assessment ought to
determine whether students have effectively reconsidered their recently
learned abstract knowledge in light of their experience (p. 38).

In the case of the arts, the objects of assessment often include the products and processes

of art making. Beattie (1996) recommends that assessment components in the arts

should always include products:

Current philosophical theories of postmodernism of art education present


a vision of the art object and its expansive context as inextricably linked.
Art criticism, art history, and aesthetics cannot be understood without the
art object that informs them. Personalised dimensions of these three
disciplines, therefore, require creation of an art object. A holistic
understanding of art cannot be evaluated without the product (p. 48).

But she also states that the process of making is an essential part of learning and

should be considered in assessment: the objects of assessment can be either process

or product. Beattie (1995, p. 54) makes the point that recent research in cognitive

psychology, in educational and teaching theories, and the current educational reform

initiatives provide a strong basis for a paradigm shift from emphasis on assessing

product exclusively to assessing process and product in tandem. Assessing the

process is an important aspect of education and is central to what Resnick and Resnick (1992) call the 'thinking curriculum': a curriculum that emphasises 'higher-order abilities' such as problem solving, reasoning, inference, constructing personal knowledge, and exercising personal judgement.

1.2.2.1. Moving away from tests

Traditional assessment instruments used to judge students' knowledge, skills and understanding about art are often based on tests requiring capacities of recall and

technical ability. Tests might not assess thinking and critical skills validly. In Eisner's

(1979) words, such methods provide a biased, even distorted, picture of the reality

that we are attempting to understand. Performance-based tasks tend to be highly

context-specific and do not easily allow direct testing (Beattie, 1996, p.53). Steers

(1994) points out:

A valid assessment instrument will assess, in an appropriate manner, the


specific curriculum/syllabus to which it relates. So for example, a course,
which principally consists of practical studio-based art and design
activity, cannot be adequately or validly assessed by means of a multiple
choice, pencil and paper test (p.290).

Gipps & Stobart (1997) also refer to the need for appropriate use of assessment

instruments according to the purposes for which they are designed. They argue:

Tests are useful instruments of assessment, but, even well-constructed


tests are not valid if the results are used inappropriately: to use a maths
test to select students for Fine Art courses or a music test to select
engineers, even if the tests are technically superb, provide extreme
examples of how it is the use of test results which determines validity
(p.42).

Eisner (1998) makes the point that:

The multiple-choice test, true and false test, was considered reliable and
objective. Though, single correct answers and single correct solutions are
not the norm of intellectual life, nor are they the norm of daily life. The
problems that people confront in their intellectual endeavours, like the
problems they confront in ordinary living, are replete with reasonable
alternative solutions (p.144).

In the body of research that has been carried out in the assessment field there is a

move away from tests (Amabile, 1983; Gardner and Grunbaum, 1986). Tests have

been criticised for encouraging teachers to set tasks that promote de-contextualised

rote learning and that narrow the curriculum to basic skills with low cognitive

demands (Kellaghan and Madaus, 1991; Darling-Hammond, 1994; Herman et al,

1997). According to Wiggins (1993, pp. 54-67) the use of 'secret tests' is a vestige of

the medieval past, a legacy of tests used as mere ‘gate keeping’ or as

punishment/reward systems, not as empowering and educational experiences

designed for displaying all that a student knows and understands.

Wood (1987) and Gibbs (1994) argue that there has already been a paradigm shift in

assessment from an essentially psychometric model (measuring individual

differences) to a more educational one (involving description and feedback to aid

learning). The focus of assessment has shifted from accumulated knowledge to students' behaviours and capabilities and to the meaningful integration of learning of all kinds. The

proliferation of service industries and the changing character of work have created

demands for transferable skills such as those of communication, information

retrieval, problem solving, critical analysis, self-monitoring and self-assessment. As

a result a fast-growing interest can be seen in more formative, holistic,

contextualised forms of assessment, often described as 'authentic' or 'performance'

assessment. In vocational training these are often referred to as 'competence-based’

(Wolf, 1995). It remains the case, however, that traditional forms of assessment are

not easily replaced, embedded as they are in complex histories, cultures and power

relations of societies. Furthermore, as Broadfoot (1998, p. 473) argues: '…these

traditional forms with their emphasis on scientific rationality are so pervasive in

modern societies that they blind us to the potential for alternative forms of

assessment'.

For Torrance (2000) the way forward requires assessment practices that accommodate divergent experiences, pupil intentions, wider dialogue and

curriculum knowledge that can be local. He states that what justifies and legitimates

educational assessment is a meta-narrative of social and economic progress, giving

rise to a method for identifying individuals worthy of accreditation and selection

within a meritocratic system. However, in the context of an increasingly

multicultural society with global forms of communication, it is difficult to state that

one kind of selection of knowledge is more relevant than another. Torrance also raises doubts about the possibility of obtaining a 'truer' score, asking whether '…a true score can ever be produced for and adhere to an individual?' For Torrance

(2000, p. 179) knowledge selections must be locally contingent, and assessment

results must be a function of the interplay of task, context, individual response and

assessor judgement. 'Thus provisional descriptions of multi-faceted achievements

would appear to be the appropriate goal for a system of educational assessment that

took such a challenge seriously’.

1.2.2.2. New approaches for external assessment

According to Armstrong (1994, p.110) ‘non-traditional methods of assessment

promote the value of art education by revealing a wider spectrum of student

achievements'. Wiggins (1993, pp. 54-67) claims that an authentic assessment system

has to be based on known, clear, public, non-arbitrary standards and criteria. He

considers that assessment of thoughtful ‘mastery’ should ask students to justify their

understanding and craft. Furthermore he states that an authentic education makes

self-assessment central.

Instead of asking students to respond to a test in order to determine, for example, their

art learning, Gardner (1989) has proposed that learning ought to be organised around

meaningful projects carried out over significant periods of time, and that evidence of

student thinking ought to be collected throughout that period, in what he has called a

‘process-folio’. His process-folios include concept sketches and early drafts of ideas,

along with final products. Reflection sheets, interview forms, and journals are used to

help students, teachers, parents, and administrators trace the evolution of particular

ideas as well as general progress over time. Assessment of students' learning is

based more upon the process than the final product. In this way the assessment more

successfully captures the way in which learning occurs within the discipline. The

term, which has been increasingly used in the United States in recent years to

describe this kind of approach, is 'authentic assessment' (Zimmerman, 1992). It is an

expression that has arisen in response to a growing recognition that standardised tests

capture only a small dimension of human intellectual functioning and it reflects the

kinds of changes to assessment practice suggested by Gardner (1996).

The impact of Gardner’s ideas on assessment has been more radical in the American

context, where a strong tradition of standardised tests derived from IQ testing

practice exists. But in American art education, assessment has not been a particular

concern of the recent discipline-based education movement. In that context, Harvard

University’s Project Zero and the Arts Propel (Blaikie, 1994; Gardner, 1996)

encouraged new approaches to performance-based testing where assessment is

central to curriculum planning. However, in Europe, and especially in England, a different tradition of formal assessment exists in art education, where grading judgements are based on the evidence provided by students' coursework, including both process and products (TGAT, 1988).

1.2.2.3. Records of achievement

Torrance (2000) argues that the use of records of achievement as an assessment

instrument has changed the field of assessment in general. He states:

Perhaps more intriguingly, the Record of Achievement movement, which


has ebbed and flowed over the past twenty years or so, has at times
sought to elevate the voice of the pupil over and above simply providing

employers and other third parties with a more comprehensive teacher-
written report on pupils' achievement (see Broadfoot et al, 1988). The
movement has also given significance to achievements outside the
mainstream academic curriculum and indeed, on occasions, to
achievements outside school. In this we can certainly see evidence of
attempts to accentuate the importance of the local and allow the voice of
the pupil to be heard.... (p.185).

Records of achievement have been criticised for potentially exposing pupils to even

more scrutiny than traditional examinations, since every aspect of a pupil's life, in

and out of school, becomes potential material for inclusion (Hargreaves, 1989,

quoted in Torrance, 2000, p.185). Despite such potential disadvantages, Torrance

(2000, p.185) concludes: 'I am minded to suggest that the Record of Achievement

movement is long overdue, as long as due attention is paid to identifying a wide

range of accomplishments, as witnessed by a wide range of adults in and out of

school, and is accompanied by a much more self-conscious commitment to

privileging the pupil'.

1.2.2.4. Portfolio

Closely related to records of achievement are portfolio tasks. According to

Klenowski (2002, p.26) a portfolio is a collection of work that can include a diverse

record of an individual’s achievements, such as results from authentic tasks,

performance assessments, conventional tests or work samples. A portfolio documents

achievements over an extended period. The development of the portfolio involves

documentation of achievements, self-evaluations, process artefacts and analyses of

learning experiences, strategies and dispositions. Generally, the individual identifies

work from an accumulated collection to illustrate achievement and to demonstrate

learning for a particular purpose, such as certification, summative or formative

assessment. Careful critical self-evaluation is an integral process and involves

judging the quality of one’s performance and the learning strategies involved. The

individual’s understanding of what constitutes quality in a particular context and the

learning process involved is facilitated by discussion and reflection with peers,

teachers, or others during substantive conversations, exhibition or presentation of

learning. Using portfolios for the assessment of students may present several advantages.

As Zimmerman (1992, p.17) points out, through portfolios students can be observed

taking risks, solving problems creatively, and learning to judge their own

performance and that of others.

According to Eisner (1998), the tasks used to assess students should reveal how

students go about solving a problem, not only the solutions they formulate. He points

out:

The new assessment practices will need to provide tasks that resemble in
significant ways the challenges that people encounter in the course of
ordinary living. This will require an entirely different frame of reference
for the construction of assessment tasks than the frame we now employ
… the challenge to assessment is to somehow create tasks that give
students opportunities to display their understanding of the vital and
connected features of the ideas, concepts, and images they have explored.

Portfolios and records of achievement seem to present advantages for art assessment

because they can be used as assessment instruments providing different sources of

information through several tasks. Portfolios include process and product, together with evidence of self-evaluation, substantive conversation, and reflective thinking and practice.

The strength of using portfolios derives from students’ ownership of the learning

process. This contrasts with other forms of assessment, where the teacher or examiner is in full charge of the process. The student in portfolio assessment is actively engaged

in thinking about his or her own work, the learning involved and the progress he or

she is making (Klenowski, 2002, p. 110). Use of portfolios encourages students to be

active creators rather than passive receivers of knowledge, giving them opportunities

to examine and analyse their ideas, beliefs, and values in collaboration with others.

Portfolio use supports this process and helps to develop increased awareness of their

learning (Klenowski, 2002, p. 111). When students manage their portfolios they are

also developing important organising skills which serve to extend their sense of

responsibility and ownership of their work. Students are challenged to be creative

and inventive as they strive to develop a portfolio of work that captures their unique,

personal achievements.

However, some difficulties with implementing portfolio-based assessment have been

raised: extended tasks that encourage thinking and reasoning activities take time and

are open to different types of valid response. Klenowski (2002) points out some

obvious practical problems that need to be taken into account such as the in-service

training of teachers, raters or assessors; and the amount of assistance teachers

provide for students. Problems associated with portfolio implementation include the

significant resources required, ranging from the in-service training of teachers to

mark or grade the portfolios, support for students to understand the portfolio process

and increased workload for teachers and students.

1.2.3. Standards, criteria, attainment targets

Reliable assessment can be improved and facilitated by written criteria and defined

standards or attainment targets (Steers, 1996). Curriculum writers and examiners

frequently employ criteria to express the qualities valued in student artwork and to

guide the judgement of arts educators. The simple expression of the criteria used for

assessment in the arts does not necessarily provide an indication of standards.

Criteria express the qualities we might value in an object or performance, and

standards express the degree to which they exist. A criterion for judgement is not

quite the same thing as the specification of an achievement standard or attainment

target. Judgement of a work against pre-specified criteria does not necessarily mean

a good standard has been achieved, even if all the criteria have been evidenced in the

work.

Boughton (1999) argues that different cultures emphasise different criteria. Even

within a single cultural tradition, the criteria used for judgement may demand

different emphases according to the genre of the work, especially for perceived

‘originality’. For example, contemporary work using new technologies and recycled

imagery may raise different issues for judgement than work undertaken within

traditional styles and using older media. Art works and art students' performances are

not predictable, so comparisons of student outcomes with pre-ordained 'standards' or

'profiles' can be inadequate, not only because words do not do justice to visual

phenomena, but also because it is inappropriate to prescribe the precise qualities of a

potentially uncertain outcome.

Standards imply the existence of uniform exemplar answers. However in art and

design education such 'answers' are seldom the aim; rather, diversity of response

and outcome is sought. Boughton (1997) considers that: ' … the judgement task is

far more complex because the products being judged are expected to reflect a degree

of original thought, and some sense of the idiosyncratic [personal] characteristic of

their creators' (p.203). According to Boughton (1999) it is not possible, nor would it

be appropriate in a postmodern context, to describe a priori an ideal performance or

icon against which students' work is matched to determine a level of achievement.

He points out: '…what is much more appropriate is the use of criteria which express

conceptions of qualities thought to be desirable in the work' (p.343).

However, assessors cannot exercise judgement or use any assessment tools unless

they are clear about the criteria and standards to be used when making their

judgements. The use of criteria and holistic judgements, rather than standards

frameworks, may provide the degree of flexibility necessary to accommodate

variations of outcome in the students' work. Criteria provide guidelines for making

judgements. Any single criterion is able to indicate desirable qualities which may or

may not be relevant to a given work, or body of works. According to Boughton

(1999) criteria singly, or in combination, are constitutive elements, which serve to

provide an array of windows through which a judge may look. They do not provide a

definitive assessment matrix. In Boughton's view:

These windows may sharpen a judge's focus, but it still remains for the
judge to value the qualities of the work, to understand its genre and
context in which it was produced. An important element in the
construction of assessment policy in studio art that is most likely to be
responsive to variation of genre and cultural values is the employment of
criteria rather than standards for the guidance of examiners and a single,
holistic judgement following attention directed by the criteria (p.343).

The critical element in the definition of standards is the establishment of a (socially)

'recognised measure' which comes into being through some kind of imposed act of

authority, by accepted custom, or by simple agreement. Assessment is a crucial part

of arts education, particularly if educational managers and the community are to be

convinced that arts educators are able to reliably determine the quality of student

achievements. While national curricula focus on standards and prescriptions of

predictable content outcomes, the real issue is the problem of cultural relevance and

validity in assessment.

Eisner (1985) argues that criteria are essential for assessment:

In visual arts education, and particularly in studio work, the judgement


task is complex since the work being judged is normally expected to
show originality of thought, and convey some sense of the idiosyncratic
nature of the maker. It is not possible, nor would be appropriate to pre-
define exactly the ideal performance, or icon, against which students'
work is matched to determine level of achievement. It is much more
appropriate to use criteria which express conceptions of the qualities
thought to be desirable in the work.

However, it must be borne in mind that the expression of a criterion in words can be

reductionist. Often the most important qualities in art students' achievement are tacit

or elusive in nature, and they are very difficult to express in words. The nature of

'aesthetic response' is one of these. One temptation is to write criteria that can be

easily interpreted, rather than to attempt to write those that are more appropriate, but

are difficult to define, or are controversial.

For Boughton (1996) the problem with the use of criteria is that they express inexact

human values and are subject to various interpretations by judges who may hold

different views about the significance, or precise meaning of any given criterion.

The problem with the notion of art is that it too is a social construction
subject to various interpretations in different social, educational,
historical contexts. It is no wonder that education bureaucrats who value secure and predictable worlds, who lust after precision and reliability, become anxious about the validity and reliability of qualitative
judgements. In the age of educational reform, characterised by an
obsession with standards, reporting frameworks, and accountability
measures, the value of qualitative assessment information becomes more

and more difficult to justify to those who are preoccupied with statistics
and management strategies. At the same time as the industrialists and
politicians are demanding from educators increased precision in their
judgements, clear standards, and higher levels of accountability, the
impact of postmodern ideas is having the opposite effect in the world of
art (p.72).

According to Walling (2000): 'To be effective without being restrictive, standards

must be broadly, even loosely, cast to allow for diversity, for plural visions of what

art is and how art may be created' (p. 47). Localising standards is a necessary

component of assessment that is ends-driven. Walling points out:

Teachers must have in mind some goals that are measurable and then
design instruction that will move students toward the achievement of
those goals. This is a highly useful way of thinking about instruction, but
this approach also has limitations and, I would argue, should not be used
exclusively. In the visual arts, open-ended exploration and
experimentation – true creativity – are valid ends as well (p. 63).

For Walling a constructivist and postmodern approach to education can be in conflict

with traditional means of assessment, '…educators will need to consider carefully

how a balance can be struck between the need to define clear standards in advance of instruction [previously established] and the need for standards to emerge as a result of

constructing new knowledge and understanding' (p. 64).

In the case of studio art examinations, where practical skills are assessed, the

question of whether studio work should be compared to standards must be

considered. Schönau (1999, p.32) answers: 'Yes and no. No, it is not, as no two art

works are the same. Yes, in practice art teachers do use standards. They compare

what they see with what they have seen before…. Art teachers set their own criteria

and standards'. Schönau points out that

…the procedures used by the teacher or the standards themselves are


almost never public …studio work is judged on a series of criteria, but

how much does each criterion relate to the final score? What is the
maximum level of competency? It is well known practice that teachers
never give a maximum score to their students, because things can always
be more perfect (Schönau, 1999, p.32).

The nature of an artwork creates difficulties for assessment by criteria; an artwork

can be seen as an example of Gestalt – an integrated perceptual structure conceived

as functionally more than the sum of its parts. According to Schönau (1999, p.32) '…

judging studio work using criteria is not fair when it ends with adding up separate

scores. For how do the separate scores relate to the overall score?' However, Schönau argues for the use of national standards or minimum levels of competency when assessing at a national level. The problem is not whether standards, levels of competence, attainment targets or grade descriptors should be used in art and design examinations, but how they might be expressed in order to be commonly understood and applied. In

other words: how do assessors reach consensus?

1.3. Assessing the arts: current problems

1.3.1. The problem of standards and criteria

The difficulty of defining domains of knowledge in art and design education is not

the only problem in external examinations. Another important question is about the

possibility (or impossibility) of defining standards in art and design examinations and

the difficulty of defining unambiguous criteria in verbal terms. In a summative

assessment context different judges’ views present a problem for system managers.

According to MacGregor (1992) the criteria used by teachers to judge studio work show little variation in countries such as the United Kingdom, Canada and the Netherlands. Mostly, teachers employ some variation of three categories of criteria:

competencies to develop and interpret a theme, level of technical expertise, and

competencies to achieve sensitive personal expression through the use of a variety of

techniques and processes. However, it seems that criteria are sometimes confused with standards, attainment targets or marking schemes (Schönau, 1999). Furthermore, standards may not be written down. The marking schemes used in art and design

examinations are not always defined in explicit terms:

The examination committees, or the Inspectorate in France for that


matter, generate their own standards, based on comparisons through the
years. Each single assignment, criterion or question leaves open [sic]
when a student has failed. In the written exams for most subjects the
summation of scores on separate questions leads to a final score. This
score lays somewhere between nil and the maximum score. The cut-off
score – the score that is decisive for pass or fail – in most cases lays
somewhere halfway. This cut-off score one could call a standard, but it is
a mathematical standard that used in this way lacks a transparent basis on
what is considered essential. So the word ' standard' in this case has a
dubious meaning (Schönau, 1999, p.32).
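Schönau's point about the 'mathematical standard' can be restated formally. The notation below is introduced here purely to make the arithmetic of the passage explicit; it appears in none of the examination regulations discussed in this thesis. With n separately marked questions, the final score is a simple summation and the conventional cut-off sits halfway along the total mark range:

    % Final score: the sum of per-question scores s_i, each bounded by
    % that question's maximum mark m_i. The cut-off places the pass mark
    % halfway along the total mark range, irrespective of which questions
    % contributed the marks.
    S = \sum_{i=1}^{n} s_i, \qquad 0 \le s_i \le m_i, \qquad
    \text{pass} \iff S \ge \frac{1}{2} \sum_{i=1}^{n} m_i

Written this way, it is clear that the threshold is defined arithmetically rather than by reference to any particular pattern of demonstrated competence, which is precisely why Schönau describes the meaning of the word 'standard' here as dubious.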

1.3.2. The case for a consensual view of criteria

It is tempting to suggest that where coherent programmes of study are


supported by an agreed-upon range of appropriate attainment targets, it is
probable that appropriate assessment procedures to meet specific
perceived needs, rather than assessment for assessment's sake, will
emerge without the need for undue anxiety (Steers, 1996, p. 189).

The necessity of a well-designed assessment instrument and scoring process

establishing clear and unambiguous criteria and attainment targets increases in the

field of art and design where teachers use different underlying concepts. Beattie

(1997, p. 229) points out that '…establishing criteria, objectives, and standards can

help a state or school determine what it values and seeks to accomplish, an authentic

reform initiative'. But the real problem in art and design examinations is not the type

of criteria but how they are written and applied in order to facilitate validity and

reliability. A consensual interpretation of criteria is a necessary condition in art

examinations and this may be possible through the development of assessment

communities (Boughton, 1997; Freedman, 2003). Boughton (1997, p.207) argues

that there is a need to reach a shared understanding of cultural contexts, reciprocal

understanding of competing ideologies, and agreement about the meaning of criteria

and objectives.

…instead of placing our faith in written standards we should,


instead, employ a process of judgement, based upon the notion of
community [judges/ examiners] as arbiter of quality, to determine
standards in studio art education. Such a procedure is consistent with
arts practice in the broader community and responsive to the
dynamic nature of the discipline (Boughton, 1997, p. 209).

In a study of teachers’ assessments of primary children’s classroom work in the

creative arts, carried out in 13 primary schools in Leicestershire and Leicester,

Hargreaves et al (1996) conclude that teachers may reach agreement about the

quality of students’ artwork if they achieve a common understanding about criteria

and constructs:

This study, although on a relatively small scale, has some important


implications for future work on arts assessment. First, it has
demonstrated that when teachers are given the opportunity to clarify their
ideas and the ambiguities in the language used to describe children’s
work, they are capable of substantial agreement about the quality of
different pieces of work from different pupils, and apparently make these
assessments in unidimensional evaluative terms. Second, the more
explicitly teachers define the end product of the activity which they set,
the more rigorous they seem to be in assessing the quality of this work
(Hargreaves et al, 1996, p.210).

1.3.3. The problem of teacher in-service training

Another current problem in assessing the arts concerns teacher in-service training.

Eisner recommends a more qualitative orientation to educational evaluation that

focuses on educational connoisseurship and educational criticism. Connoisseurship

is taken as the art of appreciation, the ability to define the quality of an object or

environment (Armstrong, 1994). Eisner (1985) observes that assessors should be

qualified artists and teachers who know how to interpret the visual terms used in the

criteria. He claims that assessors need enough experience and knowledge

(connoisseurship) to recognise creative work in order to make a holistic judgement.

Therefore, by this account, assessors should act like art critics.

Steers (1994, p.296) also considers that art and design education in schools requires

teachers with a high degree of professionalism and a real understanding of their

subject, supported by adequate resources of time and materials. The teachers' role in

the assessment process is essential and without their collaboration and agreement

there can be no reform or innovation. Teacher ownership or commitment to the

assessment depends on the quality of teacher in-service training. According to Steers

(1996): ‘This ownership only can be effective if the system provides an efficient and

continuous teacher training, ensuring connoisseurship and professionalism of

classroom teachers and assessors’. Assessor training is an important aspect of

connoisseurship, but not all systems provide such in-service training (Schönau, 1999,

p.33).

Processes of teacher in-service training and strategies for reaching agreement about

criteria and standards, such as moderation and standardisation, are further discussed in

the next chapter.

Summary
In this chapter, general and specific art and design assessment literature was located

and discussed. Several key roles for assessment were identified: as a way to provide feedback and motivation for teachers and students; as a mechanism of control; and as a

device for students' selection and certification. Different functions of assessment

including formative and summative were described. Assessment instruments such as

tests and alternative forms of assessment, for example, records of achievement and

portfolios were briefly analysed in terms of their advantages for art assessment and it

was evident that tests are not normally the best instrument for assessment in the arts.

Specific assessment problems of the arts were described and the following issues are

considered crucial for this study:

• Assessment instruments in the arts should allow students to reveal a wide

spectrum of achievements, including evidence of process and products.

'Alternative' forms of assessment, such as portfolios and records of achievement, present advantages for art assessment.

• The nature of art raises difficulties in defining a shared language. There are

no single 'correct answers' and it is difficult to define criteria, standards and attainment targets precisely. However, there is a need to accommodate a

wide range of often unanticipated outcomes.

• Developing a consensual view of criteria and standards provides a possible

way to overcome such difficulties. The creation of a community of judges,

through teacher in-service training, might provide a way to improve the

reliability of assessment.

Chapter 2

Reliability, Validity, Impact, Practicality

This chapter analyses some particular qualities of examinations identified as

essential for the design and evaluation of assessment, namely: reliability,

validity, impact and practicality. The chapter ends with a conceptual

framework for addressing a list of questions concerning potential sources of

invalidity, unreliability and negative impact in relation to art and design

examinations.

2.1. Reliability

An instrument for assessing art learning by tests or other means must demonstrate

reliability or consistency. Reliable assessment should enable students with similar

abilities in different geographical regions to obtain the same result. In test literature

the traditional definitions of reliability focus on the consistency of results: a test is

reliable if two identical students obtain the same results (Cardinet, 1993; Ribeiro,

1997; Valadares & Graça, 1998). However in reality it is unlikely that there will be

identical candidates (on two or more occasions) and in order to verify consistency

between test results some proxy measures are used such as: (a) test-retest reliability,

which is based on gaining the same marks if the test is repeated; (b) mark-remark

reliability which looks at the agreement between markers; and (c) parallel

forms/split-half reliability in which similar tests give similar marks (Gipps & Stobart,

1997). Reliability is also concerned with evaluation of the marking or rating process

in terms of the consistency of assessors' judgements: inter-rater reliability (an

estimate of reliability based on the degree to which different assessors agree in their

assessment of candidates’ performance) and intra-rater reliability (the degree of

agreement between two assessments of the same sample of performance made at

different times by the same assessor).
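These proxy measures can be made concrete with a small computation. The following sketch (in Python, with hypothetical marks on a 0-20 scale) estimates inter-rater reliability as the correlation between two assessors' marks for the same ten portfolios, together with their exact agreement rate; intra-rater reliability would be computed in the same way from one assessor's marks on two occasions. The data and scale are illustrative assumptions only.

def pearson(xs, ys):
    # Pearson correlation between two equal-length lists of marks.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical marks awarded by two assessors to the same ten portfolios.
assessor_a = [14, 12, 17, 9, 15, 11, 18, 13, 10, 16]
assessor_b = [13, 12, 16, 10, 15, 12, 17, 14, 9, 16]

r = pearson(assessor_a, assessor_b)
agreement = sum(a == b for a, b in zip(assessor_a, assessor_b)) / len(assessor_a)
print(f"inter-rater correlation: {r:.2f}")   # values near 1.0 indicate consistency
print(f"exact agreement rate: {agreement:.0%}")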

Reliability as understood by test developers may not be as useful for teachers, and

especially for art teachers who usually assess variable outcomes of coursework or

portfolios. A broader conception of reliability might be useful. For example, William

(1992) argues that reliability is only a small part of the dependability of an

assessment instrument in providing accurate information about the performance of

individual students. He states that it is more appropriate to work with the concepts of

disclosure and fidelity. The disclosure of an assessment is the extent to which it

produces evidence of attainment from an individual in the area being assessed. In

general,

there should be reluctance to conclude that failure to elicit evidence of attainment

means that the student has not attained: very often, all it means is that the question

has not been asked in the right way for that individual, and the verdict cannot be

proven. The fidelity of an assessment is the extent to which evidence of attainment

that has been disclosed is observed, correctly interpreted and accurately recorded.

The concept of fidelity will be developed later through the discussion of validity.

2.1.1. Problems of reliability in art examinations

The major problems associated with reliability in art examinations can be attributed

to: (1) at the a priori stage, the construction of the examination instrument and, (2) at the a posteriori stage, the implementation of the instrument, where aspects such as teachers', assessors' and examiners' training, and the judgement of students' work through interpretation of criteria, all need to be taken into account.

2.1.1.1. Instrument

In art and design examinations tests are seldom the sole instrument. Several

instruments can be used and complement each other. This may present advantages

for the reliability of results because it provides several assessments of the same trait.

Eisner (1991) advocates techniques such as systematic sampling in assessment

procedures as a process to obtain consistency through corroboration: 'In seeking

structural corroboration we look for recurrent behaviours or actions... that inspire

confidence that the events interpreted and appraised are not aberrant or exceptional,

but rather characteristic of the situation'. Systematic sampling allows a teacher to

regularly observe those behaviours that occur consistently or those that suggest

change with the intervention of teacher-guided experience. The teacher can

repeatedly assess the same thinking process (Armstrong, 1994). Gipps & Stobart

(1997) also refer to systematic sampling as a way to improve assessment reliability:

Curiously, another form of reliability – repeated measures – is often


neglected. This is unfortunate because, in discussing the reliability of
coursework and of teacher assessment, the fact that performance is
assessed on a number of occasions makes the measurement in this respect
more reliable than a 'one-off' test. Indeed, the perception of examinations
as reliable and coursework as unreliable usually has more to do with
opinion than with evidence. It is often the case that reliability of
examinations is over-rated and that of coursework under-rated (p.42).

Here, Gipps & Stobart (1997) raise an important aspect of reliability that is

particularly acute in art and design. In fact coursework or portfolios usually provide

more reliability than a single test because they allow repeated measures of the same

knowledge, skills, understanding and art making over a period of time.
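The gain from repeated measures can be quantified with the Spearman-Brown prophecy formula, a standard result of classical test theory rather than anything specific to the studies cited here. A minimal sketch, assuming a single timed task with reliability 0.55 and portfolios that aggregate several comparable pieces of coursework:

def spearman_brown(r_single, k):
    # Reliability of a composite of k comparable assessments, each with
    # reliability r_single (classical test theory).
    return (k * r_single) / (1 + (k - 1) * r_single)

for k in (1, 2, 4, 6):
    print(f"{k} assessment(s): composite reliability = {spearman_brown(0.55, k):.2f}")
# Prints 0.55, 0.71, 0.83 and 0.88: repeated measures of the same skills
# yield a markedly more dependable composite judgement than a one-off test.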

2.1.1.2. Assessor and examiner training

Sources of unreliability can be found not only in the design of an examination

instrument but also in the use of methods and procedures that are unclear or weakly

defined and which result in inconsistency of the whole process. For example the use

of vague scoring instruments such as performance and grade descriptors, criteria and

standards and the consequential problem of different interpretations of such

instruments to judge students works. Boughton (1999, p. 72) considered that the arts

do not easily reveal their quality or character through measurement. Standardised

measurements of learning in the arts describe very little about highly personal

qualities of performance. Boughton (1997) argued that agreement between judges is

essential for reliability in art assessment. He states:

A more appropriate source of ideas to determine standards in arts


education is to look to the arts in the broader community for guidance.
One important cue is the nature of institutionalised debate about what
counts as good work in the community. Similarly, challenge and debate
among teachers has useful institutional potential in arts assessment
contexts. In other words the community becomes the arbiter of quality.

2.1.1.3. Judgements and moderation

Another important procedure in large-scale qualitative assessment methodology

in the arts is the use of moderation (Askin, 1985; Chalmers, 1981; Steers, 1987).

This procedure is extremely important where reliability of scores is critical.

Moderation is a system of assessment using several judges to assess students' works.

According to Steers (1996, p.185): ‘Moderation is the means by which the marks of

different teachers in different centres are equated with one another and through

which the validity and reliability of assessment are confirmed’. Moderators calibrate

teachers’ marks in relation to an agreed standard. Moderation improves reliability of

assessment results and reduces the variations of the individual assessor's

interpretation of criteria (Schönau, 1996; Steers, 1994; Blaikie, 1996; Heyfron, 1986).

Moderation occurs in assessment of art on a national scale, including in England

(Steers, 1988), Australia (Boughton, 1994) and the Netherlands (Schönau, 1996), in

international assessments that take place for the International Baccalaureate

examination (Chalmers, 1981) and for the Advanced Placement examinations used in

the United States of America to select some students for progression to higher

education (Askin, 1985). This procedure of marking students' examinations requires

expensive assessor training and resources; however, it is important to consider

the advantages of such systems in order to minimise the unreliability of examination

results.
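To illustrate what calibrating teachers' marks against an agreed standard can involve, the sketch below shows one generic form of statistical moderation: a moderator re-marks a sample of a centre's portfolios, and all of the centre's marks are then linearly rescaled so that the sample matches the moderator's mean and spread. This is an illustration with hypothetical marks, not the procedure of any particular awarding body; much moderation in art and design is judgemental rather than statistical.

from statistics import mean, stdev

def moderate(all_marks, teacher_sample, moderator_sample):
    # Rescale a centre's marks so that the moderated sample agrees with
    # the moderator's mean and standard deviation.
    scale = stdev(moderator_sample) / stdev(teacher_sample)
    m_teacher, m_moderator = mean(teacher_sample), mean(moderator_sample)
    return [round(m_moderator + (m - m_teacher) * scale, 1) for m in all_marks]

# Hypothetical data: the moderator is slightly more severe than the teacher.
teacher_sample = [16, 12, 18, 10, 14]
moderator_sample = [15, 11, 17, 10, 13]
centre_marks = [16, 12, 18, 10, 14, 17, 9, 13]
print(moderate(centre_marks, teacher_sample, moderator_sample))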

2.1.1.4. Threats to reliability summarised

To sum up, threats to reliability in art and design are usually a consequence of a

deficient assessment instrument, ineffective assessment procedures and lack of

assessor training. Standardisation meetings for assessors, when carefully planned and

implemented, can improve examination reliability because they serve as assessor

training and help to foster a common understanding of standards. Moderation and

post moderation procedures might also reduce sources of unreliability by

encouraging informed debate about standards.

2.2. Validity

Validity is used in the context of this research as a unitary concept: unless the type of

validity is specified in the text the term validity is used to include all types of

validities (content, face, response, construct, theory-based, criterion-related, etc.).

Validity concerns the appropriateness and meaningfulness of an examination in a

specific educational context and the specific inferences made from examination

results. The validation process is the process of accumulating evidence to support

such inferences. Whether an examination is valid hinges on the question of whether

or not the examination assesses what it is supposed to assess, usually in relation to

what has been taught. Assessment of either process or product requires critical

scrutiny of the soundness of the interpretations that will be made from the derived

score. Questions and issues surrounding the appropriateness, meaningfulness, and

usefulness of inferences from scores are concerns of validity. Validation can be

sought through two main stages: a priori validation, involving scrutiny of the examination paper or examination instructions before they are put into use, and a posteriori validation, involving investigation of the way the examination appears to have worked after the event through both the analysis of scoring data and qualitative investigation.

Only as a form of shorthand is it legitimate to speak of the validity of a


test [or examination]. We need first to look at validity in the design and
development stage where the test is constructed based on an empirically
derived theoretical specification. Next the implementation stage when the
test has been administered, we are able to look at the data generated and
apply statistical analyses to these to tell us the degree to which any
inferences are well founded. Finally we can collect data on events after
the test to shed further light on the well foundedness of those inferences
and value for end users of the information provided (Cronbach, 1990,
p.150).

2.2.1. Some types of validity

Validity is often discussed in test literature. Beattie (1996) makes the point that '…

although validity is currently viewed as a unitary concept, the scope of questions and

issues is broad subsuming all of the historically discrete validity types (e.g., content,

criterion-oriented, construct, face, and the like)'. In assessment and testing literature

several types of validity are defined (Messick, 1971; Cronbach, 1971; Anastasi,

1988). For the purposes of this research seven different types of validity have been

selected, which are discussed in the following sections.

2.2.1.1. Content validation

Content fidelity is concerned with the faithfulness of the task to the


domain or object that it purports to assess. If procedural knowledge is to
be evaluated, then the selected task must cover the dimensions of the
processes adequately. On a broader level, if holistic knowledge of an art
discipline is the object of assessment, then examined processes and
products per se must sufficiently represent the art discipline's methods of
inquiry or content, respectively. A review of the literature in the field and
discussions with peers are means for determining most valuable
knowledge (Beattie, 1996, p. 52).

Content validity concerns the coverage of appropriate and necessary content

(whether the assessment instrument covers the skills necessary for good performance, or all the aspects of the subject taught). Content validation is also about ensuring the authenticity of the tasks or, in other words, whether the assessment instrument is

representative of some domain of artistic competencies used in real life.

Usually, content validation is carried out through qualitative experts' judgments of

examination instruments. Sources of invalidity can be: assessment instruments that

are constructed using a poor sample of artistic competencies or domains; tasks and

procedures that do not enable students to reveal their competencies in an authentic

way; failure to restrict inferences from examinations results to what an examinee can

do in the specified content domain.

2.2.1.2. Face validation

Face validation is also about checking if the assessment instrument measures what it

is supposed to measure, but, as opposed to content validity, which is judged by

experts, face validity is judged by various groups of non-experts who come into

contact with the instrument, taking into account feedback from test takers and other examination stakeholders. Threats to face validity include unfamiliarity of

format and lack of authenticity in the assessment instrument.

2.2.1.3. Response validation

Response validation is the extent to which examinees respond in the manner

expected by the examination developers. It is a process of inquiry about the clarity of

the examination paper, for example, whether or not students understand the

instructions and if the tasks motivate them. Using the designation 'straightforwardness', Beattie (1996) describes aspects of response validity:

Straightforwardness, as it relates to either assessment


of process or product, is important to consider. Every
aspect of the task (format, directives, standards,
scoring criteria) should be transparent enough for
students to understand what is expected of them. Peer
review and pilot testing of tasks help to uncover
problems of transparency.

To minimise sources of response invalidation it is important to evaluate the

assessment instrument by conducting a trial version of it with

a selected sample.

2.2.1.4. Washback validation

Washback validation is concerned with the influence of the examination process

on the teaching and learning situation and checking if the instrument and

underlying model of artistic skills and knowledge are related to the learning goals.

Washback validity is also concerned with the tasks and scoring process, verifying if

they clearly reflect the specified model and domains – in other words, if the tasks

enable examinees to reveal their artistic skills and knowledge in a reasonably

authentic way.

Washback validity can also be referred to as backwash (ALTE, 1998), which is

also related to the impact of an assessment instrument on classroom teaching.

Teachers may be influenced by the knowledge that their students are planning to take

a certain test, and adapt their methodology and the content of lessons accordingly to

reflect the demands of the test. The result may be positive or negative. Sources of

washback invalidity include: basing the assessment instrument on an underlying

model of artistic skills and knowledge which are divorced from learning goals; tasks

and scoring procedures that do not fully and clearly reflect the specified underlying

model and domains of artistic skills and knowledge; tasks and scoring procedures

that do not enable the examinees to actively engage their artistic skills and

knowledge in a reasonable and authentic way; and tasks and scoring procedures that

encourage and reward aspects of performance that draw on irrelevant competencies

(Hasselgren, 1998).

2.2.1.5. Criterion-related validation

Criterion-related validation is an a posteriori evaluation of the examination that can

be checked by two main processes: predictive validity and concurrent validity.

Predictive validity relates to whether the test accurately predicts some future

performance. Concurrent validity is concerned with whether the assessment

instrument correlates with, or gives substantially the same results as, another

assessment instrument for the same skill.
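Both checks reduce to a correlation between paired scores. A minimal sketch, assuming hypothetical paired data were available (statistics.correlation requires Python 3.10 or later); concurrent validity would be estimated in the same way against another contemporaneous assessment of the same skill:

from statistics import correlation  # Python 3.10+

# Hypothetical paired data for ten students: age-18 examination marks and
# the same students' first-year grades in higher education.
exam_marks = [18, 11, 15, 9, 16, 13, 12, 17, 10, 14]
first_year_grades = [17, 12, 14, 10, 15, 12, 13, 16, 11, 13]

r = correlation(exam_marks, first_year_grades)
print(f"predictive validity coefficient: {r:.2f}")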

When designing an assessment instrument it is important to verify whether or not the

test is measuring unrelated abilities. Criterion-related validation is threatened by the

use of external criteria that measure different competencies from the assessment

instrument in question and by failing to look for evidence to ensure that the

assessment instrument is not measuring irrelevant competencies. Messick (1989,

p.10) has argued:

We fail to serve the demands of validity if we try only to correlate


simplistic tests with other tests (even other performance tests). We cannot
use content validity procedures alone, if the aim is to capture the essential
'doing' of the ultimate performance and the most valid, contextual
discriminators for assessing 'doing'.

In describing the typical validation procedures that involve mere correlation between

test scores and criterion scores, Messick (1989, p.10) notes that 'criterion scores are

measures to be evaluated like all measures. They too may be deficient in capturing

the criterion domain of interest’. The solution, he argues, is to evaluate the criterion

measures, as well as the tests, in relation to construct theories of the criterion domain.

2.2.1.6. Validation related to examination bias

Validation related to examination bias or equity of an examination is concerned with

the verification of the assessment instrument, checking whether the tasks and procedures

place one or other group of examinees at an advantage. Sources of bias can include:

cultural background, background knowledge, cognitive characteristics, native

language, ethnicity, age and the gender of the examinees.

A powerful case needs to be made that the task for assessing process or
product is clearly unbiased and fair for all students involved. Aspects of
content and format might favour some group over another. The emphasis
on written or oral descriptions of learning processes in reflective journals
may be unfair to disadvantaged students, minorities, or non-standard
language speakers. A gender bias might also occur. Females tend to be
more reflective than males; therefore, reflecting on their processes might
be easier for them. An example of a contextual bias that might occur with
respect to the evaluation of an art product stems from different emphases
art educators place on product assessment criteria. In one context, the
teacher may consistently emphasise formal criteria, while in another
context, value is placed on conceptual and affective criteria.
Critical to the assessment of procedural knowledge is the opportunity
afforded all students to learn and practice methods of inquiry
(Beattie, 1995, pp. 52-53).

In order to identify points of bias it is important to ask experts to review the task prior to the assessment and also to analyse the responses after assessment.
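A crude statistical screen can complement such expert review, as sketched below with hypothetical marks: a large standardised gap between two groups on the same task flags it for closer scrutiny, although a gap does not by itself prove bias, since groups may also differ in preparation or opportunity to learn.

from statistics import mean, stdev

# Hypothetical marks on the same task for two groups of examinees.
group_1 = [14, 16, 12, 15, 13, 17, 11, 15]
group_2 = [11, 13, 10, 12, 14, 9, 12, 11]

# Cohen's d: the standardised difference between the two group means.
pooled_sd = ((stdev(group_1) ** 2 + stdev(group_2) ** 2) / 2) ** 0.5
d = (mean(group_1) - mean(group_2)) / pooled_sd
print(f"standardised mean difference (Cohen's d): {d:.2f}")
# A |d| well above 0.5 would merit expert review of the task's content and format.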

2.2.1.7. Construct validation

Construct validity relates to whether the examination is an adequate measure of the

construct – that is, the underlying skill, knowledge or understanding being assessed.

Important to the development of an assessment is a clear and detailed definition of

the construct. The term construct can be viewed as definitions of abilities that permit

us to state specific hypotheses about how these abilities are, or are not, related to

other abilities, and about the relationship between these abilities and observed

behaviour. Two aspects of a construct validation must be considered: (1) a posteriori

empirical and (2) a priori theoretical (Weir, 2004). Anastasi (1988) describes

specific empirical techniques that can contribute to construct validation related to test

validation. For example: correlations with other tests; factor analysis; internal

consistency measures; and convergent and discriminant validation. According to

Anastasi it is only through empirical investigation of the relationship of test scores to

other external data that we can discover what a test measures.
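One of these techniques, the internal consistency coefficient Cronbach's alpha, is easy to illustrate. In the sketch below the rows are candidates and the columns are hypothetical marks on four criteria intended to tap the same underlying construct:

from statistics import variance

scores = [
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
]

k = len(scores[0])  # number of criteria
item_variances = [variance([row[j] for row in scores]) for j in range(k)]
total_variance = variance([sum(row) for row in scores])
alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
print(f"Cronbach's alpha: {alpha:.2f}")  # values near 1 suggest the criteria
                                         # cohere as measures of one construct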

Construct validation can be viewed as the unifying concept of validation integrating

considerations of content, criteria and consequences (Messick, 1992). According to

Messick, two major threats to construct validity are construct under-representation

and construct irrelevancies. The more fully examination developers are able to

describe the construct that the examination instrument is attempting to measure at the a

priori stage, the more meaningful might be the statistical procedures contributing to

construct validation that can subsequently be applied to the results of the

examination. According to Weir (2004) construct validation of an assessment

instrument can be checked using primary or secondary evidence to ensure that

constructs, descriptors of performance and the scoring system are adequate. A trial

version of the assessment instrument is recommended in the a priori stage of

examination design.

Examination developers should posit a theory of the construct of artistic skills,

knowledge and understanding as the basis for designing examination instruments.

Furthermore, a methodology for exploring the validity of examinations is essential

in order to provide tools and empirical evidence to support the instruments and

processes. Construct validity can be threatened by faulty or incomplete description

of the abilities or traits to be assessed and lack of empirical evidence supporting

the creation of descriptors of performance and division of constructs in the

scoring system.

2.2.2. Problems of validity in art examinations

The major problems of validity in art and design examinations are related to: (1) the

design of the assessment instrument, (2) the examination’s underlying conceptual

model and domains of competencies and, (3) a consensual concept of the most

important aspects of art and design.

2.2.2.1. Assessment instrument

In art and design, one single event such as a short period examination seldom

allows the valid assessment of students' artistic skills, knowledge, and

understanding (Burkhart, 1965; Chalmers, 1990; Zimmerman & Zurmuehlen,

1987; Dorn, 1988; Efland, Koroscik & Parsons, 1991). A range of tasks and

students' works are necessary. Beattie (1996) has argued that:

A single process or product is not exhaustive of an art discipline's


methods of inquiry or content, and to infer anything about broad
knowledge in a discipline from one process or one product is
problematic. Neither is a single score on a process a strong indicator of
knowledge about the process. Several measures increase the chance of
determining an accurate score and making an appropriate interpretation
(p. 52).

The choice of the assessment instrument is crucial to attain validity and also

reliability of examinations. In the case of art and design, the complexity of the

subject raises problems about the use of traditional assessment instruments structured

principally on basic learning strategies such as repeating, recalling and ordering.

2.2.2.2. The examination underlying model

Examination developers need to be explicit about the nature of the examination, while acknowledging concepts such as artistic skills, knowledge, understanding and

judgement. For the purpose of this research it might be helpful to have a sharper

definition of whether the intention in art and design examinations is to assess student

performance or student ability. In the English report of the Task Group on

Assessment and Testing (TGAT, 1988) it was stated:

There has been some misunderstanding about the assessment of


'ability'…. We had intended to confine our proposals to the assessment of
'performance' or attainment and were not recommending any attempt to
assess separately the problematic notion underlying 'ability'. If 'ability'
were to be assessed, its meaning would have to be carefully defined; and
the problem with defining it without making it merely a measure of a
particular type of performance is hard to solve (TGAT, 1988, p.2)

Beattie (1996) cited Glaser’s (1991) dimensions of competence. Glaser has

identified four dimensions of developing performance that can be useful concepts for

assessment:

- Coherence of knowledge (i.e., experts are better able to structure their


knowledge in meaningful, interrelated ‘chunks’ [p. 26] rather than
fragmented, isolated bits of information, and can access this knowledge more
readily);
- Principled problem solving (i.e., experts recognize underlying principles and
patterns in a problem);
- Usable knowledge (i.e., experts progress beyond content or declarative
knowledge to procedural to goal-oriented knowledge);
- Self-regulatory skills (i.e., experts monitor their own performances).

Glaser’s description of these four dimensions is particularly relevant to the

establishment of criteria, assessment objectives, and consequently to the design of

appropriate assessment instruments for art and design. The dimensions are

appropriate for the specialist field of art and design where knowledge and

understanding, problem-solving skills and aesthetic appreciation all need to be

evidenced.

When an assessment instrument is designed, a coherent theoretical model of the art and design curriculum should underpin it for the purposes of theory-based validity. The

assessment instrument should cover the artistic skills, knowledge and understanding

necessary for good performance, or all the important aspects of the subject taught

within that model. But it is widely agreed that art and design is a subject where there

are no single or ‘correct’ answers. Boughton (1997) notes that the problem with

judgements of artistic outcomes is that there is no single exemplar or icon against

which to judge a given student's performance. Originality, expressiveness, personal

interpretation and creative solutions, which are usually highlighted in attainment

targets and criteria, are influenced by trends and fashions in art. Moreover in the light

of postmodern theories, the nature of art defies attempts at a tidy definition, and

competing definitions of art may need to be brought to bear in judgements of quality.

Art and design examinations are concerned with the judgement of


performance through the appreciation of students' artistic knowledge,
understanding and skills. 'Two judges viewing the same work may hold
different opinions about its value. A judge with modernist values, for
example, may reject eclecticism as unoriginal, and regard copying as
evidence of technical and imaginative ineptitude; whereas a judge
holding postmodern values may reward eclecticism and copying, because
both are appropriate and consistent with contemporary art practice. At the
cutting edge of artistic exploration these dilemmas will always exist in
judgment about student work. Because they are matters of value, and not
fact, they are difficult to resolve' (Boughton, 1996, p. 83).

However the problem of different visions of art and design can be overcome by

agreement and consensus between the users. The scoring methods should be

constructed around agreed concepts of art and design students’ performance.

The underlying model of skills, knowledge and understanding being assessed should

be carefully defined in advance, seeking the empirical evidence supporting such a

model (Weir, 2004). The agreed aims and objectives of art education are essential

elements to take into account in the development of art and design examinations.

One necessary condition for the success of the examination is a ‘high level of

consensus about the aims of art education’ (Steers, 1996, p. 189), rather than

imposed policies underpinning its constructs and contents.

2.3. Impact

Assessment shapes institutional learning. Different forms of assessment


influence what is taught and how it is taught, what and how students
learn. Through the twentieth century, many educationists have deplored
the effects of public examinations on the quality of learning in English
schools. Systems of assessment have been criticised for putting a
premium on the reproduction of knowledge and passivity of mind at the
expense of critical judgement and creative thinking (Broadfoot, 1996,
p.175).

The impact of an assessment process is concerned with the effect created by that

assessment, both in terms of influence on general educational processes, and in terms

of the individuals who are affected by test results. The impact of an assessment

instrument and procedures includes all sorts of consequential validities and washback.

Examination formats and selections of syllabus content are not value-free; their effects

are both educational and societal. The former refers to the changes as a result of

examinations on curriculum, teaching methods, learning strategies, materials,

assessment practices and knowledge examined, while societal effects are concerned

with the impact of examinations on gate keeping, ideology, ethicality, morality and

fairness.

2.3.1. Effects on society and individuals

Examinations as features of power cause detrimental effects on those users who,

affected by the results, are forced to change their behaviour and comply with the

examination’s demands in order to maximize and gain the benefits associated with

high scores. Examinations are used ‘to gather evidence upon which inferences are

drawn about the unobservable’ (McNamara, 1999), they might be imperfect tools, but

their power and use has strong consequences on society and on individuals.

Examinations are not used solely by educators and students – they can also serve

political agendas. Shoamy’s (2001, p. xxii) comments about tests are also valid in

the examination context:

… policy makers and test developers had different agendas regarding


tests; they each viewed tests through very different eyes. Testers believe
that the main criterion for good testing is that the tests they construct
possess features that provide an indication that they can measure
accurately. Policy makers, on the other hand, view tests as a means of
promoting educational agendas…I observed the many ways that tests
were used in education and society, not only to force teachers to teach
and students to learn but also to impose policies, define knowledge and,
worse, to punish, exclude, gate keeping and perpetuate existing powers.

2.3.2. Effects on students

For the examinees, examinations are threatening rituals detached from reality with

long-term effects (Shoamy, 2001, p.14). The uses of examination results have

detrimental effects for examinees since such uses can create ‘winners and losers,

successes and failures, rejections and acceptances’ (Shoamy, 2001, p. 15).

Examination results are used as indicators for placing students in class levels, for

granting certificates, for determining whether a person will be allowed to continue to

further study, for deciding on a profession, or for obtaining a job.

2.3.3. Effects on teachers and teaching methods

Art and design examinations influence teaching practice. To predict intended and

unintended results there is a need to study the impact of similar previous assessments

on curriculum and instruction. Beattie (1996, p.53) makes the point that:

The art curriculum might be changed to align more closely with the
emphasised object of assessment. Moreover, the methods by which
each is assessed could bring about a distortion of teaching with positive
or negative results. Task format might discourage other viable methods
of teaching concepts and skills.

Similarly Spolsky (1995, p.56) argues that the power of examinations inevitably

narrows the educational process. This is because once teachers have familiarised

themselves with the content of an examination they recognise that:

… it becomes more or less the precise specification of


what knowledge or behaviour will be rewarded (or
will avoid punishment). No reasonable teacher will do
other than focus his or her pupils’ efforts on the
specific items that are to be tested; no bright pupil will
want to spend the time on anything but preparation for
what is to be on the examination. The control of the
instructional process then is transferred from those
most immediately concerned (the teacher and the
pupil) to the examination itself.

The examinations act as a curriculum model for art teachers because of the

importance of results for students and schools. The competence of a teacher may be

judged by his or her students’ examination results. In order to improve students’

results teachers tend to pay a great deal of attention to exemplary work graded as

‘good’ in a previous examination and to allocate more time to the learning prioritised in them. Examinations usually stimulate teachers to produce teaching materials and strategies in accordance with the examination guidelines (published or unpublished). In the case of art examinations, where ‘originality’ is often an assessment criterion, this

effect can be pervasive. Steers claims: ‘When original work is seen to be rewarded

by the examination system it is rapidly imitated by students and teachers alike and by

the time of the next annual round of examinations an innovative approach is reduced

to a cliché’ (Steers, 1994b). On the one hand examinations offer models of ‘good

answers’ but on the other hand such models are contrary to the very nature of art

making and learning. Consequently examinations may introduce orthodoxy to a field

characterized by a plurality of personal viewpoints and innovative answers, causing a

threat to washback validity on teaching and learning.

2.3.4. Effects on schools

Examinations are powerful methods of control and a means for monitoring the

performance of schools. The examination results are the basis for accountability:

schools are compared and differentiated by the ranking of their examination results

in league tables. The impact of an examination is concerned with the extent to which

the examination results are used in the way intended, and are successful in bringing

about the aims of the examination. A misinterpretation of the results threatens the

examination process. Results based solely on quantitative data with no other

specifications are not recommended by Torrance (2000):

When accountability is based only on quantitative data with no regard for


the contexts in which the results were obtained, the credibility of such
results may be questionable. The validity of national scoring results or
inspection reports may be criticised. The results on a national scale may
not represent the reality, since reality is plural. But results in a small scale
may have more accuracy if they are focused in what, in fact, happens in
schools. The codified results of students' achievement as they appear in
league tables are not easily decipherable, except for assessment experts.
And furthermore they do not represent the quality of learning in schools
unless the 'value-added ' is described in such tables. Value-added
measures measure the progress made by individual pupils in an
institution rather than raw outcome scores.
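The value-added idea Torrance refers to can be sketched with a toy computation: each pupil's final mark is compared with the mark predicted from prior attainment by a simple least-squares line, so that progress rather than raw outcome is credited. The data below are hypothetical.

# Hypothetical marks for seven pupils on entry to, and at the end of, a course.
prior = [10, 12, 8, 14, 11, 9, 13]
outcome = [12, 13, 11, 15, 11, 12, 14]

# Fit outcome = intercept + slope * prior by ordinary least squares.
n = len(prior)
mean_p, mean_o = sum(prior) / n, sum(outcome) / n
slope = (sum((p - mean_p) * (o - mean_o) for p, o in zip(prior, outcome))
         / sum((p - mean_p) ** 2 for p in prior))
intercept = mean_o - slope * mean_p

# A pupil's value added is the residual: actual outcome minus predicted outcome.
for p, o in zip(prior, outcome):
    print(f"prior {p:2d}  outcome {o:2d}  value added {o - (intercept + slope * p):+.1f}")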

2.3.5. Effects on standards and curriculum

Examinations are perceived as an expression of authority. Because they establish the

measure of students’ achievement using the language of numbers and

standardisation, they are used for purposes of monitoring and controlling the

education systems (Shoamy, 2001, p.50). Examination results are the recognised

measure of school, local or national standards. The curriculum will be affected by

such views.

Steers (1994) points out: ‘… while governments might have a legitimate interest in

assessment as an instrument to monitor local and national standards, assessment also

can be used as a potent means of controlling the curriculum’.

2.3.6. Effects on higher education

Examinations act as a disciplinary tool, punishing and rewarding, and also as

a means of controlling the number of students entering higher education. The study

of predictive validity in the case of art and design GCE/A level examinations might

provide valuable insights to evaluate the effectiveness of the examination.

The following questions may be worthy of further study:

- Do higher education courses in art value GCE/A level art and design

examinations results?

- Is there any correlation between students’ pre-university art and design

examination results and their grades or marks obtained in higher education?

2.3.7. Ethical considerations

For students, examinations are not synonymous with fairness; often they perceive

examinations as biased, unjust, unfair and depending on luck (Shoamy, 2001, p.14).

Examinations can have detrimental effects on students; they are used as disciplinary

tools, providing both rewards and sanctions (Foucault, 1979). Student voices are

often neglected by current examination systems, although some improvements seem

to appear through the establishment of codes of practice (e.g. the Code of Ethics of the

International Language Testing Association; Test taker’s bill of rights by the

American National Council of Teachers of English; Wiggins, 1993). In this study it

may prove relevant to analyse the views of examination users about the ethical

aspects of examinations.

2.3.8. Evaluating the impact of examinations

It is important to be able to monitor and investigate the educational impact that

examinations have within the contexts in which they are used. Examination

developers should operate with the aim that their examinations do not have a

negative impact and, as far as possible, strive to achieve positive impact. This

might be accomplished through collaboration with stakeholders, in relation to:

• Development and presentation of examination specifications and

syllabus;

• Professional support programmes for the users (teachers, assessors,

students) and training of experts (examination developers, assessors,

moderators);

• Evaluation of the examination by collecting examination results

(marks) and teachers’/students’ opinions.

The impact of the examination is closely linked with consequential validity,

washback validity and reliability. The study of the impact of an examination

might provide evidence to demonstrate that the examination is or is not

sufficiently valid and reliable for the context in which it is used.

2.4. Practicality

Practicality, sometimes called utility, is the extent to which an examination is

practicable in terms of the resources necessary to produce and administer it in its

intended context of use (development, administration and validation of

examinations). The practicality of an examination is its convenience, flexibility and

cost effectiveness. According to Steers (1994) those factors are important because

awarding bodies and school centres have to operate within the overall time and

resources available. Aspects such as the type and quantity of students' work to be

included in an art examination, the number of assessors, and the period of the

examination are all important considerations in the context of finite fiscal constraints.

The usefulness of an examination is related to its practicality, and appropriate

procedures must be implemented for managing all aspects of the examination

process. The following factors must be considered:

• Design and production of question papers;

• A priori validation of question papers;

• In-service training for teachers and moderators;

• Availability of the examination in terms of dates, frequency, location and

number of centres or schools;

• Fees to be paid by examination takers;

• Central and local costs in terms of: production of question papers, marking

and scoring, meetings, visits, post-examination evaluation;

• Security, transport and storage of materials;

• Special circumstances, arrangements for candidates with special needs;

• A posteriori validation of examinations.

2.5. Conceptual Framework for evaluating art examinations

The examination constraints described in this chapter are used here as the basis of a

framework (see Figure 1 below) for evaluating the validity, reliability, impact and

practicality of art and design examinations, taking into account the identified major

sources of potential invalidity, unreliability and negative impact.

• Validity

Aspects concerning: (1) the extent to which the design of the examination

provides tasks that contain representative and relevant evidence of knowledge,

understanding and skills in art and design, (2) the constructs used, (3) the

authenticity of the assessment instrument and procedures, (4) the appropriateness

of format, directives, standards and scoring criteria, (5) the relationship with

other assessments of the same performance, (6) examination bias.

• Threats to validity in art and design examinations summarised

- Poor sampling of artistic knowledge, understanding and skills in art and

design. Unclear, irrelevant or reduced versions of the main aspects of

knowledge, understanding and skills in art and design. (Under-representation

and construct irrelevancies). Using an underlying model of artistic

knowledge, understanding and skills, which is divorced from learning goals.

- Using methods and procedures that may prevent candidates from performing

in the way intended. Using tasks and marking procedures that do not fully

and clearly reflect the specified model of artistic knowledge, understanding

and skills. Using tasks and marking procedures that encourage and reward

aspects of performance that draw on irrelevant abilities. Lack of authenticity

in the examination tasks.

- Criteria, assessment objectives, attainment targets and/or grade descriptors

which are not supported by empirical evidence.

- Lack of evaluation in the three stages of the examination: design, delivery

and outcomes.

- Examination bias: format, directives or tasks which discriminate against candidates on the basis of cultural background, background knowledge, native language, ethnicity, age or gender.

• Reliability

Aspects concerning the generalisation of results, consistency of marking and the

marking procedures used to reach such consistency will need consideration. In

particular, the processes used to reduce variability of scores produced by

different concepts and distinct philosophies in art and design education, for

example in the design of the instrument, standardisation, and moderation and post

moderation processes.

• Threats to reliability summarised

- Examination methods and procedures that are unclear or poorly defined.

- The influence of a weak or dominant partner in the assessment team.

- Instructions and procedures for marking that are unclear or weakly defined.

- Directives and marking instruments, for example: assessment objectives,

criteria, attainment targets and/or grade descriptors using vague terms.

- Lack of teacher and assessor training.

- Lack of consensus. Lack of standardisation.

• Impact

Analysis of the uses of an examination as well as an analysis of the way

examination results are interpreted and acted upon, together with the after-effects

of the examination on teaching and learning.

• Sources of negative impact

- Inappropriate profiling of individual strengths and weaknesses (e.g. unclear

grade descriptors/assessment matrix)

- Criteria, assessment objectives, attainment targets and/or grade descriptors

(referring to art and design knowledge, understanding and skills) which are

expressed in vague or negative terms, unable to clearly express what learners

can and need to be able to do.

- Unclear instructions to users on how (and how not) to interpret examination

results.

- Basing the examination on an underlying domain model of artistic knowledge

that is divorced from learning goals. A narrow selection or a selection of

irrelevant knowledge, understanding and skills for assessment.

- Inauthentic assessment instrument: tasks that do not enable the candidates to

actively engage their knowledge, understanding and skills in a reasonable and

authentic way.

Summary

In this chapter the reliability, validity, impact and practicality of art and design

examinations have been discussed in order to establish the key problems and issues

related to assessment processes. Threats to the reliability of the examinations were

established in relation to: inadequate design of the assessment and scoring

instruments; lack of marking consistency due to inadequate training of assessors and

moderation procedures. Sources of invalidity of examinations were identified

through inadequate content fidelity; ambiguous and unshared concepts of domains

and descriptors of performance; assessment instruments covering a poor sample of

artistic skills, knowledge and understanding; lack of authenticity in the assessment

instrument; unfamiliarity of format; use of external criteria that measure different

competencies from the assessment instrument in question; and assessment instruments using tasks and criteria that place one or another group of examinees at an advantage.

The need to evaluate the consequences or impact of the examinations was

established and, finally, questions about the convenience, flexibility and cost-

effectiveness of the examination procedures were identified in order to evaluate the

practicality of the system. The chapter ends with a proposed conceptual framework

to evaluate essential aspects of an art and design examination through document

analysis, observation and questionnaire data analysis.

Validities: content validity; face validity; response validity; bias; construct/theory-based validity; predictive validity; concurrent validity.

Reliabilities: inter-rater reliability; intra-rater reliability.

Impact (contexts of use): effects on society and individuals; effects on schools; effects on curriculum; effects on teachers; effects on higher education; effects on students; feedback; ethics of examinations.

Assessment instrument: type, format, directives; content, question papers, tasks; assessment objectives, criteria, weightings; constructs, underlying models.

Assessment procedures: mark schemes, training, consensus, internal marking, standardisation, moderation, post moderation, consistency in marking and grading.

Practicality.

Figure 1: Conceptual framework for evaluation of art and design external assessment

Chapter 3
Design of the research

The first part of this chapter reports on the choice of a combined methodology, used both to study and evaluate aspects of validity and reliability in current art examinations in Portugal and England, and to search for and trial new procedures to improve art examinations in Portugal.
The second part of the chapter describes the selected qualitative and
quantitative instruments and methods of data collection.

3.1. Research questions


Taking into account the research questions stated in the Introduction Chapter (p. 15)

three main areas of investigation were established corresponding to the overall aims

of this research:

(a) An investigation and evaluation of some current art examinations in Portugal

and England (research questions 1, 2).

(b) A proposal for improvement of Portuguese art and design examinations

(research questions 3, 4).

(c) Application and evaluation of the proposal (research questions 4, 5).

3.2. Choice of method


For this study, a research method was needed that would give access not just to what

goes on in each assessment culture but, more importantly, to the meanings behind the

practices. Initially this led to the choice of a qualitative research method. According

to Creswell (1998) qualitative inquiry represents a legitimate mode of social and

human science exploration. He defined it as:

…qualitative research is an inquiry process of understanding based on


distinct methodological traditions of inquiry that explore a social or
human problem. The researcher builds a complex, holistic picture,
analyses words, reports detailed views of informants, and conducts the
study in a natural setting (p.15).

However, it was recognised that using a qualitative approach alone would limit the

study in terms of its ability to generalise. Furthermore triangulation of data from

different data collection instruments was considered necessary to obtain reliable

findings. Such considerations led to the choice of a combination of qualitative and

quantitative methods. As Miles & Huberman (1994, pp. 253-254) point out,

quantitative analysis provides the means to verify a hypothesis derived from

qualitative research data and it helps to avoid bias.

Taking into account the previous considerations, the research design was mixed-

method, combining both qualitative and quantitative data in case studies. The main

instruments used to collect data were documents, observation, interviews and

questionnaires. The combination of qualitative and quantitative instruments in this

research, although time consuming, was particularly useful for testing out the validity

(i.e. authenticity) and reliability (i.e. consistency) of data gathering techniques.

3.3. Stages of research

The research was undertaken in two stages.

Stage 1: review of literature; analysing aspects of validity, reliability, impact and practicality in current art examinations (Portugal, England); suggesting improvements.

Stage 2: defining a new model of external assessment for Portugal; Trial 1: piloting in one school; evaluating the model; redefining it; Trial 2: trial in five schools; evaluating the model; examining the implications.

Figure 2: Stages in research

3.3.1. Stage 1

Stage 1, which concerned the study of current art examinations in Portugal and

England, was mainly conducted during 2001-2002. This stage was intended to

analyse aspects of validity, reliability, impact and practicality in both systems and

also to develop suggestions for improving the Portuguese art examinations.

3.3.2. Stage 2

Stage 2 was intended to create a new assessment instrument and assessment

procedures for art examinations in Portugal; it was conducted in 2002-2003. Based

on the analysis and findings obtained during Stage 1 and an additional literature

review,

a blueprint for a new external assessment was designed (see p. 214). Rationales, draft

instruments and procedures were established and verified through a priori validation

taking into account the views and advice of experienced Portuguese and English art

and design teachers who provided useful comments on how to redefine the blueprint.

3.3.2.1. Trial 1

A first trial or pilot of the new assessment instrument was conducted in a Portuguese

secondary school with seven art teachers and 51 students who volunteered to

participate, in late 2002 and early 2003. Five meetings were convened with the

teachers between 29th September 2002 and 18th January 2003 in order to discuss

assessment objectives, criteria and grade descriptors (see p. 221). The meetings acted

both as training and standardisation and were also used to provide feedback and

evaluation of the instrument. The draft assessment instrument was administered to

students during the period October – December 2002.

3.3.2.2. Trial 2

After the first trial evaluation and subsequent revision of the instruments and

procedures a second trial was conducted in five Portuguese secondary schools during

the period April-July 2003. A list of 15 art teachers potentially willing to participate

in the experiment was identified from the questionnaire responses (Appendix VII-2,

question 14.9.) These were contacted and the purpose of the trial, activities and

schedule were explained. Ten teachers agreed to participate with their students. All

but one of their students accepted, making a total of 117 students. The trial included

standardisation meetings, administering the new assessment instrument to students,

on-line meetings, moderation and post moderation meetings (see p. 242).

After the trials, the experimental assessment instrument and procedures were

analysed. Data was collected through: (1) the researcher’s notes of school visits,

(2) interviews with teachers and a sample of students; (3) teachers’ and students’

questionnaires; (4) teachers’ and external observers’ reports; (5) on-line teachers’

meetings; (6) students’ portfolios; (7) students’ portfolio marking results.

Stage 1
- 2001. Objective: to examine international literature pertaining to the theoretical domain of assessment in art education. Instruments/sources: literature review.
- 2001-2002. Objective: to analyse official documents concerned with art and design examinations in Portugal and England. Instruments/sources: document analysis (documents, statistical sources).
- May 2001-July 2002. Objective: to investigate Portuguese students', teachers' and assessors' views of the nature and problems of art and design examinations in Portugal and identify possible forms of improvement. Instruments/sources: interview (1 test developer); questionnaires (104 students; 44 teachers).
- May 2001-July 2002. Objective: to compare the practices and policies of art and design examinations in England and Portugal and evaluate their relative strengths and weaknesses. Instruments/sources: interviews (5: 4 teachers + 1 student); observation of Edexcel standardisation meetings (2).

Stage 2
- Objective: to develop a conceptual framework for external art assessment at pre-university level. Instruments/sources: literature review; documental analysis (syllabuses, curriculum materials).
- September 2002-January 2003. Objective: to develop and trial an experimental assessment instrument and procedures (with the aim of improving reliability and validity in Portuguese art and design examinations). Instruments/sources: Trial 1 (pilot) – interviews, questionnaires, reports, observation, portfolios' marks. Samples: 1 school; 7 teachers; 51 students; 1 external observer.
- April 2003-July 2003. Instruments/sources: Trial 2 (main trial) – interviews, questionnaires, reports, observation, portfolios' marks. Samples: 5 schools; 10 teachers; 117 students; 1 external observer.
- 2004. Objective: to explore the broader implications of the study.

Table 1: Plan of action

3.4. Instruments and data collection

3.4.1. Qualitative data collection

3.4.1.1. Document sources

The documents analysed included secondary source documents and primary source

documents; the latter included articles and official publications by bodies such as:

the Qualifications and Curriculum Authority (QCA); the Edexcel Foundation

(Edexcel); Oxford, Cambridge, and RSA Examination Board (OCR); the Assessment

and Qualifications Alliance (AQA); the Joint Council for General Qualifications; the

Portuguese Department of Secondary Education; the Portuguese Board of National

Examinations (GAVE) and the Portuguese Jury of National Examinations (JNE).

Access to documents that were not in the public domain and permission to use them

for the purposes of this research was formally requested from the various assessment

organisations. Other documents, found in the public domain, were analysed,

including documents published on the Internet in the web sites of the various

organisations.

The principal documents for analysis were concerned with relevant legislation,

assessment information such as codes of practice, syllabuses, examination papers and

tests from the Portuguese national assessment department, the English Qualifications

and Curriculum Authority (QCA), and the English awarding bodies. Other

documents used included minutes of meetings, working papers, examiners’ and

moderators’ reports, both in Portugal and in England. The documents also enabled

the identification and selection of individuals who contribute to the research. Both

the literature review and the document analysis provided information with which to

frame questions about specific areas of assessment.

3.4.1.2. Observation

Burgess (1993) recommended observational studies in order to identify some of the

key features and strategies for testing in these areas. She suggested that researchers

should work alongside advisers, teachers and students – a recommendation that

proved particularly important in the context of this research. Observations focused on

assessment procedures such as standardisation meetings to witness at first hand how

teachers applied assessment procedures and made judgements about students'

artworks. The key assessment events, identified through document analysis led to

identification of and contact with the relevant assessment experts who provided

authorisation to observe assessment practices in England. The Edexcel Foundation

gave permission to observe the standardisation meetings held at their premises in

Mansfield, Nottinghamshire in May 2001 and July 2002. The researcher wanted to

understand how moderators were prepared; what kind of processes were used to

reach a common understanding and application of criteria; what kind of visual

exemplars were used and how the marking of such exemplars was explained. The

observer was introduced to examiners as a Portuguese research student by the

Edexcel art assessment coordinator. During these meetings the researcher acted as a

moderator, while taking notes and informally interviewing the participants.

Subsequently these notes were complemented by semi-structured interviews

conducted with English art examination stakeholders (For details of observations

conducted in England see Appendix X).

3.4.1.3. Interviews

Interviewing is often described as a straightforward method of trying to discover

what people think (Robson, 1993, p.228). It is essentially a way of collecting

subjective information. As a consequence, its use offers a most important technique

in the search for meaning. The purpose of the interviews was to understand practices

from the point of view of stakeholders and to obtain in-depth information about the

expectations and problems experienced by participants. The initial plan was to use

structured and unstructured interviews in order to allow participants ‘to tell their stories’.

According to Stewart (1996) an interview-based methodology attempts to construct and reconstruct personal and socio-cultural stories about particular concepts in people's

lives. The interviews in this research provided a rich array of data about users’

perceptions of art examinations (Appendix IX). It was hoped that they would enable

an evaluation of ways in which teachers used the examinations, processes of

standardisation and moderation and ways in which results were utilised by pupils,

parents and other teachers.

In stage 1 the purpose of the interviews was to obtain key stakeholders’ perceptions

about the current art examinations. An interview schedule was designed to gather the

most useful information, without spending disproportionate time, by organising the

questions into sections on the validity and reliability aspects of assessment identified

in Chapters 1 and 2: (1) assessment instruments, (2) assessment procedures and, (3)

impact of assessment results (Appendix IX, 1.). The schedule was piloted with an

English art teacher moderator in order to identify potential problems; the pilot tested out the respondent's interpretation of the questions and helped to refine them. Forty-seven questions were selected; although not all of them were later used, they were useful for guiding the conversations.

Selecting the sample for interviews was crucial to the research because the

participants' responses were being used to validate or negate the findings from

documentary sources. Interviewees were identified using purposeful sampling

procedures from information gathered from advisers and documentary sources.

A letter explaining the purpose of the research and asking for collaboration was sent to potential participants, and the arrangements for the meetings were made

through letters and emails. The interview sample for England included six English art

examination stakeholders: a chief examiner, two moderators, two art

teachers and one art student. Since in Portugal questionnaires were conducted with a large number of students and art teachers, a decision was taken to conduct a life-history interview with only one art teacher and examination developer, who had been identified as having extensive experience of art examinations. The researcher intended to complete the information about the historical background of the Portuguese

art examination by interviewing one of the pioneers of art programmes in Portugal in

the 1970s and 1980s.

Role                   Name       Sex  Age  Region                Experience (years:         Interview date
                                                                  teaching / examinations)

Curriculum and test    Isabel     F    69   Lisboa, Portugal      45 / 34                    7-05-2001, 14.00h-18.00h
developer
Principal examiner     Richard    M    57   SouthEast England     12 / –                     4-02-2001, 16.00h-16.30h
Team-leader            Elisabeth  F    67   London region,        30 / 20                    7-02-2001, 11.00h-13.00h
                                            England
Moderator              Ian        M    48   London region         18 / 11                    7-07-2001, 18.00h-22.30h
Art teacher            Sally      F    46   London region         1 / –                      7-07-2001, 18.00h-22.30h
Art teacher            Cindy      F    42   NorthWest England     18 / –                     10-02-2001, 11.00h-14.00h
Student                Annie      F    16   South England         – / –                      6-05-2002, 14.30h-16.00h

Figure 3: Sample of art and design examination stakeholders for interviews in stage 1

The interviews in Stage 1 were conducted in 2001, at the interviewees’ homes, and

varied between one and four hours in length. The interviews took the form of long

conversations and it was found that this strategy helped the collection of data in a

non-prejudiced and non-prescriptive way. The researcher’s role in the interviews was

one of orientation to a given theme; the approach was like a teaching situation in

which respondents taught the interviewer about facts and personal perspectives.

The interviews were tape-recorded and notes were recorded in a research journal at

the time that events and processes were witnessed. Later, all the interviews were

transcribed in the original language. After the first analysis the researcher’s notes and

preliminary findings were shared with the interviewees and feedback and further

advice was obtained through correspondence. Interviewees’ comments on important

points of assessment helped to develop an understanding of practice.

Some interviews were quite long and participants referred to many details and examples.

After transcription, the interview data was re-organised to relate to the three main

areas of concern: the assessment instrument, assessment procedures and impact.

The interview data was described and summarised by reducing the statements to

those aspects that emerged as being essential through the recurrence of terms,

similar or contrasting opinions, through extensive or repeated descriptions and by

discarding parts where participants expressed opinions about unrelated themes

(Appendix IX, 2.).

In stage two it was decided to use both individual and group interviews for students.

Focus group interviews seemed to offer advantages because they enabled interaction

among interviewees. Initially it was planned to use semi-structured interviews in

order to focus on essential issues of validity, reliability, impact and practicality of the

experiment. The interview schedule contained sixteen questions organised in three

sections: (1) instrument and criteria, (2) contents and resources and, (3) ownership of

the assessment (Appendix XI). The schedule was piloted during trial 1 with ten students after they had concluded their portfolios for the new examination, and it was found that the interview schedule did not elicit very rich responses, so a decision was

taken to use open interviews during the main trial (trial 2).

A sample of students was selected in four schools for group interviews: each teacher selected five student volunteers, trying wherever possible to represent boys and girls of different ages, socio-economic backgrounds and races (five group interviews, comprising a total of 31 students). The interviews took place in the different schools' art rooms after the portfolio moderation procedures; their length varied between 30 minutes and one hour.

School  Date and time       Student     Age  Sex  Race  Parents' education

P       2-06-2003,          Luis        18   M    B     Basic
        14.30h-15.30h       Sergio      17   M    W     Basic
                            Sandra      20   F    W     Basic
                            Raquel      18   F    W     Basic
                            Susana      19   F    W     Basic

B       2-06-2003,          Telma       17   F    W     Higher
        17.30h-18.30h       Sara        17   F    W     Higher
                            Ana         18   F    W     Higher
                            Sandra      17   F    W     Higher
                            Susana      17   F    W     Basic
                            Diana       17   F    W     Basic

K       26-06-2003,         David       18   M    B     Basic
        13.00h-14.00h       Júlia       20   F    W     Secondary
                            Joana       18   F    W     Basic
                            Inês        19   F    W     Secondary
                            Isabel      18   F    W     Basic

K       26-06-2003,         Zé          18   M    W     Secondary
        14.00h-15.00h       Ruben       20   M    W     Higher
                            Jocelyne    17   F    B     Basic
                            Marta       18   F    W     Secondary
                            Inês A.     19   F    W     Secondary

V       17-06-2003,         António     18   M    W     Secondary
        10.00h-10.30h       Sandra      18   F    W     Higher
                            Carla       17   F    W     Higher
                            Joaquim     18   M    W     Secondary
                            Manuela     17   F    W     Secondary
                            Tiago       18   M    W     Secondary

Figure 4: Sample of students for interviews in stage 2

3.4.2. Quantitative data

3.4.2.1. Questionnaires

In stage one a survey questionnaire was designed in order to obtain students' and teachers'/assessors' perceptions of external art assessment in Portugal (see Appendix VII). The questionnaire sought, first, to ascertain the degrees of validity, reliability, impact and practicality of the Portuguese art and design examinations and, second, to identify suggestions for improvement. The questionnaire schedule was designed

based on data obtained from the document analysis and through the lens of the

framework for evaluating art and design examinations described in Chapter 2 (p. 76).

The main concepts of the framework such as validity, reliability, assessment

instruments, assessment procedures and impact were used to design the questions

developing each one of the conceptual issues. Nine sections were developed for the

students’ questionnaires: (1) general information, (2) syllabuses, (3) criteria, (4)

external assessment, (5) form and structure of examinations, (6) assessment

procedures,

(7) reliability, (8) assessors, (9) impact. Teachers’ questionnaires included three

more sections: (10) assessing, (11) feedback and (12) in-service teacher training.

The questionnaires provided space for respondents to add comments or suggestions

(Section 10 - students and Section 13 - teachers).

The questionnaire was piloted with 10 teachers during an art teachers’ congress

(Apecv, Evora; March 2002) and later with 10 students from design courses at the

Instituto Superior Politecnico das Caldas da Rainha and Universidade de Aveiro.

After piloting the questionnaire some small revisions were made in order to clarify the language; for example, technical words such as ‘reliability’ were replaced

by terms like ‘consistency of results’. The questionnaire was administered during

October-November 2001, in two different versions: (1) for secondary art and design

teachers with 83 questions (Appendix VII, 2.) and, (2) for students who took

national art examinations in subjects related to art and design with 67 questions

(Appendix VII, 1.).

The questionnaire sample was intended to include a significant number of art

teachers and art assessors in Portugal (approximately thirty percent of the 350 art

teachers who teach 12th year art classes). It was also intended to include a

considerable number of students (approximately three hundred of the 4000 students

who took national art examinations in 2001) from different art courses in universities

across the country. Questionnaires were posted to: (1) students involved in the 2001

national art examinations, (2) all the teachers in the art departments of secondary

schools identified as having art courses. Questionnaires for students were posted to

those attending art teacher training courses at the main Portuguese universities and

Institutes of education such as: Universidade do Minho; Escola Superior de

Educação de Portalegre; Universidade de Évora . They were also sent to students

attending art, design, multimedia and architecture courses at: Universidade de

Aveiro; Faculdade de Arquitectura do Porto; Faculdade de Belas Artes do Porto,

Instituto Politécnico de Viseu; Instituto Politécnico das Caldas da Rainha.

Although pre-stamped envelopes were enclosed, it initially proved difficult to obtain questionnaire responses, especially from art teachers, even after sending follow-up letters. Other approaches were therefore devised: the researcher delivered questionnaires during Portuguese congresses and conferences on art and design held in 2001 and 2002. Finally, by the end of 2002, a forty per cent response rate was achieved, thus allowing some confidence in the generalisation of the findings.

Forty-four art assessors and one hundred and four students (68 students from general

courses; 36 students from technical courses) replied to the questionnaires. The final

sample included a wide variety of geographical locations, ages, teachers’ experience

of assessment and students’ art courses (for details see p. 131).

                Total   Sample   Responses

Art teachers    350     100      44
Art students    4000    300      104

Figure 5: Response rates to the survey in stage 1

In Stage 2, questionnaires were used to obtain data about the participants’

perceptions of the new assessment instrument during the first and second trial. The

questionnaire schedules for students (Appendix XV) and for teachers (Appendix

XVI) were written in accord with the conceptual framework for art examinations (see

Chapter 2, p.76); the concepts of the framework were used to develop a series of questions focusing on the evaluation of: (1) the portfolio – instructions, format, tasks, assessment criteria, weightings; (2) assessment procedures – such as standardisation, moderation and consistency in marking; and (3) impact – effects on students, teachers,

schools, and the curriculum. The questions were revised after a pilot because some questions were repetitive, and some terms were replaced in the interests of achieving greater clarity; finally the questionnaires included 64 questions for students and 93

questions for teachers. All the students and all the teachers involved in the trials

responded to the questionnaires during the last sessions of the experimental

assessment instrument.

3.4.2.2. Other data

Statistical information about examinations and appeals was collected from the press

and in response to requests to the Portuguese Ministry of Education. Other data on

assessment results was obtained from the marking of a mock examination (MTEP question paper – 36 scripts) and from the marking of portfolios developed by

students during Trial 1 (13 portfolios) and Trial 2 (12 portfolios).

3.5. Data Analysis

The different types of analysis used were as follows: (1) Content analysis for

documents, observation notes and interviews (see Appendices IX, X); (2) Descriptive

statistics for questionnaires (see Appendices VIII and XVII); (3) Multi-faceted Rasch analysis for the marks awarded to students' MTEP mock examination scripts (current examination) and to students' portfolios (new assessment) (see Appendix XXIV).

3.5.1. Document analysis

Documents were analysed in order to understand the current systems of art examinations in Portugal and England in stage one, and to understand teachers' problems with the implementation of the new examination in stage two (teachers' reports). Documents are not neutral. In the course of the research, the documents used for describing and

explaining different assessment systems were analysed taking into account the

ideologies underpinning them. First the contents were summarised to permit rapid retrieval when needed, categorised by type (primary and secondary sources) and classified according to their apparent authenticity and credibility. Then the frequency with which textual items (words, phrases, concepts) appeared in the text was enumerated, and finally interpretations and meanings were constructed (see example

Appendix VI).
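As an illustration of the enumeration step, a minimal sketch in Python (the term list and file name are hypothetical assumptions, not items from the study):

    from collections import Counter
    import re

    # A minimal sketch of the frequency-enumeration step described above.
    # The term list and file name are illustrative assumptions only.
    TERMS = ["validity", "reliability", "moderation", "criteria"]

    def term_frequencies(text, terms):
        """Count how often each term of interest appears in a text."""
        words = re.findall(r"[a-z]+", text.lower())
        counts = Counter(words)
        return {term: counts[term] for term in terms}

    with open("syllabus.txt", encoding="utf-8") as f:  # hypothetical document
        print(term_frequencies(f.read(), TERMS))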

3.5.2. Analysis of observation notes

The notes collected during the observation of Edexcel standardisation meetings in stage one were analysed by cross-checking the information obtained against the documents about examinations in England, in order to understand how the assessment procedures were applied. The aim was to see how many visual exemplars of students' works were used in standardisation meetings, the type and format of the exemplars, how the assessment matrix and criteria were explained to the moderators and how moderators were trained to ensure consistency.

In stage two the observations conducted in the five schools of trial 2 were used to cross-check information with teachers' reports, informal interviews with the participant teachers and student interviews; the aim was to understand the schools' resources and the styles of teaching and learning strategies.

3.5.3. Analysis of interviews

A decision was taken to adopt an interpretative approach to analysing interview data.

Interpretation is the process whereby a researcher seeks to understand and construct

meaning from participants’ words and actions (Cohen & Manion, 1994, p.27).

The steps of data analysis were as follows:

(1) Segmenting information; organising; managing and retrieving information; and

using code-and-retrieve techniques.

(2) The meaning of categories was shared and discussed with participants in order to

test the credibility of the research (Lincoln & Guba, 1985; Robson, 1993).

(3) Generating concepts. The information was crosschecked with established

theory.

The data was transcribed and condensed into analysable units. Labels or tags were

created in the form of codes for assigning meaning to the descriptive or inferential

information compiled during the study. The codes provided a means of generating concepts (Gough & Scott, 2000, p.339).

The codes were first defined in accord with the research questions and conceptual framework (see Chapter 2, p.76); they were used to retrieve and organise words, phrases, sentences or whole paragraphs, connected or unconnected to a specific setting. However, the codes were not fixed from the start: they developed during the study, in the form of sub-codes or even entirely new ones, such as ‘teacher aid’ or ‘overloaded’, which emerged from documents and from participants’ voices.

Codes served to compact and cluster segments of information. In the first stage of analysis codes were mainly descriptive, for example: ‘assessment instrument’, ‘assessment procedures’ and ‘impact’. In the second stage they became more interpretative, for example: ‘clear instructions’, ‘standardisation’, ‘moderation’ and ‘effects’. Finally, more explanatory codes were defined according to patterns discerned during the data interpretation and the establishment of relationships between different segments of information, such as: ‘interpretation of criteria’, ‘consensus between judges’, ‘shared language’, ‘teacher aid’, ‘teacher training’, ‘causes of bias’ and ‘negative/positive effects’.
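A minimal sketch of the code-and-retrieve technique in Python, with hypothetical codes and transcript segments:

    from collections import defaultdict

    # A minimal sketch of the code-and-retrieve technique described above.
    # The codes and transcript segments are invented for illustration.
    codebook = defaultdict(list)

    def code_segment(code, segment):
        """Attach a code (tag) to a transcript segment."""
        codebook[code].append(segment)

    def retrieve(code):
        """Retrieve every segment tagged with a given code."""
        return codebook[code]

    code_segment("shared language", "We all used the same words for the criteria.")
    code_segment("teacher aid", "The exemplars helped me to explain the marks.")
    print(retrieve("shared language"))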

[Figure 6 showed a network diagram linking the stage 1 interview codes for England: ‘assessment instrument’ (instructions, format, criteria) to sub-codes such as ‘clear instructions’, ‘bias’, ‘teacher aid’, ‘validity’, ‘authentic’, ‘modular’, ‘over-assessed’, ‘interpretation of criteria’ and ‘shared language’; ‘assessment procedures’ (standardisation, moderation) to ‘training’, ‘visual exemplars’, ‘consensus between judges’ and ‘reliability’; and ‘impact’ to ‘curriculum’, ‘teaching practice’, ‘practicality’, ‘overload’ and positive/negative effects.]

Figure 6: Diagram analysis of interviews in England, stage 1

Diagrams and other graphical forms were devised to organise the coded segments of

information (see Appendix XXVIII), in order to establish causal and consequential

links. Networks are helpful for showing complex interactions of variables arising from statistical and content analysis, in order to create a narrative or define a final matrix.

Bliss et al. (1983, p.8) described networks as follows:

Networks can usefully be regarded as an extension of the familiar business of putting

things into categories. To categorise is to attach a label to things, in effect to place

them in boxes. A network can be seen as a map of the set of boxes one has chosen to

use, which shows how they relate to one another.

[Figure 7 showed a network diagram linking the stage 2 interview codes: ‘portfolio’ (instructions, non-prescriptive tasks, flexibility, independent study tasks, self-assessment, assessment criteria, teacher aid, bias, time, resources, need for previous preparation) to ‘improve validity’ and ‘assessment for learning’; ‘assessment procedures’ (in-service teacher training, visual exemplars, shared language, ownership, responsibility, students’ voice, dialogue, time, resources) to ‘improve reliability’; ‘impact’ to effects on students (learning styles), teachers (teaching practice), schools (learning environment) and the curriculum (reform), with potential negative and positive effects; and ‘practicality’ to ‘less practicality’, ‘need expensive resources’ and ‘need time’.]

Figure 7: Diagram analysis of interviews in stage 2

3.5.4. Descriptive statistics

The questionnaire data was analysed using the statistical software package SPSS. The purpose of the questionnaires was not to find causal-consequential relationships but rather to detect problems related to the validity, reliability and impact of assessment. Overall, the principal data analysis used descriptive statistics such as frequencies, which established percentages of agreement or disagreement with statements included in the questionnaire. The variables entered corresponded to the recoded questions. In addition to frequencies and graphical analysis, some correlation tables were produced and t-tests were calculated to measure some potential relationships, for example between sex and tasks (Chapter 7, p.267).
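A minimal Python sketch of the operations just described (the study itself used SPSS, so this fragment is purely illustrative, with an invented data frame):

    import pandas as pd
    from scipy import stats

    # A minimal sketch of the analyses described above (the study itself
    # used SPSS). The data frame and its values are hypothetical.
    df = pd.DataFrame({
        "sex": ["F", "F", "M", "M", "F", "M"],
        "task_rating": [4, 5, 3, 2, 4, 3],  # agreement on a 1-5 scale
    })

    # Frequencies: percentages of agreement/disagreement with a statement.
    print(df["task_rating"].value_counts(normalize=True) * 100)

    # Independent-samples t-test for a potential relationship (sex vs tasks).
    girls = df.loc[df["sex"] == "F", "task_rating"]
    boys = df.loc[df["sex"] == "M", "task_rating"]
    print(stats.ttest_ind(girls, boys, equal_var=False))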

3.5.5. Multi-faceted analysis

In order to analyse the degree of marker reliability in the current Portuguese art examinations and in the new assessment instrument and procedures, a sample of thirty-six MTEP examination students' scripts was collected from a mock MTEP

examination conducted with one class at the pilot school on the 28th September

2002. During the implementation of the new assessment thirteen portfolios were

collected during the pilot and twelve during the main trial. The scripts and portfolios were selected in order to obtain a wide variety

of different performances, excluding scripts and portfolios marked zero and twenty

because null achievements and perfect achievements are not useful for multi-facet

measurement analysis (FACETS). The MTEP mock examination scripts were

marked by ten art examination assessors selected from the survey questionnaire; the teachers were selected in order to obtain a sample of different ages, genders, geographical locations, years of teaching and levels of experience in art examination assessment. Each teacher assessed nine scripts on two different occasions, and six of the scripts were marked by all the teachers. The scripts were posted to the teachers who marked

them between January and April 2003.
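The marking design just described can be represented as a set of three-facet rating records (candidate, rater, occasion, score). The sketch below, in Python with invented values, illustrates the kind of data layout that multi-facet programs analyse; it is not the actual FACETS input file used in the study.

    from dataclasses import dataclass

    # A sketch of the three-facet rating records produced by the marking
    # design described above; identifiers and marks are invented.
    @dataclass
    class Rating:
        candidate: str  # anonymised script identifier
        rater: str      # anonymised marker identifier
        occasion: int   # first or second marking occasion
        score: int      # mark on the 0-20 scale

    ratings = [
        Rating("script01", "raterA", 1, 12),
        Rating("script01", "raterA", 2, 13),  # basis for intra-rater reliability
        Rating("script01", "raterB", 1, 15),  # basis for inter-rater reliability
    ]

    # The six scripts marked by all ten raters link every rater to a common
    # frame of reference, which is what allows a multi-facet model to
    # separate rater severity from candidate ability.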

The sampled portfolios were developed by the students during the pilot and trial and were marked by all the participating pilot and trial teachers during 2003.

Many logistic models have been developed for estimating the reliability of tests. Among the known methods of estimation, the Rasch model has provided

particularly useful information for assessment developers (Baker, 1997, p.24). Rasch,

who developed his models in the early 1950s, defined two specific types of

parameters (one for persons and one for items) thereby introducing a new concept of

measurement and individual centred statistical techniques. According to Baker

(1997, p.58) the measurement principle with which Rasch was primarily concerned

was that of specific objectivity in comparisons, i.e. comparisons of persons which do

not depend on the particular items used, and comparisons of items which do not

depend on the particular persons tested. Baker pointed out that Rasch's models

have certain common characteristics: ‘each has two parameters, one identified with

person ability and the other with item (or test) difficulty, and the ability and difficulty

parameters are in each case ‘separable’ in that they can be estimated independently,

in a manner which Lord and Novick describe as analogous to the estimation of

parameters in a two-way factorial analysis of variance with no interaction terms’

(Baker, 1997, p.62). Many further developments of the Rasch model have been used but, in

the context of this study, it was important to determine the effects of raters in

performance-based assessment because the focus of the analysis was the markers’

reliability. This was possible using multi-faceted Rasch measurement because types

of interaction between the rater and the scale could be calibrated. An individual

markers’ reliability could be estimated based on the degree to which he or she, or

different markers, were consistent in their assessment of candidates’ performance.
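To make the model concrete, its standard formulations can be sketched as follows (these equations follow the general Rasch literature rather than any formula reproduced from the sources cited above). In the basic dichotomous model the probability that candidate n succeeds on item i is

\[ P(X_{ni} = 1) = \frac{e^{B_n - D_i}}{1 + e^{B_n - D_i}} \]

and the multi-faceted extension adds a severity term for each rater, so that for adjacent categories k-1 and k of a rating scale

\[ \ln\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k \]

where B_n is the ability of candidate n, D_i the difficulty of item i, C_j the severity of rater j and F_k the difficulty of the step from category k-1 to category k.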

According to Clapham & Corson (1997), user-friendly Rasch analysis computer programs have led many researchers to apply Rasch multi-faceted measurement to the analysis of rater performance. Since the researcher did not have a strong background in statistics, such computer programmes appeared to be a good solution to the problem of how to analyse data in the form of examination results. According to

Weir (2003) multi faceted Rasch offers a ‘…sophisticated way of looking at degree

of overlap between raters but it also provides evidence on level of marking and is a

systematic method for calibrating scores to iron out differences occasioned by inter

marker differences’.

Multi-facet measurement is available in a computer program known as FACETS (Linacre and Wright, 1992). In this research a free version of the programme (Minifac) was downloaded from the Internet (www.winsteps.com/minifac.htm). Through FACETS multi-faceted measurement it was possible to isolate and examine interactions between variation in candidates' abilities, variation in task difficulty and variation associated with the raters. Therefore, three facets or factors of variation in the assessment setting were investigated: candidates, items and raters. Each rating can be understood as a function of the interaction of the three facets: the ability of the

candidate, the difficulty of the item and the characteristics of the marker. Data about each facet was entered into the programme and, through the use of Rasch formulae,

it was possible to bring them together into a single relationship expressed in terms of

the effect they were likely to have on a candidate’s chance of success. Consequently

it was possible also to estimate overall predictions or probabilities, and precise

degrees of inter and intra-marker reliability. As McNamara (1996, p.143) points out:

‘We can express the difficulty of items according to the likelihood of a candidate of

given ability getting a given score (or better) from a rater of given severity on that

item. Finally we can express the severity of raters in terms of the chances of the rater

awarding a given rating (or better) on a given item to a candidate of given ability.

The facets can thus all be brought together in a single frame of reference’.

3.6. Reliability of data

During the research it was acknowledged that the bias of an individual researcher interferes with the reliability of the data (Cohen and Manion, 1994; Creswell, 1998). The researcher's professional background was particularly pertinent to this. The researcher, a Portuguese art teacher of twenty years' experience in secondary education, had been involved as a test developer of Theory of Art question papers in the Portuguese national examinations between 1994 and 1997 and had been an assessor of art examinations in MTEP and History of Art for several years. Her disappointment with the experience of the Portuguese national examinations and, more broadly, with the current assessment system was the reason for starting the research and was also a potential source of bias. Awareness of such bias influenced the conduct of the researcher during the two stages: she tended to adopt a distant role, trying to avoid passions and memories from past experiences; trying to understand the current systems of external art assessment in England and Portugal in terms of their historical contexts and others' perceptions of them; trying to implement the new assessment instrument as if it had been generated solely by the participants and the suggestions in the literature; and trying to evaluate the current systems and the new one impartially. However, such ideal conditions might not always be respected, and this led, from the very beginning of the research design, to the necessity of creating strategies for data validation to minimise the bias.

3.7. Triangulation

Richardson (1994) provided a metaphorical description of validity, which illustrates

the main aspects of validation in qualitative research. This was particularly helpful in

the context of this research:

The central image is the crystal, which combines symmetry and substances,


transmutations, multidimensionalities, and angles of approach. Crystals
grow, change, alter, but are not amorphous. Crystals are prisms that
reflect externalities and refract within themselves, creating different
colours, patterns, arrays, casting off in different directions. What we see
depends on our angle of repose.... Crystallisation, without losing
structure, deconstructs the traditional idea of "validity" (we feel how
there is no single truth, we see how texts validate themselves); and
crystallisation provides us with a deepened, complex, thoroughly partial
understanding of the topic. Paradoxically, we know more and doubt what
we know (p.522).

According to Sullivan (1996) the criteria for assessing the viability of qualitative findings are not so much a matter of whether outcomes are statistically significant but

whether they are meaningful. He pointed out that:

The emphasis on discovery requires one to maintain a special vigilant


pose in dealing with issues of validity and reliability. This involves sound
reasoning, systematic analysis and sustained focusing, along with the
process of subjecting emerging findings to continual empirical challenge
as new observations are brought to bear on existing insights (p.17).

Taking into account Sullivan's (1996) views and Richardson's (1994) metaphorical description, triangulation procedures were used to verify and validate the findings, for example by making use of both quantitative and qualitative data. Verification procedures were implemented at different stages of the research. The data collected through the different instruments was correlated, and cause-effect relationships were sought by making contrasts and comparisons across the data. Theory from the literature was used for supplemental validation of the accuracy of the findings. Verification of the

interview and observation data was conducted by interacting informally with the interviewees afterwards, in order to ‘pick up’ additional information. These processes confirmed, but in some cases raised doubts about, the data. Another strategy was used for verification purposes during the pilot and trial, when external observers were invited to attend the sessions and their reports reflected upon the experiences they had witnessed (see Appendices XIII and XXIII). The external observers, Graça Martins and Leonardo Charréu, were recognised experts in the field of art education.

3.8. Ethical considerations

The study investigated public documents and documents whose use in the research had been authorised by their owners or which were published in the public domain. Participants' permission was obtained using consent forms designed in accordance with the University of Surrey Roehampton guidelines (Appendix XXVII). Anonymity of participants was guaranteed by the use of fictitious names for persons and schools. Participants gave permission to publish the written accounts of interviews. The protocols developed for interviews and trials referred to the following matters:

- Participants' right to withdraw voluntarily from the study at any time.

- The central purpose of the study and the procedures to be used in data collection.

- A statement about protecting the confidentiality of the respondents.

- A statement about known risks associated with participation in the study.

- The benefits expected to accrue to participants in the study.

- A space for participants to sign and date the form, confirming that they understood the permission conditions (see Appendix XXVII).

An important ethical concern is the extent of the potential impact on the lives of the

subjects of the research. This may be particularly true in the case of studying

assessment, where the relationship between researcher and participants can make it

difficult for the former to remain neutral. Torrance’s (1989) writing about his

assessment research reveals a number of important points that were taken into

account in this study:

There have been many times when I have attended marking and
moderation meetings and been asked my opinion of pieces of work.
Given that one is often perceived as 'expert' in assessment, or even
'from the board', it is sometimes difficult, though not impossible, to
avoid being drawn into discussion. More acute still however, are the
occasions when one may actually have an impact on the performance
and subsequent grade awarded to candidates – vulnerable minors
placed in an already threatening and stressful situation (p.177).

During the trials of the assessment instrument or visits to schools, the researcher

adopted the role of a non-participant observer and attempted to influence proceedings

as little as possible. It was clearly explained to the participating teachers that the

researcher would not intervene in lessons or in the marking process. However, on some occasions there was interaction between researcher and participants: for example, students and teachers formulated questions and posed them to the researcher about their projects and about the projects that other students were carrying out in other places. It is possible, nevertheless, that the researcher did influence events by giving some indication of how other participants were carrying out their work projects.

Summary

The research was carried out in two stages. The first, which investigated the current situation in art examinations in Portugal and England, was carried out in 2001-2002. The second, which focused on the creation and evaluation of a new assessment instrument for Portugal, was carried out in 2002-2003. The design of the research was complex and combined both quantitative and qualitative methodologies.

A survey questionnaire was used in stage one to establish art teachers’ and students’

evaluations of art examinations and needs in Portugal. In the investigation and

evaluation of the strengths and weaknesses of equivalent art examinations in

England data was collected through documents, observation and interviews.

The researcher was a participant observer in the trials of the new assessment instruments and procedures developed and tested out in Portugal in stage 2. The data collected for the purposes of joint evaluation of the model by the researcher and participants was extremely diverse, including teachers' reports; moderators' reports; external observers' reports; the researcher's observations; students' and teachers' questionnaires and interviews; and students' portfolio marks.

Documents and interviews were analysed qualitatively (content analysis). The data

as a whole was coded descriptively first and then by more interpretative and

explanatory categories. Codes were developed from this base in order to describe,

relate, explain and evaluate conditions, strategies and consequences of the

assessment. Then they were shared with participants to arrive at credible findings

and to establish conclusions.

Descriptive statistics were applied in the analysis of the survey questionnaire data using SPSS. Degrees of marker reliability in both the current Portuguese examination system and the new model developed and trialled during the research were analysed by applying Rasch multi-faceted measurement using the computer program FACETS.

Chapter 4

Portuguese art examinations

The first part of this chapter describes the current art and design examinations in Portugal in terms of their history, underlying model, validity, reliability, impact and practicality. The second part presents an analysis of the current assessment model and of users' perceptions of what an ideal art and design examination should be, leading to a list of possible means of improving the current model.

4.1. Historical background overview

Throughout the period 1836-1974 the Portuguese art curriculum at a pre-university

level was exclusively centred on knowledge of technical drawing. Drawing was often

identified with geometry and perspective. Following the establishment of public

education the first art examinations for students prior to university entrance date

from 1836 and were based on technical drawing exercises. This tradition was very

strong in Portugal and shaped the form and content of the present day underlying

concept of art education that overemphasises the discipline of ‘Descriptive

Geometry’.

An exploratory interview was conducted with one of the pioneers of art programmes

in Portugal in the 1970s and 80s to understand the history of the art examinations:

Isabel, an art teacher aged 69 years, stated:

There were two periods of external examinations: at the end of the 5th
year (age 15) and at the end of the 7th year (age 17). I remember when I

passed my 5th art examination at the ‘liceu’; we had to draw a pot and we
had another question paper in geometry. The art exams for the 5th year
were usually ornamental drawing or still life and a geometry test. In the
7th year the examination was a single test on geometry (Isabel, Lisboa, 7-
05-2001).

Since the revolution of 1974 official policies have regulated the Portuguese

examinations at pre-university level. Between 1974 and 1980 the country was living

through a period of political change after the fall of the dictatorship. Society was

taking the first steps towards a democratic political regime and under such

circumstances education experienced radical changes. Before 1975 education

differentiated students through general courses, which were intended to train an elite, and technological courses, whose students had no access to university entrance. This separation between courses and schools was eliminated after 1975.

Secondary schooling increased to twelve years in 1977, including the foundation

year: ‘ano propedêutico’ (Decreto Lei nº 491/77 de 23 de Novembro) and in 1978

the 10th and 11th years of the ‘complementary course’ were created (Despacho

normativo nº 140-A/78). In 1980, the first 12th year examinations replaced the ‘ano

propedêutico’ examinations (Decreto-lei nº 240 de 19 de Julho de 1980). The

‘complementary courses’ had two modes: (1) ‘via de ensino’ designed for students

pursuing studies in higher education and (2) ‘via profissionalizante’ designed as a

vocational pathway, but students in mode 2 could also gain access to university

courses. The qualifications obtained were based on continuous internal assessment of

each specialism during the course and external examinations in the core specialisms

taken in the 12th year (Despacho nº 67/81). In the early 1980s, national external

examinations were abolished (Despacho nº 23/ME/83) in state schools. Universities

selected their students using schools' internal assessments and their own entrance examinations. For example, Fine Arts schools offering painting, sculpture and architecture courses required students to pass a day-long test of figure or still-life drawing in order to be admitted to the course. However the secondary school art

curriculum did not prepare students for such examinations.

After the ‘liceu’ we had to pass an entrance exam in Fine Arts; and
for that we took extra courses in representational drawing (figure
and still life drawing) with a painter or a sculptor. I remember my
entrance examination in Fine Arts. We had two days of intensive
charcoal drawing from casts (Isabel, Lisboa, 7-05-2001).

The art curriculum in secondary education progressively evolved from a vision of art

education as technical expertise in geometrical drawing to a broader vision with

separate specialisms. Drawing and History of Art were introduced with syllabuses

shaped by formalist rationales. However, Descriptive Geometry was still considered

the most important specialism in the curriculum:

The History of Art only appeared after the 1970s in the secondary art
curriculum; there was reluctance to accept it for entrance because they
valued more the descriptive geometry (Isabel, Lisboa, 7-05-2001).

Drawing as a specialism was introduced in the curriculum for the last years of

secondary education (former 6th and 7th) in 1977. After the analysis of drawing

syllabuses it was found that the content focused on the formal elements of visual

language, media and technical skills. The approach was essentially formalist and the

key concepts for teaching came from Weimar Bauhaus teachers such as Johannes

Itten, Wassily Kandinsky, and Paul Klee. Art appreciation included consideration of

the formal elements of art and design such as line, tone, form, colour, and texture and

the exercises were driven by these visual elements and the ‘rules’ of composition.

In 1986 national examinations in the three core specialisms of each subject were

introduced. In the arts, the examinations included: History of Art; Descriptive

Geometry and Drawing (Despacho nº 10/EBS/86 and Despacho 43/SERE/88-

/SERE/90). The format of these examinations was a test (120 minutes in the case of

Geometry and Drawing and 90 minutes for the History of Art paper) conducted

under controlled conditions. Entrance to university depended on three factors: (1)

the secondary education internal assessment results, (2) external examinations in the

core specialisms, and (3) results of a test called the ‘General Access Test’: ‘Prova

Geral de Acesso’ (PGA) covering general cultural knowledge. The general access

test and core specialism examinations were externally set and assessed by teachers

appointed by the Departamento do Ensino Secundário. The question papers related

directly to the syllabus content because, as Isabel confirmed 'the question papers

were designed by the same people who wrote the syllabuses’. These examinations

have not changed since 1981 and still preserve their original format.

From an analysis of 18 question papers from 1981 to 2002 it was evident that the general structure of the Drawing question paper was preserved unaltered throughout this period. The question paper required a short test (120 minutes), and knowledge, understanding and skills were understood to reside exclusively in understanding the so-called formal qualities of art and design. The visual tasks comprised abstract exercises exploring formal elements of visual language such as, for example: line, form, texture, pattern, balance, composition and rhythm. The recommended medium was pencil, and the question papers required recognition of formal elements in reproductions of Western art works (painting or sculpture),

usually from the twentieth century. The weighting was usually 70 percent for visual

tasks and 30 percent for written tasks. The criteria for assessment included a list of

model answers for the written questions and ambiguous criteria for the visual tasks

related to abilities for exploring visual elements ‘expressively’. The assessors marked

them using a holistic process for judging technical and expressive drawing skills.

These examinations continued into the early 1990s until the introduction of the reform of secondary education following the implementation of a general law for the education system (Lei nº 286/89 de 29 de Agosto). The regulations for assessment in secondary education were approved in 1993 (Despacho nº 338/93) and the first examinations under these new regulations were held in 1995/96 for students in the 12th year of schooling.

4.2. Assessment in secondary education after 1996

4.2.1. General regulations for assessment

The assessment documents, which have regulated secondary education since 1993

(Despacho Normativo nº 338/93), expressed the general democratic intentions of the

Education System Law (Lei de bases do sistema educativo nº 46/86). The first

aim of this assessment model was ‘to promote students’ educational success’.

However, the documents presented a dichotomy between internal and external

assessment. This dichotomy served two different purposes: an internal assessment

system (without any form of moderation or control) seemed to subscribe to the

principles of ‘assessment for learning’, or formative assessment, whereas the external

assessment using more ritualised forms of control seemed to serve the functions of

accountability and selection. Internal assessment offered a measure of freedom for

teachers and schools. External assessment, or national examinations, were based, it is claimed, on parity of treatment as a guiding democratic principle, and this so-called parity of treatment required uniformity in the art and design examinations.

In the interview Isabel stated:

It was important that all students should have the same object [to
draw] to put all the students in the same situation. You know, if
students or schools arrange their own models for drawing some
would be more difficult than others and it wouldn’t be fair (Isabel,
Lisboa, 7-05-2001).

But in art and design such prescriptive methods may conflict with the diverse and

creative outcomes that the curriculum objectives for art education claimed to foster

(Eça, 1999).

Internal summative assessment was described as ‘the global judgement about the

level of students’ knowledge, competencies, capacities and attitudes’ in the official

documents (ME, Desp. Norm. nº 338/93). Assessment outcomes were quantitatively

expressed on a scale from 0 to 20 points as a means of informing students and

parents about student achievement. Internal assessment was more or less continuous,

occurring three times per year. The results were expressed as marks for each

specialism and were awarded by students’ own teachers during teachers’ assessment

meetings that took place at Christmas, Easter and at the end of the course year.

The last meeting of the year established the students’ final marks. External

summative assessment included externally set examinations.

The examinations consisted of written tests devised by the Gabinete de Avaliação

Educacional (GAVE) and the National Jury of Examinations (Juri Nacional de

Exames) implemented the assessment procedures. The final results of secondary

education for the school-leaving diploma were calculated through a combination of

the final internal assessment results for each discipline and the examination results in

core specialisms, which were weighted at 30 percent of the final mark. Students had to achieve a final score of ten in each specialism to be awarded certificates of secondary education. Such weightings gave a significant value to internal assessment in the students' final award and consequently a significant value to teacher assessment.
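As a worked illustration of these weightings (the marks are hypothetical, and it is assumed that internal assessment carried the remaining 70 percent of the weight): a student with an internal assessment mark of 14 and an examination mark of 10 in a core specialism would obtain a final mark of

\[ 0.7 \times 14 + 0.3 \times 10 = 9.8 + 3.0 = 12.8 \]

on the 0-20 scale, comfortably above the pass threshold of ten points.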

4.2.2. National examinations

The principles of education followed since 1986 had been much in favour of

continuous and formative forms of assessment. However, the need for control and accountability of the education system and for student selection (Machado, 1994, p.53) brought about the re-introduction of national examinations, which contradicted the

formative model of assessment favoured in the General Law of Education (Lei de

bases do sistema educativo nº 46/86).

4.2.2.1. Instruments

Members of the Gabinete de Avaliação Educacional (GAVE), a department of the

Ministry of Education, developed the examination papers. The responsibilities of

GAVE included planning, coordinating, designing and controlling external

assessment instruments (Decreto-lei nº 229/97 de 30 de Agosto). The team at GAVE

comprised: (1) a Director for all the subject areas, (2) a coordinator for each

subject area and for each test, (3) two authors (usually secondary teachers) for each

specialism, (4) two or more specialist revisers (auditors), (5) two language revisers,

(6) two text revisers, (7) and one specialist adviser (consultor).

The question papers were written in accordance with a statutory assessment

framework, general aims and objectives for the Portuguese National Curriculum and

national syllabuses adopted for each specialism. The examination paper had a list of

questions with a mark scheme totalling 200 points corresponding to the mark scale (0-20). The design of question papers included some validation procedures, for example: verifying the graphic presentation; checking the clarity and form of the language; and checking that questions were appropriate to students' age and related to the syllabus content. Language and specialist advisers carried out this kind of verification.

The role of the specialist advisers was to provide the expert validation considered

essential for art and design examinations, especially because they were not piloted.

We could not pilot our question papers; we based our evaluation on


the last year examination results. That’s why it is so important to
have the advisers (‘auditores’); they are all art teachers in
secondary schools and they know how it works (Isabel, Lisboa, 7-
05-2001).

At the same time that the examination papers were finalised, schools received the

instructions about the examination. The Departamento do Ensino Secundário

published a list of specific contents to be tested in the examination question papers

and teachers could focus their lessons on these contents. In January/February they

received a form called ‘matrix for the examination question paper’ from the GAVE,

with the content headings, an indication of the number and type of questions and a

list of the materials to be allowed for the examinations. Schools also received a

model (i.e. exemplar) test, usually during March/April. Teachers could prepare

students for the examination by using the model test either as a mock examination,

or by including items from the model test in their own tests during the year.

4.2.2.2. Procedures

The National Jury of Examinations (JNE) was responsible for the process of

delivering question papers and assessment. The JNE was composed of a President

and Vice-President and technical assessors from the secondary department of the

Ministry of Education, coordinators of regions and coordinators of the school centres

(‘agrupamentos’). The JNE responsibilities included the special arrangements and

special considerations for candidates with special educational needs (SEN); the

management of assessment procedures and appeals against examination results.

There were 30 school centres in Portugal, each comprising a cluster of schools in a region. They selected assessors, received students' examination scripts sent by schools, contacted the assessors and arranged the regional meetings

where scripts were distributed. Assessed scripts were returned to the centres, and

the results were sent to the central section of the JNE at Lisbon. This process was

designed to preserve the anonymity of all the candidates.

All examinations were taken at exactly the same time and on the same day all over the country. There were three examination periods. Until 2004 candidates could opt for the ‘first call’ (1ª chamada, at the end of June or early July) or the ‘second call’ (2ª chamada, in mid-to-late July); another call took place in September (2ª fase) for candidates who did not pass or were not able to attend the previous examination periods. In 2004 the ‘2ª fase’ was abolished. This meant that for each subject at least three question papers of a similar level of difficulty were designed.

Examinations were administered under highly controlled conditions to ensure

uniformity and the security of the process. In the instructions for the conduct of

examination (Departamento do Ensino Secundário, 1996), detailed information was

given to schools about timings; how to arrange the rooms; how to invigilate the

examinations; the number and roles of personnel involved; the type of materials that

students were allowed to use; the instructions for checking students’ identity cards;

instructions for filling in the examination response papers; the exact time to open the

envelopes containing the question papers (which were transported by police

officers); how to attach the pages of scripts; the exact time to collect scripts;

the counting of scripts; and how to deal with SEN students' requirements.

During the examinations, government inspectors might visit schools. In each room,

there were two invigilators: teachers who were non-specialists in the examination

subject. In a separate room, a specialist teacher should be available to resolve

practical problems that might arise. Students were not permitted to leave the room

during an examination.

Teachers usually had to mark the examination scripts of students from other schools in the cities and towns within their geographical region. The process was coordinated at local level by the school centres in each area. Schools had to propose to the school centres a list of teachers/markers proportional to the number of students registered for examinations. For example, a school with 100 students registered for an MTEP examination was expected to nominate two markers (correctores). Teachers could not refuse to undertake this task.

Meetings with all the teachers/markers attached to school centres were convened to

receive students' scripts and discuss the criteria for assessment. These meetings were supposed to establish dialogue between markers about the application of the criteria. Each student's script was marked by a single assessor using criterion-referenced procedures.

The assessors returned the scripts with completed marking forms to the school

centres. The JNE collected the marks and sent them back to schools by end of July.

Candidates could ask for a review of their examination marks (appeal) and were required to explain the reasons for the request in a written report (Despacho normativo nº 13/2002). Another teacher from the school centre (relator) dealt with the appeal and re-marked the script. The second mark prevailed. If the candidate was not satisfied with the result, he/she had one more opportunity to appeal, directly to the Ministry of Education.

According to the Ministry of Education database (Departamento do Ensino

Secundário), 12,755 examination scripts in all subjects were revised in 1998

following appeals: 11,776 marks were changed to a higher grade and 979 to a lower

grade. The Portuguese Council of National Examinations for Secondary Education

(CNEES, 1998) acknowledged that there was a need to pay more attention to the

variations in the interpretation of criteria in assessing examinations.

In 2000, 72 percent of the candidates asked for appeals. In 2001 the Jornal de

Notícias (25/April) published an article about appeals during the 2000 examinations

based on statistical data which revealed that candidates had a strong chance of

increasing their marks on the second appeal. For example, in one secondary school,

61 out of 72 marks were increased. Particular cases of candidates asking for a second

appeal were highlighted; for example one candidate, who obtained 13 points in the

History of Art paper, appealed twice, and maintained the score in the first appeal, but

after a second appeal her mark was changed to 17.8 points.

Year 12 examinations were covered extensively in the press. Newspapers published

the question papers for subjects with large numbers of candidates in the core

specialisms for university entrance. Before 2001, examinations results were not

released publicly but political pressures during 2001 forced the government to

publish the examination results of all 625 secondary schools. The league tables

generated huge public debate about the ranking of schools. The left wing deputies in

parliament argued that the ranking of schools was undemocratic, increasing

discrimination between schools; the right wing deputies claimed that not publishing

examination results was a plot to hide mediocrity and leniency (Público, 26 April

2001). The results raised a controversy about accountability and the quality of

education. The Diário de Notícias (27 August 2001) and Público (26 August 2001) drew attention to the great disparity between internal assessment and examination results, arguing that:

Although such results express two different kinds of assessment: one continuous and formative [internal results] and the other punctual and eliminatory [examinations], the continuous interval pattern between the two assessment results doesn't seem normal…. (Diário de Notícias, 27 August 2001, Educação III).

The Diário de Notícias article was illustrated with specific and often extreme cases.

For example one student’s internal mark in Descriptive Geometry was 18 points and

the examination mark was 6.2 points. But there were other examples where students

obtained much better marks during examinations than during internal assessment.

The newspaper’s interpretation of such discrepancies did not question causes related

to the system of examinations but rather focused on possible grade inflation by

teachers and schools that over-marked or under-marked students.

According to statistics published by the government (Departamento do Ensino Secundário, 2001), in 2001 there were differences between internal assessment and examination results in the art specialisms, notably: (a) in Descriptive Geometry, 3 points (general courses) and 4.3 points (technological courses); (b) in History of Art, 3 points; (c) in MTEP, 2.5 points; (d) in Theory of Design, 2 points (general courses) and 0.3 points (technological courses); and (e) in Theory of Art and Design, 1 point. Such differences may not appear particularly significant at first glance; however, these figures gave an incomplete picture of the examination results, since the Departamento do Ensino Secundário did not publish other statistics such as minima, maxima or standard deviations.

4.3. The model of Portuguese secondary art curriculum

4.3.1. Origins of current art curriculum

Art examinations between 1993 and 1996 were internally set and assessed. Students'

final results for qualifications and university entrance were expressed through

internal marks (Despacho 16/SEEBS/93). The current art examinations appeared

after the curricular revision for secondary education that was implemented in 1996.

Isabel was actively engaged in this curriculum revision. She said:

I was involved in the last curricular revision as a syllabus


developer; it was a very interesting programme for the arts.
They [the Ministry] introduced Theory of Design in the
secondary art curriculum; they thought it was important
because design was present in all sort of art courses, for

example; textiles; ceramics; product design [courses at
specialist art schools such as Antonio Arroio at Lisboa and
Soares dos Reis at Porto]. Art teachers from Antonio Arroio
and Soares dos Reis schools were also invited to help to
design the art curriculum for secondary education.

Other content considered to be part of all specialisms


was materials and techniques. So they decided to make
a subject called Materials and Techniques. We said
that there was nothing in the curriculum specialisms
related to art production, and they said that Materials
and Techniques should be directed towards art
production; so we decided to replace the Drawing
specialism with Materials and Techniques. We also
decided to call it Materials and Techniques of Plastic
Expression rather than just Materials and Techniques.

We had to create theoretical subjects. It was their idea.


They said we had to design theoretical specialisms.
When we did the syllabuses we didn’t think about
forms of external assessment; there were no external
examinations in the beginning of the ‘nineties, just
formative and summative internal assessment (Isabel,
Lisboa, 7-05-2001).

From an analysis of the syllabuses of the various art specialisms (Departamento do Ensino Secundário, 1995) a conclusion was drawn that the structure of the art and design curriculum was fragmented by rigid boundaries between specialisms and by two main approaches: theoretical, as in Descriptive Geometry, History of Art and Theory of Design; and theoretical-practical or studio production, as in MTEP and Studio Art/Design and Technologies. The main learning domains in each specialism

were technical and perceptual, although contextual and critical studies were also

expected to be a component of Theory of Design, History of Art and Studio Art.

The MTEP and Studio Art syllabuses presented a broad range of aims related to

knowledge, understanding and skills of art and design production including ‘artistic

and aesthetic values’, ‘problem-solving skills’, ‘creativity and critical skills’,

‘critical awareness’, and ‘research’. However making such a clear demarcation

between theory and practice could reduce the opportunities for students to develop

contextual and critical studies of art and design in studio art production.

The same demarcation existed in the recommended assessment instruments for art

specialisms. Tests were recommended for internal assessment in Descriptive

Geometry, History of Art, Theory of Design, and for MTEP. In Studio Art/ Design

and Technologies the recommended instruments for assessment were very flexible

and mainly based on coursework components. External assessment or national examinations in the arts existed, in Portugal, only for specialisms widely viewed as more 'objective': Descriptive Geometry, History of Art, Theory of Design, Theory of Art and Design and MTEP. The national examinations in these specialisms reinforced their status in the curriculum and at the same time emphasised technical drawing per se, and the history of art and the theory of design as encyclopaedic knowledge separate from art production.

MTEP included a broad range of assessment criteria such as: ‘quality; quantity;

organisation; ability to faithfully copy reality; ability to communicate ideas;

expressiveness; creativity; problem solving’. The assessment criteria specified for

Studio Art were as follows:

    Capacities for observation; involvement in research; investigation and collection of data; curiosity about artistic phenomena and personal investigation; capacities for image analysis; capacities for organisation of data; technical abilities in expressive and graphic media; creativity; imagination; experimentation; formulation of relevant questions; involvement and integration in individual and team work; persistence; representational skills; critical skills, inventive and intervention abilities; use of knowledge (GETAP, 1992, p.13).

These criteria were recommended for internal assessment, which was considered to be both formative and summative. Each teacher marked his/her students' work according to his/her own interpretation of the criteria. Besides the list of criteria and objectives there were no other written documents to help students and teachers.

In practice, internal assessment depended on teachers' various, often idiosyncratic,

interpretations of criteria without any sort of procedures to ensure consistency of

standards or inter-marker reliability.

4.3.2. Assessment instruments in Portuguese art examinations

Compared with disciplines like Portuguese language and mathematics, the art disciplines had small numbers of candidates. The total art examination entry in 2001 was as follows: MTEP (general courses), 3,627 candidates; Descriptive Geometry (general and technological courses), 7,168 candidates; History of Art (general and technological courses), 6,008 candidates; Theory of Art and Design (technological courses) and Theory of Design (general and technological courses), 8,435 candidates (source: Departamento do Ensino Secundário, 2001).

The fact that History of Art and Descriptive Geometry attracted the largest numbers of examination candidates was evidence of the comparative status of these specialisms. However, this apparent ranking could be because they were considered to be more 'objectively' assessable, especially Descriptive Geometry. As Isabel pointed out:

…and besides descriptive geometry was very easy to assess (Isabel,


Lisboa, 7-05-2001).

The assessment instruments used for national examinations in the arts were uniform

written tests covering the syllabus contents for year 12 in each specialism.

The Descriptive Geometry, Theory of Design, Theory of Art and Design, and History of Art written tests lasted 120 minutes, while the Materials and Techniques of Plastic Expression examination was considered a theoretical and practical test and lasted 210 minutes (Despacho normativo nº 13/2002). The Descriptive

Geometry examination paper included a set of exercises about conventional

representational systems. The History of Art, Theory of Art and Design and

Theory of Design question papers were a set of open-ended questions requiring

recall, contextual and perceptual analysis of art and design objects and art and

design movements.

The only specialism with question papers offering a more comprehensive range of art

and design production opportunities was Materials and Techniques of Plastic

Expression (MTEP). An analysis of 20 MTEP question papers, from 1996 to 2002,

revealed that the structure of the test did not change over the years. The test included

opportunities for written and visual responses. The required tasks were an observational drawing exercise (first part) and, in the second part, a simulated design problem-solving exercise requiring annotated sketches, a more finished sketch as the final solution, and a written explanation of the proposed intentions and materials (see example in Appendix V).

The criteria and weightings for assessment in the examination did not vary greatly over time. In the 2002 question paper they were as follows (a sketch of how the weighted components aggregate is given after the list):

• Part I: Representational drawing (60 points): Ability to represent
the object faithfully (form and proportions); Personal interpretation
(expressiveness and global effect).
• Part II (1) Preparatory studies and sketches (60 points): Capacity to
present alternative different solutions (creativity). Capacity to communicate
ideas in visual form. Adequateness of the techniques and materials used in
the sketches. Expressiveness.
• Part II (2) Final solution (60 points): Adequate proposal for the
solution of the problem; Composition and organisation, Space-form; Plastic
qualities.
• Part II (3) Written Report (20 points): Description of the process;
Justification of the options; knowledge of the properties of materials and
techniques.
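These component maxima sum to 200 points, which corresponds to the 0-20 scale used for reporting Portuguese secondary marks. The following minimal sketch (in Python; the component names, the clamping behaviour and the rescaling to 0-20 are illustrative assumptions, not part of the official marking documentation) shows how a total could be aggregated from the four components:

# Minimal sketch of aggregating a 2002 MTEP mark from its components.
# Component maxima follow the published weightings; the rescaling to
# the 0-20 reporting scale is an assumption made for illustration.

COMPONENT_MAXIMA = {
    "part_I_representational_drawing": 60,
    "part_II_1_preparatory_studies": 60,
    "part_II_2_final_solution": 60,
    "part_II_3_written_report": 20,
}  # total: 200 points

def aggregate_mtep(marks):
    """Sum component marks (clamped to their maxima) and rescale to 0-20."""
    total = 0.0
    for component, maximum in COMPONENT_MAXIMA.items():
        awarded = marks.get(component, 0.0)
        total += min(max(awarded, 0.0), maximum)  # clamp to [0, maximum]
    return round(total / sum(COMPONENT_MAXIMA.values()) * 20, 1)

# Example: a middling script scoring 103 of 200 raw points -> 10.3 of 20.
print(aggregate_mtep({
    "part_I_representational_drawing": 35,
    "part_II_1_preparatory_studies": 30,
    "part_II_2_final_solution": 28,
    "part_II_3_written_report": 10,
}))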

Isabel described some of the constraints faced by an examination developer. The tight

structure of a curriculum based on separate specialisms and examination regulations

in Portugal left very little opportunity for creating authentic assessment situations.

The only question paper that included art production, MTEP, was reduced to basic

technical skills of representational drawing and some problem solving skills.

According to the coordinator of the developers of the MTEP examination question paper (Isabel), the test was created as a remedial strategy to compensate for the lack of drawing skills elsewhere, and its type and format were constrained by the general regulations for examinations in the art curriculum. Isabel claimed that under such constraints it was impossible to design examinations that exhibited content and theory-based aspects of validity:

But, suddenly they said that all the subjects in the specific
areas should have external examinations; it didn’t fit our
syllabuses; and worse, they said that examinations should be
just theoretical pencil and paper tests. We tried to convince
them to give more time for practical question papers; at
least in MTEP, we proposed 15 days for the examination
time, but they didn’t accept. So we had to create these
question papers for MTEP. We decided to focus on drawing
skills since drawing was absent from the art curriculum
(Isabel, Lisboa, 7-05-2001).

In the end they gave 3 hours for the MTEP examination. So,
we invented these question papers for MTEP. We wanted
students to make drawings and we had to think about a
model for drawing which could be easily sent by post for
schools and be the same for everyone. And we did manage
to make the cardboard model (Isabel, Lisboa, 7-05-2001).

Figure 8: MTEP question paper cardboard model (question paper, September 2002)

But we were bound by the syllabus, which is oriented


towards problem solving; then we included the second part
of the question paper as a problem solving exercise; of
course it is not correct; but we were bound by the time of
the examination (Isabel, Lisboa, 7-05-2001).

The Portuguese art teachers' association, the 'Associação de Professores de Expressão e Comunicação Visual' (APECV), also endorsed the view that the MTEP art examinations were not a valid reflection of the course in terms of content, face, construct and theory-based validity:

In this discipline [MTEP] a pencil and paper test like the current one is
not valid to completely assess the knowledge, understanding and skills
of candidates in the arts (Parecer MTEP1.1.2002, APECV).

4.3.3. Assessment procedures in Portuguese art examinations

Another thing we asked for was a second assessor; but they refused
it. A second assessor would certainly improve the reliability of the
results; you know art examination results are not very reliable
(Isabel, Lisboa, 7-05-2001).

Analysis of the Portuguese documents suggested that assessment relied on teachers' judgements in both internal and external assessment. There were no regulations or procedures to verify the consistency of marking: controlling and monitoring assessment procedures seemed to have little importance in the documentation.

The fact that marking was a solitary task revealed a complete but misplaced trust in

teachers’ understanding of standards and criteria as evidenced by the very high

percentage of appeals. At the same time standards were tacit, since there was no

explicit public information about them. The written criteria for arts provided by

syllabuses, instructions for the examinations and question papers were vague and appeared to be no more than a repetition of the objectives in each specialism.

There was no formal system for training assessors or formal meetings to debate or

develop common understanding of standards and the application of criteria.

The only process to 'standardise' assessors, established by the JNE through the examination instructions, was the markers' meeting convened at each school centre for the reception of students' completed scripts. Copies of the minutes of these meetings

in the 2001 examinations were requested from five school centres after receiving

formal authorisation from the JNE. The school centres were: Porto Cidade; Coimbra

Centro; Lisboa Central; Faro and Viseu. The agenda for these meetings was defined by regulation nº 48/Norma 02/2000: (1) working through the question paper and discussing the various processes of solving the problems; (2) debating the application of the criteria provided by GAVE; and (3) reporting any information requested (by phone or fax from GAVE or the JNE) in cases of doubt about the question paper or the application of criteria.

Each school centre provided six sets of minutes corresponding to the different

meetings for MTEP and Drawing (1st and 2nd call and 2nd phase) – 30 in total.

Two sets of minutes reported that there was no meeting because the markers did not all attend at the same time. In 80% of the minutes the text was simply 'nothing to report'. The text of the other minutes was translated as follows:

    We are very pleased because this question paper is in accordance with the model test and matrix sent to schools at the beginning of the year, which did not happen last year [MTEP question paper].

    There were omissions in the text of the question paper; the missing words are in … [MTEP question paper].

    Information requested from GAVE: 'how to mark the candidates who used colour pencils, ink and other materials in Group III?' [the question required exclusively the use of graphite]. Answer from GAVE: 'the student must be penalised under the criterion graphic quality of the composition' [Drawing question paper].

    The assessors consider this test too easy for the 12th year. Some students use this exam as a core discipline for university entrance [Drawing question paper].

From the minutes, two serious concerns about the MTEP assessment instrument were evident: in the previous year the MTEP examination instructions had not been in accord with the question paper, and the 2001 MTEP question paper was not clearly legible because of graphic design errors. The minutes concerning the Drawing question paper were also interesting, especially because the question paper was considered 'too easy' for the level by one centre.

Consideration of these minutes raised two different hypotheses: (1) that teachers

discussed the application of criteria but did not record the results of the discussion in

the minutes, or (2) teachers did not discuss the criteria at all. According to the

minutes, it seemed that teachers did not feel the need to ask for any explanations

about marking from the JNE and GAVE.

Isabel considered that the reliability of assessment results was a problem created by

assessment procedures based on single assessors. Another explanation for unreliability could be that assessment instruments were designed by one organisation (GAVE) while a separate body, the National Jury of Examinations (JNE), managed assessment procedures, resulting in a lack of coherence between the examination instrument and the examination procedures.

In order to have precise data with which to check the inter- and intra-rater reliability of the current examination, the researcher asked ten MTEP examination assessors to mark thirty-six scripts from a mock MTEP examination taken in September 2002 by 12th year art students. The assessors, selected from those who had returned the survey questionnaire, marked the scripts between January and April 2003. The sample of teachers varied in age, sex, geographical location, years of teaching and experience of art examination assessment. Each teacher assessed nine scripts on two different occasions, and all the teachers marked six scripts in common. The differences between teachers' marks ranged from five to seven points on a scale of 20 (inter-rater reliability), and the differences between marks awarded by the same teacher on different occasions (intra-rater reliability) varied from zero to six points on the same scale.

Current examination (MTEP)
Differences between markers (inter-rater), over 36 scripts:        5-7 points
Differences for the same marker (intra-rater), over 9 scripts:     0-6 points
Table 2: Current examination differences of marks (MTEP)
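The figures in Table 2 rest on two simple computations: for each commonly marked script, the inter-rater spread is the gap between the highest and lowest marks awarded, and for each marker the intra-rater difference is the gap between their marks for the same script on the two occasions. A minimal sketch follows (Python; the marks shown are invented placeholders, not the marks actually collected in the study):

# Sketch of the inter- and intra-rater difference calculations.
# Marks are on the Portuguese 0-20 scale; the values below are
# invented for illustration, not the study's actual data.

from collections import defaultdict

# marks[(marker, script, occasion)] = mark awarded
marks = {
    ("T1", "S1", 1): 12.0, ("T2", "S1", 1): 17.5, ("T3", "S1", 1): 14.0,
    ("T1", "S1", 2): 13.5,  # same marker, same script, second occasion
}

# Inter-rater: spread of first-occasion marks per commonly marked script.
by_script = defaultdict(list)
for (marker, script, occasion), mark in marks.items():
    if occasion == 1:
        by_script[script].append(mark)
for script, ms in by_script.items():
    print(script, "inter-rater spread:", max(ms) - min(ms))  # S1: 5.5

# Intra-rater: |first occasion - second occasion| per marker and script.
for (marker, script, occasion), mark in marks.items():
    if occasion == 1 and (marker, script, 2) in marks:
        print(marker, script, "intra-rater difference:",
              abs(mark - marks[(marker, script, 2)]))  # T1, S1: 1.5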

Another important aspect of reliability is the consistency of markers’ interpretations

of criteria. The minutes of some MTEP examination appeals procedures were helpful

in understanding how teachers applied the criteria. Copies of 40 reports from appeals

(2001 examinations) were requested from five school centres (20% of the total

number of centres) after having received formal authorisation from the JNE. The

following statement was quoted from one report:

    … in the sketches there is an evident failure in the capacity to communicate visually the ideas; there is no correct representation of space and no correct framing of the presented solutions in the space for the scenario, particularly in the first and second sketches. There is no representation of the perspectives and principal plans (face and floor); the chosen materials for the solutions are not the most adequate for a scenario, for example the stone. The used media are not well controlled. In the final solution, there is no representation or perspective of the scenario plans; space representation is incorrect … The final drawing is not elaborated in terms of use of media (colour pencil); there is no care in making a finished and expressive drawing. The overall impression is that the work reveals problems of organisation and composition and an unsatisfactory plastic treatment (061 minute report).

Clearly, the marker was concerned with the student's technical expertise in representing space and with his or her use of techniques and materials. The marker's approach was negative, pointing out what the student was incapable of doing rather than what he/she could do. This is another example:

    In group I there were some errors in the faithfulness of the form and proportions and problems in expressiveness and global effect. In group II, a) there were errors in creativity and in the capacity for visual communication of ideas; problems in articulating techniques and materials; in the sketches, errors of expression. In group II, b) errors in articulating the final solution with the problem, weak organisation, composition and plastic treatment (551 report).

In group I, 3 points were awarded because of errors in the correct


representation of form (faithfulness to the model). In group II question
a) 2.8 points were awarded because of errors in creativity; difficulties
in communicating the ideas visually; problems in using materials and
techniques (462 report).

The marker seemed to use terms derived from the criteria in the reports, and marked by penalising the candidate, in other words by subtracting points for the observed technical difficulties. Some explanations, such as 'errors in creativity' or 'errors of expression', are hard to decipher. In the next example the same kind of unclear discourse occurs, based on terms like 'failures':

In Group I, 2.5 points were awarded because although the form


represented respected the proportions of the object there were evident
failures in the use of value, there is no representation of the projected
shadow of the object over the plan of the table and no accurate
definitions of the edges of the object. There were too many dark zones
in the representation of plans and in general the drawing is too dark.

In group II question (a). 2.5 points are awarded to the sketches because
there is no diversity of proposals that indicates a failure in creativity.
Besides the proposed materials are not the best indicated for the
solution, being too heavy, e.g. concrete and iron… (report 060).

All the judgements in the reports were based on academic criteria such as the use of

formal visual elements, techniques and materials. The main justification for awarding

marks was based on concepts of ‘composition’, ‘formal organisation’, and ‘graphic

adequacy’. Personal interpretation or style was not mentioned. In problem-solving

exercises the main justification was ‘the candidate is unable to communicate ideas

visually'; the reports did not mention whether the candidate took creative risks in the sketches of their proposals. Creative and expressive qualities were not described;

creative work was vaguely referred to, usually in relation to the number of sketches

and the appropriateness of chosen materials for the final product. The judgements

depended on overall impressions. But this was not surprising given the inadequate

assessment criteria. Nevertheless, these transcriptions of the minutes should be seen

as an imperfect indicator of the process of marking because it might be difficult for

teachers to translate their judgements about visual products into written language.

4.4. Users’ perceptions of Portuguese art examination practices

4.4.1. Respondents

As described in Chapter 3 (3.4.2.1, p. 90), a survey questionnaire was designed to obtain the general perceptions of teachers/assessors and students about external art assessment in Portugal (see Appendix VII). Forty-four art assessors and one hundred

and four students replied to the questionnaires. The final sample of teachers/assessors

and students included a wide variety of geographical locations, ages, teachers’

experience of assessment and students’ art courses. Respondents’ ages, sex,

geographical location, and other characteristics were distributed as follows:

Age rank   Number   Sex   Number   Region    Number   University course   Number
18-21      72       M     36       North     40       Architecture        5
22-25      24       F     68       Centre    31       Fine arts           23
26-29      8                       South     29       Design              26
                                   Islands   4        Education           50
Table 3: Questionnaire survey sample of students

Age rank   Number   Sex   Number   Region    Number   Years teaching   Number   Years as assessor   Number
27-35      5        F     26       North     14       6-10             4        1-3                 13
36-40      14       M     18       Centre    14       11-15            17       4-6                 26
41-45      17                      South     12       16-20            12       7-10                4
46-55      8                       Islands   4        21-25            9        11-15               1
                                                      26-31            2
Table 4: Questionnaire survey sample of teachers/assessors

4.4.2. Description and analysis of questionnaire results

4.4.2.1. Syllabuses (Section 2)

The respondents had different views about the value of the various art specialisms in

the curriculum (question 1.8). Sixty-eight respondents, 45.9 percent, considered that

all the art specialisms were important. For the others, Studio Art was considered the most important specialism (45 respondents), followed by History of Art (25 respondents), MTEP (12 respondents), and Descriptive Geometry and Theory of Design (11 respondents).

The existing aims of art specialisms were considered appropriate for art and design

courses by 68.6 percent of respondents (question 2.1), while 65.4 percent of

respondents considered that the objectives and contents described in syllabuses were

adequate for art and design courses (question 2.2). A further 51.1 percent considered

that art and design syllabus at school provided a good basis for further studies

(question 2.3).

There was no significant variation of responses between students (SR) and teachers

(TR) in the first two questions 2.1 (aims), 2.2 (objectives and contents). In question

2.3 which asked for a response to the statement: ‘The art and design syllabus at

school provides a good basis for students' further studies’, teachers and students

responded differently.

Question 2.3      1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     2.5%               62.5%     25%          10%                   2.00
Students (SR)     5.1%               40.4%     46.5%        8.1%                  3.00
Table 5: Responses to question 2.3
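The medians in these tables follow from coding each response 1 (strongly agree) to 4 (strongly disagree) and taking the middle value of the coded responses for each group. A minimal sketch of that computation (Python; the response counts are invented for illustration, not the survey's raw data):

# Sketch of how the reported medians arise from Likert codes
# (1 = strongly agree ... 4 = strongly disagree). Counts are invented.

import statistics

def expand(counts):
    """Turn {code: count} into a flat list of coded responses."""
    return [code for code, n in sorted(counts.items()) for _ in range(n)]

teachers = expand({1: 1, 2: 25, 3: 10, 4: 4})    # 40 responses
students = expand({1: 5, 2: 40, 3: 46, 4: 8})    # 99 responses

print(statistics.median(teachers))  # 2.0 -> reported as 2.00
print(statistics.median(students))  # 3   -> reported as 3.00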

The tendency of students to disagree was illustrated by the following student’s

comment at the end of the questionnaire:

History of Art and Descriptive Geometry specialisms are generally well


taught and the examination question papers are adequate for the courses.
But they are not really relevant in university. Secondary courses do not
prepare students for the core disciplines of the university at least in
plastic art courses (SR 16, section 10)

4.4.2.2. Opinions about the validity of the examination (Section 3 and 4)

Respondents’ opinions about the validity of art examinations suggested considerable

dissatisfaction with the current model. Although 58 percent of respondents

considered that instructions (rubric) for examinations were clearly stated (question

4.1), only 48.8 percent thought the questions were clear (question 4.2). One student's comment in section 10 of the questionnaire was: 'The language of question papers is not very clear' (SR 80). But 4.3 percent of the respondents strongly agreed that the questions were clear and 7.1 percent strongly disagreed.

The time allowed for answering the question paper was considered adequate by 57.2

percent of respondents (question 4.3). And 50.4 percent of respondents considered

that the questions in the examination paper were similar in their form and content to

tasks set in courses (question 4.4), however 9.4 percent strongly disagreed with this.

Of the respondents, 56.1 percent disagreed and 29.5 percent strongly disagreed with

the statement in question 4.5: ‘students are given a good opportunity in the

examination to show well what they know, understand and can do in and about art’.

The tasks set out in the question papers were not considered adequate: 58.7 percent

of the respondents disagreed and 27.5 percent strongly disagreed with the statement

in question 4.6: ‘The examination allows the most able students to demonstrate their

true ability’. Furthermore 52.2 percent of respondents considered that the

examination did not allow students to display the knowledge, understanding and

skills set out in the syllabuses (question 4.8). And 55.7 percent of the respondents

disagreed and 17.1 percent strongly disagreed with the statement (question 4.9):

‘The examination questions or tasks focus on highly relevant knowledge,

understanding and skills for future artists and designers’. Furthermore the current

criteria in the examinations were not considered clear by the majority of the

respondents, 55.8 percent disagreed and 18.1 percent strongly disagreed with the

statement in question 4.7 that: 'The assessment criteria in the examinations are

clearly stated’.

Teachers gave varied responses to the statement in question 4.4: 'The questions

in the examination paper are similar in their form and content to the tasks set in

the classroom’. The same was the case for the statement in question 4.8:

‘The examination allows students to display the knowledge, understanding and skills

set out in the syllabuses’.

Question 4.4      1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     –                  45%       40%          15%                   3.00
Students (SR)     4%                 48.5%     40.4%        7.1%                  2.00
Table 6: Responses to question 4.4

Question 4.8      1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     –                  37.5%     55%          7.5%                  3.00
Students (SR)     2%                 50%       45%          3%                    2.00
Table 7: Responses to question 4.8

According to one student’s comment in section 10, the examinations did not cover

all the contents of the course and students could take advantage of this:

We do not need to be good during internal assessment of


secondary courses. In fact we can choose to cancel our

registration and pass the exams as external students; this
has advantages for students because we only need to study
the contents of the exam question paper and not the whole
course syllabus (SR 18).

The majority of the respondents thought that the examinations were unbiased: 89.2 percent considered that they were free from gender bias (question 4.10); 70.7 percent considered that the examinations were free from bias related to where students live (question 4.11); 84.8 percent agreed with the statement 'The examinations are free from bias related to racial/ethnic background of the students' (question 4.12); and 97.8 percent considered that examinations were free from social class bias (question 4.13). However, students' and teachers' responses to the questions related to examination bias (4.10-4.13) varied: while students seemed unaware of bias, teachers recognised the possibility of gender and geographical bias.

Question 4.10 (no gender bias)   1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)                    5%                 35%       47.5%        12.5%                 3.00
Students (SR)                    48%                35%       16%          1%                    1.00
Table 8: Responses to question 4.10

Question 4.11 (no local bias)    1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)                    5%                 35%       47.5%        12.5%                 3.00
Students (SR)                    48%                35%       16%          1%                    1.00
Table 9: Responses to question 4.11

Question 4.12 (no racial bias)   1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)                    20%                42.5%     30%          7.5%                  2.00
Students (SR)                    55.1%              38.8%     4.1%         2%                    1.00
Table 10: Responses to question 4.12

Question 4.13 (no social class bias)   1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)                          22.5%              40%       30%          7.5%                  2.00
Students (SR)                          51%                41.8%     7.15%        –                     1.00
Table 11: Responses to question 4.13

Responses to questions about assessment (section 3) revealed that both teachers and

students considered it was important that students know and understand the criteria

being used to assess their work; 96.6 percent agreed with the statement 'It is

essential that students know exactly what qualities the teacher is looking for in

students’ work’ (question 3.1), and 97.8 percent agreed with the statement: ‘It is

important that students understand the criteria used to assess their work’ (question

3.2). Respondents also strongly agreed with the statement that students should

participate in the development of criteria; 91.3 percent considered that 'students

should have some say about the criteria used to assess their works’ (question 3.3).

4.4.2.3. The ideal form and structure of art and design examinations

(section 5)

The respondents considered art and design examinations important: only 28.6 percent of them agreed with the statement that there should be no final examinations in art and design courses (question 5.4). It was also evident that they preferred assessment instruments other than the current pencil and paper tests to be used in examinations.

Only 5.7 percent of respondents thought that art examinations should consist only of

questions requiring a written response (question 5.1); 29.3 percent were in favour of

art examinations consisting only of tasks requiring practical studio-based responses

(question 5.2) and 89.3 percent favoured art examinations consisting of

questions/tasks requiring a combination of written and practical responses (question

5.3). However some differences were found between teachers’ and students’ views

in response to the questions about the nature of artworks to be submitted for

examinations. Table 12 shows the responses to the statement in question 5.1:

‘Art examinations should consist only of questions requiring a written response’.

Question 5.1      1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     –                  5%        52.5%        42.5%                 3.00
Students (SR)     2%                 4%        42%          52%                   4.00
Table 12: Responses to question 5.1

Table 13 below shows responses to the statement in question 5.3:

‘Art examinations should consist of questions/tasks requiring a combination of

written and practical responses’.

Question 5.3      1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     55%                37.5%     5%           1%                    1.00
Students (SR)     48%                40%       8%           4%                    2.00
Table 13: Responses to question 5.3

Respondents considered the introduction of portfolios to be a valid assessment

instrument for art examinations; 75.9 percent believed that individual portfolios

provide the best evidence for assessing the work of art and design students (question

5.5). Similarly coursework was understood to provide sound evidence for

examinations; 80 percent of the respondents agreed with the statement: ‘I believe

that final art and design assessment should be based entirely on coursework’

(question 5.6).

Self-assessment was considered important by 69.8 percent of respondents (question 5.7). However there were some differences in the responses of teachers and students. As shown in Figure 9, teachers seemed to favour self-assessment as evidence for examinations more than students did: 80 percent of teachers agreed with the statement 'students' self-assessment should be taken into account for the final examination', whereas only 65.7 percent of students were in favour.

[Bar chart comparing the percentage distributions of teachers' and students' responses, coded 1 (strongly agree) to 4 (strongly disagree) plus missing, to the self-assessment statement.]
Figure 9: Responses to question 5.7

Respondents stated their preferences for the types of evidence of student

achievement to be submitted in art examinations (group of questions 5.8).

They prioritised as follows:

(1): Final products: 92.8 percent


(2): Developmental studies: 90.6 percent.
(3): Records of students’ intentions and how these are realised: 86.9 percent
(4): Preliminary investigation and research work: 77.9 percent.
(5): Work journals/ annotated sketchbooks: 77.1 percent.
(6): Self-assessment notes or reports: 73.4 percent.

(7): Notes of group criticisms: 72.8 percent.

Of the respondents 66.9 percent agreed with the statement that question papers and

criteria should be internally set by teachers and externally approved (question 5.9).

The frequency of responses to questions about the duration of art and design

examinations was inconclusive. The reason for this could be that the questions were

not sufficiently well constructed, and that the respondents thought they should answer each of the sub-questions in question 5.10 separately rather than choose just one option.

Some comments added at the end of the questionnaires (section 10) revealed that

individual students had strong opinions about how their work should be assessed.

For example:

The ideal procedure should be continuous assessment that


includes student’s achievement and progress through
evidence like preliminary studies showing the processes
used and constant dialogue between teacher and student.
The student must know how to justify her work; and this
must be also assessed. It is absurd to have examinations
based on one single piece of work because it is impossible
for the student to reveal all her knowledge, understanding
and skills in art in a single piece of work (SR 104).

The exam should consist of the presentation of a project


developed during the year of study; the project should be
marked by the student’s own teacher and by an external
assessor (SR 102).

4.4.2.4. Opinions about current procedures (section 6)

Of the respondents, 59.4 percent stated that students usually read the instructions for marking (question 6.1); 44 percent thought that the assessment matrix and the instructions about how work would be marked were clearly explained to students before the examination (question 6.2); and 38.2 percent considered existing mark schemes entirely appropriate for judging their artworks (question 6.3). But 53.8 percent of the teachers considered that guidance about how to mark students' work was unclear (question 6.4).

A difference of opinions between teachers and students was evident in responses to

the statement in question 6.2 that ‘The assessment matrix and how work will be

marked is explained clearly to students before the examination’. The majority of

students disagreed while the majority of teachers agreed with this statement.

Question 6.2      1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     –                  53.8%     33.3%        12.8%                 2.00
Students (SR)     7.4%               32.6%     42.1%        17.9%                 3.00
Table 14: Responses to question 6.2

4.4.2.5. Reliability (section 7)


Responses to the questions about inter-rater reliability revealed that 61.8 percent of the respondents did not believe that assessors always marked students' work

consistently (question 7.1); 90.6 percent of them thought that their interpretations of

assessment objectives and criteria varied (question 7.2), and 94.2 percent thought

that the same student work could be marked differently by two different assessors

(question 7.3). Furthermore the answers to the question 7.4 were confirmation of

general distrust in the reliability of examinations: 79.1 percent of the respondents

agreed with the statement: ‘In my opinion it is only a matter of luck or personal

preference whether an assessor likes a particular work’. Intra-rater reliability was

also considered problematic: 72.3 percent of respondents considered that assessors

were not always consistent and might mark the same work differently at different

periods of time (question 7.5). Different tendencies in the views of teachers and students were evident in responses to the statements in questions 7.1 ('Assessors always mark students' work consistently'), 7.2 ('Different assessors have different interpretations of assessment objectives and criteria') and 7.3 ('The same student work is often marked differently by two different assessors'). These results suggested that students were more concerned about the reliability of assessment procedures than teachers were.

Question 7.1 ('Assessors always mark students' work consistently')
                  1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     10.3%              41%       38.5%        10.3%                 2.00
Students (SR)     1%                 32%       48.5%        18.6%                 3.00
Table 15: Responses to question 7.1

Question 7.2 ('Different assessors have different interpretations of assessment objectives and criteria')
                  1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     35%                47.5%     15%          2.5%                  2.00
Students (SR)     52.5%              41.4%     4%           2%                    1.00
Table 16: Responses to question 7.2

Question 7.3 ('The same student work is often marked differently by two different assessors')
                  1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     32.5%              52.5%     15%          –                     2.00
Students (SR)     60.6%              37.4%     1%           1%                    1.00
Table 17: Responses to question 7.3

4.4.2.6. Opinions about ideal assessment procedures (section 8)

In section 8, the questions asked about 'ideal' assessment procedures: 73.2 percent of respondents agreed with the statement 'Students' examination work should be assessed by the students' own teacher and at least one external judge' (question 8.4). Less agreement was found with the statement in question 8.3, 'Students' examination work should be assessed by two external judges' (Table 18). Of the respondents, 79.7 percent considered that students should have a voice in the assessment process (question 8.5).

Question 8.3 (two external judges assessing students' examination work)
                  1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     20.5%              28.2%     35.9%        15.4%                 3.00
Students (SR)     13.7%              46.3%     29.5%        10.5%                 2.00
Table 18: Responses to question 8.3

Several students' comments in section 10 stressed the importance of reliable assessment procedures:

Examination scripts should be assessed by art critics, in other words


competent people who respect art (SR 60).

I think that students should have access to assessment criteria and their
exam work should be assessed by their own teachers (SR 83).

It is very important to have national exams; students’ scripts must be


assessed by external assessors (SR 80).

From students’ comments it was evident that they considered that the current

assessment procedures needed to be improved. For example:

I think that students’ work for examination should be


marked by more than one teacher (SR101).

Some teachers in Section 13 also wrote comments about assessment procedures,

although these tended to be vague. One teacher also wrote about the need for reform

and possible consequences:

I think that assessment in the arts needs to be improved, especially


because there are many assessors who judge students art works as a
matter of taste, which is not a real criterion. However to do that, we
need to revise not only the examinations but also the syllabuses and
teaching practice (TR 49).

4.4.2.7. Impact: consequences of the examination (section 9)

The questions in section 9 of the questionnaire asked for opinions about the impact

of art examinations and one response stressed their effects upon students:

Examination results are not related to results obtained by


students during internal assessment. Exam results are
crucial for our lives because we depend on them to go to
university but exams narrow our performances (SR 102)

Opinions about the value of the examination results in society revealed that both teachers and students had low expectations: only 36.1 percent of the respondents thought that employers valued the results of the art examinations (question 9.5), while

67.4 percent thought that universities and polytechnics highly valued them (question

9.6). Of the respondents, 60.4 percent considered that school administration valued

them (question 9.8), 69.6 percent thought that parents valued the results of the art

examinations (question 9.9), and 81.8 percent of the respondents considered that

students valued them (question 9.10).

It was evident that students valued their results in the art examination because they

depended on them for university entrance. However the responses from teachers

revealed low trust in their value; teachers may distrust the results as a consequence of the examinations' perceived lack of validity.

Question 9.7 ('teachers do highly value the results of the art examinations')
                  1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     7.5%               22.5%     57.5%        12.5%                 3.00
Students (SR)     9.2%               56.1%     30.6%        4.1%                  2.00
Table 19: Responses to question 9.7

Currently, in Portugal, examination results are used to select students for university

entrance. However one teacher questioned this function in the survey:

‘The universities should provide their own process or exams for student selection’

(TR- 29). The responses to questions about the use of the art examination results

showed that 79.5 percent considered that examinations were used to select students

for progression to higher education and employment. And 72.8 percent of

respondents considered that the examination results were the principal determining

factor with regard to students' choice of university courses.

Teachers responded as follows to statements about other uses of examinations:

• To control and monitor the curriculum (71.8 percent agreed);
• To check if schools and teachers are teaching the curriculum (56.4 percent agreed);
• To monitor school standards (56.4 percent agreed);
• To rank school and teacher performance (51.3 percent agreed);
• To provide feedback to teachers (64.1 percent agreed);
• To revise next year's examinations (53.8 percent agreed);
• To impose curriculum policies (64.1 percent agreed).

The responses to the statements about the effects of the examination upon curriculum practice revealed that 57.7 percent of the respondents agreed that the examinations do influence the work set during courses (question 9.1); more teachers than students believed this was the case (see Table 20). Although 52.2 percent of respondents

considered that a lot of work carried out during courses was not related to

examinations (question 9.2), the majority of respondents (64.9 percent) thought that

examinations influenced teaching practice negatively by focusing on a narrow range

of art and design knowledge, understanding and skills (question 9.3).

Question 9.1      1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     20%                65%       10%          5%                    2.00
Students (SR)     5.2%               41.2%     44.35%       9.35%                 3.00
Table 20: Responses to question 9.1

Teachers and students considered the correlation between internal and external assessment results weak, since in question 9.13 only 39.6 percent of

respondents agreed with the statement ‘The marks received by students during

internal assessment in the secondary art course (10th, 11th, 12th year) will be a good

indicator of results obtained in national art examinations’. And the majority of

students did not consider that internal results were related to external examination

results.

Responses to the statements referring to predictive validity of art examination results

revealed that only 22.1 percent of respondents considered that the results of national

art examinations (12th year) were a good indicator of students’ results in their

university courses (question 9.11). But 44.8 percent of respondents expressed the

opinion that the internal assessment marks obtained in secondary art courses (10th,

11th, 12th year) were a good indicator of university course results.

Students' and teachers' responses to the statements in questions 9.11 and 9.12 varied.

It seemed that students were less confident than teachers about the correlations

between examinations and internal assessment results and university results.

Question 9.11 ('The results of national art examinations (12th year) will be a good indicator of the results obtained by students in their university courses')
                  1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     5.1%               53.8%     35.9%        5.1%                  2.00
Students (SR)     3.1%               31.6%     55.8%        9.5%                  3.00
Table 21: Responses to question 9.11

Question 9.12 ('The internal assessment marks obtained in the secondary art course (10th, 11th, 12th year) will be a good indicator of the results obtained by students during their university courses')
                  1-Strongly agree   2-Agree   3-Disagree   4-Strongly disagree   Median
Teachers (TR)     –                  51.3%     43.6%        5.1%                  2.00
Students (SR)     3.2%               31.6%     55.8%        9.5%                  3.00
Table 22: Responses to question 9.12

Questions 1.10-1.14 in section 1 of the questionnaire for students were designed to obtain data with which to crosscheck the questions about the predictive validity of art examination results. However, only a small number of respondents (58) were able to provide the marks awarded in the first year of their university art studies, too few to yield reliable data. Therefore the correlations described in Appendix VIII (section 1.1, pp. 47-48) between the marks obtained by student respondents in their certificate of secondary education (internal + external results) and the results obtained in the first year of university can only be viewed as interesting but very tentative. Similarly, the correlations between examination results and the results obtained in the first year of university need to be read with caution.
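For transparency, the kind of correlation referred to here can be computed as in the sketch below (Python; the paired marks are invented placeholders, and Pearson's r is assumed as the correlation statistic for illustration). With only 58 usable pairs, any such coefficient carries a wide margin of error, which is one reason the reported correlations must be read tentatively.

# Sketch of correlating certificate results with first-year university
# marks. The paired marks below are invented; in the study only 58
# usable pairs were available, which limits the reliability of r.

import math

secondary = [14.2, 15.0, 12.5, 16.1, 13.0, 17.4, 11.8, 14.9]
university = [12.0, 14.5, 11.0, 15.2, 13.5, 16.0, 10.5, 13.8]

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Prints the sample correlation for the invented pairs.
print(round(pearson_r(secondary, university), 2))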

Overall the questionnaire results about the impact of art examinations tended to be

contradictory and inconclusive. However the following conclusions were drawn:

• The results of art examinations were generally considered important for student selection and certification purposes.

• Art examination results were highly valued by students, but were not

generally accepted as important by teachers.

• The respondents did not recognise the impact of art examinations upon

curriculum practice as important.

• The great majority of respondents considered the predictive validity of art

examinations to be low.

4.4.2.8. Teachers views about assessment procedures (Sections 10-12)

In responses to questions in section 10 of the questionnaire for teachers, only 35.9 percent of the teacher respondents said that they assessed students' examination work using holistic or overall judgements, while 56.4 percent said that they usually assessed it by aggregating marks based on detailed criteria. The majority of respondents said that they usually assessed students' examination work using both holistic judgements and judgements based on detailed criteria.

Notwithstanding the above responses, detailed criteria were not considered

necessary for art assessment by 74.4 percent of teacher respondents. A significant

majority of 94.9 percent of them thought that it was useful to discuss interpretation of

assessment criteria with their colleagues and only 5.1 percent strongly disagreed.

Of the teacher respondents, 71.8 percent said that they usually met with their colleagues and tried to reach a common interpretation of how to apply the criteria, and 63.2 percent considered it was not difficult to reach consensus about how to apply the criteria. These responses contradicted previous data derived from the reports of assessors' meetings (see section 4.3.3, pp. 126-128).

Sections 11 and 12 of the questionnaire for teachers were about ideal assessment procedures: 79.5 percent of the teacher respondents thought that evaluation reports of examinations should be sent to schools; 74.4 percent agreed with the need to receive notes of guidance and visual exemplars to help teachers and assessors (question 11.2); and 89.7 percent stated that there should be regular in-service training courses for teachers and assessors based on previous examination experience. Only 59 percent agreed with the national publication of examination results in the form of league tables.

The majority of teachers who participated in the questionnaire favoured special training for assessment: 87.2 percent of respondents suggested the need for special in-service teacher training on assessment, and 89.7 percent agreed with the need for special assessment training for assessors, while 94.9 percent agreed with the statement that current students' work should be used for the training of assessors. The statement 'The aim of in-service training should be to reach consensual agreement about the interpretation of criteria' had the agreement of 89.7 percent of respondents.

Teachers’ open comments in section 13 of the questionnaire confirmed the general

opinions expressed in these questions. For example:

We need valid in-service teacher training (TR 26)

Teachers need to be well informed and have access to in-service


teacher training in order to prepare students for the future (TR 23)

Summary
From an analysis of the questionnaire the following evidence emerged:

• A large proportion of Portuguese teachers and students expressed dissatisfaction, in various ways, with the current external art assessment instruments; although some believed that the instructions and instruments were clear, they also believed that the examination focused on a narrow part of art and design learning and that the question papers did not allow students properly to reveal their knowledge, understanding and skills in art and design, and consequently lacked content validity.

• Respondents did not consider the procedures used for assessment reliable, and students especially distrusted the inter-rater reliability of markers.

• The impact of art examinations upon users was evident; the respondents considered that students valued examination results. Teacher respondents

identified major outcomes of the art examination such as student selection,

monitoring and controlling the curriculum and standards and imposing

curriculum policies. Some respondents believed that art examinations

influenced teaching practice by focusing on a narrow range of art and design

knowledge, understanding and skills. Some respondents and especially the

great majority of students did not believe in the predictive validity of

examination results.

• Respondents’ opinions about how to reform the system suggested that it was

important to reconsider the nature of the examination, devising instruments

with both written and practical tasks, and including an element of self-

assessment. For example portfolios comprising coursework evidence such as:

preliminary investigation and research work; work journals; developmental

studies; final products; self-assessment notes; a record of student intentions

and how these are realised; records of group criticism discussions.

• The teachers’ responses to questions about how to improve assessment

procedures strongly supported the implementation of a system of teachers’

in-service training in order to develop assessment skills through consensual

interpretation of criteria and by the use of visual exemplars of student works

for training purposes. The majority of respondents considered that assessment by the student's own teacher and at least one external assessor would be the best way to improve the examination procedures.

The analysis of the history of Portuguese art examinations and current practice

revealed an emphasis on Descriptive Geometry as a core specialism of the art

curriculum. Assessment in Portugal was fragmented into internal summative

assessment which valued teachers’ judgments but had no external controls, and

external summative assessment or national examinations based on written tests.

The procedures used for external assessment seemed to lack quality assurance

mechanisms and consistency of marking. The underlying model of the art

curriculum and assessment was based on separate specialisms labelled as theoretical

or practical, and the core domains of art education were fragmented because of this

separation. Consequently assessment in art seemed to include only narrow aspects of

what could be a broad and rich art education. Problems with reliability of the

assessment procedures were detected. The existing assessment criteria were vague

and the process of marking extremely dubious; the absence of moderation procedures

and lack of consensual interpretation of criteria were identified as major problems of

the system.

Chapter 5
Art and design examinations in England

This chapter sets out to describe and discuss the strengths and weaknesses of

English art and design examinations at 17 + level through the analysis of their

evolution, underlying model, assessment instruments and procedures. This part

of the research was intended: to study the policies and practices of art and

design examinations in England; to identify critical issues for assessment

practice by analysing concepts and theories that underpinned art and design

external assessment in England; and to understand how assessors and

examiners were trained in England.

5.1. Introduction

From the review of literature a number of art and design examinations had been

identified where portfolios or coursework constituted the principal assessment

instrument and where the assessment procedures relied on different judges or

assessors. For example, Arts Propel; International Baccalaureate; Advanced

Placement; national examinations in The Netherlands, Finland and England.

From this list, the English GCE A level examinations appeared to fit most closely

with the preferences expressed by Portuguese students and art teachers about an ideal

examination (4.4.2.3; p.137 and 4.4.2.6, p.142), since the assessment instrument

included coursework and assessment procedures requiring teacher in-service training

and moderation. A study of the English model presented an opportunity to

investigate established procedures and strategies and to anticipate problems before

the design of a new assessment instrument and procedures for Portugal.

Reference to the English system of art and design examinations at AS and A levels

was included in this research because the survey of Portuguese art teachers and art

students indicated that it was closely related to their views on an ideal

model for art examinations in Portugal. It is acknowledged that this section of the

thesis provides only a partial view of the English system of art and design

examinations and there was no intention to make a detailed comparative

study. However analysis of some key aspects of English art and design examinations

through documents, interviews and observation of the Edexcel Foundation awarding

body standardisation processes was useful for the purposes of collecting insights

about possible strengths and weaknesses of an examination system based partly

on coursework portfolios and assessment procedures reliant on standardisation and

moderation.

The documents used to inform this description included a review of literature about

the history of art and design examinations in England, articles in the press and

official publications (for details see 3.4.1.1, p.84). Since the English model of art and

design examinations was based on a system of standardisation, data was collected

through two observation periods conducted at the Edexcel Foundation awarding

body standardisation meetings (Appendix X). The observations were intended to

explore how moderators were prepared, the processes used to reach a common

understanding and application of criteria, what kind of visual exemplars were used

and how the assessment and marking of such exemplars was explained to the

moderators.

This stage of the research also set out to understand stakeholders’ perceptions of

English art and design examination practices. For that purpose, data collected

through interviews (Appendix IX) was analysed in order to determine the strengths

and weaknesses of the model in terms of validity and reliability and to develop a list

of tentative suggestions for new assessment instruments and procedures. The purpose

of the interviews was to understand examination practices from the point of view of

different stakeholders and to obtain in-depth information about the expectations and

problems experienced by the participants. A group of six interviewees was selected

including one principal examiner (Richard), one team-leader (Elisabeth), one

moderator (Ian), two art teachers (Cindy and Sally) and one student (Annie). Ian and

Elisabeth were first contacted during an Edexcel GCE AS standardisation meeting in

2001, and Cindy during an art teachers’ workshop (NSEAD, 2001). Cindy, Elisabeth

and Ian lived and taught in different regions of England and, through informal

conversations with the researcher in previous meetings, revealed extensive

experience and different points of view about the examination. Richard was a chief

examiner with considerable experience of developing examinations. Sally was

included because of her special situation as an inexperienced art teacher. Annie was

selected to introduce a student’s view of the model. The sample was not intended to

be representative of English students, teachers or moderators. The interviews were

expected to provide individual oral histories about how these examinations worked in

micro-contexts rather than a general account of them.

Role                Name       Sex   Age   Region               Experience (years)   Awarding body
                                                                teaching / examining
Principal examiner  Richard    M     57    South East England   12 / -               AQA
Team leader         Elisabeth  F     67    London region        30 / 20              Edexcel
Moderator           Ian        M     48    London region        18 / 11              Edexcel
Art teacher         Sally      F     46    London region        1 / -                Edexcel
Art teacher         Cindy      F     42    North West England   18 / -               OCR and Edexcel
Student             Annie      F     16    South England        - / -                Edexcel

Table 23: Interviews with English art and design examination stakeholders

5.2. Art and design examinations before the 1988 Reform Act

Art education in England in the 1960s was influenced by rationales such as self-

expression and by formalist approaches (Ross, 1989). Practices in separate arts

subjects in the curriculum were strongly influenced by the modernist tradition in art

itself and the ‘progressive’ ideas of John Dewey and Herbert Read.

The leading exponents were singular charismatic individuals – artistic
teachers whose concern was with the child as artist – and untroubled by
any other considerations (Ross, 1989, p. 17).

These approaches to art education promoted changes in the way that candidates’

work was presented and assessed. According to Carline (1968), personal expression

came to be more valued than the previous tradition of accurate representation:

A quarter of a century ago, inaccuracy in representing objects,
mistakes in perspective and untidiness generally were almost the
only, and certainly the main, grounds for criticism. More recently,
examiner's reports have emphasized that a personal interpretation
of the group [still life] is much more important than accuracy of
representation. The Oxford report for 1958 goes further, in
reminding schools that 'good candidates' show that they have
been excited by some aspect of the form, the pattern or the
colour'. To make a record of what is before one is manifestly
insufficient (Carline, 1968, p. 204).

172
The predominance of observational tasks changed with the new conceptions about

the value of artistic work. For example, living models, humans and plants, were

introduced as drawing subjects and personal interpretation of the object or model

was considered more important than copying: ‘…such work, even though it may

bear little resemblance to the model, is much to be preferred to a facile drawing of a

figure executed without character and culled from constant copying of fashion

drawings in magazines' (Carline, 1968, p. 214). The inclusion of 'imaginative

composition’ and its subsequent development into ‘pictorial composition’ or ‘visual

composition’ introduced another important element derived from new concepts

where imagination was valued as an essential cognitive aspect of art education.

Imagination, personal response and problem solving were all rated highly in the

examination requirements. Later a formalist emphasis on abstract works was

gradually recognised in the examination syllabuses and question papers. Art

appreciation was mainly concerned with the history of art. According to Carline it

was the ‘least popular’ of the art papers, although it was taken by ‘some thirty per

cent’ of the Advanced level candidates in art, mainly by ‘candidates who plan a

career in which art may be useful, and are likely to be granted extra time at school

for preparation’. However according to Carline (1968, p.255) teachers experienced

difficulty ‘in combining art history with practice in the limited time allowed’ and it

was ‘unsatisfactory from an educational point of view’.

Carline’s (1968) description of art examinations in the 1960s was written from the

point of view of an examiner. His writing makes clear that examinations influenced

teaching practices. On several occasions he mentions how the inclusion of certain

subjects in the examinations affected teaching: for example, in the chapter about 'craft' he

stated: ‘…the inclusion of craftwork in the art examinations has given enormous

encouragement to the practice of crafts as a branch of art’ (Carline, 1968, p.246).

Carline used quotes from examining board reports to reinforce the claim that art

teachers should align their teaching to qualities of artworks valued by examiners.

GCE A level art and design examinations evolved progressively. In the

1970s education in England was characterized by a decentralised system (Ross,

1989). Before the implementation of the Education Reform Act in 1988, which gave

new statutory powers to the Secretary of State and central Government, 'Local

Education Authorities, schools, and even the individual teacher [had] almost total

autonomy' (Steers, 1989, p. 8). In fact, the only regulatory device for schools was

the examinations system (White, 1998, p.5). There was no national curriculum for

art and teachers had considerable control of the form and content of art education.

At that time, in England and Wales, secondary art teachers taught what they, as

individuals, considered important, within the context of a wide choice of fairly

flexible examination syllabuses (Steers, 1989, p. 9). Although examinations were

subject to scrutiny by the Schools Council, the numbers of options for examinations

and flexible procedures 'led the emphasis upon teacher-control of examinations'

(Earle, 1986, p.145). Teacher 'ownership' of assessment systems was considered a

most important factor for examination improvement. In Steers’ words:

In my view, one consequence of this 'ownership' has been an
unprecedented, sustained national effort in which the
professionalism of classroom teachers has played a particular
significant role: by professionalism I mean both the 'effectiveness
of implementation' of the examination procedures, in this case by
the overwhelming majority of art and design teachers and, in turn,
the way in which this serves… the intrinsic values and interests of
the profession (Steers, 1994, p. 289).

An example of teacher ownership of external assessment was provided by the range

of options offered to teachers, not only through the choice of an examining board but

also through the choice of examination structure. According to Earle (1986, p.51) in

the 1970s and 1980s there were three possible modes of examinations for 16+ and

18+ levels: Mode 1 – an external examination based on syllabuses provided by the

regional subject panel; Mode 2 – an external examination on syllabuses provided by

a school or a group of schools; and Mode 3 – an examination set and marked

internally in a school or group of schools but moderated by the regional board.

Within each of the eight regional bodies it was possible for a school to devise its own

examination if required (Earle, 1986, p.51). This freedom of choice was still

remembered by the English art teachers interviewed in this research (see interview

Cindy, Appendix IX, p. 101) as being a model for innovative and creative teaching

within an atmosphere of professionalism and consensus. Mode 3 was the structure

that allowed most teacher ownership, according to Earle (1986, p.51), but it was not

always preferred, perhaps because teachers did not have sufficient time or inclination

to produce individual syllabuses and because it was very ‘expensive to administer’.

A finding of the review of literature about the history of art examinations in England was

that the model of the examinations had progressively evolved from a test of several

hours or a day to the inclusion of preliminary work and coursework in the form of a

portfolio or an exhibition. Candidates usually received the question papers before the

examination in order to prepare the themes and develop ideas in visual and written

forms. In 1977 a survey of the syllabuses of the GCE A level Art examinations

conducted by Earle found that eight boards offered a variety of art examinations

in different combinations (art, art and craft, art history), providing a total of

sixty-two subjects; thirty-two of these were in art history.

Earle (1986) pointed out that there were problems of comparability among the

examination boards:

It could be argued at one level that one board expects candidates to undertake a
broad course of study while other boards demand a greater depth of study in specific
areas of art. The problem comes in equating one board's work with another, which
used to be the function of the A level sub-committee of the art committee of the
schools council. In its absence there is no indication of how standards among boards
will be moderated at this level (Earle, 1986).

‘In 1978 no examination board required the submission of coursework but in 1988

OCR allowed some coursework to be presented, in the beginning with no guidance

on how it should be marked or graded' (Steers, 2003). Similarly, Earle (1986, p.123)

pointed out: '…there is not much stated in writing to assist the teacher with the

presentation of coursework, the amount required varies considerably...the boards

who require coursework and preliminary works are interested in the way which

candidates carry out their work as well as the final product'. Although art

examinations had evolved to include coursework as evidence for marking candidates'

achievement, instructions for this were stated in vague terms and no explicit

guidance for teachers was given.

Earle (1986) stated that GCE A level art examiners and teachers became more and

more concerned with the problem of comparability of results:

The comparative wealth of examinations available to teachers has been a
luxury. While the curriculum was the 'secret garden' idiosyncratic
practices in art could continue. If teachers have to re-think their practices
in the light of the new examinations, it could become an advantage to the
profession to have a new and more standardised series of art
examinations (Earle, 1986, p.126).

Earle was a chief examiner and, maybe because of this, his account of art and design

examinations argued for a more centralised model of external assessment.

He believed that the reliability of results was dependent on standardisation systems.

His vision was in line with a strong move towards criterion-referenced assessment,

which emerged during the 1980s, probably because of the external pressures upon

education driven by politicians (White, 1998). According to Steers (1989):

The policies of the present Government are having a profound influence
on all sectors of education. At all levels there are demands for greater
accountability and for the curriculum to be more 'relevant' – relevant to
the Government's perception of national economic health rather than the
educational health of the nation (Steers, 1989, p. 7).

In pursuit of a more reliable system of examinations, the awarding bodies made

developments in mark schemes. The growing interest in defining mark schemes

and establishing domains of knowledge, grade criteria and criterion-referenced

assessment was seen by teachers and examiners (Davies, 1987) as a way to introduce

more rigour and reliability into the examination.

5.3. The consequences of the Education Reform Act (1988) in England and

Wales

The introduction of the national curriculum in 1988 resulted in radical changes in

English education that were not always popular. The national curriculum was often

seen as removing teachers' control of the education system. According to White

(1998):

After 1988 one form of sectionalism – teacher control of the
curriculum – was replaced by another. Power shifted to the
government. This meant in practice that those in control of
education policy were able to institute the national curriculum
they preferred with minimum consultation and regardless of
obvious difficulties with it (p.5).

According to Ross (1989, p. 17): 'Authority in education passes swiftly from the

professional to the politician, from the researcher to the bureaucrat’. English teachers

were not used to such forms of policy-making – until then the only form of control

they had known was external examinations within the flexible choice provided by

several examination boards. However, at this time small boards were subsumed into

larger awarding bodies. For example AEB, NEAB and City and Guilds became the

Assessment and Qualifications Alliance (AQA) awarding body; ULEAC and BTEC

became Edexcel; and the Oxford and Cambridge and RSA examining boards became

the Oxford, Cambridge and RSA awarding body (OCR). The model of examinations

became more and more rigid with the amalgamation of examining boards. In the late

1980s and 1990s the trend was towards removing teacher ownership of education

and reinforcing more centralised control of the curriculum and external assessment.

Teachers were aware of the negative implications of such changes upon the

profession as Steers pointed out (1994, p. 289): ‘In recent years this professionalism

has been under threat from a government that has made known its mistrust of teacher

assessment’.

According to Steers (2003), A level art examinations were usually based on the

progression of a student's work. The evidence for marking was submitted as a

portfolio or exhibition of selected works from the two-year course of study.

Current examinations have less focus on drawing or craft skills per se,
but place a higher value on knowledge and understanding where both
process and product are assessed. The syllabuses are designed to allow
candidates to provide evidence of their achievement throughout a course
of study rather than test their ability to perform within set and very short
time limits. It can be argued that, in general, the 1998 examinations
provide more demanding but potentially more rewarding opportunities
for both candidates and their teachers (Steers, 2003).

During the 1990s 'process' was progressively emphasised over 'end

products' in examination values. Gradually the criteria for assessment focused

more and more on students’ evidence of developing ideas and investigative work.

This may have been a consequence of tendencies in art education theory at the time

to focus more on the process than on final products, and of the development of more

conceptual contemporary art and design forms.

5.4. AS and A level art examinations after Curriculum 2000


At the time of writing there were several qualifications in the English system for

external assessment at age 17+. There were general qualifications such as GCE AS

(Advanced Subsidiary) and GCE A levels (Advanced GCE); vocationally related

qualifications such as Advanced GNVQs; and occupational qualifications such as

NVQs (National Vocational Qualifications). Vocationally related and occupational

qualifications were linked to ‘national occupational standards, which set out real-

world job skills as defined by employers' (QCA, 2000). In the General Certificate of

Education (GCE) at Advanced Level in any particular subject candidates could

achieve pass grades of A to E. Students who wanted to follow an academic route to

higher education or employment usually took A Level examinations. A Level

courses normally lasted two years and the examinations could be taken in a linear or

modular fashion. AS and Advanced GCE examinations in Art and Design were

suitable for students who wished to study art, craft and design at a higher level

or to take up careers for which an art background was relevant.

5.4.1. The role of the QCA

English art courses at 17+ level were regulated by a government agency, the

Qualifications and Curriculum Authority (QCA) that defined the main examination

specifications, including assessment objectives, grade descriptors, and assessment

procedures. The general procedures for all examinations were stated by the QCA in

the Code of Practice (QCA, 2000). The three awarding bodies in England set their

syllabuses, question papers and assessment procedures only with the approval of the

QCA, which also monitored and evaluated the awarding bodies through a

programme of ‘scrutinies’. QCA scrutineers checked if the awarding bodies could

guarantee ‘ parity in each subject and qualification from year to year, across different

syllabuses, and with other awarding bodies’ and ‘ compliance with the requirements

of Code of Practice and other relevant regulations’ (QCA, 2000).

5.4.2. The structure of awarding bodies

At the time of writing, English schools and examination centres could choose one of

the three English awarding bodies to deliver the examination or an examination

offered by the Welsh Joint Examinations Committee (WJEC). This choice could be

made by the head of school, the head of the art department, or through agreement between

teachers in the centre. Awarding bodies managed ‘all stages of the examining

process’ ensuring that the procedures were ‘carried out in accordance with the Code

of Practice and with the awarding body's policy procedures' (QCA, 2000). Figure 10

(below) illustrates the typical structure of an awarding body.

Chief examiner
• Supervises the construction of the question papers, mark schemes and the criteria
for coursework assessment.
• Responsible for monitoring the standards of principal examiners and principal
moderators.
• Writes an evaluation report of the examination at the end of each examination
period.

Assessor
• Second expert in the design of the question papers; responsible for checking the
final drafts of all question papers.

Reviser
• Expert required to evaluate question papers and provisional mark schemes.

Principal examiner
• Responsible for setting question papers and the standardisation of their marking.

Principal moderators
• Compile exemplar coursework tasks, annotated to show how the marking criteria
in the syllabus are to be applied; ensure the coordination and monitoring of
moderators.

Assistant principal examiners
• Responsible for the coordination and supervision of the group of team leaders.

Team leaders
• Coordinate and supervise the team of moderators.

Moderators
• Check whether the teachers' marks in the centres are in accordance with the
established standards and make adjustments if necessary.

Assistant moderators
• Moderate centres' assessment of candidates in accordance with the agreed
marking criteria and the awarding body's procedures.

Figure 10: Structure of awarding bodies

5.4.3. A modular structure of courses and examinations

From September 2000, a modular structure established the specifications for GCE

AS and A level examinations, with the modules corresponding to units. At the time

of this writing most A level qualifications were made up of six units that were

broadly equivalent in size (QCA, 2000). Three of these units constituted the

Advanced Subsidiary (AS) qualification, representing the first half of the full A

level. The other three constituted the second half of the full A level and were called

A2. A level qualifications included two distinct examinations, AS and A2, each one

covering three units of the course. The units were separately set and assessed.

The modular structure of courses and assessment, according to the Code of Practice

(QCA, 2000), was intended to allow flexibility. But it appeared to be a potential

source of validity problems as well as posing a serious threat to the established

theory and practice of art and design education. In the interview with Richard he

said: ‘We were all bound by…the modular structure for the AS and A2’ (Richard,

SEE, 4-02-2001). The structure of the courses in separate units tended to make the

system uniform. One consequence of the modular structure was an increasing

number of students taking art and design qualifications at AS level; another was a

distortion of the art and design courses. In theory, modularity was supposed to

allow course flexibility. However, according to the interviewees it had resulted in

over-assessment of students and prevented teachers from experimenting in their

courses. This was understood to be a particular threat to sequential development of

ideas and skills considered important in art and design learning. According to

Elisabeth:

Art is on-going, it is a developmental thing and it goes on, you can get
back to a piece, you can use it as a starting point for something else in
the future, whereas with this modular structure you can’t do that.

Now everything they do is assessed, they don’t have time to explore
and experiment which is what art is about. They always have a
deadline for something and that is in their heads, I think it stops
things…. I think if you are all the time focusing on deadlines things
have to be finished off by rushing through, I think that it does interfere
with the creative process (Elisabeth, L, 7-02-2001).

For Cindy:

‘I don’t think that’s how we learn. I think its based on the wrong
theory… to assess in a way like this, by putting things into chunks of
experience, is not how you work, I don’t think it is a good practice
too’ (Cindy, NEW, 10-02-2001).

The introduction of assessment by units to replace the single final assessment at the

end of the course, as was the case with the old A level, was explained by Cindy:

‘I know why they have done it, to an extent. Well I think it is so the
students don’t have the pressure of the work at the end, they can
assess as they go along’ (Cindy, NEW, 10-02-2001).

But in practice, in Cindy’s view, it had put more pressure upon students because of

strict deadlines and overloaded courses. Elisabeth thought that the modular structure

was intended to homogenise the system and also to create intermediate

qualifications:

'…this is really to be in line with other subjects and the government
as well wanted something that students could gain at the end of the
first year [of the course]' (Elisabeth, L, 7-02-2001).

She thought this could be the reason for an increasing number of students opting for

art and design at AS level. However a head of art at the North London Collegiate

School considered this unfair and he was quoted as saying: ‘Art students need elbow

room for the pursuit of a personal aesthetic and a modular straitjacket does them no

favours at all’ (Hardy, 2001).

This practice seemed to contradict the course rationales. For example, the OCR

specifications stated:

'Progression from breadth to depth, from AS to A2 is achieved by the
various units emphasising particular assessment objectives. This ensures
that the specifications have a developmental nature' (OCR Specifications
2000).

The National Society for Education in Art and Design documentation showed that

one year after the implementation of Curriculum 2000, the modular course, despite

the intention to present 'a developmental structure', was often seen as a threat to art and

design. The newsletter of Summer 2001 included an article about the new A and AS

level examinations, based on considerable correspondence from members,

expressing concern. A principal complaint was:

The new essentially modular examination places pressure on candidates
to produce fully realised projects from the beginning of the course.
Consequently, some teachers and students feel that there is no time for
the proper experimentation appropriate to the early part of A/AS level
study with an inherent possibility of 'failure'. This is likely to lead to
increased prescription by teachers and students who feel inhibited from
taking risks – from being creative (Steers, 2001, p.11).

It appeared that the introduction of modular assessment, which was intended

to allow more options for students, was problematic in terms of fairness. One

explanation for the adoption of such measures could be the government's desire to

increase rigour and accuracy. Cindy said that ‘…particularly in arts, in the 70s and

early 80s assessment was quite lax' (Cindy, NEW, 10-02-2001). This was also

an opinion expressed in an exploratory interview with a former Edexcel art

assessment coordinator (IPB; L, 12-06-2000). However the fact that students were

assessed three times per year did not provide proof of fairness or accuracy.

Furthermore, the new reform brought in a serious problem of over-assessment.

According to the interviewees students and teachers were being hard pressed by

assessment and losing time to develop exploration, experimentation, risk taking,

and reflection on progress, all of which are typically understood as fundamental to

artistic development.

5.4.4. Scheme of assessment


As noted previously, at the time of writing, art and design examinations were based

on coursework completed in each unit. At AS level the scheme of assessment

required two units of coursework, each weighted at 30 percent of the total AS

marks and 15 percent of the total A level marks, plus another unit, the 'controlled

test'. For A2 examinations, two further units of coursework, each weighted at 15

percent of the total A level marks, were required, together with a controlled period

test weighted at 20 percent of the total A level marks. The scheme of assessment was the

same for all the boards; the differences were mainly in the names given to

coursework units and the time allocated for the controlled test or externally set

assignment (Appendix III).
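
The sketch below (Python; purely illustrative and not taken from any awarding
body's documentation) cross-checks this arithmetic. The 20 percent weighting of
the AS controlled test is an inferred remainder needed to reach 100 percent of
the A level total, rather than a figure stated in the scheme described above:

    # Illustrative unit weightings as percentages of the total A level mark.
    A_LEVEL_WEIGHTS = {
        "Unit 1 (AS coursework)": 15,
        "Unit 2 (AS coursework)": 15,
        "Unit 3 (AS controlled test)": 20,  # inferred remainder, an assumption
        "Unit 4 (A2 coursework)": 15,
        "Unit 5 (A2 coursework)": 15,
        "Unit 6 (A2 controlled test)": 20,
    }
    assert sum(A_LEVEL_WEIGHTS.values()) == 100

    def a_level_mark(unit_marks):
        """Combine per-unit percentage marks into a final A level percentage."""
        return sum(A_LEVEL_WEIGHTS[u] * m / 100 for u, m in unit_marks.items())

    # A candidate scoring 70 percent on every unit finishes on 70 percent overall.
    assert a_level_mark({u: 70 for u in A_LEVEL_WEIGHTS}) == 70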

5.4.5. The specifications


A considerable quantity of documentation was published by awarding bodies,

for example: specifications, instructions for the conduct of examinations (for

teachers, moderators, team leaders and principal assistant moderators); question

papers for externally set assignments; guides for marking and reports of examiners.

The specifications, formerly known as 'syllabuses', defined approaches to the

structure of the course and established the scheme of assessment through description

of assessment objectives, assessment criteria and grade descriptors in general, and

the qualities sought in students' work in each of the specification titles or areas

of study.

According to the Code of Practice, chief examiners usually devised or wrote

examination papers, specifications and mark schemes. The validity of the

examination instructions and question papers was estimated a priori through

evaluation processes involving review of question papers and provisional mark

schemes by revisers, assessors and a committee chaired by the chair of examiners.

The evaluation took into account clarity of language, graphic design, content

relevance, bias, consistency across years, and discrimination issues (QCA,

2000).

5.4.6. Rationales for the specifications


Specifications were developed according to the QCA Code of Practice general

requirements involving (1) spiritual, moral, ethical, social and cultural issues; (2)

environmental education, the European dimension and health and safety issues, and

(3) the general skills of communication; information technology; working with

others; improving own learning and performance; and problem solving (QCA, 2000).

The rationales as stated by the three awarding bodies emphasised the need for breadth

and depth in the art courses. For example, teachers were supposed: ‘to encourage…

an adventurous and enquiring approach to art and design’ and ‘focus on art and

design practice and the integration of theory, knowledge and understanding’

(Edexcel Specifications, 2000); ‘…the specifications have a developmental nature’

(OCR Specifications 2000) and [the specification is designed to] ‘form the basis of a

course enabling candidates to develop their own personal responses to their

experiences, environment and culture in both their practical and theoretical activities

…and aims to achieve a sense of progression’ (AQA Specifications 2000).

External assessment in England was conceived as an integral part of the art and

design curriculum. According to the interviewed teachers, the guidelines established

by QCA and specifications of the different awarding boards were broad enough to

allow some diversity of approach to curriculum.

'The specifications leave a certain freedom for teachers to set out the
course' (Ian and Sally, L, 7-07-2001).

'[Provide] different ways of working and opportunities for personal
development and independent work through the course' (Richard, SEE,
4-02-2001).

'They state things quite clearly and they give good direction for
teachers and the more mature students' (Elisabeth, L, 7-02-2002).

'The way this is written can be encouraging for teachers. If teachers
are creative and interpret it creatively they provide a good framework'
(Cindy, NEW, 10-02-2002).

As described in the documents the areas of study seemed to encourage a broad

approach to art and design allowing students to develop a sound grounding in a

number of areas and media. The courses allowed flexibility in students’ choice of

the areas of study.

From an analysis of QCA documents about art and design examinations and

awarding boards' specifications, the assessment objectives set out by QCA appeared to

provide a common framework for developing specifications, grade descriptors and

criteria. This framework emphasised investigation of art and design products. In this

model there was a focus on art and design process: more weight was placed on the

development of ideas and critical and analytical capacities through the understanding

of context, meanings and purposes of art and design. It seemed to encourage analysis

and critical processes more than the final products. This appeared to be adequate for

the cross-disciplinary nature of contemporary art and design, providing broad

assessment objectives that seemed appropriate for all the areas of study (Appendix

IV). However the breadth of this model could restrict areas of experimentation and

production. For example, Annie complained about the lack of development of

specific technical skills during her course (Annie, SE, 6-05-2002). According to the

interviewed teachers ‘Curriculum 2000’ brought considerable changes to art and

design. It had replaced ‘traditional’ ways of teaching, which tended to emphasise

final products and specialisation in art and design. By reducing art to component

parts for assessment purposes the underlying model seemed to have been divorced

from established art and design teaching and practice, minimising the importance of

quality production. Ian said:

'It is all about proof and documentation rather than finished
product' (Ian, L, 7-07-2001).

5.4.7. Instructions for the conduct of examinations

The clarity of instructions provided for the users is one essential aspect of validity. In

subjects that deal with creative outcomes, such as art and design, achieving a balance

between unambiguous instructions and direct, prescriptive effects upon the

curriculum is critical. And this was particularly crucial in the case of English art

and design external assessment where visual exemplars were often provided.

The interviewees expressed contrasting opinions about the degree of prescription

inherent in the instructions for assessment and their consequences.

The interviewees thought that instructions were not sufficiently clear for students (or

at least for all students) especially in the question papers, and the role of the teacher

in explaining the instructions to students was considered essential. Elisabeth said:

'…there were parts of it that were quite difficult …with those
students who haven't had it explained to them by the teachers'
(Elisabeth, L, 7-02-2002).

Ian and Sally wanted the instructions to be less ambiguous, at least in terms of

numbers of works to be submitted for examinations. According to them the

instructions for teachers from Edexcel were not clear, and some teachers had

misunderstood the requirements of the examination in 2001. The reasons for this

could include the novelty of the examination format, teachers not paying enough

attention to instructions, their need for training, a lack of face validity in the

instrument, or a combination of these factors.

Cindy and Elisabeth said that the instructions needed to be broad and not too

prescriptive:

'I think its up to teachers to understand and interpret it in a very
creative way. If the teacher is creative, he then interacts with the
students' (Cindy, NEE, 10-02-2001).

According to Elisabeth there was a danger that the instructions could act as a

limitation and as a way to impose specific teaching approaches. Too much

prescription narrows courses, she said:

‘…because, these units are very prescribed, the way they are
explained, in some centres, not all, make it very tight in the way that
all the candidates do similar work rather than the more individual art
that we would expect at that level’(Elisabeth, L, 7-02-2002).

She concluded that there was a need for instructions that allow for teachers’

professionalism and inventiveness. The instructions for examinations and visual

exemplars provided by awarding bodies for schools, like an Edexcel video

illustrating units of work, could be a factor contributing to increased prescription of

courses and loss of freedom to produce personal responses. Annie’s work was

influenced by examples of ‘good’ work displayed on the Edexcel video sent to her

school (researcher diary, St.X School, 7-06-2002). The samples of students’ work

observed during standardisation meetings at Edexcel during 2001 and 2002 did not

generally demonstrate ‘…a wide variety of outcomes and media; specially the

sketchbooks’ (researcher diary, Mansfield, 4-05-2001. This may have been a

consequence of the impact of examinations. Whether deliberately or not, teachers

and students may have been encouraged to ‘play safe’ and reproduce the best

examples in order to ensure good results and consequential benefits for all

concerned. The examination could be preventing some risk-taking and independent

thinking, which should be fostered in art and design. There was a tension between

instructions for the art and design assessment instrument and art and design

curriculum practice. Should the instructions be broad and encourage diversity or

should they be prescriptive? Should visual exemplars be used? Two teachers

interviewed were in favour of such visual guidance (Ian and Sally, L, 7-07-2001).

But the literature showed that it could have pervasive effects upon schools and that

exemplars themselves are controversial. Hardy commented on the Edexcel visual

instructions for AS in 2002 as follows:

My fears were confirmed when I saw extracts from the induction video.
Although peppered with talk of ‘building on good practice’; prescription
and prohibition had resulted in exemplar material from pilot schools that
was so dull and devoid of wit it took my breath away. My idea of a good
sketchbook is an open ended tool for research and exploration; evidence
of an individual path of discovery, not the series of formulaic and
dedicated exercises exemplified therein. I wonder how many examiners
thought they would scream if they saw another colour wheel (Hardy,
2002).

A particular problem in designing instructions for assessment was detected during

this part of the research: on the one hand, if instructions are too prescriptive, even

though they are clear for students and other users, they may limit the breadth of the

instrument. On the other hand, if the instructions are broad enough to allow for

diverse outcomes, teachers need to interpret them. In the case of these examinations

where the assessment instrument was delivered by the students’ own teachers, broad

instructions had advantages, but only if these teachers were well-trained

professionals. There may also be issues of equality of opportunity for those students

who do not have the support of a good teacher. Interpreting and delivering

instructions necessitated teacher training, which was provided through in-service

training courses at which attendance was not compulsory. The nature of these

training courses was questionable. In one interview Cindy noted that there was a need

to foster the professionalism and creativity of teachers, but sometimes in-service teacher

courses tended to be too prescriptive (Cindy, NEE, 10-02-2002).

5.4.8. Assessment Instruments


5.4.8.1. Coursework
At the time of this writing, A-level candidates submitted selected coursework from

each module. This could take the form of a portfolio (structured sequence of

annotated drawings, paintings, photographs or three-dimensional objects); an

exhibition; multimedia presentations; written work such as essays; and combined written and

visual work such as work journals. Two units of coursework were required at both

the GCE AS and A2 examinations, accounting for 60 percent of the final A level

mark. The specifications, students’ guides and instructions for teachers set down

explicit parameters and instructions for setting coursework tasks and defined

marking criteria. Every unit should provide evidence of all the assessment objectives

being met.

5.4.8.2. The ‘controlled test’ and question papers


The ‘controlled test’ was not a test in the traditional sense. The question paper

described a theme and a list of starting points for developing a unit of work to be

carried out during a controlled period. According to Edexcel the controlled test

represented the ‘culmination’ of the course (Edexcel, Specification 2000), and

therefore was considered as synoptic assessment. Students received question papers

in advance and they had a preparatory period for the controlled work. During this

period students could consult with staff and be supplied with supporting guidance

and materials. The period of the controlled test was fixed for AS and A2 at 20 hours

distributed as follows by the different awarding bodies:

          Preparatory period   AS controlled test   A2 controlled test
                               (Unit 3)             (Unit 6)
Edexcel   6 weeks              8 h                  12 h
AQA       4 weeks              5 h                  15 h
OCR       4 weeks              5 h                  15 h

Figure 11: Controlled tests

Students should submit unaided work produced under examination conditions for

assessment, but could make use of supporting work in the form of research or

preliminary studies and work journals developed during the preparatory weeks.

Whilst the controlled period work in Unit 3 was not expected to be a finished piece

of work, at Unit 6 the test required the production of a final piece using the

preliminary studies and investigation carried out during the preparatory time.

5.4.8.3. Type and format of the assessment instruments

Instructions for the conduct of examinations stated the types of work candidates

should submit. The three awarding bodies usually required preliminary studies,

research work in visual and written forms, preparatory and annotated visual studies,

and final products. At the time the research was carried out, Edexcel was the only

awarding body requiring a compulsory work journal for each unit.

A conclusion drawn from the analysis of documents about art and design

examinations was that the assessment instruments were related to art and design

methods of inquiry. The evidence required from candidates through written and

visual work allowed a wide variety of responses. Coursework and controlled period

work in the form of portfolios permitted the gathering of a wide range of evidence

through several tasks, allowing students to reveal knowledge, understanding and

skills through both process and product. According to Elisabeth the large amount of

evidence to be marked was beneficial for students. She said:

‘…you need it, the more you have, the more you can see, the more you
relate to it, it benefits the students’ (Elisabeth, L, 7-02-2002).

This may have had advantages in terms of the reliability of the results because it

enabled several assessments of the same trait, allowing systematic sampling or

repeated measures, thus enhancing consistency in the assessment procedures

through corroboration.
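
One standard psychometric illustration of this point, offered here as an aside
rather than as part of the thesis's own evidence, is the Spearman-Brown prophecy
formula, which estimates the reliability of an aggregate of k comparable pieces
of evidence from the reliability of a single piece:

    \rho_k = \frac{k\rho}{1 + (k - 1)\rho}

Under these assumptions, if a single timed piece were marked with reliability
0.5, aggregating four comparable pieces would raise the reliability of the
combined judgement to (4 × 0.5) / (1 + 3 × 0.5) = 0.8, which is the sense in
which corroboration across multiple sources of evidence enhances consistency.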

The main problems detected with the type and format of the instrument were the

excess of tasks created by the modular system of assessment and the requirement of one

synoptic assessment unit per year. The regulations set out by QCA seemed inflexible

in terms of time allocations and number of units per year. Such measures tended to

increase prescription. Hardy (2002) considered that the imposition of short timed

tests (units 3 and 6) was inappropriate for art, especially because English art and

design external assessment had previously shown a tendency to increase the length

of such examinations over time. The controlled period tests were imposed apparently

to align art and design assessment with other subjects, with little consideration

for established art and design assessment practices. Hardy (2002) suggested that

the mandatory controlled period tests could be interpreted as distrust of teachers

and students:

This lack of trust of teachers and students alike to keep their noses to the
grindstone is most blatantly illustrated by the extraordinary belt and
braces approach to assessment. If all units are to be examined why the
synoptic requirement for the timed test which, once again requires
students to back track and showcase skills already in evidence? (Hardy,
2002, p.57)

A conclusion was drawn that the type and format of tasks required in the assessment

in England provided reasonably accurate information about the performance of

individual students and generally had the characteristics of disclosure and fidelity

described by William (1992). The work submitted for examinations (research

studies, preliminary studies with annotations, work journals and final products

produced over long periods) provided a set of written and visual evidence of

attainment that was disclosed and faithfully recorded using methods of work related

to appropriate patterns of artistic learning and production. Cindy described the

instrument as ‘evidenced-based’ and suggested it could be used ‘creatively’ (Cindy,

NEW, 10-02-2001). Furthermore the instrument was also considered to be a good

preparation for students’ future careers by all the interviewees. The majority

described it as being writing-dependent; this was observed in the samples at the

standardisation meetings. In fact the need for written evidence was emphasised

through the instructions and mark schemes; one key issue of validity was the degree

to which writing skills were relevant to art and design courses. This requirement

could be seen as a consequence of the QCA requiring general skills for all courses

and also as a consequence of an underlying model of art education which valued

process more than (or at least as much as) products.

5.4.8.4. Process orientated instrument

The interviewees considered art and design examinations process orientated – a

change that had been introduced gradually. Earlier examinations had focused on final

products and little by little process had gained more emphasis. The emphasis on

providing evidence about process significantly changed the way teachers set courses.

According to Cindy:

'…this is helpful, this is coursework oriented so, process oriented. It
opens up more possibilities by looking at how you are doing it, how
you think' (Cindy, NEW, 10-02-2001).

However she believed that there was an imbalance of process and product in the

assessment and the instrument tended ‘…to focus unnecessarily on processes’

(Cindy, NEW, 10-02-2001). According to Annie, focusing on process and reflection

was helpful for students, but students found it ‘boring’ and ‘time consuming’ (Annie,

SE, 6-05-2002).

It is possible that the increased emphasis on process can be explained by trends in

contemporary art, for example in the conceptual art movement, which puts more

value upon the development of ideas than upon final products. It is compatible with

postmodern theories of art education, which value critical understanding of contexts

and development of ideas. The move towards an instrument based to a larger extent

upon process could signify a loss of content validity. For example Ian and Sally still

believed that final products were more important than process, which, according to

their explanation, was a consequence of their training and experience using

traditional methods based exclusively on assessment of final products (Ian and Sally,

L, 7-07-2001). It was evident that these changes were quite radical and, as a

consequence, older teachers had to reframe their own view of the purposes of art and

art education after the Curriculum 2000 reform.

Within process-oriented examinations the amount of required written work from

candidates increased. It was noted during the observations and interviews that the

quantity of written work had increased with the new post-Curriculum 2000

assessment instrument in use at the time. Cindy stated:

'This again is all words, there are so many words!' (Cindy, NEW, 10-
02-2001)

And Elisabeth said:

[The instruments] 'rely a lot on writing, writing skills' (Elisabeth, L,
7-02-2001).

5.4.8.5. Bias

At first glance the instrument appeared equally fair to all students. During the

interview Elisabeth said:

I think the exams cover everybody from any country, there is a scope
there for everybody to find something of interest and as a teacher, again
you often pick up the students own interests and it is dependent on the
quality of the teachers (Elisabeth, L, 7-02-2001).

However, the interviews revealed some problems of bias. For example, students from

big cities with good resources in their schools and access to cultural events could be

advantaged; the instruments could advantage girls rather than boys; students from

different ethnic backgrounds could be discriminated against; and finally students who

had access to committed teachers or involved parents had clear advantages.

5.4.8.6. Gender bias

According to the 2002 art and design examination results (Joint Council for General

Qualifications, 2002) it appeared that for several reasons girls performed better than

boys. Elisabeth thought that this was because girls tended to develop superior writing

skills; ‘…they are more competitive; more mature than boys’ (Elisabeth, L, 7-02-

2001), and Cindy thought that it was ‘…because they are more systematic, more

responsive’ (Cindy, NEW, 10-02-2001). Bowden (2001) reported on a study of

boys’ underachievement in art and design examinations. Bowden claimed that the

emphasis on planning and use of sketchbooks, while valuable, inhibits art making

using media in a direct ways that do not, by their nature ‘…involve prior

investigation’ or ‘gathering systematic evidence or imagery’. He suggested that boys

in particular were resistant to preparatory work in sketchbooks before doing final

projects.

5.4.8.7. Bias related to non-English students


According to Elisabeth, Cindy and Sally, the examination was problematic for some

students from ethnic minorities because of the issue of English language. They

helped these students to understand the English language by explaining the

instructions in more detail. Annie said that the instrument was unfair for non-English

students because of the increasing emphasis on written responses in the examination

(Annie, SE, 6-05-2002).

5.4.8.8. Bias related to geographical location and economic background


According to Cindy it was very difficult for students in the town where she taught to

have access to books or major exhibitions. However the instrument required

‘informed answers’ and the question papers proposed starting points based on

research into exhibitions, books and the Internet. Whereas Cindy believed that

teachers could help students by supplying resources, she pointed out that if teachers

were not especially committed then this raised problems of equal opportunities

(Cindy, NEW, 10-02-2001).

Differences in local resources were not the only problem. The compulsory themes for

the question papers might also be a source of bias. For example, the theme for the AS

controlled period test at Edexcel, in 2001, was ‘Cities’. According to Ian and Sally

this was a very fashionable theme in that year because of the current major exhibition

at Tate Modern, in London. Ian commented that his students had no opportunities to

‘know a city’ or to see the exhibition referred to in the question paper and he

believed that this inevitably would influence their performances negatively (Ian, L,

7-07-2001). Sally and Ian also said that the examination required a lot of extra work

outside school from students, and this was also a source of bias because it disadvantaged

students from low-income families who had part time jobs (Ian and Sally, L, 7-07-

2001).

5.4.8.9. Teachers’ influence upon students’ achievement

According to the review of literature on art and design examinations in England,

the English model of external assessment had permitted a great degree of teacher

ownership of assessment in the past, but it was losing this strength. The teachers

who were interviewed felt they had less and less autonomy within the increasing

prescription imposed by the awarding bodies and the QCA. However, the role of

the teacher was still considered essential within the English model of external

assessment. It was obvious that students with access to good teachers would have

more opportunities to develop their knowledge, understanding and skills in art and

design. This topic arose frequently in the interviews. Cindy said that ‘…teachers’

effective explanation of the instructions for examinations and orientation of the

work’ could help students to perform better (Cindy, NEW, 10-02-2001). It may be

that the examination instructions were not sufficiently clear or accessible to all

students. It seemed that teachers often had to remedy the ambiguity and complexity

of the instructions so as to help avoid bias, for example, by providing resources for

students or translating instructions more simply, particularly for those with

inadequate English language skills.

5.5. Assessment criteria
In the English art and design examinations reviewed for this study, the assessment

criteria were developed from general requirements set out by the QCA. The four

assessment objectives established by the QCA for art and design were:

• AO1: Record observations, experiences, ideas, information and insights in
visual and other forms, appropriate to intentions.
• AO2: Analyse and evaluate critically sources such as images, objects,
artefacts and texts, showing understanding of purposes, meanings and
contexts.
• AO3: Develop ideas through sustained investigations and exploration,
selecting and using materials, processes and resources, identifying and
interpreting relationships and analysing methods and outcomes.
• AO4: Present a personal, coherent and informed response, realising intentions,
and articulating and explaining connections with the work of others (QCA,
2000).
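
To make the structure of such a scheme concrete, the hypothetical sketch below
(Python; the 20-mark scale and all mark values are invented for illustration and
are not taken from any actual mark scheme) shows how equal weighting across the
four objectives in every unit can produce the pattern discussed in 5.5.1 below,
where a uniformly adequate candidate outscores one who is outstanding in
development and outcome but weak at recording:

    # Hypothetical equal-weighted marking across the four assessment
    # objectives (illustrative scale: 0-20 marks per objective).
    AOS = ("AO1 record", "AO2 analyse", "AO3 develop", "AO4 present")

    def unit_mark(ao_marks):
        """Sum equally weighted objective marks into a unit total out of 80."""
        assert set(ao_marks) == set(AOS)
        return sum(ao_marks.values())

    steady = dict(zip(AOS, (13, 13, 13, 13)))   # adequate on every objective
    creative = dict(zip(AOS, (6, 8, 19, 18)))   # strong AO3/AO4, weak AO1/AO2
    print(unit_mark(steady), unit_mark(creative))  # 52 51: 'box-ticking' wins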

The interviewees considered the assessment objectives published by QCA clear and

appropriate for art and design courses. In Cindy’s words they were ‘excellent’;

‘broad’ and respected the nature of art and design. The assessment objectives that

seemed to provide the frame within which the criteria were established by the

awarding boards acted as ‘windows’ for teachers to focus on particular work in the

way Boughton (1996) has proposed. In effect, the assessment objectives

corresponded to broad criteria. In Richard’s opinion assessment objectives provided

essential guidelines for teachers and students:

It’s up to the student to decide what they want to present. The key to this
is how they are answering the assessment objectives, and criteria. We
don’t actually specify exactly what they should do, we give them
instruction material [instructions for examinations] – the assessment
objectives are the key to that (Richard, SEE, 4-02-2001).

However, a conclusion was drawn that the assessment objectives were not as broad

as they appeared at first sight. Some problems of content and face validity in the

assessment matrix described by the interviewees could be a consequence of such

assessment objectives. First of all, the term ‘objective’ suggests a rigid, compulsory

and universal concept, which is not in accord with the postmodern emphasis on

fostering plurality and ‘little narratives’ rather than universal ‘truths’. Secondly,

objectives were centrally imposed and not negotiable or adaptable to different

contexts. So, on the one hand, the assessment objectives seemed to be broad and

include essential aspects of knowledge, understanding and skills, but on the other

hand, the fact that they were imposed may restrict the nature and range of art and

design work. For example, Elisabeth (7-02-2001) stated that not all students needed

to look at the work of other artists in order to develop their own ideas. Cindy (10-02-

2001) pointed out that not all students had to make connections between their own

work and the work of others; some students developed strong ideas and produced

very creative outcomes without systematically using references or connections with

the work of others. Annie (6-05-2002) particularly questioned the need to link

students’ work to work of other artists. However the assessment objectives required

all students to follow the same method and submit evidence for all the assessment

objectives in each and every unit.

5.5.1. Mark schemes

The assessment criteria or assessment matrix published by the awarding bodies in

England were intended to guide teachers and moderators to focus on those aspects to

be developed by students in their work; in doing so, a common language of assessment was established. Elisabeth said:

…because it’s focusing, it’s making teachers and students focus
on those specific things, …teachers were looking at the work in a
different way, they were confronted with this new matrix, and
they really have to look at the work that they were marking… It
clearly guided teachers and moderators (Elisabeth, L, 7-02-2001).

Drawing on Williams' (2001) ideas, a conclusion was drawn that the assessment

objectives and subsequent assessment criteria presented problems of content,

construct and face validity. The instrument placed an emphasis on activities that were

not necessarily the core of art and design because of the search for objectivity. In an

article entitled ‘Forced in the same mould’ published in The Times Educational

Supplement (November 2001), Williams suggested that the search for objectivity in

examinations was distorting art and design learning and assessment. The article

reported several teachers' concerns with the new GCSE and GCE AS art and design examinations. They claimed that students who were creative, but not good at recording, were being penalised while, conversely, students who were not particularly outstanding, with only the ability to record, could achieve very good grades. Williams

concluded that the obligation to fulfil all the criteria in each unit of work was ‘…

making art into a hoop-jumping exercise, and creating a topsy-turvy world whereby a

candidate with poor creative and technical ability can nevertheless gain a decent

grade if all the boxes can be ticked’. Teachers were not against the introduction of

criteria related to historical context and contemporary artwork but did not agree with

the requirement to meet such criteria in every piece of work, which in their view

prevented students from developing 'exciting work'. Steers was quoted in this article as

saying: ‘We have stopped assessing creative and technical skills and the imagination

of students’.

In the same issue of the Times Educational Supplement there was an article by the

head of art at North London Collegiate School entitled: ‘Farewell to the Wow factor’

(Hardy, 2001). Hardy asserted that risky experimentation and open-ended

exploration had been reduced to a minimum and ‘…it seemed as though the course

had been deliberately designed to reward mediocrity’. Hardy strongly disagreed with

the unit structure of the courses and questioned the advantages of a common

structure across subjects, claiming that '…art is important as a necessary divergent

foil to the convergent curricular core’, and that one of the most essential aspects of

art ‘…that joy at seeing students transcend the shackles of the syllabus’ was being

lost. He argued that ‘ The equal weighting for all objectives has led to the reward of

the spurious over the essential’. The analysis of the assessment objectives and the

detailed mark schemes of all awarding bodies confirmed that creativity, imagination

and experimentation were seldom explicitly mentioned but, in contrast, recording

and researching sources were heavily emphasised. This may be a consequence of the

underlying conceptual model, which emphasised understanding and criticism of

contexts more than creativity and innovative studio production. However, it may also

represent a quest for more objective ways of marking; as Elisabeth suggested, it was

‘…much easier to assess the ability to record and write rather than to assess

genuinely creative work and risk taking’ (Elisabeth, L, 7-02-2001). A conclusion was

drawn therefore that the content validity of the instrument was being threatened by

the search for greater ‘objectivity’.

5.5.2. Criterion referenced marking

According to Richard, publication of detailed mark schemes was intended to

reinforce criterion-referenced marking, setting out clear demarcations between

statements of achievement and assessment objectives. Richard expressed the views

of an examiner when he argued in the interview that:

It has to be mathematical, this criteria, … I think it is very


straightforward, because it is based on the idea, if you got 3 marks in
each box and you know it is basically the idea of marking through the
criteria, it gives flexibility (Richard, SEE, 4-02-2001).

For examiners who look for uniform judgements in order to obtain accuracy of

assessment, the ‘objectivity’ of numbers may be tempting. However, as the literature

on art assessment shows, assessing artwork is not possible through a simple

aggregation of numbers. Judging an artwork involves more than counting the parts; it

is through holistic appreciation that the overall harmony between the parts is recognised.

According to Gestalt theory, a work has to be judged as a whole because the whole is

more than the sum of its parts (Boughton, 1996; Eisner, 1985; Wiggins, 1993). Rigid

methods of criterion-referenced marking might not be appropriate therefore to judge

a whole. Analysis of the recommendations for marking during the 2001 Edexcel AS

standardisation meeting showed they were driven by a rigid criterion-referenced

process of marking of each assessment objective followed by aggregation of the

marks. Interestingly, one Assistant Principal Moderator at the 2002 Edexcel A2

standardisation meeting (4-05-2002) recognised this when he advised moderators to

start looking holistically before marking by criteria. In the interview (10-02-2001)

Cindy admitted to assessing students’ work by combining criterion-referenced and

holistic methods. A conclusion was drawn that, in practice, methods of assessment were not exclusively criterion-referenced.
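
The contrast between the two marking methods may be sketched schematically. The following sketch is purely illustrative and forms no part of any awarding body's documentation; the marks, the 0-20 scale per objective and the tolerance value are hypothetical assumptions:

    # Illustrative sketch only: the marks, the 0-20 scale per objective and
    # the tolerance are hypothetical, not any awarding body's actual scheme.

    def criterion_referenced_total(marks_per_objective):
        """Purely criterion-referenced marking: score each assessment
        objective in its own 'box', then aggregate the numbers."""
        return sum(marks_per_objective.values())

    def holistic_then_criteria(holistic_mark, marks_per_objective, tolerance=5):
        """Form an overall judgement of the work first, then use the
        criterion marks only as a cross-check on that judgement."""
        aggregated = sum(marks_per_objective.values())
        if abs(holistic_mark - aggregated) <= tolerance:
            return holistic_mark  # the parts confirm the whole
        # whole and parts disagree: a real assessor would re-examine the
        # work; averaging merely stands in for that step here
        return (holistic_mark + aggregated) / 2

    marks = {"AO1": 14, "AO2": 11, "AO3": 15, "AO4": 12}  # hypothetical candidate
    print(criterion_referenced_total(marks))   # 52: the sum of the parts
    print(holistic_then_criteria(58, marks))   # 55.0: whole and sum disagreed

The first function mirrors the aggregation Richard described; the second approximates the practice the Assistant Principal Moderator recommended and Cindy admitted to, in which the criteria serve to check, rather than to produce, the overall judgement.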

Elisabeth, Cindy, Ian and Sally objected to the marking scheme, which they

considered inappropriate for judging the quality of art works. In Elisabeth’s words:

…it needs something else, it is that peculiar thing for the art subjects, I
don’t know what to call it…. it is not necessarily the creativity … it is
something that makes it different…that extra…I don’t think this is
included. I think that it should be something else (Elisabeth, L, 7-02-
2001).

Elisabeth recognised that judging art works was not a rigid and immutable process; she understood that holistic marking processes should be taken into account. Overall

judgement or holistic marking is widely viewed as a legitimate procedure for

assessing art and creative work (Hennessy, 1994; Eisner, 1986; Armstrong, 1994).

Boughton (1999) reported that in the 1999 revision of the International Baccalaureate

examinations, overall judgement was included in the list of criteria in order to

improve the model.

Overall judgement or holistic assessment was not mentioned in the GCE assessment

matrices published in 2000, and according to previous examination reports this was a

radical change from the traditional processes of assessment in art and design.

A report on art and design examinations published in 1999 by the AQA awarding

body noted that teachers and moderators usually assessed students’ artworks by

holistic judgements. So even in 1999 it was recognised that criterion-referenced

assessment was in conflict with the nature of learning in the arts:

There are inherent dangers in devising coursework experiences,


which present the objectives as discrete elements to be covered in a
mechanistic and often self-contained manner. Similarly, a
distinction has to be made between the assembling of information,
as an end in itself, and the use of this information in subsequent
developments. This issue is referenced in one report and amplified
in others. It states, ‘Generally, a major concern is this – the
assessment objectives reflect intentions and responses within the
holistic nature of making Art. This is an art course and the
candidates’ artwork – whether progressive, developmental or
outcome – is what we reward’ (AQA, 1999 Art and Design, Report
on the examination)

However, the regulations for GCE mark schemes in England introduced in 2001

for new modular A/AS examinations used criterion-referenced marking exclusively

with the aim of making assessment in all subjects more ‘objective’ and uniform.

When interviewed, Elisabeth objected to this saying:

I think this is difficult to understand by non-art people. This is what we


have been told we must do, we have been told for a long time that we
must move to be in line with other subjects, and this has been terribly
difficult. It is very difficult because you have non-art people telling you
to do things when they don’t really understand the art process. That’s
why I think that in the assessment there should be that little extra, there
should be another objective somewhere to cover (I know that it sounds
terrible arrogant) [the fact] that art is different. I think the process is very
different; it is a very different process to any other subjects. I think that
there are other things which go into it, although very difficult to assess,
but need acknowledging, you need to acknowledge students who are
particularly creative (Elisabeth, L, 7-02-2001).

This raised the question of whether mark schemes exclusively based on discrete

criteria were appropriate for art subjects. Elisabeth and Cindy viewed criterion

referencing as an imposition introduced mainly ‘to align it with other exams’.

When these criteria are compared with Boughton’s concept of criteria as ‘windows’

(1996), they may be too small for art and design, which explains why Elisabeth thought that they 'just need opening out a little bit more'.

But Richard, who was a chief examiner, expressed the view that the majority of teachers agreed with existing mark schemes and that considerable consistency of

marking was achieved:

‘There is an agreement about the marking; we have less than ten re-
marking situations in about 100 centres. Teachers generally were very
positive about it; they were not negative about the marking scheme.

That mark scheme actually works quite well’ (Richard, SEE, 4-02-
2001).

This suggests that mark schemes were clear and easy to apply despite the complaints

that they were inappropriate. However, consistency of results could also be a result

teachers’ tacit knowledge. It could be a demonstration of connoisseurship, which is

an essential quality for achieving reliability of results (Eisner, 1985).

5.6. Assessment procedures

Any instrument ultimately depends on the success of assessment procedures.

Training teachers and moderators was at the core of the English model of

assessment. In fact, the breadth of submitted evidence and the nature of art and design

criteria in this assessment model made in-service training a condition of success.

5.6.1. In-service teacher training

Well, the training is based on this [assessment criteria] and based


on samples, so we have standardisation samples for teachers.
When we look at these criteria [sic], this is before they mark,
alongside particular samples of work, and so they can see what
the numbers are, and also leave that meeting with a booklet with
samples alongside the marks and commentaries, so that is used by
everybody. Nothing is hidden in how it is marked; everybody
knows exactly what is to be marked, and all hinges on the
interpretation of criteria (Richard, SEE, 4-02-2001).

Although the awarding bodies had established procedures in place for teacher

training in the form of INSET courses, attendance was not compulsory; Cindy, Sally

and Ian found it difficult to attend because of the great number of candidates and

because of the demanding workload at school. Cindy (10-02-2001) claimed that,

while the government was especially interested in teacher improvement, there were

insufficient facilities for ‘guaranteed continuous professional development’.

The teacher interviewees regarded teamwork and regular teachers' meetings at

regional level to share and discuss assessment as a most important strategy to ensure

consensus about standards and criteria. Training, as understood by Cindy, not only

depended on centralised courses but was part of a long-standing tradition of art and

design teachers in England using dialogue to arrive at consensus about assessment

(Cindy, NEW, 10-02-2001). However, Cindy said that such practices, once common,

had tended to disappear. The courses promoted by the awarding bodies at the time of

the research were not intended to promote negotiated assessment procedures.

In Cindy’s view they were designed to impose policies. However, they must have

worked, to some extent, because the instructions for external assessment were

explained to teachers. A conclusion was drawn that an efficient system of training

can foster a common interpretation of criteria.

5.6.2. Internal marking and internal moderation

The analysis of documents revealed that the assessment procedures were the same

for all the awarding bodies, because the QCA Code of Practice defined them.

Students' own teachers assessed candidates' work in 'centres', which might be a single school or college, or a group of institutions. Typically the internal marking included internal moderation. Teachers were required to record marks for all units of work, for all candidates, using the assessment matrix, and then to transfer a final mark for each unit for every candidate onto specially designed forms. They also had to arrange a display of folders of work from candidates in the moderation sample, ensuring that all pieces of work within each unit were clearly labelled, and to send a copy of the relevant marking forms to the awarding body.

Cindy said that in her school internal moderation took place as required by the Code

of Practice and awarding body instructions, but in Sally’s school this was not the

case, possibly due to pressures of time, extra work and an uncommitted head of

department. Sally said that she experienced significant problems with a lack of

information exacerbated by the fact that it was her first year of teaching and she got

little support from her head of department. Cindy’s description of the process was

very different. As an experienced teacher, she had clearly acquired appropriate tacit

knowledge and contacts with other schools and the art department at her school

worked cohesively.

According to Richard, who worked at the OCR awarding body, there was a high

consistency of results in the 2001 examinations and Elisabeth commented that no

significant changes had been made to internal marks in the Edexcel AS examinations

in the schools she visited as a team-leader in 2001. Therefore in Elisabeth’s and

Richard's views there was consistency of results, which suggests that internal marking and moderation were completed according to expected standards.

5.6.3. Standardisation

At the time this research was carried out, the awarding bodies in England had

established a team structure for standardisation and moderation procedures

involving: chief examiner(s); principal examiners; assistant principal moderators;

team leaders and moderators. Standardisation procedures consisted of meeting(s) to

secure consistent application of mark schemes by all examiners. According to the

Code of Practice, standardisation involved the following phases: (1) an

administrative briefing, normally by an awarding body officer to explain awarding

body procedures, time schedules, documentation and contact points; (2) explanation

by the principal examiner of the nature and significance of the standardisation

process; and a briefing by the principal examiner on relevant points arising from

previous examinations, drawing as necessary, on relevant statistical data and points

made in reports about examinations; (3) training and simulation of marking

candidates’ work. This agenda was strictly applied during the two observations of the

Edexcel awarding body carried out as part of this research (Appendix X).

5.6.3.1. Observation of standardisation procedures

At the Edexcel awarding body there was evidence of formal training and

standardisation for moderators in one-day meetings for a large group consisting

mainly of teachers. Fifty-three moderators attended an AS standardisation meeting

on 3rd May 2001 and eighty moderators attended an A2 standardisation meeting on 4th

May 2002. Training and simulation of marking candidates’ work was observed

taking place during familiarisation and blind-marking exercises.

In standardisation, the awarding body used exemplar work from its archives and

‘live’ work collected from centres in the year of examination to explain how

candidates’ work should be marked in accordance with assessment criteria. In

addition, exemplar portfolios for teachers and moderators were supposed to be published in book form, with videos made available to centres. According to the QCA Code of

Practice, examples of coursework assessment and moderation must be prepared by

the awarding bodies containing assessed work that shows 'clearly how credit is to be

assigned to particular assessment objectives’ (QCA, 2000, p. 17). At the AS and A2

Edexcel standardisation meetings attended on 3rd May 2001 and 4th May 2002 there

was evidence of some exemplar work, but a large part of the examples displayed at

the latter consisted of photocopies.

The work was displayed by unit. Near each display were the title of
the unit and a copy of the assessment matrix. Six sets of works were
already marked, accompanied by the completed assessment matrix
with underlying objectives and marks attributed for each objective
(familiarisation exercises). Seven sets of works were not marked
(blind marking exercises), they were marked during the second part
of the meeting by the moderators (researcher diary: observation AS
standardisation meeting, Mansfield, 3rd May 2001).

Only two samples presented the original student work. A set of ten
samples of photocopied students’ work was displayed for
familiarisation; an exemplar of the assessment objectives and marks
attributed to the work was attached to each sample. A set of twelve
samples of photocopied students' work was displayed for blind
marking exercises. An exemplar of the assessment objectives and
marks proposed for the work by centres was attached to each
sample. In the same fashion as the blind marking samples, a set of seven
samples of photocopied students’ work was displayed as ‘RES’, or
reserved work, to be used if the moderators could not attain
accuracy during the blind marking exercises (researcher diary:
observation A2 standardisation meeting, Mansfield 4th May 2002).

During the observations of Edexcel standardisation meetings, the chief examiners,

assistant principal moderators and team leaders explained the assessment criteria to

the moderators using samples of student work during the ‘Familiarisation Exercises’.

Although the samples were not very varied, the exercise seemed to help moderators

understand how to apply the assessment objectives and trained them in the

moderation procedures (See Appendix X).

All the moderators received photocopies with the assessment matrix
for each one of the examples with the marks and the justification of
the marks attributed to each one of the assessment objectives
(which were underlined). The Team Leader asked the moderators to
look carefully at the examples and at the attributed marks and to
discuss them. Afterwards, the Team Leader explained the work and
the attributed marks showing the evidence related to the assessment
matrix and why it was marked in that particular way (researcher
diary, observation AS standardisation meeting, Mansfield 3rd May
2001).

The team leader explained the marks using the language of the
assessment matrix and adding comments such as ‘the work revealed
a weak area of response to other artists’; ‘the student revealed
confidence in some areas’; ‘the student spent too much time on
research’; ‘the degree of analysis lacked depth and breadth’
(researcher diary: observation A2 standardisation meeting,
Mansfield, 4th May 2002).

The 'Blind Marking Exercises' consisted of a simulation of marking candidates'

work.

According to one of the chief examiners the purpose of the blind


marking was the training of moderators through the marking of
examples of students' work that illustrated the range of performance
likely to be demonstrated by the candidates in the centres and
helped moderators to consolidate a common understanding of the
application of the mark scheme. The purpose of the standardisation
was not to discuss the assessment matrix. In the words of one
assessment coordinator: ‘the moderator has to accept examples not
to discuss them’ (researcher diary: observation AS standardisation
meeting, Mansfield, 3rd May 2001).

Standardisation seemed to be a very rigid procedure. Richard (4-02-2001) explained

that this was training, although Ian (7-07-2001) expressed some doubts about this

training. During a GCE AS Edexcel standardisation meeting in 2001, standardisation

was observed by the researcher to be a strategy to ensure uniform interpretation of

criteria; moderators were ‘calibrated’ to apply the mark schemes without having an

active opportunity to discuss them. The researcher did not perceive standardisation

as negotiated interpretation but rather as the imposition of the chief examiner’s

interpretation of the criteria. From the events that took place during the Edexcel A2

standardisation meeting in 2002, a conclusion was drawn that standardisation was

also a strategy to evaluate moderators’ marking accuracy.

….the group had to mark the blind marking samples; and verify if
the marks attributed by the centres to the works were consistent. It
was an individual exercise; moderators had 10 minutes to mark
each sample; they marked in silence; the team leader was constantly
advising the group to read the sketchbooks because ‘everything or
almost a great part of the evidence is in there’ (researcher diary,
observation A2 standardisation meeting, Mansfield, 4th May 2002).
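
The consistency check at the heart of the blind marking exercise may be expressed as a simple comparison between the marks proposed by the centres and those awarded by the moderator. The sketch below is hypothetical: the tolerance band, the function names and the marks are assumptions made for illustration, not Edexcel rules:

    # Hypothetical sketch of the blind-marking consistency check; the
    # tolerance band and all marks are assumptions, not Edexcel rules.

    def marks_agree(centre_mark, moderator_mark, tolerance=3):
        """A moderator's mark is treated as consistent with the centre's
        if the two fall within an agreed tolerance band of each other."""
        return abs(centre_mark - moderator_mark) <= tolerance

    def moderator_accuracy(centre_marks, moderator_marks, tolerance=3):
        """Proportion of blind-marked samples on which the moderator's
        marks were consistent with the marks proposed by the centres."""
        agreements = [marks_agree(c, m, tolerance)
                      for c, m in zip(centre_marks, moderator_marks)]
        return sum(agreements) / len(agreements)

    # Twelve blind-marking samples, as at the A2 meeting (marks invented):
    centres   = [55, 48, 62, 40, 70, 52, 58, 45, 66, 50, 60, 44]
    moderator = [54, 50, 61, 47, 69, 53, 55, 44, 65, 52, 58, 45]
    print(f"accuracy: {moderator_accuracy(centres, moderator):.0%}")  # 92%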

However, problems of time and a lack of diversity of samples were apparent. It was not

clear in the time available whether or not the standardisation would be accurate.

During the exercises a speaker was constantly reminding the group


about the schedule; moderators had to move very fast (researcher
diary, observation A2 standardisation meeting, Mansfield, 4th May
2002).

A conclusion was drawn that the large scale of the standardisation meetings could

work against the achievement of consistent marking. The most positive aspect of

standardisation appeared to be the cascade strategy through which chief examiners

standardised assistant principal moderators and team leaders and, in turn, team

leaders standardised a group of moderators. Familiarisation exercises usefully exemplified the application of assessment objectives in students' work and assisted a common interpretation of criteria. The blind marking exercises provided a useful mechanism for evaluating moderators' accuracy in applying criteria and consistency of marking. It was concluded that standardisation meetings were also a means of enhancing moderators' performance through team leaders' feedback, with additional training meetings or individual training for new moderators provided by team leaders where necessary.

5.6.4. Moderation
According to the Code of Practice (QCA, 2000) moderation was designed to verify

the consistency of teachers’ marks in centres. In the English model it played a very

important role in ensuring the reliability of teachers’ (and moderators’) internal and

external assessment, not only in ensuring that the same standards were applied but

also the fairness of the system. Students’ work submitted for examination was

guarded from the personal idiosyncrasies of the teachers through this process.

The fairness of results was ensured and teachers could use moderators' feedback

positively to improve their performance.

Not all the candidates’ work was submitted for moderation. The moderation sample

comprised work randomly selected across the mark spectrum by the awarding body, together with the work of the candidates achieving the highest and the lowest marks at a centre. Centres

had two options for moderation: (1) to send a moderation sample to the awarding

body or (2) to have the sample moderated at the centre by a visiting moderator.

Option (2) was most commonly used for art and design examinations.
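
The sampling rule described above lends itself to a simple illustration. The sketch below is a hypothetical rendering of that rule, assuming that candidates' marks for a centre are available as a list of pairs; the sample size and the banding of the mark spectrum are assumptions made for the example:

    import random

    # Hypothetical sketch of the sampling rule described in the text: the
    # highest- and lowest-marked candidates are always included, and the
    # remainder is drawn at random across the mark spectrum. The sample
    # size and the banding are assumptions for illustration.

    def moderation_sample(candidates, size=6):
        """candidates: list of (candidate_id, mark) pairs for one centre."""
        ranked = sorted(candidates, key=lambda pair: pair[1])
        sample = [ranked[0], ranked[-1]]          # lowest and highest marks
        middle, n_bands = ranked[1:-1], size - 2
        if len(middle) <= n_bands:                # small centre: take everyone
            return sample + middle
        for band in range(n_bands):
            # one candidate drawn at random from each band of the spectrum
            lo = band * len(middle) // n_bands
            hi = (band + 1) * len(middle) // n_bands
            sample.append(random.choice(middle[lo:hi]))
        return sample

    centre = [("C01", 34), ("C02", 71), ("C03", 55), ("C04", 62),
              ("C05", 48), ("C06", 80), ("C07", 59), ("C08", 41)]
    for candidate_id, mark in moderation_sample(centre):
        print(candidate_id, mark)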

Moderation consisted basically of ensuring that standards were maintained from year

to year and were consistently applied. Moderators arranged the visits to centres

within fixed time schedules set up by the awarding body. A centre might be visited

by a single moderator or by several, for example accompanied by team leaders,

assistant principal moderators and the chief examiner. At the start of a visit the head of department, or the teacher representing him or her, described and discussed the course

content and organisation, any problems experienced and the timetable (Edexcel,

2000, Instructions for Teachers). According to Edexcel's Instructions for Moderators (Edexcel, 2000), the primary function of a moderator was

to decide whether marks awarded by a teacher were in accord with published

marking criteria and conformed with overall national examination standards.

5.7. Conclusions in terms of validity summarised

5.7.1. Weaknesses

The main weaknesses of the model were related to the modular structure of courses

and assessment, which was understood to threaten validity. This presented practical

problems for teachers and students because of the enormous amount of evidence

required and pressures of time. The negative consequences were that it distorted the

curriculum. The following particular weaknesses in terms of validity were

identified:

• Some problems of response validity were detected in the question papers for

the controlled test because of the written nature of the papers and

information required.

• Sources of potential bias included: gender bias, social class and geographical

background bias, and ethnic bias. But, at the same time, teachers appeared to have strategies for minimising these potential disadvantages by giving

additional help to students. These strategies could be adopted because

students’ submissions for external assessment were guided by the students’

own teachers.

• Process and product were not equally weighted in the assessment criteria.

• Core domains of art and design and essential knowledge, understanding and

skills in art and design were minimised by the mark schemes.

• A major problem was caused by the mark schemes that were entirely based

on criterion referenced marking, and which rejected art and design

judgements based on holistic methods.

• The examination imposed strong policies without sufficient respect for

traditional beliefs and practices in English art education. A great

contradiction was found between teachers’ perceptions of art and design

assessment and the centrally imposed model of criterion referenced mark

schemes.

• It was apparent that examination developers at the Edexcel awarding body did not pay sufficient attention, at the design and development stage of the instrument, to the reality of art and design practices through empirical research and needs analysis, because government imperatives were implemented in haste.

5.7.2. Strengths

The following strengths in terms of validity were identified:

• The model had well-articulated criteria, illustrated by visual exemplars

provided by the awarding bodies, which was a great advantage in terms of

understanding and applying the criteria.

• The instructions for the examination included a framework for identifying

learning needs and for interpreting examination performance; a broad

explanation of what candidates should be able to do in the different domains

using statements of achievement; broad explanations of what candidates

should be able to submit for the examination as final products, sketchbooks,

preliminary studies, etc., and clear explanations of how examination work

was graded.

5.8. Conclusions in terms of reliability summarised

5.8.1. Weaknesses

The archives of exemplary work, as required by the QCA Code of Practice, were

vital to the success of teachers' and moderators' training. However, the lack of

samples for training at Edexcel seemed to threaten the validity and reliability of the

examination because the clarity of criteria and standards depended essentially on

these samples.

5.8.2. Strengths

The assessment procedures in the English model had significant advantages over the

Portuguese model in terms of reliability. The following strengths were identified:

• The type and format of the instrument, which was based on multiple evidence

of students' achievement produced during coursework and controlled tests,

provided several assessments of the same trait. Systematic sampling was

possible in order to obtain consistency through multiple measures.

• The awarding bodies' system of standardisation of teachers and moderators

was well designed to develop a uniform understanding of the standards.

• This was an advantage in terms of reliability of the procedures because there

was a common understanding of criteria and also because teachers,

moderators and awarding body officers were constantly monitored.

• External moderation provided a way to check the consistency of results in

relation to agreed national standards determined by awarding bodies.

5.9. Practicality

I think it is true to say that all the art and design examinations involve
internal assessment and external moderation. Most of those who have
written [letters from members of NSEAD] welcome the involvement of
teachers in the assessment process but question the ‘unpaid’ extra work
this involves… An additional problem is that the marking load is all
concentrated around Easter and cannot be moved forward without further
disadvantaging students whose courses in art and design are already
effectively shorter than for the most other subjects (Steers, 2001).

The interviewees constantly referred to problems with the practicality of the

examination. There were, for example, problems with logistics: there was not always room in the schools to exhibit and store the amount of evidence produced for examinations. There was also an unreasonable teacher workload: there were difficulties in setting schedules for assessment within schools, given teachers' timetables and the unpaid extra work that they carried out. This suggested that the examination was not sufficiently

adapted to available resources. The fact that three separate units of work were

assessed in each year increased the problems of practicality, especially because this

required a great amount of extra work for students and teachers, which was seldom

provided for in school timetables.

5.10. Impact

The English model of external assessment was complex and fulfilled several

functions. These included temperature taking, gate keeping, determining whether

assessment objectives had been attained, providing feedback for teachers on the

quality of their professional work, and ranking schools. It was not used only for

purposes of students’ selection for higher education or for judging student

achievement, but also as a way to monitor centres and teachers’ performance and

methods of delivering the curriculum. Assessment results were important for all

kinds of users, including parents, students, teachers, employers, local educational

authorities, and government and higher education institutions.

According to Cindy (10-02-2001): 'It [art curriculum] is very much assessment driven

and criteria driven’. The impact of external assessment upon schools was considered

to be potentially negative:

The more good results the schools have in examinations the more money
comes in. I am not sure if you are a parent you want to send your son or
daughter to a school – if a school has poor grades you won't look for
that school, you look for another school (Elisabeth, L, 7-02-2001).

The interviewees constantly referred to concerns about the consequences of the

examinations upon teachers, curriculum and schools. They suggested that

examinations were intended to influence practice, distorted the curriculum, imposed

policies, restricted core domains of art and design, and limited learning time.

The examinations could also have a very negative impact upon the type of work that

students developed during the courses because of strict instructions or strict

interpretation of the examination instructions and exemplar material.

During the standardisation meeting the team leader warned moderators
about the similarity of students’ works in centres. I thought that it was
kind of revealing that in the centres the students followed strict
orientations and the work should be very similar, at least in the processes
used and organisation; does this mean that art and design in schools is
very prescriptive and there are no great variations in students' outcomes? In
this school [St. X School] looking at the art students’ works I have
exactly this feeling: all the student work seems to follow strict
prescriptions even in the kind of sources they used (research diary; 8th
May 2002).

The interviewees considered that accountability in the form of league tables was

excessive. It was considered to have a major negative impact upon schools and

students; it was seen as unfair:

Grades are public. Teachers’ unions have been claiming that it is not a
way to judge a school, there are some very good schools doing amazing
things but not getting good grades, they are doing a lot of social skills,
lots of activities. Things in the community – none of these are assessed.
It is very unfair (Elisabeth, L, 7-02-2001).

A dramatic situation in which students were advised to withdraw from an

examination because schools did not want to get bad examination results was described by Sally (7-07-2001) as an extreme example of the negative impact of examination results. Similar situations were reported by the press (Halpin, 2003) in

which teachers ‘cheat’ by helping students with coursework submissions or refuse to

enter less able students for examinations so as not to damage their ranking in the

league tables.

A conclusion was drawn that the examiners’ reports published at the end of each

examination had a positive impact because they provided feedback for teachers and

schools. These included an analysis of work submitted for examinations and

an evaluation of art and design practice in schools. Awarding bodies also published

general reports about examinations, which constituted a form of evaluation of all

the examination procedures, analysing strengths and weaknesses and providing

recommendations for teachers and moderators. This feedback from the awarding bodies was a very positive consequence of the model, constituting in itself a form of training for teachers and moderators. As Elisabeth said:

We all report as well, as moderators. We pick up lots of things and


then put it into the reports; there is actually quite a lot of feedback
from the centre by the moderators. It is important that teachers have
this feedback. The booklet, reports that the exam board sends to
everyone, it is quite useful, quite a lot of comments in there about
the tasks, highlights things that have gone wrong and again is up to
them to take that [advice] (Elisabeth, L, 7-02-2001).

5.11. Suggestions for art and design examinations

The following key ideas for developing art and design examinations emerged at this

stage of the research:

1. It would be necessary to develop an instrument of assessment that provided

authentic tasks, related to art and design methods of learning and making,

such as portfolios or selected coursework with no terminal test.

2. It should weight process and product equally, requiring evidence of both.

3. It should allow both written and visual language, enabling students to decide individually which kinds of communication best suit them.

4. It should not fragment art and design teaching and learning into modules of

assessment. A single final ‘examination’ should be developed to confirm

coursework evidence.

5. It should have flexible criteria allowing a combination of criterion referenced

and holistic assessment.

6. Clear unbiased instructions for the examination will need to be produced.

7. Teachers should be trusted to deliver the examination under normal

classroom conditions and to explain the tasks and criteria to students

according to their different individual contexts and situations.

8. The instrument and procedures should be based on teachers’ ownership of the

assessment.

9. It should be adequately piloted.

10. Regional programmes of compulsory teacher and moderator training should

be integrated to develop a common understanding of criteria through

dialogue and consensus.

11. Processes of checking teachers' and moderators' assessment performance

need to be developed.

12. Provision for examination feedback for schools and teachers would be

necessary.

Summary
Art and design examinations in England were traditionally characterised by

flexibility and diversity, but the 1988 Education Reform Act brought about a chain of

considerable changes culminating in the Curriculum 2000 reform, which tended to

centralise art education curriculum and assessment with greater prescription and

monitoring. Without claiming to offer a general evaluation of the whole system,

some positive and negative aspects were noted which provided insights for the

development of a reformed assessment instrument and assessment procedures in

Portugal. The English instrument for external assessment was considered generally

relevant and appropriate to art and design methods and courses. The tasks addressed

context and authentic situations. It was a process-orientated instrument, emphasising

the need to research artworks and design products. Requiring written evidence as

well as visual evidence provided an opportunity to assess the same knowledge,

understanding and skills on separate occasions. However, the instructions were not always considered clear to users, and the instrument relied on teachers' particular approaches, which could be a threat to the response validity of the instrument.

The underlying model of the instrument raised some problems of content and

construct validity, especially in the ways that the mark schemes appeared to

minimise core aspects of art and design such as production, exploration, independent

thinking, risk-taking and more genuinely creative artistic work. The modular

structure of the assessment raised problems of validity, threatening the established

developmental process of art and design learning and making.

The instrument was judged as not equally fair to all students, and problems of

gender, social class and ethnic bias were found. The model offered considerable

reliability of assessment results and suggestions were made about the reasons for

such reliability, including the type and format of the instrument, the traditional

professionalism of teachers, the strong system of in-service training and the

structured system of control of assessors and moderation. Finally a list of

consequences of the assessment was provided illustrating positive and negative

effects upon students, teachers, schools and curriculum.

Chapter 6
Design and piloting of a new external assessment instrument and procedures in
Portugal

This chapter is a report of a new conceptual framework for art and design

examinations at pre-university level that was developed and tested out in this

stage of the research. The content framework for the art and design

examinations is discussed in order to explain the materials developed for the

pilot experiment, which is described and analysed.

6.1. Introduction
Some suggestions and draft materials for the new assessment instrument and

procedures were devised during the first stage of the research taking into account

ideas in the research literature and analysis of the data from the study of current Portuguese and English art and design examinations.

After analysing the data and conclusions of the study of English and Portuguese art

and design examinations described in Chapters 4 and 5, some principal guidelines

and constraints were drawn up for a new model of external assessment in Portugal.

Although the English examinations had some of the strengths sought by Portuguese

participants, problems had been identified by the researcher which needed to be

addressed in order to avoid future problems of validity and practicality, for example,

sources of bias and teacher and student overload (See Appendix XXVIII).

6.2. Designing a new assessment instrument and procedures for art and

design external assessment in Portugal

The design of the assessment instrument and procedures was developed in three

stages:

• Stage 1: Development: rationales and draft specifications of the assessment instrument and assessment procedures.

• Stage 2: Trial 1 – Piloting: negotiation with participants; trial with a reasonable sample; evaluation and refinement of the draft specifications for the assessment instrument and assessment procedures.

• Stage 3: Trial 2 – Implementation: trial of a larger-scale experimental assessment and evaluation of the specifications for the assessment instrument and assessment procedures (see Chapter 7).

6.2.1. Rationales

The first stage of the design concerned the specifications for the proposed art and

design examinations. A rationale and a content framework were developed from

ideas encountered in research literature; the views of teachers and students (Chapter

4); and through an analysis of Portuguese art and design syllabuses. The specification

of the instrument followed the rationale for art and design education described by

Swift & Steers (1999) who argued that art education should address three

fundamental principles: difference, plurality and independent thought.

These principles are embodied in the promotion of risk-taking, personal enquiry

and challenging established orthodoxies, especially those associated with cultural

hierarchies. Thus, learning in the arts should develop disciplinary knowledge,

creative and critical thinking, and the skills to interrogate dominant ideologies.

The rationale was further inspired by four dimensions of student

understanding in the arts proposed by Ross et al. (1993, p. 51):

1. Conventionalisation: an awareness and ability to use the
conventions of art forms.
2. Appropriation: embracing for personal use, the available
expressive form.
3. Transformation: the creative search for personal knowledge.
4. Publication: the placing of the outcome in the public domain.

These dimensions of aesthetic learning operated as four basic quadrants from which

to systematise the contents and skills of art and design for assessment purposes.

6.2.2. Skills in art and design

The review of literature (Chapter 1/2) provided some insights into what is

or what should be assessed in art and design. Eisner (1972, pp. 212-216)

claims that the qualities that could be evaluated in the arts are: 'Technical

skills, aesthetic-expressive aspects, creative imagination', and he notes

that these are in constant interaction. According to Dorn (2002) there are

three important elements that need to be assessed in art education:

expression, knowledge and skill, and concept formation.

6.2.2.1. Knowledge base

The design of the new assessment instrument was influenced by the idea that

creativity or creative thinking requires a substantial knowledge base, (Alexander,

1992; Amabile, 1987). The knowledge base should include theory, for example

concepts and techniques of art and design, as well as knowledge of past and

contemporary art, design and media works from different cultures. Knowledge and

understanding of art and design works (and, more broadly, visual culture) using both

formal and contextual analysis was considered very important. But the new

instrument should not provide only a test of memory and analytical skills; it was essential also to include critical skills. Creative work in art and design depends on knowledge of art and design works, understanding of the principles and elements of visual language, and the acquisition of skills, technologies and materials (Dobbs, 1992; Best, 1996). Creative art and design work needs the contextual knowledge of

art and design discovered through research and critical analysis of wide-ranging

material from visual culture. An understanding of disciplinary knowledge may

enable students to extend boundaries through personal appropriation; what Eisner

(1972, pp. 217-222) called 'boundary pushing': the ability to attain the possible by

extending the given.

Like other creative processes, artistic work involves memory and researching

relevant resources, response generation and response evaluation (Amabile, 1990).

Through understanding the symbolic rules and conventions the student can combine

and reorganise existing knowledge structures or conceptual categories. These new

linkages or rearrangements can provide new ways of understanding a problem

situation, providing a source of new ideas.

6.2.2.2. Thinking skills and metacognitive skills


Intentions and motivations are seen as intrinsic parts of the creative process.

The student receives stimulus in the form of external inputs from their social

milieu and also initial impetus from within the individual (Amabile, 1990;

Csikszentmihalyi, 1990). A decision was taken, therefore, that the new assessment

instrument should include the personal and social milieu inputs through a record of

intentions, experiences and motivations. Creative skills involving convergent and

divergent thinking abilities such as critical thinking, problem-finding, problem

solving and decision making skills were also considered important to allow students

to challenge established ideas, concepts and ways of making; to resist stereotyped

visions of the world and break boundaries. In boundary breaking, ‘students see gaps

and limitations in present theories and proceed to develop new premises, which

contain their own limits, they must be able to establish an order and structure

between the gaps they have 'seen' and the ideas they have generated’ (Eisner, 1972,

pp. 217-222). In the new instrument, skills of creative thinking and making involve

appropriation and transformation strategies concerned with understanding

conventions within cultural contexts to enable students to convert them into a

personal style, signature or voice (Ross et al, 1993, p.53). These skills, considered in

the process, could be described by terms such as: ‘experimentation’, ‘exploration’,

‘discovery’, ‘problem-finding’, ‘analysis’, ‘synthesis’, ‘evaluation’, ‘risk-taking’,

‘decision-making’, ‘problem solving’ and ‘communication’.

6.2.2.3. Self-evaluation skills

Evaluation skills play a significant role in creative thinking (Feldhusen & Goh,

1995). Together with self-evaluation they were considered an essential component of

the design of the new instrument. Self-evaluation skills were interpreted as the

capacities to justify an outcome; to explain how to solve or to define a problem; they

involve sustaining the original insight, evaluating, elaborating and developing it to

the full (MacKinnon, 1962, p. 485). Self-evaluation, in what Ross (1993) calls the

publication stage, was extremely important in the design of the instrument. By this

means students could reveal their ability to persuade others of the value of their work, including the audience, viewers or assessors who would later validate it.

6.2.3. Defining a content framework for the new art and design external

assessment

The aims and general objectives described in the Portuguese Studio Art and Drawing

syllabuses were broad enough to enable the construction of a content framework that

included knowledge of art and design, independent thinking, critical and creative

skills and self-evaluation. Although technical skills in art and design formed a large

part of the content, space was left for creative process and contextual studies of art

and design. The current Portuguese syllabuses allowed the construction of a core

framework in the form of a blueprint (see Figure 11), which attempted to foster a

holistic vision of art and design, promoting risk-taking, personal enquiry and critical

skills.

Component 1: Process (Conventionalisation). Public: social culture/visual culture; students' social environment. Private: situations, experiences, motivations, dispositions, intentions.
Contents: knowledge base; understanding symbolic rules (the history, meaning, purposes and conventions of visual products); understanding visual language; visual perception; visual media; methods of visual representation; process of inquiry and making.
Activities: search, collect and organise information, mapping, listing, analysis, synthesis, select, combine, associate, articulate, connect, re-organise, problem-finding, critique, evaluate.
Evidence: interpret; record personal ideas, intentions, experiences and information; critical analysis of sources; reveal search/critical skills.

Component 2: Process (Appropriation).
Contents: independent thinking; critical thinking; metaphorical thinking; convergent-divergent thinking.
Activities: experiment, explore, discover, problem-finding, analysis, synthesis, evaluate, combine, generate ideas, explore possibilities, risk-taking, regression, boundary pushing, decision-making; problem-solving abilities, evaluative abilities, flexibility, endurance, persistence, resistance.
Evidence: develop ideas; reveal technical expertise (process of inquiry and making), imagination and persistence; craft skills, technical skills, media skills, communication skills, evaluation skills.

Component 3: Process/Product (Transformation).
Contents: making, invention, imagination, intervention, flexibility, communication, evaluation.
Activities: generation, boundary breaking, problem solving, decision making, create, communicate, evaluate.
Evidence: produce outcomes showing contextual understanding and technical expertise; find a new order/personal aesthetic response or intervention fitting the purposes and conditions; craft skills, technical skills, media skills, communication skills, evaluation skills.

Component 4: Product (Publication). The public validates the work/idea/concept.
Contents: self-evaluation; display; explain, justify; persuade.
Activities: communication, evaluation, explanation, justification, persuasion, validation.
Evidence: reveal evaluation skills, showing understanding of art and design theory, techniques and purposes; communication skills, evaluation skills.

Figure 11: Blueprint for the design of a new art examination

6.2.3.1. Kinds of evidence

According to the majority of Portuguese art teachers and students who responded to

the survey (Chapter 4, p. 139), ideally art and design assessment instruments should

be developed using evidence from portfolio tasks requiring a combination of written

and practical responses. They wanted the portfolio to include: a record of students’

intentions and how these are realised; preliminary investigation and research work;

work journals/ annotated sketchbooks; developmental studies; final products; self-

assessment notes or reports; and group criticism notes. These tasks should provide

students with an opportunity to reveal what they can do and what they know.

6.2.3.2. Portfolio

After having considered the views of Portuguese teachers and students and the

experience of English examinations, the portfolio was adopted as the assessment

instrument for the new examination. In compiling a portfolio, students explore a

theme, plan, elaborate, present and evaluate their work (Lindström, 1998).

Some degree of portfolio assessment has been tested out and used in a variety of

contexts (International Baccalaureate; Arts Propel; Project Zero; GCSE Coursework;

The Netherlands; Finland). Research on portfolio assessment had already

demonstrated the positive potential of portfolios as evidence for assessment in both

formative and external assessment (Gardner, 1992; Beattie, 1994; Hernandez, 1997).

6.2.3.3. Portfolio evidence

The portfolio was devised as an open-ended project to explore a theme that should

include selected evidence of students’ intentions, motivations, critical inquiry,

explorative and developmental studies, final products and records of self-assessment.

The required evidence needed to be flexible enough to allow students to choose

themes and tasks that best suited their individual learning styles, gender, ethnic group

and social background. The intention was to give them opportunities to choose

themes and tasks in order to allow fairness and equity in assessment, so that those

who were disadvantaged in one task could have an opportunity to offer evidence

of alternative expertise in another. It was considered also important to permit a wide

range of media and techniques in order to enable differentiated outcomes.

Self-assessment evidence was included in order to ensure students’ voices were

heard and their intentions made clear. Portfolio evidence was organised into three

main components: (1) Process; (2) Product and (3) Self-evaluation. However process

and product were not seen as discrete units; the framework used both concepts with

the necessary flexibility to respect different ways of working in art and design. The

evidence to be selected for assessment by students may be schematised as follows:

The student portfolio, presented as a folder, exhibition, work journals, CD, web page, etc., could include the following types of evidence:

• Reports or notes about previous experiences, interests, etc. (visual/written);
• Preliminary studies and developmental records (visual/written);
• Investigation reports and data from critical inquiry (written and visual);
• Final products (visual): paintings, drawings, sculptures, prints, graphic design, product design, multimedia, photographs, films, video records of performances, installations, exhibitions, etc.;
• Self-assessment reports and interviews (written or oral: tape, video or digital records) about the student's intentions, progress, investigation, achievement, presentations and evaluation; records of self-assessment and 'crits'.

Figure 12: Types of evidence for portfolio
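
To show how such evidence might be catalogued under the three components described above, one student's submission can be represented as a simple data structure. The sketch below is hypothetical: the field names and example entries are inventions for illustration, and the grouping follows the process/product/self-evaluation framework:

    # Hypothetical sketch: one portfolio submission grouped under the three
    # components of the framework; field names and entries are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Portfolio:
        student_id: str
        theme: str
        process: list          # records of intentions, research, developmental studies
        product: list          # final outcomes
        self_evaluation: list  # self-assessment reports, 'crit' notes

    portfolio = Portfolio(
        student_id="P-017",        # invented identifier
        theme="Memory and place",  # invented theme
        process=["notes on intentions", "investigation report",
                 "annotated sketchbook"],
        product=["final painting", "photographic series"],
        self_evaluation=["self-assessment report", "group criticism notes"],
    )
    print(portfolio.theme, len(portfolio.process), "process items")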

In light of the Portuguese tradition of one single examination period at the end of the

art and design course, it was considered appropriate to develop one single portfolio

project as the final art and design assessment instrument for Portugal. A decision was

taken that the external assessment should be conducted in the last term of the

academic year during normal art and design class time. Students and teachers would

receive all the instructions at least one month before the examination to allow for

preparation of the project.

6.2.3.4. Assessment criteria

The design of the instrument was limited by the nature of assessment itself: it is an

exercise of power, involving some kind of judgement over others. It was limited, in

particular, by the need to establish criteria that would be subject to common

interpretation by users. The five criteria suggested by the researcher for negotiation

and refinement during the pilot were devised in accord with the content framework:

AC1: Record personal ideas, intentions, experiences, information and opinions in visual and other forms.
AC2: Critical analysis of sources from visual culture showing understanding of purposes, meanings and contexts.
AC3: Develop ideas through purposeful experimentation, exploration and evaluation.
AC4: Present a coherent and organised sample of works and final product revealing a personal and informed response that realises their intentions.
AC5: Evaluate and justify the qualities of the work.

Figure 13: Assessment criteria
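
To make the criteria concrete during negotiation, a marker's record for a single portfolio might be sketched as below. This is a hypothetical illustration only: the 0-20 scale, the equal default weightings and the optional holistic adjustment are placeholders for values that were to be negotiated and refined during the pilot:

    # Hypothetical sketch of a marker's record under the five criteria; the
    # scale, weightings and holistic adjustment are placeholders for values
    # to be negotiated during the pilot, not fixed by the framework.

    CRITERIA = ["AC1", "AC2", "AC3", "AC4", "AC5"]

    def portfolio_mark(criterion_marks, weightings=None, holistic_adjustment=0):
        """criterion_marks: dict mapping AC1..AC5 to a mark out of 20.
        weightings default to equal weighting; holistic_adjustment is a
        small signed adjustment reflecting an overall judgement of the
        portfolio as a whole, allowing the combination of criterion-
        referenced and holistic assessment sought in this framework."""
        if weightings is None:
            weightings = {criterion: 1 for criterion in CRITERIA}
        weighted = sum(criterion_marks[c] * weightings[c] for c in CRITERIA)
        return weighted / sum(weightings.values()) + holistic_adjustment

    marks = {"AC1": 15, "AC2": 12, "AC3": 16, "AC4": 14, "AC5": 13}
    print(portfolio_mark(marks, holistic_adjustment=+1))  # 15.0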

6.2.3.5. Assessment procedures

From the results of the survey in Portugal and research literature it was suggested

that the new examination should have assessment procedures based on appropriate

in-service teacher training and a process of moderation. The example of Edexcel

procedures for standardisation of moderators and moderation contributed to the

development of the new assessment procedures.

In-service teacher training was viewed as particularly important in order to achieve

the consensus of interpretation of criteria that is necessary for consistency of results

(Hennessy, 1994; Amabile, 1983; Csikszentmihalyi, 1988; Feldman,

Csikszentmihalyi & Gardner, 1994). A plan for in-service teacher training and

moderation was developed by the researcher to include explanation of instructions

and simulation of marking using examples of students’ portfolios all designed to

achieve consensus when interpreting criteria. In order to achieve consistent results a

procedure for external verification of the internal marks was also designed. Students’

portfolios (or a significant sample) would be made available to one or more external

assessors for checking the fairness and consistency of internal results.
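To make the verification step concrete, the sampling rule can be sketched in code. The following is a minimal illustration only (the study does not prescribe an algorithm; the function name, sample size and marks below are hypothetical): it selects portfolios spread across the internal mark range, so that the external assessor sees work at every level of achievement.

    def moderation_sample(marks, size=5):
        """Select a sample of portfolios for external verification.

        marks: dict mapping portfolio id -> internal mark (0-20 scale).
        Returns ids spread across the mark range, from lowest to highest,
        so external assessors can check standards at every level.
        Illustrative only; the sampling rule is not prescribed by the study.
        """
        ranked = sorted(marks, key=marks.get)      # ids ordered from lowest to highest mark
        if len(ranked) <= size:
            return ranked                          # small cohorts are verified in full
        step = (len(ranked) - 1) / (size - 1)      # even spacing across the rank order
        return [ranked[round(i * step)] for i in range(size)]

    # Hypothetical class of ten internally marked portfolios (0-20 scale):
    internal_marks = {"s1": 8, "s2": 12, "s3": 15, "s4": 9, "s5": 18,
                      "s6": 6, "s7": 11, "s8": 14, "s9": 10, "s10": 17}
    print(moderation_sample(internal_marks))       # ['s6', 's4', 's7', 's3', 's5']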

6.3. Piloting the new assessment instrument

The design of the assessment instrument needed a priori validation through a pilot

examination. The pilot (Trial 1) was intended to ensure that negotiation took place

between users in order to determine final formats, tasks, media, criteria, mark

weightings, grade descriptors, time, and assessment procedures (see Appendix XIV,

pp. 170-188). The objectives of piloting the initial instrument were: (1) to test the
validity and reliability of the instrument, and (2) to propose strategies to overcome
any problems thus detected.

During the pilot, data were collected through researcher observation notes,

questionnaires, student group interviews and reports. A research diary was kept in

order to collect information about the experiment, where notes about the pilot

sessions were recorded and transcriptions made of informal interviews with the

participants. One external observer evaluated the pilot or first trial: Graça Martins,

a recognised Portuguese expert in art and design education (see Appendix XIII).

Three student group interviews were conducted with a sample of twelve students
between 12 and 16 January 2003. The interview schedule included three sections

with questions about the validity of the assessment instrument (see Appendix XI).

A questionnaire for students and teachers involved in the trial and a checklist for

teachers were also used to collect data (see Appendix XII, pp. 141-166). The

questionnaire for students was conducted between 11 and 17 January 2003. The

checklist and questionnaire for teachers were conducted during the fourth pilot

training meeting (session 4, 11-01-2003). The pilot questionnaire included questions
used in the initial survey questionnaire (sections 3 and 4). However, in the trial, it
was found that the questionnaire and checklist schedules were not very effective and
needed further revision before they could be used in the main trial. Some questions
were repeated (2.6; 5.3; 5.4); others were irrelevant (3.10; 3.11; 3.12) or too
technical for students (4.5). A decision was therefore taken to remove the repeated
questions, add new detailed questions about instructions, criteria, weightings and
tasks, merge the teachers' checklist questions into the questionnaire for teachers,
remove the checklist, and provide more space for observations in the questionnaire.

6.3.1. Pilot Sample (Trial 1)

A list of teachers willing to participate in the trial of a new instrument was compiled

from the survey questionnaire. From that list, ten teachers from the central region of

Portugal were contacted to participate in the pilot. Initially the ten teachers and their

students agreed to participate, but later three of the teachers withdrew because they

could not attend the time consuming meetings that were necessary.

Teachers (n = 7)
Age: 30-40 (4); 41-50 (2); 51-62 (1)
Gender: male (2); female (5)
Background: painting (3); sculpture (1); design (3)
School: School V
Subject taught: Studio Art (OA) (4); Studio Design (OD) (1); Design geometry (DGD) (2)
Role in the pilot: teacher (5); moderator (2)

Table 24: Trial 1 sample teachers

Students (n = 51)
Age: 16 (3); 17 (6); 18 (9); 19 (5); 20 (1)
Gender: male (24); female (27)
Course: Technological (12); General (39)

Table 25: Trial 1 sample students

6.3.2. Place, time and programme

The pilot was held at Escola Secundária Alves Martins in the city of Viseu. An

auditorium was used during the first session to allow exemplar portfolio documents

to be projected. The other sessions took place in a normal classroom. The schedule

for activities was as follows:

Session 1 (28/9/02, 8.30h-13.30h):
• Explanation of the aims and purposes of the pilot
• Explanation of the assessment instrument: portfolios and assessment procedures based on moderation
• Explanation of the proposed underlying model and content framework
• Display of examples of portfolio assessment instructions from Finland, Arts Propel, the International Baccalaureate and Edexcel (UK)
• Display of reproductions of visual exemplars of students' portfolios (AS and A2) from Edexcel's art and design examinations
• Discussion
• Distribution of roles: teachers were asked to experiment with a portfolio with their students; only IL and AC could not do this because they were teaching descriptive geometry at the time, but they were interested in participating as external assessors

Session 2 (12/10/02, 8.30h-13.30h):
• Production of instructions for the new examination: a draft document including general guidelines for the format, tasks, criteria, weightings and grade descriptors was discussed

Parallel activity (October-December): experimental trial project using portfolios with students (22 hours); negotiation with students

Session 3 (16/11/02, 8.30h-13.30h):
• Evaluation of progress; experimenting with and discussing marking procedures and mark schemes (familiarisation exercises)

Session 4 (11/1/03, 8.30h-13.30h):
• Standardisation: assessing samples; revising criteria and mark schemes. Fourteen completed portfolios were marked collectively in order to establish the standards
• Parallel activities: first marking by the teacher; moderation by one external assessor; discussion of the assessment with students (questionnaires and interviews with teachers and students)

Session 5 (18/1/03, 8.30h-13.30h):
• Evaluation of the pilot; revision of the instrument and procedures; verification of the accuracy of marking and evaluation of the experiment
• Parallel activities: analysis of teachers' questionnaires, students' questionnaires, checklists and interviews; multi-facet Rasch analysis of the assessment results

Table 26: Trial 1 schedule

6.3.3. Feedback from the pilot

The pilot was an essential part of the design of the new instrument since teachers' and
students' opinions and suggestions were constantly sought in order to obtain their

expert judgments and to negotiate processes. It was important to explain to the

volunteers the nature of portfolio-based assessment and moderation using current

examples because, for them, it was very new in terms of external assessment.

Teachers reflected upon the proposed model, content framework and draft

specifications and agreed that the proposal was generally in accord with their own

models of teaching art and design and perception of content. The time for the

external assessment was set by common agreement: (1) preparation work: 25 hours

or 3 weeks extra class time; (2) developmental studies and final products: 15-20

hours during the three-hour class-time sessions; (3) 2 hours for the self-assessment

report during the last class session.

The participants considered standardisation of teachers and moderators a useful

exercise in order to establish a common interpretation of the criteria. All the

volunteers marked a sample of thirteen portfolios and the results showed that the

difference between markers was 1-3 points out of a total of 20 (for details of the

Portuguese marking system see p. 114). Internal consistency was 1-2 points’

difference between the first and second marking by the same assessor. Although two

markers obtained less consistency, the moderation marking results for inter-rater

reliability showed some degree of consistency (See Chapter 7, p.286). The

volunteers considered that the instrument had strong validity and provided valuable

results. They suggested final revisions of the instructions and helped to inform the

definitive instructions for the trial (see Appendix XIV).

The pilot exercise was centred on negotiation. ‘The design was based on teacher

ownership: sharing power and constructing knowledge instead of an elitist, top down

vision of the assessment’ (Trial 1 External observer report, 2003-03-08/Appendix

XIII). Participants and the external observer considered this collaborative approach

to the design of the instrument a novelty. Teacher ownership of the assessment was

ensured and emphasised, and students' opinions were also taken into account during

the pilot exercise.

Examinations used to be imposed by the government; the question


papers only express the view of the people in the GAVE about
what they value in art and design or about what they think should
be assessed. With this experiment I felt that things can be different;
we can also express our views about what must be assessed and we
ended up by enlarging the vision … (TR: AP; 18/01/2003)

It was quite odd during the examination preparation, because the


teacher was always asking us if we agreed with the instructions and
so on; I think it was a sign of respect for us; it was helpful not just
because we could express our opinions but also because by talking
about it we better understood the instructions (SR: M; 15/01/2003).

6.3.3.1. Underlying model

The participating teachers and external observer considered the underlying model

appropriate for the subject (See Trial 1 External observer report, 2003-03-08/

Appendix XIII). Generally teachers said that it was in accord with their own views

about what should be taught in art and design and with the learning objectives for art

and design.

I think this has much to do with my own ideologies and it is very


close to my way of teaching. …all this is about [the] art and design
process of inquiry and making (TR R, 12/10/02).

6.3.3.2. Format

In general, the portfolio-based examination was considered a valid assessment

instrument and teachers, students and the external observer welcomed the flexibility

of portfolios, which allowed for differentiated responses and respected students' own

knowledge, opinions and interests. The questionnaire results showed a general

agreement with the validity of the external assessment in terms of contents, clarity of
instructions and criteria, and appropriateness of tasks and weightings. However students

judged the instructions and criteria to be less clearly stated than teachers did, possibly

because of the novelty of both tasks and criteria.

Questionnaire results: percentage of agreement (Students, n = 51; Teachers, n = 7):
2.1. The exam allows a considerable range of options. (Students 94.1%; Teachers 100%)
2.2. The exam respects diversity of learning styles. (Students 94.1%; Teachers 100%)
2.3. The exam respects students' motivations and intentions. (Students 86.3%; Teachers 100%)
2.4. The exam respects students' opinions and interests. (Students 86.3%; Teachers 100%)
2.5. I think that the examination allows students to display the knowledge, understanding and skills set out in the syllabuses. (Students 96.1%; Teachers 100%)
2.7. The exam components (Process, Product and Self-evaluation) are appropriate for the disciplines of art and design. (Students 92.2%; Teachers 100%)
2.8. The assessment criteria are appropriate for the disciplines of art and design. (Students 94.1%; Teachers 100%)
2.9. The assessment matrix is appropriate for the disciplines of art and design. (Students 94.1%; Teachers 100%)
2.10. The examination questions or tasks focus on highly relevant knowledge, understanding and skills for future artists and designers. (Students 96%; Teachers 100%)
3.1. The instructions for the examinations are clearly stated. (Students 80.4%; Teachers 100%)
3.3. The assessment criteria in the examination are clearly stated. (Students 82.4%; Teachers 100%)

Table 27: Responses to questions about format.
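For clarity: the percentages in this and the following tables are simple proportions of respondents endorsing each statement, rounded to one decimal place. A minimal sketch of the arithmetic (the counts here are illustrative, not the study's raw data):

    def percent_agreement(agree_count, respondents):
        # Percentage of respondents agreeing with a questionnaire statement,
        # rounded to one decimal place as in Tables 27-34.
        return round(100 * agree_count / respondents, 1)

    # Illustrative counts only: 48 of the 51 students agreeing with a
    # statement yields the 94.1% reported for items such as 2.1 above.
    print(percent_agreement(48, 51))   # 94.1
    print(percent_agreement(7, 7))     # 100.0 (the teacher figures)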

Students considered the instrument valid, in their own words:

Yes, we could make whatever we wanted to, in terms


of choosing artists and developing ideas. And also
with final products, it was very open (SR S,
15/01/2003).

It was useful to be properly assessed, because we


could show a variety of sketches and experiments (SR
G, 17/01/2003).

Students considered that the external assessment required them to perform in an

authentic way, realising tasks that were closely related to art and design learning

practices:

…we had to search the work of others, finding


problems in artistic works, in other works like media
and our daily lives, we had to sketch a lot, represent
the ideas, transform them, solving a problem through
the final product and we had to evaluate our work (SR
F, 15/01/2003).

We had to experiment with materials, techniques, formal elements
and composition, like colour, shadow, textures, etc. We had to
show technical abilities; knowledge about art forms and art making,
we had to show how creative we were (SR A, 15/01/2003).

We had to show that we could investigate sources like design and


art works; we had to show expressive works and technical abilities
and capacities to communicate like expressing ideas and judgments
(SR T, 17/01/2003).

We had to show persistence; effort, imagination, critical abilities


and personal responses (SR S; 17/01/2003).

However some problems resulted from the unfamiliarity of the assessment

instrument format. Although teachers said that they normally used coursework for

assessment, they considered that the format of the portfolio was more demanding in

terms of the working methods and types of evidence submitted.

…students are not used to justifying their work; explaining the


reasons and the processes (TR AP; 18/01/2003 ).

Students are not used to writing comments. I never asked for it


before, I only used conversations with students (TR C; 18/01/2003)

Things like the work journal are not commonly used, however
Studio Art syllabuses talk about it. But students are not used to it
(TR C; 18/01/2003).

In fact students had not previously been required to make notes or comments about

their working processes, progress or achievement. So providing this type of evidence

was new for them:

I was not used to being critical towards the work of others and my
own work. It was quite difficult to make the written comments (SR
F; 16/01/2003).

But, for F and some others, the difficulty was not the writing skills but the evaluative

skills:

… it is not the writing which is difficult; I prefer to write because I
have time to think during the writing. The difficulty was in being
critical, analysing the works and expressing personal opinions.
We were not required to do so before the portfolio. Last year,
they [teachers] were more interested in our drawing and painting
abilities (SR F; 16/01/2003).

The new assessment instrument, although designed in accord with curriculum

materials (Trial 1 External observer report, 2003-03-08/Appendix XIII), introduced

unfamiliar approaches into Portuguese art education. It was a challenge for teachers

and students used to art and design assessment focused on craft-skills and self-

expression. The pilot experiment revealed that students, more than teachers, felt the
change of approach to art and design assessment. Although 92.2 percent of students
agreed that they were well prepared to develop portfolios (questionnaire statement
6.1) and 100 percent of teachers agreed that they were well prepared to conduct the

examination (questionnaire statement 6.1), they complained about a lack of

familiarity with tasks requiring evidence of critical thinking. According to Graça

Martins, the assessment instrument promoted a culture of self-reflection and

expected students to become reflective-practitioners (Trial 1 External observer

report, 2003-03-08/Appendix XIII). However, teachers and students alike considered

critical skills important. Teachers suggested that to overcome this problem there was

an urgent need to develop such skills. Some strategies were proposed, such as

promoting more group discussions in which the teacher could act as a mediator,

introducing facilitating questions to help students to reflect upon their work and the

work of others. Students considered group discussions useful during the development

of portfolios. For example, M said:

It was fine when we had the group discussions about our work, we
helped each other. Well, sometimes my friends told me I was doing
wrong and I didn’t agree with them; but it was an useful exercise,

in the way we had to explain what we were doing and making
statements about the quality; what is good and why…using the
criteria words to explain it. But I am not sure about its importance
for assessment (SR M; 14/01/2003).

According to the questionnaire results, students and teachers used group discussion

during the preparation of portfolios, but students and teachers had different views

about the importance of teachers' influence upon students. Teachers' responses
showed that they believed their influence to be less important than students
perceived it to be. This may be because teachers thought that their influence was not very

strong since the portfolio was produced with a minimum of their help, whereas

students believed that teachers’ opinions were influential.

Questionnaire results: percentage of agreement (Students, n = 51; Teachers, n = 7):
6.2. It was important to discuss the contents of the portfolios. (Students 88.2%; Teachers 100%)
6.3. It was important to have regular group criticism or 'crits' with students. (Students 96.2%; Teachers 100%)
6.4. The contents of the portfolio were regularly discussed with the teacher. (Students 82.4%; Teachers 100%)
6.5. It was important that students have the chance to talk about and explain the work in their portfolios regularly with the teacher during preparation time. (Students 88.2%; Teachers 100%)
6.6. It is true that I am very interested in hearing students' views about the work in their portfolios. (Students 92.2%; Teachers 100%)
6.7. Students had freedom of choice to make their project briefs according to their interests and motivations. (Students 94.1%; Teachers 100%)
6.8. It is true that I influenced students in their choice of works and approach. (Students 58.8%; Teachers 20%)

Table 28: Responses to questions about preparation.

However, including evidence of students' performance during group discussions in
the external assessment was regarded with caution:

I think the teacher should take it into account. But some students
are more individualistic; they need to be left alone. So it is not fair
for them, if they don’t like to talk, it is their right to be silent during
group discussions (SR F; 16/01/2003).

Students agreed that for some of them, group discussions might benefit assessment.

Teachers agreed that the best way to include them in the final assessment would be

through the recommendations of the students’ own teacher. They agreed that

evidence of students’ performance during group discussions should be used as

a value-added, confirmatory element and should not be used in any way to

disadvantage students. The use of work journals was another strategy designed

to foster self-reflection. Some teachers observed that the students who developed

their projects with work journals were more critically aware than the others.

Students agreed that work journals were useful:

I liked the work journal, it helped me a lot to explain things and


also to have ideas, because I wrote about my feelings and made
collages of images that impressed me (SR N; 15/01/2003).

The question of whether work-journals should be compulsory was considered.

It was agreed that, in some cases, students could be disadvantaged by this

requirement and, besides, alternatives such as a PowerPoint presentation or a CD-

Rom including visual, oral or written comments could be equally valid. Therefore, it
was agreed that the external assessment should strongly recommend the use of work
journals, or other evidence of reflection using verbal or non-verbal language.

The questionnaire results showed that 68.8 percent of students and 85.7 percent of

teachers agreed with statement 6.13: ‘It was important to have the option of

providing visual and oral (taped) comments about the intentions, quality and progress

of the work in the portfolio'. However, in the pilot all the students chose to express
their notes and comments in writing; this should not eliminate the possibility of
using other means of expression. Whether to use verbal or non-verbal language was

debated during teachers’ meetings and there was consensus that students should have

the freedom to use the kind of language that best suited their knowledge and

personality. Poor writing skills should not disadvantage students and there was a

shared opinion that visual and oral commentaries were acceptable to express
students' intentions, opinions and evaluations. It was recommended, therefore, that
students could submit notes and comments by audio, video and digital means, and
that such records should be afforded the same value as a work journal or a written report.

6.3.3.3. Bias

The questionnaire results showed that no significant bias was detected related to

gender, geographical location, and racial/ethnic/religious background. However

some students detected bias related to social class; in their view, portfolios
demanded a considerable effort to acquire resources, like drawing materials and
bibliographies, that were not available to all.

Questionnaire results: percentage of agreement (Students, n = 51; Teachers, n = 7):
3.6. The examination is free from gender bias (e.g. equally fair to all students, boys and girls). (Students 94.1%; Teachers 100%)
3.7. The examination is free from bias related to where students live (e.g. equally fair to students from big cities and students from small cities or rural areas). (Students 90.2%; Teachers 85.7%)
3.8. The examination is free from bias related to racial/ethnic/religious background of the students. (Students 100%; Teachers 100%)
3.9. The examination is free from bias related to social class. (Students 76.5%; Teachers 85.7%)

Table 29: Responses to questions about bias.

Analysis of teachers' comments during the pilot training sessions and
students' group interviews showed that the instrument increased teachers'
responsibilities. The fact that the teachers' role was so important in supporting the

preparation of the portfolios raised a problem of equality of opportunity.

The teachers discussed this point during the last pilot training session and

recommended that teachers should have training meetings and mutual support before

and during the examination. They thought meetings should act as a vehicle for

training and also for monitoring. However it was agreed that students should not be

left out of the dialogue and the researcher proposed that for future trials an Internet

website for the examination might be published, with a teachers’ and students’

discussion forum available to all the users. A students’ forum might reduce bias in

the instrument because those students with a less committed teacher could ask other

teachers for additional support during the preparation of the portfolios.

6.3.3.4. Tasks

Students, teachers and the external observer considered the portfolio tasks

appropriate and well organised, presenting a balance between process and product

(for details about tasks see Appendix XIV, p. 171 and p.173).

It was the first time students had been required to submit a considerable amount of

evidence for assessment. Although they complained about lack of time, they agreed

that the portfolio provided a way of structuring their work, because they found the

instructions clear and the tasks inspiring.

Questionnaire results: percentage of agreement (Students, n = 51; Teachers, n = 7):
6.9. It was important to develop investigative and developmental studies during preparation time. (Students 90.2%; Teachers 100%)
6.10. It was fair to develop some developmental studies and final products in examination conditions. (Students 94%; Teachers 100%)
6.11. It was important to make self-evaluation reports and notes. (Students 51%; Teachers 85.7%)
6.12. It was important to have the students' written comments about their intentions, quality and progress of the work in their portfolios. (Students 84.3%; Teachers 100%)

Table 30: Responses to questions about tasks.

I think the tasks are fine; and the way portfolio is organised helped
us to develop the project, gave us a kind of method of work (SR M;
15/01/2003).

Exploring ideas, materials and techniques throughout the time was


quite good. We had time to learn and improve our skills, sometimes
we had to go back, we had to select the best solutions, I think it was
really creative because without doing it, if we just had final
products, we could not go so far (SR N; 15/01/2003).

Some students found the investigative studies or research tasks, which were new to

them, very time consuming.

…the investigative work was boring; we are not used to doing this
in Studio Art . [R: Do you think it will be better to eliminate
investigative work?] No, not at all, we learned through searching
the work of artists, I think it is helpful to develop our own ideas, in
the sense that we see other things; we discover ways, techniques,
materials, approaches to ideas and it is not like in history of art, we
can chose our sources and study works we like and which are
useful for our projects (SR T; 16/01/2003).

Previously students had written essays about artists and art movements but had not

been required to connect their practical work with what they learned from theory.

In a great majority of the portfolios, the researcher observed that students failed to

connect the analysis of sources with their own projects. Interviews with students

showed they were aware of this. They said that it was very new in terms of

requirements and they did not fully understand it at the beginning.

Investigative studies were boring. Well, I think I did them badly


because I collected so many things and I didn’t use them in my
project (SR M; 17/01/2003).

Some portfolios showed little variety in terms of developmental studies and

exploration of ideas. The teachers thought that this might be a consequence of the

breadth of the instructions. They recommended that the instructions should
state what was expected of students more clearly and in simpler language.

All the teachers were strongly in favour of self-evaluation tasks, but self-evaluation

requirements were not considered so important by all students, and some boys in

particular distrusted them:

Self-evaluation, I don’t think it is very important. I don’t like to do


it (SR G; 16/01/2003).

Generally boys did not like self-assessment, and it was apparent from the interviews

that this was partly because it implied extra effort. However the majority of girls

agreed that the inclusion of self-evaluation tasks improved their learning and

performance.

We need to know what we have done to merit the marks. I think we


need to understand the quality, I mean what was wrong and good
and why and self-evaluation is a good way to help me to
understand that. And it is also good for the teacher to know what I
think about, we can discuss it. I don’t think I can influence my
teacher but he can understand and help me to improve (SR S;
17/01/2003).

I think self-evaluation is important. Because the external assessor


does not know about the development of the project, we need to
explain it. And it is good for us, because we have the opportunity to
say what we think, to express our views about progress and
achievements. I think we have lots of benefits by doing self-
evaluation (SR F; 16/01/2003).

6.3.3.5. Time and resources constraints

Questionnaire results: percentage of agreement (Students, n = 51; Teachers, n = 7):
2.11. The physical conditions and resources for the examination (accommodations, equipment and materials) were adequate in my school. (Students 54.9%; Teachers 57.1%)
2.12. I think that students have plenty of time to complete the examination. (Students 21.6%; Teachers 57.1%)

Table 31: Responses to questions about time and resources.

The responses from both students and teachers to questions about timing in the

questionnaires suggested that this was a main constraint, but after reflecting on this

during interviews they agreed that time was not the real problem; rather, it was the lack
of a disciplined working method. Because it was the first time they had developed a

portfolio with such requirements, they did not plan the various phases of the work

sufficiently.

I liked to search the history of alphabets; I liked the searching part


but it took a lot of time, I was not really organised and in the end I
had little time to explore ideas and to make the final product (SR S;
14/01/2003).

We had little time for the final product; I think the final product is
very important. But I agree with the sketches, we need to
communicate the ideas before the final product… but I didn’t plan
it well (SR F; 16/01/2003).

According to the teachers, planning was not the only problem; students were not

motivated to carry out extra schoolwork, especially because Studio Art was not

considered an important specialism.

But this is time-consuming; I had difficulties with the students


because they are not willing to make extra-work; they only work
during class time. They had other subjects they considered more
important, like maths and geometry to study. And they have no time
to fully develop their abilities. You know they are not used to
working hard in Studio Art (TR R; 18/01/2003).

But when students were asked if they thought the preparatory studies should be

eliminated from the model, surprisingly all seemed to be in favour of them. It was

clear from their responses in interviews that they had not realised how demanding this
examination was.

No, it is not that we don’t want to make this extra-school work. I


think it is much better for us if we have that possibility… but, you
know this time we didn’t realise how important it was for
developing the portfolio in class. We were not used to submitting
so many things and usually we had good marks without making any
extra effort. I think we didn’t realise that this time it was different,
the teacher was more demanding and we could not have a good
mark without making extra efforts (SR M; 15/01/2003).

It was evident that the new assessment instrument challenged old habits of work in

the discipline, and all students were quite aware that standards had been raised in

terms of achievement and commitment. The need to improve methods of planning

artwork for portfolios was noted by teachers and students.

The physical conditions and resources for the examination (classroom space and
conditions for storage of materials) were inadequate in the school where the pilot
took place; however, the school had a good library and all the students had access to
the Internet.

6.3.3.6. Criteria and weightings

The proposed assessment criteria and assessment matrix introduced students to a
different approach to the qualities to be judged in their performance. It was agreed
that they were well adapted to the curriculum and underlying teaching models, and
were fair and balanced for art and design students. The emphasis on process, critical
reflection and self-evaluation was approved as essential to the nature of art and

design learning and practice. However the unfamiliarity of some tasks was a problem

because of the inherent risk of failure. Teachers agreed that these risks should be

taken, at least in the pilot. They were in favour of the underlying conceptual model,

even though they had not required students to reveal so fully their capacities for

critical reflection and self-evaluation in the past.

Teachers and students agreed with the assessment criteria and weightings (see
Appendix XIV, pp. 183-184). Teachers thought they were a good basis for
establishing common ground for judging students' performances, but they identified a
need to simplify the language in the assessment matrix (see Appendix XII, p. 136).

Although students agreed with the criteria in the questionnaire, some of them

complained they were unsure of what qualities their art and design teacher and

external assessors were looking for in their portfolios. According to the teachers AC

and IL, students did not understand the statement in question 3.4 (Appendix XII,

p.147): ‘Students are aware of what qualities their art and design teacher and

external assessors are looking for in their portfolios’, and tended to misinterpret the

term ‘qualities’ as ‘taste’. A revision of the criteria in the instructions for students

was needed to explain further the concept of ‘qualities’ and how this was intrinsic to

the assessment criteria.

Questionnaire results: percentage of agreement (Students, n = 51; Teachers, n = 7):
3.4. Students are aware of what qualities their art and design teacher and external assessors are looking for in their portfolios. (Students 70.6%; Teachers 100%)
4.1. In my opinion, the criteria used to assess the portfolios are fair and recognise students' effort and achievement. (Students 96.1%; Teachers 100%)
4.2. In my opinion, the criteria used to assess portfolios are sufficiently flexible to accommodate a wide range of work and student responses. (Students 90%; Teachers 100%)

Table 32: Responses to questions about criteria.

6.3.3.7. Impact

Students understood the portfolio not only as an assessment instrument but also as

a learning experience. They agreed that the portfolio was an important tool for

preparing them for further study in art and design.

Questionnaire results: percentage of agreement (Students, n = 51; Teachers, n = 7):
5.1. I agree that putting together the art and design portfolio has been a worthwhile learning experience for the students. (Students 90.2%; Teachers 100%)
5.2. Students' art and design portfolios are a good foundation for further study in art and design. (Students 84.3%; Teachers 100%)

Table 33: Responses to questions about impact.

The new instrument influenced the way art and design was taught and perceived, not

by narrowing but by increasing the range of contents. In the responses to the

questionnaire, 85.7 percent of teachers agreed with statement 5.5: ‘It is true that the

examination positively influenced my teaching practice’ and 71.4 percent of teachers

agreed with statement 5.7: ‘The examination provided me with a different

perspective on art and design education’. According to the results of the teachers’

checklist, the main effects of the instrument on teaching were related to the need to

foster investigative studies and self-evaluation tasks in their teaching practice (see

Appendix XII, section IV).

6.3.3.8. Assessment procedures

Students, teachers and the external observer considered the assessment procedures to

be fair. According to the questionnaire results, there was a high level of
teacher and student agreement with the assessment procedures.

Questionnaire results: percentage of agreement (Students, n = 51; Teachers, n = 7):
4.3. I think it is important that students' portfolios are assessed by more than one person for the final assessment, to minimize subjective bias. (Students 90.2%; Teachers 100%)
4.4. T: I think that my students' portfolios were fairly marked. / S: I think my portfolio was fairly marked. (Students 98%; Teachers 100%)
4.5. The assessment procedure combining holistic and criterion-referenced marking is appropriate to judge art and design works. (Students 100%; Teachers 100%)
4.6. The standards were consistently applied. (Teachers 100%)
7.1. It was useful to discuss the interpretation of criteria with other teachers. (Teachers 100%)
7.2. During standardisation meetings we reached consensus about interpretation of criteria. (Teachers 100%)
7.3. It was useful to have visual exemplars to understand the criteria and standards. (Teachers 100%)
7.4. I had adequate training to assess students' portfolios. (Teachers 100%)

Table 34: Responses to questions about assessment procedures.

In the last pilot session (18-01-2003) teachers said that they had reached a common

interpretation of criteria and that the meetings had proved very important for training

them to mark students' work consistently. Teachers considered that using

students’ portfolios containing real examples of work, rather than photographic or

digital records, was very helpful in reaching a common understanding of standards.

I was afraid about external marking; because it was the first time I
encountered it. But in the end it worked quite well and this was a
sort of checking out my standards. The exercise with portfolios was
helpful to share a common interpretation of criteria, so I am not
surprised with the results of external assessment; we shared the
same views about the levels of achievement. (TR C; 18/01/2003).

And this process of external marking was fair for everyone; at the
same time I feel safer because someone is checking the accuracy of
my marking (TR P; 18/01/2003).

The proposed assessment procedures were considered more reliable than the existing

examination procedures. Clarity of assessment criteria and the matrix was identified

as a key factor enabling consistency of results, but the major reason given for

improved accuracy of marking was the quality of teacher training and the use of real

student portfolios for standardisation. However the training sessions were time-

consuming, partly because all the instructions had to be negotiated through long

group discussions and feedback from students.

The time spent in training sessions for teachers was not realistic given that the

teachers were already overloaded: the withdrawal of three participants made this

clear. This aspect of the model needs further thought. Shortening the training

sessions would be practical, but it would probably have negative effects on the

procedures. A possible solution to the lack of time for training could be providing

on-line support and arranging training/standardisation sessions in at least two regions

of the country in order to avoid the problems of teachers undergoing long distance

travel. However it is reasonable to expect that the time spent might be reduced in

further trials as familiarity with the assessment model increased.

6.4. Suggestions for improvement

Fostering the active involvement of teachers and improving the clarity of instructions

for students (see Appendix XIV, pp. 176-181) might overcome the problem of the

unfamiliar format of assessment. Portfolios demand more work, and more organised

work, from students in terms of motivation, structure and presentation. In the new

assessment instrument, the relationship between teacher and student changed when

assessment was integrated into learning. The instrument was considered to have the

necessary validity characteristics because it enabled students to display their

knowledge, understanding and skills in art and design, and offered a rich range of

art and design activities that were considered inspirational. The instrument was

judged to be flexible in its range of options. But points of fragility were detected

such as unfamiliarity with the format and potential bias resulting from the role of the

teacher. To overcome such problems there is a need to establish a clearer set of
instructions and assessment matrix, effective in-service teacher training and purposeful dialogue

between teachers and students. A possible strategy for attaining such goals within

time and space constraints might be: (1) to conduct only one standardisation meeting

with the volunteer teachers, (2) to create a permanent support structure during the

examination in order to establish dialogue between teachers and students, possibly
through online support on a dedicated website.

Summary

In this chapter the design of the new instrument was described and the external

assessment draft materials were explained. The assessment materials were discussed

and negotiated during the pilot in order to investigate issues of validity, practicality

and reliability. Problems were identified and were used to re-design the examination

instructions (see Appendix XIV). It was concluded that the new instrument and

procedures for assessment required: (1) development of programmes for in-service
teacher training; (2) a change of attitudes towards studio-based art disciplines,
increasing the value of Studio Art in the curriculum and encouraging students to
engage with the subject with greater commitment; and (3) a major change in the
approach to curriculum and pedagogy to develop critical and self-evaluation skills,
educating students as reflective practitioners.

Chapter 7
Main Study: Trial of the new external assessment instrument and
procedures in five schools in Portugal

In this chapter the trial of the new instrument and external assessment

procedures in five Portuguese schools is described, analysed and evaluated.

The evaluation was based on data collected during the trial through

researcher observation notes, questionnaires, interviews and reports.

7.1. Trial sample

The sample was determined from a list of nineteen teachers who were willing to

participate in the trial of a new instrument obtained from the questionnaire survey

respondents (Appendix VII) and from the list of teachers involved in the pilot.

The teachers were contacted by telephone; the trial activities and timetable were
explained to them and they were asked if they were still willing to participate. Ten of
them accepted the invitation. The resulting sample of teachers was aged
thirty to fifty, with six to twenty-five years of teaching experience, and came from

different regions of the country (North, Centre and South). They included five

teachers who had collaborated in the pilot, three of whom continued to use portfolios

in their studio art internal assessment throughout the year, while the other two were

experienced teachers who had acted as moderators during the pilot study and were

willing to continue as moderators in the main trial.

However these teachers were clearly not representative of all secondary studio art

teachers in Portugal since their availability for the trial suggested a particular interest

in reform and motivation to experiment with new assessment approaches.

Name; Age; Teaching experience (years); Gender; Background; School; Role in the trial; Number of students
J; 42; 20; Male; Master Multimedia; P (south); Teacher; 5
AM; 39; 16; Female; BA Design; B (south); Teacher; 15
C; 30; 6; Female; BA Design; V (centre); Teacher; 23
AP; 32; 8; Female; BA Design; V (centre); Teacher; 10
I; 39; 14; Female; BA Design; K (south); Teacher; 22
E; 42; 16; Female; BA Architecture; K (south); Teacher; 15
M; 50; 25; Male; BA Painting; A (north); Teacher; 11
R; 38; 14; Male; BA Sculpture; V (centre); Teacher; 17
IL; 49; 23; Male; BA Painting; V (centre); Moderator; -
A; 42; 18; Female; BA Design; V (centre); Moderator; -

Table 35: Trial 2 sample

The schools and studio art classes were very different and included: (1) two schools

located in suburban regions in the South region of Portugal with students from

middle and working class families and ethnic minorities (Schools P and K); (2) one

school located in the biggest city of Portugal with students from middle and high

social status families (School B); (3) one school from a medium size city in the

Central region with students from middle class families (School V) and, (4) one

school from a small city in the North West region with students from rural and

middle class families (School A).

The volunteers asked their 12th year Studio Art students if they wanted to participate

in the trial; it was explained that the new instrument was demanding and they would

need to submit more work than usual but, at the same time, it would allow them more

freedom and opportunities to reveal their knowledge, understanding and skills in art

and design. With the exception of those at School P, the students were very

motivated to try out the new instrument. One hundred and seventeen students aged

between seventeen and twenty were involved in the trial (seventy-two

females and forty-five males).

7.1.1. Data collection

A research diary was kept in order to collect information about the trial, where notes

about visits to schools and standardisation meetings were recorded and transcriptions

made of informal interviews with the participants. One external observer evaluated

the trial: Leonardo Charréu, a recognised Portuguese expert in art and

design education (see Appendix XXIII). Five group interviews were conducted with

a sample of 31 students between 2 and 26 June 2003 (for details see Chapter 3, p.89).

A questionnaire for students and teachers involved in the trial was also used to

collect data (for details see Chapter 3, p.93). The questionnaires were conducted

during July 2003. Other data was collected through teachers’ reports.

7.1.2. Location, time and programme

The main trial started at the end of the second school term in 2003, before the Easter

holidays, and finished by the end of June 2003. The schedule of activities comprised

standardisation meetings, implementation of the trial with students; researcher visits

to schools and interviews with teachers and students; internal marking of students’

portfolios; moderation of a sample of portfolios in each school by one or two

moderators; and on-line meetings with the volunteers (moderators and teachers).

The schedule of activities was developed as follows:

March/April 2003
Teacher and moderator activities: standardisation meetings: (1) 7 March 2003 (School K); (2) 10 March 2003 (School V).
Teacher and student activities: reading the assessment booklets in class; explaining the instructions; aided work: start of preparatory work (investigative studies and first developmental studies).

May 2003
Teacher and moderator activities: on-line meetings (web page); researcher visits to schools for observation and for student and teacher interviews in art classrooms.
Teacher and student activities: unaided work: continuation of developmental studies, final products and self-evaluation reports; compiling the portfolio; student questionnaires; interviews.

June/July 2003
Teacher and moderator activities: on-line meetings (web page); post-trial meeting (18 July at School V for final evaluation and marking samples).
Teacher and student activities: internal marking (10-15 June); moderation (16-30 June); teacher questionnaires and reports; interviews.

Table 36: Trial 2 schedule

7.2. Description of the activities

The external observer characterised the trial activities as ‘… a critical and innovative

dialogue about assessment, where art assessment problems were discussed from

practice and solutions were tested in order to achieve a model of assessment adapted

to the nature of art with more validity and reliability’ (Trial 2, external observer

report, 30-07-2003, Appendix XXIII).

7.2.1. Standardisation meetings

Two standardisation meetings were held. The first meeting with teachers took place

on the 7th March 2003, in the ceramic room at School K, from 9.00h to 19.00h, with
AM, E, I, J, M, IL and A. The second meeting took place on the 10th March 2003,

in the studio art room at School V, from 9.00h to 19.00h, with teachers AP, R and C, and

moderators IL and A. The agenda for the meetings followed a common format:

• Part 1: Explanation of the instrument and procedures, discussion of


instructions, criteria, assessment matrix and grade descriptors. Detailed
explanation of the internal standardisation and moderation procedures and
forms to be used for marking.
• Part 2: Familiarisation exercises using 15 portfolios developed by students
during the pilot experience.
• Part 3: Blind-Marking: volunteers were asked to mark 10 other portfolios
developed by students during the pilot experience.
• Part 4: General discussion about standardisation; feedback from blind-
marking; and practical questions such as dates for the researcher visits to
schools, moderation visits, online meetings, and deadlines for sending
questionnaires, marking forms and the final reports.

The volunteers received a copy of the assessment booklet that resulted from
Trial 1 (Appendix XIV) two weeks before the standardisation

meetings. At the School K meeting the booklet was new for all the teachers involved,

since it was the first time they had participated in the experiment. However their

responses to question 15 in the questionnaire later showed that all of the teachers
found it easy to understand, and that they agreed with the statement 'The
instructions for the examination were clearly stated'. AM, I and M said that they

adopted similar strategies in their teaching. J said that, although he was motivated by

the trial and agreed with the rationale, he expected some resistance from his students

because they were not very motivated and were unfamiliar with the tasks. Since the

moderators had participated in the pilot they described the pilot experience and

showed the fifteen marked portfolios, explaining how the criteria had been applied.

The group spent two hours looking at the marked portfolios asking questions about

specific issues related to the type of evidence and how it was marked. After the

familiarisation exercise, the volunteers were asked to mark individually the ten

portfolios displayed for the blind-marking exercises. Afterwards the group discussed

the marks.

At the meeting in School V the standardisation was easier to conduct because all the

teachers involved had participated in the pilot. The first and second parts of the

agenda were covered very quickly. The volunteers described other portfolios they

had developed with students during the second term (January-March) and showed

student work. Since they had already marked the exemplar portfolios during the pilot,
the group decided to use the new ones for the familiarisation exercises.

The results of the blind-marking exercises in both meetings revealed that the

volunteers were reasonably consistent in their marking, with a difference of only one
to four points between markers (see Table 37).

J IL AP AM C M E I A R diff
Portfolio 1 6 5 6 6 6 6 6 6 6 5 1
Portfolio 2 10 12 11 12 12 12 13 13 12 12 3
Portfolio 3 8 8 8 7 8 7 6 8 8 7 2
Portfolio 4 9 9 8 12 9 9 10 9 8 10 4
Portfolio 5 18 18 17 19 17 16 17 18 17 19 3
Portfolio 6 11 11 10 11 10 10 9 10 12 11 3
Portfolio 7 10 10 10 11 10 10 10 9 10 11 2
Portfolio 8 20 20 19 20 19 18 19 20 18 20 2
Portfolio 9 16 17 16 17 16 16 16 17 16 17 1
Portfolio 10 8 8 7 8 7 7 7 7 7 9 3
Table 37: Trial 2: standardisation blind marking results.
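The 'diff' column records the spread between the highest and lowest mark awarded to each portfolio. As an illustration of that computation, the following sketch reproduces it for the first five portfolios in Table 37 (marks copied from the table; 0-20 is the Portuguese marking scale):

    # Marks awarded by the ten markers (J, IL, AP, AM, C, M, E, I, A, R)
    # to the first five blind-marked portfolios in Table 37 (0-20 scale).
    marks = {
        "Portfolio 1": [6, 5, 6, 6, 6, 6, 6, 6, 6, 5],
        "Portfolio 2": [10, 12, 11, 12, 12, 12, 13, 13, 12, 12],
        "Portfolio 3": [8, 8, 8, 7, 8, 7, 6, 8, 8, 7],
        "Portfolio 4": [9, 9, 8, 12, 9, 9, 10, 9, 8, 10],
        "Portfolio 5": [18, 18, 17, 19, 17, 16, 17, 18, 17, 19],
    }

    for portfolio, scores in marks.items():
        spread = max(scores) - min(scores)   # inter-rater difference for this portfolio
        print(portfolio, spread)             # spreads of 1, 3, 2, 4, 3, as in the table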

7.2.2. Web page and on-line meetings

Under the designation ‘Art assessment’ a web page was constructed on the Internet

(www.prof2000.pt), on a site administered by a consortium of schools dedicated to

distance in-service teacher training. The contents of the site were: (1) explanation of

the rationale and conceptual framework; (2) explanation of the portfolio and

evidence required for assessment submission; (3) instructions for teachers; (4)

instructions for students; (5) digital reproductions of marked portfolios; (6) teachers’

and moderators’ forum; (7) students’ forum; (8) on-line chat programme (#af31

channel at IRC). The web page, forums and on-line channel aimed to provide a space

for continuous dialogue between the users of the new instrument. On-line meetings

were conducted with some of the volunteers and the researcher on 5/5/03, 21/5/03,
16/6/03 and 25/6/03, between 18.30h and 21.30h. However, they were not as useful as

expected because of a lack of teacher and student participation. Only four students

sent messages to the student forum asking questions about the appropriateness of

their project briefs. According to the external observer the on-line debates ‘were

interesting and teachers participated in the majority of the dialogue'. However,

teachers’ and moderators’ participation in the forum and on-line meetings, although

important, was very limited. According to his report: 'It seems obvious [in this case]

that inter-personal communication via IRC does not have the same qualities as

physical communication, however it presents advantages because it allows contacts

between people from different geographical locations, furthermore it allows people

to think about their replies allowing a richer dialogue between the participants’

(Trial 2, external observer report, 30-07-2003, Appendix XXIII).

7.2.3. Implementing the portfolio with students

The portfolio assessment was administered during the last term of the year.

The teachers explained the portfolio requirements to the class, the types of evidence

to submit and the assessment criteria. I, E, M, AP, J and C convinced their students to

opt for a project brief and media that could be easily realised taking into account

available resources and class time. R and AM gave their students complete freedom

of choice. However a significant number of students chose the proposed theme:

‘Excluded’ or themes related to it (for details see Appendix XIV, pp. 190-192). A

great diversity of media was evident in R's art class (multimedia; web pages;

product design; fashion design; installation; video; photography, painting, sculpture,

technical drawing). Different media were also chosen by some students at Schools K

(for details see Appendix XX) and B (installation; illustration; advertising; painting;

performance; video; sculpture and photography), for details see Appendix XIX. At

School A all the students chose themes related to the theme given by the teacher,

'Human figure', and developed expressive drawings and prints (for details see

Appendix XXII). AP and C suggested the theme ‘Moods’ to their students who

submitted drawings and paintings (for details see Appendix XXI).

7.2.3.1. School P (Appendix XVIII)

The art class at School P only had five students and the teacher viewed this as

negative. The teacher described the school location (J interview, 2 June 2003),

as ‘a dormitory town’, and the students’ economic background were ‘very poor.’

He characterised their cultural interests as ‘shopping halls and sporting events’.

The school did not provide students with access to any other kinds of cultural

activities; the school environment lacked stimulus. As J pointed out ‘in the school

nothing really happens’. In his view students’ poor attitudes and lack of motivation

were a consequence of negative educational practices:

...they have a passive attitude towards art disciplines;


because they are used and were trained as passive
objects to make pre-determined tasks
(J report, 6 June 2003).

All the students rejected the new instrument because they did not understand it as

related to art and design specialisms:

I didn’t like the portfolio; Studio art is for drawing and experimenting
with new techniques, not for investigative studies (SR Sergio, 2nd June
2003).

All the P students expressed the same opinions in section 3 of the questionnaire.
They were unfamiliar with the new requirements; their previous learning in art was
underpinned by a formalist approach, and because of that they could not perform
the majority of tasks.

[During the course] It was not the same thing; the preparation/search was
just finding an image and after we did the work using the image; for
example an impressionist or abstract work. We are used to making
drawings (MTEP; Studio Art) to experiment with specific techniques the
teacher ask us to experiment… The work is the same for everyone; for
example a face and then we work the form; light; colour, etc (SR Sergio,
2nd June 2003).

According to the teacher, the educational practices in the school followed different

underlying approaches and the new assessment instrument disadvantaged students

because they had not been prepared to develop independent thinking and critical

skills at school:

Practice in the arts has centered on knowledge of


content and technical skills without helping students to
understand and to be able to justify process. And this
was the main cause of the poor student results in the
portfolio. Because in the instructions there was no
detailed task prescription; students felt that anything
could be done and lost direction without any
prescription. They were unable to be creative (J report,
6 June 2003).

Students found the new instrument difficult and inappropriate for them:

Students got lost; they couldn’t define a problem.


They did not know how to create their own ideas; how

to experiment. They limited themselves because they
were afraid to experiment or explore (TR J interview,
2nd June 2003)

We didn’t know what to do; we felt lost because we


didn’t know what kind of final work we should make
(SR Sandra, 2nd June 2003).

I didn’t know what to do; I had no concrete ideas to


make the developmental studies (SR Raquel, 2nd June
2003).

Overall students considered the new instrument too time-consuming and too broad.

For these students the instructions were not detailed enough and the criteria focused

on knowledge, understanding and skills in art and design which they considered

unimportant; they wanted to be assessed on short and prescriptive tasks and not have

to make decisions for themselves, or carry out investigative studies or critical

reviews. Unfamiliarity with such tasks and a lack of previous training in critical

thinking were the main reasons why this trial failed in that school. These students

understood studio art merely as technical skills. In the two months available for the

trial, the teacher was not able to modify students’ habits and foster the development of

independent thinking, critical skills and decision-making. While it was probably an

error to use the new instrument with these students, the teacher did not use the

resulting marks for his internal assessment, so participating in the trial did not

disadvantage the students.

7.2.3.2 School B (Appendix XIX)

School B was a contrast. Located in the centre of Lisbon, it was a ‘top’ school.

According to the teacher, the students’ social and cultural background was rich: they came from elite basic education schools and their parents were involved in school activities. They frequently visited exhibitions, concerts, and theatre and dance

festivals.

My school has very good resources; students are


highly motivated and competitive. Students’ socio-
economic and cultural background is high. The school
life is dynamic promoting various activities and
cultural events (TR AM, 2nd June 2003).

According to the teacher they were also very competitive and motivated.

The researcher observed that the relationship between the teacher and students

was good and built on mutual respect:

There is a friendly relationship teacher/student; a student invites the


teacher to go to a concert; another student brought texts and magazines for
the teacher about graffiti; she says: ‘Teacher; if you want to know more
about the subject there is some material’ The teacher says: ‘Thank you
Sara; I am learning a lot from your projects’. She is also helping students
with additional technical support; one of the students is doing a video
about gipsies, she asked the student: ‘Telma did you contact the video
teacher I told you about? Telma answers: ‘Yes; he was really kind to me;
we fixed the time to mix the clips in the video; next week I will do it
because the lab is available’. Another student says: ‘Teacher; you know the film you recommended to me about the hippies, the Jesus Christ Superstar; I saw it during the weekend with Telma and João; but I think
the Woodstock images are much more helpful to use in my portfolio’
(research diary, Lisboa, 2nd June 2003).

AM had no problems applying the new instrument with the students because she

already assessed students in a similar way and fostered the kind of knowledge,

understanding and skills required by the new assessment instrument in her classes.

So, the tasks were not unfamiliar and students had the necessary training to perform

within the constraints and opportunities it presented. Their previous experience

seemed to be a key factor for the success of the new assessment instrument in this

case.

In general, students considered that the new assessment instrument was valid and

they enjoyed the relative freedom of choice:

The portfolio was good essentially because we had to make different


works; it was not prescriptive; we had a theme and we had to develop
the work… From our heads; not like ‘go and draw, draw a bench with a
monkey’ … I did a portfolio with things I like to do and showing what
I wanted to show (SR Telma, 2nd June 2003)

The portfolio demands more creativity; it is much more enjoyable and


motivating for us; but it is also more difficult because of that (SR Sara, 2nd June
2003)

The success of the new instrument at School B was related to its characteristics, the previous learning of the students and, especially, the teacher’s strategies:

The teacher also helped us; I mean psychologically, by motivating us,


giving us confidence in our work (SR Ana, 2nd June 2003).

The teacher was always telling us to work autonomously and to think


independently (SR Susana, 2nd June 2003).

Autonomy; we learned it little by little during the course (SR Diana, 2nd
June 2003).

However, other important factors were the students’ attitudes and behaviours; their past working practices and their ability to learn independently and collaboratively facilitated their performance.

We learn with what the teacher gives us, but we also learn by ourselves
(SR Ana, 2nd June 2003)

It was very important to have ‘crits’; the opinions of other students is


very important …Yes, it helps us to improve …We had a lot of help
from our colleagues; and it helped us to think better (SR Sara, 2nd June
2003).

7.2.3.3. School A (Appendix XXII)

School A was a small school in a small town in the north-east region of Portugal, far away from the big coastal cities. The town had a museum and a library but few cultural events. The school lacked appropriate physical resources for the arts, and the teacher said that he tried to compensate for the lack of a library and of cultural activities by using his own books and promoting school trips to exhibitions and museums.

He described the students’ context as follows:

The school receives students from the region. We have students from
different socio-economical backgrounds. The students are very
motivated but general standards in all disciplines are low. Students are
used to a school system based on memorisation and they often have
difficulties in independent thinking skills; this is not only my opinion,
the same opinion was expressed by the Philosophy teacher at the last
teachers’ meeting (M’s report, 27 June 2003).

Implementing the new assessment raised several difficulties at this school:

The main difficulties in implementing the portfolio with my art class


were related to the rationale of the portfolio; it was very difficult to
require students to think independently because they are used to
following detailed prescriptions for each task (M’s report, 27 June
2003).

However the teacher ultimately implemented the portfolio successfully and there was evidence of good quality in the students’ outcomes. This was probably because the teacher had gradually prepared his students to develop independent and critical thinking during the final years of the course, so they were able to perform confidently in the majority of the assessment tasks. Moreover, M knew that his students had difficulties in decision-making and, to overcome this, administered the new assessment instrument using a relatively prescriptive approach: choosing two themes for the project briefs (‘Human Figure’ and ‘Self-portrait’), providing books for investigative studies and carrying out human figure drawing exercises during the art classes. These students received a great deal of support from

their teacher; consequently their portfolios were not very diverse in terms of media

and experimentation. However they revealed considerable understanding of ways of

thinking and developing ideas in their notes and work journals.

By using such strategies M avoided failures like the one at School P. Nevertheless this trial showed that it was possible to use the new assessment instrument with students who were unfamiliar with the tasks if the teacher adopted a directive and supportive approach during the preparation stage of the portfolios. However some students still commented negatively on the lack of preparation. Mafalda wrote in her questionnaire:

This kind of examination is fair, but I think that students should have
better training for this type of exam. We should be taught for example
how to make investigative studies according to our themes; I think we
also should experiment with more techniques and media in order to
improve our technical skills and ability to evaluate our problems (SR
Mafalda, June 2003)

The experience was not wholly positive because problems of time and lack of

familiarity with tasks disadvantaged some students. According to M’s report:

Overall the portfolio was a good experience for me and for the
students. But I think that time should be more flexible. The main
advantage of the portfolio is fostering students’ sense of
responsibility, it has a very good structure but it should be developed
during the year. Students should be prepared for portfolio tasks during
the first year of the course. At this moment, I think it is very difficult
for students, at least for the average, because they are not prepared for
this kind of task (M’s report, 27 June 2003).

7.2.3.4. School K (Appendix XX)

School K was located in a suburban region of Lisbon, well known for its baroque and neo-classical monuments. The school lacked physical and cultural resources, but during the year the staff promoted several cultural activities. According to information given by the school’s administrative service, the students were from different socio-economic backgrounds, coming from both middle-class and low-income families. There was a small minority of Black and Indian students but the majority were White.

According to the teachers the students were highly motivated: ‘They work hard to get good grades; and they work hard in extra-curricular activities’ (TR I, 9 June 2003); ‘They like to do things’ (TR I, 9 June 2003); ‘They are looking forward to making things’ (TR E, 9 June 2003).

The implementation of the new instrument in this school revealed surprising results.

The two teachers, I and E, had different conceptions about art education and different

relationships with their students. E could be viewed as a prescriptive teacher, while I promoted independent work and group discussion in her classes. However, during the portfolio implementation the two teachers adopted a similarly unobtrusive role: after they had given the instructions to the students and provided some reference books related to the starting points for the model project briefs, they left the students alone.

I didn’t know about the kind of preparation; what kind of suggestions I


was allowed to make. So I left the students more or less alone after I
explained the portfolio instructions to them. Anyway students didn’t
ask me many things; they seemed to be very confident in what they
were doing (TR E, 26 June 2003).

Students were quite excited about the new instrument; they seemed to understand the

tasks as authentic and liked the more personal way of working in the arts.

I think it was well done; well thought. I liked this work because it is
related to my motivation. I am not very good in examinations. Usually
I had bad results in tests; I know I can do things; but I am not able to
demonstrate it in test or examination conditions. The portfolio was
good because we were assessed when we were working under normal
conditions (SR Inês, 9 June 2003).

With our teachers, before the portfolio, we were given very


prescriptive tasks; when it was to make a drawing, we made a drawing;
when they asked us to paint, we painted. In the portfolio we could do a
lot of other things, we could choose. We had to think what to do…. It
was important to know what we wanted to do (SR Júlia, 9 June 2003).

The portfolio was more real than the usual work and tests we do in the
art classes; it is about our life; not school life or tasks that school thinks
are important for us (SR Ruben, 9 June 2003).

Students seemed very committed to developing their portfolios, and it could be said that this instrument corresponded to their expectations of ‘real’ art and design:

I liked the portfolio experience. It was related to the course. And it


was important for me because…. in the portfolio we were aware of
what is necessary; we know what we are required to do, to make it
well. It makes us think; more than usual. And it was also important to
know that the portfolio would be assessed by other teachers (SR
Isabel, 9 June 2003).

It was a surprise for the teachers; although they knew their students were already motivated, they found that the portfolio motivated them even more:

They were so involved in the portfolio; we could say that they were
engaging themselves in a very unusual way; it was more than the
normal motivation. I could tell that they were living it intensively (TR
E, 26 June 2003).

This motivation also had consequences for the other subjects:

They worked hard; and the other teachers complained that students
were more interested in Studio Art and were not paying the same
attention to other disciplines’ homework (TR I, 26 June 2003).

According to the teachers, students were not usually required to do much homework for Studio Art; the coursework was developed in class time. It seemed that during the portfolio experience students reversed the usual status of the disciplines, devoting more attention and time to Studio Art.

Students at School K were obviously familiar with processes of independent and

critical thinking; the fact that students did not ask for additional support during

preparation time suggested that they were quite comfortable with the portfolio

requirements.

Sometimes they asked me questions, just to confirm if they were


working in the right direction, if the chosen media were all right, how
many developmental studies they should submit. Things like that (TR I,
26 June 2003).

It is possible that the students’ previous experience in Studio Art classes and other

disciplines enabled them to perform well. However, the school’s dynamic tradition of promoting cultural activities and workshops as extra-curricular activities could also explain the students’ independent work habits. Nevertheless the portfolio was not considered to be an easy examination option. For example Tânia wrote in her questionnaire: ‘I had a lot of difficulties especially in organising the work’ (SR Tânia, June 2003).

7.2.3.5. School V (Appendix XXI)

School V was located in a city in the centre of the country. It was a large school but had noticeably limited physical resources.

The school receives students from the surrounding district; it is a


famous school in the region, especially because of the art courses we
have. Some students are from distant places and live at student
residencies during the year. The parents are from very different
economical background, but the great majority are from middle class
families. There are no significant numbers of ethnic minorities at the
school. A very few Black students, the majority are White (TR C, 25
June, 2003)

In this school the arts were valued:

… the arts are recognised and cherished by the entire school


community. There is a strong tradition of good results in the national
art examinations. (TR R, 25 June 2003).

C defined the school as very dynamic, promoting all kinds of extra-curricular

activities open to the entire community. Art students were fortunate because their

teachers organised school visits to sites where students could see contemporary art:

However it is far away from other cities and the art department usually
organises school visits to museums and art festivals at Lisbon; Porto;
Barcelona or Madrid, at least one per year (TR AP, 25 June 2003).

The school had committed teachers and motivated students:

There are many art classes in the general courses and in the
technological courses. There are twelve art teachers in the art
department. The students are highly motivated and the art department
teachers are fine; we work very well together (TR R, 25 June 2003).

I think students like the school; and also the kind of relationship we have
with them; generally they work very hard (TR R, 25 June 2003).

However we don’t reduce the school to this building and the art lessons
are not always in the classrooms. The school is everywhere; in the park,
in the museum, in the streets. We usually have outdoors lessons.
Especially in Studio Art, where students can develop different media, we
try to make connections with other places; for example with factories,
design and craft studios in the town (TR R, 25 June 2003).

The new instrument had been piloted at this school, so the students and teachers were

familiar with it. The three teachers involved in the main trial continued to use

portfolios as before in their internal assessment. Consequently the conditions in this school were quite different from those in the other participating schools. The teachers

involved had adopted the new assessment instrument and, for example, for teacher R

it was not a novel approach to Studio Art:

You know, I have worked like this for several years. What you gave me
with the portfolio experience was a more structured instrument,
because the booklet was very organised and things were systematised
… (TR R, 25 June 2003).

The other two teachers were given time to prepare students for the trial during

the year:

…. since we started the portfolio in September [pilot] students had the


time to be prepared; little by little they learned skills of critical
reflection and self-evaluation. They learn how to plan the different
steps of the work in the time to respond to the deadlines. It was good to
continue to use portfolios after the first experience [pilot] at the
beginning of the year; students acquired so many learning experiences,
they had the opportunity to develop organised ways of thinking and
making; they developed independent skills and little by little they
pushed their own barriers; they learned how to realise their own
intentions and how to make projects that motivated them (TR AP, 25
June 2003).

It was really interesting to see how much students evolved during the
year. In the first portfolio they were not really good at making
investigative studies and developmental studies; but towards the
middle of the year they started to understand how it was good for
them and in the end they made portfolios with very good personal
responses (TR C, 25 June 2003).

However for some students the new assessment instrument was still problematic:

The most difficult thing about the portfolio was to justify our options;
but it also depended on the media. I developed a photography project.
It was easy to explain the techniques and to evaluate the experiments;
but it was less easy to explain the decisions (SR Joaquim, 17 June
2003).

For me the difficulty was in dividing the tasks; planning. Because I left
a lot of things to the last hour… it was difficult to organise the ideas; to
sort them out from the confusion (SR Manuela, 17 June 2003).

7.2.4. Moderation

Moderation of a sample of students’ portfolios was conducted during the period 16-30 June 2003, after the internal marking. The sample of portfolios was selected according to the rules established in the guidance booklet and contained examples from across the range of low, middle and high marks. At Schools B and A, the sample also included four portfolios that the teachers had considered problematic to assess during internal moderation.

A was the first moderator and visited all the schools to moderate the samples.

Her reports confirmed that the marks awarded were generally in line with the

standards agreed during the standardisation meetings. In some cases there was a

difference of one or two points between her own marks and the teachers’ marks.

After the moderation A discussed the results with the teachers explaining why she

had found their judgements too harsh or too lenient. In the majority of cases the

teachers had been too harsh, perhaps because they had been influenced negatively by

students’ behaviour and attitudes throughout the whole year. After the moderation

teachers revised the marks in line with A’s suggestions. In two schools A found portfolios that had been severely under-marked (School K: Joana’s portfolio, see Appendix XX, p.270; and School P: Luis’ portfolio, see Appendix XVIII, p.248).

The second moderator, IL, visited Schools P and K. IL agreed with A’s

recommended awards and informed the teachers J (School P, see Appendix XVIII,

p.249) and E (School K, see Appendix XX, p.273). They appeared to understand why they had under-marked the portfolios and agreed with IL’s and A’s marks.

Joana’s and Luis’ marks were finally agreed and IL moderated all students’

portfolios in order to see if there were similar cases, but none were found.

Luis (School P)
  Teacher’s marks (AC1+AC2+AC3+AC4+AC5): 3+1+3+10+1 = 18; final mark: 2
  1st moderator’s marks: 10+5+20+20+5 = 60; final mark: 6
  2nd moderator’s marks: 10+5+20+20+5 = 60; final mark: 6

Joana (School K)
  Teacher’s marks (AC1+AC2+AC3+AC4+AC5): 26+21+28+28+12 = 105; final mark: 11
  1st moderator’s marks: 30+23+48+40+17 = 158; final mark: 15
  2nd moderator’s marks: 26+28+50+40+15 = 159; final mark: 15

Figure 14: Differences between marks in Joana’s and Luis’ portfolios
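To make the scale of these discrepancies concrete, the following is a minimal sketch of the kind of comparison the moderation involved, using the criterion marks from Figure 14. The 2-point tolerance, the data structure and the variable names are assumptions for illustration only; they are not part of the trial’s actual procedures.

```python
# Hypothetical sketch of a moderation check: flag portfolios whose
# criterion marks (AC1-AC5) diverge between teacher and moderator.
# The tolerance value is an assumption, not taken from the thesis.
teacher_marks = {"Luis": [3, 1, 3, 10, 1], "Joana": [26, 21, 28, 28, 12]}
moderator_marks = {"Luis": [10, 5, 20, 20, 5], "Joana": [30, 23, 48, 40, 17]}

TOLERANCE = 2  # assumed acceptable per-criterion difference, in points

for student, t_marks in teacher_marks.items():
    m_marks = moderator_marks[student]
    gaps = [m - t for t, m in zip(t_marks, m_marks)]
    if any(abs(gap) > TOLERANCE for gap in gaps):
        print(f"{student}: review needed; criterion gaps {gaps}, "
              f"totals {sum(t_marks)} vs {sum(m_marks)}")
```

Run on the Figure 14 data, both portfolios would be flagged, mirroring the two cases that A and IL identified as severely under-marked.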

Overall the results of the moderation procedure showed consistency in marking. Teachers and moderators shared a common interpretation of the criteria, but the small number of students and teachers involved in the trial allows only a tentative conclusion about the possible efficiency of the moderation procedures in other

contexts. The participating teachers viewed moderation very positively as a means to

improve assessment and also as a way to enhance confidence in their own

judgements.

I was very happy to see that the marks I awarded to the portfolios are
recognised by the moderator; it gives me confidence in my assessment
skills, and besides I feel less alone in my profession because of the
opportunity to establish dialogue with other art teachers outside my
school (AM’s report, 18 June, 2003).

What was also good in the portfolio assessment was the external
assessment. I think that it was very good to have my marks confirmed
by the moderator. So she could reassure me and my students that we
were using the established standards (M’s report, 27 June 2003).

7.3. Feedback from Trial 2

7.3.1. Underlying concept

An analysis of the questionnaire responses showed that the new instrument was compatible with the teachers’ Studio Art curricula. Seven teachers strongly agreed and three teachers agreed with the statement ‘The examination respects my own opinions about art and design learning’ (question 65). Six teachers strongly agreed and four teachers agreed with ‘The examination respects my teaching practice’ (question 66).

7.3.2. Preparation

Question  Questionnaire results: % of agreement with:  Students (117)
54  To be well prepared to conduct the examination.  73%
55  It was important to discuss the contents of the portfolios with the teachers.  96.5%
56  It was important to have regular group criticism sessions.  93.8%
57  The contents of the portfolio were regularly discussed with teachers.  84.3%
58  It was important to talk about and regularly explain the work in their portfolios with the teacher during preparation time.  94.8%
59  The teacher was very interested to hear students’ views about the work in their portfolios.  91.4%
Table 38: Responses to questions about preparation.

An analysis of the questionnaire responses showed that most students considered they

were well prepared to take the examination. Collaborative working was one characteristic

of the new assessment instrument, and the researcher observed constant dialogue between teachers and students in the form of advice or motivation. Although 51.3

percent of the students thought that their teachers influenced their work, only four

teachers recognised this (questionnaire trial, question 59). This influence was mainly in

the form of verbal advice and helping students to develop a project brief appropriate for

the time and resources available. There was interaction between students, and ‘crits’ were also considered useful in the development of the portfolios.
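The agreement percentages reported in this and the following tables appear to be simple proportions of respondents endorsing each statement. The following is a minimal sketch of how such figures can be tabulated, assuming hypothetical Likert-coded responses; the 1-4 coding, the data values and the variable names are illustrative assumptions, not the thesis’s actual data.

```python
# Hypothetical tabulation of percentage agreement for questionnaire items.
# Responses are assumed coded 1 = strongly disagree ... 4 = strongly agree.
responses = {
    "q55_discuss_contents": [4, 4, 3, 4, 3, 4, 4, 3, 4, 4],  # illustrative only
    "q56_group_crits":      [3, 4, 4, 2, 4, 3, 4, 4, 3, 4],
}

for question, values in responses.items():
    agreeing = sum(1 for v in values if v >= 3)  # 'agree' or 'strongly agree'
    print(f"{question}: {100 * agreeing / len(values):.1f}% agreement "
          f"(n = {len(values)})")
```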

7.3.3. Format

The students considered the assessment instrument very flexible.

Question  Questionnaire results: % of agreement with:  Students (117)
60  Freedom of choice to develop project briefs that related to interests and motivations.  87.2%
61  The teacher influenced students in their choice of works and approach.  51.3%
62  Useful to have a model project brief.  83.3%
63  Important to have the opportunity to design a class project brief.  81.7%
64  Important to have the opportunity to design an individual project brief.  96.6%
Table 39: Responses to questions about format.

Both students and teachers considered the assessment instrument compatible with art and design practice, related to the contents of the specialism, and capable of allowing for students’ personal interests. According to the interviews:

The portfolio was real work; I mean because it was like the work by
artists and designers (SR Carla, 17 June 2003).

It was related to the course (SR Isabel, 9 June 2003).

In the end it [the portfolio] was also a way to explore ourselves (SR
Sandra, 2nd June 2003).

Question  Questionnaire results: % of agreement with:  Students (117)
1  Students are given a good opportunity in the examination to show well what they know, understand and can do in and about art.  96.6%
2  The examination respects diversity of learning styles.  94.9%
3  The exam respects students’ different motivations and intentions.  88%
4  The exam respects students’ different opinions.  89.7%
5  The examination allowed students to display the knowledge, understanding and skills set out in the syllabuses.  91.5%
10  The examination makes it possible to distinguish between very good students and just good students.  87.9%
Table 40: Responses to questions about content.

7.3.4. Instructions
All the teachers and the majority of students found the instructions clear.

Question  Questionnaire results: % of agreement with:  Students (117)
15  The instructions for the examinations are clearly stated.  92.3%
16  The instructions clearly stated what students are required to submit as preparatory work.  88.9%
17  The instructions clearly stated what students are required to submit as developmental studies.  88.9%
18  The instructions clearly stated what students are required to submit as self-assessment work.  91.5%
19  The instructions clearly stated what students are required to submit as final product.  94.8%
Table 41: Responses about instructions.

In the words of António: ‘The instructions for students were clear and helped us to organise the sequence of tasks’ (School V, SR António, 17 June 2003).

In general the instructions for teachers and students were considered sufficiently

clear, accessible and useful for students and teachers.

7.3.5. Time

During the pilot, 78.4 percent of students had commented that the designated time for

portfolio development was not adequate. However, the opinion of students who collaborated in both the pilot and the trial changed for the better. This supports the suggestion made in the previous chapter that the perceived lack of time was related to the degree of familiarity with the tasks.

In the questionnaire response to the trial, all the teachers and 70.9 percent of

students agreed with the statement ‘students have plenty of time to complete

the examination’. However, time was a controversial and problematic issue.

Scheduling strict periods for developmental studies and for realising final products

might not be a good idea because some kinds of work take more time than others.

The problem of time is very relative; especially the time for production. It depends

on the kind of final work we develop. Some students made the final products very

quickly like Inês. She painted the canvas in two hours; but for me and others it took

lots of time to make the final product. We could have chosen quick final products but

it was our work; we wanted to make our best performance (School K, SR Isabel, 9

June 2003).

On the other hand more flexibility with time might introduce an element of bias

because students with more free time might be advantaged. Some students did not want to spend extra time on their portfolio, so they chose project briefs that could fit the available time. However, other students felt that it was not fair to narrow their options because of time limits; reducing time would prevent them from fully developing their work as they wished.

The question of time was discussed at length during the teachers’ on-line meetings

without consensus. For example teacher E wrote: ‘All the students must have the same period of time for the portfolio; it is their problem if they cannot adapt their project briefs to their available time’ (on-line meeting, 21 May 2003); teacher R argued: ‘We need to allow students to work on their final products after school, otherwise they will not have time to complete it; we cannot ask students to narrow their projects just because of limits of time’ (on-line meeting, 21 May 2003); and teacher C said: ‘It is not fair if some students have more time than others; all students should have the same time for completion’ (on-line meeting, 21 May 2003).

During the main trial, teachers treated the question of time differently. However, students who spent more time on their final products were not advantaged in this assessment because the teachers and moderators marked the works taking time into account. As well as including the written time frame for the portfolio’s development, students recorded starting and finishing dates on all the work in their portfolios.

In the end a flexible timeframe was agreed, depending on the school context and students’ intentions. Although this might be seen as not allowing strict uniformity of application, the ‘set’ requirement was the same for everyone. However, students were free to continue their work beyond the examination time limits provided this was recorded, and this seemed to improve the authenticity of the assessment instrument.

7.3.6. School resources

Although in the researcher’s view only School B had suitable conditions for the application of the assessment instrument, according to the questionnaire results 66.7 percent of the students said that the physical conditions (accommodation, equipment

and materials) were adequate for the examination. Teachers AM and R allowed their

students to develop final products outside the classroom when they needed to use

specific technical equipment, for example video or digital labs.

At a certain point we discussed; if we should narrow students’ choices


in terms of media and techniques because the school did not have
resources for everything which could be developed by students. But in
the end we decide that it was absurd to narrow students experimenting
and exploration just because of the school conditions (School V, TR
AP, 25 June 2003).

The other teachers advised students to work with the media and equipment that were available in the classroom. However, advising students to work this way narrowed students’ freedom of choice, as teacher R pointed out:

Of course it was easier to reduce students choices in terms of media


and scale; but it wouldn’t be fair for them. Because what motivates
them is the freedom of choice; the possibility to go further, to explore
the impossible (School V, TR R, 25 June 2003).

The problem of inadequate physical conditions was more or less overcome in the five schools by advising students to choose ‘realistic projects’ or by providing additional resources. However, access to adequate resources was identified as a possible cause of bias, especially access to books, museums and exhibitions. In the main trial, the majority of teachers brought their own books into the class, organised visits to museums and exhibitions, and helped students to find specific tools and equipment for their projects in and out of school.

7.3.7. Tasks
The results of the questions about tasks in the questionnaire showed that the examination components Process, Product and Self-assessment were considered by all teachers and by 90.5 percent of the students to be appropriate for assessing art and design. The required assessment tasks were not familiar to 68.7 percent of students, but all the teachers and a great majority of students considered the tasks appropriate for art and design assessment, as seen in the following student responses:

Question  Questionnaire results: % of agreement with:  Students (117)
21  Recording ideas, intentions and motivations is an appropriate task to assess thought processes.  96.6%
22  Investigative work is an appropriate task to assess knowledge and understanding in art and design.  93.1%
23  Developmental studies are appropriate tasks to assess knowledge, understanding and skills in art and design.  93.2%
24  The final product is an appropriate task to assess skills in art and design.  88.9%
25  Self-assessment notes or reports are appropriate tasks to assess evaluative skills in art and design.  82.2%
26  Work journals/annotated sketchbooks are an appropriate task to assess thinking skills in art and design.  89.7%
27  Annotations about the intentions, quality and progress of the work in the portfolio are an appropriate task to assess thought processes.  89.7%
Table 42: Responses to questions about tasks.

In the pilot it was hypothesised that self-assessment tasks were perhaps more attractive to girls than to boys (Chapter 6, p.232). In order to see if this could be verified in the main trial, male and female students’ responses to the questions about the appropriateness of the tasks were compared. Small but significant differences were found between male and female students’ responses to these questions (21-27). Table 43 shows the results of the t-test obtained with the gender variable and variables 21 (record ideas), 22 (investigative work), 23 (developmental studies), 24 (final product), 25 (self-assessment), 26 (work journals) and 27 (annotations).

One-Sample Test (Test Value = 0)

Variable            t       df   Sig. (2-tailed)  Mean Difference  95% CI Lower  95% CI Upper
sex                 30.653  116  .000             1.38             1.30          1.47
record ideas        27.609  116  .000             .9316            .8648         .9985
investigative work  18.242  115  .000             .8621            .7685         .9557
devlpmt studies     18.419  116  .000             .8632            .7704         .9561
final product       13.328  116  .000             .7778            .6622         .8934
selfassessment       9.414  116  .000             .6581            .5197         .7966
workjournals        14.109  116  .000             .7949            .6833         .9065
annotations         14.109  116  .000             .7949            .6833         .9065

Table 43: Questionnaire trial 2: one-sample t-test: sex/tasks
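A one-sample t-test of this kind can be reproduced with standard statistical software. The following is a minimal sketch, assuming responses coded numerically; the illustrative data, the 0/1 coding and the use of scipy are assumptions made here for illustration, mirroring the form of the test reported in Table 43.

```python
# Minimal sketch of a one-sample t-test against a test value of 0,
# as in Table 43. The data below are illustrative only.
import numpy as np
from scipy import stats

# Hypothetical agreement scores for question 21 ('record ideas'),
# coded 1 = agree, 0 = disagree, one value per student.
record_ideas = np.array([1, 1, 0, 1, 1, 1, 1, 0, 1, 1])

t_stat, p_value = stats.ttest_1samp(record_ideas, popmean=0)
mean_diff = record_ideas.mean()  # mean difference from the test value 0
df = len(record_ideas) - 1

# 95% confidence interval for the mean difference
ci_low, ci_high = stats.t.interval(0.95, df, loc=mean_diff,
                                   scale=stats.sem(record_ideas))
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}, "
      f"mean diff = {mean_diff:.4f}, 95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```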

In talking about the assessment tasks, portfolios were viewed by both students and teachers as a complete process of inquiry and making in art. The perception was that capacities such as independent study and student autonomy improved, as Júlia and Carla made clear:

The portfolio helped us to think alone… I think that in the portfolio


people have time to explore and experiment without rushing towards the
final product… (School K, SR Júlia, 9 June 2003).

I think that with this last portfolio people can see what we learned during
the three years of the course; it was a final work. I can see how much I
evolved since the 10th year; but this year was more important; because
we had to think alone (School V, SR Carla, 17 June 2003).

Investigative studies were also seen as an important part of the portfolio.

The search part was also very helpful.... because it gives us things that
others have done and we can use it as starting points, I mean
interpreting it and going further according to our own personality
(School K, student Inês, 9 June 2003).

While students appeared to enjoy the opportunity to include process and

product in their portfolios, they firmly stated the importance of final

products:

I think that the portfolio is not complete without the real final product.
We need to finish the final product (School K, SR David, 9 June 2003).
The final product was the culminating point…(School B, SR Sandra, 2nd
June 2003).

But students also acknowledged the importance of the developmental

studies, work journals and annotations – the opportunity to include

evidence of process.

I am not very happy with the final product; because just seeing it
people will not understand what capoeira is [a Brazilian war-dance]
but, if they look at the complete portfolio it will be easier; I can tell
them what capoeira is …the final product is just a small part of my
work (School B, SR Susana, 2nd June 2003).

In my case; it is hard for people to understand what I want to say in the


final product; the final product is not enough to explain or show what I
can do in the arts; but with the process I can explain this to people
(School B, SR Inês A., 9 June 2003).

I think that for some students it was more than just a school work to be
assessed, it is about them … I think the portfolio is a whole, combining
process and products; a person has to see it all, to read everything and
not just see the final products, to fully understand our achievement
(School V, SR Carla, 17 June 2003).

There was a sense of great achievement to be derived from the portfolio. Students

felt motivated to learn.

With this last portfolio I could develop a product design project. My


project was to make a bicycle for disabled people; people with one
arm. I enjoyed the investigative studies. I enjoyed thinking about the
small problems, to see how to solve them in the sketches. But what
really satisfied me was the final product. I could make just a small
maquette; I will not have a better mark because I made the real
bicycle and tested the bicycle performance just like a real product

designer. But it was my personal pride. I needed to go until the end, to
see that I was capable of doing it (School V, SR Tiago, 17 June 2003).

7.3.8. Criteria and weightings

The data from the on-line meetings and questionnaire responses made it clear that teachers considered the criteria clear and well balanced, and that the criteria provided them with the necessary common ground to assess students’ work consistently.

According to teacher AP:

For us the portfolio experience was really good, because it made things
easier. I mean all the decisions about assessment and marking; the
criteria were clear and easy to apply to each portfolio whatever media
or techniques the students were using (School V, TR AP, 25 June
2003).

For the students the criteria were also appropriate, as can be seen from their

responses to the questionnaire:

Question  Questionnaire results: % of agreement with:  Students (117)
7  The weightings attributed to the components are appropriate for assessing the disciplines of art and design.  78.6%
28  The criteria for marking are clear to all candidates.  85.5%
29  Students are aware of what qualities their art and design teacher and external assessors are looking for in their portfolios.  77.8%
30  The criteria used to assess the portfolio are fair and recognise students’ effort and achievement.  86.3%
Table 44: Responses to questions about criteria.

And as expressed during student group interviews:

The criteria were fine; everything which is important was there (SR,
Manuela, 17 June 2003).

It was balanced, including the process and the products (SR, Joaquim,
17 June 2003).

It was good to know the criteria; before we knew more or less what the
teacher was looking for in our works; but it was quite vague. In the

portfolio everything was written down; we knew what to do and how
the work would be marked (SR, Inês, 9 June 2003).

I think the criteria are fine; it helps us to be responsible; to acquire


responsibility because we had to think by ourselves; our thinking
process is assessed (SR, Ruben, 9 June 2003).

Similarly all teachers and a great majority of students, in the questionnaire, agreed

with the specific criteria and respective weightings (questions 31-45) as follows:

                                      Criterion 1  Criterion 2  Criterion 3  Criterion 4  Criterion 5
Criteria: % of students’ agreement        97.4         98.3         97.4         98.3         97.4
Weightings: % of students’ agreement      92.2         95.7         90.6         95.7         94.9
Table 45: Responses to questions about weightings.

The responses to questions 46 and 48 showed that for all teachers and 84.5 percent of students all five criteria were equally important, and that for all the teachers and 84.5 percent of the students no other criteria were needed. The students who disagreed with the weightings for the assessment criteria made suggestions about a fair weighting. However their opinions varied, revealing personal preferences for certain kinds of skills and tasks. In the responses to question 48 of the questionnaire some students suggested other criteria such as: ‘the student is able to plan the work according to the time available’; ‘creativity’; ‘student commitment to work’; ‘imagination’; ‘the student is able to justify her work’; ‘technical competence’. Such suggestions were similar to the original criteria, but the students did not appear to recognise them in the assessment matrix, perhaps because they were integral to broader criteria.

7.3.9. Bias

Responses to questions 11-13 revealed that none of the teachers and very few students found the instrument biased; however, a considerable number of students (28.7 percent) believed that students from small cities or rural areas were disadvantaged, and in fact access to resources like good art libraries may be problematic in such areas.

Question  Questionnaire results: % of agreement with:  Students (117)
11  I think that the examination is free from gender bias (i.e. equally fair to all students, boys and girls).  97.4%
12  I think that the examination is free from bias related to where students live (i.e. equally fair to students from big cities and students from small cities or rural areas).  71.3%
13  I think that the examination is free from bias related to racial/ethnic/religious background of the students.  94%
14  I think that the examination is free from bias related to social class.  88%
Table 46: Responses to questions about bias.

Although the majority of respondents did not detect gender bias in the examination, some tasks involving written comments and self-reflection notes favoured female students more than male students. The researcher observed that the portfolios of male students included fewer notes and often exclusively visual work journals with brief annotations. In contrast, girls included long prose passages and both visual and written work in their journals.

According to some questionnaire responses, students from small cities were disadvantaged in terms of access to libraries and museums, and students from low-income families could have problems in acquiring expensive materials, as seen in the cases of Jocelyne and Joana at School K:

I have no money to buy another film and photographic paper and the
last photographs are not good enough; I have to present it as it is (22
May, Jocelyne’s work journal).

I spent all my money with the coloured photocopies… I would like to
go to the exhibition at Culturgest but the ticket is too expensive (12
May, Joana’s work journal).

This problem was acute at School K where teachers were aware of their students’

constraints in terms of materials. It was agreed from the first standardisation meeting

that the types of materials used by students should not influence teachers and

moderators and, in fact, some students with very high marks used recycled materials.

But the problem of bias related to students’ financial opportunities was not fully overcome because certain students could not achieve what they wanted and had to adjust their choice of materials to their budgets.

The new portfolio assessment instrument facilitated collaborative work between students and teachers in an ‘apprentice’-type relationship, as observed with AM and

her students. R favoured tutorials as a teaching method and almost all the teachers

used negotiation of objectives and progressive development of the autonomy of

student learning in their methods. The diversity of methods was possible because of

the flexibility of the instrument, commitment of teachers and motivation of students.

However a serious bias was detected related to the degree of aid and advice students

obtained from their teachers. As teacher R warned:

I think that if you are carrying on this experiment with portfolios you
should be careful because it is only successful with a certain kind of
teacher; it demands involved and well-prepared teachers; it requires
good school conditions…(School V, TR R, 25 June 2003).

Collaborative work between teachers and students, negotiation of objectives and progressive autonomy of student learning were perhaps only possible because the majority of teachers who participated in the main trial were well motivated. They worked hard to help their students without over-influencing their intentions and outcomes.

Although the new assessment instrument was considered valid and its tasks authentic, considerable problems of bias were found relating to flexibility of time, available resources and teacher support, as seen in the case of Tiago’s portfolio at School V:

Even if you say that you are not impressed by this type of final work; it
is difficult not be impressed by a student who actually makes a real
bicycle as a prototype instead of a small-scale maquette. And in this case
the student made the bicycle because he had more time, he made the
final product outside school. And this is not fair for the others who can’t
afford extra-time or a teacher who has friends owning a bicycle factory.
So I think, for a question of fairness that students should have the same
resources and time conditions (teacher AP, 25 June 2003).

7.3.10. Assessment procedures


In the questionnaire all the teachers and the majority of students stated that they found the assessment procedures fair:

Question  Questionnaire results: % of agreement with:  Students (117)
49  Students’ portfolios were fairly marked.  94%
50  I think it is important that students’ portfolios are assessed by more than one person for the final assessment, to minimize subjective bias.  93%
Table 47: Responses about assessment procedures.

The teachers expressed positive views about the appropriateness of the materials and the efficiency of training. However, slight problems of consistency were detected, as only seven teachers strongly agreed with question 75 and six with question 76. That the others opted for ‘agree’ may mean they were not entirely satisfied with the assessment matrix and the degree of consistency in standards. Comparing these answers with the multi-faceted analysis of the trial marking (p. 285), it is interesting to note that four teachers were slightly inconsistent in marking, which could be a consequence of less training or marking experience with the new assessment matrix.

Question  Questionnaire results: number of agreements with (Teachers, 10):  Strongly Agree / Agree
67  It was useful to discuss the interpretation of criteria with other teachers  9 / 1
68  During standardisation meetings we reached consensus about interpretation of criteria  6 / 4
69  It was useful to have visual exemplars to understand the criteria and standards  8 / 2
70  The instructions for assessment were clearly stated  9 / 1
71  I had adequate training to assess students’ portfolios  10
72  The assessment procedure combining holistic and criterion referenced marking is appropriate to judge art and design works  9 / 1
73  The global descriptors include an indication of the nature of knowledge, understanding and skills required in each mark range  8 / 2
74  The assessment matrix includes general instructions on marking  8 / 2
75  The assessment matrix is designed to be easily and consistently applied  7 / 3
76  The standards were consistently applied  6 / 4
Table 48: Responses about consistency of marking.

7.3.11. Consequences/Impact

7.3.11.1. Results

For 74.3 percent of the students, portfolio marks were close to those that students

obtained during internal assessment (question 53). At School V the marks were

similar, but in the other schools teachers said that the marks for the portfolios were lower, probably as a consequence of the breadth of the instrument, which was ‘more demanding in terms of knowledge and skills than the usual coursework exercises’ (M, on-line meeting 25 May 2003). However it may be misleading to

compare the portfolio marks with coursework marks because the criteria and

standards normally used in internal assessment may not be the same as those used in

the new examination. This was confirmed by teacher J who stated in the interview

that he used to mark student work leniently in internal assessment in order to help

them obtain good marks for university entrance purposes.

7.3.11.2. Effects upon students

The portfolio was a way to improve our knowledge and at the same time
it was preparing us for the future (SR Mafalda, questionnaire, June
2003).

The new assessment instrument increased students’ workload, but the majority of the

students were motivated and did not see this effect as negative. However students

firmly stated that with the portfolio requirement they needed to be more committed

and pay more attention to the subject. The questionnaire revealed that all the teachers

and a great majority of students saw the new instrument as a good learning tool and a

sound foundation for university. Ninety-four percent of students agreed with the

statement ‘putting together the art and design portfolio has been a worthwhile

learning experience’ (question 51) and 95.7 percent agreed with ‘Students’ art and

design portfolios are a good foundation for further study in art and design’ (question

52). These responses were confirmed during the interviews:

If someone wants to go further and choose a creative job, for instance as


a designer, the portfolio is a good training (SR Sara, 2nd June 2003)
The portfolio was a good preparation for university because we learned
about the development of methods of work (SR Zé and Marta; 9 June
2003)

7.3.11.3. Effects upon teachers

In common with the English examination system, it was evident that teachers felt overloaded by the portfolio experience. However, in the context of this study, since all the teachers were volunteers, they did not see this as negative. On the contrary, teachers appreciated the opportunity to learn, to improve their relationships with their students and to improve their assessment practices.

But, you know the portfolio is a real challenge for us, as teachers. We
had so different student outcomes…. It was amazing and I found it
difficult to organise it alone; it demanded extra-work; which I don’t
mind because I am used to it. You can’t imagine the quantity of phone
calls and emails I received per day from students. I was as much
involved in the projects as the students; so we lived it together
intensively…(TR R, 25 June 2003)

The experience helped me to understand my students better. Through the


dialogue established during the portfolio, I was able to understand them
as individuals, looking to their particularities and respecting their
differences (TR AP, questionnaire, June 2003)

The experience confirmed my own pedagogical practices in Studio Art


and provided me new tools for art and design assessment (TR AM,
questionnaire, June 2003)

Now; I think that I can assess students with more consistency and I can
give more valuable assessment feedback to students (TR C,
questionnaire, June 2003).

After the trial at least nine teachers appeared convinced about the positive qualities

of portfolio assessment. According to their responses to questions 77-92 in the

questionnaire, their perceptions of the necessary knowledge, understanding and skills

in art education had changed. They were convinced of the advantages of portfolios as

a learning strategy and as an instrument for summative assessment.

Questionnaire results: number of agreements with (Str. agree / Agree / Disagree / Str. disagree):
77. The examination influenced my teaching practice by using portfolios as a learning strategy: 8 / 2
78. The examination influenced my teaching practice by using portfolios for summative assessment: 9 / 1
79. The examination influenced my perception of required knowledge, understanding and skills in art and design education: 5 / 4 / 1
80. The examination changed my views about the importance of process in art and design assessment: 4 / 3 / 3
81. The examination changed my views about the importance of product in art and design assessment: 1 / 6 / 3
82. The examination changed my views about the importance of investigative studies in art and design assessment: 4 / 3 / 3
83. The examination changed my views about the importance of self-evaluation in art and design assessment: 2 / 5 / 3
84. The examination changed my views about the importance of developmental studies in art and design assessment: 3 / 3 / 4
85. The examination changed my views about the importance of technical skills in art and design assessment: 2 / 7 / 1
86. The examination changed my views about the importance of thinking skills in art and design assessment: 2 / 4 / 4
87. The examination changed my views about the importance of problem finding skills in art and design assessment: 1 / 4 / 5
88. The examination changed my views about the importance of problem solving skills in art and design assessment: 2 / 8
89. The examination changed my views about the importance of independent thinking in art and design assessment: 3 / 5 / 2
90. The examination changed my views about the importance of creativity in art and design assessment: 1 / 2 / 7
91. The exam did not influence my teaching practice: 1 / 1 / 5
92. The exam influenced positively my teaching practice: 7 / 2
Table 49: Responses about impact.

However some teachers disagreed with the propositions in questions 84, 85, 86, 87 and 90. They did not consider developmental studies, technical skills, thinking skills, problem-finding skills and creativity as new, perhaps because they already fostered these qualities in their previous teaching and assessment practices.

7.3.11.4. Effects upon schools and curriculum

Teachers saw the portfolio as a valid instrument for learning and assessment with

strong consequences for the educational experience. As stated by J the new

assessment instrument could be used as an agent for curriculum reform:

However, we live in a period of social and technological change and our


students should have access to a different kind of learning, fostering the
generation of personal ideas and the portfolio as an assessment instrument
is a good strategy to introduce new pedagogical practices (J’s report, 6
June 2003).

The teachers agreed that the new assessment instrument could function as an agent for the reform of established educational orthodoxy, and could promote diversity and plurality of approach, because it was a form of assessment that was integrated with the learning process and focused on students as active subjects rather than objects for the passive reception of information:

That’s why I think this is not just about assessment; portfolio is more
than an assessment instrument; it is a way of teaching and learning (TR
R, 25 June 2003).

The teachers who participated in the trial had an opportunity to review and reflect on

their concepts of the art and design curriculum and to re-think their practices in the

light of the proposed assessment framework. It is possible that some of them will

continue to use this model for designing portfolios and become agents of change in

the context of their own schools (on-line meeting, 25 June 2003).

7.3.12. Reliability of results

In order to check the inter- and intra-rater reliability of assessors within the new assessment procedures during the main trial, the teachers marked ten portfolios on two different occasions (April 2003 and July 2003). The difference between teachers was from one to three points (inter-rater reliability). The marking difference for the same teacher (intra-rater reliability, i.e. performance at different times) varied from zero to two points.
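A minimal sketch of how such inter- and intra-rater differences can be computed follows, assuming hypothetical marks on the 0-20 scale; the data values, rater labels and data structure are illustrative assumptions only.

```python
# Hypothetical check of inter- and intra-rater mark differences.
# marks[rater][occasion] lists the marks one rater gave the same portfolios.
import itertools

marks = {
    "T1": {"april": [12, 15, 9, 18, 11], "july": [13, 15, 9, 17, 11]},
    "T2": {"april": [11, 14, 10, 17, 12], "july": [11, 15, 10, 17, 12]},
}

# Intra-rater reliability: differences between a rater's two occasions.
for rater, occasions in marks.items():
    diffs = [abs(a - b) for a, b in zip(occasions["april"], occasions["july"])]
    print(f"{rater} intra-rater differences: {min(diffs)}-{max(diffs)} points")

# Inter-rater reliability: differences between raters on the same occasion.
for r1, r2 in itertools.combinations(marks, 2):
    diffs = [abs(a - b)
             for a, b in zip(marks[r1]["april"], marks[r2]["april"])]
    print(f"{r1} vs {r2} inter-rater differences: "
          f"{min(diffs)}-{max(diffs)} points")
```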

The high consistency of the results was probably a consequence of: (1) the ease of

application of the assessment matrix; (2) the use of a common shared language in the

criteria statements; (3) the training of teachers and moderators at the standardisation

meetings.

7.4. Comparing the current MTEP examination and the new model

7.4.1. Validities

The existing Portuguese examination in the arts did not present many problems of bias (see Chapter 4): the time and materials prescribed were strictly controlled and uniformly applied; the tasks did not require materials or equipment other than pencils and paper; the tasks were short and did not require long periods of development and production; and the questions did not require long, and possibly expensive, investigative studies. In contrast the new external assessment instrument showed some bias problems. However, it was considered by the users to be more valid and authentic in terms of content, face and response validities, qualities that were not evident in the existing examination.

7.4.2. Reliability

In the mock MTEP examination the ten raters had no training for marking; they

were asked to mark the scripts in the same way they usually did during the national

examinations – in other words they only had the official instructions used for

marking in the current examination. They marked students’ completed scripts as they

usually did, at home during the period January-April 2003 and they sent the marked

scripts to the researcher (for details see Chapter 4, p.128).

During the fourth pilot training session (11 January 2003) the seven teachers who

attended it were asked to mark thirteen portfolios; they had received training during

the previous sessions in order to come to a common interpretation of the criteria.

During the main trial, the researcher collected twelve portfolios, and the teachers

were asked to mark them on two different occasions. The new assessment procedures

improved the reliability of marking in terms of inter- and intra-rater reliability, as

seen below:

                                        Current Exam    New External Assessment
                                                        Instrument and Procedures
Differences between markers             5-7 points      1-3 points
(inter-rater)
Differences between the same marker     0-6 points      0-2 points
(intra-rater)

Table 50: Comparing inter- and intra-rater reliability between the current
examination and the new assessment instrument and procedures

A Rasch-based rating scale analysis computer programme called FACETS (Linacre,

1999) was used to analyse the data obtained from marking current examination

scripts, the pilot (Trial 1) and main trial (Trial 2) portfolios. FACETS is a

generalisation of the Rasch (1980) family of measurement models that makes

possible the analysis of examinations that have multiple potential sources of

measurement error (such as assessment items, raters, and rating scales). In the many-

facet Rasch model (Linacre, 1989), each element of each facet of the assessment

(such as the candidate, the marker, item, scale, etc) is represented by one parameter

that represents proficiency (for students), severity (for markers), difficulty (for items)

or challenge (for rating scale categories). For each element of each facet in the

analysis, the computer programme provides a measure (a logit estimate of the

calibration), a standard error (information about the precision of that logit estimate),

and fit statistics (information about how well the data fit the expectations of the

measurement model).
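For reference, a standard formulation of this model (after Linacre, 1989) expresses the log-odds that candidate n is awarded category k rather than category k-1 by marker j on item i as:

\log \left( \frac{P_{nijk}}{P_{nij(k-1)}} \right) = B_n - D_i - C_j - F_k

where B_n is the candidate's proficiency, D_i the item's difficulty, C_j the marker's severity and F_k the challenge of rating scale step k, all calibrated on a common logit scale.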

The specific questions explored with the FACETS output were related to the

consistency of markers: intra- and inter-rater reliability in the three different

situations, in order to compare the reliability of the old examination and of the new

assessment.

7.4.2.1. Summary of the FACETS

Tables 51, 52 and 53 display all facets of the analysis. The markers, candidates and

rating scales are calibrated so that all facets are positioned on the same scale, creating

a single frame of reference for interpreting the results from the analysis. The scale is

in log-odds units, or logits, which constitute an equal-interval scale with respect to

appropriately transformed probabilities of responding in particular rating scale

categories. Column 4 displays the 20-point rating scale used to score

students' work. The bottom rows provide the mean and standard deviation of the

distribution of estimates for candidates and markers. The first column in the tables

displays the measure scale in logits. In the current examination (table 51) the range

of the scale actually used was very narrow, while with the new assessment (Table 52

for Trial 1 and Table 53 for Trial 2) the scale of measurement was considerably enlarged

(column 1). The measurement scale is used to report estimates of probabilities of

candidates’ responses under the various conditions of measurement (ability, rater

severity) that have been entered in the analysis. The difference between the current

examination and the new assessment measurement scales might show that within the

current examination there were fewer opportunities for differentiating performance;

and that the new assessment provided more reliable information than the current

examination. The second column displays estimates of candidates’ proficiency.

Higher scoring students appear at the top of the column, while lower scoring students

appear towards the bottom of the column (Candidate ability measure). The third

column compares the markers in terms of the level of severity or leniency. Because

more than one rater marked each script (six scripts in the old exam) or portfolio (13

portfolios in the pilot and 12 portfolios in the trial) markers’ tendencies to rate

students' work higher or lower on average could be estimated. More severe

markers appear higher in the column, while more lenient raters appear lower. The

markers showed great variations in marking in the current Portuguese examination.

In the pilot and in the trial there was less variation of severity between the markers.

This could be because during the pilot and trial the markers were able to achieve

consensus about the interpretation of criteria; and also because the criteria were more

helpful for marking the new instrument than those for the current examination.

Table 51: Measurable data summary: Current examination
Table 52: Measurable data summary: Trial 1
Table 53: Measurable data summary: Trial 2
[FACETS vertical rulers for the three analyses, each showing the measure scale in
logits (column 1), candidate proficiency estimates (column 2), rater severity
estimates (column 3) and the 20-point rating scale (column 4).]

7.4.2.2. Reliability index


Multi-faceted analysis executed through the FACETS program provides, in addition

to the summary map, detailed information about the extent to which the assessment

instrument defines different levels of ability – its capacity to distinguish between

candidates. This is provided in a summary of the information on candidates in the

form of a reliability index. The index ranges from zero to one, with values close to one

suggesting good reliability. This reliability does not indicate the degree of agreement

between raters; it is more akin to the reliability indices associated with assessment

instruments or tests (e.g. Cronbach's alpha). The analysis revealed that the current

Portuguese examination had a lower degree of reliability in terms of internal

consistency than the new assessment instrument: 0.7 for the current examination,

compared with 0.97 for Trial 1 and 0.99 for Trial 2.
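For reference, the reliability index reported by FACETS is the Rasch 'separation reliability', which relates the spread of the element measures, adjusted for measurement error, to that error:

R = \frac{SD_{adj}^2}{SD_{adj}^2 + RMSE^2}

where SD_{adj} is the error-adjusted standard deviation of the measures and RMSE the root mean-square standard error; like Cronbach's alpha, R approaches 1 as true differences between elements dominate measurement error.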

7.4.2.3. Raters measurement reports

FACETS produces two indices of the consistency of agreement across markers. The

indices are reported as fit statistics: weighted and unweighted, standardised and

non-standardised. In this analysis the infit mean-square was used to estimate markers'

reliability. The expectation for this index is 1; the range is 0 to infinity. The higher

the infit mean-square index, the more variability can be expected. When markers are

fairly similar in the degree of severity they exercise, an infit mean-square index less

than 1 indicates little variation in the pattern of scoring, while an infit mean-square

index greater than 1 indicates more than typical variation in the ratings. According to

Myford and Wolfe (2000) there are no hard-and-fast rules for setting upper and lower

control limits for the infit mean square index. An acceptable interval for infit mean

square can be between 0.5 and 1.5. Below 0.5 the raters show less than acceptable

variation in scoring, while those above 1.5 show inconsistency or

excessive variation in marking.
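These control limits can be applied mechanically; a minimal sketch in Python, using the infit mean squares reported for the current examination in Table 54 below (rater 1 is omitted because its values are garbled in the output):

# Classify raters against the 0.5-1.5 infit mean-square control limits
# suggested by Myford and Wolfe (2000).
def classify(infit_msq):
    if infit_msq < 0.5:
        return "less than acceptable variation in scoring"
    if infit_msq > 1.5:
        return "inconsistent or excessive variation in marking"
    return "acceptable"

# Infit mean squares for raters 2-10 in the current examination (Table 54).
infit_msq = {2: 0.9, 3: 0.6, 4: 0.9, 5: 0.2, 6: 0.3, 7: 1.0, 8: 1.4, 9: 0.0, 10: 3.2}
for rater, msq in sorted(infit_msq.items()):
    print(rater, msq, classify(msq))  # flags raters 5, 6 and 9 (low) and 10 (high)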

The fixed (all same) chi-square is a test of the ‘fixed effect’ hypothesis:

‘Can this set of elements be regarded as sharing the same measure after allowing for

measurement error?’ The chi-square value and degrees of freedom (d.f.) are shown.

The significance is the probability that this ‘fixed’ hypothesis is the case. This tests

the hypothesis: ‘Can these raters be thought of as equally severe?’ The final line

contains the Random (normal) chi-square test. This is a test of the ‘random effects’

hypothesis: ‘Can this set of elements be regarded as a random sample from a normal

distribution?’ The significance is the probability that this ‘random’ hypothesis is the

case. This tests the hypothesis: ‘Can these persons be thought of as sampled at

random from a normally distributed population?’ (O’Sullivan, 2002, p. 17).
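For reference, the fixed-effect test takes the usual form of a homogeneity statistic over the rater severity measures (a sketch of the standard form, not necessarily FACETS's exact internal computation):

\chi^2 = \sum_{j=1}^{J} \left( \frac{d_j - \bar{d}}{SE_j} \right)^2, \qquad d.f. = J - 1

where d_j is rater j's severity measure, SE_j its standard error, \bar{d} the (precision-weighted) mean severity and J the number of raters; applied to the measures printed in Table 54 below this approximately reproduces the reported value of 22.5 with 9 degrees of freedom.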

Looking at the infit mean squares in Table 54 (current Portuguese examination)

revealed serious problems with four raters. Raters 9, 6 and 5 tended to give the

same scores to every script with little or no deviation, especially rater 9. Rater 10

was also very problematic, showing total inconsistency of marking (3.2).

In Trial 1 (Table 55) one marker presented an infit mean square of 0.3 (insufficient

variation) and two markers 1.6 (slightly inconsistent).

In Trial 2 (Table 56) four markers showed insufficient variation (infit mean squares of

0.2, 0.1 and 0.3), but these four markers did not participate in the pilot, so it was

reasonable to expect less consistency because they had less training than the others.

Table 54: Current examination raters measurement report


| Obsvd Obsvd Obsvd   Fair-M | Model        | Infit     Outfit    |
| Score Count Average Avrage | Measure S.E. | MnSq ZStd MnSq ZStd | Nu Raters
|  74     6   1s 0 .7 0      |              |                     |  1  1
|  74     6    12.3   12.32  |  -.19   .22  |  .9   0    .9   0   |  2  2
|  75     6    12.5   12.48  |  -.24   .22  |  .6   0    .6   0   |  3  3
|  54     6     9.0    8.95  |   .71   .21  |  .9   0    .8   0   |  4  4
|  72     6    12.0   12.01  |  -.09   .22  |  .2  -1    .3  -1   |  5  5
|  70     6    11.7   11.70  |   .00   .22  |  .3  -1    .4  -1   |  6  6
|  77     6    12.8   12.81  |  -.34   .23  | 1.0   0   1.0   0   |  7  7
|  80     6    13.3   13.32  |  -.49   .23  | 1.4   0   1.4   0   |  8  8
|  68     6    11.3   11.40  |   .10   .22  |  .0  -3    .0  -3   |  9  9
|  64     6    10.7   10.76  |   .28   .21  | 3.2   2   3.3   2   | 10 10
| 70.8   6.0   11.8   11.81  |  -.05   .22  |  .9  -.5   .9  -.5  | Mean (Count: 10)
|  7.1    .0    1.2    1.17  |   .33   .01  |  .8  1.5   .9  1.4  | S.D.
RMSE (Model) .22  Adj S.D. .24  Separation 1.09  Reliability .54
Fixed (all same) chi-square: 22.5  d.f.: 9  significance: .01
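As a check on the summary line above, the Separation and Reliability figures follow from the printed RMSE and adjusted S.D. by the standard Rasch definitions (equivalent to the reliability formula noted in section 7.4.2.2):

G = \frac{Adj.\,S.D.}{RMSE} = \frac{.24}{.22} \approx 1.09, \qquad R = \frac{G^2}{1 + G^2} \approx .54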

Table 55: Trial 1 raters measurement report


| Obsvd Obsvd Obsvd   Fair-M | Model        | Infit     Outfit    |
| Score Count Average Avrage | Measure S.E. | MnSq ZStd MnSq ZStd | N Raters
|  178   13    13.7   13.87  |   .33   .31  |  .7   0    .7   0   | 1 1
|  176   13    13.5   13.73  |   .51   .31  | 1.6   1   1.7   1   | 2 2
|  172   13    13.2   13.37  |   .91   .32  |  .3  -2    .3  -2   | 3 3
|  175   13    13.5   13.65  |   .61   .31  | 1.1   0   1.0   0   | 4 4
|  168   13    12.9   12.92  |  1.33   .33  |  .5  -1    .5  -1   | 5 5
|  179   13    13.8   13.94  |   .23   .30  | 1.0   0    .9   0   | 6 6
|  173   13    13.3   13.47  |   .81   .32  |  .7   0   1.1   0   | 7 7
| 174.4 13.0   13.4   13.56  |   .68   .32  |  .8  -.5   .9  -.4  | Mean (Count: 7)
|  3.5    .0     .3     .32  |   .35   .01  |  .4  1.1   .4  1.0  | S.D.
RMSE (Model) .32  Adj S.D. .15  Separation .48  Reliability .19
Fixed (all same) chi-square: 8.4  d.f.: 6  significance: .21

Table 56: Trial 2 raters measurement report


| Obsvd Obsvd Obsvd   Fair-M | Model        | Infit     Outfit    |
| Score Count Average Avrage | Measure S.E. | MnSq ZStd MnSq ZStd | N Raters
|  143   12    11.9   12.03  |  -.08   .33  |  .9   0   1.0   0   | 1 1
|  148   12    12.3   12.64  |  -.60   .33  |  .2  -2    .2  -2   | 2 2
|  137   12    11.4   11.21  |   .54   .32  |  .1  -3    .1  -3   | 3 3
|  153   12    12.8   13.19  | -1.13   .33  |  .7   0    .6  -1   | 4 4
|  140   12    11.7   11.62  |   .23   .32  |  .1  -3    .1  -3   | 5 5
|  138   12    11.5   11.34  |   .44   .32  |  .5  -1    .6  -1   | 6 6
|  144   12    12.0   12.16  |  -.18   .33  |  .5  -1    .4  -1   | 7 7
|  147   12    12.3   12.53  |  -.50   .33  |  .3  -2    .3  -2   | 8 8
|  138   12    11.5   11.34  |   .44   .32  |  .8   0    .8   0   | 9 9
| 143.1 12.0   11.9   12.01  |  -.09   .32  |  .5 -1.9   .4 -2.0  | Mean (Count: 9)
|  5.1    .0     .4     .65  |   .54   .00  |  .3  1.3   .3  1.3  | S.D.
RMSE (Model) .32  Adj S.D. .43  Separation 1.32  Reliability .63
Fixed (all same) chi-square: 24.6  d.f.: 8  significance: .00

It is important to note that three of the Trial 2 teachers (R, C and AP) also

participated in Trial 1 (pilot), so they had more experience with the new marking

procedures than the others who had just attended the standardisation meeting and this

was reflected in these Trial 2 teachers' highly consistent marking (see raters 4, 6

and 7 in Table 56 for their infit mean-square values). Failure to use the full

scale for marking can be observed with the new teachers involved in the trial

who had not attended all the on-line meetings (see raters 2, 3, 5 and 8 in Table 56

for their infit mean-square values). It is possible that these raters were not confident in

applying the grade descriptors because of their lack of training with visual exemplars

and participation in group discussions about interpretation of criteria. Therefore it

seems apparent that teachers who experienced previous assessment training

improved their marker reliability, while a lack of previous training reduced

it.

7.4.2.4. Probability curves

FACETS also provides probability curve graphics. The curves present the probability

of occurrence for each category. The probability of the extreme categories always

approaches 1.0 for corresponding extreme measures. Most scale developers intend

this to look like a series of hills (O’Sullivan, 2002, p. 21). From the graphics in tables

57, 58 and 59 it is evident that the current Portuguese examination probability curve

is not at all as might be expected. It appears that in the current examination (table 57)

between 7 and 16 on the scale there is never a clear probability that any particular

score will be achieved suggesting very serious problems in the scale. The middle of

the scale appears to be unable to distinguish between different levels of ability and

this is problematic as this is where critical decisions often have to be made.

In Trial 1 (Table 58) the range of performance only extends from 10-18, and there

were problems with the scale points 13 and 16. At these points on the scale there is

never a clear possibility that this score will be awarded, i.e. this ‘hill’ is not

distinguishable from the other ‘hills’ and this could be caused by lack of clarity in the

draft assessment matrix and grade descriptors. However the revision of such tools

seemed to help because in Trial 2 (Table 59) the probability curve was closer to the

expected probability curves, though there are still some residual problems with the

scale, for example at points 9 and 11.

In conclusion it is evident that still more work needs to be done to improve the new

examination instructions at these points on the scale, but the trial scale is clearly

superior to that used in the current Portuguese examination.

[FACETS category probability curves: probability of each rating scale category
plotted against the measure, -4.0 to +4.0 logits.]

Table 57: Current examination probability curves

[FACETS category probability curves: probability of each rating scale category
plotted against the measure, -6.0 to +6.0 logits.]

Table 58: Trial 1 probability curves

[FACETS category probability curves: probability of each rating scale category
plotted against the measure, -12.0 to +12.0 logits.]

Table 59: Trial 2 probability curves

7.4.2.5. Expected ogives

The expected score ogive (a graph in which expected scores should rise in similar steps)

similar steps) shows the average rating value expected for any measure relative to the

item, judge etc. The ogive also indicates the category ‘zone’ or areas between

average ratings (O’Sullivan, 2002, p. 21). Comparing the expected ogives from the

current Portuguese examination (table 60), trial 1 (table 61) and trial 2 (table 62) it

is evident that the new assessment showed much more consistency between points on

the scale, because the 'zones' between average ratings are more equally distributed.
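For reference, the expected score plotted by the ogive is the probability-weighted mean of the rating categories at each point \theta on the measure scale:

E[X \mid \theta] = \sum_{k=0}^{m} k \, P_k(\theta)

where P_k(\theta) are the category probabilities shown in Tables 57-59 and m is the top category; roughly equal 'zones' between successive expected scores indicate a well-functioning scale.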

In the current Portuguese examination only a small portion of the scale is used; it is

impossible to distinguish the intended steps of the scale, and at the top and bottom

the scores overlap. There is almost no increase in score for large areas of

ability between 2 and 7. Between 7 and 15 the scores increase appropriately with ability.

But between 16 and the top of the scale there is almost no improvement in score for a

similar increase in ability. It appears that the examination was unable to

distinguish between levels of student ability or achievement and raters were

unable to differentiate between those levels. In Trial 1 (Table 61), and especially

in Trial 2 (Table 62), the increase in score is more in line with the increase in ability

or achievement.
[Expected score ogive: expected score (6-17) plotted against the measure,
-4.0 to +4.0 logits.]

Table 60: Current examination expected score ogive

[Expected score ogive: expected score (10-18) plotted against the measure,
-6.0 to +6.0 logits.]
Table 61: Trial 1 expected score ogive

[Expected score ogive: expected score (4-20) plotted against the measure,
-12.0 to +12.0 logits.]

Table 62: Trial 2 expected score ogive

7.4.3. Consequences/impact

According to the questionnaire responses about current art examinations in Portugal

(see Chapter 4) the current examination narrowed the range of the discipline and

distorted curriculum practice; the new assessment instrument enlarged the range and

promoted a broad vision of art and design education which might have some positive

impact upon the curriculum. The current examination did not require specific in-

service training for teachers; the new assessment procedures facilitated useful

professional training and dialogue between teachers. And, finally, the new

assessment instrument based on portfolio work was seen as a good foundation for

further studies in higher education, which was not the case with the current

examination.

7.4.4. Practicalities

The current Portuguese examination might be considered more practical in terms of

the limited resources, time, equipment and environment required. The new

assessment instrument and procedures require extra work from students and teachers,

improved work environment and possibly more expensive resources. The cost of

achieving a degree of authenticity of tasks is high but not prohibitive if the

examination is conducted in normal classroom situations. The price to pay for the

enhanced reliability of marking results is also higher because teachers and

moderators must be properly trained to conduct the external assessment.

Summary
Through its implementation in five different schools the new assessment instrument

was evaluated. From the data collected for evaluation purposes the following

conclusions about its strengths and weaknesses were drawn:

Main advantages:
1. The new assessment instrument was more closely related to the nature of art

and design, providing 'authentic' tasks. It fostered in students the development of

independent thinking, critical skills and decision-making.

2. The necessary evidence required to reveal relevant knowledge, understanding

and skills in art and design in the portfolio increased validity. The assessment

instrument was considered to be compatible with art and design practice,

related to the contents of the discipline and to be capable of allowing for

students’ own interests.

3. The new assessment instrument included tasks that enabled students to realise

their intentions and motivated them.

4. It facilitated assessment for learning by providing useful feedback for

teachers and students. Students experienced a great sense of achievement

from the portfolio and they felt motivated to learn. In addition the new

instrument was seen as a good learning experience and a good foundation for

preparation for art and design at university.

5. It fostered effective dialogue between students and teachers. The new

portfolio assessment instrument also enabled collaborative work between

students and teachers in an ‘apprentice-like’ relationship. Collaboration was

one key aspect of the new assessment instrument that facilitated constant

dialogue between teachers and students, and between students especially in

group discussions, or ‘crits’.

6. It was adaptable to different school contexts.

7. The instructions, criteria, assessment matrix and global descriptors were

clear, increasing the transparency of assessment.

8. It offered a system of teacher and assessor training that increased common

interpretation of criteria.

9. It included a system of moderation that enabled a more uniform application

of standards and provided useful feedback for teachers and students.

10. Finally it provided more reliability of results than the existing examination.

Main disadvantages:

1. It requires hard-working and committed teachers. During the pilot and trial

teachers spent long periods of their free time in meetings. Collaborative work

between teachers and students, negotiation of objectives and progressive

autonomy of student learning was only possible because teachers who

participated in the trial and in the pilot on a voluntary basis were particularly

committed.

2. Problems were revealed resulting from unfamiliarity with the new instrument.

The candidates’ previous experience of tasks similar to those tasks required in

the portfolio seemed to be a key factor for the success of the new assessment

instrument. Unfamiliarity with the tasks and the lack of previous training in

critical thinking were the main causes of failure. An important issue in

the relative success of the portfolio was linked to the students’ attitudes and

behaviours. Their past working practices and experiences of learning

independently and collaboratively all contributed to their overall

performance.

3. Potentially serious problems of bias were revealed, for example bias related

to access to resources and teacher support. The bias related to the degree of

help and advice students could obtain from their teachers was particularly

problematic.

4. The proposed new instrument is more expensive to operate than the current

Portuguese examination.

5. Finally the increased flexibility offered by the new assessment instrument

may inhibit its uniform application.

Chapter 8: Conclusions

The four main sections of this chapter: (1) revisit the research questions

identified in the Introduction Chapter (p. 15) in order to summarise the findings

and draw conclusions, (2) identify some unanswered questions which might be

explored more fully in further research, (3) review and critique the research

methodology, and (4) establish the implications of the study for Portugal and

make recommendations.

8.1. Current system of art examinations in Portugal (2000-2003)

The finding that the current system of art and design examinations in Portugal lacked

the validity and reliability deemed necessary elsewhere was not a surprise outcome

(see Chapter 4). Eisner’s views (1979) were confirmed in that the old-fashioned

system of tests to assess art and design provided: ‘a biased, and distorted picture of

the reality that we are attempting to understand’. Portuguese art educators and

examinations were strongly embedded in tradition and regulations designed for the

benefit of administrators rather than students. Portuguese art examinations were in

conflict with many essential art and design concepts and even with the general

ideology underpinning the Portuguese educational system. In the name of

‘objectivity’ art and design students were examined in only easily assessable

activities by means of pencil and paper tests. Consequently art and design students

were selected for higher education on the basis of assessments that did not reflect the

real complexity of art and design learning. The examinations focused on a narrow

area of the curriculum; studio art was eliminated from the external assessment

mainstream because its assessment was thought to be ‘subjective’ and

to foster a problematic diversity of outcomes. So, instead, students were examined

on less important specialisms such as Materials and Techniques of Plastic Expression

(MTEP) by means of a short, timed drawing test. Studio art’s status in the curriculum

was weak although considered by a great majority of art students and teachers to be

the most important specialism for preparing students for further studies in art and

design and associated fields such as architecture.

The Portuguese system of art examinations tended to deny important rationales and

educational objectives for the arts. The contemporary demand for transferable skills

such as those of communication, information retrieval, problem solving, critical

analysis, self-monitoring and self-assessment did not feature in art examinations.

Students were not given an opportunity in the examination to show how well they

knew, understood and could make art (Chapter 4, p.130). Summative assessment and

examinations were used only for gate-keeping purposes or as a punishment/reward

system, not to empower students or as educational experiences designed to display

students' knowledge, as Wiggins described (1993, pp. 54-67).

Debate about new forms of assessment in Portugal was mainly limited to formative

assessment, mostly in science subjects and teacher education (Valadares & Graça,

1998; Boavida & Barreira, 1993; Sá Chaves, 2000). Although they were dissatisfied

with the current system of art examinations, teachers and students generally were not

aware that these traditional forms of external assessment could be replaced.

As Broadfoot (1998, p. 473) argued, tests: '… with their emphasis on scientific

rationality are so pervasive in modern societies that they blind us to the potential for

alternative forms of assessment'. This study found evidence for the need to reform.

The great majority of participants in the Portuguese survey (see Chapter 4) proposed

new external assessment instruments to improve the system, which were linked to the

alternative or more authentic methods of assessments discussed in the international

literature (see Chapter 2). Conservatism was not an issue among the participants and

it was found that there was a strong will to change practices.

A finding of this research was that Portuguese art and design examinations did not

offer a clear set of assessment materials and guidance for teachers and students

(Chapter 4, pp. 128-130). Curiously, while there was an extraordinary number of

regulations for the conduct of examinations, assessment criteria were only vaguely

defined. There were no established attainment targets or standards – these were

tacitly used but not publicly discussed. The ritual of the examinations was more

important than the evidence of achievement they produced.

It could be argued that the system embodied Eisner’s connoisseurship model (1985)

in the conviction that teachers and assessors should be qualified artists and teachers

who know how to interpret the visual terms used in the criteria: art critics and

connoisseurs. This may be an important factor but is not the only necessary

condition. As the literature has established valid and reliable assessment does not

depend on single teachers working in isolation; it depends enormously on dialogue

and sharing values. It is true that it is difficult to define criteria and standards in art

and design (Boughton, 1999) but it is erroneous to expect assessors to work

efficiently and effectively without common agreement about the qualities thought to

be desirable in the work. Not only are standards and criteria difficult to establish,

it is also very expensive to prepare teachers to interpret them accurately and provide

places and time for dialogue. The research found that in-service opportunities for

training in assessment were not on the agenda of the Portuguese art examination

system, and the negative consequence of this was the examinations' poor reliability of

results.

In Portugal the independent judgements of each assessor were trusted by the system

without any moderation procedures to ensure consistency of marking. Teachers and

students were dissatisfied with this situation (Chapter 4, pp. 136-138). The lack of

control of the assessment procedures was another possible cause for the unreliability

of the examination results and eliminated a possible means of providing effective

feedback for teachers and schools.

8.2. Current system of art and design examinations in England (2000-2003)

The current system of art and design examinations in England was found to have

some advantages when compared with the Portuguese system in terms of validity and

reliability. Overall the assessment procedures revealed some degree of increased

reliability, especially as a consequence of the standardisation and moderation

procedures. However an analysis of key stakeholders’ views revealed some

problems. For example, one was the ever-increasing workload for teachers,

while the other was ‘more philosophical and concerned with the effect of the

examination on teaching and learning’ with negative effects on students and the

curriculum (Steers, 2003, p.25). In the GCE examination students were required to

submit every unit from the start of the course fully realised and of the required

standard, potentially exposing them to over-assessment. Students and teachers lacked

time for experimentation and creative risk-taking. According to Steers (2003, p.25)

the examination’s hidden message for teachers and students was: ‘…don’t bother

being creative, avoid risks, play safe, do what is expected’.

The modular system was believed by some to threaten the very nature of art and

design learning which, in theory, should foster a climate of inquiry, risk-taking and

creative opportunities (Hardy, 2002). It was also in conflict with the current

emphasis in English education on the need to develop the creative potential of

students, as expressed for example, in the report of the National Advisory Committee

on Creative and Cultural Education (NACCCE, 1999). In the researcher’s view, at

the Edexcel standardisation meetings the set of exemplar of art and design students’

work displayed was rather ‘safe’ devoid of obvious risk-taking. Besides the

‘exemplars of students’ art works’ were used not for discussion or to negotiate

standards but rather to illustrate the competencies students needed to reveal in order

to meet assessment objectives. These visual examples were important tools for

understanding and interpreting the language of the criteria and assessment matrix,

but there was no ‘debate about what counts as good work in the community’

(Boughton, 1997). Instead it was found that in practice there was little teacher

ownership of assessment. It was observed that students’ voices were also neglected

by the assessment system, which had little emphasis on self-assessment in the

instructions and criteria for the examinations.

At the end of 2003 several signs of dissatisfaction with the English examination

system were evident and debate about reform was ongoing. Tattersall, a former head

of the Assessment and Qualification Alliance (AQA), argued in The Guardian

(September 30, 2003) that teachers should be trusted more to assess their own

students. She wrote:

‘ …the equation of reliable assessment with externally


set and marked examinations is neither helpful nor
based on reality. It devalues the skills which external
assessment cannot accommodate, it places pressure on
students. Most of all it undermines teachers’
confidence and commitment’.

Ronnie Lane in the National Society for Education in Art & Design newsletter:

A’N’D (Nº 10, Winter 2003/4) expressed teachers’ concerns as voiced in the

Liverpool Art and Design Teachers Forum meeting (2003) about the increasing

workload for marking, internal standardisation and preparing moderation exhibitions:

‘The forum was of the opinion that art teachers cannot sustain the levels of workload

which is being demanded by the examination boards’ (Lane, 2003, p. 3). At the time

of this writing the Department for Education and Skills was working on new

proposals for opportunity and excellence in education for ages 14-19 (Tomlinson's

Working Group on 14-19 Reform), building on proposals outlined in the Green Paper

consultation of February 2002 (www.dfes.gov.uk/14–19) concerning extending

opportunities and raising standards. Their extensive consultation confirmed the need

to create a clearer and more appropriate curriculum and qualifications framework for

the 14–19 phase – ‘…one that develops and stretches all our young people to achieve

their full potential, and prepares them for life and work in the 21st century’

(www.dfes.gov.uk/14–19). By the end of 2003 the Working Group on 14-19 Reform

reached interesting interim conclusions and recommendations emphasising the

importance of matching styles of assessment to styles of learning, using a variety of

assessment methods as well as fostering the idea of assessment for learning,

increasing formative assessment and reinforcing the professional judgement

of teachers.

8.3. Impact of the English and Portuguese art and design examinations

From the study of English and Portuguese art examinations it was evident that

assessment of art practice constitutes a particular instance of what Foucault describes

as power-knowledge which invokes processes of surveillance, normalisation,

discipline and regulation of student’s art work. As Atkinson pointed out assessment

logic is reductionist and by implication it omits ‘…those things, those forms of life,

which do not fit’ (Atkinson, 2002, p. 195).

A tension between the curriculum and summative/external assessment was

evident in Portugal. Formative assessment to reinforce learning was emphasised in

curriculum materials but summative assessment and examinations were not viewed

as having learning potential. Therefore it was not surprising to discover that

internal assessment results were inconsistent with external examination results.

This confirmed the hypothesis that external assessment in Portugal could be

interpreted as a simple gate keeping exercise with an extremely negative washback

upon the curriculum. Tests and examinations strongly influenced teaching practice,

teachers taught to the tests and examinations which apparently were intended to

assess students’ memory, understanding of technical drawing conventions, trivial art

historical information and formalist applications of the elements and principles of

design whilst neglecting critical understanding of art making and creativity.

In England, art and design examinations, as well as providing a means for selection

and certification, also seemed to be a way of controlling the curriculum. The QCA

and awarding bodies exercise great power over students, teachers and schools

imposing a centralised view of art education as defined in the National Curriculum.

As previously stated, the modular system of examination in England was found to

have negative effects on the curriculum, teachers and students, not only in terms of

work overload but also because it did not encourage teachers to foster

experimentation and risk-taking during their courses. Therefore the need for

accountability in England also generated negative effects upon schools and teachers.

For example allocation of school resources by the government depended on the

grades that their students obtained in the examinations and the position of

individual schools in national ‘league tables’.

8.4. Increasing the validity of Portuguese art and design external assessment

The great majority of participants in the Portuguese survey reported in Chapter 4

(p. 134) suggested external assessment instruments such as portfolio work that they

believed might improve the system. However assessing portfolios requires changes

to assessment practices and corresponding changes to the curriculum and pedagogy

(Klenowski, 2002). In the context of this research it was impossible to change the

established curriculum for art and design in Portugal, thus it was necessary to find

strategies to increase the transparency of the assessment procedures; to revisit the

current domains of knowledge, understanding and skills in order to provide a clear

and valid framework for defining new criteria, new tasks and the evidence required

for art examinations.

8.5. Increasing the reliability of Portuguese art and design external

assessment results

It was not surprising to discover that the great majority of participants in the

Portuguese survey (p. 139) suggested teacher training and assessment juries as ways

to improve the system. The key reforms identified for increasing the reliability of

results were in-service training, the introduction of standardisation meetings and

moderation procedures.

8.6. The proposed model for art examinations

8.6.1. The underlying model

The proposed framework was designed to accommodate postmodern views of art and

education and was intended to relocate student experience, and contemporary

realities of the visual arts, by encouraging students to ‘…develop a critical awareness

of the visual culture they encounter every day’ (Freedman, 2003, p.11). It was partly

influenced by Swift and Steers' (1999) manifesto for the arts in schools and was

intended to reduce tensions between assessment discourses and the heterogeneity of

practice by moving towards a more inclusive approach to teaching and learning in art

education. It was also designed to enhance the production of visual forms of

signification in order to explore students’ experiences and their interests in specific

social issues.

During the trials of the new assessment (Chapters 6 and 7) it was found that in

certain cases students integrated everyday life experiences and reflected on the social

meanings of art through exploration of issues such as racism, sexual orientation,

otherness, etc. They used a variety of methods of investigation to understand social

issues including interviewing members of the community and independent searches

for source material. But this was not the case for all students. In school A for

example the project briefs for the portfolios did not ask pupils to explore social

issues and the investigation was limited to examples of modernist art. The role of the

teachers and their beliefs significantly influenced students’ choices. In some cases

teachers were not able to go beyond the formalist art concepts they had been taught

and they did not encourage critical analyses of visual culture. Such teachers, for

example M, AP and C, were nervous about letting students investigate themes they

were not comfortable with themselves, and convinced their students to engage with

‘safer’ and more conventional projects. Short standardisation meetings were

insufficient to encourage such teachers to make profound changes in their attitudes.

The on-line meetings, while intended to provide support for teachers and more

space for dialogue, did not completely attain these goals. Using Internet resources for

dialogue about art and design learning and assessment may be more useful in the

future; at the present time it was found that both teachers and their students found it

difficult to use such tools because of their unfamiliarity. Moreover digital

reproductions of students’ studio art works are not appropriate for fully appreciating

and discussing the visual characteristics of portfolios, except perhaps when the

student’s chosen media is digital.

8.6.2. The proposed instrument

After the trials of the proposed instrument (Chapters 6 and 7) it was evident that the

role of the teacher was crucial when students challenged conventions or methods of

work through creative proposals or when they attained unexpected outcomes that

revealed their own intentions, personal experience and awareness of social and visual

culture issues. A, I and R acted as facilitators through the substantive

dialogue that took place between the student and the teacher, using teaching methods

based on collaborative learning and partnership (see Appendices XIX, XX and XXI).

This enabled assessment to be integrated into the teaching and learning cycle.

This confirms Klenowski's (2002, p.115) view that the teacher's role is fundamental

preparing and managing the learning environment. However, one negative

consequence of the importance of the teacher's role in the assessment was the evidence

that the instrument was strongly biased by the degree of aid afforded by the teacher.

The new assessment instrument was found to respect the student's voice and

personal style as Wiggins (1993) and Ross et al (1993) have claimed it should do.

Student performance was very variable according to their school context, although a

wide range of levels of achievement was obtained in each school with the exception

of School P where the portfolio experience verged on complete failure for the

reasons described in Chapter 7, for example the lack of student motivation and

unfamiliarity with the format. In the new assessment instrument students were given

a considerable degree of decision-making: selection of works for inclusion in

portfolios and the reasons for inclusion were negotiated with their teachers.

However, for some students, such a level of autonomy was difficult, perhaps because

they did not clearly understand the aims and assessment criteria. While they were

aware that they were expected to be thoughtful about the selection of work for

inclusion, this presented problems because they had not developed a capacity for

reflection in previous years. From the research it was also apparent that some

students found self-evaluation difficult. In the new instrument students need to take

responsibility for self-directed learning and for developing and maintaining their

portfolios. The student interviews confirmed that this requirement was understood

but nevertheless they were not always able to evaluate their own progress and

achievement. Interactive learning, tutorials, interviews, one-to-one sessions,

discussion groups, peer critique, were all found to be helpful in this regard.

But developing a capacity to select appropriate evidence of attainment of particular

competencies takes time.

According to the teachers involved in the main trial some students succeeded in

presenting portfolios of exceptional quality even in cases where resources were poor

and teacher aid was not forthcoming.

The students’ commitment and motivation were evident and contributed to their

success. For example Joana fought to pursue her own goals despite a lack of the

resources to realise them and the teacher’s disagreement with the chosen theme

'Apocalypses' and sources (see Appendix XX, p.270). According to their teachers,

Susana presented excellent investigative studies despite the lack of available

information about ‘capoeira’ and Ruben presented very good developmental studies

using recycled materials. In the case of these students the opportunity to make a

project relevant to their own lives appeared to give them the strength to work hard

and seek out aid elsewhere. Students felt that with the new assessment instrument

they gained more ownership of the learning and assessment process. The conclusion

was drawn that students need to have decision-making powers about themes for

project briefs and about which work is chosen for inclusion in portfolios if they are to

feel ownership of, and real commitment to, the art making. It increases the content

validity of the assessment instrument and presents more opportunities for student

motivation and independent learning.

Unlike the current examination in Portugal, which limits the range of the curriculum,

the new assessment instrument enlarged the range of the subject and raised the status

of studio art in some students’ perceptions. It required more evidence and more

committed work from them and this was only possible because they came to believe

that studio art had some direct relevance for their lives. Should the proposed

assessment instrument ever be implemented on a national scale, it might enhance

the status of studio art in the curriculum.

The breadth and flexibility of the assessment instrument accommodated a variety of

different visions of art and design processes and making which challenged the old

habit of setting teacher-directed prescriptive tasks. It does, however, require art

teachers who are able to help students develop independent critical and self-reflective

skills. It also demands dialogue between teachers and students, mutual understanding

of aims and intentions and negotiation of tasks and approaches. From the trials it was

concluded that the new assessment instrument provided useful feedback for students

and teachers. Assessment was seen in Wolf ‘s terms as 'an episode of learning'

(Wolf et al, 1991, p.183) providing qualitative information in the course of portfolio

development (Wiggins, 1993, p.195) in the form of a constructive dialogue between

students and teachers.

Problems may arise when teachers do not understand the implications of such

changes, as happened with teacher E (School K) who did not understand

Joana’s visual work in her portfolio because she did not clearly realise that the

assessment had expanded art practice to the world of visual culture (see Appendices

XX, p.264 and XXVI, p.319). E did not recognise that the teachers needed to acquire

a deep respect for the students’ artwork and this challenged the limits of her

understanding of art practice. It seems

evident therefore that before implementing the proposed assessment instrument

it would be necessary to change teaching and learning practices and to revise

preconceptions of what students in the art and design disciplines should learn and be

taught. Teachers need to reconceptualise their pedagogy before they can integrate the

new instrument into their everyday routines and classroom teaching.

At the same time, the breadth and flexibility of the assessment instrument presented

problems in terms of uniformity of application and therefore its potential reliability.

Students and teachers in the trials recognised that ‘taking risks’ was positively

encouraged, and diverse creative outcomes and approaches were expected. In the

current Portuguese art examination system tests were easily administered under the

same conditions for all schools and all students. The only resources required were

pencil and paper and the short time for the test allows effective standardisation of

assessment conditions. The new instrument, which aims to provide a more valid

assessment, presented problems related to diversity of outcomes and the need for

increased resources. In the English art and design examination system this problem

was generally less evident because the work submitted for examination was often

very similar in character, perhaps because teachers and students felt safe within

orthodoxies established by the school art tradition and examination rubrics.

Perhaps they felt that they could not risk different approaches because the results of

examinations had major implications for public perceptions of schools and teachers,

as well as students.

It is recognised that if the new assessment instrument were to be used over a period

of several years, there is a danger of losing flexibility. Just as appears to

have happened in the English art and design examinations, a new orthodoxy could

develop around sets of exemplars displayed during standardisation meetings and

teachers might encourage students to present very similar portfolios in order to

achieve good results. One possible way to prevent such an occurrence is to stress the

rationales for the assessment instrument and emphasise the importance of diverse

creative outcomes during in-service teacher training sessions. And students need to

understand that they will be rewarded for innovative and personal responses

evidenced through a wide variety of outcomes.

It is widely agreed that creativity needs to be encouraged in the learning community;

there is a need for a wide variety of visual experience and debate within assessment

communities. As Freedman (2003, p. 24) pointed out, surprise should be valued,

educators ‘…should continually hope for surprising crossings of aesthetic levels in

the creation of knowledge, and for the unexpected outcome that surpasses planned

objectives’. Teachers should take into account the most up-to-date student and

professional art being produced at any given time in their debates about what should

be valued, and such debates ‘…must include continual challenge so that student art

[that] goes beyond the hopes stated in instructional objectives, is rewarded’

(Freedman, 2003, p. 168).

8.6.3. The assessment procedures

Using the new assessment procedures required considerable time for training and for

dialogue between teachers. The increased burden on teachers needs to be

acknowledged – the quantity of work to be assessed, the standardisation meetings

and reports to be written – all contributed to teachers’ workload. This was only

possible in the trials because the teachers who participated in them were convinced

that it was important for their own professional development and for the fairness of

art and design students’ assessment. It would be unrealistic to expect all Portuguese

art and design teachers to engage in the new assessment procedures with the same

commitment as the volunteers in the trials without considerable preparation and in-

service training. However this experience showed that it is possible to change old

established and unreliable habits of arts assessment in Portugal if the reasons for

change are properly understood, and change is understood to be in the best interests

of the subject, teachers and students.

The development of communities of assessors for inter-rater reliability purposes can

be used for positive professional development opportunities (Klenowski, 2000;

Schönau, 1996). The concept of a ‘community of judges’ (Boughton, 1997) and the

example of the English moderation procedures were used to design the assessment

procedures. It was clearly established that reliability of results achieved in the trials

increased in comparison with the current Portuguese system of a single assessor (see

Chapter 7, p. 281). As noted by several authors (Boughton, 1997; Beattie, 1997;

Steers, 1996; Blaikie, 1996) assessors can reach a considerable degree of consensus

through the use of standardisation and moderation procedures. The new procedure

was found to be more demanding than the current assessment system in Portugal, but

it was also found to present real benefits in terms of validity, reliability and teachers’

professional development. One feature of the experiment’s success was that teachers

had enhanced confidence in their powers to make assessment decisions. The new

procedures did not undermine teachers’ professionalism. The current Portuguese

reliance on teachers’ connoisseurship was maintained but the shift was made to

dependence on a community of assessors achieving consensus, thus helping to

eliminate some of the idiosyncrasies of the present system.

8.6.4. Impact

From the research (see Chapter 7, pp. 275-279) it appeared that the effects of the new

framework for assessment were significant for both the teachers and the curriculum.

A principal outcome of the new assessment instrument could be an increase in the

status of studio activity and the raising of art and design educational standards by

redefining rationales, domains of learning, understanding and skills in art and design.

The teachers in the pilot and the trial reported that they changed the way that

they thought of their teaching, classroom practice and assessment methods.

These changes happened because the teachers who were already competent

practitioners reflected on the proposed rationales for the experience and decided,

with the support of their peers, that they had to make changes, because they agreed

with the proposals and believed that their students had the right to be involved in and

be responsible for their own learning and assessment.

8.6.5. Strengths and Weaknesses

Throughout this chapter strengths and weaknesses of a new framework for external

assessment in the arts have been discussed and these are summarised as follows:

8.6.5.1. Weaknesses

a. The assessment instrument (portfolio) is extremely demanding in terms of

teachers' and students' workload. Portfolios demand a great amount of

selected evidence to be submitted and marked.

b. The assessment instrument has various potential sources of bias.

c. The work requires considerable time and resources that are not always

available in schools for teachers and students.

d. It requires constant interaction and collaboration between teachers and

students, and some students may receive different degrees of teacher aid.

e. The assessment criteria require skills that might or might not have been

fostered during the previous years of the course; problems of unfamiliarity

might affect students whose critical reflective capacities have not been

developed in the course prior to assessment using the new instrument.

f. The assessment instrument may not be applied evenly for all students and

schools and its inherent flexibility might be a factor in reducing uniformity

of implementation.

g. Standardisation and moderation procedures are not easy to develop and

their success is dependent on expensive in-service teacher training and

professional debate among the community of teachers and examiners.

8.6.5.2. Strengths

a. The assessment instrument provides valid and authentic tasks related to the

art and design curriculum.

b. The assessment instrument enhances learning and student motivation,

enabling students' individual voices to be heard.

c. The assessment instrument integrates a wide range of methods of inquiry,

media, and domains of art and design, allowing students to develop personal

projects in which they can personalise social issues and reveal important

cognitive and metacognitive skills.

d. The assessment instrument enables collaborative learning, peer-group

assessment and self-assessment.

e. The assessment procedures enable a significant degree of improved

reliability of results through common interpretations of criteria and standards

between teachers and assessors, as compared to the present Portuguese art

and design examinations.

8.7. Wider implications

The approach to designing assessment instruments for art and design presented in

Chapter 6 was influenced by Weir’s (2004) framework for validating tests. It was

adapted for art and design contexts and might also be of use for developing similar

exercises in other countries. However, it is recognised that this present proposal for

a new system of art examinations needs debate and further revision. The conceptual

framework developed in this thesis was specifically designed for the Portuguese

context. Nevertheless some of the research findings could be of use for others

elsewhere as these suggested that: (1) it is possible to make worthwhile ‘bottom-up’

changes in art and design assessment, (2) dialogue, collaborative work and some

degree of teachers’ ownership of the assessment is possible in external assessment, at

least with small-scale groups. Each of the negative and positive findings of this study

raises important questions for consideration when designing assessment instruments

based on portfolios or coursework. For example:

a. Portfolio assessment requires change to traditional

assessment practices and corresponding changes to curriculum and pedagogy.

b. Developing portfolios demands a culture of self-

reflection. Before using portfolios as assessment instruments teachers need to

reconceptualise their pedagogy to integrate portfolio processes into their

everyday routines and classroom teaching.

c. In portfolio assessment the teacher’s role is

fundamental in preparing

and managing the learning environment. For portfolio assessment teachers

have to be able to help students develop independent critical and self-

reflective skills.

d. Developing portfolios demands dialogue, mutual

understanding of aims

and intentions, and negotiation of tasks and approaches between teachers

and students.

e. Assessment procedures in the arts require a wide

variety of visual experience and debate within the art and design community.

The development of communities of assessors for inter-rater reliability

purposes can be used as positive professional development opportunities.

These issues may be useful as a starting point to generate further discussion and

experimentation for those who are interested in developing portfolios and assessment

by juries. As Eisner (2003, p.55) pointed out ‘…our resolutions will generate other

situations that will need further resolution’. It would have been unrealistic to try to

resolve all the problems of external assessment in art and design in this research.

The idea established in the literature, that alternative assessments such as

portfolios, combined with assessment procedures involving in-service teacher training

and moderation, can provide some degree of validity and reliability in external

assessment in art, was supported by the empirical studies. It is hoped that the conclusions of this

research might be of use for the Portuguese government, art teachers, art education

researchers, and researchers into evaluation and assessment everywhere who are

searching for more valid and reliable assessment instruments and procedures for art

and design.

8.8. Further research

This research study was not intended to resolve all the problems of validity and

reliability in external art assessment, and the problematic issues identified in

Chapters 1 and 2 were not fully resolved. The identified strengths and weaknesses of

the English system of art and design examinations proved to be a useful starting

point for designing a new framework for external assessment in the subject in

Portugal. However biases identified in the English examinations were also found in

the new model for external assessment proposed during the research. As Beattie

(1995, pp. 52-53) pointed out, the emphasis on written descriptions of learning

processes in reflective journals may be unfair to disadvantaged students, minorities,

some boys and non-standard language speakers. To prevent bias it will be necessary

to reconsider issues concerning teacher and student ownership of the assessment,

school resources and opportunities for learning.

The English external assessment system was able to combine formative and

summative assessment (see Chapter 5) and the framework developed for a new

system of assessment in Portugal also combined them. Nevertheless questions remain

about whether these assessment functions should be separate or combined.

Contemporary trends are to foster ‘assessment for learning’, assessment for which

the first priority is to serve the purpose of promoting students’ learning (Black et al,

2003) and a combined approach to assessment (Eisner, 2002). However Gipps and

Stobart (1993) claimed that different purposes require different models of assessment

and whereas they admit that it might be possible to design one assessment system

that measures performance for accountability and selection purposes, whilst at the

same time supporting the teaching and learning process, ‘… no one has yet done so’.

However this statement might be challenged in the case of portfolio assessment. Klenowski

(2002, p. 10), for example, believes that ‘…a portfolio of work can fulfil the full

range of various assessment purposes: accountability, summative assessment,

certification, selection, promotion, appraisal and formative assessment in support of

teaching and learning processes’. The research did not fully confirm Klenowski’s

views, because it would be necessary to study the long-term consequences of the new

assessment instrument to do this. Nevertheless the issue of combined formative and

summative assessment needs to be addressed in future research.

Other aspects of assessment that could be explored more fully in the future are the

relationship between curriculum and examinations and the development of other

alternative methods of assessment. Assessment systems, with their bias towards

easily testable achievements rather than depth or interconnectedness of

understanding, often reinforce the fragmentation embedded in the curriculum (White,

2003, p. 183). The disjointed nature of the Portuguese art and design curriculum

might be one reason for the relative invalidity of the assessment system, because

contemporary art and design work often links several disciplines or specialisms.

However it is hard to believe that it is the only reason and the question remains

whether the fragmentation of the curriculum is a consequence of an educational

system based on a post-nineteenth century tradition of compartmentalised and

‘objective’ examination methods or the primary cause of it.

The alternative form of assessment described in Chapter 6 achieved greater

validity but whether or not more equality and fairness was obtained is questionable.

Some students were disadvantaged because of gender bias and access to resources

and teacher aid. These problems may be related to the form of examinations in

general, which on a large scale, can be understood as power structures representing

‘…the desire to discipline an irrational social world in order to establish rationality

and efficiency’ (Broadfoot, 2000, p. ix). As Foucault (1977, p. 183) points out

examinations are potentially unfair, because they have ‘…the constraint of a

conformity that must be achieved’. Even if the required knowledge, skills,

competencies and criteria are established through dialogue and agreement, allowing

some degree of teachers’ and learners’ ownership, assessment still denies diversity

and plurality of approach for some groups of students and schools. More fairness

might be obtained by using negotiated assessment (Ross et al, 1993). Forms of

negotiated assessment were not used in this research; however the possibility of

negotiated assessment as an element in external assessment could be explored in

further research. Other methods and forms of assessment need to be investigated; as

Rayment (1999, p.193) suggests, it is necessary ‘…to identify methods of providing

valid and reliable evaluation procedures which can perform both formative and

summative functions’.

8.9. Reflections on the research methodology

The research method was constantly refined and readjusted during the period of the

study. Its hybridity provided flexibility for readjustments and allowed a balance

between statistically significant and meaningful outcomes. The mixture of qualitative

and quantitative methods provided considerable data for analysis (see for example

Appendices VIII, IX, X, XVII, XVIII). Some of the evidence proved superficial and

unnecessarily detailed – and analysing it was consequently time-consuming. On the

other hand the questionnaires, providing statistical information, revealed worthwhile

data for understanding current examinations, which directly informed the

design of the new assessment instrument. However interviews, documents and

observation generally were more helpful for these purposes. Furthermore the

response rate to the questionnaires on the current Portuguese art examinations was

relatively poor (44 teachers and 104 students). Every effort was made to obtain

questionnaire responses during conferences, teachers’ meetings, and through

correspondence. But conducting a successful national survey is difficult when a

single researcher undertakes research with no formal authority; consequently an

indicative sample is the best that can be expected.

Overall the research method was ambitious in trying to access views of participants

in two countries. However the programme as planned was followed thanks to the

generous help of various individuals and organisations. The fact that the research was

carried out by a single researcher limited the number of case studies and

consequently the possible validity of generalised results. The full potential of

methods of analysis such as multi-faceted Rasch measurement could not be realised

because of the restricted number of participants. On reflection, therefore, the research

ideally needed more resources and participants than initially planned.
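
For readers unfamiliar with the technique, a brief sketch may be useful (the notation is the conventional one for many-facet Rasch analysis, not taken from the data of this study). The model expresses the log-odds of a candidate being awarded category k rather than k-1 of a rating scale as an additive function of candidate ability, task difficulty, assessor severity and a category threshold:

\[ \log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k \]

where \(B_n\) is the ability of candidate n, \(D_i\) the difficulty of task i, \(C_j\) the severity of assessor j and \(F_k\) the threshold of category k. Because stable estimates of the assessor-severity parameters require large numbers of candidates and ratings, the small samples available here limited what the method could show.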

8.10. Recommendations for assessment stakeholders

8.10.1. Students

In the trials of portfolio assessment the students were expected to think, to assess

themselves, to accept challenging expectations and be collaborative learners.

From the sample included in the research it was apparent that the impact of the new

assessment instrument based on portfolios increased students' in-depth study, active

and independent learning, awareness of their own learning strategies, motivation and

interest in their own achievements and performance. These benefits are compatible

with the findings of the Arts Propel project that also used portfolios as alternative

assessment instruments (Gardner, 1996). Inevitably this demanded increased work

and students who have grown used to being passive observers might resent having to

work harder. By comparison the current Portuguese tests do not require so much

work and evidence to be submitted. Nevertheless students felt there were benefits in

the new assessment system; it seemed more interesting and relevant. The new

instrument required them to take individual responsibility for developing a portfolio.

As Black et al (2003, p. 59) stress: ‘Learning cannot be done for the student; it has

to be done by the student’.

However in the light of the problems detected during the research concerning

students’ lack of self-monitoring skills, it is evident that if these modes of

assessment are to be used, then the coursework must be planned carefully in advance.

Planning should take into account available resources and the assessment

requirements and students should be encouraged to follow work plans in order to

keep to deadlines.

As Klenowski (2002) pointed out, it is not only teachers who need to learn about the

pedagogical implications of using portfolios for assessment and learning; students

need specific teaching and support to develop the necessary cognitive processes. It

was evident from the research that students need skills in independent study, group

work, self-reflection, self-assessment, self-evaluation and questioning. Some students

were not prepared to develop their portfolios because they lacked background skills

in critical inquiry, in defining problems, self-discipline and self-evaluation. The

teachers saw that the source of the problem was that their students lacked the

necessary skills both to judge specific problems in understanding and to set realistic

targets to remedy problems within reasonable time frames. However, where teachers

created classroom environments in which students worked together to understand

teachers' comments about their work, then peer and self-assessment provided the

training that students needed to judge their own learning and to begin to take action

to improve their performance. It was evident that changes are needed to the role of

students, but the period during which the role of the student changes needs to be

handled carefully and the students have to be supported as they learn to become

active, responsible learners. This requires time.

8.10.2. Teachers and Assessors

It was evident that to increase its validity the new assessment requires a long period

of time for students to become familiar with a culture of self-reflection that is only

established through long years of education. It also became increasingly clear, as

Black et al (2003, p. 59) argue, that the teachers also needed to train their

students to take responsibility for their own learning and assessment. The teacher’s

role is to ‘scaffold’ this process – that is, to provide a framework of appropriate

targets and to give support throughout the task of attaining them. Implementation of

changes in classroom assessment would call for profound changes in both teachers’

perceptions of their own role in relation to their students and in their classroom

practice. Knowledge acquisition must be conceptualised as an active process rather

than passive. Cognitive processes used by students such as self-regulation and self-

monitoring are fundamental to understanding individual development and

achievement and have to be fostered by teachers. This kind of assessment therefore

has to be viewed as an integral part of the curriculum combining summative and

formative forms. Before it can become widespread it would be necessary for

teachers to change prescriptive modes of teaching and probably to revise

fundamentally their rationales for art and design education.

8.10.3. Schools

If the new assessment instrument is to be further developed with a larger sample then

certain problems need to be acknowledged. Choices between validity and uniformity

have to be made bearing in mind that less uniformity will not necessarily reduce the

reliability of results but will increase the potential for bias. These problems will have

to be solved before full implementation of the instrument by providing greater and

more equal resources for schools and by promoting pedagogies that emphasise

cognitive growth and inquiry methods. The learning milieu is a fundamental resource

and schools must recognise its importance.

Another important consideration for schools is the allocation of time for teachers' in-

service training, meetings and marking. Teachers need to be motivated and to

understand the justification if they are to be expected to take on the burdens of

change. They need time to plan, discuss and mark students' work; they also need

opportunities for in-service training, peer-reviews or meetings in order to create a

community of assessors.

8.10.4. Universities

The research found that the new assessment instrument had clear advantages in terms

of preparation of students for further studies in art and design because the evidence

submitted required a broader range of knowledge, understanding and skills in art and

design (see p.275). In Portugal the majority of art and design faculties and schools

select their students on the basis of the current examination system. The proposed

new framework allows for wider and more appropriate evidence of students’ artistic

achievement in terms of working processes and quality of outcomes, which may be

of more value for accurate selection of art and design students in higher education.

8.10.5. Government

It is fully recognised that the proposed framework and assessment instrument are less

than perfect but they do offer some clear advantages over existing practices in

Portugal. It would need further revision, development and re-trials to be ready for

implementation on a regional or a national scale. But, more importantly and as a first

step, rationales, concepts and models of art and design education need to be radically

re-thought in order to align Portuguese art and design education at secondary level

with the needs of contemporary society. It is also necessary to revise present

concepts of examinations and external assessment so as to provide opportunities for

the development of much more valid assessment instruments.

In conclusion, it was a key finding of this research that an instrument based on

portfolios, which includes a range of learning domains and skills in art education,

offers more validity than the current Portuguese examinations and can provide

positive outcomes for teachers and students. In-service teacher training and

moderation procedures combined to increase the reliability of the assessment results

in the small-scale trials reported here. However the instrument and assessment

procedures offered less utility than the current examinations, requiring more

resources and more work from teachers and students. Nevertheless the advantages of

the proposed framework probably outweigh the disadvantages. If the Portuguese

government is minded to improve the system of external assessment in art and

design, undoubtedly, this will require a long overdue, comprehensive and radical

reform of the art and design curriculum and examinations.

Glossary

Ability
A mental trait or the capacity or power to do something (ALTE, 1998).

Achievement
Ability to demonstrate accomplishment of some outcome for which learning
experiences were designed (Armstrong, 1994).

Achievement levels
Behaviours along a continuum that represent degrees of attainment of a criterion
(Armstrong, 1994).

Alternative Assessment
These are ways of measuring student achievement using methodologies other than
pencil-and-paper tests, e.g. observation, checklists, portfolios, interviews, etc.

Art and design works


Art and design works are envisaged as the outcomes of purposeful activity, the
capacity to bring into being something meaningful, useful or valuable (Minkin, 1998),
which could be original or a re-formulation of a concept or artwork. The outcomes
should be of value in relation to the pre-established intentions and to a particular
audience or viewer.

Art and design


Art and design is taken in England to mean the National Curriculum subject so
defined in English legislation for general education, examined by the recognised
awarding bodies under the associated specifications provided for students in state
maintained and independent schools. The subject encompasses a range of practical
activities such as drawing, painting, printmaking, graphics, and crafts including
textiles, ceramics, metalwork and Information & Communication Technology (ICT),
in which the emphasis is on developing creativity, understood as ‘innovative
application of knowledge and skills' (NACCCE, 1999, p. 30). It may also include art,
craft and design history and critical and contextual studies in which the emphasis is
on developing critical thinking and knowledge and understanding of art and design
theory and practice.

Art Education
The process of learning to understand, make and adequately respond to the
complexities of the arts.

Art examinations
In the context of this research the term ‘art examinations' is used to refer to
examinations covering visual arts subjects.

Art criticism
Describing and evaluating the media, processes and meanings of works of visual art,
and making comparative judgements (NAEA, 1994).

Assess
To analyse and determine the nature and quality of achievement through means
appropriate to the subject (NAEA, 1994).

Assessment
The process of obtaining information that is used to give feedback about students’
progress, strengths and weaknesses i.e. make educational decisions about students
(Weir, 2004). The process of judging student behaviour or product in terms of some
criteria (Clark, 1975).

Assessor
Someone who assigns a score to a candidate’s performance in a test, using subjective
judgement to do so. Assessors are normally qualified in the relevant field, and are
required to undergo a process of training and standardisation. Also referred to as
examiner, marker or rater (ALTE, 1998).

Assessment objectives
Assessment objectives are the means by which the formal elements, processes and
practices can be defined and assessed to ensure that a coherent and meaningful
course has been followed.

Attainment targets
Descriptions of expected standards. A kind of performance matrix in which different
levels of achievement to be expected in students' works are described, based on a set
of agreed criteria. Attainment targets act as descriptors of performance,
knowledge, skills and understanding.

Authentic assessment
A characteristic of assessments that have a high degree of similarity to tasks
performed in the real world; a view that assessment should include task types which
resemble real life activities as closely as possible.

Bias
A test or item can be considered to be biased if one particular section of the
candidate population is advantaged or disadvantaged by some feature of the test or
item which is not relevant to what is being measured. Sources of bias may be
connected with gender, age, culture, etc. (ALTE, 1998)

Candidate
A test/examination taker. Also referred to as examinee.

Cognitive model

A theory concerning the way in which a person’s knowledge, in the sense of both
concepts and processes, is structured. This is important in examinations because
such a theory may have an effect on choice of instrument or examination content.

Competence
The knowledge or ability to do something. (ALTE, 1998)

Connoisseurship
The art of appreciation, the ability to define the quality of an object or environment
(Armstrong, 1994).

Construct
Can be viewed as definitions of abilities that permit us to state specific hypotheses
about how these abilities are or are not related to other abilities, and about the
relationship between these abilities and observed behaviour.

Context
A set of interrelated conditions that influence and give meaning to the development
and reception of thoughts, ideas or concepts and that define specific cultures and eras
(NAEA, 1994).

Criterion
A distinguishing property or characteristic of anything, by which its quality can be
judged or estimated, or by which a decision or classification can be made
(Sadler, 1987). A behaviour, characteristic, or quality of a product or performance
about which some judgement is made (Clark, 1975).

Criterion-referenced assessment


Means that achievement is being assessed in reference to some student outcome that
can be expected as a result of an educational experience (Armstrong, 1994).

Curricular materials
Documents such as curriculum guidelines and syllabuses, lists of objectives, lists of
criteria and lists of grade descriptors.

Descriptors
A brief description accompanying a band on a rating scale, which summarises the
degree of proficiency or type of performance expected for a candidate to achieve that
particular score (ALTE, 1998).

Discrimination
The power of an item to discriminate between weaker and stronger candidates.
Various indices of discrimination are used. Some (e.g. point-biserial, biserial) are
based on a correlation between the score on the item and a criterion, such as total
score on the test or some external measure of proficiency. Others are based on the
difference in the item’s difficulty for low and high ability groups (ALTE 1998).
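
For example (a standard formulation, not drawn from the glossary sources), the point-biserial index for a dichotomously scored item is

\[ r_{pb} = \frac{M_1 - M_0}{s_X}\sqrt{p(1-p)} \]

where \(M_1\) and \(M_0\) are the mean total scores of candidates answering the item correctly and incorrectly, \(s_X\) is the standard deviation of total scores and p the proportion answering correctly; higher values indicate stronger discrimination.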

Evaluation
A judgement of merit based on various measurements, notable events and subjective
impressions (Armstrong, 1994).

Examinations
Measure the attainment of a candidate at the end of a course of study. They may be
external or school based, or a combination of both, and are usually conducted in a formal
manner. Examinations may include a variety of instruments such as practical tests,
written essays, coursework and portfolios.

Examiner
The person who is responsible for judging a candidate’s performance in a test or
examination.

Expressiveness
Expressive features. Elements evoking affects such as joy, sadness, or anger (NAEA,
1994).

Expression
A process of conveying ideas, feelings, and meanings through selective use of the
communicative possibilities of the visual arts (NAEA, 1994).

Feedback
Comments of people involved in the testing process (examinees, administrators, etc.)
which provide a basis for evaluating that process. Feedback may be gathered
informally, or using specially-designed questionnaires (ALTE, 1998).

Formalism
A conception of aesthetics developed in the late 19th and early 20th centuries; it
focuses on the analysis of physical and perceptual characteristics of art objects and
involves the reduction of form to elements such as line, shape, and colour and
principles of design such as rhythm, balance, and unity (Freedman, 2003, p. 27).

Formative assessment
Testing which takes place during, rather than at the end of, a course or programme of
instruction. The results may enable the teacher to give remedial help at an early
stage, or change the emphasis of a course if required. Results may also help a
student to identify and focus on areas of weakness (ALTE, 1998).

Formative Evaluation
Ongoing evaluation of a process, which allows for that process to be adapted and
improved as it continues. It can refer to a programme of instruction (ALTE, 1998).

Global assessment
A method of scoring. The assessor gives a single mark according to the general
impression made by the work or performance produced, rather than by breaking it
down into a number of marks.

Grade

A score may be reported to the candidate as a grade, for example on a scale of A to
E, where A is the highest grade available, B is a good pass, C a pass and D and E are
failing grades (ALTE, 1998).

Grading
A process of assigning a symbol for some judgement of quality relative to some
criterion.

Holistic
Holistic, whether describing learning or assessment, refers to the conscious
awareness of the way parts interact and influence, looking globally rather than
analytically (Armstrong, 1994).

Holistic Scoring
Scoring based upon an overall impression (as opposed to traditional test scoring
which counts up specific errors and subtracts points on the basis of them). In holistic
scoring the rater matches his or her overall impression to the point scale to see how
the portfolio, product or performance should be scored.

Impact
The effects created by an assessment instrument, both in terms of influence on
general educational processes, and in terms of the individuals who are affected by
test results.

Instrument of assessment
Assessment tool. A method of gathering data about student performance
(Armstrong, 1994).

Instructions
General directions given to candidates, for example on the front page of the answer
paper or booklet, giving information about such things as how long the test lasts,
how many tasks to attempt and where to record their responses (ALTE 1998).

Internal consistency
A feature of a test, represented by the degree to which candidates’ scores on the
individual items in a test are consistent with their total score. Estimates of internal
consistency can be used as indices of test reliability; various indices can be
computed, for example, KR-20, Cronbach alpha (ALTE, 1998).
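
As an illustration (the formula is standard measurement theory rather than specific to any source cited here), Cronbach's alpha for a test of k items is

\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right) \]

where \(\sigma_i^{2}\) is the variance of scores on item i and \(\sigma_X^{2}\) the variance of total test scores; values closer to 1 indicate greater internal consistency.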

Inter-rater reliability
An estimate of test reliability based on the degree to which different assessors agree
in their assessment of candidates’ performance (ALTE, 1998).
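
By way of illustration (a standard index, not one prescribed by the sources cited here), Cohen's kappa corrects the observed agreement between two assessors for the agreement expected by chance:

\[ \kappa = \frac{p_o - p_e}{1 - p_e} \]

where \(p_o\) is the observed proportion of agreement and \(p_e\) the proportion expected by chance; \(\kappa = 1\) indicates perfect agreement and \(\kappa = 0\) agreement no better than chance.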

Intra-rater agreement
The degree of agreement between two assessments of the same sample of
performance made at different times by the same assessor (ALTE 1998).

Level

The degree of proficiency required for a student to be in a certain class or represented
by a particular test is often referred to in terms of a series of levels. These are
commonly given names such as ‘elementary', ‘intermediate', ‘advanced', etc.
(ALTE, 1998).

Mark
A symbolic number which represents the achievement level of the students' work.

Marker
Someone who assigns a score to a candidate’s responses to a written test. This may
involve the use of expert judgement, or, in the case of a clerical marker, the relatively
unskilled application of a mark scheme (ALTE, 1998).

Marking
Assigning a mark to a candidate’s responses to a test. This may involve professional
judgement, or the application of a mark scheme which lists all acceptable responses
(ALTE, 1998).

Measurement
Generally, the process of finding the amount of something by comparison with a
fixed unit, e.g. using a ruler to measure length. In the social sciences, measurement
often refers to the quantification of characteristics of persons (ALTE, 1998).

Method of assessment
Instruments and procedures used to measure student performance in meeting the
standards for a learning outcome. These assessments must relate to a learning
outcome, identify a particular kind of evidence to be evaluated, define tasks that
elicit that evidence and describe systematic scoring procedures.

Moderation
Moderation is the means by which the marks of different teachers in different
centres/schools are equated with one another.

Norm-referenced
If a test is norm-referenced it aims to place candidates on some sort of ordered scale,
so that they can be compared with one another (Alderson et al 1995).

Objective testing
Objective testing refers to items such as multiple-choice, true-false, and error-
recognition, amongst others, where the candidate is required to produce a response
which can be marked as either ‘correct’ or ‘incorrect’. In objective marking the
examiner compares the candidate’s response to the response or range of responses
that the item writer has determined is correct (Alderson et al 1995).

Perception
Visual and sensory awareness, discrimination, and integration of impressions,
conditions, and relationships with regard to objects, images and feelings (NAEA,
1994).

Portfolio
A portfolio is defined as a purposeful collection of selected student work, exhibiting
effort, progress and achievements in more than one area, and including student
participation in selecting the contents and self-reflection (Lindström, 1998).

Practicality
A practical assessment instrument is easy to administer and to score without
wasting too much time or effort.

Question
Sometimes used to refer to a task or item (ALTE, 1998).

Rating scale
A scale consisting of several ranked categories used for making subjective
judgements such as mark schemes or assessment matrices. Rating scales for
assessing performance are typically accompanied by grade descriptors which make
their interpretation clear.

Reliability
Reliability is the extent to which test scores are consistent: if candidates took the test
again tomorrow after taking it today, would they get the same result (assuming no
change in their ability)? There are three aspects to reliability: the circumstance in
which the test is taken, the way in which it is marked and the uniformity of the
assessment it makes. There are several ways of measuring the reliability of
‘objective' tests (test-retest, parallel form, split-half, KR-20, KR-21, etc.). The
reliability of subjective tests is measured by calculating the reliability of the marking.
This is done in several ways (inter-rater reliability, intra-rater reliability etc.)

Reliability coefficient
A measure of reliability, in the range 0 to 1. Reliability estimates can be based on
repeated administrations of a test (which should produce similar results) or where
this is not practicable, on some form of internal consistency measure. Sometimes
known as reliability index (ALTE, 1998).

Rubric
The instructions given to candidates to guide their responses to a particular test task
(ALTE, 1998).

Scale
A set of numbers or categories for measuring something. Four types of measurement
scale are distinguished – nominal, ordinal, interval and ratio (ALTE, 1998).

Score
The result obtained by a student on an assessment, expressed as a number.

Self-Assessment

Students reflect about their own abilities and performance, related to specified
content and skills and related to their effectiveness as learners, using specific
performance criteria, assessment standards, and personal goal setting.

Skill
Ability to do something, expertness.

Specifications
Specifications provide the official statement about the method of assessment and
how it assesses what it intends to assess.

Standard
'A definite level of excellence or attainment, or definite degree of any
quality viewed as a prescribed object of endeavour or as the recognised measure of
what is adequate for some purpose, so established by authority, custom, or
consensus' (Sadler, 1987).

Standards in the arts


Challenging, but attainable visions of art student outcomes (i.e. what students should
know and be able to do and appreciate, resulting from their art education experience).
Art standards for excellence can motivate change in art education programs,
curriculum, instruction and assessment. National standards in art education can guide
educational decisions about art programs and fill a gap in the large picture of art in
education (Armstrong, 1994).

Standardisation
The process of ensuring that assessors adhere to an agreed procedure and apply
rating scales in an appropriate way (ALTE, 1998).

Summative assessment
Testing which takes place at the end of a course or programme of instruction (ALTE,
1998).

Syllabus
A list of headings outlining a curriculum (Steers, 1979). A detailed document which
lists all the areas covered in a particular programme of study, and the order in which
content is presented (ALTE, 1998).

Task
A combination of rubric, input and response.

Techniques
Specific methods or approaches used in a larger process; for example, graduation of
value or hue in painting or conveying linear perspective through overlapping,
shading or varying size or colour (NAEA, 1994).

Test
Assessment instrument based on objectivity, usually pencil-and-paper tasks requiring
short answers through, for example, items like matching, multiple choice, alternative
items or essays.

Testing
Testing is one procedure used to obtain data for purposes of forming descriptions or
judgments about one or more human behaviours. Tests can be used to obtain
summative assessments and in external examinations, '…the typical testing situation
puts students in an artificial situation in the sense that their performance will be
appraised and rewarded accordingly' (Eisner, 1972, p. 205).

Traditional assessment instruments


Objective tests, short essays.

Validity
The degree to which an assessment instrument actually measures what it is supposed
to measure. Validity occurs when assessment procedure measures the performance
described in the objective, that it claims to measure (Armstrong, 1994). Validity is
the extent to which a test measures what it is intended to measure: it relates to the
uses made of test scores and the ways in which test scores are interpreted, and it
is therefore always relative to test purpose (Alderson et al, 1995).

Validation
The process of gathering evidence to support the inferences made from assessment
results. The extent to which assessment scores enable inferences to be made which
are appropriate, meaningful and useful, given the purpose of the assessment
instrument. Different aspects of validity are identified, such as content, criterion and
construct validity; these provide different kinds of evidence for judging the overall
validity of the assessment for a given purpose.

Value-added measures
Value-added measures measure the progress made by individual pupils in an
institution rather than raw outcome scores.

Visual arts education


The process of learning to understand, make and adequately respond to the
complexities of the visual arts.

Weighting
The assignment of a different number of maximum points to an item, task or
component in order to change its relative contribution in relation to other parts of the
same assessment instrument.
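
A small worked example may clarify (the figures are hypothetical, not taken from the study): if a portfolio component and a written component are each marked out of 20, but the portfolio is to count for twice as much, the portfolio mark is doubled before the marks are combined, so a candidate scoring 15 and 12 obtains

\[ (2 \times 15) + (1 \times 12) = 42 \text{ out of } 60. \]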

Bibliography

ALEXANDER, P.A. (1992). Domain Knowledge: Evolving themes and emergent


concerns. Educational Psychologist, 27(1), 33-51.

ALEXANDER, R.; BROADFOOT, P.; PHILLIPS, D. (Eds.) (1999). Learning from


Comparing: New directions in Comparative Educational Research. Oxford:
Symposium Books.

ALDERSON, J.C., CLAPHAM, C., & WALL, D. (1995). Language test


construction and evaluation. Cambridge, UK: Cambridge University Press.

ALDRICH, R. & WHITE, J. (1998). The National Curriculum beyond 2000:


the QCA and the aims of education. London: Institute of Education,
University of London.

ALLISON, B. (1982). Identifying the Core in Art and Design, Journal of Art and
Design Education, 1(1), pp.59-66.

ALLISON, B. (1986). Some Aspects of Assessment in Art and Design Education. In:
Ross, M. (Ed.). Assessment in Arts Education. Oxford: Pergamon Press.

ALLISON, B. (1986). Assessment in Art and Design. In Ross, M. (Ed.) Assessment


in the Arts. Oxford: Pergamon.

ALTE (1998). Multilingual Glossary of Language Testing Terms. Cambridge:


Cambridge University Press.

ADDISON, N. AND BURGESS, L. (2003). (Eds.) Issues in art and design teaching.
London: Routledge Falmer.

ADDISON, N. AND BURGESS, L. (2000). (Eds.) Learning to Teach Art and


Design in the Secondary School. London: Routledge Falmer

AFONSO, A.J. (1998). Estado, mercado, comunidade e avaliação. Revista Crítica


das Ciências Sociais 51. Coimbra: Centro de Estudos Sociais, pp. 109-136.

AMABILE, T. M. (1990). Within you, without you: The social psychology of


creativity, and beyond. In M. A. Runco and R.S. Albert (Eds.) Theories of
Creativity, pp.61-91. Newbury Park, CA: Sage.

AMABILE, T.M. (1983). The social psychology of creativity. N.Y.: Springer-Verlag.

ANASTASI, A. (1988). Psychological Testing (6th Edition). New York: Macmillan.

ANDERSON, T. (1994). The International Baccalaureate Model of Content-based
Art Education. Art Education 47 (2), pp.19-24.

APECV (2002). Parecer da Apecv sobre a prova de MTEP, 1ª chamada 2002. Porto:
Apecv Reports.

ARMSTRONG, C.L. (1994). Designing Assessment in Art. Reston: NAEA.

ASH, A.; SCHOFIELD, K. and STARKEY, A. (2000). Assessment and Examinations


in Art & Design. In: Addison, N. and Burgess, L. (Eds) Learning to Teach Art and
Design in the Secondary School. London: Routledge Falmer.

ASKIN, W. (1985). Evaluating the Advanced Placement Portfolio in Studio Art.


Princeton, New Jersey: Advanced Placement and The College Board.

ATKINSON, D. (1999). A Critical Reading of The National Curriculum for Art in


the Light of Contemporary Theorisations of Subjectivity. In: Broadside 2. Swift, J. &
Hughes, A. (Eds.) Birmingham: University of Central England.

ATKINSON, D. (2002). Art in Education: Identity and Practice.


Dordrecht/Boston/London: Kluwer Academic Publishers.

BACHMAN, L.F. (1990). Fundamental considerations in language testing. Oxford,


UK: Oxford University Press.

BAKER, R. (1997). Classical Test Theory and Item Response Theory in Test
Analysis. Preston, Lancashire: SDS Supplies Limited.

BARRETT, M. (1990). Guidelines for Evaluation and Assessment in Art and Design
Education 5-18 years. Journal of Art and Design Education, 9 (3), pp.233-313.

BEATTIE, D. K. (1996). Objects of Assessment: What Aspects of Learning Effects


Do We Assess: Products and/or Processes? In: Art&Fact: Learning Effects of Arts
Education, International Conference. World Trade Center Rotterdam. The
Netherlands, pp.44-60.

BEATTIE, D. K. (1997). Visual Arts Criteria, Objectives, And Standards: A Revisit.


Studies in Art Education 38 (4), pp.217-231.

BEATTIE, D. K. (1994). The Mini-Portfolio: Locus of a Successful Performance


Examination. Art Education, 47 (2), pp. 14-18.

BELL, J. (1989). Doing Your Research Project. Milton Keynes: Open University
Press.

BENAVENTE, A.; COSTA, A. F.; MACHADO, F. L.; NEVES, M. C. (1992). Do


Outro Lado da Escola. Lisboa: Editorial Teorema.

BENAVENTE, A. (1990). Escola, Professoras E Processos De Mudança. Lisboa:
Livros Horizonte.

BERNARDES, C. & MIRANDA, F.B: (2003). Portfolio: Uma Escola de


Competências. Porto: Porto Editora.

BEST, D. (1996). A Racionalidade do Sentimento. O Papel das Artes na Educação.


Colecção Perspectivas Actuais. Lisboa: Ed. ASA.

BEST, D. (1982). Objectivity and Feeling in the Arts. Journal of Art & Design
Education, 1 (3), pp.373-390.

BINCH, N. (1994). The Implications of the National Curriculum Orders for Art.
Journal of Art & Design Education, 13 (2).

BLACK, P.; HARRISON, C.; LEE, C.: MARSHALL, B.; WILIAM, D. (2003).
Assessment for Learning: Putting it into practice. Maidenhead: Open University
Press.

BLAIKIE, F. (1994). Approaches to Secondary Studio Art Assessment in North


America: 1970-1993, Journal of Art and Design Education, 13 (3), pp. 299-311.

BLAIKIE, F. (1994). Studio Art in North America: Advanced Placement, Arts


Propel, and International Baccalaureate. Studies in Art Education, 35 (4), pp.237-
251.

BLAIKIE, F. (1996). Qualitative Assessment of Studio Art: Problems, Definitions


and Solutions. Canadian Review of Art Education, 23 (1), pp.17-30.

BLISS, J., MONK, M. & OGBORN, J. (1983). Qualitative Data Analysis for
Educational Research. London: Croom Helm.

BOAVIDA, J. & BARREIRA, C. (1993). Nova Avaliação: Novas Exigências.


Inovação 6. IIE.

BOUGHTON, D. (1994). Evaluation and Assessment in Visual Arts Education.


Geelong, Australia: Deakin University Press.

BOUGHTON, D. (1996). Evaluating and Assessing Art Education: Issues and


Prospects. In Boughton, D., Eisner, E. & Ligtvoet, J. (Eds.) Evaluating and Assessing
the Visual Arts in Education. New York: Teachers College Press.

BOUGHTON, D. (1996)b. Assessment of Student Learning in the Visual Arts. In:


Translations From Theory To Practice. Reston, VA: NAEA.

BOUGHTON, D. (1996)c. Assessing Learning Effects in the Visual Arts: What


Criteria Should Be Used? In: Art & Fact: Learning Effects of Arts Education,
International Conference 27/28 March 1995. The Netherlands: World Trade Center
Rotterdam, pp. 72- 86.

BOUGHTON, D. (1999). Framing Art Curriculum and Assessment Policies in
Diverse Cultural Settings. In: Mason, R. and Boughton, D. (Eds.) Beyond
Multicultural Art Education: International Perspectives. New York: Waxmann,
pp. 331-348.

BOUGHTON, D. (1997). Reconsidering Issues of Assessment and Achievement


Standards in Art Education. Studies in Art Education, 38 (4), pp.199-213.

BOWDEN, J. (2001). Boys' under-achievement in art and design: A project


summary. A’N’D: The Newsletter of The National Society for Education in Art, nº2:
Autumn 2001, p.9.

BROADFOOT, P. (1992). Multilateral Evaluation: a case study of the national


evaluation of records of achievement (PRAISE) project. British Educational
Research Journal 18 (3), pp. 245-257.

BROADFOOT, P. (1996). Education, Assessment and Society. Milton Keynes: Open


University Press.

BROADFOOT, P. (1998). Records of Achievement and the Learning Society: a tale


of Two Discourses. Assessment in Education 5 (3), pp. 447-477.

BROADFOOT, P. (2000). Preface in Ann Filer (Ed.) Assessment: Social Practice


and Social Product. London: RoutledgeFalmer, pp. ix-xiii.

BROWN, N.C. (1997). The Meta-Representation of Standards, Outcomes and


Profiles in Visual Arts Education. Australian Art Education 20 (1/2), pp. 34-43.

BROWN, S. (1994). Assessment: A Changing Practice. In Moon, B. & Shelton


Mayers, A. (Eds) Teaching And Learning In The Secondary School. London: Open
University Press

BTEC. (1986). Assessment and Grading: General Guide. London: Business and
Technician Education Council.

BURGESS, R.G. (1982). The Unstructured Interview as a Conversation. In: Burgess,


R. C. (Ed.) Field Research: A Sourcebook and Field Manual. London: Allen &
Unwin.

BURGESS, R.G. (1993). (Ed.) Educational Research & Evaluation for Policy and
Practice? London: Falmer Press.

BURGESS, L. & ADDISON, N. (2000). Contemporary art in schools: why bother? In:
Hickman, R. (Ed) Art education : meaning, purpose and direction. London:
Continuum.

BURKHART, R. (1965). Evaluation of Learning in Art. Art Education, 19 (4),
pp.3-5.

CARDINET, J. (1986). Évaluation Scolaire et Mesure. Bruxelles: De Boeck-Wesmael.

CARLINE, R. (1968). Draw They Must. London: Edward Arnold Publishers.

CNEES (1998). Relatório sobre os exames de 1998. Portuguese Council of National


Examinations of Secondary Education.

CHALMERS, G. (1981). Art Education as Ethnology. Studies in Art Education 22


(3), pp.6-14.

CHAPMAN, L. (1978). Evaluation of Learning in Art. In: Approaches to Art in


Education. New York: Teachers College Press.

CLARK, G. (1975). Evaluation in Art Education: Less Subconscious And More


Intentional. In D. J. Davis (Ed.). Behavioural Emphasis in Art Education, pp. 43-50.
Reston, VA: National Art Education Association.

CLARK, G.; ZIMMERMAN, E. and ZURMUEHLEN, M. (1987). Understanding


Art Testing. Reston, VA: National Art Education Association.

CLARK, G. (1995). Clark's Drawing Abilities Test. Bloomington: Arts Publishing Co.
Inc.

COHEN, L. and MANION, L. (1994). Research Methods in Education (4th Ed.).


London: Routledge.

COLLINS, G. and SANDELL, R. (1987). Women's Achievements in Art: An Issues


Approach for the Classroom. Art Education, 40(3),pp.12-21.

COMISSÃO INTERNACIONAL SOBRE EDUCAÇÃO PARA O SÉCULO XXI:


UNESCO (1996). Educação Um Tesouro A Descobrir. Porto: Colecção Perspectivas
Actuais: Ed. ASA.

CRESWELL, J.W. (1998). Qualitative Inquiry And Research Design, Choosing


Among Five Traditions. Thousand Oaks: Sage.

CROSSLEY, M. & BROADFOOT, P. (1992). Comparative and International


Research In Education: scope, problems and potential. In: British Educational
Research Journal 18 (2).

CRONBACH, L.J. (1990). Essentials of Psychological Testing (5th edition). New


York: Harper and Row.

CRONBACH, L.J. (1971). Test validation. In: Thorndike, R.L. (Ed.) Educational
Measurement (2nd edition). Washington, DC: The American Council on Education
and The National council on Measurement in Education.

CSIKSZENTMIHALYI, M. (1990). The domain of creativity. In M. A. Runco and
R.S. Albert (Eds.) Theories of Creativity. Newbury Park, CA: Sage, pp.190-212.

DASH, P. (1999). Thoughts on a Relevant Curriculum for the 21st Century. Journal
of Art and Design Education, 18 (1), pp. 123-127.

DAVIES, I. K. (1987). Art and Design in the GCSE: Objectives, Criteria and
Assessment. Journal of Art and Design Education, 6 (1), pp.51- 65.

DARLING-HAMMOND, L. (1994). Performance-Based Assessment and


Educational Equity. Harvard Educational Review, Symposium. Equity in
Educational Assessment. 64 (1), pp.5-29.

DEPARTAMENTO DO ENSINO SECUNDÁRIO: DES (1992). Orientações e


Gestão dos Programas: Oficina de Artes. Lisboa: M.E.

DEPARTAMENTO DO ENSINO SECUNDÁRIO: DES (1992). Orientações e


Gestão dos Programa da Disciplina De Materiais E Técnicas de Expressão
Plástica:. Lisboa: M.E.

DEPARTAMENTO DO ENSINO SECUNDÁRIO: DES (2000). Revisão Curricular.


Lisboa: M.E.

DEPARTAMENTO DO ENSINO SECUNDÁRIO (2001). Resultados dos exames


do 12º ano. Lisboa: Ministério da Educação.

DEPARTAMENTO DO ENSINO SECUNDÁRIO (1996): Instruções para a


realização dos exames. Lisboa: Ministério da Educação.

DEPARTMENT FOR EDUCATION AND SKILLS (2003). 14–19: opportunity and


excellence. Nottinghamshire: DfES Publications.

DENZIN, N.K. & LINCOLN, Y.S. (1998). Strategies of Qualitative Inquiry.


Thousand Oaks: SAGE.

D'HAINAUT, L. (1977). Des Fins aux Objectifs de l'Education. Bruxelles:
Editions Labor.

DOBBS, S. (1992). The DBAE Handbook: An overview of Discipline Based Art


Education, Santa Monica: Getty Center for the Arts.

DEPRYCK, K. (1993). Paper presented at Creativity 93 World Congress in Madrid.

DEWEY, J. (1950). Experience and Education. New York: Macmillan Publishing Co.

DIAZ DE RADA, V. (1999). Técnicas de Análisis de Datos para Investigadores


Sociales: Aplicaciones prácticas con SPSS para Windows. Madrid: Ra-Ma.

DORN, C. M. (2002). The teacher as stakeholder in student art assessment and art
program evaluation. Art Education 55 (4), pp. 40-45.

EARLE, D.M. (1986). Art Examinations in Secondary Schools. London: Institute of


Education, PhD thesis.

EÇA, T. (1999). Assessment of Creative Work in Portuguese MTEP External


Examinations. London: Roehampton Institute, MA thesis.

EDEXCEL (2000). GCE A level Art & Design: Guide for Teachers and Moderators.
London: Edexcel.

EDEXCEL (2000). A Student's Guide to the AS and Advanced GCE in Art & Design.
London: Edexcel.

EISNER, E. (1972). Educating Artistic Vision. London: Macmillan.

EISNER, E. (1985). The Educational Imagination. New York: Macmillan.

EISNER, E. (1986). The Art of Educational Evaluation. Philadelphia: The Falmer


Press.

EISNER, E. (1988). Structure and Magic in Discipline-Based Art Education.


Journal of Art and Design Education 7 (2), pp.185-196.

EISNER, E. (1991). The Enlightened Eye: Qualitative Inquiry and the Enhancement
of Educational Practice. New York: Macmillan.

EISNER, E. (1992). The Misunderstood Role of the Arts in Human Development. Phi
Delta Kappan, April 1992.

EISNER, E. (1998). The Kind of Schools We Need. Personal Essays. Portsmouth:


Heinemann.

EISNER, E. (2003). The Arts and the Creation of Mind. New Haven & London:
Yale University Press.

EISNER, E. (2003). Qualitative research in the new millennium. In: Addison, N. and
Burgess, L. (2003). (Eds.) Issues in art and design teaching . London: Routledge
Falmer , pp: 52- 60.

EMERY, L. (1996). Heuristic Inquiry. Australian Art Education 19 (3), p. 29.

EFLAND, A, KOROSCIK, J. and PARSONS, M. (1991). Assessing Art Learning


Based on Novice-Expert Paradigms: A Progress Report. Unpublished manuscript.
The Ohio State University. Ohio, U.S.A.

EFLAND, A, FREEDMAN, K. & STUHR, P. (1996). Postmodern Art Education:
An Approach To Curriculum. Reston, Virginia: NAEA.

EPPI Centre (2004) A Systematic Review of the Impact of Formal Assessment on


Secondary School Art and Design Education. Evidence for Policy and Practice
Information and Co-ordinating Centre: University of London, Institute of Education,
Social Science Research Unit.

ESTRELA, A. (1994). Avaliações em Educação: Novas Perspectivas. Porto: Porto


Editora.

EU COMENIUS 3.1/SOCRATES & YOUTH (1999). Portfolio assessment in


secondary art education and final examination. Report of project 40966-CP-1-97-1-FI-
C31, June 1999. Helsinki: UIAH, University of Art and Design, Department of Art
Education.

FELDHUSEN, J.F. & GOH, B.E. (1995). Assessing and Accessing Creativity: An
Integrative Review of Theory, Research, and Development. Creativity Research
Journal 8 (3), pp. 231-247.

FERNANDES, D. (1992). Práticas e Perspectivas da Avaliação (Dois anos de


experiências no Instituto de Inovação Educacional). Lisboa: I.I.E.

FIELDING, R. (1996). A Justification for Subjectivity in Art Education Research.


Australian Art Education 19 (2).

FIELDING, R. (1998). Creative Interpretation in Art Criticism. Australian Art


Education 21 (3).

FILER, A. (2000). (Ed.) Assessment: Social Practice and Social Product. London:
Routledge Falmer.

FLINDERS, D. & MILLS, G.E. (1993). (Eds.) Theory and Concepts in Qualitative
Research. N.Y.: Teachers College Press.

FODDY, W. (1994). Constructing Questions for Interviews and Questionnaires.


Cambridge University Press.

FOUCAULT, M. (1977). Discipline and Punish. New York: Vintage Books.

FREEDMAN, K. (1993). The social production of computer graphics and art


education. In Muffoletto, R. and Knupfer, N. (Eds.) Computers in education: Social,
political, historical perspectives. Hampton, NJ: Hampton Press.

FREEDMAN, K. (2003). Teaching Visual Culture: Curriculum, Aesthetics and the


Social Life of Art. N.Y.: Teachers College, Columbia University.

FREEDMAN, K. (2003)b. Recent shifts in US art education. In: Addison, N. &


Burgess, L. (Eds.) Issues in art and design teaching. London: RoutledgeFalmer, pp.
8-18.

GENTILE, R., & MURNYACK, N. (1989). How shall Students be Graded in
Discipline-Based Art Education? Art Education 42 (6), pp.33-41.

GARDNER, H. & GRUNBAUM, J. (1986). The assessment of artistic thinking:


comments on the national assessment of educational progress in the arts.
Unpublished paper, Harvard Project Zero.

GARDNER, H. (1990). Multiple Intelligences: Implications for Art and Creativity.


In Moody, W. J. (Ed.) Artistic Intelligences: Implications for Education. New York:
Teachers College Press.

GARDNER, H. (1990)b. The Assessment of Student Learning in the Arts.


Conference Paper. Bosschenhoofd: Netherlands.

GARDNER, H. (1992). Assessment in Context: The Alternative to Standardized


Testing. In Gifford, B. & O'Connor, M.C. (Eds.), Changing Assessment: Alternative
Views of Aptitude, Achievement and Instruction. Boston: Kluwer.

GARDNER, H. (1996). The Assessment of Student Learning in the Arts. In:


Boughton, D., Eisner, E. & Ligtvoet, J (Eds.) Evaluating and Assessing the Visual
Arts in Education. New York: Teachers College Press.

GARDNER, H. (1999). Intelligence Reframed: Multiple Intelligences for the 21st


Century. New York: Basic Books.

GETAP: MINISTÉRIO DA EDUCAÇÃO (1992). Programa da Disciplina de Oficina


de Artes. Lisboa: M.E.

GETAP: MINISTÉRIO DA EDUCAÇÃO (1992). Programa da Disciplina Materiais


E Técnicas de Expressão Plástica. Lisboa: M.E.

GIPPS, C. AND STOBART, G. (1997). Assessment: A teacher's guide to the issues.


London: Hodder & Stoughton.

GOSWAMI, A. (1996). Creativity and the Quantum: A unified Theory of Creativity.


Creativity Research Journal 9 (1), pp.47-61.

GOUGH, S. & SCOTT, W. (2000). Exploring the Purposes of qualitative Data


Coding in Educational Enquiry: insights from recent research. Educational Studies
26 (3).

HALPIN, T. (2003). Teachers caught cheating to help pupils make grade. In: The
Times, October 11, 2003, p. 1.

HANNAN, B. (1985). Assessment and Evaluation in Schooling. Victoria: Deakin


University Press.

HANS, N. (1964). Comparative Education. London: Routledge.

HARDY, T. (2001). Farewell to the Wow. Times Educational Supplement, 16th
November 2001, p. 7.

HARDY, T. (2002). AS Level Art: Farewell to the ‘Wow’ Factor. The International
Journal of Art and Design Education 21 (1), pp.52- 59.

HARLEN, W., GIPPS, C., BROADFOOT, P. & NUTTALL, D. (1994). Assessment


and the Improvement of Education. In: Moon, B. Shelton Mayes, A. (Eds.) Teaching
and Learning in the Secondary School. London: Open University Press.

HARGREAVES, A (1984). Experience counts, theory doesn't. How teachers talk


about their work. Sociology of Education. 57, pp.244-254.

HARGREAVES, D.; GALTON, M.J. & ROBINSON, S. (1996). Teachers’


assessments of primary children’s classroom work in the creative arts. Educational
Research 38 (2), pp. 199-211.

HASSELGREN, A. (1998). Small words and valid testing. PhD thesis. Bergen:
University of Bergen, Department of English.

HAUSMAN, J. (1994). Standards and Assessment. Art Education 47 (2), pp. 9-13.

HAUSMAN, J. (1995). Evaluation in Art Education. Australian Art Education, 18


(3), pp. 25-29.

HENNING, G. (1987). A guide to language testing. Cambridge, Mass: Newbury


House.

HENRY, C. (1990). Grading Student Artwork: A Plan for Effective Assessment. In:
Little, B.E. (Ed.) Secondary Art Education: An Anthology of Issues. Reston, Virginia:
NAEA, pp.: 61- 68.

HERMAN, J.L.; KLEIN, D.C.D. AND WAKAI, S.T. (1997). American Students'
Perspectives on Alternative Assessment: Do they Know It's Different? Assessment in
Education 4 (3), pp.339-52.

HERMANS, P (1996). Why Assess? In: Art & Fact: Learning Effects of Arts
Education, International Conference 27/28 March 1995. The Netherlands: World
Trade Center Rotterdam, pp: 97-107.

HENNESSY, B. A. (1994). The Consensual Assessment Technique: An Examination


of the Relationship between Ratings of Product and Process Creativity. Creativity
Research Journal, 7(2), pp.193- 208.

HERNANDEZ, F. (1997). Educación Y Cultura Visual . Sevilla: Morón.

HERNANDEZ, F. (1998). La Necessidad de Una Perspectiva Critica en la


Educacion Artistica y la Enseñanza de las Artes . Ugo (21). Barcelona: Universidad
de Barcelona

HERNANDEZ, F., VALLS, M.R. & RODRIGUEZ, J.M.B. (1998). Pedagogia de l'Art: Identitat de l'Artista, Context i Ensenyament de les Arts. Barcelona: Universitat de Barcelona.

HEYFRON, V. (1986). Assessment in Art and Design. In: Ross, M. (Ed.) Objectivity and Assessment in the Arts. Oxford: Pergamon.

HICKMAN, R. (2000). (Ed.) Art Education 11-18: Meaning, Purpose and Direction. London: Continuum.

HITCHCOCK, G. & HUGHES, D. (1995). Research and the teacher. New York: Routledge.

HOLMES, E.R. & VAN DE GRAAFF (1973). Relevant Methods in Comparative Education. UNESCO.

ILTA (2000). Code of Ethics for ILTA. International Language Testing Association. [Available March 2000: http://www.dundee.ac.uk/languagestudies/Itest/ilta.html]

INSTITUTO DE INOVAÇÃO EDUCACIONAL (1995). Estudo Comparativo dos sistemas de avaliação em quatro países europeus. Lisboa: IIE.

JAMESON, F. (1998). La Condición Posmoderna: Un Informe Sobre el Ser. Ugo (21). Barcelona: Universidad de Barcelona, pp. 3-20.

KARPATI, A. (1994). Hungarian Examinations in Visual Arts - the academic tradition. Paper presented at the 3rd European Congress of InSEA, Lisbon, 17-21 July 1994.

KELLAGHAN, T. & MADAUS, G.F. (1991). National Testing: Lessons for America from Europe. Educational Leadership 49, pp. 87-93.

KINNEAR, T.C. & TAYLOR, J.R. (1996). Marketing Research (5th Ed.). New York: McGraw-Hill, Inc.

KLENOWSKI, V. (2003). Developing Portfolios For Learning And Assessment. London: RoutledgeFalmer.

LAMBERT, D. & LINES, D. (2000). Understanding Assessment. London: RoutledgeFalmer.

LANE, R. (2003). Liverpool Teachers' Exam Concerns. A'N'D (10). National Society for Education in Art & Design, issue Winter 2003/4.

LINDSTRÖM, L. (1997). Integration, Creativity, or Communication? Paradigm Shifts and Continuity in Swedish Art Education. Arts Education Policy Review 99 (1).

LINDSTRÖM, L. (1999). The Multiple Uses of Portfolio Assessment. In: Piironen, L. (Ed.) Portfolio Assessment in Secondary Art Education and Final Examination. University of Art and Design Helsinki (Finland), Department of Art Education, pp. 7-16. Report of EU Comenius 3.1 project.

LINDSTRÖM, L. (1999b). Portfolio Assessment of Creative Skills in the Visual Arts. Paper presented at the 29th Congress of the Nordic Educational Research Association (NERA), Stockholm, 15-18 March 2001.

LINCOLN, Y.S. & GUBA, E.G. (1985). Naturalistic Inquiry. London: Sage Publications.

LEMOS, V. (1993). O Critério do Sucesso: Técnicas de Avaliação da Aprendizagem. Lisboa: Texto Editora.

LEARY, A. (1986). Art, Assessment and Ethnocentrism. Journal of Art & Design Education 5 (1 & 2).

LINACRE, M. (1989). Many-faceted Rasch measurement. Chicago, IL: MESA Press.

LYOTARD, J. (1984). The Postmodern Condition: a Report on Knowledge. Manchester: Manchester University Press.

MACE, M. (1997). Toward an Understanding of Creativity Through a Qualitative Appraisal of Contemporary Art Making. Creativity Research Journal 10 (2 & 3), pp. 265-278.

McNAMARA, T.F. (1996). Measuring Second Language Performance. New York: Addison Wesley Longman.

MACGREGOR, R.N. (1992). A Short Guide to Alternative Practices. Art Education 45 (6), pp. 34-38.

MACGREGOR, R.N. (1996). What You See and What You Get: Assessment and Its Constituents. In: Art & Fact: Learning Effects of Arts Education, International Conference 27/28 March 1995. The Netherlands: World Trade Center Rotterdam.

MACKINNON, D. & STATHAM, J. (1999). Education in the UK: Facts & Figures (3rd Ed.). London: Hodder & Stoughton/The Open University.

MADAUS, G. (1994). A technological and historical consideration of equity issues associated with proposals to change the nation's testing policy. Harvard Educational Review 64 (1), pp. 76-95.

MASON, R. & BOUGHTON, D. (1999). (Eds.) Beyond Multicultural Art Education: International Perspectives. New York: Waxmann.

MARTINS, G. (1999). Práticas Avaliativas, Momentos Formativos. MA thesis. Lisboa: Universidade Católica de Lisboa.

MARTINDALE, C. (1999). Aesthetic and Cognition. Paper presented at the conference Educação Estética - Abordagens Transdisciplinares. Lisboa: Fundação Gulbenkian (September 1999).

MESSICK, S. (1971). Validity. In: Thorndike, R.L. (Ed.) Educational Measurement (2nd edition). Washington, D.C.: The American Council on Education and The National Council on Measurement in Education.

MESSICK, S. (1989). Meaning and Values in Test Validation: The Science and Ethics of Assessment. Educational Researcher 18 (5), pp. 10-11.

MESSICK, S. (1992). Validity of Test Interpretation and Use. In: Alkin, M.C. (Ed.) Encyclopedia of Educational Research (6th edition). New York: Macmillan.

MIRZOEFF, N. (1999). An Introduction to Visual Culture. London/New York: Routledge.

MILES, M.B. & HUBERMAN, A.M. (1994). Qualitative Data Analysis: An Expanded Sourcebook (2nd Ed.). London: Sage Publications.

MINISTÉRIO DA EDUCAÇÃO: Diário da República (1993). Despacho Normativo nº 338/93. Lisboa.

MINISTÉRIO DA EDUCAÇÃO: Diário da República (1993). Despacho Normativo nº 162/ME/9. Lisboa.

MCFEE, J. (1986). Cross-Cultural Inquiry into the Social Meaning of Art: Implications for Art Education. Journal of Cross-Cultural and Multicultural Research in Art Education 4 (1), pp. 6-16.

MICHAEL, J. (1980). Studio Art Experience: The Heart of Art Education. Art Education 33 (2), pp. 15-19.

MINKIN (1998). Inviting A Response to the Agenda. Draft for discussion in the Consultative Group on Creative Development. Unpublished paper.

MOURA, A. (1999). Art Patrimony in Portuguese Middle Schools: Problems of Cultural Bias. In: Boughton, D. & Mason, R. (Eds.) Beyond Multicultural Education. NY: Waxmann, pp. 115-133.

MOUSTAKAS (1990). Heuristic Research. USA: Sage.

MYFORD, C.M. & WOLFE, E.W. (2000). Monitoring Sources of Variability Within the Test of Spoken English Assessment System. Educational Testing Service, Report 65. [Available June 2003: http://www.toefl.org.]

MUMFORD, M.D., BAUGHMAN, W., MAHER, M.A., COSTANZA, P. & SUPINSKI, E.P. (1997). Process-Based Measures of Creative Problem-Solving Skills: IV. Category Combination. Creativity Research Journal 10 (1), pp. 59-71.

NACCCE (1999). All Our Futures: Creativity, Culture and Education. London: DCMS and DfEE.

NEAB (1999). Art and Design (Advanced): Syllabus number 4191. Northern Examinations and Assessment Board. [Available January 2000: http://www.neab.ac.uk/syllabus/arts/artdes/ss992191.htm]

NEWTON, P. (1997). Examining Standards Over Time. Research Papers in Education 12 (3), pp. 227-248.

NAEA: NATIONAL ART EDUCATION ASSOCIATION (1994). The National Visual Arts Standards. Reston, VA: NAEA.

NAEA: NATIONAL ART EDUCATION ASSOCIATION (1994). Visual Arts Assessment and Exercises Specifications. Reston, VA: NAEA.

NCTE (2000). Developing a Test Taker's Bill of Rights. Milwaukee: NCTE Annual Meeting.

NORUSIS, M.J. (2000). SPSS 10.0: Guide to Data Analysis. New Jersey: Prentice Hall.

NOVAK, J.D. & GOWIN, D.B. (1984). Learning How to Learn. Cambridge: Cambridge University Press.

NUTTALL, D. (1990). Talk presented to Chief Examiners for General Certificate of Secondary Education Examiners in Art and Design. Bath, March 1990, UK. Unpublished paper.

NUTTALL, D. (1987). The Validity of Assessment. European Journal of Psychology of Education II (2).

OECD (1972). Programas de Ensino a partir de 1980. Organisation for Economic Cooperation and Development report.

O'SULLIVAN, B. (2001). Multi-facet Rasch Practical. Unpublished paper.

PACHECO, J.A. (1995). A Avaliação Dos Alunos Na Perspectiva Da Reforma. Porto: Porto Editora.

PATEMAN, T. (1988). Key Concepts: A Guide to Aesthetics, Criticism and the Arts in Education. London: The Falmer Press.

PIIRONEN, L. (1999). (Ed.) Portfolio Assessment in Secondary Art Education and Final Examination. University of Art and Design Helsinki (Finland), Department of Art Education.

PISA (2000). Mesurer les connaissances et les compétences des élèves: Lecture, Mathématiques et Sciences: l'évaluation de PISA 2000. Programme international pour le suivi des acquis des élèves (PISA).

PRENTICE, R. (1995). (Ed.) Teaching Art and Design: Addressing Issues and Identifying Directions. London: Continuum.

POSTLETHWAITE, N.T. (1988). The Encyclopedia of Comparative Education and National Systems of Education. NY: Pergamon Press.

PRICE, E. (1982). National Criteria for a Single System of Examining at 16+ with reference to Art and Design. Journal of Art and Design Education 1 (3), pp. 397-407.

QCA (1999). Curriculum guidance for 2000. [Available July 2000: http://www.qca.org.uk.]

QCA (2000). England: Context and principles of Education. [Available July 2000: http://www.inca.org.uk/.]

QCA (2000). GNVQ Code of Practice. London: QCA.

QCA (2000). GCSE? A level? Vocational A level (Advanced GNVQ)? AS qualification? Entry level award? NVQ. [Available July 2000: http://www.qca.uk.]

QCA (2000). GCSE and GCE A/AS Code of Practice. London: QCA.

RAGIN, C. (1987). The Comparative Method. Los Angeles: University of California Press.

RAYMENT, T. (1999). Assessing National Curriculum Art AT2, Knowledge and Understanding: A small-scale project at Key Stage 3. Journal of Art and Design Education 18 (2), pp. 188-193.

READ, H. (1943). Education Through Art. London: Faber and Faber.

REID, L.A. (1986). Assessment in Art Education. In: Ross, M. (Ed.) "Art" and the Arts. Oxford: Pergamon.

RESNICK, L. & RESNICK, D. (1992). Assessing the Thinking Curriculum: New Tools for Educational Reform. In: Gifford & O'Connor (Eds.) Changing Assessment: Alternative Views of Aptitude, Achievement and Instruction. Boston: Kluwer.

RIBEIRO, C. (1993). Educação e Arte. Braga: Revista Portuguesa da Educação 6 (1), pp. 103-108.

RIBEIRO, L.C. (1997). Avaliação da Aprendizagem. Educação Hoje. Cacém: Texto Editora.

ROBINSON, K. (1982). The Arts in Schools: Principles, Practice and Provision. London: Calouste Gulbenkian Foundation.

ROBINSON, V.M.J. (1993). Problem Based Methodology: research for the improvement of practice. Oxford: Pergamon.

ROBSON, C. (1993). Real World Research. Oxford: Blackwell.

ROSS, M. (1978). The Creative Arts: Evaluation, Assessment and Certification. Oxford: Pergamon.

ROSS, M. (1986). Against Assessment. In: Ross, M. (Ed.) Assessment in Art Education. Oxford: Pergamon.

ROSS, M. (1989). The Claims of Feelings: Readings in Aesthetic Education. UK: Heinemann.

ROSS, M. (1992). Assessment of Arts Achievement in the United Kingdom: The Reflective Conversation. Journal of Aesthetic Education 26 (3).

ROSS, M., RADNOR, H., MITCHELL, S. & BIERTON, C. (1993). Assessing Achievement in the Arts. Buckingham: Open University Press.

SÁ CHAVES, I. (2000). Portfolios Reflexivos. Aveiro: Universidade de Aveiro.

SADLER (1987). Specifying and Promulgating Achievement Standards. Oxford Review of Education. ERIC Digest ED348238. Bloomington.

SANTOS, A.S., FRAGATEIRO, C., VIEIRA, J., PEIXINHO, J., DIAS, J.A. & QUADROS, A. (1996). Ensino Artístico. Porto: Edições Asa.

SCHÖNAU, D.W. (1999). The Future of Nation-Wide Testing in the Visual Arts. Art Education 18 (2), pp. 183-187.

SCHÖNAU, D.W. (1994). Final Examinations in the Visual Arts in the Netherlands. Art Education 47 (2), pp. 35-39.

SCHÖNAU, D.W. (1996). Nationwide Assessment of Studio Work in the Visual Arts: Actual Practice and Research in the Netherlands. In: Boughton, D., Eisner, E. & Ligtvoet, J. (Eds.) Evaluating and Assessing the Visual Arts in Education. New York: Teachers College Press.

SCHÖNAU, D.W. (1999). A Survey on Visual Art Exams in Europe. In: Piironen, L. (Ed.) Portfolio assessment in secondary art education and final examination. EU Comenius 3.1/Socrates & YOUTH, Report of the project 40966-CP-1-97-1-FI-C31, June 1999. UIAH Helsinki: Department of Art Education, University of Art and Design.

SCRIVEN, M. (1967). The Methodology of Evaluation. In: Tyler, R. et al (Eds.) Perspectives on Curriculum Evaluation. AERA Monograph Series on Curriculum Evaluation (1). Chicago: Rand McNally.

SEFTON-GREEN, J. & SINKER, R. (2000). Evaluating Creativity: Making and learning by young people. London: Routledge.

SHOHAMY, E. (2001). The Power of Tests. Essex: Pearson Education Limited.

SPOLSKY, B. (1995). Measured Words. Oxford: Oxford University Press.

STAKE, R. (1978). The Case Study Method in Social Inquiry. Educational Researcher 7, pp. 5-8.

STEERS, J. (1987). Art, Craft and Design Education in Great Britain: A Summary. Canadian Review of Art Education 15 (1), pp. 15-20.

STEERS, J. (1988). Art and Design in the National Curriculum. Journal of Art and Design Education 7 (3), pp. 303-323.

STEERS, J. (1994). Art and Design Assessment and Public Examinations. Journal of Art and Design Education 13 (3), pp. 287-297.

STEERS, J. (1994b). Orthodoxy in Art Education. Paper presented at the 3rd European Congress of InSEA, Lisbon, 17-21 July 1994.

STEERS, J. (1996). Response to Papers by Gardner and Schönau: Evaluation in the Visual Arts: a Cross-cultural Perspective. In: Boughton, D., Eisner, E. & Ligtvoet, J. (Eds.) Evaluating and Assessing the Visual Arts in Education. New York: Teachers College Press.

STEERS, J. (2001). New A and AS levels. A'N'D: The Newsletter of The National Society for Education in Art, nº 1, Summer 2001, p. 11.

STEERS, J. (2003). Art and design in the UK: the theory gap. In: Addison, N. & Burgess, L. (Eds.) Issues in art and design teaching. London: RoutledgeFalmer, pp. 19-31.

STEERS, J. (2003b). Art and Design. In: White, J. (Ed.) Rethinking The School Curriculum: Values, Aims and Purposes. London: RoutledgeFalmer.

STEWART, R. (1996). Constructing Neo-narratives in Art Education. Australian Art Education 19 (3), p. 46.

STOKROCKI, M. (1997). Qualitative Forms of Research Methods. In: Lapierre, S.D. & Zimmerman, E. (Eds.) Research Methods and Methodologies for Art Education. USA: NAEA, Indiana University.

SULLIVAN, G. (1996). Research in Art Education. Australian Art Education 19 (3).

SUTHERLAND, G. (1996). Assessment: Some Historical Perspectives. In: Goldstein, H. & Lewis, T. (Eds.) Assessment: Problems, Developments and Statistical Issues. UK: John Wiley.

SWIFT, J. & HUGHES, A. (1999). (Eds.) Broadside 2. Birmingham: UCE: University of Central England.

SWIFT, J. & STEERS, J. (1999). A Manifesto for Art in Schools. Journal of Art & Design Education 18 (1), pp. 7-13.

TATTERSALL, K. (2003). A national obsession. In: Guardian Education, September 30, 2003, p. 4.

TAYLOR, R. (1986). Educating for Art. Harlow, Essex: Longman.

THISTLEWOOD, D. (1992). Editorial: Controversies Highlighted by the National Curriculum. Journal of Art and Design Education 11 (1), pp. 3-7.

TEUNE, H. (1990). Comparing Countries: Lessons Learned. In: Oyen, E. (Ed.) Comparative Methodology: Theory and Practice in International Social Research. London: SAGE.

TGAT (1988). Task Group on Assessment and Testing: a Report. London: DES.

THOMAS, R.M. (1990). International Comparative Education. London: Routledge.

TORRANCE, H. (1989). Ethics and Politics in the Study of Assessment. In: Burgess, R.G. (Ed.) The Ethics of Educational Research. Philadelphia: Falmer Press.

TORRANCE, H. (2000). Postmodernism and Educational Assessment. In: Filer, A. (Ed.) Assessment: Social Practice and Social Product. London: RoutledgeFalmer.

TUCHMAN, G. (1998). Historical Social Science: Methodologies, Methods and Meanings. In: Denzin, N.K. & Lincoln, Y.S. (Eds.) Strategies of Qualitative Inquiry. Thousand Oaks: SAGE.

VALADARES & GRAÇA, M. (1998). Avaliando para Melhorar a Aprendizagem. Coimbra: Plátano Editora.

VERMA, G.K. & MALLICK, K. (1999). Researching Education: Perspectives and Techniques. London: The Falmer Press.

VOS, A.J. & BRITS, V.M. (1987). Comparative and International Education for Student Teachers. London: Butterworth.

VICKERMAN, C. (1986). Assessment in Arts Education. In: Ross, M. (Ed.) The Arts in Public Examinations. Oxford: Pergamon.

VYGOTSKY, L.S. (1972). Thought and Language. Cambridge, MA: MIT Press.

WALLING, D.R. (2000). Rethinking How Art is Taught. Thousand Oaks: Corwin Press, Inc. (Sage).

WALKER, J.A. & CHAPLIN, S. (1997). Visual Culture: an introduction. Manchester: Manchester University Press.

WELLINGTON, J.J. (1996). Methods and Issues in Educational Research. University of Sheffield: USDE Papers in Education.

WEIR, C.J. (1993). Understanding and developing language tests. New York: Prentice Hall.

WEIR, C.J. (2004). Language Testing and Validity Evidence. In print.

WHITE, J. (2003). (Ed.) Rethinking The School Curriculum: Values, Aims and Purposes. London: RoutledgeFalmer.

WILIAM, D. (1992). Some technical issues in assessment: a user's guide. British Educational Research Journal 22, pp. 537-548.

WILLIAMS (2001). Forced in the same mould. In: The Times Educational Supplement, 16th November 2001.

WIGGINS, G.P. (1993). Assessing Student Performance: Exploring the Purpose and Limits of Testing. San Francisco: Jossey-Bass Education Series.

WOLF, A. (1988). Opening up Assessment. Educational Leadership 45 (4), pp. 24-29.

WOLF, A. (1995). Competence-Based Assessment. Buckingham: Open University Press.

WOLF, D. (1988). Artistic Learning: What and Where is it? Journal of Aesthetic Education 22 (1).

WOLF, D., BIXBY, J., GLEN, J. & GARDNER, H. (1991). To Use Their Minds Well: Investigating New Forms of Assessment. In: Grant, G. (Ed.) Review of Research in Education. Washington, D.C.: American Educational Research Association.

WORKING GROUP ON 14-19 REFORM. [Available March 2003: www.dfes.gov.uk/14–19]

YOUNG, J.O. (1995). Relativism and the Evaluation of Art. Journal of Aesthetic Education 29 (4).

ZABALZA, M.M. (1992). Planificação e Desenvolvimento Curricular na Escola. Porto: Edições ASA.

ZIMMERMAN, E. (1992). Assessing Students' Progress and Achievements in Art. Art Education 45 (6), pp. 14-24.

Appendix XIV
Booklet (Instructions for an experimental examination in art and design)

Instructions for an experimental examination in art and design

Index:
A. General Instructions for teachers
B. Instructions for students
C. Assessment Matrix
D. Grade descriptors
E. Model Project Brief

A. General Instructions for teachers

Teachers are responsible for the delivery of the examination and for the first marking of portfolios. Training sessions will be conducted during March.

The final exam in art and design consists of the submission of a portfolio by candidates. Schools are responsible for organising the exam. Teachers are responsible for the development of the unit for external assessment. Students should have access to examples of project briefs, assessment criteria and mark schemes one month before the exam (before the Easter holidays) in order to start the preparatory investigation through independent study.

1. Project Brief
There are three options for project briefs:
1. The theme and optional final products provided.
2. Theme and final products designed by the students and the class teacher.
3. Theme and final product designed by the individual student.

The project brief can be designed by the students and class teacher taking into account students' motivations. The class teacher and/or the students will develop a project brief according to a chosen theme and starting points, to be realised during class time, taking into account the aims, objectives and contents of the discipline(s). Materials and techniques should be specified in the task definitions (according to school conditions); however, the choice can also be left open to students. The project brief can be transdisciplinary, involving several disciplines (for example: Technologies; Studio Art and Design; MTEP; History of Art). The project brief designed by the teachers and students must include the following:

1. Aims and objectives (clear explanation of what is intended to be developed and the expected qualities of the works)
2. Themes
3. Starting points and possible connections (examples of starting points; problems to find; possible media and possible sources to use)
4. Final products (a range of optional tasks according to the disciplines)
5. Assessment criteria, mark schemes and weightings
6. Bibliography, references, etc.

2. Portfolio
Students construct their portfolios around one task/theme or starting point. The portfolio contains a selected collection of investigation work, preparatory and developmental studies (annotations, sketches, experiments, visual studies), the final product and a self-assessment report (written, oral, visual). The separate items in the portfolio should be connected and depict the entire process. The portfolio should also reflect the development of the project in time; all the evidence should be dated and numbered as a part of the process. The portfolio should include comprehensive documentation (models, sources).
The portfolio can be, for example, an expanding file, document case, box, album, web page, CD-ROM, video, work journal or notebook, etc. The appearance of the portfolio can be a part of the artistic production and reflect the character of the project and the personality of the author. The appearance can contribute to the general visual expression, and help to form a positive evaluation.

[Diagram: contents of the student portfolio]
• Forms: Form 1: students' submission form; Project Brief; Form 3: internal standardisation form; Form 4: Awarding; Assessment matrices.
• Reports, records or notes about previous experiences, intentions, interests, etc.
• Preliminary studies: experiments; exploration of possibilities with reflection on and evaluation of the processes.
• Work journals (not compulsory).
• Investigation: information; sources; models and related critical analysis.
• Final products (visual): paintings, drawings, sculptures, CDs, web pages, design products, photographs, films, video records of performances, installations, exhibitions, 2D or 3D design, etc.
• Crits/self-assessment: written or oral (tape, video, digital record) about the student's intentions, investigation, progress, decisions; techniques, materials, purposes, functions, meanings; achievement and evaluation.

3. Time
Portfolio tasks should be developed during class time, except investigative work, which can be developed outside class time.

• Preparation/investigation/developmental studies: March/April, 25 hours outside class time.
• Developmental studies and final products: May, 15-20 hours during class time.
• Self-assessment report: 2 hours (class time).

Teachers should be available for tutorials with students during the preparation time. When portfolios are evaluated, the time available should be taken into account, as well as the student's ability to plan the work beforehand and carry out his/her plans in the time given.

4. Special considerations
Teachers should ensure that students with special needs can conduct their work under appropriate conditions. Students with physical disabilities should have special arrangements or special equipment according to their different needs. Non-native language students should have additional support in order to understand the instructions of the assessment and the task requirements (for example, access to translation).

5. Equipment and materials
Candidates choose and purchase the materials needed for their project. The range of materials may be restricted in the definition of the tasks. Advice on materials and techniques can be sought from the class teacher before the final exam period.

6. Role of the class teacher
The class teacher is responsible for the unit project description and mark schemes. The unit project and mark schemes should be designed following discussion with students. The class teacher should give students a clear explanation of the assessment criteria, project brief, tasks and mark schemes before the examination period. The teacher can, on request, advise the candidates on the purchase of materials and equipment. The teacher is not, however, allowed to intervene with the project work itself, or with the use of materials and equipment during the final exam period. The teacher can, on request, advise candidates on gathering background information before the final exam period. Teachers should record the nature of the assistance they offered to individual candidates during preparation time, and their observations can be used as reasons to explain the internal marking of candidates' portfolios.

This instrument of assessment is not familiar to students. The teacher needs to prepare students in a systematic way, advising them to make written comments while the project develops and to make critical reflections about the sources and the processes used; teachers need to ensure that students understand they have deadlines for submission of the work, and to help them plan their work. Sometimes students spend a lot of energy and time collecting useless information, pasting and copying documents; it is very important that students understand the purposes of critical reflection and the role of personal opinions as a starting point for the development of ideas and experiments. Students are not used to developing ideas in quantity and variety. A specific number of preparatory studies is not indicated in the instructions, but teachers must help the students to develop their ideas intensively and not give up too easily. Students are not used to justifying the quality of their work, their intentions and purposes; it is up to the teachers to help them to acquire specific vocabulary in order to explain the merits of the work and the meaning and function of their proposals. Portfolio assessment requires great effort and commitment from the students in terms of time and persistence; students must see studio art or studio design as important disciplines, and they must understand why the portfolio will be beneficial for them in terms of fair assessment and preparation for further studies.

7. Components and criteria
Students' portfolios must include the description of the project brief, developmental studies, final products and a self-assessment report. The portfolio components for assessment are:

Component        Evidence                                                Weighting range
Process          Investigative notes or studies; developmental           50%-60%
                 studies; preparatory notes and studies in the work
                 journals or developmental studies
Product          Final products                                          30%-40%
Self-assessment  Self-assessment report and notes about                  10%
                 self-assessment in the work journals or
                 developmental studies

Teachers are responsible for verifying the authenticity of their students' work.

The assessment criteria to be negotiated with students are as follows:

AC1: Record personal ideas, intentions, experiences, information and opinions in visual and other forms.
AC2: Critical analysis of sources from visual culture showing understanding of purposes, meanings and contexts.
AC3: Develop ideas through purposeful experimentation, exploration and evaluation.
AC4: Present a coherent and organised sample of works and final product revealing a personal and informed response that realises their intentions.
AC5: Evaluate and justify the qualities of the work.

The criteria are not strict guidelines; they must be used with the necessary flexibility. Teachers and students may develop them, adding rubrics for each. However, the general structure and the mark schemes provided in the assessment matrix must not be changed, in order to preserve a common language for assessment.

8. Assessment procedures
The quality of the work and student achievement are estimated holistically, taking into account overall judgement and the criteria.

The teacher must mark the portfolios of all the candidates. Portfolios are assessed in accordance with the negotiated assessment instructions (assessment criteria, assessment matrix). The class teacher must provide the external assessors with a copy of the portfolio tasks and mark scheme used in the project unit, and explain the methodology and strategies used in the project unit to the moderator(s).

8.1. Internal Marking
Internal marking should be conducted after the students' submission of the portfolios. Students' work and achievement is assessed by the student's own teacher before the moderator's visit to the centre. Teachers will take into account the quality of the portfolios and their own observations of students' behaviour during classroom time. Teachers must use the assessment criteria and the language of the assessment matrix. A total mark out of 20 points must be awarded for each portfolio. Marks will be recorded in the assessment matrix and assessment form for each candidate.
Teachers should take into account their own observations of the student's performance during the course. Evidence for some rubrics, for example those related to 'crits', can only be observed by the student's own teacher. Teachers should use such observations to justify their award.
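As an illustrative sketch of how an award might be built up (the criterion maxima below are taken from the assessment matrix in Section C; converting the 200-point criterion total to the final mark out of 20 by dividing by ten is an assumption, not a rule stated in these instructions):

AC1 24/30 + AC2 20/30 + AC3 45/60 + AC4 40/60 + AC5 15/20 = 144/200
144 ÷ 10 = 14.4, which would be recorded as a total mark of 14 out of 20.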

Art departments are reminded that it is their responsibility to ensure that, where more than one teacher has marked the work in a centre, effective standardisation (group assessment) has been carried out across all teaching groups. This procedure ensures that the work of all candidates at the centre is marked to the same standard.

8.2. External Assessment: Moderation
If there are more than ten candidates in one school, only a sample of ten portfolios will be subject to moderation. The sample will include the candidate achieving the highest mark and the candidate achieving the lowest mark, and will otherwise be a random selection of portfolios determined by the moderator. (For example, in a school entering 23 candidates, the moderator would review the highest- and lowest-marked portfolios plus eight others chosen at random.)

Moderation visits shall be conducted in the period 1-15 June. Teachers will display the required samples of students' work in appropriate conditions for the moderation.

Students' work can be displayed as folders, exhibitions, multimedia presentations, etc. The class teacher must organise the display of the sample of students' portfolios in appropriate audio-visual conditions. Only the portfolios in the identified sample should be displayed; however, all the students' portfolios should be available. Teachers must ensure that all the portfolios are stored in appropriate conditions at the centre during the period June-September.

Whatever the chosen means of presentation, each portfolio presentation must be clearly distinguished, one from another, and include the candidate's name and number.

The following documentation must be available to the visiting moderator at the start
of the moderation visit:

• Copies of the project brief for each candidate.
• Copies of the assessment matrix for each candidate (with awarded marks).
• Awarding forms for each candidate.

The teacher should meet the visiting moderator(s) at the beginning of the visit to introduce them to the work, and should be readily available throughout the visit in case they are required. Visiting moderators will review the portfolios in the moderation sample in order to ensure that the centre's marking is:
• In accordance with the marking criteria stipulated;
• In conformity with the overall standards of the examination.

The moderator(s) check the marks of the sample. Moderation is not a re-marking but rather a confirmation or otherwise of the teachers' marks. Re-marking is only recommended if a serious discrepancy of marks (a 5-point difference) occurs between moderators and teachers, and will be conducted on a second moderation visit, which can be requested by the teacher or by the moderator. At the end of the visit the teacher(s) will be informed of the moderators' findings.

The visiting moderator may request a further visit by a second moderator if there are any aspects of the work and the moderation which the visiting moderator believes should be subject to further consideration.

In this case candidates' portfolios must be retained in the same conditions as viewed by the original visiting moderator. As a result of the second visit, the second moderator may, in certain circumstances, find it necessary to recommend that the original moderator's marks should be amended upwards or downwards. Should this be the case, the recommendations of the second moderator will stand.

Moderation standards are checked in the following ways:

• Accompanying visits, where the moderator is observed by a second external assessor.
• Follow-up visits, where schools are visited by a second moderator after the initial moderator's visit.
• Statistical review of moderators' performance, which takes place after the moderation when all marks are in the system. All moderated marks will be reviewed and may be subject to further adjustment if necessary. Teachers will be informed about any further adjustments.

9. Meetings with teachers
During March teachers will be asked to attend the preparation meeting. The meeting has two sessions (over one or two days): the first session concerns the explanation of the instructions, and the second session the assessment of students' portfolios. The assessment of a sample of students' works aims to achieve consensus about the interpretation of the criteria and the standards of the examination.

A third session will be conducted with external assessors in order to prepare them for moderation; this will include the marking of students' portfolios and is intended to verify the accuracy of moderators' marking.

10. Reports about the examination
Teachers will receive a report on the examination in September.

B. Instructions for Students

1. Components
The work to be submitted will take the form of a portfolio. The portfolio includes three components: process, product and self-assessment.

Component        Evidence                                                Weighting
Process          Investigation notes or studies; developmental           50%-60%
                 studies; preparatory notes and studies in the work
                 journals or developmental studies
Product          Final products                                          30%-40%
Self-assessment  Self-assessment report and notes about                  10%
                 self-assessment in the work journals or
                 developmental studies

1.1. Assessment Criteria
The following assessment criteria describe the qualities that teachers and external assessors will look for in your portfolio:

AC1: Record personal ideas, intentions, experiences, information and opinions in visual and other forms.
AC2: Critical analysis of sources from visual culture showing understanding of purposes, meanings and contexts.
AC3: Develop ideas through purposeful experimentation, exploration and evaluation.
AC4: Present a coherent and organised sample of works and final product revealing a personal and informed response that realises their intentions.
AC5: Evaluate and justify the qualities of the work.

In other words, the qualities that teachers and external assessors will look for in your portfolio will be:

AC1: Visibility of the intentions. Show that you are able to express your ideas, motivations, opinions and purposes through words and images. Explain your intentions at the beginning of the project and, as it develops, make annotations about your progress: if you re-formulate ideas, explain why; if something influences you, explain it; explain the importance of the examples of visual culture that you have selected and used in your work. Show that you are able to plan your project and respect deadlines; if for some reason you cannot fully realise what you have proposed, explain why it was not possible.

AC2: Searching abilities; abilities to interpret and use examples of visual culture. Show that you can collect information about the work of others in the world of visual culture (art, design, media, etc.) and that you can interpret its meanings, purposes and functions. But be careful: do not spend too much time collecting information that is not relevant to your project. You are not required to copy and paste information, but rather to reflect critically on it, expressing personal opinions and informing your work through what you have learned in your search. Show that your sources were useful for developing your own ideas and experiments with techniques, materials, etc.

AC3: Purposeful development of ideas and experiments. Several qualities will be evaluated under this criterion. The quantity and quality of sketches or initial ideas is important. Show that you are persistent and that you do not give up easily. The teachers will look for personal style and the capacity to communicate your ideas visually; they will look for your critical reflection about the ideas explored and the processes used, and your capacity to explain your decisions. They will look for your skills in finding and solving problems, raising issues, presenting possibilities and evaluating them. Show that you can find problems alone, and not only the problems stated by your teacher; many times problems are formulated through searching the work of others and exploring ideas. Show that you can be imaginative and propose new ways of representing a problem from several angles. Your technical skills will be very important: show that you can represent real and imaginary things with expertise. For example, in drawing, painting and sculpture you must understand and fluently use the formal elements such as shape, light, space, composition, colour, rhythm, etc. In product design you must understand and apply notions of ergonomics, anthropometrics, design methodology, etc.

AC4: The portfolio as a whole and the final product. The teacher will look for your knowledge, understanding and skills in art and design, and whether your personal response was informed by what you have learned through the process and was adequate for realising your intentions. The teacher will see how you used the formal elements of visual language, art and design concepts and conventions, your technical skills and your personal style.

AC5: Knowing the strengths and weaknesses of your work; explaining and justifying its meaning and purposes. The teacher will look for your skills of evaluation: whether you are able to reflect upon the process and product you have developed, and whether you can justify the purposes, meaning and function of your final outcome by using specific vocabulary. Show that you are aware of the difficulties you encountered, and that you can point out what you achieved in terms of knowledge, understanding and skills in art and design.

2. Portfolio
Students construct their portfolios according to the chosen project brief. The portfolio contains a selected collection of investigation work, preparatory and developmental studies (annotations, sketches, experiments and visual studies), the final product and a self-assessment report (written, oral, visual). The separate items in the portfolio should be connected and depict the entire process. The portfolio should also reflect the development of the project in time; all the evidence should be dated and numbered as a part of the process. Your portfolio should include comprehensive documentation (models, sources).

The portfolio can be, for example, an expanding file, document case, box, album, web page, CD-ROM, video, work journal or notebook, etc. The appearance of the portfolio can be a part of the artistic production and reflect the character of the project and your personality. The appearance can contribute to the general visual expression, and lead to a positive evaluation.

Project Brief
You have three options for your project brief:
1. The theme and optional final products provided.
2. A theme and final products designed by the students and the class teacher.
3. Develop your own theme and final products.

The project brief may involve the contents of one discipline or of several disciplines. You must describe the project brief. In the description you must state:

1. The theme.
2. The tasks: what you are proposing to do and what kind of final product you will develop. Explain why you want to develop it, and submit diagrams of development and methodology.
3. Timetable: submit a planning diagram or schema showing a timeline to completion of your project.
4. What type of knowledge, understanding and skills will you develop in the project? What kind of help and resources do you need?

During March-April you must choose and develop the project brief. You can use the example provided, examples provided by your teacher, or design your own project brief in negotiation with your teacher. Your teacher will help you to design your project brief through individual tutorials at your request.
You must choose a theme for your work; the portfolio is built around one task or final product. You have one month to study the tasks, gather necessary background information and make your choice before the final exam period.

2.2. What kind of evidence to submit in the portfolios
The following diagram describes some types of evidence to include in portfolios:

[Diagram: types of evidence in the student portfolio]
• Forms: Form 1: students' submission form; Project Brief; Form 3: internal standardisation form; Form 4: Awarding; Assessment matrix.
• Reports, records or notes about previous experiences, intentions, interests, etc.
• Preliminary studies: experiments; exploration of possibilities with reflection on and evaluation of the processes.
• Work journals (not compulsory).
• Investigation: information; sources; models and related critical analysis.
• Final products (visual): paintings, drawings, sculptures, CDs, web pages, design products, photographs, films, video records of performances, installations, exhibitions, 2D or 3D design, etc.
• Crits/self-assessment: written or oral (tape, video, digital record) about the student's intentions, investigation, progress, decisions; techniques, materials, purposes, functions, meanings; achievement and evaluation.

2.3. Suggestions for carrying out portfolio tasks

1. Preparation (March-April)
You are expected to make the preparatory studies outside school over a period of 25 hours. Prior to the timed examination, you must produce and submit preparatory supporting studies undertaken before the examination period. The preparatory supporting studies should chart the development of the timed work from conception to completion; include a description of your intentions and motivations; and include an analysis and interpretation of sources from visual culture, or of other things seen, imagined or remembered, which are important for the work. You may include initial ideas and first experimentation with materials and processes. Use your work journal to describe your intentions at the starting point of the portfolio by explaining the reasons for choosing a specific task, what was interesting about the task, the aims and goals of your work, and your planned timetable. For example:
- What is the content of your project: what do you want to express, study or tell?
- What is at the root of your personal artistic expression: who or what has inspired your personal style? What is the reason behind your choice of mode of expression, techniques and materials? What kind of experiments and exploration do you want to try, and why?

2. Timed examination (15-20 hours + 2 hours for the self-assessment report, during class time)
The work produced during the examination period should include:

• Continuation of the developmental studies initiated during preparation time, for example: experimental works; exploration of ideas/techniques/materials; sketches or other visual attempts; visual, written or oral annotations; experiments with and exploration of sources, materials and processes; ideas evaluated, rejected, selected and developed.

• Final products.

• A self-assessment report explaining your progress and achievement.

The portfolio provides evidence about how you have tackled your chosen task and contains material produced during the process. You should choose related sketches, studies and experiments that, together with the final product, best illustrate your project and end results.

The portfolio should illustrate the origins of your chosen content and mode of expression. It should also reveal pivotal decisions and their consequences: how you started and developed your form and content, mode of expression, techniques and use of materials. You can also include annotated supporting materials that you have used and analysed, such as texts, newspaper clips, material samples, reproductions of images or media files. You may include the work journal in your portfolio.

The portfolio should show that you have made an art and design study of your subject. This means that making sketches, studies and decisions on alternative solutions should be a part of the process. The portfolio should also show how your project developed in time: date, sign and number all your developmental studies and other material during the work period. The work journal may be useful to help you document and critically reflect on the progress and purposes of the project; it helps you to record and justify your decisions. Notes, visual or textual, also help you to develop your ideas and to reflect on problems found and your decision-making process.

You may include the original final product; but if it is impossible to include it because of size, media or other reasons, include a video, photographic or digital reproduction of it. The final product(s) or the record of it must be labelled as Product.

3. Self-assessment report guidelines
This part should be completed after completion of the work (during the specified time). Try to answer the following questions using, for example, oral, visual or written media.

Describe how you developed the form and content of your project; your progress; how you overcame the difficulties; and the sequence of the work (leaps of imagination, how new ideas appeared and why). If you changed your point of departure, why and how was that? Did you use any sources or models; were they useful, and why? How well did you attain your purposes, aims and goals; how well did you achieve the qualities required by the assessment criteria and mark schemes? Did you achieve your purposes; why and how? Does your work reveal a personal reflection about an issue; why and how? Is your personal reflection important for others? Is your work meaningful for others? Why? An honest explanation of partial failure may be as valuable as an exaggerated claim that all went smoothly.

C. Assessment Matrix


The matrix sets five criteria against five mark bands (Level 1-4, Level 5-9, Level 10-14, Level 15-17 and Level 18-20), with the points available in each band shown in brackets.

CA1: Record personal ideas, intentions, experiences, information and opinions in visual and other forms. (30 pts)
• Level 1-4 (1-6 pts): Limited or inappropriate records. The student has no conscious intention for what she/he is doing.
• Level 5-9 (7-14 pts): Small amount of records, not always persistent. The student knows what she/he wants to achieve, but her/his intention is not explicit.
• Level 10-14 (15-21 pts): Reasonable amount of records; the intentions are sometimes clear. The student shows persistence and curiosity.
• Level 15-17 (22-26 pts): The student's intention is obvious. The student shows persistence and combines some information with the work according to the intentions.
• Level 18-20 (27-30 pts): Considerable amount of records with personal reflection. Intentions are clearly stated. The student approaches themes and problems from several angles and develops them through a series of drafts and sketches.

CA2: Critical analysis of sources from visual culture showing understanding of purposes, meanings and contexts. (30 pts)
• Level 1-4 (1-6 pts): The student only uses the sources indicated by the teacher; the analysis is limited to collecting information.
• Level 5-9 (7-14 pts): The student uses the sources indicated by the teacher and others, but limits the search to collecting and organising information.
• Level 10-14 (15-21 pts): The student actively searches for sources to get ideas for her/his own work; collects, organises and selects information according to intentions.
• Level 15-17 (22-26 pts): The student actively searches for various sources in time and space and uses them in a well-integrated way in her/his own work (collect, organise, select, combine).
• Level 18-20 (27-30 pts): The student actively searches for and critically reflects on various sources in time and space and uses them in a versatile, independent and well-integrated way in her/his own work (collect, organise, select, combine, critique and re-organise).

CA3: Develop ideas through purposeful experimentation, exploration and evaluation. (60 pts)
• Level 1-4 (1-12 pts): Limited or inappropriate development of obvious ideas, little experimentation and no signs of critical reflection about the experiments and decisions made.
• Level 5-9 (13-27 pts): The student shows a small amount of exploration of ideas, lacking a sense of order and technical abilities, with a tendency to repeat ideas and experiments. No critical reflection about the experiments and decisions made.
• Level 10-14 (28-42 pts): The student can use pre-established problems. Reasonable and safe exploration of ideas (without risk-taking). Shows some understanding of visual language, concepts and techniques but reduced reflection about the experiments and decisions made.
• Level 15-17 (43-51 pts): The student can re-formulate problems. Comprehensive exploration of appropriate ideas taking risks, revealing understanding of visual language, concepts and techniques and critical reflection about the experiments and decisions made.
• Level 18-20 (52-60 pts): The student can find and formulate problems in an independent way; constantly experiments and explores possibilities, taking risks and finding unexpected responses. Shows critical reflection about the experiments and decisions made.

CA4: Present a coherent and organised sample of works and final product revealing a personal and informed response that realises their intentions. (60 pts)
• Level 1-4 (1-12 pts): An inappropriately small amount of works and final product showing lack of understanding of visual language, concepts and techniques.
• Level 5-9 (13-27 pts): The amount of works and final product shows little understanding of visual language, concepts and techniques.
• Level 10-14 (28-42 pts): A considerable amount of works and final product shows reasonable understanding of visual language, concepts and techniques.
• Level 15-17 (43-51 pts): The amount of works and final product shows understanding of visual language, concepts and techniques.
• Level 18-20 (52-60 pts): A set of works was selected and organised. The works and final product show excellent understanding of visual language, concepts and techniques.

CA5: Evaluate and justify the qualities of the work. (20 pts)
• Level 1-4 (1-4 pts): Unable to explain the reasons for her work. The student cannot point out the strengths and weaknesses of her own work or distinguish between works that are successful and those that are less successful.
• Level 5-9 (5-9 pts): The student explains the merits of her work using vague terms, and can refer to the intentions and the sources used, but is unable to justify the quality and purpose of the work.
• Level 10-14 (10-14 pts): The student evaluates the characteristics and merit of her work using specific terms, and can describe the progress made referring to intentions, sources and problems encountered. Justifies the purpose and meaning of her work using vague terms.
• Level 15-17 (15-17 pts): The student evaluates the characteristics and merit of her work using specific terms, and can explain the progress made referring to intentions, sources and problems encountered. Justifies the purpose and meaning of her work in cultural and social contexts.
• Level 18-20 (18-20 pts): The student fluently evaluates the characteristics and merit of her work using specific terms, and can explain and justify the progress made referring to intentions, sources and problems encountered. Justifies in fluent terms the purpose and meaning of her work in cultural and social contexts.


D. Grade Descriptors

1. Assessment criteria
A list of draft criteria was established to be used as 'windows' to help the assessors to judge students' artworks.

Students should:

AC1: Record personal ideas, intentions, experiences, information and opinions in visual and other forms.
AC2: Critically analyse sources from visual culture showing understanding of purposes, meanings and contexts.
AC3: Develop ideas through purposeful experimentation, exploration and evaluation.
AC4: Present a coherent and organised sample of works and final product revealing a personal and informed response realising their intentions.
AC5: Evaluate and justify the qualities of their work.

2. Global Descriptors
The following descriptors or band scales might be useful for holistic assessment. However, teachers are reminded to use them with the necessary flexibility.

1-4. Reduced


A clearly inadequate and disorganised quantity of work has been completed, including a few records of obvious or literal ideas; some collected but not analysed sources from visual culture; few developmental studies; and a final product which does not realise the intentions. Personal expression is weak and stated in vague terms, and the student is unable to justify the strengths and weaknesses of their work. Overall the work lacks a sense of order, basic technical skills and relevant understanding of art and design processes of inquiry (formal elements/composition/design principles/visual language).

5-9. Limited

A small amount of work has been produced, including some record of personal ideas and intentions and some organisation of sources, but the student does not use them to explore his/her own ideas. There is some sense of order and structure in the way ideas are formed, but few explanations of decisions made. The exploration of ideas is abandoned too early; ideas and experiments are repeated without any changes in form and content. The student sometimes critically reflects on his/her work and the work of others, but the vocabulary is clumsy and unrefined. Overall the work demonstrates a limited understanding and use of art and design concepts, processes and techniques (formal elements/composition/design principles/visual language).

10-14. Basic/Reasonable/Appropriate

A reasonable sample of work has been presented, including records of a range of ideas; clear explanation of the intentions and motivations; and an organised selection of sources appropriate to the project but with superficial critical reflection. Some sources seem to be used during the development of ideas but are not fully explored. The original ideas may be consolidated too early; experiments are not fully developed and the outcomes are predictable; however, there is some sense of order and method. The student sometimes critically reflects on his/her work and the work of others using appropriate vocabulary. Overall the work demonstrates an adequate understanding of the contexts, concepts, processes and techniques of art and design production (formal elements/composition/design principles/visual language).

15-17. Good

A good sample of work has been organised, including a record of some unusual ideas; the majority of development studies show a personal style and reasonable technical expertise, and sometimes unexpected experiments which progressively evolve towards the realisation of intentions through systematic evaluation. The student understands and critically reflects on the contexts, meanings and purposes of visual culture production; uses the sources to develop personal interpretations; experiments and explores possibilities, taking risks and sometimes evaluating the decisions; uses art and design specific vocabulary to reflect on and evaluate his/her work and the work of others; and explains the importance of the work presented within social and cultural contexts. The work overall illustrates personal experimentation, exploration of ideas and possibilities, and a good resolution of concepts, media and technical expression through developmental studies and final products.

18-20. Very Good/Excellent/Fluent

A considerable sample of work has been organised, including a record of a range of unusual ideas; development studies showing a personal style and technical expertise, and unexpected experiments which progressively evolve towards the realisation of intentions through systematic evaluation. The student understands and critically reflects on the contexts, meanings and purposes of visual culture production; uses the sources to develop personal interpretations; experiments and explores possibilities, taking risks and always evaluating the decisions made; uses art and design specific vocabulary to reflect on and evaluate his/her work and the work of others; and persuades the audience (viewer) of the importance of the work presented within social and cultural contexts. The work overall illustrates personal and sophisticated experimentation, exploration of ideas and possibilities, and an outstanding resolution of concepts, media and technical expression through developmental studies and final products.


E. Examination in Art and Design (Experiment)


March-May 2003
Model Project Brief

Timed examination: 22 hours


This paper may be given to students as an option for the project brief. Students have a four-week period in which to complete preparatory supporting studies prior to the timed examination.

1. General instructions

This paper and the examination instructions are given to you in advance of the examination so that you can prepare sufficiently.

The portfolios for external assessment will be developed according to a project brief theme. You may choose to develop:
1. This model project theme
2. A theme and project brief developed by the class and the teacher
3. Your own theme and project brief

The works to be included in the portfolio are:


• Preparatory studies
• Developmental studies
• Final product
• Self-assessment report

The assessment criteria and weightings are:

AC1: Record personal ideas, intentions, experiences, information and opinions in visual and other forms. (30 points)
AC2: Critically analyse sources from visual culture, showing understanding of purposes, meanings and contexts. (30 points)
AC3: Develop ideas through purposeful experimentation, exploration and evaluation. (60 points)
AC4: Present a coherent and organised sample of works and a final product revealing a personal and informed response that realises your intentions. (60 points)
AC5: Evaluate and justify the qualities of the work. (20 points)


2. Model Project Briefs

2.1. Theme: ‘Excluded people’

Excluded people are those who are rejected by society, or discriminated against by the community, because they are different from the majority of its members. People can be rejected because they are physically or mentally different; others can be excluded because they have different ethnic origins or belong to a different race. Some people are discriminated against because of their sexual characteristics; others are excluded because they hold different beliefs, come from different social backgrounds or simply live differently from the majority.

You are asked to make a visual study reflecting upon the people who are excluded from, or discriminated against by, the community: a personal reflection about 'the other'.

Dwarfs, beggars, crippled and mad people have been depicted in works of art throughout history, for example by Velasquez, Murillo and Gericault, and in picaresque paintings. Contemporary artists have been working on projects about homeless people. Similarly, in film and photography some artists have approached discriminated or excluded people. The photographs of Sebastião Salgado offer a vision of the other, and some of his works are about minorities. Gabriel Orozco's photograph 'the island in the island' is a reflection on closed and discriminating spaces. In her works the photographer Diane Arbus presented a personal vision of different people, often called monsters. Monsters are also often present in comics and science fiction films: super-heroes, androgynes, cyborgs and mutants are in some way a vision of different people and their place in society.

Some theatre and dance companies, and some film producers, work with artists with disabilities; do you know of any? In advertising and publicity, different people are not commonly represented: do you often see fat, disabled or poor people in advertisements? Why? There is, however, some advertising about different people funded by humanitarian organisations. Designers have invented products to make daily life easier for disabled people, for example cutlery and scissors for left-handed people.

2.2. Starting points

• Three ways to look at the other

We can look at the other in three different ways: we can discriminate against, tolerate or integrate them. We discriminate against the other when we think that they do not belong to our circle and should not share our lives. We tolerate the other when we see them as different but with their own rights; we accept their presence but do not invite them to share our lives. We integrate the other when we invite them to share our daily lives with no constraints.

• Me as the other

I can be the other, the different one. I might be discriminated against for some reason. How will I feel within the community? What will my sensations and feelings be? How shall I live? Who are my friends? How do other people look at me in the street? Why do they insist on staring at me?

What is my relationship with objects made for 'normal' people? Are the entrances to buildings accessible to a wheelchair? What about the size of street furniture? If I am blind, are the streets safe for me? If I am locked in a psychiatric hospital or a prison cell, how do I feel in that space?

• The other is a human being

How does the community accept the other? How can we inform people and make them aware of the other? How can we fight racism and other discriminatory attitudes? How can we change people's attitudes towards difference? How can we inform and persuade the community of the need to be aware of discriminatory attitudes towards different people? How can we denounce inequalities?

2.3. Tasks and media

• Fine Arts
Fine arts can be defined as a means of expressing personal experiences, feelings and particular visions; they are not aimed at solving practical or utilitarian problems. Nevertheless, fine arts media can transmit strong messages about reality and the world, including social questions. If you choose this field you can make a painting, a drawing, a sculpture, a print, an installation, a performance, a site-specific work, etc. You can also choose ceramics, textiles or other crafts.

Several artists have approached this theme: research them and try to understand their visions of different people. Develop your ideas from that research and from your own opinions about the issue.

• Photography, film, video and multimedia

Photography, film, video and multimedia can be seen as autonomous fields or as products embedded in other fields, for example photography and advertising, photo-journalism, film and publicity, video clips and multimedia productions.

Research photography and media products in which different people are presented, and try to understand the underlying visions of the issue. As an outcome you can produce one or several finished photographs, a film or a video presenting your own views and opinions about the theme. You can also make a web page, video clip or other multimedia presentation.

• Comics
Comics are a narrative medium that combines images and text. You can make a comic strip or a graphic novel to tell a story related to the issue. The graphic novel need not be a finished product, but you must include the complete storyboard, sketches for scenes and characters, drafts for page layouts and some finished plates ('pranchas').

• Graphic design
The aim of graphic design production is the communication of messages using the expressive, symbolic, formal and functional aspects of the image. Messages can be informative and/or persuasive.

You must identify a need or a problem, investigate specific areas of the problem, determine relevant sources of information and use these to develop and redefine the problem. You can produce outcomes in illustration, packaging, advertising, computer-aided design, typography or multimedia design: for example, a logo, a pictogram, a signage system, a web page, a leaflet, a billboard/show card, etc.

• Product design
Try to identify needs related to different and disabled people in your environment before thinking about the final product (you can use brainstorming strategies to help you). You should show evidence of an understanding of the appropriateness of the medium to its function and of fitness for purpose. Use your knowledge of the design process to develop your ideas (concept, formulation of the brief, research, experimentation, realisation and evaluation). You can re-design a utilitarian object or a space, or invent a new one.

