
EXAMINING THE UNTESTABLE ASSUMPTIONS OF THE CHAINED LINEAR

LINKING FOR LIVINGSTON SCORE ADJUSTMENT WITH APPLICATION TO


THE 2005 MSCE MATHEMATICS PAPER 2.







M.Ed (Testing, Measurement and Evaluation) Thesis

By
CHIFUNDO STEVEN AZIZI
BSc (Ed) Mzuzu University











Submitted to the Department of Educational Foundations, Faculty of Education,
in partial fulfilment of the requirements for the degree of
Master of Education (Testing, Measurement and Evaluation)

University of Malawi
Chancellor College
June, 2009



DECLARATION


I, the undersigned, hereby declare that this thesis is my own original work which has not
been submitted to any other institution for similar purposes. Where other people's work
has been used, acknowledgements have been made.






____________________________________
Full Legal Name





_____________________________________
Signature





_____________________________________
Date

















Certificate of Approval


The undersigned certify that this thesis represents the student's own work and effort and
has been submitted with our approval.





Signature: ____________________________Date:__________________________

M. Kazima PhD (Senior Lecturer)
Main Supervisor




Signature: ____________________________Date:__________________________

L. Kazembe PhD (Senior Lecturer)
Member, Supervisory Committee








To the memory of my late father, Charles Frank Azizi and late brother, Charles Mike
Azizi. May their souls rest in peace!





















ACKNOWLEDGEMENTS

I would like to thank Dr. M. Kazima and Dr. L. Kazembe, my main supervisor and
co-supervisor respectively, for their many suggestions and constant support during this
research. Without them this work would never have come into existence.
I also wish to thank the headteachers of Blantyre, Henry Henderson Institute,
Bangwe, Chiradzulu, and Njamba secondary schools for allowing me to collect data from
their institutions. My gratitude also goes to the Executive Director of the Malawi
National Examinations Board (MANEB) for authorising me to use the 2005 MSCE mathematics
examination paper 2. Big appreciation should also go to the students who participated in
this study; you really helped me a lot.
I am grateful to my mum, my fiancée, brothers and sisters for their love and
financial support. Special mention goes to the Ministry of Education for funding my tuition
fees. Finally, words alone cannot express my gratitude to the Almighty God who made it
possible for me to complete this study and for the infinite blessings.








ABSTRACT
MSCE mathematics paper 2, like many high-stakes test formats, includes a section
of optional questions in addition to a mandatory part. It has been argued that offering
options and comparing final scores is often not fair to examinees, especially to those
who attempt the most difficult questions from the optional part. Livingston (1988) proposed
a way of adjusting essay scores. This was later explained from the perspective of test
equating by Allen, Holland, and Thayer (1993), who concluded that the proposal makes implicit
assumptions of chained linear equating about the unobserved data. This study tested the
assumptions on the 2005 MSCE mathematics examination paper 2 so as to determine whether
Livingston score adjustment could be used on this examination.
The study used systematic sampling to obtain examinees from five purposively
selected secondary schools. The 2005 MSCE mathematics paper 2 was administered to
247 examinees in two parts, section A followed by section B. For section B, examinees
were asked to first indicate their choice of three optional questions and were then
instructed to answer all of the questions.
The results were analysed using the Root Mean Square Difference (RMSD) and Root
Expected Mean Square Difference (REMSD) to quantify the differences between the
subgroups' linking functions for unobserved and observed data. It was found that group
invariance did not hold across all the subgroups involved. This means that
Livingston score adjustment would not be possible on this examination. It is
recommended that, in order to minimise optional-score inequity, item writers use
analytical methods to strictly match the different levels of cognitive demand of topics,
using the MSCE mathematics performance level descriptors when constructing the optional
items.






















TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ACRONYMS AND ABBREVIATIONS
CHAPTER
1 INTRODUCTION
1.1 Background
1.1.1 Characteristics of the examination investigated
1.1.2 Grade Awarding Process
1.1.3 Comparability of optional questions' raw scores
1.1.4 Livingston's raw score adjustment
1.2 Statement of the Problem
1.2.1 Purpose of the Study
1.2.2 Research Questions
1.2.3 Significance of the study
1.3 Theoretical Framework
1.4 Definition of terms
2 LITERATURE REVIEW
2.1 Introduction
2.2 General information on optional questions
2.3 Advantages of optional questions
2.4 Problems of optional questions
2.4.1 The syllabus
2.4.2 The abilities of candidates
2.5 Relationship between candidates' question choice and getting high scores
2.6 Linking and Equating
2.7 Can we link or equate optional questions?
2.8 What are the consequences of not linking/equating optional questions' scores?
3 METHODOLOGY
3.1 Introduction
3.2 The Research Questions
3.3 The Design
3.3.1 Description of the Research
3.3.2 Population
3.3.3 Sampling
3.3.4 Instruments
3.3.5 The administration of the instruments and data gathering
3.4 Data Analysis
3.4.1 Extent of difficulty in optional questions
3.4.2 Correlation of scores on section B and total scores on section A
3.4.3 Establishing group invariance on linking/equating functions of examinees that chose a concerned optional question and for those that selected other questions
3.5 Ethical Considerations
3.6 Validity and Reliability
3.7 Delimitations and Limitations of the study
3.7.1 Delimitations
3.7.2 Limitations
4 RESULTS AND DISCUSSION OF THE FINDINGS
4.1 Introduction
4.2 To what extent do optional questions differ?
4.2.1 Preliminary analysis
4.2.2 Comparing p-values of section B
4.3 How are scores on section A and section B with choice correlated?
4.4 Establishing group invariance on linking/equating functions of examinees that chose a concerned optional question and for those that selected other questions
4.4.1 Linking functions that largely vary at the lower tail of the choice question scale
4.4.2 Linking functions that largely vary at the upper tail of the choice question scale
4.4.3 Linking functions that largely vary at the lower and second upper tail of the choice question scale
4.4.4 Linking functions that largely vary at both lower and upper tails of the choice question scale
4.4.5 Linking functions that vary constantly across the entire score scale
5 CONCLUSIONS, IMPLICATIONS AND RECOMMENDATION
5.1 Introduction
5.2 Conclusions
5.2.1 The main findings of the literature review
5.2.2 The main findings of the empirical investigation
5.3 Implications
5.4 Recommendation
REFERENCES
APPENDICES
A. Pairs of subgroups that chose particular questions and other questions
B. Pairs of subgroups that chose particular questions and other questions
C. Section A of 2005 M.S.C.E. Examination paper 2 presented in this study as paper I
D. Section B of 2005 M.S.C.E. Examination paper 2 presented in this study as paper II
E. Answer sheet cover page for paper I
F. Answer sheet cover page for paper II
G. Original form of 2005 M.S.C.E. Examination mathematics paper 2
H. Letter to Executive Director of Malawi National Examinations Board
I. Letter from Executive Director of Malawi National Examinations Board
J. Letter to secondary school headteacher
K. Letter to Shirehighlands Education Division Manageress
L. Letter to South West Education Division Manager
M. My introduction letter from Head of Department to secondary schools' headteachers
LIST OF TABLES

Table
4.1 Major content areas of section A
4.2 Major content areas of section B
4.3 P-values for questions in section A and section B without choice
4.4 Pairs of subgroups that chose particular questions and other questions, with their graphs illustrated in appendix A
4.5 Pairs of subgroups that chose particular questions and other questions, with their graphs illustrated in appendix B
LIST OF FIGURES

Figure
4.1 Equated scores on section A from optional question 7 that largely vary at the lower tail of the choice question scale
4.2 Equated scores on section A from optional question 8 that largely vary at the higher tail of the choice question scale
4.3 Equated scores on section A that largely vary at the lower and second upper tail of the choice question scale from different optional questions
4.4 Equated scores on section A that largely vary at both lower and upper tails of the score scale of optional question 10
4.5 Equated scores on section A that vary constantly across the entire score scale of optional question 7
LIST OF ACRONYMS AND ABBREVIATIONS

AP Advanced Placement
CSE Certificate of Secondary Education
DTM Difference That Matters
HHI Henry Henderson Institute
IRT Item Response Theory
MANEB Malawi National Examinations Board
MSCE Malawi School Certificate of Education
NEAT Non-Equivalent groups Anchor Test
REMSD Root Expected Mean Square Difference
RMSD Root Mean Square Difference



CHAPTER 1


1.0 INTRODUCTION
This chapter provides a general overview of the problem under study. It
considers important concepts that dissect the problem into manageable components.
The first section is the background, followed by the statement of the problem and the
theoretical framework; the definition of terms is the last component.

1.1 Background
The Malawi School Certificate of Education (MSCE) examination is used, among other
purposes, for certification, selection for tertiary education, and employment decisions.
Several subjects are examined at MSCE, including mathematics, which is rated as one of
the most significant subjects for entry into most programmes in Malawian
universities. The University of Malawi, in particular, prefers candidates with at least a
credit in mathematics, among other subjects, for enrolment in almost every programme that
is offered.

1.1.1 Characteristics of the examination investigated
At the MSCE examination, mathematics has two papers: paper 1 and paper 2. Paper
1 asks candidates to attempt all 24 questions in 2 hours and, by design, it is easier
than paper 2, although the two papers carry the same weight: each paper carries 100
marks. Paper 2 has two sections, A and B (see appendix G). Section A is
compulsory; candidates attempt six questions worth 55 marks in total. In
section B, however, candidates are allowed a choice of questions to answer: out of six
questions, candidates are asked to answer three questions only, worth 45 marks in
total. Paper 2 runs for 2 hours.
1.1.2 Grade Awarding Process
Mathematics, like all other subjects in the MSCE examination, is graded on a nine-
point scale (Malawi National Examinations Board, 1999):
1-2 denote a pass with distinction;
3-6 denote a pass with credit;
7-8 denote a general pass; and
9 denotes a fail.
The raw score of each candidate is converted into a grade. This is done by an
awards committee that uses grade boundaries (cutoff scores) to turn scores into
grades (Khembo, 2004). Because mathematics has two papers, each paper is graded
separately, and then the corresponding cutoff scores at the 2/3, 6/7, and 8/9 boundaries
are summed to determine the final cutoff scores for the subject.
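As a minimal illustration of this awarding step (the cutoff values below are hypothetical, not MANEB's actual boundaries, and the function is only a sketch of the idea), a combined raw score could be turned into a grade band as follows:

```python
# Minimal sketch of grade awarding from hypothetical cutoff scores.
# The boundaries below are illustrative only, not actual MANEB values.

def award_grade(raw_score, cutoffs):
    """Return the nine-point-scale grade band for a combined raw score.

    `cutoffs` maps the lowest combined score of each band to its label,
    i.e. the summed 2/3, 6/7 and 8/9 paper boundaries.
    """
    for lowest, grade in sorted(cutoffs.items(), reverse=True):
        if raw_score >= lowest:
            return grade
    return "9 (fail)"

hypothetical_cutoffs = {140: "1-2 (distinction)",   # summed 2/3 boundary
                        90: "3-6 (credit)",         # summed 6/7 boundary
                        60: "7-8 (general pass)"}   # summed 8/9 boundary

print(award_grade(152, hypothetical_cutoffs))  # -> 1-2 (distinction)
print(award_grade(75, hypothetical_cutoffs))   # -> 7-8 (general pass)
```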

1.1.3 Comparability of optional questions' raw scores

Livingston (1988) observed that question developers try their best to make
optional questions equally difficult. Angoff (1971), Newton (1977), and Wainer and
Thissen (1994), however, argue that it is not easy to produce tests that are similar in
difficulty. Though item setters strive to produce questions of equal difficulty, the
questions have their own inherent intricacies that cannot be equalised. These inherent
difficulties come from the complexity of the topics from which the questions are
formulated. It would be naïve to compare the raw score that an examinee gets on an
optional question which elicits, for example, the use of Venn diagrams to analyse
and interpret data with the raw score on a question which asks an examinee to find the
sum of a geometric progression using a formula. These two questions come from topics
which differ in complexity; hence raw scores on them will not mean the same thing,
because they do not indicate the same level of knowledge and skill. The scores will not
be comparable. To treat them as if they are comparable would be misleading for score
users and unfair to the examinees.
Having looked at the complexity of measuring examinees who answer different
questions, the question would be: should choice questions still be incorporated in our
examinations? The merits and demerits of optional questions are discussed in the
literature review section. However, Kierkegaard (1986, p.24) argues: if you allow
choice, you will regret it; if you don't allow choice, you will regret it; whether you
allow choice or not, you will regret both. This argument highlights that if choice
were not allowed, the limitation on domain coverage forced by the small
number of questions might unfairly affect some candidates. On the other hand,
choice compromises test fairness when it comes to comparison of scores,
because different levels of knowledge and skills are elicited from examinees
by each optional question. One could propose to increase the length
of the test, but this is not often practical (Wainer and Thissen, 1994), taking into
consideration examination time and examinee fatigue. The onus, therefore, remains
with the examiners.
In the case of mathematics paper 2, there have not been any intense arguments over
the behaviour of optional questions, except Khembo's (2004) sentiments against the policy
of allowing choice. With little or no research done on optional questions in
examinations administered by the Malawi National Examinations Board (MANEB), the
policy of allowing choice questions in mathematics paper 2 would continue without
reforms and innovations to improve fair assessment, because most stakeholders
would not know how the choice questions are performing on this paper.

1.1.4 Livingston's raw score adjustment

Psychometricians, nevertheless, have tried to find a post hoc solution to the
incomparability of optional question scores. Livingston (1988) developed a method
for adjusting scores on optional questions to remove the differential difficulty of
the questions. The procedure, in brief, is to impute a score for the examinee on
each optional question which the examinee does not answer, and then to average the
scores, observed and imputed, over all optional questions. Allen, Holland and
Thayer (1993) observe that the methodology makes implicit assumptions when
imputing scores using chained linear equating. Under this procedure, raw scores on
optional question $i$ are transformed to the scale of optional question $j$ through
scores on the mandatory section (also known as the common portion) for the examinees that
answered question $i$.

The assumptions are that the chained linear equating/linking functions do not
depend on which population is used for linking; that is, equating raw scores on
optional question $i$ to the scale of scores on the common portion $X$ for the examinees
who answer question $i$ is supposed to give the same equating function as equating
raw scores on question $i$ to $X$ for the examinees who answer question $j$. Adopting
the notation employed by Allen, Holland, and Thayer (1993), the two linear equating
functions are:
$$X_i(y_i) = \mu_{X \cdot 1_i} + \frac{\sigma_{X \cdot 1_i}}{\sigma_{i \cdot 1_i}}\,\bigl(y_i - \mu_{i \cdot 1_i}\bigr)$$

and

$$X_{ij}(y_i) = \mu_{X \cdot 1_j} + \frac{\sigma_{X \cdot 1_j}}{\sigma_{i \cdot 1_j}}\,\bigl(y_i - \mu_{i \cdot 1_j}\bigr)$$

where:

$X_i(y_i)$ denotes equating raw score $y_i$ to the scale of $X$ on examinees who answer question $i$;
$y_i$ = raw score on optional question $i$;
$\mu_{X \cdot 1_i}$ = mean of $X$ for examinees selecting question $i$;
$\sigma_{X \cdot 1_i}$ = standard deviation of $X$ for examinees choosing question $i$;
$\sigma_{i \cdot 1_i}$ = standard deviation of question $i$ for examinees selecting it;
$\mu_{i \cdot 1_i}$ = mean of question $i$ for examinees selecting question $i$;
$X_{ij}(y_i)$ denotes equating raw score $y_i$ to $X$ on examinees who answer question $j$;
$\mu_{X \cdot 1_j}$, $\sigma_{X \cdot 1_j}$ = mean and standard deviation of $X$ for examinees selecting question $j$;
$\mu_{i \cdot 1_j}$ = mean of question $i$ for examinees choosing question $j$;
$\sigma_{i \cdot 1_j}$ = standard deviation of question $i$ for examinees selecting question $j$.
The basics of the assumptions are that the slopes and the intercepts of the two
functions should be the same. When these assumptions are substantially violated, the
adjustments are not meaningful.
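As a concrete illustration of this requirement, the sketch below (a minimal example with made-up score arrays, not data from this study) computes the linear linking of an optional question to the common portion separately for the subgroup that chose the question and for a subgroup that chose another question, and compares the resulting slopes and intercepts:

```python
import numpy as np

def linear_linking(y, x):
    """Linear linking of optional-question scores y to common-portion scores x
    for one subgroup: returns (slope, intercept) of X(y) = mu_x + (sd_x/sd_y)(y - mu_y)."""
    slope = np.std(x, ddof=1) / np.std(y, ddof=1)
    intercept = np.mean(x) - slope * np.mean(y)
    return slope, intercept

# Hypothetical data: scores on optional question i and on the common portion X
# for the subgroup P_i that chose question i and the subgroup P_j that chose question j.
y_i_in_Pi = np.array([12, 9, 14, 7, 11, 13, 8, 10])
x_in_Pi   = np.array([40, 33, 47, 28, 39, 44, 30, 36])
y_i_in_Pj = np.array([10, 6, 13, 9, 12, 5, 11, 8])      # observable here because, unlike
x_in_Pj   = np.array([35, 25, 46, 34, 41, 22, 38, 29])  # the operational exam, all questions were answered

slope_i, intercept_i = linear_linking(y_i_in_Pi, x_in_Pi)     # X_i(y_i) on P_i
slope_ij, intercept_ij = linear_linking(y_i_in_Pj, x_in_Pj)   # X_ij(y_i) on P_j

print(f"P_i: slope={slope_i:.3f}, intercept={intercept_i:.3f}")
print(f"P_j: slope={slope_ij:.3f}, intercept={intercept_ij:.3f}")
# Livingston's adjustment is defensible only if the two pairs are (close to) equal.
```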




1.2 Statement of the Problem

Mathematics is one of the papers at the MSCE examinations that are not pre-tested
(Khembo, 2004). Pretesting allows item analysis, which in turn ensures that only
questions of proven quality are included in the final examination. When examiners
compile an examination paper, they assume that the selected questions have equal
inherent difficulty, as evidenced by the equal allocation of marks (each optional
question carries 15 marks).
A study by Khembo (2004), which investigated the use of
performance level descriptors to ensure consistency and comparability in standard
setting, divulged that the item difficulty indices (item p-values) for the 2002 mathematics
paper 2 examination varied considerably for questions in section B. For example,
questions 10a and 10b had p-values of 0.03 and 0.01, questions 7a and 7b had p-values of
0.52 and 0.15, and questions 12a and 12b had difficulty indices of 0.27 and 0.14. Comparing
the p-values of these questions, one would note that the items were
differentially difficult. However, some would argue that the items were attempted by
non-equivalent groups conditioned on choice, and that it would not be possible to
compare their p-values outright. This argument is valid, but in the mentioned study
the researcher employed competent mathematics teachers to establish differential
difficulty on the optional questions. The ratings by the judges, using performance
level descriptors, for questions in section B of the 2002 and 2003 mathematics papers
confirmed that some questions required higher-order cognitive demands than others
for an examinee to succeed. The judges' ratings complemented what was observed from the
p-values.



With the observations from the teachers, coupled with conspicuously different
p-values for the optional questions, it is clear that the introduction of optional questions
into this paper brings unfairness into grading. The basis for comparability of raw
scores is thus considerably weakened, since different examinees answer
samples of questions that are not comparable in difficulty.
For this reason, there is a need to find a method which would circumvent this
incomparability of measurements. Livingston (1988) proposed a method of adjusting the
raw scores of optional questions to achieve fairness in grading examinees who take
different questions. Allen et al. (1993) note that there are implicit
assumptions in the procedure which are used in order to adjust the scores. They call them
Livingston's missing data assumptions.
The assumptions are based on a key theoretical requirement of test equating,
which emphasises that the resulting equating function should not depend on the
population on which it is calculated. In other words, the two equating functions
should be identical regardless of which subpopulation has attempted which question.
Therefore, before the method is adopted and adapted in our grading system,
especially in mathematics, there is a need to scrutinise it in detail.

1.2.1 Purpose of the Study
General objective
The general objective of the study is to test the assumptions of chained linear
equating/linking for the Livingston raw score adjustment method on optional question
scores of MSCE mathematics paper 2.



Specific objectives

1. Distinguish the item difficulty levels of the optional questions using item difficulty
indices of raw scores.
2. Compare the correlations between total scores on the compulsory section (i.e.
section A/common portion) and scores on the optional questions portion.
3. Establish whether the equating/linking functions of examinees that chose a
concerned optional question and of those that selected a different choice
question are group invariant.

1.2.2 Research Questions
1. To what extent do optional questions differ in difficulty?
2. How are scores on optional questions portion and total scores on the
common portion correlated?
3. Are equating/linking functions of examinees that chose a concerned
optional question and for those that selected alternate question group
invariance?

1.2.3 Significance of the study
Fairness in measurement is of paramount significance. Every examinee ought to
be measured using the same instrument and the same scale for comparability to be
meaningful. As already mentioned, mathematics is one of the subjects that are
treasured at the Malawi School Certificate of Education; as a result, a certificate
without a pass in mathematics puts a person at a disadvantage when it comes
to selection for further studies or even for jobs.
To forestall this measurement quandary, Livingston suggests a method for adjusting
the scores of optional questions to a common scale. It would be easy to adjust the
scores of MSCE mathematics paper 2 using this method. The consequences of that action,
however, are not known in our context; it is therefore worth testing
the mentioned fundamental assumptions, as Dorans (2004) and Liu, Cahn, and Dorans
(2006) say that subgroup invariance is the most critical requirement and plays a
significant role in assessing fairness.
Furthermore, to the knowledge of the researcher, there has been no detailed research
that has addressed the consequences of optional questions on the
examinations administered by the Malawi National Examinations Board. This study
would evaluate the extent of the relationship between the knowledge and skills measured in
section A and those measured in section B. It would also explore the pattern of
choices in section B conditioned on the topics in the Malawi senior secondary mathematics
syllabus.

1.3 Theoretical Framework
The process of equating is used to obtain comparable scores when more than one
test form is used in a test administration (Holland, von Davier, Sinharay, and Han,
2006). Angoff (1971) defined the equating of tests as a process of converting the
system of units of one form to the system of units of the other, so that the scores
obtained from one form can be compared directly with the scores obtained from
the other form.



The central reason for equating different test forms is to ensure fair decision
making regarding the test results (Liu and Dorans, 2008). There are three techniques
and methodologies for making different test forms comparable, known as equating
procedures (Jaeger, 1981; Petersen, Kolen, and Hoover, 1989; Cook and Eignor,
1991) or designs: namely, random groups, single group, and common-item non-
equivalent groups (also known as the non-equivalent groups anchor test, NEAT, design).
There are three equating methods used in the common-item non-equivalent groups
design: Tucker, Levine, and chained linear (von Davier and Kong, 2005). This
study focuses on the chained linear method because it uses the common item score(s) as the
middle link in a chain of linear linking relationships. Basically, chained linear
linking is done by equalising standardised deviation scores (z-scores) on the two test
forms via the standardised common item score(s). Before going into detail on
chained linear equating/linking, we first look at the Livingston score adjustment procedure
in steps, as presented by Allen, Holland, and Thayer (1993, pp. 17-18), because at the
end we would like to connect it with the chained equating/linking functions. Here is some
more notation for an easy grasp of what follows:

$P$ = the entire population of examinees who take section A, which is also known as test $X$;
$P_i$ = sub-population of $P$ that answer question $i$ in section B;
$P_j$ = sub-population of $P$ that answer question $j$ in section B;
$X$ = score on section A (the common portion);
$Y_i$ = score on optional question $i$;
$Y_j$ = score on optional question $j$ (for an examinee not in $P_j$ this score must be imputed);
$Y^*_j$ = the score that would be imputed if the score on question $j$ were perfectly correlated with the scores on section A;
$\mu_i$, $\sigma_i$, $\rho_{XY_i}$ denote the mean, standard deviation, and correlation coefficient in $P_i$;
$\mu_j$, $\sigma_j$, $\rho_{XY_j}$ denote the mean, standard deviation, and correlation coefficient in $P_j$.

Step 1: equating $Y_i$ to each of the $Y_j$. For examinees in $P_i$, obtain the converted
value of the observed $y_i$ on the scales of the other $Y_j$'s. The converted values are
denoted $Y^*_{ij}(y_i)$.

Step 2: obtaining imputed values, $y_{j,\text{imputed}}(y_i)$, for $j \neq i$, for every examinee in $P_i$.
These imputed scores are weighted averages of the raw score $y_i$ and its equated score
on the $Y_j$ scale, $Y^*_{ij}(y_i)$:

$$y_{j,\text{imputed}}(y_i) = (1 - \rho_{XY_j})\,y_i + \rho_{XY_j}\,Y^*_{ij}(y_i)$$

Step 3: calculate the adjusted score as the simple average of the observed raw score
and the imputed scores over all $k$ optional questions:

$$Y_{\text{adj}} = \frac{1}{k}\Bigl\{\,y_i + \sum_{j \neq i} y_{j,\text{imputed}}(y_i)\Bigr\}$$

Combining steps 2 and 3 to get a simple expression for $Y_{\text{adj}}$, we first denote by $\bar{\rho}$ the
average of all the correlations $\rho_{XY_j}$:

$$\bar{\rho} = \frac{1}{k}\sum_j \rho_{XY_j}$$

and

$$\bar{Y}_i(y_i) = \frac{\sum_j \rho_{XY_j}\,Y^*_{ij}(y_i)}{\sum_j \rho_{XY_j}}$$

where $\bar{Y}_i(y_i)$ is the weighted average of the converted values, in other words, a
transformation of $y_i$ onto an average scale of the $k$ question scores determined by the
equating functions, with weights proportional to the correlations $\rho_{XY_j}$.

A simple Livingston adjusted score function is then expressed as

$$Y_{\text{adj}} = (1 - \bar{\rho})\,y_i + \bar{\rho}\,\bar{Y}_i(y_i)$$
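To make these steps concrete, here is a minimal sketch (all numbers are hypothetical, not values from this study) of the adjustment for one examinee who answered question i, assuming the chained equated values Y*_ij(y_i) from step 1 are already available:

```python
# Minimal sketch of Livingston's adjustment for one examinee who answered
# question i. All numbers are hypothetical; equated_scores holds Y*_ij(y_i),
# the raw score converted to the scale of each unanswered question j.

def livingston_adjusted_score(y_i, equated_scores, correlations):
    """Average of the observed score and the imputed scores over all k questions.

    equated_scores: {question j: Y*_ij(y_i)} for the k-1 questions not answered.
    correlations:   {question j: rho_XYj}, the X-Y_j correlation used as the weight.
    """
    imputed = {j: (1 - correlations[j]) * y_i + correlations[j] * equated_scores[j]
               for j in equated_scores}                     # step 2
    k = len(equated_scores) + 1
    return (y_i + sum(imputed.values())) / k                # step 3

y_i = 11                                         # observed raw score on question i (out of 15)
equated = {"q8": 9.4, "q9": 12.1, "q10": 10.3}   # hypothetical Y*_ij values from step 1
rho = {"q8": 0.55, "q9": 0.62, "q10": 0.48}      # hypothetical correlations with section A

print(round(livingston_adjusted_score(y_i, equated, rho), 2))
```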

Coming back to the chained linear equating/linking functions and connecting them with
Livingston score adjustment, it is found that:

In step 1, the linear equation for equating $Y_i$ to the scale of $X$ in $P_i$ is

$$X_i(y_i) = \mu_{X \cdot 1_i} + \frac{\sigma_{X \cdot 1_i}}{\sigma_{i \cdot 1_i}}\,\bigl(y_i - \mu_{i \cdot 1_i}\bigr) \qquad (1)$$

and the linear equation for equating $X$ to the scale of $Y_j$ in $P_j$ is

$$Y_j(x) = \mu_{j \cdot 1_j} + \frac{\sigma_{j \cdot 1_j}}{\sigma_{X \cdot 1_j}}\,\bigl(x - \mu_{X \cdot 1_j}\bigr) \qquad (2)$$

where $\mu_{X \cdot 1_j}$ and $\sigma_{X \cdot 1_j}$ are the mean and standard deviation of $X$ for examinees
choosing question $j$. The essence of the word chained in chained linear
equating is the substitution of $x$ in $Y_j(x)$ of equation (2) with $X_i(y_i)$ from equation
(1), neglecting the fact that the two equating functions are for different populations
(Brennan, 2006). That is,

$$Y_j\bigl(X_i(y_i)\bigr) = \mu_{j \cdot 1_j} + \frac{\sigma_{j \cdot 1_j}}{\sigma_{X \cdot 1_j}}\,\bigl(\mu_{X \cdot 1_i} - \mu_{X \cdot 1_j}\bigr) + \frac{\sigma_{j \cdot 1_j}}{\sigma_{X \cdot 1_j}}\cdot\frac{\sigma_{X \cdot 1_i}}{\sigma_{i \cdot 1_i}}\,\bigl(y_i - \mu_{i \cdot 1_i}\bigr) = Y^*_{ij}(y_i) \qquad (3)$$

Braun and Holland (1982) indicate that for chained equating/linking to produce
unbiased results, the two chained equating/linking functions should not depend on
which population is used for the equating. Dorans and Holland (2000); von Davier,
Holland, and Thayer (2004); Dorans (2004); and Liu, Cahn, and Dorans (2006) call this
requirement population invariance. It means that equating $Y_i$ to $X$ on $P_i$ ought to
give the same equating function as equating $Y_i$ to $X$ on $P_j$ (Allen et al., 1993). In this case $Y_i$
is missing data on $P_j$, which in this study will be available. The resulting linear
equating function of $Y_i$ to $X$ on $P_j$ is

$$X_{ij}(y_i) = \mu_{X \cdot 1_j} + \frac{\sigma_{X \cdot 1_j}}{\sigma_{i \cdot 1_j}}\,\bigl(y_i - \mu_{i \cdot 1_j}\bigr) \qquad (4)$$

The two linear equating/linking functions (1) and (4) must therefore have the same
slope and intercept in order to meet the above condition or requirement.

1.4 Definition of terms
Conventional secondary school: a public school owned by the Malawi government.
Cutoff score/cut score: a point on a score scale at which scores at or above that
point are in a different category or classification than scores below the
point.
Difficulty: a factor causing trouble in achieving a positive result or tending to
produce a negative result.
Optional questions: examinees' self-selected questions or choice of questions in a
test.
Performance descriptors: a scale of achievement levels with a set of observable
behavioural descriptions.
Test form: examination paper.
National secondary: a school whose students are selected for admission from
different districts across Malawi.
District secondary: a school that admits students taken from the same district. It
offers boarding and lodging.
Day secondary: a school that offers no boarding and lodging. Its students come from
surrounding communities.
Grant-aided secondary: a church-affiliated school that receives financial assistance
from the Malawi Government.



CHAPTER 2

2.0 LITERATURE REVIEW
2.1 Introduction
The literature review has seven sections. The first section gives general
information on optional questions. The second section discusses some advantages of
optional questions regarding their use in test forms. The third section looks at
problems that come with the policy of allowing candidates to choose questions in an
examination. The relationship between candidates' question choice and obtaining high scores is
discussed in the fourth section. The definition of linking and equating used in this study is
given in the fifth section. The sixth section discusses the possibility of linking and
equating optional questions using traditional equating methods. The last section
discusses the consequences of not linking/equating when choice items are
differentially difficult.

2.2 General information on optional questions
The introduction of optional questions into examinations brings a certain
complication into the process of measurement, since different groups of candidates
will attempt different questions from a single paper, thereby creating room for
combinations of different test forms in candidates' scripts (Willmott & Hall, 1975;
Bell, 1997). In the context of mathematics paper 2, choosing three questions out of
six creates twenty possible combinations of test forms. The complication comes in
because candidates in effect answer different papers out of these different
combinations, especially when questions vary much in difficulty. It then means the
same total mark may not represent comparable performance (Lewis, 1974).
A good test adequately samples questions from the content domain to provide
a sound basis for determining the extent to which a student has mastered the course.
Mann (1845, pp.37-40) as cited by Wainer, Wang, and Thissen (1991, p.2) argued
that
it is clear that the larger the number of questions put to a scholar the
better is the opportunity to test his merits. If but a single question is put,
the best scholar in the school may miss it, though he would succeed in
answering the next twenty without a blunder; or the poorest scholar may
succeed in answering one question, though certain to fail in twenty
others. Each question is a partial test, and the greater the number of
questions, therefore, the nearer does the test approach to completeness. It
is very uncertain which face of a die will turn up at the first throw; but if
the dice are thrown all day, there will be a greater equality in the number
of faces turned up.
Mann's argument is quite plausible in the context of the MSCE mathematics
syllabus. To determine that one has indeed mastered MSCE mathematics, it does not
take a single question answered correctly, but enough questions that cover the
content domain fairly. Section A, the mandatory section of mathematics paper 2,
contains fairly small items, whilst section B contains large items. Wainer et al.
(1991) define large items as those that take an examinee longer to complete than
short items do. Large items provide deep coverage of the content domain and can
assure the examiner, if one answers them correctly, that the examinee has
thoroughly mastered the course. For this purpose large items need to be many, but an
examinee cannot complete many large items within the allotted testing time. One
way of compromising between testing time limits and domain coverage is to provide many
large items and allow examinees to choose among them.

2.3 Advantages of optional questions
Optional questions have some advantages for candidates, teachers and examiners.
In this study, only three main advantages are discussed.
First, optional questions provide each candidate the chance to answer questions
on a wide range of topics (Bradlow and Thomas, 1998). This is because the presence
of more questions on a paper than time would otherwise allow means wider coverage of the
syllabus. This in turn increases fairness among candidates (Allen, Holland, and
Thayer, 2005) because they are not restricted to answering samples of questions from
a few topics.
Second, optional questions are used in examinations that are interested in
measuring higher-order cognitive domains (Allen et al., 2005). In these examinations,
the authenticity of candidates' work is perceived by the examiners to be more realistic
(Bradlow and Thomas, 1998). This advantage is more applicable to optional essay
questions where candidates are simply given a topic to write about. In mathematics it is
also applicable, because optional questions demand a high level of thinking. When an
examinee gets full marks on an optional question, it means s/he has demonstrated
high-level cognitive ability.
Third, examinations with question choice give teachers freedom to teach
particular portions of the syllabus in which they may be particularly interested
(Schools Council Examinations Bulletin, 1971; Willmott and Hall, 1975). Similarly,
candidates concentrate on particular aspects of the topics in which they are able to
show themselves to best advantage. With the optional questions of mathematics
paper 2, however, no teacher can confidently know which topics will be examined;
in essence, therefore, there is no freedom to teach particular topics and leave out others.
Nevertheless, some teachers have problems in executing lessons on
some mathematics topics. As a result, they either engage someone who is
comfortable with the particular topics or they present the topics fallibly. The latter
situation puts students in an awkward position in terms of thorough examination
preparation. It eventually negatively influences their choices in the examination,
since the mathematics domain has been reduced by the teacher's incompetence.
Nonetheless, candidates are forced to prepare thoroughly by studying the whole
syllabus. One can be good at a particular topic, but is still extrinsically
motivated to study hard on the other topics in order to do well, because no one can
predict the exact topics that will be examined.

2.4 Problems of optional questions
Although the merits in the above section cannot be denied, little attention has
been paid to the problems brought by optional questions when they are used in
examinations. It appears examiners overlook some very important aspects of a
test as a measuring instrument. Below are accounts of two main problems associated
with examinees' choice of questions. The first concerns the difference in
cognitive demands of topics in a syllabus, while the second looks
at the variability in the abilities of candidates.

2.4.1 The syllabus
In a syllabus, there are a number of different topics. It may be argued whether or
not syllabus topics are of the same basic level of difficulty (Willmott, 1972). One
good example of these arguments is the one presented by the Schools Council
Examinations (1971), which asks whether, in mathematics, the quoting of a geometry
theorem followed by an example is on a par with factorisation followed by the solution
of a pair of simultaneous equations. Certainly, the two topics or branches of
mathematics could not be at the same difficulty level in our syllabus. There are quite
a number of topics in the senior secondary school mathematics syllabus which have
different levels of difficulty. The comparability of the results of candidates
attempting questions drawn from these different topics may be questioned.
Therefore, putting scores from different optional questions on the same scale is
necessary for fair comparisons.

2.4.2 The abilities of candidates
The level of questions may vary considerably within the same test form in terms
of the level of proficiency required of the candidates to answer the question
fully (Willmott, 1972). The provision of question choice means that the type of
responses required of the candidates over the whole paper is not controlled in any
way. Some candidates may choose to answer questions with a certain pattern of
proficiency. For example, if a paper of ten questions consisted of five description
questions and five explanatory questions, and candidates were to answer five
questions in all, it is likely that there would be describers only and explainers only
(Schools Council Examinations, 1971). This would create a measurement problem when one
tries to consider candidates with the same marks to be of the same ability
level (Willmott, 1972). In the case of mathematics, candidates who are not good at
graphs, for example, will tend to avoid graph questions, and some whose proficiency
is low in matrices and vectors will choose other questions. However, the fact that
they have answered their preferred questions does not guarantee that they will get full
marks on those particular questions. The gist of the matter is that if they like geometry
more than arithmetic and algebra, they go for that branch of mathematics. The problem
that then arises is one of comparison: is my geometry better than your algebra or
arithmetic? Wainer and Thissen (1994) are also concerned with such comparisons,
because there is a need to take into account the difficulty of the accomplishment for
the comparison to be meaningful. It would not be fair to judge two examinees'
mathematics proficiency based on different questions. Fair play ought to be
achieved.






2.5 Relationship between candidates' question choice and getting high scores

The suggestion that optional questions allow candidates to select the questions
on which they can perform better is contradicted by research evidence. According to
Wang (1996), the correlation between the popularity ranking of the five choice
questions and their corresponding means was -0.60, and the correlation between
the ranking of the choice question combinations and the mean score was -0.22. It is
very surprising to note the negative correlations, because it is assumed that
examinees choose questions they feel they would get right. A study by Taylor and Nuttal
(1974), as cited by Bell (1997), asked candidates taking a Certificate of
Secondary Education (CSE) examination to answer the questions they had omitted on a
separate occasion after the actual examination. It was found that about 25% of
candidates actually showed an improvement in the final marks. This means that not
all candidates are able to choose in advance the questions on which they will score
most highly.
Power, Fowles, Farnum, and Gerritz (1992) found that the more the examinees
liked a particular topic, the lower they scored on an essay they subsequently wrote
on the chosen topic. This phenomenon is quite true when the choice between the
questions is relatively hard for examinees to make, that is, when the choices are not
strongly determined (Allen, Holland, and Thayer, 1993). It is not known whether the
MSCE mathematics paper 2 optional questions present this kind of
scenario, where most candidates find it hard to select the questions that they would
attempt and score most highly on. Malawi National Examinations Board item
developers do try to produce optional questions of equivalent difficulty by
following available guidelines (Khembo, 2004). It is yet to be seen whether the examiners'
effort to produce optional questions of equivalent difficulty, on face value,
produces hard choices on the part of examinees. The words face value are
used because no detailed research has been done to ascertain the notion of equal
difficulty of the optional questions.

2.6 Linking and Equating
Linking encompasses a broad perspective on score adjustment between different test
forms. Feurer, Holland, Green, Bertenthal, and Hemphill (1999), in their Uncommon
Measures report, presented three types of linking of scores of different tests that are
built based on
1. the same framework and same test specifications,
2. the same framework and different test specifications, or
3. different frameworks and different test specifications.
Kolen and Brennan (2004, p.427) ably defined the term framework as a delineation
of the scope and extent (e.g., specific content areas, skills, etc.) of the domain to be
represented in the assessment. They also defined test specifications, or blueprint, as the
specific mix of content areas and item formats, number of tasks/items, scoring
rules, etc. On the other hand, Mislevy (1992) and Linn (1993) proposed a type of
taxonomy for linking which mainly focuses on methodologies. They grouped the
taxonomy into four categories, based on the strength of the resulting linkage, starting
with equating, followed by calibration, projection, and lastly moderation.



When the first two types of linking presented by Feurer et al. (1999), Mislevy
(1992) and Linn (1993) are put into the same perspective, one finds that the score
adjustment relationship between different test forms that are built on the same framework
and the same test specifications is called equating (Kolen and Brennan, 2004). When tests that
are developed on the same framework but different specifications are linked, the
resulting relationship is called calibration. The term projection is used because
that methodology does not require the test forms to measure the same construct or
domain, and the score adjustment relationship is obtained through linear or non-linear
regression. Moderation is a type of linking in which the test frameworks are different
but the constructs are similar (Kolen and Brennan, 2004). In this case, the
fundamental aspect is distribution matching.
Looking specifically at equating as one type of linking, Lord (1980) outlined
four requirements that must be met for equating of, say, test $Y_i$ to test $Y_j$:
1. the same construct: the two tests must measure the same construct;
2. equity: once two test forms have been equated, it should not matter to
the examinees which form of the test is administered;
3. symmetry: the equating transformation should be symmetric. This
means the equating of $Y_i$ to $Y_j$ should be the inverse of the equating of $Y_j$ to $Y_i$;
4. subpopulation invariance: the equating transformation should be
invariant across subpopulations.
As noted previously from the definitions of the types of linking in the Uncommon
Measures report, same framework is viewed as construct similarity, and same test
specifications is considered similarity in measurement characteristics such as test
length, test format, administration conditions, etc. (Kolen and Brennan, 2004). These
definitions are concordant with the four requirements for equating delineated by Lord
(1980). The study uses these definitions as benchmarks for deciding the type of
linking involved. Therefore, the term linking is used
(henceforth) to refer to any function used to connect the scores on one test to those
on another test, and the term equating is reserved for the special case of linking
that satisfies the benchmarks.
Livingston (2004), von Davier, Holland, and Thayer (2004), and Holland, von
Davier, Sinharay, and Han (2006) describe chained linking as equating the scores on
the new form to scores on the anchor and then equating the scores on the anchor to
scores on the reference form. Putting the definition in our context, chained linear
linking describes equating the scores on a particular optional question (the new form) to
total scores on the common portion (the anchor) and then linking the total scores on the
common portion to scores on the other optional questions (the reference forms). The
chain formed by these two linking functions connects the score on the concerned
optional question to the scores on the other optional questions.
The study is particularly interested in the first part of the chain, where a
particular optional question's scores are linked to total scores on the common portion.
There is an assumption that the linear function linking a particular optional
question's scores to the common portion is the same in the two populations, those that
answer the concerned question and those that do not ($P_i$ and $P_j$) (von Davier &
Kong, 2005). Based on the extent to which this assumption is attained, we can substantiate the
consequences of chained linear linking on the optional questions as modelled by
Livingston (1988) and modified by Allen, Holland, and Thayer (1993).

2.7 Can we link or equate optional questions?
Some tests have two sections: a mandatory section and a section of question
choice. Examinees choose which questions to attempt according to the instructions.
Various research findings, for example Nuttal and Willmott (1972),
Wainer, Wang, and Thissen (1991), Allen et al. (1993), Wang, Wainer, and Thissen
(1993), and Wainer and Thissen (1994), agree that optional questions differ in
difficulty, no matter how hard the item developers try to equalise the difficulty
level. The question is: is it possible to link or equate the scores for examinees taking
different optional questions?
In trying to find ways of putting optional question scores on the same scale,
various research projects have been conducted. Bradlow and Thomas (1998)
indicated that item characteristic curves for optional questions should be the same
for examinees who choose a question as they would have been for examinees who did
not choose that particular question. This is an assumption necessary for optional
questions to be consistent with item response theory assumptions. Wang, Wainer,
and Thissen (1993), in a study testing the equality of the
distributions of scores for those who chose an item and those who did not take that
particular item, using multiple-choice optional sections, found that sometimes the
assumption was viable and sometimes it was not. However, the sample size was
small, which contributed to the statistical insignificance of the test. Allen, Holland, and
Thayer (2005) discovered that question choice tends to be positively associated
with performance, in the sense that the better an examinee does on a question, the
more likely s/he is to prefer that question, and vice versa. This revelation, however, is
muddied by a reversal where examinees who prefer a certain question perform
better on the unpreferred question. They concluded that there is a substantial amount
of variation around the performances with regard to preferred and unpreferred choices
and, therefore, it is difficult to justify non-ignorable selection. With the above
findings, it seems impossible for scores on optional questions to be treated
interchangeably through traditional equating, because doing so is inconsistent with the
notion of standardised testing (Kolen and Brennan, 2004).
Though it is deemed impossible to equate optional question scores,
comparability of scores is nevertheless possible through score adjustment
procedures (Kolen and Brennan, 2004) by employing linking paradigms. Wainer,
Wang, and Thissen (1991) employed Item Response Theory (IRT) to explore the
possibility of equating choice items by assuming ignorable non-response, using data
from the College Board's Advanced Placement (AP) test in Chemistry. They treated
examinees as two subpopulations; both were administered the common items but
differed in the administration of the chosen questions, which was used to calibrate the item
parameters for the common items and the selected questions. They succeeded, but
without the confirmatory evidence that could only be sourced with further data.
Allen, Holland, and Thayer (1994a, b) provided a general procedure based on
missing-data methods for non-ignorable non-response to estimate the distribution of
scores on an optional part of a 1987 Advanced Placement (AP) European History
test. Using a sensitivity analysis approach, they observed that an assumption of
ignorable non-response, given additional information from the common section score,
could determine the correct assumption about the non-response when only the
optional essay score and the common section were available. Fitzpatrick and Yen
(1995) investigated the psychometric characteristics of constructed-response items
referring to choice and non-choice passages administered to students in Grades 3, 5,
and 8. The items were scaled using IRT methodology. The findings indicated that
the scores obtained on different choice sets were comparable when these choices
were scaled together with the non-choice items that all students took. The non-
choice items play an important role in producing comparable scores. Bridgeman,
Morgan, and Wang (1997) assessed the ability of history students to choose the
essay topic on which they could get the highest score. They concluded that techniques
for equating scores generated by different topics are not totally satisfactory; therefore,
scoring rubrics must be established by a single group of raters to enable a single
standard.
As can be noted, there is a mixed bag of success and failure in making choice
item scores comparable. Most of the mentioned studies used IRT methodology in
data analysis, which requires strong assumptions about the test, such as
unidimensionality and local independence. Unidimensionality means that the statistical
dependence among items comes about because the test is measuring one latent
trait, and local independence is achieved when items are statistically independent for
each subpopulation of examinees whose members are homogeneous with respect to
the latent trait (Crocker and Algina, 1986; Hambleton, Swaminathan, and Rogers,
1991). The opponents of IRT argue that it is naïve to assume that a single
latent trait accounts for the responses to items on a test. Thus, this study uses
classical item analysis statistics in testing a key assumption of Livingston's score
adjustment on MSCE mathematics paper 2, based on the requirement that the two
equating/linking functions should not depend on the particular population used for
equating.
At this juncture, it should be accentuated that the examinations used in the
mentioned studies are quite different in format from the one under study. In
those studies, examination papers had more than two sections, whilst ours has two
sections only. In this regard, it would not be plausible to conclude that equating is not
possible for every examination with optional questions until it is proved so beyond
reasonable doubt.

2.8 What are the consequences of not linking/equating optional questions' scores?

Linking/equating has the potential to ameliorate the problems presented by choice
by making the choice items equivalent in difficulty. If examinees who choose different
items are to be fairly compared with one another, the scores obtained on these items
must be equated (Wainer, Wang, and Thissen, 1991, p.2). This process facilitates
the linkage of scores on optional items to one another by putting them on a
comparable scale using a z-score model.
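For two forms $Y$ and $X$ observed on the same group, the z-score (linear) model equates scores whose standardised deviations are equal; this standard relation is stated here for completeness rather than quoted from the thesis:

$$\frac{l_Y(x) - \mu_Y}{\sigma_Y} = \frac{x - \mu_X}{\sigma_X} \quad\Longrightarrow\quad l_Y(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)$$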
The optional questions are intended to test the same skills and types of
knowledge, which are taken from the same syllabus. Though test developers try to
make the questions equally difficult, oftentimes some optional questions turn out to
be harder than others. Wang, Wainer, and Thissen (1993) observed that in the 1989 AP
Chemistry and 1989 AP American History examinations, women were adversely affected because
most of them chose the more difficult items. This is one example among many of the
unfairness that comes along with question choice. When some optional questions are
harder than others, the raw scores on those questions do not indicate the same
level of the knowledge or skill the questions are intended to measure; thus the scores
are not comparable.
As noted previously, it remains a fact that developing choice items of equal
difficulty is a gargantuan challenge. Even so, removing choice from the examination
would reduce domain coverage because of the small number of items that would be
examined. This would affect some students. Increasing the length of time to
accommodate a large number of items is often impractical. Since choice has been
decided as the desirable format for MSCE mathematics paper 2 examinations, there
are two main consequences of not putting the scores on the same scale. First, the same
observed raw score on each optional item would not imply the same accomplishment,
because the difficulties of the tasks are different. Second, observed total raw scores
from choice item combinations in section B would still present different patterns of
mathematical proficiency from one combination to another, which might create
intricacy in comparison.







CHAPTER 3

3.0 METHODOLOGY
3.1 Introduction
This chapter describes how the research problem was investigated. The list of
questions to be answered is given first. This is followed by the design of the study,
the analysis plan, ethical considerations, and validity and reliability. In the final section, a
narrative of the delimitations and limitations of the study is presented.

3.2 The Research Questions
The following questions were addressed in this study:
1. To what extent do optional questions differ in difficulty?
2. How are scores on optional questions and total scores on the common
portion correlated?
3. Are linking/equating functions of examinees that chose a concerned
optional question and for those that selected another choice question
similar?



3.3 The Design
3.3.1 Description of the Research
The research strategy employed was a survey, because the researcher
wanted the measures used to be reliable and valid and wanted a guarantee of
fair representation of all individuals to whom the researcher wanted the results to
apply (Cohen, Manion, & Morrison, 2000; Slavin, 1984). Further, a quantitative
approach was used because it follows the positivist approach, which holds
the belief that the social environment is real and constant regardless of time and
setting (Creswell, 1994).

3.3.2 Population
The population of the study was all Form 4 students from purposively sampled
secondary schools in the South West and Shirehighlands education divisions.

3.3.3 Sampling
The study used purposive sampling, in which five secondary schools were chosen
to participate in the study. Two main reasons are given why purposive sampling
was preferred to other methods. First, the researcher wanted to ensure representation of the
four major conventional secondary school types. This is in agreement with Borg, Gall
and Gall (1996), who say that a purposive sample provides more focused data and
allows for a detailed analysis of a particular segment of the population. Second, due to
limitations of research funds and time, it was judicious to engage schools which
were close to each other.
Brief descriptions of the characteristics of the schools that were chosen are
listed below, with letters used to distinguish them from one another:
Secondary school A: a government day secondary school
where boys and girls learn together. It offers no boarding.
Secondary school B: a government national secondary school,
with full boarding, where boys and girls learn together.
Secondary school C: a government-owned district secondary
school, and co-educational.
Secondary school D: a church-affiliated co-educational school
which offers partial boarding.
Secondary school E: a government-maintained non-residential
school.
Upon the selection of the five schools, systematic sampling was employed to
choose sixty students from each school to participate in the study (all students were
drawn from the Form 4 class only). Glass and Hopkins (1996, p.229) note that systematic
samples of persons are usually representative, and that the results from systematic
samples tend to be slightly more accurate than results from simple random samples, but
inconsequentially so. This notion applies since systematic sampling
allows participants to be ordered according to certain attributes. This ensures fair
representation of all elements across the population, hence allowing less opportunity for
sampling error (Glass & Hopkins, 1996).
With systematic sampling, each school provided numbered names of students
listed, from high to low, according to mathematical ability based on teachers



33
classroom assessment. Since the study wanted sixty participants from each school; a
sampling interval, k , was computed by dividing the class size of students in form 4
class at each school by 60. From the teachers list, a name of student corresponding
to
th
k number was picked, and every
th
k name thereafter was chosen until the
required number was achieved.
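As an illustration of this selection step, the short sketch below (Python, with an invented class register rather than the actual study records) picks every k-th name from an ability-ordered list:

```python
# Minimal sketch of the systematic sampling step described above:
# k = Form 4 class size divided by 60, then every k-th name on the
# ability-ordered register is selected. The register here is invented.
def systematic_sample(ordered_names, n_wanted=60):
    k = max(1, len(ordered_names) // n_wanted)      # sampling interval
    return ordered_names[k - 1::k][:n_wanted]       # k-th, 2k-th, 3k-th, ...

class_register = [f"student_{i:03d}" for i in range(1, 181)]  # 180 students -> k = 3
participants = systematic_sample(class_register)
assert len(participants) == 60
```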

3.3.4 Instruments
The main instrument that was used is a 2005 Malawi School Certificate of
Education Examinations mathematics paper 2 (see appendix G). This paper was
purposively chosen because it was the latest paper at the time of writing the
research proposal.
In this study, the design was that the candidates had no choice in section B,
thereby increasing the test length by three more questions. In view of this, the paper
was divided into two parts: paper 1 representing section A (see appendix C) and
paper 2 representing section B (see appendix D). This was done in agreement with
the observation of Hand (2004, p. 120) that the more questions are included in a
test, the more difficulty one might find in obtaining valid responses; candidates tire
as the number of questions increases, and might even refuse to take part if there are
too many.
Paper 1 consisted of six questions, and the time allotted to it was 1 hour 30 minutes.
Paper 2 took 2 hours and comprised the six choice questions. In this paper, examinees
were instructed to read all the optional questions, choose three, and write down the
numbers of those questions in order of preference. They were then instructed to
answer all six questions.
The other instrument was a questionnaire used as a cover page for the candidates'
answer sheets for paper 1 and paper 2 (see appendices E and F respectively). The
questionnaire solicited extra information from the candidates, such as question
choice preference (for paper 2 only), gender, and age.

3.3.5 The administration of the instruments and data gathering
The two papers were administered three weeks prior to the commencement of the
national examinations. This was done to ensure that students had prepared
thoroughly in terms of mastering the whole mathematics syllabus; this is the time
when the majority of secondary schools finish delivering lessons and instead engage
in revision of the various courses offered. The two test papers were administered on
the same day, starting with paper 1, and after a 30-minute break, paper 2 was taken.
Students were instructed to answer the questionnaire first before attempting the
questions in both papers. The time given to fill in the questionnaire was two minutes.

3.4 Data Analysis
3.4.1 Extent of difficulty in optional questions
The item difficulty indices (p-values) were used to analyse the extent of
difficulty in the optional questions. A p-value is obtained by dividing the average
mark obtained on a question by the maximum mark for that question (Nuttall &
Willmott, 1972). The p-values for questions in section A and section B 'without
choice' (i.e. no choices were allowed on the optional questions portion) were all
calculated in the same manner. The item difficulty indices for questions in section B
without choice were unbiased statistics because all examinees (the population) were
used to compute them.
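A minimal sketch of this calculation, assuming the marks for one question are held in a simple array (the numbers below are invented, not the study data):

```python
# Item difficulty index: mean mark on the question divided by the
# maximum mark available for that question.
import numpy as np

def p_value(marks, max_mark):
    return float(np.mean(marks)) / max_mark

# Invented example: 247 examinees on a 15-mark optional question.
rng = np.random.default_rng(0)
marks_q7 = rng.integers(0, 16, size=247)
print(round(p_value(marks_q7, max_mark=15), 3))
```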

3.4.2 Correlation of scores on section B and total scores on section A
The Pearson product-moment correlation coefficient between the common portion
and the question choice portion was calculated. The coefficient of determination was
then worked out to determine the proportion of variance in section A that is associated
with the variance in section B. This analysis helped the researcher to see whether the
examinees would differ in the same way on the common portion as they would on the
optional questions portion. A strong correlation coefficient would indicate that section A
measured a similar construct to section B; that is, the mathematical knowledge and
skills asked for in section A were also demanded in section B, making the two sections
measure the same mathematical elements. This is one requirement for two tests to be
amenable to equating (Liu, Cahn, and Dorans, 2006).
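A minimal sketch of this analysis (with invented data only; the real analysis used the examinees' section totals):

```python
# Pearson product-moment correlation between section A totals and
# section B (choice portion) totals, and the coefficient of determination.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
section_a = rng.normal(30, 8, size=247)                    # common portion totals
section_b = 0.8 * section_a + rng.normal(0, 5, size=247)   # choice portion totals

r, p = pearsonr(section_a, section_b)
print(f"r = {r:.3f}, r^2 = {r * r:.2f}, p = {p:.4f}")
```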








3.4.3 Establishing group invariance of the equating/linking functions for examinees that
chose a concerned optional question and for those that selected another
question

In the normal examination, the raw score Y_i for an examinee selecting question j is
unobservable; in fact, Y_i is a missing datum. Equating Y_i to X on P_j is therefore
impossible. This equating function is denoted X_{ij}(y_i). For instance, an examinee
who chose optional questions, say, 7, 9, and 12 would have unobserved scores on
optional questions 8, 10, and 11. Thus equating the score of, say, question 8 to the scale
of the total score of section A on the group that selected question 7, or 9, or 12 is
impossible. We could denote this equating function as X_{8,7}(y_8), X_{8,9}(y_8), or
X_{8,12}(y_8) with respect to the chosen optional question.
The missing scores, however, were available in this study and were used to determine
the means \mu_{i,j} and standard deviations \sigma_{i,j} of the scores on choice question i
within the group P_j. These moments were used, together with the means \mu_{X,j} and
standard deviations \sigma_{X,j} of section A in the same group, to establish the slopes and
intercepts of the functions X_{ij}(y_i). The computable 'missing-data' linear equating is

    X_{ij}(y_i) = \mu_{X,j} + \frac{\sigma_{X,j}}{\sigma_{i,j}} \left( y_i - \mu_{i,j} \right).

The slopes and intercepts of the equating functions of the observable scores, X_i(y_i),
were computed analogously from the moments of the group P_i:

    X_i(y_i) = \mu_{X,i} + \frac{\sigma_{X,i}}{\sigma_{i,i}} \left( y_i - \mu_{i,i} \right).
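The sketch below (Python, with invented subgroup data) shows how one such linear linking function is built from these four moments:

```python
# Linear linking of a choice-question raw score onto the section A scale:
# slope = sd(section A) / sd(question), intercept from the two means,
# computed within a single subgroup of examinees.
import numpy as np

def linear_link(y, question_scores, section_a_scores):
    mu_y, sd_y = np.mean(question_scores), np.std(question_scores, ddof=1)
    mu_x, sd_x = np.mean(section_a_scores), np.std(section_a_scores, ddof=1)
    return mu_x + (sd_x / sd_y) * (np.asarray(y) - mu_y)

# Invented subgroup: link every possible question-7 raw score (0-15).
rng = np.random.default_rng(2)
q7_scores = rng.normal(6.4, 3.0, size=120)
sec_a_totals = rng.normal(30.0, 8.0, size=120)
linked = linear_link(np.arange(16), q7_scores, sec_a_totals)
```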
For each optional question, there were five sets of linear functions. In each set, one
function belonged to the subgroup that chose the concerned question; another function
was for a subgroup that never selected the concerned question but chose another
question; and the last function was for the combined group. The two subgroups in each
set were mutually exclusive.
Dorans and Holland (2000) introduced two statistics to summarise the differences
between the equating functions obtained from the subgroups and the combined group.
The first is the standardised Root Mean Square Difference, RMSD(y), which gives
detailed information as to which Y-score points, y, are most affected by the subgroup
differences. The second is the standardised Root Expected Mean Square Difference,
REMSD, which summarises the overall differences between the equating/linking
functions. The formulae for the two statistics are

    RMSD(y) = \frac{\sqrt{\sum_{h=1}^{H} w_h \left[ eq_{X_h}(y) - eq_X(y) \right]^2}}{\sigma_X(\text{combined group})}          (5)

    REMSD = \frac{\sqrt{\sum_{h=1}^{H} w_h \sum_{y=\min(y)}^{\max(y)} \nu_{yh} \left[ eq_{X_h}(y) - eq_X(y) \right]^2}}{\sigma_X(\text{combined group})}          (6)

where eq_X(y) represents scores on Y transformed to the scale of X for the combined
group, and eq_{X_h}(y) represents scores on Y transformed to the scale of X for
subgroup h. N_h is the sample size for subgroup h, N is the total number of examinees,
and w_h = N_h / N is the weight for subgroup h. Furthermore, N_{yh} is the number of
examinees in subgroup h with a particular score y on Y, and \nu_{yh} = N_{yh} / N_h is a
weighting factor for subgroup h and score y.
As can be noted, RMSD is computed at each y-value, and the contribution of each
subgroup is weighted by its proportional representation in the combined group.
REMSD is a doubly weighted statistic, over \nu_{yh} and w_h.
To evaluate the relative magnitude of RMSD and REMSD, Dorans and
Feigenbaum (1994) suggested the notion of a score Difference That Matters (DTM)
in the context of linking the new SAT to the old SAT. For a test reported in 10-point
units, linking functions that are within 5 scaled score points of each other at a given
raw score point are treated as close enough to ignore, because 5 points is half of the
reported score unit of 10 (Dorans, 2004). Kolen and Brennan (2004, p. 462) give a
good illustration of the logic of the DTM when reported scores are integers:
equivalents of 15.4 and 15.6 round to different integers even though they differ by
only .2 (less than a DTM), while equivalents of 14.6 and 15.4 round to the same
integer even though they differ by .8 (more than a DTM). The score unit on the
MSCE mathematics examination is 1 point, which is an integer. This means that
half of the score unit, .5, was considered the score Difference That Matters.
Recall that the RMSD and REMSD statistics are standardised by dividing by the
standard deviation of scores on the compulsory section for the combined group. The
DTM was standardised in the same manner so that it could be used as a benchmark
for evaluating RMSD and REMSD. When REMSD was below the standardised DTM,
it indicated that the equating functions for each subgroup were very close to that of
the combined group, hence they were group invariant; otherwise, they failed the
group invariance test. These functions and the RMSD values were plotted on graphs
to visually display their similarities and differences.
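The decision rule can be expressed in a few lines (the standard deviation used below is an invented value, not the study's):

```python
# Standardise the 0.5-point DTM by the combined-group section A standard
# deviation, then flag whether a REMSD value is small enough to ignore.
def group_invariant(remsd, sigma_x_combined, dtm=0.5):
    return remsd < dtm / sigma_x_combined

# With an assumed sigma_x of 9.0 the benchmark is about 0.056, so a
# REMSD of, say, 0.249 would fail the group invariance test.
print(group_invariant(0.249, sigma_x_combined=9.0))   # False
```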



3.5 Ethical Considerations
Creswell (2003) says that codes of professional conduct for researchers apply to all
research methods: qualitative, quantitative, and mixed methods. In this study, the
researcher observed two ethical codes of conduct: the first was obtaining informed
consent, and the second concerned privacy and confidentiality.
First, Gay and Airasian (2003) say that very rarely is it possible to conduct
research without the cooperation of people in the setting of the study; such
cooperation comes into play when the researcher obtains consent from participants.
Before carrying out the research, written permission to conduct the research at the
schools was sought from the Education Division Managers and headteachers
(appendices J, K, & L); furthermore, students at the participating schools were asked
whether they agreed to take part in the study, and only those that agreed were
systematically selected as candidates. Rossman and Rallis (2003) comment on the
significance of obtaining informed consent from participants by saying that
permission from the subjects is crucial for the ethical conduct of research because it
serves to protect the privacy of the participants.
Second, Fowler (1995), Vaughn, Schumm, and Sinagub (1996), and Rossman and
Rallis (2003) mention that privacy and confidentiality during data collection are of
paramount importance: participants' responses should be kept confidential, and
participants should know the purpose of the study. Based on these assertions, the
study assured subjects of their privacy and confidentiality during the administration
of the tests by advising them not to disclose or write their names on the answer
sheets. Letters and numerical values were used instead to distinguish examinees
from one another.



3.6 Validity and Reliability
Validity is defined as the accuracy or truthfulness of a measurement with
reference to a construct of specific interest, while reliability is concerned with the
consistency of a measurement (Crocker & Algina, 1986; Bakewell, 2003). Hand
(2004, p. 129) defines validity as how well the measured variable represents the
attribute being measured, or how well it captures the concept which is the target of
measurement, and relates reliability to the differences between multiple
measurements of an attribute.
On validity, MANEB item setters developed the instrument that was used in
this study. These item setters are well-trained personnel with vast teaching
experience in mathematics. During the development of the tests, they use blueprints,
that is, tables of specifications, to guide them in terms of content coverage and the
level of cognitive demand; the blueprints also help to maintain a consistent difficulty
level of the tests over the years. The papers, therefore, possess the required
magnitude of content validity based on how they are designed. Furthermore, the
examinees took the tests three weeks prior to the national examinations, when they
were well prepared; hence their responses were taken as their optimal performance
or achievement in MSCE mathematics paper 2, displaying their true mathematics
knowledge and skills.
To assure reliability, a marking scheme was used for consistency in scoring, and a
single item rater was used to avoid inter-rater variability. The marking scheme used
for scoring the test was developed by two experienced mathematics teachers from
Chiradzulu Secondary School, who are also MANEB mathematics raters. The scheme
is similar to the MANEB scheme in terms of mark allocation and content
specification. Furthermore, before rating the items, the researcher and the two
teachers standardised the marking scheme to encompass the diversity of examinees'
answers. One question at a time was marked on each script before the subsequent
question was marked, to ensure consistency.

3.7 Delimitations and Limitations of the study
3.7.1 Delimitations
The study focused only on the optional questions of mathematics paper 2; hence
the findings would not apply to other MANEB examinations that allow examinee
choice.
The results cannot be generalised to all secondary schools in Malawi because the
participating schools were purposively sampled. However, the results may be related
to other schools with characteristics similar to those of the sampled schools.

3.7.2 Limitations
Visiting all the secondary schools that offer mathematics would have been ideal,
but this was impossible due to time and financial constraints; instead, the study was
done in five schools only.
Some students declined to participate in the study after previously agreeing to do
so. In some instances, candidates took only one paper instead of two, which provided
scores for one paper only while scores for the other were not available; such
candidates were dropped from the study, thereby reducing the targeted sample size.
This attrition was most pronounced at Njamba secondary school, and the total
attrition was 53 candidates.
Finally, the MANEB marking scheme was not issued to the researcher because it
is a confidential document that cannot be given to anyone outside the organisation.
This created a minor setback, since it had been planned to use that marking scheme,
and it required extra finances and resources to bring together two experienced
teachers from Chiradzulu secondary school, who are also MANEB item writers and
scorers, and the researcher to develop another marking scheme. Nonetheless, the
combined experience of the team as item scorers made the marking scheme similar
to the ones developed by MANEB.
















CHAPTER 4

4.0 RESULTS AND DISCUSSION OF THE FINDINGS
4.1 Introduction
In this chapter, the results and discussion of the findings are presented under three
main sections, formulated from the research questions. The sections therefore answer
the research questions in order, starting with the first and then the second; the third
research question is addressed in the final section, which also includes a chapter
summary.

4.2 To what extent do optional questions differ in difficulty?
4.2.1 Preliminary Analysis
The item content and major content areas that made up section A and section B
are outlined in Tables 4.1 and 4.2 respectively. Almost all content areas that were
examined in section A were also tested in section B, but with different item content,
which signifies that the two sections were measuring the same construct. Construct
similarity is viewed as the 'same framework' (Feuer et al., 1999); thus both sections
were built on the same framework.
Furthermore, Feuer et al. (1999) define 'same test specifications' as similarity in
measurement characteristics and conditions such as test length, test format,
administration conditions, and so on. Popham (1974), as cited by Crocker and Algina
(1986), defines item specifications as sources of item content, descriptions of the
problem situations or stimuli, and so on. In view of both definitions, the items in the
two sections were built on different item specifications, as evidenced by similar item
format but different sources of item content. Further, the differences rested in the
levels of cognitive demand: most questions in section A demand less cognitive
operation than those in section B, as indicated by the p-values in Table 4.3.
Table 4.1: Major content areas of section A

Section A
Question No.   Item content            Content areas
1a             Algebra fractions       Algebra, patterns, & functions
1b             Irrational numbers      Numeration
2a             Subject of a formula    Algebra, patterns, & functions
2b             Matrices                Algebra, patterns, & functions
3a             Triangle geometry       Geometry
3b             Remainder theorem       Algebra, patterns, & functions
4a             Circle geometry         Geometry
4b             Mapping                 Algebra, patterns, & functions
5a             Measurement             Numeration
5b             Speed-time graph        Numeration
6a             Similar figures         Geometry
6b             Vectors                 Numeration






Table 4.2: Major content areas of section B

Section B
Question No.   Item content                                Content areas
7a             Statistics                                  Statistics & probability
7b             Formulation & solving quadratic equation    Algebra, patterns, & functions
8a             Partial variation                           Algebra, patterns, & functions
8b             Probability                                 Statistics & probability
9a             Exponential equation                        Algebra, patterns, & functions
9b             Linear programming                          Algebra, patterns, & functions
10a            Equation of a straight line                 Algebra, patterns, & functions
10b            Arithmetic progression                      Algebra, patterns, & functions
11a            Cyclic quadrilateral                        Geometry
11b            Sets                                        Numeration
12a            Trigonometry                                Numeration
12b            Solving polynomial equation graphically     Algebra, patterns, & functions
Having looked at the framework and the test/item specifications of the two
sections of the test under investigation, it is reasonable to use the term 'linking'
rather than 'equating', because the two sections had different item content but the
same content areas, the length of the choice items in section B was not equal to that
of the items in section A, and the level of cognitive processing required in the two
sections was different, as illustrated in subsection 4.2.2. Thus, the two portions
measured the same construct but to different specifications. However, when equating
choice items the interest is in item content as opposed to the content areas of the test
form, because item scores are the ones to be linked within the same test.

4.2.2 Comparing p-values of section B


Table 4.3: P-values for questions in section A and section B 'without choice'

Section A                                          Section B
Item   Max. mark   Average mark   p-value          Item   Max. mark   Average mark   p-value
1      8           5.190          0.649            7      15          6.436          0.429
2      7           4.401          0.629            8      15          5.061          0.337
3      9           5.518          0.613            9      15          5.869          0.391
4      10          5.801          0.580            10     15          5.116          0.341
5      11          6.324          0.575            11     15          3.927          0.262
6      10          1.917          0.192            12     15          7.566          0.504


Table 4.3 displays the item difficulty indices (p-values) for questions in section
A and section B without any choice. Questions in section A have generally higher
p-values than those in section B, which affirms the notion that section A questions
are easier than section B questions. The questions in the latter section were
relatively difficult because they usually provided deep coverage of the content
domain. Adopting the terms used by Wainer and Thissen (1994), most of the
questions in section B would be called 'large' items, whereas section A questions
would be dubbed 'short' items because most of them were considerably
straightforward. However, question 6 in section A had the lowest p-value amongst
all questions in the test. The predicament candidates faced in attempting this
question was translating the word problem into correct, computable mathematical
concepts; levels of proficiency in language skills might have influenced performance
on this question (Crocker and Algina, 1986).
Focusing on section B questions, it is noted that question 11 was the most
difficult and question 12 the easiest. Ordering them from least difficult to most
difficult gives questions 12, 7, 9, 10, 8, and 11.
As noted, optional question 11 was the most difficult, yet under the current
assessment policy on MSCE mathematics paper 2 examinations it makes no
difference whether a raw score of, say, 7 was obtained on that question or on
question 12, the easiest. In all fairness, it is clear that a candidate who receives a
score of 7 on question 11 demonstrated more proficiency than another who gets the
same score on question 12. Wainer, Wang, and Thissen (1991) and Wainer and
Thissen (1994) say that when optional questions that are differentially difficult
cannot be equated, scores comparing persons on those questions have their validity
compromised because there is no score equity.

4.3 How are scores on section A and section B with choice correlated?
The correlation coefficient (r) between the common portion and the question
choice portion was 0.7996, significant at the 0.01 level (2-tailed).
Hinkle, Wiersma, and Jurs (1998, p. 120) presented guidelines for interpreting
the size of a correlation coefficient (r) which, in part, say that 0.70 to 0.90 means a
high positive (negative) correlation. Adopting this guideline, the above correlation
value indicates a high positive correlation. The coefficient of determination (r^2) is
0.64, meaning that section A accounted for about 64 percent of the total variation in
section B. Remember that these two portions were measuring knowledge and skills
of mathematics at senior secondary level; in view of what the measures were
purportedly measuring, a great deal of the association between them would be
attributed to the mathematical construct.
However, 'there are no generally recognised guidelines for what constitutes
adequate evidence of construct validation through correlational studies' (Crocker
and Algina, 1986, p. 231), and the only way out, as suggested by Crocker and Algina
(1986), is to compare with the range of values that would have been computed by
MANEB (as they are the ones who develop the test). Unfortunately, MANEB do not
calculate the congruency validity of the two portions. Nonetheless, it remains a fact
that section A was a good predictor of performance in section B.

4.4 Establishing group invariance of the linking functions for examinees that chose a
concerned optional question and for those that selected another question

Optional questions are differentially difficult, as noted from the p-values of
section B without choice. Thus, adjusting for the differences in difficulty would
ameliorate the problem of unfairness. In order to make an appropriate adjustment,
there was a need to ascertain the key assumption of chained linear equating
underlying the Livingston score adjustment, which says that the two chained
equating functions should not depend on which population is used for the equating.
It means that equating Y_i to X on P_i ought to give the same equating function as
equating Y_i to X on P_j (Allen et al., 1993). In this case the Y_i are missing data on
P_j, which were available in this study. This section presents the results and
discussion of the comparisons of the linking functions X_i(y_i) and X_{ij}(y_i), where
i = 7, 8, 9, 10, 11, or 12 is the concerned question and j = 7, 8, 9, 10, 11, or 12 is the
other question choice. Recall that for each optional question there were five sets of
linear functions; in each set, one function belonged to the subgroup that chose the
concerned question, another was for a subgroup that never selected the concerned
question but chose another question, and the last was for the combined group. The
two subgroups in each set were mutually exclusive.

4.4.1 Linking functions that largely vary at the lower tail of the choice question scale
Figure 4.1: Equated scores on section A from optional question 7 that largely vary at the lower
tail of the choice question scale
[Four panels plotted against the score on question 7. Panels (a) and (c) show the
equating/linking functions (equated/linked scores on section A) for the group that chose
question 7, the group that chose question 8 (a) or question 9 (c), and the combined group,
with REMSD = 0.249 and REMSD = 0.180 respectively. Panels (b) and (d) show the
corresponding Root Mean Square Difference plotted against the standardised Difference
That Matters.]

Figure 4.1 shows plots of the linking functions and the corresponding Root Mean Square
Difference (RMSD) and Root Expected Mean Square Difference (REMSD) values for
the subgroup that chose the concerned question 7 paired with the subgroup that selected
question 8, and for the same concerned subgroup paired with the subgroup that selected
question 9, each with the respective combined group. The general pattern of the RMSD
graphs indicates that the differences between the subgroup that chose the concerned
question 7 and the subgroups that took questions 8 and 9 were largest at the lower tail of
the scale of the concerned optional question.
Further, the RMSD and REMSD values fell above the standardised DTM line
virtually across the entire score range of the concerned optional question 7. This trend is
also observed in the plots of the five pairs of linking functions of subgroups that chose
particular questions and subgroups that selected other questions, together with their
corresponding combined groups, listed in Table 4.4 with graphs shown in appendix A.
Thus, the subgroups in the mentioned sets were not group invariant.
Table 4.4: Pairs of subgroups that chose particular questions and other questions; the graphs
are illustrated in appendix A

Pair   Subgroup that chose concerned question   Subgroup that selected other question
1      8                                        7
2      9                                        11
3      11                                       7
4      11                                       8
5      12                                       10

4.4.2 Linking functions that largely vary at the upper tail of the choice question scale
Figure 4.2: Equated scores on section A from optional question 8 that largely vary at the higher
tail of the choice question scale
[Six panels plotted against the score on question 8. Panels (a), (c), and (e) show the
equating/linking functions (equated/linked scores on section A) for the group that chose
question 8, the group that chose question 9 (a), question 11 (c), or question 12 (e), and the
combined group, with REMSD = 0.111, 0.234, and 0.174 respectively. Panels (b), (d), and
(f) show the corresponding Root Mean Square Difference plotted against the standardised
Difference That Matters.]
Figure 4.2 shows the sets of linking functions and the corresponding RMSD and REMSD
values for the subgroup that took the concerned question 8 paired, in turn, with the
subgroups that chose questions 9, 11, and 12, together with the corresponding combined
groups. Generally, the RMSD plots indicate that there were large differences between the
subgroups in each set at the higher end of the score continuum, where the RMSD values
were furthest above the standardised DTM lines. This trend is also noted in the graphs of
the ten pairs of subgroups and their corresponding combined groups listed in Table 4.5
and shown in appendix B.




Table 4.5: Pairs of subgroups that chose particular questions and other questions; the graphs
are illustrated in appendix B

Pair   Subgroup that chose concerned question   Subgroup that selected other question
1      9                                        7
2      9                                        8
3      9                                        10
4      10                                       7
5      10                                       9
6      10                                       11
7      12                                       7
8      12                                       8
9      12                                       9
10     12                                       11
An overall assessment shows that, for lower achievers, the linked scores of the
concerned optional questions to section A total scores were similar in each pair of
subgroups and the combined group. In particular, the linking functions of the subgroup
that chose the concerned question 8, the subgroup that selected question 9, and the
combined group (see Figures 4.2a and 4.2b) illustrate that the equated scores on
section A for lower achievers were the same in the two subgroups, meaning the groups
were invariant at the lower tail of the score range. It also means the lower achievers
found the concerned question 8 equally difficult. The relative difficulty trend, however,
changed when moving from lower to higher achievers: there were interactions among
score level, difficulty on the concerned question, and group membership. This means
the sets of linking functions failed the test of group invariance, as evidenced by the
REMSD values of all the sets being above the standardised DTM line.

4.4.3 Linking functions that largely vary at the lower and second-upper tails of the choice
question scale

Figure 4.3: Equated scores on section A that largely vary at the lower and second-upper tails of
the choice question scale, from different optional questions
[Ten panels. Panels (a), (c), (e), (g), and (i) show the equating/linking functions
(equated/linked scores on section A) for: the group that chose question 7 versus the group
that chose question 10 (a, REMSD = 0.203); question 7 versus question 12 (c, REMSD =
0.271); question 11 versus question 9 (e, REMSD = 0.189); question 11 versus question 10
(g, REMSD = 0.346); and question 11 versus question 12 (i, REMSD = 0.355), each with the
combined group, plotted against the score on the concerned question. Panels (b), (d), (f),
(h), and (j) show the corresponding Root Mean Square Difference plotted against the
standardised DTM.]
The sets of linking functions that varied considerably at both the lower and the
second-upper tails of the choice item scales are displayed in Figures 4.3a, c, e, g, and i.
These sets are graphically represented in this order: the subgroup that chose concerned
question 7, the subgroup that selected question 10, and the combined group; the
subgroup that chose concerned question 7, another that selected question 12, and the
combined group; the subgroup that chose concerned question 11, another that attempted
question 9, and the combined group; the subgroup that chose concerned question 11,
another that selected question 10, and the combined group; and the subgroup that chose
concerned question 11, another that selected question 12, and the combined group.
Similarly, graphs depicting the RMSDs for the mentioned sets are illustrated in Figures
4.3b, d, f, h, and j.
The RMSD plots, in concord with the plots of the sets of linking functions,
demonstrate that second-upper achievers of the two subgroups in each set performed
relatively equally on the concerned optional questions. On the other hand, first-upper
and lower achievers of the two subgroups in each set performed differently on the same
concerned optional questions. On the whole, the subgroups in each set of linking
functions were group dependent, as shown by the REMSD values, which are all above
the standardised DTM lines. Recall that the REMSD value summarises the RMSD
values across the score levels of the optional question scale.








4.4.4 Linking functions that largely vary at both the lower and upper tails of the choice
question scale

Figure 4.4: Equated scores on section A that largely vary at both the lower and upper tails of
the score scale of optional question 10
[Two panels plotted against the score on question 10. Panel (a) shows the equating/linking
functions (equated/linked scores on section A) for the group that chose question 10, the
group that chose question 8, and the combined group; REMSD = 0.092. Panel (b) shows
the corresponding Root Mean Square Difference plotted against the standardised DTM.]

Figure 4.4a illustrates a set of linking functions, namely: the subgroup that chose the
concerned question 10, the subgroup that selected question 8, and the combined group.
It shows that the linking functions of the subgroups and the combined group came
closest at the middle of the score continuum. The RMSD plot (Figure 4.4b) displays the
same pattern and further shows that middle (average) achievers from the two subgroups
and the combined group performed more similarly on the concerned question 10 than
their lower- and upper-achieving counterparts. It is also observed that the linking
functions were group invariant for middle achievers only, as evidenced by the RMSD
values that were below the DTM line at this score level. However, the REMSD value
was above the DTM line, meaning that overall the linking functions of the subgroups
were dependent on group membership.





4.4.5 Linking functions that constantly vary across the entire score scale
Figure 4.5: Equated scores on section A that vary constantly across the entire score scale of
optional question 7
[Two panels plotted against the score on question 7. Panel (a) shows the equating/linking
functions (equated/linked scores on section A) for the group that chose question 7, the
group that chose question 11, and the combined group; REMSD = 0.149. Panel (b) shows
the corresponding Root Mean Square Difference plotted against the standardised
Difference That Matters.]

Figure 4.5a shows the linking functions of question 7 scores to section A total
scores for the subgroup that chose it, the subgroup that selected question 11, and the
combined group. The linking functions are visibly parallel to each other, which
signifies that the differences between lower, middle, and upper achievers of the groups
were constant across the score range (see Figure 4.5b). This means that, at each score
level, optional question 7 was differentially difficult for the subgroups in the same
constant manner. The subgroups, however, failed the group invariance test because the
RMSD plot and the REMSD value were above the standardised DTM line.
To summarise the contents of this chapter: section A and section B measured the
same latent trait, as evidenced by the high correlation coefficient and the qualitative
analysis of the sections' content areas. It was noted that questions in section A were
relatively easier than those in section B, and that the choice items were themselves
differentially difficult. Furthermore, the linking functions of a subgroup that chose a
concerned optional question and a different subgroup that selected another question
were not similar, that is, not group invariant. Generally, the linking functions of the
subgroups and the combined group varied most at different positions on the score
continuums of different optional questions. The overall results indicated that subgroups
found the same optional question hugely differentially difficult, and hence the linkages
were not group independent.











CHAPTER 5

5.0 CONCLUSIONS, IMPLICATIONS AND RECOMMENDATION
5.1 Introduction
This chapter provides a few statements that wind up the thesis. The statements
are divided into three sections, namely conclusions, implications, and
recommendation.
5.2 Conclusions
5.2.1 The main findings of the literature review
Choice items have a considerable effect on score equity. Willmott and Hall
(1975), Newton (1977), Livingston (1988), Wainer and Thissen (1994), and Khembo
(2004) have observed that choice items are differentially difficult because of
diversities in the levels of knowledge and skills the items measure. This means that
direct comparison of raw scores emanating from optional items is meaningless.
In regard to this problem, Livingston (1988) attempted to adjust the raw scores
of Advanced Placement history optional items. Wainer, Wang, and Thissen (1991)
and Wang, Wainer, and Thissen (1993) employed item response theory to explore
the possibility of equating choice items, and they noted that extra information was
needed to succeed. Allen, Holland, and Thayer (1994a, b) observed in their research
that common section scores were helpful in giving information about the unobserved
scores on choice items. In the face of such evidence, Fitzpatrick and Yen (1995)
succeeded in using the common portion to put different choice items on the same
scale. In spite of their success, however, they never tested the vital assumptions of
equating.
Further, all but one of the studies mentioned used IRT methodology in analysing
data.
5.2.2 The main findings of the empirical investigation
Extent of difficulty in optional items
The optional items were differentially difficult, as evidenced by the varying p-values.
The difficulty fluctuated because of differences in the levels of cognitive demand. This
observation is in line with the literature (Willmott and Hall, 1975; Newton, 1977;
Livingston, 1988; Wainer and Thissen, 1994; Khembo, 2004).
Correlation of scores on section B and total scores on section A
The correlation coefficient was 0.8, which shows a high association between the two
portions. The coefficient of determination was 0.64, indicating that 64 percent of the
variance in section B performance is related to the variance in performance on section
A. This indicates that the section A score was a good predictor of section B
performance. Therefore, section A scores were helpful in giving reliable information
about performance in section B, as observed by Wang et al. (1993), Allen et al.
(1994a, b), and Fitzpatrick and Yen (1995).
Establishing group invariance on linking functions
With the Livingston score adjustment, the raw scores on choice item Y_i are
transformed to the scale of choice item Y_j through scores on the common portion X for
the examinees that answered choice item Y_i. Its methodology makes implicit
assumptions of chained linear equating when imputing scores: the chained equating
functions should not depend on which population is used for the equating. It means
that equating the raw scores of choice item Y_i to the scale of total scores on the
common portion X for examinees who answered choice item Y_i is supposed to give
the same equating function as equating the raw scores of choice item Y_i to the common
portion X on examinees who answered choice item Y_j. In other words, the two
equating functions from the two subgroups should not be strongly influenced by the
subgroup membership on which they are computed.
The study used the Root Mean Square Difference (RMSD) statistic to quantify the
differences between the linking functions of two subgroups (i.e. students that chose a
particular optional question and those that never selected that particular optional
question but instead chose another one) and the combined-group linking function at a
given choice item score level of the 2005 MSCE mathematics examination paper 2. It
also employed the Root Expected Mean Square Difference (REMSD) to summarise the
overall differences between the said linking functions. It was established that group
invariance did not hold across the subgroups that were involved in this examination.
Using the standardised Difference That Matters values, it was noted that the differences
between the linking functions of the subgroups and the combined group were too big to
be ignored. This means that the linking functions of the unobserved data that were
available in this study and of the usually observed data on the MSCE mathematics
examination paper 2 were different. Thus the assumption of the chained linear linking
for the Livingston score adjustment proposal is not viable for the 2005 MSCE
mathematics paper 2. The results partially concur with those of Wang, Wainer, and
Thissen (1993), who tested a similar assumption but from an Item Response Theory
(IRT) perspective; they found that the assumption was sometimes viable and sometimes
not.

5.3 Implications
The lack of group invariance in the linking functions indicates that the differential
difficulty of the concerned optional question is not consistent across the two
subgroups. Invariance could hold only if the relative difficulty changed as a function
of score level in the same way across subgroups; when there is an interaction among
the score level of the concerned optional question, difficulty, and subgroup, invariance
does not hold (Liu, Cahn, and Dorans, 2006). This means the Livingston score
adjustment would not be an ideal methodology for achieving score equity on the
optional questions of the MSCE mathematics examination paper 2. The adjustment
would be biased because one subgroup that performed poorly on an optional item, say
Y_i, would have its equated scores on section A substantially adjusted upwards, while
the equated scores on section A of the other subgroup that achieved highly on the same
optional item would be considerably adjusted downwards.
Every year MANEB administers an MSCE mathematics paper 2 examination
similar to the 2005 paper used in this study. Therefore, with the failed assumption
test, MSCE mathematics paper 2 optional selection would still create the same old
measurement problems with candidates' raw scores, especially when some questions
are more difficult than others. The raw scores on those questions would not represent
the same proficiency, and yet similar scores from these optional questions would still
be treated the same, overlooking the fact that the scores signify different proficiencies.
As some have pointed out (e.g. Wainer and Thissen, 1994), ignoring the fact that
choice items are differentially difficult makes the test unfair and compromises its
validity, because the test would measure a latent trait to do with choice proficiency in
addition to the desired subject matter.

5.4 Recommendation
The sole purpose of optional item raw score adjustment is to take away the
differences in difficulty, not content dissimilarities. However, it has been shown that
it is impossible to link the raw scores of the optional questions of the 2005 MSCE
mathematics paper 2 examination because of group dependence. The literature review
has shown that diversities in the level of cognitive demand of these choice items
contribute to score inequity. Thus, to control large dissimilarities in difficulty, one
needs to control the proficiency levels of choice items in order to achieve score equity.
This could be done by training item writers and moderators to use analytical methods
to strictly match the proficiencies required by different topics when constructing the
items, using MSCE mathematics performance level descriptors such as the ones
developed by competent mathematics teachers in Khembo's (2004) study. MSCE
mathematics performance level descriptors are observable indicators which classify
mathematics abilities into low, middle, and high levels.
When the choice items are constructed, MANEB could trial them using this
study's methodology and observe whether choice matters.



REFERENCES

Allen, N.S., Holland, P.W., & Thayer, T. (1993). The optional essay problem and the
hypothesis of equal difficulty (Technical Report 93-34). Princeton, NJ:
Educational Testing Service.
Allen, N.S., Holland, P.W., & Thayer, D.T. (1994a). A missing data approach to
estimating distributions of scores for optional test sections (Research Report
94-17). Princeton, NJ: Educational Testing Service.
Allen, N.S., Holland, P.W., & Thayer, D.T. (1994b). Estimating scores for an
Optional section using information from a common section (Research Report
94-18). Princeton, NJ: Educational Testing Service.
Allen, N.S., Holland, P.W., & Thayer, T. (2005). Measuring the benefits of
examinee-selected questions. Journal of Educational Measurement, 42, 27-51.
Angoff, W.H. (1971). Scales, norms, and equivalent scores. In R.L. Thorndike (Ed.),
Educational measurement (2nd ed., pp. 508-600). Washington, DC:
American Council on Education.
Bakewell, O. (2003). Sharpening the development process: A practical guide to
monitoring and evaluation. Antony Rose Ltd, U.K.
Bell, J.F. (1997). Question choice in English literature examinations. Oxford
Review of Education, 23(4), 447-458.
Borg, W.R., Gall, J.P., & Gall, M.D. (1996). Educational research: An introduction.
Longman Publishers, U.S.A.




Bradlow, E.T. & Thomas, N. (1998). Item response theory models applied to data
allowing examinee choice. Journal of Educational and Behavioural
Statistics, 23(3), 236-243.
Braun, H.I., & Holland, P.W. (1982). Observed-score test equating: A mathematical
analysis of some ETS equating procedures. In P.W. Holland and D.B. Rubin
(Eds.). Test equating (pp.9-49). New York: Academic.
Brennan, R.L. (2006). Chained linear equating (Technical Notes 3). Iowa: Center for
Advanced Studies in Measurement and Assessment. Retrieved on April 19, 2007
from http://www.education.uiwa.edu/casma/documents/clinearreport3.pdf
Bridgeman, B., Morgan, R., & Wang, M. (1997). Choice among essay topics:
Impact on performance and validity. Journal of Educational Measurement,
34(3), 273-286.
Creswell, J.W. (1994). Research design: Qualitative, quantitative approaches.
Thousand Oaks, CA: Sage Publications.
Creswell, J.W. (2003). Research design: Qualitative, quantitative, and mixed methods
approaches (2nd ed.). London: Sage Publications.
Cohen, L., Manion, L., & Morrison, K. (2000). Research methods in education
(5th ed.). London: Routledge Falmer.
Cook, L.L. and Eignor, D.R. (1991). An NCME instructional module on IRT equating
methods. Educational Measurement: Issues and Practice, 10, 37-45.
Crocker, L. & Algina, J. (1986). Introduction to classical and modern test theory.
Belmont, CA: Wadsworth Group/Thomson Learning.




Dorans, N.J. (2004). Using subpopulation invariance to assess test score equity.
Journal of Educational Measurement, 41, 43-68.
Dorans, N.J., & Feigenbaum, M.D. (1994). Equating issues engendered by changes to
the SAT and PSAT/NMSQT. In I.M. Lawrence, N.J. Dorans, M.D.
Feigenbaum, N.J. Feryok, A.P. Schmitt, & N.K. Wright. Technical issues
related to the introduction of the new SAT and PSAT/NMSQT. (ETS RM-94-
10). Princeton, N.J: Educational Testing Service.
Dorans, N.J. and Holland, P.W. (2000). Population invariance and the equatability of
tests: Basics theory and the linear case. Journal of Educational Measurement,
37, 281-306.
Feuer, M.J., Holland, P.W., Green, B.F., Bertenthal, M.W., & Hemphill, F.C. (Eds.)
(1999). Uncommon measures: Equivalence and linkage among educational
tests. Washington, DC: National Research Council.
Fitzpatrick, A.R., & Yen, W.M. (1995). The psychometric characteristics of choice
items. Journal of Educational Measurement, 32(3), 243-259.
Fowler, F.J., Jr. (1995). Improving survey questions: Designs and evaluation.
London: Sage Publications.
Gabrielson, S., Gordon, B., & Engelhard, G., Jr. (1995). The effects of task choice on
the quality of writing obtained in a statewide assessment. Applied Measurement
in Education, 8(4), 273-290.
Gay, L.R., & Airasian, P. (2003). Educational research: Competencies for analysis
and application (7th ed.). New Jersey: Merrill Prentice Hall.




Glass, G.V., & Hopkins, K.D. (1996). Statistical methods in education and
psychology (3rd ed.). Boston: Allyn and Bacon.
Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item
response theory. Newbury Park, CA: Sage.
Hand, D.J. (2004). Measurement theory and practice The world through
quantification. London: Arnold.
Hill, B. (1976). Intra-examination reliability of mathematics education. In
D. DeGruijter & L. Van der Kamp (Eds.), Advances in psychological and
educational measurement (pp. 215- ). London: John Wiley & Son.
Hinkle, D.E., Wiersma, W., & Jurs, S.G. (1998). Applied statistics for the
behavioural sciences (4th ed.). Boston: Houghton Mifflin Company.
Holland, P.W., von Davier, A.A., Sinharay, S. and Han, N. (2006). Testing the
untestable assumptions of the chain and poststratification equating methods for
the NEAT design. (Research Report 06-17). Princeton, NJ: Educational Testing
Service.
Jaeger, R.M. (1981). Some exploratory indices for selection of a test equating
method. Journal of Educational Measurement, 18, 23-28.
Khembo, D.J. (2004). Using performance level descriptors to ensure consistency and
comparability in standard setting (Doctoral thesis). Ann Arbor: ProQuest
Information and Learning Company.
Kierkegaard, S. (1986). Either/ or. New York: Harper and Row.
Kolen, M.J., & Brennan, R.L. (2004). Test equating, scaling, and linking: Methods
and practices (2nd ed.). New York: Springer-Verlag.



Lewis, D.G. (1974). Assessment in education, London: University of London Press
Ltd.
Lindsay, M. (1973). Faculty of Education, Monash University, Melbourne, Australia.
In Education Commonwealth. Public Examinations. London: Commonwealth
Secretariat, 8, pp. 202-207.
Linn, R.L. (1993). Linking results of distinct assessments. Applied Measurement in
Education, 6, 83-102.
Livingston, S.A. (1988). Adjusting scores on examinations offering a choice of essay
questions (Research Report 88-64). Princeton, NJ: Educational Testing Service.
Livingston, S.A. (2004). Equating test scores (without IRT). Princeton, NJ:
Educational Testing Service.
Liu, J., Cahn, M.F. and Dorans, N.J. (2006). Application of score equity assessment:
Invariance of linkage of new SAT to old SAT across gender groups. Journal of
Educational Measurement, 43, 113-129.
Liu, J. and Dorans, N.J. (2008). Score equity assessment: Development of prototype
analysis. Paper presented at the annual meeting of the National Council on
Measurement in Education (NCME) held between March 23-28.
Lord, F.M. (1980). Application of item response theory to practical testing problems.
Hillsdale. NJ: Erlbaum.
Malawi National Examinations Board. (1999). Malawi School Certificate of
Education Examination: Award programme, unpublished.





Mislevy, R.J. (1992). Linking educational assessment: Concepts, issues, methods, and
prospects. Princeton, NJ: Educational Testing Service Policy Information
Center.
Newton, P. (1977). Examining standards over time. Research papers in Education:
Policy and Practice, 12(3), 227-248.
Nuttall, D., & Willmott, A. (1972). British examinations techniques of analysis.
London: National Foundation for Educational Research in England and Wales.
Petersen, N.S., Kolen, M.J., & Hoover, H.D. (1989). Scaling, norming, and equating.
In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 221-262). New
York: Macmillan.
Power, D.E., Fowles, M.E., Farnum, M., & Gerritz, K. (1992). Giving choice of
Topics on a test of basic writing skills: Does it make any difference?
(Research Report 92-19). Princeton, NJ: ETS.
Rossman, G.B., & Rallis, S.F. (2003). Learning in the field: An introduction to
qualitative research (2nd ed.). London: Sage Publications.
Scannel, D., & Tracy, D. (1975). Testing and measurement in the classroom. Boston:
Houghton Mifflin Company.
Schools Council Examinations Bulletin 23. (1971). A common system of examining at
16+. London: Evans/ Methuen Education, appendix B, pp 39-43.
Slavin, R. (1984). Research methods in education: A practical guide.
New Jersey: Prentice-Hall, Inc.
Stalnaker, J.M. (1951). The essay type of examination. In E.F. Lindquist (Ed.),
Educational measurement. Washington, DC: American Council on Education.



Vaughn, S., Schumm, J.S., & Sinagub, J.M. (1996). Focus group interviews in
education psychology. Thousand Oaks, CA: Sage Publications.
von Davier, A.A., Holland, P.W. and Thayer, D.T. (2004). The chain and post-
stratification methods for observed-score equating: The relationship to
population invariance. Journal of Educational Measurement, 41, 15-32.
von Davier, A.A. and Kong, N. (2005). A unified approach to linear equating for the
non-equivalent groups design. Journal of Educational and Behaviour statistics,
30, 313-342.
Wainer, H., Wang, X.B., & Thissen, D. (1991). How well can we equate test forms
that are constructed by examinees (Tech. Rep. 91-15). Princeton, NJ:ETS.
Wainer, H., & Thissen, D. (1994). On examinee choice in educational testing. Review
of Educational Research, 64, 159-195.
Wang, X.B., Wainer, H., & Thissen, D. (1993). On the viability of some untestable
assumptions in equating exams that allow examinee choice (Tech. Rep. 93-31).
Princeton, NJ: Educational Testing Service.
Wang, X.B. (1996). Understanding psychological processes that underlie examinees'
choices of constructed response items on a test. Paper presented at the
annual meeting of the 1996 National Council on Measurement in Education,
New York.
Wang, X. B. (1999). On giving test takers a choice among constructive response
items. LSAC Research Report Series, Newtown: Law School Admission
Council.




Willmott, A.S. (1972). GCE item analysis-reliability through combinations. In D.L.
Nuttal and A.S. Willmott. British examinations techniques of analysis. London:
National Foundation for Education Research.
Willmott, A., & Hall, C. (1975). O level examined: The effect of question choice.
London: Macmillan Education



























APPENDICES

APPENDIX A

PAIRS OF SUBGROUPS WITH CORRESPONDING COMBINED GROUPS THAT
CHOSE PARTICULAR QUESTIONS AND OTHER QUESTIONS
Equated scores on section A from different optional questions that largely vary at
lower tale of choice question scale
[Panels (a)-(j): for each pair of subgroups, the left-hand panel plots the equated/linked score on Section A against the score on the choice question (1-16), showing the chained linear linking function for the subgroup that chose each question and for the combined group; the right-hand panel plots the root mean square difference together with the standardised difference that matters (DTM).]

(a)-(b) Subgroups that chose questions 8 and 7: REMSD = 0.196
(c)-(d) Subgroups that chose questions 9 and 11: REMSD = 0.328
(e)-(f) Subgroups that chose questions 11 and 7: REMSD = 0.336
(g)-(h) Subgroups that chose questions 11 and 8: REMSD = 0.338
(i)-(j) Subgroups that chose questions 12 and 10: REMSD = 0.265
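For reference, the two statistics summarised in these panels can be written in the following general form, which is the standard formulation used in studies of the population invariance of linking functions (see von Davier, Holland, & Thayer, 2004). The subgroup weights w_g, taken here as proportional to the subgroup sizes, are an assumption about how the subgroups are combined and not a restatement of the exact expression used in this study:

\[
\mathrm{RMSD}(x) \;=\; \frac{\sqrt{\sum_{g} w_{g}\,\bigl(e_{g}(x)-e(x)\bigr)^{2}}}{\sigma_{A}},
\qquad
\mathrm{REMSD} \;=\; \frac{\sqrt{\sum_{g} w_{g}\,E\!\left[\bigl(e_{g}(X)-e(X)\bigr)^{2}\right]}}{\sigma_{A}},
\]

where e_g is the chained linear linking function estimated in the subgroup that chose a particular optional question, e is the corresponding function for the combined group, sigma_A is the standard deviation of Section A scores in the combined group, and the expectation is taken over the distribution of the choice-question score X. On this reading, the curves labelled "Root Mean Square Difference" in the right-hand panels correspond to RMSD(x), and they are judged against the standardised difference that matters (DTM), commonly taken as half a raw-score point expressed in standard-deviation units.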


















APPENDIX B

PAIRS OF SUBGROUPS WITH CORRESPONDING COMBINED GROUPS THAT
CHOSE PARTICULAR QUESTIONS AND OTHER QUESTIONS
Equated scores on Section A from different optional questions that vary mainly at the higher tail of the choice-question score scale
[Panels (a)-(t): same layout as Appendix A; the left-hand panel of each pair shows the chained linear linking functions for the two subgroups and the combined group, and the right-hand panel shows the root mean square difference together with the standardised difference that matters (DTM).]

(a)-(b) Subgroups that chose questions 9 and 7: REMSD = 0.384
(c)-(d) Subgroups that chose questions 9 and 8: REMSD = 0.243
(e)-(f) Subgroups that chose questions 9 and 10: REMSD = 0.363
(g)-(h) Subgroups that chose questions 10 and 7: REMSD = 0.300
(i)-(j) Subgroups that chose questions 10 and 9: REMSD = 0.207
(k)-(l) Subgroups that chose questions 10 and 11: REMSD = 0.235
(m)-(n) Subgroups that chose questions 12 and 7: REMSD = 0.308
(o)-(p) Subgroups that chose questions 12 and 8: REMSD = 0.300
(q)-(r) Subgroups that chose questions 12 and 9: REMSD = 0.190
(s)-(t) Subgroups that chose questions 12 and 11: REMSD = 0.213
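To make the computations behind these summaries concrete, the sketch below shows one way the chained linear linking of a choice-question score onto the Section A scale, and the resulting REMSD for a pair of subgroups, could be computed. It is a minimal illustration only: the synthetic data, the names group_1 and group_2, the weights (taken proportional to subgroup sizes) and the use of a uniform grid of question scores 1-16 in place of the combined-group score distribution are assumptions made for the sketch, not the procedure or code used in the thesis.

import numpy as np

def linear_link(y, question_scores, section_a_scores):
    # Linear (mean-sigma) linking of a choice-question score y onto the
    # Section A scale, estimated from one group's observed scores.
    m_y, s_y = np.mean(question_scores), np.std(question_scores, ddof=1)
    m_a, s_a = np.mean(section_a_scores), np.std(section_a_scores, ddof=1)
    return m_a + (s_a / s_y) * (y - m_y)

def remsd(y_grid, subgroups, combined, weights):
    # Root expected mean square difference between each subgroup's linking
    # function and the combined-group function, standardised by the
    # combined-group Section A standard deviation.  The expectation is
    # approximated by an unweighted average over y_grid (an assumption).
    sigma_a = np.std(combined[1], ddof=1)
    total = 0.0
    for (q_scores, a_scores), w in zip(subgroups, weights):
        diff = (linear_link(y_grid, q_scores, a_scores)
                - linear_link(y_grid, combined[0], combined[1]))
        total += w * np.mean(diff ** 2)
    return np.sqrt(total) / sigma_a

# Illustrative (synthetic) data: each tuple holds one subgroup's
# choice-question scores and Section A scores.
rng = np.random.default_rng(2005)
group_1 = (rng.integers(1, 17, 140).astype(float), rng.normal(32.0, 9.0, 140))
group_2 = (rng.integers(1, 17, 107).astype(float), rng.normal(28.0, 10.0, 107))
combined = (np.concatenate([group_1[0], group_2[0]]),
            np.concatenate([group_1[1], group_2[1]]))
weights = [len(g[0]) / len(combined[0]) for g in (group_1, group_2)]
y_grid = np.arange(1, 17, dtype=float)   # possible choice-question scores 1-16
print(f"REMSD = {remsd(y_grid, [group_1, group_2], combined, weights):.3f}")

Because each linking function here is linear in the question score, the subgroup and combined-group functions are straight lines, which matches the general shape of the curves summarised above; the difference between them is captured by RMSD(x) and aggregated into the single REMSD value.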































APPENDIX C

SECTION A OF 2005 M.S.C.E. EXAMINATION PAPER 2 PRESENTED AS PAPER I

MATHEMATICS




PAPER I
(55 marks)

Time Allowed: 1 hr 30 mins


Instructions

1. This paper contains 4 pages. Please check.

2. Answer all the six questions.

3. All working must be clearly shown.

4. Calculators may be used.






















1. a. Simplify to the lowest term:

[algebraic fraction in x not legible in this copy]   (3 marks)


b. Express (√3 + 1)/(√3 − 1) with a rational denominator in its simplest form. (5 marks)


2. a. Make n the subject of the formula y = log_n 2. (3 marks)


b. Calculate the values of a and b if [matrix equation not legible in this copy]. (4 marks)



3 a. Figure 1 shows a square DEFB of side 5 cm inside triangle ABC.



Figure 1

If angle ACB = 35°, calculate the length of AB. (5 marks)


b. Find the remainder when 2t³ + 3t² + t + 1 is divided by (t + 1). (4 marks)



4. a. In Figure 2, XYZ is a circle with centre O. TXP is a tangent to the circle
at X, and the diameter ZY produced meets the tangent at T.




Figure 2

If angle ZXP = 72°, calculate the value of angle XTY. (5 marks)


b. The function f(x) = (2x − 1)/x is defined on the domain {−1, ½, 1}.
Draw the arrow diagram to represent this function. (5 marks)


5. a. A metallic sphere of volume 770 cm³ is melted and made into a solid
cylinder of length 5 cm. Calculate the radius of the cylinder. (Take π = 22/7.) (4 marks)

b. A cyclist starts from rest and accelerates uniformly for 4 minutes to reach
a speed of 300 metres/minute. She then maintains this speed for 6 minutes
after which she decelerates uniformly for 5 minutes to come to a complete
stop.

Using a scale of 2 cm to represent 50 metres/minute on the vertical axis
and 2 cm to represent 2 minutes on the horizontal axis, draw a speed-time
graph for the cyclist on the graph paper provided. (7 marks)


6. a. Two triangles LMN and ABC are similar. In triangle LMN, LM = 4 cm,
MN = 5 cm and LN = 6 cm. The longest side of triangle ABC is 5 cm
longer than its shortest side. Find the ratio of the areas of the two triangles.
(6 marks)

b. Let AB = (6, 4) and AC = (4, 2) be column vectors. Calculate the mid-point of BC.
(4 marks)














END OF QUESTION PAPER






















APPENDIX D

SECTION B OF 2005 M.S.C.E. EXAMINATION PAPER 2 PRESENTED AS PAPER
II
MATHEMATICS



PAPER II
(90 marks)

Time Allowed: 2 hrs



Instructions

1. This paper contains 4 pages. Please check.

2. Read all the questions carefully, and select three questions you would choose
if given a choice.

3. Write down the question numbers of the selected questions in order of your
liking, and then answer all the questions.

4. All working must be clearly shown.

5. Calculators may be used.


















1. a. Table 1 shows the results of a test which 30 pupils sat for.

Table 1

 2   9  14   7  12   3  19   7  13  19
 8  14  23   7  18  12   9  14   8  22
17   9  12  18  14  18  13  12  24   4

(i) Using the class intervals of the marks as 1-5, 6-10, 11-15, 16-20, 21-
25, construct a frequency table for the marks.


(ii) Using your frequency table, draw a frequency polygon. (8 marks)

b. A rectangular garden has a perimeter of 40 metres. If its area is 91 m²,
calculate the length of the garden. (7 marks)


2. a. The cost (C) of an international call from Malawi to Europe partly varies
inversely as the time (t) and partly as the square of the time.

A one-minute call costs K120 and a two-minute call costs K200. Find the
cost of a five-minute call. (8 marks)

b. A family has 3 children born at different times. Assuming it is equally
likely to have a baby boy or a baby girl,

(i) draw a tree diagram to show the probabilities of having a boy or a girl
on each of the three births.

(ii) calculate the probability that the family has 2 boys and 1 girl in any
order of births. (7 marks)


3. a. Solve for x if (2^x)^2 − 9(2^x) + 8 = 0. (5 marks)






b. A city assembly decides to construct a 1400 m² car park for lorries and
minibuses. A minibus will fit on a 10 m² space while a lorry requires
15 m² of space.

The number of lorries has to be greater or equal to half the number of
minibuses. The number of lorries has to be less than twice the number of
minibuses.

(i) If x represents the number of minibuses and y represents the number of
lorries, write down one inequality involving x and y in addition to
y ≥ x/2 and y < 2x.

(ii) Using a scale of 2 cm to represent 20 units on both axes, draw the
region R bounded by the three inequalities.

(iii)Use your graph to find the maximum number of vehicles (minibuses
and lorries) that can be parked on a car park of such a size. (10 marks)


4. a. A straight line passes through points (-1, -2) and (3, 4). Find the equation
of the straight line in the form y = mx + c. (5 marks)

b. The ratio of the 2nd term to the 7th term of an arithmetic progression is 1:3
and their sum is 20. Calculate the sum of the first 10 terms of the
progression. (10 marks)

5. a. Figure 1 shows two unequal circles PRS and QRS intersecting at R and
S. TP and TQ are tangents to the circles at P and Q respectively. PR and
RQ are diameters of the circles.



Figure 3
Prove that:
(i) PRQT is a cyclic quadrilateral;

(ii) angle PRS = angle QRT. (8 marks)

b. Given that A, B and C are sets,

(i) draw a Venn diagram and shade the region representing A′ ∩ B′ ∩ C;

(ii) find n(A′ ∩ B′ ∩ C), if n(A ∪ B) = 8 and n(A ∪ B ∪ C) = 12. (7 marks)


6. a. Given that sin θ = 1/3, find cos θ, leaving your answer in surd form.
(5 marks)


b. Table 2 shows some of the values for the equation y = x³ − 2x² − 5x + 6.

(i) Copy and complete the table.

Table 2: y = x³ − 2x² − 5x + 6

    x    -3    -2    -1     0     1     2     3     4
    y   -24           8           0    -4     0    18

(ii) Using a scale of 2 cm to represent 1 unit on the x-axis and 2 cm to
represent 5 units on the y-axis, draw the graph of y = x³ − 2x² − 5x + 6.

(iii) Use your graph to solve the equation x³ − 2x² − 5x + 4 = 0.
(10 marks)





END OF QUESTION PAPER





APPENDIX E

Answer sheet cover page for paper I





Please fill in the information below before you start answering the questions.


1. Identification Number:_________________________________________

2. Age: _______________________________________________________

3. Gender: Male
Female

4. School: ____________________________________________________













THANK YOU













Please tick where it is appropriate


APPENDIX F

Answer sheet cover page for paper II

Please fill in the necessary information below before you start answering the
questions.

1. Identification Number:______________________________________

2. Age: _____________________________________________________

3. Gender: Male
Female

4. School: ____________________________________________________


5. Read all the six questions carefully. If given a choice which three questions
would you select?

Question 1

Question 2

Question 3

Question 4

Question 5

Question 6

6. Write down the question numbers of the selected questions in order of your
liking

1st choice______________ 2nd choice_____________ 3rd choice____________


THANK YOU







Please tick where it is appropriate








Please tick three questions only
APPENDIX G

ORIGINAL FORM OF 2005 M.S.C.E. EXAMINATION MATHEMATICS PAPER 2

THE MALAWI NATIONAL EXAMINATIONS BOARD


2005 MALAWI SCHOOL CERTIFICATE OF EDUCATION EXAMINATION

MATHEMATICS


Subject Number: M131/II
Monday, 24 October Time Allowed: 2 hr 30 mins
8:30-11:00 am



PAPER II
(100 marks)


Instructions

1. This paper contains 6 pages. Please check.

2. Answer all the six questions in Section A and any three questions from Section B.

3. The maximum number of marks for each answer is indicated against each question.

4. Mathematical tables, graph paper and answer books are provided.

5. Calculators may be used.

6. Used graph paper and/or supplementary sheets must be tied together inside the
answer book with a string.

7. All working must be clearly shown; it should be done on the same sheet as the
rest of the answers.

8. Write your Examination Number on top of each page of your Answer Book.




Section A (55 marks)

Answer all the six questions in this section.

1. a. Simplify to the lowest term:

[algebraic fraction in x not legible in this copy]   (3 marks)

b. Express (√3 + 1)/(√3 − 1) with a rational denominator in its simplest form. (5 marks)

2. a. Make n the subject of the formula y = log_n 2. (3 marks)

b. Calculate the values of a and b if [matrix equation not legible in this copy]. (4 marks)

3 a. Figure 1 shows a square DEFB of side 5 cm inside triangle ABC.



Figure 4

If angle ACB = 35°, calculate the length of AB. (5 marks)


b. Find the remainder when 2t³ + 3t² + t + 1 is divided by (t + 1). (4 marks)



4. a. In Figure 2, XYZ is a circle with centre O. TXP is a tangent to the circle
at X, and the diameter ZY produced meets the tangent at T.




Figure 5

If angle ZXP = 72°, calculate the value of angle XTY. (5 marks)

b. The function f(x) = (2x − 1)/x is defined on the domain {−1, ½, 1}.
Draw the arrow diagram to represent this function. (5 marks)

5. a. A metallic sphere of volume 770 cm³ is melted and made into a solid
cylinder of length 5 cm. Calculate the radius of the cylinder. (Take π = 22/7.) (4 marks)
b. A cyclist starts from rest and accelerates uniformly for 4 minutes to reach
a speed of 300 metres/minute. She then maintains this speed for 6 minutes
after which she decelerates uniformly for 5 minutes to come to a complete
stop.

Using a scale of 2 cm to represent 50 metres/minute on the vertical axis
and 2 cm to represent 2 minutes on the horizontal axis, draw a speed-time
graph for the cyclist on the graph paper provided. (7 marks)


6. a. Two triangles LMN and ABC are similar. In triangle LMN, LM = 4 cm,
MN = 5 cm and LN = 6 cm. The longest side of triangle ABC is 5 cm
longer than its shortest side. Find the ratio of the areas of the two triangles.
(6 marks)

b. Let AB = (6, 4) and AC = (4, 2) be column vectors. Calculate the mid-point of BC.
(4 marks)


Section B (45 marks)

Answer any three questions from this section.


7. a. Table 1 shows the results of a test which 30 pupils sat for.

Table 1

 2   9  14   7  12   3  19   7  13  19
 8  14  23   7  18  12   9  14   8  22
17   9  12  18  14  18  13  12  24   4

(i) Using the class intervals of the marks as 1-5, 6-10, 11-15, 16-20, 21-
25, construct a frequency table for the marks.
(ii) Using your frequency table, draw a frequency polygon. (8 marks)

b. A rectangular garden has a perimeter of 40 metres. If its area is 91 m²,
calculate the length of the garden. (7 marks)

8. a. The cost (C) of an international call from Malawi to Europe partly varies
inversely as the time (t) and partly as the square of the time.

A one-minute call costs K120 and a two-minute call costs K200. Find the
cost of a five-minute call. (8 marks)

b. A family has 3 children born at different times. Assuming it is equally
likely to have a baby boy or a baby girl,

(i) draw a tree diagram to show the probabilities of having a boy or a girl
on each of the three births.

(ii) calculate the probability that the family has 2 boys and 1 girl in any
order of births. (7 marks)

9. a. Solve for x if (2^x)^2 − 9(2^x) + 8 = 0. (5 marks)

b. A city assembly decides to construct a 1400 m² car park for lorries and
minibuses. A minibus will fit on a 10 m² space while a lorry requires
15 m² of space.

The number of lorries has to be greater or equal to half the number of
minibuses. The number of lorries has to be less than twice the number of
minibuses.

(i) If x represents the number of minibuses and y represents the number of
lorries, write down one inequality involving x and y in addition to
y ≥ x/2 and y < 2x.
(ii) Using a scale of 2 cm to represent 20 units on both axes, draw the
region R bounded by the three inequalities.

(iii)Use your graph to find the maximum number of vehicles (minibuses
and lorries) that can be parked on a car park of such a size. (10 marks)

10. a. A straight line passes through points (-1, -2) and (3, 4). Find the equation
of the straight line in the form y = mx + c. (5 marks)

b. The ratio of the 2nd term to the 7th term of an arithmetic progression is 1:3
and their sum is 20. Calculate the sum of the first 10 terms of the
progression. (10 marks)

11. a. Figure 1 shows two unequal circles PRS and QRS intersecting at R and
S. TP and TQ are tangents to the circles at P and Q respectively. PR and
RQ are diameters of the circles.



Figure 6
Prove that:
(i) PRQT is a cyclic quadrilateral;
(ii) angle PRS = angle QRT. (8 marks)

b. Given that A, B and C are sets,

(i) draw a Venn diagram and shade the region representing A′ ∩ B′ ∩ C;

(ii) find n(A′ ∩ B′ ∩ C), if n(A ∪ B) = 8 and n(A ∪ B ∪ C) = 12. (7 marks)
12. a. Given that sin θ = 1/3, find cos θ, leaving your answer in surd form.
(5 marks)
b. Table 2 shows some of the values for the equation y = x³ − 2x² − 5x + 6.

(i) Copy and complete the table.

Table 2: y = x³ − 2x² − 5x + 6

    x    -3    -2    -1     0     1     2     3     4
    y   -24           8           0    -4     0    18

(ii) Using a scale of 2 cm to represent 1 unit on the x-axis and 2 cm to
represent 5 units on the y-axis, draw the graph of y = x³ − 2x² − 5x + 6.

(iii) Use your graph to solve the equation x³ − 2x² − 5x + 4 = 0.
(10 marks)





END OF QUESTION PAPER














APPENDIX H

LETTER TO EXECUTIVE DIRECTOR OF MALAWI NATIONAL EXAMINATIONS
BOARD

Chancellor College,
Dept. of Educational Foundations,
P.O. Box 280,
Zomba.

31st May 2007.
The Executive Director,
Malawi National Examinations Board,
P.O. Box 191,
Zomba.

Dear Sir,

REQUEST FOR DATA AND INSTRUMENTS TO BE USED IN MY THESIS

I am Chifundo Steven Azizi, a student pursuing a Master of Education (Testing,
Measurement and Evaluation) degree program at Chancellor College.

Sir, the purpose of writing you this letter is to ask for examination data from your office
which I need to use for my thesis, which is : Examining the untestable assumptions of the
chained linear equating for the Livingston score adjustment with application to the 2005
MSCE mathematics paper 2.

I am now in the process of writing my thesis and I need the following data:
1. Item difficulty indices of mathematics paper 2 examination for years 2005 and
2006,
2. Question papers and marking schemes for the year 2005,
3. The Malawi School Certificate of Education examination award programme,

I would like also to seek your permission, Sir, to re-administer the 2005 MSCE
mathematics paper 2 to current Form 4 students of purposively selected secondary
schools.

I pledge that the information I would receive from you would be handled with utmost
care and confidentiality and that I would use it solely for the purpose of my thesis.

I will be grateful for your assistance in my request. Thanking you in advance.

Yours faithfully,

Chifundo Steven Azizi
APPENDIX I

THE MALAWI NATIONAL EXAMINATIONS BOARD
P. O.BOX 191. ZOMBA, MALAWI, TEL.(265)01525 2T7.FAX: (265)01525351
e-mail: mwmatemba@sdnp.org.mw

EXECUTIVE DIRECTOR: Mr M. W. Matemba
All communications should be addressed to: The Executive Director

In reply please quote:

CONFIDENTIAL Our Ref.: C/ 1/6/2
Your Ref.:



28th June 2007

Mr Chifundo Steven Azizi
Chancellor College
Department
ZOMBA

Dear Sir,

RE: REQUEST FOR DATA AND INSTRUMENTS TO BE USED IN RESEARCH

I acknowledge receipt of your letter dated 31st May 2007.

What could be given to you is question papers for Mathematics Paper II in the years 2002
to 2005. The syllabuses can only be lent to you.

The other documents you requested for will not be available.

Yours faithfully,

J.S. Chalimba
FOR: EXECUTIVE DIRECTOR








APPENDIX J

LETTER TO SECONDARY SCHOOL HEADTEACHER

Chancellor College,
Department of Educational
Foundations,
P.O. BOX 280,
Zomba.

12th July, 2007.

The Headteacher,
______________________________________
______________________________________
______________________________________
______________________________________
______________________________________

Dear Sir/Madam,

REQUEST FOR PERMISSION

I am Chifundo Steven Azizi, a student pursuing a Master of Education (Testing,
Measurement and Evaluation) degree program at Chancellor College.

Sir/madam, the purpose of writing you this letter is to ask for permission from your office
to administer a mathematical test in your secondary school. The test is the main
instrument for collecting data for my study, which is: Examining the untestable
assumptions of the chained linear equating for the Livingston score adjustment with
application to the 2005 MSCE mathematics paper 2.

I pledge that the information I will collect from the school will be handled with utmost
care and confidentiality and that I will use it solely for the purpose of my research.

I will be grateful for your assistance in my request.

Thank you in advance.

Yours faithfully,

Chifundo Steven Azizi.




APPENDIX K

LETTER TO SHIREHIGHLANDS EDUCATION DIVISION MANAGERESS

Chancellor College,
Department of Educational
Foundations,
P.O. Box 280,
Zomba.

12th July, 2007.

The Education Division Manageress,
Shirehighlands Education Division,
Private Bag 7,
Mulanje.

Dear Madam,

REQUEST FOR PERMISSION

I am Chifundo Steven Azizi, a student pursuing a Master of Education (Testing,
Measurement and Evaluation) degree program at Chancellor College.

Madam, the purpose of writing you this letter is to ask for permission from your office to
administer a mathematical test in some secondary schools in your division. The test is the
main instrument for collecting data for my study, which is: Examining the untestable
assumptions of the chained linear equating for the Livingston score adjustment with
application to the 2005 MSCE mathematics paper 2.

I pledge that the information I will collect from the schools will be handled with utmost
care and confidentiality and that I will use it solely for the purpose of my research.

I will be grateful for your assistance in my request.

Thank you in advance.

Yours faithfully,

Chifundo Steven Azizi.






APPENDIX L

LETTER TO SOUTH WEST EDUCATION DIVISION MANAGER

Chancellor College,
Department of Educational
Foundations,
P.O. Box 280,
Zomba.

12th July, 2007.

The Education Division Manager,
South West Education Division,
P.O. Box
Blantyre.

Dear Sir/Madam,

REQUEST FOR PERMISSION

I am Chifundo Steven Azizi, a student pursuing a Master of Education (Testing,
Measurement and Evaluation) degree program at Chancellor College.

Sir/Madam, the purpose of writing you this letter is to ask for permission from your
office to administer a mathematical test in some secondary schools in your division. The
test is the main instrument for collecting data for my study, which is: Examining the
untestable assumptions of the chained linear equating for the Livingston score
adjustment with application to the 2005 MSCE mathematics paper 2.

I pledge that the information I will collect from the schools will be handled with utmost
care and confidentiality and that I will use it solely for the purpose of my research.

I will be grateful for your assistance in my request.

Thank you in advance.

Yours faithfully,

Chifundo Steven Azizi.






APPENDIX M

UNIVERSITY OF MALAWI


CHANCELLOR COLLEGE
Department of Educational Foundations
PRINCIPAL
Emmanuel Fabiano, B.Ed., MSc., Ph.D.    P.O. Box 280, Zomba, MALAWI
Tel: (265) 01 522 222
Telex: 44742 CHANCOL MI
Our Ref.: Edf/6/19 Fax: (265) 01 522 046
Your Ref.: Email: edf@chanco.unima.mw.

11th July, 2007


Dear Sir/Madam

INTRODUCING MR. CHIFUNDO AZIZ

I have the pleasure to introduce to you Mr. Chifundo Aziz, our postgraduate student who
is studying for a Master of Education degree in Educational Measurement. To fulfill
some of the requirements for the program, he is required to carry out a field-based
research study to collect data for his thesis. Mr. Aziz is due to start his research study this
month and accordingly, any support rendered to him will be greatly appreciated.

I thank you in anticipation for your cooperation and support.

Dr. Bob Wajizigha Chulu
Head, Educational Foundations Department
