Measurement, 9: 1–24, 2011
Copyright © Taylor & Francis Group, LLC
ISSN: 1536-6367 print / 1536-6359 online
DOI: 10.1080/15366367.2011.558442

FOCUS ARTICLE

The Role of the Unit in Physics and Psychometrics


Stephen M. Humphry
Graduate School of Education
University of Western Australia

The purpose of this article is to examine the role of the unit in physics in order to clarify the role of
the unit in psychometrics. Based on this examination, metrological conventions are used to formu-
late the relationship between discrimination and the unit of a scale in item response theory. Seminal
literature in two lines of item response theory is reviewed in light of the standard definition of mea-
surement in physics, and Birnbaum’s formulation of the discrimination parameter in item response
theory is reexamined. Consequently, the article introduces a scale parameter in a model that special-
izes to both the two-parameter logistic (2PL) and Rasch models. The model has sufficient statistics
for person and item parameters, the feature that defines Rasch models, whilst also parameterizing
discrimination. By formulating the relationship between discrimination and the unit, this article rec-
onciles differing perspectives regarding the use of a discrimination parameter in the 2PL and Rasch
models. A simulation study is used to demonstrate the results of implementing conditional maximum
likelihood estimations of item locations. Implications for the progress of measurement in the social
sciences are identified and discussed. It is argued that these implications entail substantial shifts in
the way we think about measurement in the social sciences.
Key words: 2PL model, discrimination, measurement, metrology, physics, scale, sufficient
statistic, unit

The purpose of this article is to examine the role of the unit in physics to clarify its role in psycho-
metrics. The article draws on the definition of measurement in physics that has become standard
in metrology, the science of measurement. The article identifies and examines implications for
psychometrics by formulating a connection between discrimination and the unit of a measurement
scale in item response theory (IRT), based on the conventions of metrology as used in physics.
These implications are discussed with a particular focus on differing perspectives regarding the
incorporation of a discrimination parameter in the two-parameter logistic (2PL) and Rasch models.

This work was supported by an Australian Research Council Linkage grant with the Australian National Ministerial
Council on Employment, Education, Training and Youth Affairs (MCEETYA) Performance Measurement and Reporting
Task Force; UNESCO’s International Institute for Educational Planning (IIEP); and Pearson Assessment as Industry
Partners. David Andrich and Stephen Humphry are the chief investigators on this project.
Correspondence should be addressed to Stephen M. Humphry, Graduate School of Education (M428), University
of Western Australia, 35 Stirling Highway, Crawley, Western Australia 6009, Australia. E-mail: stephen.humphry@uwa.edu.au
There have been two main lines of development in IRT that have led to diverging perspectives
regarding the use of a discrimination parameter. One began with the work of Rasch (1980) and the
other with the work of Lord & Novick (1968) and of Birnbaum (1968). Within the second line of
development in IRT, incorporating an item-discrimination parameter is regarded as advantageous
if it allows the capacity to better model item responses. Within Rasch’s line of development,
an item-discrimination parameter is regarded as undesirable because it destroys sufficiency of


the kind required by Rasch (Andersen, 1977; Andrich, 2004). Wright (1985) also argued that
incorporating an item-discrimination parameter in the 2PL destroys conjoint additivity (Luce &
Tukey, 1964), which is theoretically sustained by the Rasch model (Perline, Wright, & Wainer,
1979; Wright, 1985; Wright, 1997; however see Kyngdon [2008] for a critique).
Key considerations in psychometrics are (a) the rationale for including a discrimination
parameter and (b) the material consequences for research and applied work. The article aims
to clarify both the rationale and the consequences.
The article is structured as follows. First, it examines the standard definition of mea-
surement in physical science, focusing on the centrality of the unit of measurement to this
definition. The article juxtaposes the definition adopted in the physical sciences with the
definition adopted in psychometrics, highlighting the lack of attention paid to units of mea-
surement in the social sciences. Focusing specifically on IRT, the article then contrasts the
definitions of measurement implicitly adopted within the two independent lines of devel-
opment of this body of theory. The traditional conceptualization of item discrimination in
IRT is reviewed. Following this, the article explicitly defines discrimination using quantity
calculus, the conventions and principles of which underpin the definition of physical units
in the International System (SI). This definition of discrimination is then used to state the
Rasch model in a form that incorporates a discrimination parameter. The article reviews
the demonstration by Humphry & Andrich (2008) that it is possible to separate person
and item parameters whilst incorporating a discrimination parameter. It shows that this is
true provided that a limiting condition is met. It is argued on this basis that differing per-
spectives regarding discrimination in the two lines of development of IRT are essentially
reconciled.
Lastly, the article focuses on the broader implications of the developments for psychometrics.
It is argued that in psychometrics, actual quantities should be made explicit in the process of
formulating and testing (a) the scientific hypothesis that a quantitative attribute exists, and (b)
the contingent hypothesis that it is possible to measure the attribute in a specific unit. These tasks
are discussed with a particular focus on the framework within which physicists have success-
fully defined units and realized and reproduced measurements in well-defined units. Examples
of hypotheses are stated and discussed to indicate a possible starting point for approaching these
tasks in the social sciences. It is argued that the implications entail substantial shifts in the way
we think about measurement in the social sciences, and particularly the manner in which we state
and test quantitative hypotheses.
BACKGROUND AND OVERVIEW

Contrasting the Centrality of the Unit in Physical Science Versus Psychometrics

Viewed in interdisciplinary terms, a salient deficiency of the social sciences is that there are no
standard units, let alone coherent systems of units of the kind that are central to physics and its
industrial and technological applications (Duncan, 1984). Units are so fundamental to physics
that, “to describe an experimental process and to express any quantitative experimental result, a
unit is needed for each physical kind of quantity” [emphasis added] (Terrien, 1980, p. 766).
Luce & Krumhansl (1988, pp. 3–4) observed that for successful measurement in psy-
chophysics “it appears necessary for some structure of interconnected scales to arise, much as
exists in physics. This would mean a complex pattern of reductions of some psychological scales
to others, with a resulting simple pattern of units, and quite possibly some simple connection
with the scales of physics.” There are at least two possible implications of the absence of such
systems of units in the social sciences. It may be possible to identify genuine reasons that the
social sciences have no system of units and that no progress has been made toward establishing
systems of units. Identifying such reasons would almost certainly shed light on fundamental dif-
ferences between physics and quantitative social science should they exist. Otherwise, it should
be possible to establish systems of units and to state quantitative definitions and relations. Either
way, it seems clear we cannot “expect to make progress by ignoring pertinent matters” (Michell,
1999, p. 217).
NASA’s loss of the $125 million Mars Climate Orbiter in September 1999 was a dramatic
demonstration of the importance of units in physics. The cause was the failure to convert imperial
pound-forces to newtons, the metric units required by navigational software on the craft.
Dramatic examples like these occur relatively infrequently, however, because there is a well-
established framework for defining and realizing transferable units in measurement in applied
physics (Petley, 1992). The demands on the framework have grown, giving rise to an equally
well-established technology and industry to support its use.
In the psychometric literature, Wright (1997, p. 33) drew attention to the “commercial, archi-
tectural, political, and even moral necessities for abstract, exchangeable units,” arguing that
Rasch models are capable of maintaining a unit. However, the definition of measurement in the
physical sciences has been almost universally ignored in disciplines such as psychology. Michell
(2008) has argued, as follows, that as a consequence, psychometrics is pathological science:
Psychometrics is a science in which the central hypothesis (that psychological attributes are quan-
titative) is accepted as true in the absence of supporting evidence, and this fact is ignored because
psychometricians remain ignorant about the concept of quantity; they accept a definition of measure-
ment that deflects attention away from this issue of quantity; and an operationist interpretation is put
upon scale type distinctions. That is, psychometricians claim to know something that they do not
know and have erected barriers preserving their ignorance. This is pathological science. (p. 10)

The article does not aim to fully address criticisms made by Michell. However, the basis for
his criticisms is ignorance, within psychometrics, of the standard definition of measurement in physics. In the classical
definition, “a measurement of a magnitude of a quantitative attribute is an estimate of the ratio
between that magnitude and whichever magnitude of the same attribute is taken as the unit of
measurement” [emphasis added] (Michell, 2007, p. 74). “It is invariably along such lines that
measurement is, and always has been, defined in the physical sciences” (Michell, 1997, p. 358).
Thus, a measurement unit is a “scalar quantity, defined and adopted by convention, with which
any other quantity of the same kind can be compared to express the ratio of the two quantities as a
number” (Bureau International des Poids et Mesures, 2008, p. 6). For example, the pure number
2.734 = L: m is the ratio of the length L = 2.734 m to the length 1 m (see Bureau International
des Poids et Mesures, 2008, p. 13, note 2 for similar examples).
To test the scientific hypothesis that an attribute is quantitative, it is therefore necessary, in
practice, to test the contingent hypothesis that it is possible to measure relative to a unit of a given
kind of quantitative attribute using specific procedures under specific empirical conditions. In
physics, it is explicitly recognized that measurement presupposes description of “a measurement
procedure, and a calibrated measuring system operating according to the specified measurement
procedure, including the measurement conditions.” (Bureau International des Poids et Mesures,
2008, p. 16). The measurement procedure is formulated “according to one or more measurement
principles and to a given measurement method, based on a measurement model and any calcula-
tion” that is required to obtain a measurement result (Bureau International des Poids et Mesures,
2008, p. 18).
In contrast, most psychometricians have either explicitly or implicitly adopted Stevens’ (1946)
definition of measurement as the assignment of numbers to objects or events according to
rules. Stevens’ definition is incompatible with the classical definition for reasons explained in
Michell (1999, pp. 16–20). Stevens put forward his definition of measurement in response to
the report by the Ferguson Committee, which was commissioned by the British Association
for the Advancement of Science. The chair, A. Ferguson, was a physicist. The committee was
appointed by the association in 1932 to investigate the possibility of quantitatively estimating
sensory events. The committee’s report was critical of Stevens’ work and it highlighted the impor-
tance of the definition of measurement. The British Association for the Advancement of Science
“played an important role in developing the concepts relating to systems of units” (Terrien, 1980,
p. 766).
In IRT, more specifically, Lord & Novick (1968, pp. 20–21) adopted Stevens’ definition,
whereas Rasch did not. It is argued in this article that the difference between the perspectives
on parameterizing discrimination in the two lines of IRT reflects the different definitions adopted
within those lines of development.

Quantities Versus Numerical Values

To the extent that Stevens’ definition is accepted, the social sciences are in a similar situation
to physical science in the early- to mid-1900s, before a clear distinction was made between
quantities and their numerical values (see de Boer, 1994/95, p. 410). The state of affairs in the
social sciences is, to a significant degree, due to the separate events that occurred at a similar time
in history. It is unlikely the Ferguson Committee could have reached agreement on the definition
of measurement when physics had not attained consensus.
Even today in physics, the importance of units and systems of units is often superficially
obscured by treating quantities, including units, literally as algebraic variables, despite it being
clear that practically realizable units are not literally algebraic variables (Emerson, 2008). For
example, quantities such as the centimeter, second, and gram have abbreviations cm, s, and g,
but “units are not variables” (Emerson, 2004a, p. 26). Instead, these units are specific and actual
magnitudes of length, time, and mass, respectively. To stress this point in a form relevant to
item response models, consider the expression $e^{n\,\mathrm{cm}}$; that is, the base of the natural logarithm e
“raised to the power of n centimetres.” Clearly, this term is “meaningless” (Massey, 1971, p. 3):
one cannot raise a number to the power of a length. On the other hand, a number can be raised to
the power of a measurement (ratio) because a measurement is a real number. Let $\{d\} = d : [\mathrm{cm}]$,
“where symbols in braces { } represent pure numbers, and symbols in square brackets are units”
(Emerson, 2008, p. 134; see also Allisy (1980) for an overview of the conventions). The term
$e^{\{d\}}$ is well defined because the exponent is a real number: it is a ratio of one quantity to another
of the same kind.
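The distinction can be made concrete with a small sketch. The toy Quantity class below is purely illustrative (it is not a standard library, and the names are mine); it simply enforces the point that only the pure number d : [cm], not the quantity d itself, can appear in an exponent.

```python
import math

class Quantity:
    """Toy (value, unit) pair used only to illustrate quantity calculus."""
    def __init__(self, value, unit):
        self.value, self.unit = value, unit

    def ratio_to(self, other):
        """The pure number d : [u]; defined only for quantities of the same kind."""
        if self.unit != other.unit:
            raise ValueError("a ratio is defined only for quantities of the same kind")
        return self.value / other.value

cm = Quantity(1.0, "cm")          # the unit, itself a specific magnitude of length
d = Quantity(2.7, "cm")           # another length
print(math.exp(d.ratio_to(cm)))   # e^{d : [cm]} is well defined (the exponent is 2.7)
# math.exp(d) would fail: one cannot raise a number to the power of a length.
```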
If the central importance of units is often superficially obscured in physics, it is all but ignored
in psychometrics. Given the success of physics as a quantitative science, it is prudent to question


the merit of either ignoring or abandoning the definition of measurement that underpins physics.
It is argued that the adoption of different definitions of measurement in different lines of IRT
in psychometrics underlies different perspectives regarding the incorporation of a discrimination
parameter, for reasons that will be explained. Before proceeding, some background in IRT is
reviewed.

Background in IRT: Two Independent Lines of Development

There were “two independent lines of development in test theory . . . in the 1960s, one articulated
by Rasch (1980), the other by Lord and Novick (1968) and by Birnbaum (1968)” (Andrich,
2004, p. 10). On the different perspectives that exist now, Andrich refers in particular to the
estimation of discrimination parameters, observing that proponents of the 2PL generally consider
the problems of estimating item discrimination to have been solved, whereas proponents of the
Rasch model generally regard those solutions as ad hoc responses to “inherently intractable problems
that cannot provide stability of estimates” (Andrich, 2004, p. 10).
Wright (1997) outlined what he perceived to be the problems, citing several comments in the
literature on estimation in the 2PL regarding what he regards as “ad hoc” solutions. For example,
he referred to Lord’s (1968, pp. 1015–1016) remark that item discriminations “increase without
limit” and person abilities “increase or decrease without limit.” He observed Stocking’s (1989,
p. 25) remark that “Running LOGIST to complete convergence allows too much movement away
from the good starting values.” He also cited comments regarding the need to impose range
restrictions, such as Wingersky’s (1983, p. 48) remark that they must be applied “to all parameters
except the item difficulties” to control the “problem of item discrimination going to infinity.”
Verhelst & Glas (1995) address the use of marginal maximum likelihood estimation of param-
eters in the 2PL, observing that the rationale for the methods lacks internal consistency and that,
in any case, there are still problems with these approaches. As far as the focus of this article
is concerned, though, the more basic issue with modern approaches to estimation is that they
explicitly invoke assumptions about the distributions of measurements prior to estimating (mea-
suring). This stands in direct contradiction to the classical conception of measurement, which is
concerned with the estimation of individual magnitudes relative to units entirely separately from
the measurement of other individuals. This point was heavily stressed by Rasch in his description
of specific objectivity (Rasch, 1961, 1977).
In the 2PL, the discrimination parameter is conceptualized in terms of “the discriminating
power of the item, the degree to which item response varies with ability level” (Lord, 1980,
p. 13). Examined in terms of the classical definition of measurement, Wright’s argument is that
the standard 2PL estimation approaches attempt to jointly estimate (i) person abilities and item
difficulties in a unit as well as (ii) the discrimination parameter that determines the unit, using
solution equations that contain the person and item parameters for which that unit is implicit.
Thus, Wright (1985, p. 108) stated that “when we try to estimate [the item discrimination
parameter] we find that we cannot separate it from its interactions with the estimation of the
[person parameters] used for its estimation.” The estimation is conducted in a manner that poten-
tially “produces a feedback” (Wright, 1997, p. 40) and, in addition, if discrimination is advanced
as a second item parameter, “we have to estimate a different unit for every item” (Wright, 1985,
pp. 108–109).

DISCRIMINATION IN THE 2PL AND RASCH MODELS

The 2PL and Item Discrimination

The 2PL model has the form

$$\Pr\{X_{ni} = 1\} = \frac{\exp(\alpha_i(\theta_n - b_i))}{1 + \exp(\alpha_i(\theta_n - b_i))}, \qquad (1)$$

where $X_{ni}$ is a random variable that can assume values of 0 and 1, $\alpha_i$ is the discrimination of item
$i$, $\theta_n$ is the scale location of person $n$, and $b_i$ is the difficulty of item $i$. Birnbaum (1968, p. 408)
noted the relevance of the “considerable body of theoretical and practical methods developed by
Rasch” to applications of the 2PL model if the items have the same discriminating powers. With
respect to the difference between Rasch’s model and the 2PL, Birnbaum (1968, p. 402) posed
the question: “Do the items in a test really differ from each other in discriminating power?” He
concluded that this “question is crucial to evaluating the validity of the models.”
A related question relevant to this article is whether there are sets or subsets of items that
have a common level of discrimination. A related but somewhat different question, also relevant
here, is whether empirical factors other than item characteristics affect the rate of change of
the probability $\Pr\{X_{ni} = 1\}$ as a function of change in the distance between person and item
locations. If such factors exist, as noted by Lord (1980), they would also compromise the validity
of the model of Equation (1).
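As a point of reference for the models discussed below, the following minimal sketch evaluates the response function of Equation (1); the function name and example values are mine, not the article's. With the same discrimination for every item it reduces, up to the unit, to the Rasch model discussed next.

```python
import numpy as np

def p_2pl(theta, b, alpha=1.0):
    """Equation (1): probability of a correct response for a person at theta
    on an item with difficulty b and discrimination alpha."""
    z = alpha * (theta - b)
    return np.exp(z) / (1.0 + np.exp(z))

# Two items of equal difficulty but different discriminating power (Birnbaum's question):
print(p_2pl(theta=1.0, b=0.0, alpha=0.5))   # flatter item response function
print(p_2pl(theta=1.0, b=0.0, alpha=2.0))   # steeper item response function
```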

The Rasch Model with an Arbitrary Multiplicative Constant

The Rasch model has no discrimination parameter. However, Rasch identified what he referred to
as a general form of a measuring function, by explicitly identifying two constants. He observed
that specifying the values of the two constants allows the person and item locations to be made to
vary within any interval that “may for some reason be deemed convenient” (Rasch, 1960, p. 121).
One of the constants is a multiplicative constant. With this constant incorporated, Rasch’s model
for dichotomous response data is

$$\Pr\{X_{ni} = 1\} = \frac{\exp(\rho(\theta_n/\rho - b_i/\rho))}{1 + \exp(\rho(\theta_n/\rho - b_i/\rho))} = \frac{\exp\!\left(\rho(\theta_n^* - b_i^*)\right)}{1 + \exp\!\left(\rho(\theta_n^* - b_i^*)\right)}, \qquad (2)$$

where $\theta_n^* = \theta_n/\rho$, $b_i^* = b_i/\rho$, and $\rho$ is the multiplicative constant. The Rasch model of Equation
(2) has the same structure as the 2PL of Equation (1). However, in the 2PL the multiplicative
constant is formulated as a parameter whereas in the Rasch model it is not.


The Rasch model for dichotomous responses is usually written in the form

$$\Pr\{X_{ni} = 1\} = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}, \qquad (3)$$

which is the same model as Equation (2), with Rasch's multiplicative constant ρ implicitly
defined as 1. Clearly, there is no change of probability between Equations (2) and (3) for given
$\theta_n$, $n = 1, \ldots, N$ and $b_i$, $i = 1, \ldots, I$. However, if $\rho \neq 1$, then in general, $\theta_n^* \neq \theta_n$ and $b_i^* \neq b_i$. The
term ρ is a scaling constant whose value affects the multiplicative factor of separation between
estimates obtained when the model is applied, as observed by Rasch.
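The point can be checked numerically. The sketch below (illustrative values only; the function names are mine) shows that Equation (2) returns the same probability as Equation (3) for any choice of ρ, while the rescaled locations θ* = θ/ρ and b* = b/ρ themselves change.

```python
import numpy as np

def rasch(theta, b):
    """Equation (3): the Rasch model with the multiplicative constant fixed at 1."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def rasch_general(theta_star, b_star, rho):
    """Equation (2): Rasch's general measuring function with multiplicative constant rho."""
    return 1.0 / (1.0 + np.exp(-rho * (theta_star - b_star)))

theta, b, rho = 1.2, -0.4, 2.5
print(rasch(theta, b))                              # Equation (3)
print(rasch_general(theta / rho, b / rho, rho))     # Equation (2): identical probability
print(theta / rho, b / rho)                         # but theta* and b* differ from theta and b
```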

FORMULATING DISCRIMINATION IN TERMS OF THE CLASSICAL DEFINITION

The Logistic Measurement Function

As a basis for clarifying the role of the unit in psychometrics, a general discrimination parameter
is here formulated by introducing the measurement function
 
$$\Pr\{X_{sni} = 1\} = \frac{\exp\!\left(\rho_s(\theta_n^* - b_i^*)\right)}{1 + \exp\!\left(\rho_s(\theta_n^* - b_i^*)\right)}, \qquad (4)$$

where $\rho_s$ is the level of discrimination in the presence of classification $s$ of an empirical factor,
and $\theta_n^*$ and $b_i^*$ are the locations of person $n$ and item $i$ on the latent trait in a unit that is common
across classifications or levels $s = 1, \ldots, S$ of an empirical factor. The definition of the locations
of parameters in a common unit is given formally below, using the conventions of metrology.
The model of Equation (4) is referred to simply as the logistic measurement function. In this
function, discrimination is formulated as a parameter the magnitude of which may depend upon
any empirical factor that affects the rate of change of the probability of a modeled response,
Xsni = x ∈ {0, 1}, as a function of differences between person and item locations. As examples,
the empirical factor might be a characteristic common to items in an item set; characteristics of
a judge when the scoring of an item involves a judgement; or an experimental condition under
which items are administered, such as online versus on paper. A given empirical factor may have
different classifications s = 1, . . ., S that are hypothesized to affect the slope of the item response
function (Humphry, 2010).
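A hedged sketch of Equation (4) follows; the scenario (the same item scored under two assessment conditions) and all values are hypothetical, chosen only to show that the discrimination parameter attaches to the classification s rather than to the item.

```python
import numpy as np

def p_lmf(theta_star, b_star, rho_s):
    """Equation (4): logistic measurement function with discrimination rho_s attached
    to classification s of an empirical factor (item set, judge, condition, ...)."""
    z = rho_s * (theta_star - b_star)
    return np.exp(z) / (1.0 + np.exp(z))

# One item (b* = 0.0) administered under two hypothetical conditions, e.g. on paper
# versus online, that are assumed to discriminate differently.
for theta_star in (-1.0, 0.0, 1.0):
    print(p_lmf(theta_star, 0.0, rho_s=1.0),   # condition s = 1
          p_lmf(theta_star, 0.0, rho_s=1.6))   # condition s = 2: steeper response function
```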

Formulating Discrimination in Terms of the Classical Definition

Using metrological conventions, Equation (4) may be written in terms of the classical definition
of measurement as

$$\Pr\{X_{sni} = 1\} = \frac{\exp\!\left(\{\rho_s\}\,(\theta_n : [u^*] - b_i : [u^*])\right)}{1 + \exp\!\left(\{\rho_s\}\,(\theta_n : [u^*] - b_i : [u^*])\right)}, \qquad (5)$$
Here, discrimination is defined as

$$\rho_s = [u^*] : [u_s], \qquad (6)$$

where $[u^*]$ is a common unit, and $[u_s]$ is the unit in terms of which the items in set $s$ are hypothesized
to measure the relevant trait. This definition of discrimination is congruent with the classical
theory of measurement and, accordingly, readily translates to physics. Consider, for example, two
instruments s = 1,2 that measure masses in the units 1 mg and 10 mg, respectively. It follows that
ρ1 /ρ2 = 10 mg/mg = 10. Therefore, the discrimination obtained using instrument 1 is 10 times
that obtained using instrument 2. Discrimination is defined provided a comparison between units
is implied; it cannot be defined for an instrument with a unit in the absence of a comparison with
any other unit.
Clearly, if the effects of the characteristics of item sets on discrimination are modeled and
each item set contains just one item, such that we may write s = i, then Equation (4) specializes
to the 2PL model of Equation (1). If, on the other hand, ρ s is uniform for the set of all items
considered because they all share the relevant characteristic, then, with the arbitrary specification
ρ s = 1, Equation (4) clearly specializes to the Rasch model.
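In code, the mass example reads as follows; the numbers are the ones used in the text, and the variable names are mine.

```python
# Discrimination as a ratio of units, Equation (6): rho_s = [u*] : [u_s].
# Take the common unit [u*] to be 1 mg; instrument 1 measures in 1 mg, instrument 2 in 10 mg.
u_star = 1.0           # mg
u_1, u_2 = 1.0, 10.0   # mg

rho_1 = u_star / u_1   # [u*] : [u1] = 1
rho_2 = u_star / u_2   # [u*] : [u2] = 0.1
print(rho_1 / rho_2)   # 10.0: instrument 1 discriminates 10 times as finely as instrument 2

# The same magnitude has different numerical values in the two instrument units:
mass = 273.0           # a mass expressed in the common unit (mg)
print(mass / u_1, mass / u_2)   # 273.0 and 27.3
```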

Examining Item Response Models in Terms of the Classical Definition

Due to familiarity with the 2PL and the structural similarity between the logistic measurement
function and the 2PL, the term ρ s may be incorrectly interpreted as necessarily and only an
item parameter. It is instructive therefore to briefly examine the relationship between the models.
The logistic measurement function specializes to the 2PL in the particular sense that sets of
items specialize to those sets each containing just one member, whereas, the converse does not
hold. In addition, the scale parameter ρ s may relate to a level s of any kind of empirical factor,
item characteristics being a special case. In purely algebraic terms, the discrimination parameter
of the 2PL may be modified but, being an item parameter, it does not directly specialize to a
scale parameter for empirical factors such as marker characteristics or empirical conditions.
Formulating the relationship between discrimination and the unit in Equation (6) clarifies that the
unit [us ] in terms of which measurements are estimated may, in theory, be affected by empirical
factors other than the characteristics of items.
To further demonstrate that the general discrimination parameter cannot logically be an item
parameter only, it is instructive to also consider conditions in which item characteristics are
experimentally held constant while other empirical factors are varied. For example, the same item
may be marked by different judges or under different conditions, and these empirical factors may
affect the level of discrimination and, therefore, the unit of a scale. In such a case, the parameter
cannot possibly be interpreted as an item parameter because in that case, the same item has a
different level of discrimination under different empirical conditions.
The potential for empirical factors other than item characteristics to affect discrimination has
been observed on occasion in the literature on item response models. For example, Lord (1980)
recognized that empirical factors such as person characteristics might affect the slope. Similarly,
Klauer (1995) recognized the potential for person characteristics to affect the slope in studying
tests of violations of the Rasch model.


If it were hypothesized that the interaction of more than one kind of factor affects discrimina-
tion, it may be necessary to include more than a single subscript in the discrimination parameter.
Alternatively, in Equation (4), s = 1, 2,. . . , S may denote each interaction between classifi-
cations of two or more factors if such effects were hypothesized or found to have an effect on
the slope of the item response function. This is particularly relevant to vertical equating because
person and item characteristics generally vary systematically across both the person groups and
tests (item sets) in relevant empirical contexts. The potential for interactions to occur is noted for
completeness; however, further consideration is beyond the scope of the article.

The Relationship Between Discrimination and the Unit in the Rasch Model

Wood (1978, p. 29) observed that in the Rasch model, an ability estimate is “always scaled by a
factor,” this being “the common level of discrimination for all items.” This relationship between
discrimination and the unit of a scale is occasionally referred to in the literature (e.g., Brink,
1971; Embretson & Reise, 2000).
The relationship between the discrimination and the unit is recognized by defining

$$\theta_n^{(s)} - b_i^{(s)} \equiv \rho_s(\theta_n^* - b_i^*) \qquad (7)$$

in which the superscript of the parameters $\theta_n^*$ and $b_i^*$ signifies that the parameters are expressed
in a unit that is common across levels or classifications $s = 1, \ldots, S$ of an empirical factor. That
is, the distance between any two scale locations, such as $\theta_n^* - b_i^*$, is defined to be invariant across
classifications. On the other hand, the difference $\theta_n^{(s)} - b_i^{(s)} = \rho_s(\theta_n^* - b_i^*)$ depends upon the
magnitude of $\rho_s$, which is the relative degree of discrimination in the presence of classification $s$ of
an empirical factor.
Using metrological conventions to distinguish between pure numbers, quantities, and units, in
accordance with the classical definition, Equation (7) is equivalent to

$$\frac{\theta_n - b_i}{[u_s]} = \frac{[u^*]}{[u_s]} \cdot \frac{\theta_n - b_i}{[u^*]}, \qquad (8)$$
where again $\rho_s = [u^*] : [u_s]$. The conventions make the meaning of the superscripts explicit
in terms of the classical definition. The superscripts refer to the units implicit within the parameters:
the parameters being hypothesized measurements of a latent trait in terms of those units (see also
Humphry & Andrich [2008]).
In the 2PL model of Equation (1) and in the Rasch model of Equation (3), θn = θn∗ and bi = b∗i .
That is, it is usually implied that the parameters are expressed in a unit that does not depend
upon the level of discrimination of either a particular item or, more generally, upon the level
of discrimination associated with a particular classification of an empirical factor. Given the
definitions in Equation (5) the function of Equation (4) may be stated in the alternative form
 
$$\Pr\{X_{sni} = 1\} = \frac{\exp\!\left(\theta_n^{(s)} - b_i^{(s)}\right)}{1 + \exp\!\left(\theta_n^{(s)} - b_i^{(s)}\right)}. \qquad (9)$$

Equation (9) is the Rasch model for dichotomous responses. However, the superscript $s$ makes
explicit that the multiplicative factor of separation between person and item locations depends
on the magnitude of $\rho_s$, that is, $\theta_n^{(s)} - b_i^{(s)} = \rho_s(\theta_n^* - b_i^*)$. For example, $s$ might refer to an item
set with uniform discrimination or to an empirical assessment condition. Equation (9) is identical
to the Rasch model in its standard form if there is no variation in $\rho_s$ due to the effects of
different levels of an empirical factor $s = 1, 2, \ldots, S$.
Differential discrimination is a source of misfit to the Rasch model that affects the unit due
to this relationship. Baker observed precisely these implications of the relationship between the
discrimination and the scale in application of the Rasch model compared with application of
the 2PL.
An important feature of the Rasch model is that there is a mathematical requirement for the item
discrimination parameter to be unity for all items in the test. This is the basis for setting the unit of
the ability metric equal to one. Conventional wisdom is that this assumption is met when all items
share a common value of the item discrimination parameter . . . but among measurement specialists
it is also understood that any discrepancy between this common value and one is compensated by
adjusting the unit of measurement of the ability metric (see Wood, 1978). Yet, this compensation is
rarely mentioned in the literature dealing with the Rasch model . . . However, numerous studies have
dealt with the effect of item discrimination parameter values upon the goodness of fit of the Rasch
model to item response data . . . but none have dealt with the manner in which the violation of the
mathematical requirement of a unit value of the item discrimination parameter affects the obtained
metric. (Baker, 1983, p. 98)

It is instructive to consider the implication of Baker’s first two sentences, given the relation-
ship between discrimination and unit of a scale. If the uniformity of item discrimination is the
basis for setting the unit of the ability metric equal to one, what is implied if the value of item
discrimination varies for every item? In particular, what is the basis for estimating person and
item parameters in a common unit? The “mathematical requirement” referred to by Baker has
now been precisely formulated using the conventions of quantity calculus. The present analy-
sis shows that the onus is on the researcher to demonstrate that it is possible to measure in a
fixed unit the magnitude of which does not depend upon empirical factors. This task can be
approached experimentally by explicitly hypothesizing a fixed unit (and discrimination) based
on the definition of discrimination given in Equation (6). However, currently researchers do not
routinely test hypotheses about what affects the unit in applications of the 2PL; therefore, they
do not establish that there is a fixed unit of measurement.
Following these questions, the article indicates a possible reason the effect of differential item
discrimination on the scale had not been, and is still not generally, dealt with in the literature dealing
with the Rasch model. In the Rasch model, discrimination is not parameterized at all. It is gener-
ally taken for granted in the psychometric literature, as by Baker, that the more general case is that
in which discrimination is parameterized in relation to each item. However, other empirical fac-
tors may affect the level of discrimination. The more general case is that any empirical factor may
affect the level of discrimination. Reconsidering the Rasch model accordingly, it can be seen from
Equation (5) that not incorporating a discrimination parameter, or setting its common value as
ρ s = 1, requires that there be a single and common unit [u∗ ] irrespective of any empirical factors.
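The compensation Baker describes can be restated in the article's notation (this is a restatement, not a new result): if all items share a common discrimination $a$, then

$$a(\theta_n - b_i) = (a\theta_n) - (a b_i) = \theta_n^{(s)} - b_i^{(s)},$$

so fixing the modeled discrimination at 1 does not remove the factor $a$; it relocates it into the unit of the obtained metric, which becomes $[u_s] = [u^*]/a$ rather than $[u^*]$.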
RECONCILING PERSPECTIVES: SUFFICIENT STATISTICS AND CONDITIONAL ESTIMATION

The existence of sufficient statistics for person and item parameters is what defines Rasch’s mod-
els (Rasch, 1961). The rationale for this class of models is based on measurement in the physical
sciences (Rasch, 1977, 1980). As the next step in reconciling the different perspectives in terms
of the classical definition, it is shown that discrimination can be parameterized without destroying
this property. When sufficiency is preserved, it is possible to implement conditional maximum
likelihood estimation. The article presents a set of pairwise conditional maximum likelihood
estimation equations and implements these to estimate item parameters.
The estimations involve a two-stage process. In the first of these, item estimates are obtained
conditionally relative to a common origin but in a different unit for each set. In the second,
person parameters and item-set-discrimination parameters are estimated jointly using maximum-
likelihood equations. The second stage does not involve further estimation of the item parameters.

Preserving Sufficiency in the Simplest Case: Pairs of Set Pairs

For the purpose of obtaining conditional maximum likelihood equations, Equation (4) is firstly
restated as
 
$$\Pr\{X_{sni} = 1\} = \frac{\exp\!\left(\alpha_s\theta_n^* - b_i^{(s)}\right)}{1 + \exp\!\left(\alpha_s\theta_n^* - b_i^{(s)}\right)}. \qquad (10)$$

In this form of the model, the measurements of the items are stated in terms of the unit specific to their
own item set, that is, $[u_s]$. On the other hand, the hypothesized measurements of person abilities
$\theta_n^*$, $n = 1, 2, \ldots, N$ are stated in terms of the common unit $[u^*]$. To account for the difference
between the units, the scale, or discrimination, parameter $\rho_s = [u^*] : [u_s]$ is required so that the
empirical relationship between the measurements and the probabilities is preserved, as manifest
within observed data.
Consider the model of Equation (10) in the simplest possible case of just two items within each
of two item sets; that is, in the case of a pair of set pairs of items. To demonstrate sufficiency of
statistics for the person parameters, it is next shown that this probability is independent of the
person parameters in the simplest case of a pair of item pairs; specifically, the conditional probability
$\Pr\{(x_{ni}, x_{nj}); (x_{nk}, x_{nl}) \mid (r_{sn} = 1, r_{tn} = 1)\}$ is independent of the vector $(\theta_n^{(s)}, \theta_n^{(t)})$ of person
parameters. Because the article focuses on pairs of (item) set pairs, pairs of items from set $s$ are denoted $i$
and $j$ and pairs of items from set $t$ are denoted $k$ and $l$. The result is used as a basis for pairwise
conditional estimation. The following proof is also provided in Humphry & Andrich (2008).
First,

$$\Pr\{(r_{sn} = 1, r_{tn} = 1)\} = \Pr\{(1,0);(1,0)\} + \Pr\{(1,0);(0,1)\} + \Pr\{(0,1);(1,0)\} + \Pr\{(0,1);(0,1)\},$$


that is,

$$\Pr\{(r_{sn} = 1, r_{tn} = 1)\} = \left\{ e^{\alpha_s\theta_n^* - b_i^{(s)}} \cdot 1 \cdot e^{\alpha_t\theta_n^* - b_k^{(t)}} + e^{\alpha_s\theta_n^* - b_i^{(s)}} \cdot 1 \cdot e^{\alpha_t\theta_n^* - b_l^{(t)}} + e^{\alpha_s\theta_n^* - b_j^{(s)}} \cdot 1 \cdot e^{\alpha_t\theta_n^* - b_k^{(t)}} + e^{\alpha_s\theta_n^* - b_j^{(s)}} \cdot 1 \cdot e^{\alpha_t\theta_n^* - b_l^{(t)}} \right\} \big/ \gamma_{ni}\gamma_{nj}\gamma_{nk}\gamma_{nl} \qquad (11)$$

where $\gamma_{ni} = \left(1 + \exp(\alpha_s\theta_n^* - b_i^{(s)})\right)$. On simplification, this gives

$$\Pr\{(r_{sn} = 1, r_{tn} = 1)\} = e^{\alpha_s\theta_n^* + \alpha_t\theta_n^*} \left\{ e^{-b_i^{(s)} - b_k^{(t)}} + e^{-b_i^{(s)} - b_l^{(t)}} + e^{-b_j^{(s)} - b_k^{(t)}} + e^{-b_j^{(s)} - b_l^{(t)}} \right\} \big/ \gamma_{ni}\gamma_{nj}\gamma_{nk}\gamma_{nl}.$$

The conditional probability of the relevant response pattern across the item sets in the case of
two sets each containing two items is

$$\Pr\{x_n \cap (1,1) \mid (1,1)\} = \frac{\Pr\{x_n \cap (r_{sn} = 1, r_{tn} = 1)\}}{\Pr\{(r_{sn} = 1, r_{tn} = 1)\}}, \qquad (12)$$

where $x_n$ is the response vector $(x_{ni}, x_{nj}); (x_{nk}, x_{nl})$ for person $n$. Applied to Equation (10), this
gives

$$\Pr\{x_n \cap r_n \mid (1,1)\} = \frac{e^{x_{ni}(\alpha_s\theta_n^* - b_i^{(s)})}\, e^{x_{nj}(\alpha_s\theta_n^* - b_j^{(s)})}\, e^{x_{nk}(\alpha_t\theta_n^* - b_k^{(t)})}\, e^{x_{nl}(\alpha_t\theta_n^* - b_l^{(t)})}}{e^{\alpha_s\theta_n^* - b_i^{(s)} + \alpha_t\theta_n^* - b_k^{(t)}} + e^{\alpha_s\theta_n^* - b_i^{(s)} + \alpha_t\theta_n^* - b_l^{(t)}} + e^{\alpha_s\theta_n^* - b_j^{(s)} + \alpha_t\theta_n^* - b_k^{(t)}} + e^{\alpha_s\theta_n^* - b_j^{(s)} + \alpha_t\theta_n^* - b_l^{(t)}}}$$

$$= \frac{e^{\alpha_s\theta_n^* + \alpha_t\theta_n^*}\, e^{-x_{ni}b_i^{(s)} - x_{nj}b_j^{(s)} - x_{nk}b_k^{(t)} - x_{nl}b_l^{(t)}}}{e^{\alpha_s\theta_n^* + \alpha_t\theta_n^*}\left(e^{-b_i^{(s)} - b_k^{(t)}} + e^{-b_i^{(s)} - b_l^{(t)}} + e^{-b_j^{(s)} - b_k^{(t)}} + e^{-b_j^{(s)} - b_l^{(t)}}\right)}$$

$$= \frac{e^{-x_{ni}b_i^{(s)} - x_{nj}b_j^{(s)} - x_{nk}b_k^{(t)} - x_{nl}b_l^{(t)}}}{e^{-b_i^{(s)} - b_k^{(t)}} + e^{-b_i^{(s)} - b_l^{(t)}} + e^{-b_j^{(s)} - b_k^{(t)}} + e^{-b_j^{(s)} - b_l^{(t)}}}. \qquad (13)$$

The conditional probability shown in Equation (13) is independent of the vector $(\alpha_s\theta_n^*, \alpha_t\theta_n^*) = (\theta_n^{(s)}, \theta_n^{(t)})$ of person
parameters in their respective units $[u_s]$ and $[u_t]$. The numerator is necessarily
one of the terms in the denominator, as determined by the response vector $x_n$.
FIGURE 1 Locations of two items in two sets with different units. [The figure shows a single continuum with items $b_2^{(1)}$ and $b_1^{(1)}$ marked against an upper scale running from −6 to +6 and items $b_3^{(2)}$ and $b_4^{(2)}$ marked against a lower scale running from −3 to +3.]

Clearly, the conditional probability of any other response pattern across the two item sets is
also independent of this vector of person parameters. The item parameters are estimated in the
respective units determined by the scale or discrimination parameter of the set.
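The independence just shown can be verified numerically. The following sketch (my own check, with arbitrary parameter values) computes the conditional probability of a response pattern directly from the model of Equation (10) for two different person locations and compares it with the closed form of Equation (13).

```python
import numpy as np

def p_item(theta_star, b, alpha):
    """Equation (10): item location in its set's unit, person location in the common unit."""
    z = alpha * theta_star - b
    return np.exp(z) / (1.0 + np.exp(z))

def conditional_prob(x, b, alphas, theta_star):
    """Pr{x | r_s = 1, r_t = 1} for a pair of set pairs, computed from first principles."""
    bi, bj, bk, bl = b
    a_s, a_t = alphas
    probs = [p_item(theta_star, bi, a_s), p_item(theta_star, bj, a_s),
             p_item(theta_star, bk, a_t), p_item(theta_star, bl, a_t)]
    def pattern(resp):
        return np.prod([q if r == 1 else 1.0 - q for q, r in zip(probs, resp)])
    num = pattern(x)
    den = sum(pattern(r) for r in [(1, 0, 1, 0), (1, 0, 0, 1), (0, 1, 1, 0), (0, 1, 0, 1)])
    return num / den

b = (-0.5, 0.7, -1.2, 0.3)     # b_i^(s), b_j^(s), b_k^(t), b_l^(t), in set-specific units
alphas = (1.0, 2.0)
x = (1, 0, 0, 1)               # exactly one correct response in each set

print(conditional_prob(x, b, alphas, theta_star=-1.0))   # same value for any person location
print(conditional_prob(x, b, alphas, theta_star=2.0))

bi, bj, bk, bl = b             # closed form of Equation (13)
num = np.exp(-x[0]*bi - x[1]*bj - x[2]*bk - x[3]*bl)
den = np.exp(-bi-bk) + np.exp(-bi-bl) + np.exp(-bj-bk) + np.exp(-bj-bl)
print(num / den)
```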
To understand the nature of the conditional item estimates, it is instructive to consider a spe-
cific example. An example of four item locations $b_i^{(s)}$, two expressed in each of two different
units, is shown in Figure 1. In this figure, the continuum is partitioned into different units on top
and bottom as implied by the ratio of scale parameters $\alpha_1/\alpha_2 = 2$. By analogy, one can think of
Figure 1 as a tape measure that has increments of both centimeters and inches, in which case
the ratio of units would be 1 in : 1 cm = 2.54. In the process described in this article, estimating
item locations in the same unit is accomplished by estimating the relative levels of discrimination of two
(or more) item sets. However, the item parameters are first estimated in different units, as shown
above and below the continuum in Figure 1.
Due to its advantages in handling missing data, the pairwise conditional probability expression
is implemented to follow. This is particularly important given the potential application of the
model in contexts in which there may be significant proportions of missing data in a matrix
due to the design of equating studies. The conditional probability may be stated in terms of any
number of items, as shown in the Appendix.

Conditional Maximum Likelihood Estimation of Item Parameters



Let $\Pr\{x_{nijkl} \mid (1,1)\}$ be the conditional probability shown in Equation (13), where
$x_{nijkl}$ is the response vector for person $n$ across items $i$, $j$ in set $s$ and $k$, $l$ in set $t$. Let the pairwise
conditional likelihood function be

$$L_{pc} = \prod_n \prod_{i \in s} \prod_{j \neq i} \prod_{k \in t} \prod_{l \neq k} \Pr\{x_{nijkl} \mid (1,1)\}. \qquad (14)$$

This is a pseudo-likelihood function, in the sense that it creates redundancy across responses of
persons across pairs of items, as described by Andrich and Luo (2003). The aim is to obtain
estimates of the item parameters in the units of their sets relative to a common origin. To do
so, the log likelihood function is obtained and a solution equation for each item i is given by
the partial derivative with respect to $b_i^{(s)}$. Due to consistency, conditional estimation serves as
a reference for examining other estimation algorithms (Jansen, van den Wollenberg, & Wierda,
1988). The constraint

$$\sum_s \bar{b}^{(s)} = 0$$

may be imposed as an arbitrary choice of origin. That is, the mean of the set means in their
respective units may be set equal to zero. Only response patterns that involve non-extreme scores
for each set can be used in the estimation process, and this entails practical considerations for
implementing the approach.
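A sketch of how the pairwise conditional pseudo-likelihood of Equation (14) might be computed is given below; it is my own illustration rather than the software used in the article, and it extends the two-set case in the obvious way over all pairs of sets. Maximizing this function over the item locations, subject to the origin constraint above, would give the stage-one estimates in set-specific units.

```python
import numpy as np
from itertools import combinations

def pairwise_cml_loglik(X, set_of_item, b):
    """Log of the pairwise conditional pseudo-likelihood, Equation (14).

    X           : persons x items array of 0/1 responses
    set_of_item : set label for each item (column of X)
    b           : candidate item locations, each in the unit of its own set
    Only within-set pairs with exactly one correct response contribute.
    """
    items_by_set = {}
    for idx, s in enumerate(set_of_item):
        items_by_set.setdefault(s, []).append(idx)
    loglik = 0.0
    for x in X:                                             # persons
        for s, t in combinations(sorted(items_by_set), 2):  # pairs of item sets
            for i, j in combinations(items_by_set[s], 2):
                if x[i] + x[j] != 1:
                    continue
                for k, l in combinations(items_by_set[t], 2):
                    if x[k] + x[l] != 1:
                        continue
                    num = -x[i]*b[i] - x[j]*b[j] - x[k]*b[k] - x[l]*b[l]
                    den = np.log(np.exp(-b[i]-b[k]) + np.exp(-b[i]-b[l])
                                 + np.exp(-b[j]-b[k]) + np.exp(-b[j]-b[l]))
                    loglik += num - den
    return loglik

# Toy call with arbitrary data: 2 persons, two sets of three items each.
X = np.array([[1, 0, 1, 1, 0, 0],
              [0, 1, 1, 0, 1, 1]])
print(pairwise_cml_loglik(X, set_of_item=[1, 1, 1, 2, 2, 2], b=np.zeros(6)))
```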

Maximum Likelihood Estimations for Persons and Item Set Discrimination


Having estimated the item parameters, the person and scale parameters may be obtained in a
second stage using joint maximum likelihood equations given by Lord and Novick (1968), here
focusing on the solution equations for the person and discrimination parameters. The maximum
likelihood solution equations for the person parameters θn∗ are

$$\sum_s \sum_{i \in s} \alpha_s x_{ni} = \sum_s \sum_{i \in s} \alpha_s \frac{e^{\alpha_s\theta_n^* - b_i^{(s)}}}{1 + e^{\alpha_s\theta_n^* - b_i^{(s)}}}, \qquad n = 1, 2, \ldots, N \qquad (15)$$

The maximum likelihood solution equations for the discrimination parameters α s are

$$\sum_{i \in s} \sum_n \theta_n^* x_{ni} = \sum_{i \in s} \sum_n \theta_n^* \frac{e^{\alpha_s\theta_n^* - b_i^{(s)}}}{1 + e^{\alpha_s\theta_n^* - b_i^{(s)}}}, \qquad s = 1, 2, \ldots, S; \quad i = 1, 2, \ldots, I_s. \qquad (16)$$

Response patterns that produce extreme scores of 0 or the maximum must be eliminated in the
estimation of person parameters. The constraint
 
$$\prod_s \rho_s = \prod_s [u^*] : [u_s] = 1 \qquad (17)$$

must be imposed in the estimation of the scale parameters. In the case of only two sets, in metrological
notation this is $([u^*] : [u_1]) \times ([u^*] : [u_2]) = 1$, which therefore implies that $[u^*] : [u_1] = [u_2] : [u^*]$.
Clearly, having obtained the conditional item parameters $b_i^{(s)}$, $s = 1, 2, \ldots, S$ and $i = 1, 2, \ldots, I_s$,
these are known values, fixed for each item across iterations, in Equations (15) and (16).
Only the two remaining sets of parameters need to be estimated using the two sets of solution
equations. The steps therefore involve estimating a scale parameter $\rho_s = [u^*] : [u_s]$ for each item
set without further estimation of the item parameters themselves. Thus, once the scale parameter
has been estimated the item parameters obtained conditionally may be expressed in a common
unit $[u^*]$ because $b_i^* = b_i^{(s)}/\alpha_s$. Thus, in Figure 1, once it is known that the ratio of scale parameters
is 2:1, all estimates can be expressed in a common unit, such as the unit shown on the bottom
of the line. In addition, the steps involve estimating a different unit for every item set but not for
every item individually. Consequently, there is no need to “estimate a different unit for every
item” (Wright, 1985, pp. 108–109).
The discrimination parameter in Equation (8) used as a basis for the estimations is the same
as that in the 2PL model, except that items belong to sets with uniform discrimination.
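A short sketch of the stage-two bookkeeping follows (values are hypothetical, chosen to mirror the 2:1 ratio of Figure 1): once the scale parameters are estimated under the constraint of Equation (17), the stage-one item estimates are re-expressed in the common unit.

```python
import numpy as np

b_set1 = np.array([-4.0, 2.0])   # conditional estimates b_i^(1), in unit [u1]
b_set2 = np.array([-0.7, 1.8])   # conditional estimates b_i^(2), in unit [u2]

# Suppose the estimated scale parameters have ratio 2:1; the constraint prod(alpha) = 1
# then fixes them at sqrt(2) and 1/sqrt(2).
alpha = np.array([np.sqrt(2.0), 1.0 / np.sqrt(2.0)])
assert np.isclose(np.prod(alpha), 1.0)

# b*_i = b_i^(s) / alpha_s expresses every location in the common unit [u*].
b_common = np.concatenate([b_set1 / alpha[0], b_set2 / alpha[1]])
print(b_common)
```

Only the ratio of the scale parameters is identified; the constraint on their product simply fixes the size of the common unit.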
Thus, it has been shown that discrimination can be defined in terms of the unit in a manner
that retains sufficient statistics in the form of score vectors. The conflicting perspectives regarding
the parameterization of discrimination are thereby reconciled in terms of the classical definition.
However, a limiting condition applies.
Limiting Condition in Reconciling Conflicting Perspectives

If each item set contains only a single item, it is not possible to conditionally estimate param-
eters in the logistic measurement function because $(x_{1n1}, x_{2n1}, \ldots, x_{Sn1}) \equiv (r_{1n}, r_{2n}, \ldots, r_{Sn})$. That
is, in this special case a person’s response vector is identical to the person’s score vector and
information cannot be obtained from relative frequencies of different response patterns in a
conditional likelihood expression of the form shown in the Appendix. Therefore, if the dis-
crimination of each item is parameterized, the separation of parameters demonstrated above is
no longer possible. Consequently, sufficiency cannot be exploited in the process of estimating
the relative scale locations of items. As will be discussed later, a minimum of three items in a
given set is required to obtain unique estimates in the units of each set, relative to a common
origin.
The limiting conditions are further clarified by considering the item response model referred
to above, introduced by Verhelst & Glas, in light of the preceding analysis. Verhelst & Glas
(1995) derive conditional maximum likelihood equations from a model referred to as the one
parameter logistic model, which contains a discrimination index rather than a discrimination
parameter, but is formally identical with the 2PL model. The authors note that the problem
faced in implementing conditional maximum likelihood (CML) estimation in the 2PL model is
that the values of discrimination parameters are unknown, meaning the “weighted” raw score
“is not a mere statistic, and hence it is impossible to use CML as an estimation method”
(Verhelst & Glas, 1995, p. 217). Verhelst & Glas proposed imputing the discrimination index
of each item, although in practice estimations of item discrimination are still used as a starting
point.
In the logistic measurement function, in contrast, it is possible to exploit sufficiency of
the weighted raw score without prior knowledge of the values of $\rho_s$, $s = 1, \ldots, S$ by
conditioning on score vectors. This is possible because all response vectors for individual
persons $x_n = (x_{1n1}, \ldots, x_{SnI_S})$ that yield a particular score vector $r_n = (r_{1n}, \ldots, r_{Sn})$ necessarily
also yield the same weighted raw score $W_n = \sum_{s=1}^{S} \rho_s r_{sn}$. The vector $(\theta_n^{(1)}, \ldots, \theta_n^{(s)}, \ldots, \theta_n^{(S)})$
is by definition identical to $(\rho_1\theta_n^*, \ldots, \rho_s\theta_n^*, \ldots, \rho_S\theta_n^*)$. Hence, eliminating the vector of person
parameters through conditioning on score vectors necessarily implies that the person param-
eter θn∗ is eliminated. This is directly analogous to the elimination of the person parameter
in the one parameter logistic model, shown by Verhelst and Glas (1995). The key differ-
ence between the one parameter logistic model and the logistic measurement function lies in
making explicit the role of the unit, in terms of the classical definition, and the implications
of this role. As important as their technical contributions are in terms of the mathemati-
cal statistics involved in implementing estimation and test fit, Verhelst and Glas (1995) do
not explicitly discuss the relationship between the discrimination parameter and the unit,
much less attempt to formulate that relationship in terms of any definition or theory of
measurement.
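A two-line check of the sufficiency argument (illustrative values only, in my own notation): two different response vectors with the same score vector per set necessarily have the same weighted raw score $W_n = \sum_s \rho_s r_{sn}$.

```python
rho = {"s": 1.0, "t": 1.6}                     # hypothetical set discriminations
set_of_item = ["s", "s", "s", "t", "t", "t"]   # three items in each of two sets

def weighted_raw_score(x):
    return sum(rho[s] * xi for s, xi in zip(set_of_item, x))

x_a = [1, 1, 0, 1, 0, 0]   # score vector (r_s, r_t) = (2, 1)
x_b = [0, 1, 1, 0, 0, 1]   # different pattern, same score vector (2, 1)
print(weighted_raw_score(x_a), weighted_raw_score(x_b))   # both 3.6
```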
Thus, as would be expected given results in the literature (e.g., Andersen, 1977), the perspec-
tives can be reconciled provided that (i) discrimination is parameterized but also that (ii) it is
not parameterized with respect to each item individually. More specifically, however, to condi-
tionally estimate item locations in the units of each set, three items in a given set are required
as a minimum to obtain unique estimates relative to a common unit. If there are only two items
in a set s, irrespective of the number of items in other sets, there is no unique solution for the
item locations in set s because there is only a single contrast, or difference, between the two
items. For example, suppose set s has only two items i and j. Wherever the items i and j appear
in the solution equations, Equation (13) may be reduced to a function involving only the difference
$b_i^{(s)} - b_j^{(s)}$ by multiplying both the numerator and denominator by $\exp\!\left(b_i^{(s)}\right)$ when item
$i$ appears in the numerator or $\exp\!\left(b_j^{(s)}\right)$ when item $j$ appears in the numerator. Consequently,
any sufficiently accurate estimate of the difference $b_i^{(s)} - b_j^{(s)}$ will satisfy all conditional solution
equations jointly and it is not possible to obtain an estimate for each item relative to a com-
mon origin individually. If, on the other hand, there are three items, three possible differences
within the relevant set must be satisfied simultaneously in the conditional solution equations.
This provides the necessary constraint to obtain unique estimates for each item in the set relative
to the common origin. The existence of unique solutions also requires the existence of necessary
(valid) response patterns. Substantial work is required to determine the conditions under which
the model can be implemented in practice. This work has commenced using simulations with
various item set sizes, combinations of discrimination, and item difficulty. This work is beyond
the scope of the current article.
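The article does not reproduce its simulation code; the sketch below indicates only the data-generating step such a study requires (two item sets with different discrimination, all locations in a common unit), with design values that are placeholders rather than those of the reported study.

```python
import numpy as np

rng = np.random.default_rng(0)

n_persons = 500
theta = rng.normal(0.0, 1.0, n_persons)            # person locations theta*_n
b_star = {1: np.array([-1.0, 0.0, 1.0]),           # item locations b*_i in the common unit
          2: np.array([-0.5, 0.5, 1.5])}
rho = {1: 1.0, 2: 1.5}                             # set discriminations rho_s

def simulate(theta, b_star, rho):
    """Generate dichotomous responses under the logistic measurement function, Equation (4)."""
    data = {}
    for s in b_star:
        p = 1.0 / (1.0 + np.exp(-rho[s] * (theta[:, None] - b_star[s][None, :])))
        data[s] = (rng.random(p.shape) < p).astype(int)
    return data

X = simulate(theta, b_star, rho)
print(X[1].shape, X[2].shape)                      # (500, 3) responses for each set
```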

DISCUSSION

Implications of the Role of the Unit for Psychometrics

Thus, it has been shown that if discrimination is parameterized in relation to sets containing
several items, the requirement of uniform discrimination across all items may be relaxed without
destroying the property of sufficiency, which defines Rasch models (Rasch, 1980). This means
that it is possible to capitalize on the property of sufficiency without introducing unnecessary
restrictions with respect to the parameterizing of discrimination. This, in turn, provides greater
flexibility within the framework of Rasch’s line of development in test theory. In such cases, it is
necessary to account for associated differences between units of scales in order to obtain a scale
with a common unit of fixed size.
The different perspectives regarding the use of a discrimination parameter have been rec-
onciled in terms of the classical definition. In doing so, the paper has identified (i) a specific
accommodation that needs to be made within each perspective and (ii) benefits that can be gained
if the relevant accommodations are made.
From the perspective of parameterizing discrimination in the 2PL, it is necessary to accommo-
date the potential for factors other than item characteristics to affect the level of discrimination
by incorporating a general discrimination parameter. This consequence has been clarified by for-
mulating the relationship between discrimination and the unit in terms of the classical definition,
that is, as shown in Equation (6). If the discrimination parameter pertains to sets of items then
the advantages associated with sufficiency of score vectors for parameters can be realized. The
discrimination parameter ρ s specializes to the item discrimination parameter of the 2PL. Should
a researcher focus on item factors alone, the 2PL is readily modified so that items in sets have
uniform discrimination. More broadly, however, the accommodation makes it possible to model
and account for effects of empirical factors other than item characteristics on the unit (see also
Humphry [2010]).
In a complementary fashion, viewed in terms of the conceptual framework introduced by
Rasch, it is necessary to permit variation in the multiplicative scale factor due to empirical fac-
tors. The arbitrary multiplicative constant in Rasch’s model of Equation (2) has not been treated
as a parameter in the relevant literature. However, it has been shown that it is possible to incor-
porate this parameter whilst preserving sufficiency. The paper showed that if discrimination is
parameterized in relation to item sets, score vectors are the generalization of sufficient statis-
tics for person parameters. As well as greater flexibility, parameterizing discrimination again
provides the capacity to model and account for the effects of empirical factors on the unit of a
scale.
Thus, when formulated in terms of the classical definition, it becomes apparent that there
is no reason the discrimination parameter should pertain specifically to items rather than some
other kind of empirical factor that affects the rate of change of the probability of a modeled
response. In principle, empirical factors that affect the slope affect the unit [us ]. Empirically,
item characteristics are clearly an important kind of factor because they are intrinsic to the
process of generating modeled responses. However, person characteristics and assessment con-
ditions are also intrinsic to the process by which item responses are generated. Humphry
(2010) has shown the empirical effect of a person group factor on the degree of discrimina-
tion. Other factors such as judge characteristics may also affect discrimination. There might,
for example, be a higher level of discrimination if a short-response question is marked by
experts rather than novices if experts in the relevant area are able to better discriminate whether
or not a response meets a specified criterion than non-experts. In this case, the discrimi-
nation parameter would pertain to the effect of marker expertise on discrimination and the
unit.
The preservation of sufficiency implies that the logistic measurement function itself possesses
the property that Rasch himself considered to define the class of models. This defining property
is the existence of sufficient statistics for the model’s person and item parameters (Rasch, 1961,
1980; Andersen, 1977). Given that the function also specializes to the 2PL, the paper establishes
a basis for unifying theoretical and applied research to a considerably greater extent by clarifying
the connections in terms of the classical definition of measurement. However, a limiting condition
applies in reconciling the perspectives.
Multidimensionality
The approach to estimation demonstrated in the paper involves a step common to the application
of multidimensional Rasch models; namely, partitioning of the data matrix based on factors and
associated conditioning on vectors of raw scores. In a given situation, the introduction of such
hypotheses may or may not imply the possibility of multidimensionality, in the sense that more
than one quantitative attribute (latent trait) is measured. Consequently, tests should be employed
to ascertain whether variation among levels of more than one trait systematically affects item
responses. These should include tests for violations of local independence, just as they should in
applications of the Rasch model or 2PL where the objective is to measure a single latent trait.
Although a relevant consideration, this article’s focus is the role of the unit in which a single
dimension (kind of quantitative attribute) is measured.
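To make the partitioning and conditioning step concrete, the following minimal sketch in Python forms the data structures involved; the responses, set labels, and sizes are invented for illustration and are not the simulation design reported earlier.

from collections import Counter
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 200, 12
responses = rng.integers(0, 2, size=(n_persons, n_items))  # dichotomous responses

# Hypothesized empirical factor: items 0-5 form set "A", items 6-11 form set "B".
item_set = np.array(["A"] * 6 + ["B"] * 6)

# Score vector r_n = (r_An, r_Bn): person n's raw score within each item set.
score_vectors = {
    s: responses[:, item_set == s].sum(axis=1) for s in np.unique(item_set)
}

# Conditioning on the score vector amounts to grouping response patterns that
# share the same (r_A, r_B) pair before estimating set-specific item parameters.
pattern_counts = Counter(zip(score_vectors["A"], score_vectors["B"]))
print(pattern_counts.most_common(5))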
Although a proper examination of the term is beyond the scope of this article, it is noted
that “multidimensionality” is seldom used in the physical sciences. As far as established physics
is concerned, it applies only to the actual coexistent (multiple) dimensions of space and time
(multiple dimensions are also posited in string theory but this is not relevant to the analysis of the
role of the unit in this article). This point is raised because the existence of related dimensions, that is, related kinds of quantities, is relevant to the implications of this article: because quantities are interrelated, it is usually necessary to control one or more quantitative attributes in order to measure any particular kind of quantity in a specific unit. For the same reason, dimensional analysis is routine within engineering. More than 100 derived units in practical use are defined in terms of the seven SI base units. Definitions of derived units are based on quantity calculus, where “the value of the product
of the values of two concrete quantities, in a given system of measurable quantities and units, is
the product of their numerical values and a unit of the new quantity, if such a realizable quantity
can exist.” (Emerson, 2008, p. 136). From this starting point, “more complex, derived quanti-
ties are expressed as functions of both base quantities and other previously derived quantities”
(Emerson, 2004b, p. 33). This is different from the use of spatial dimensions to represent corre-
lations between measurements obtained on a specific occasion under particular conditions, which
is the more prevalent sense in which multidimensional is used in the social sciences (Duncan,
1984, p. 161). Because correlations alone cannot demonstrate causal relations, they cannot pro-
vide the basis for establishing the effects of empirical factors on the unit. Instead, experiments
and tests are required.
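As a minimal worked instance of the quantity calculus just described (the numerical values are arbitrary), dividing a distance of 6 m by a time of 2 s gives

$$
v \;=\; \frac{l}{t} \;=\; \frac{6\ \mathrm{m}}{2\ \mathrm{s}} \;=\; 3\ \mathrm{m\,s^{-1}},
$$

that is, the quotient of the numerical values together with a unit, m s^-1, of the new (derived) kind of quantity.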
Measurement and Scientific Hypothesis
To exploit the estimation procedures demonstrated in the simulations, in practice, it is necessary
to define the item sets in terms of characteristics that are hypothesized to affect the slope of the
item response function. As with any scientific hypotheses, these may be based upon theory, previ-
ous empirical evidence, or a combination of both. Due to the relationship between discrimination
and the unit formulated in Equation (6), empirical factors hypothesized to affect discrimination
are hypothesized to affect the unit [u_s].
If an empirical factor affects the precision of measurement of a latent trait, it is reasonable to hypothesize that there is a material relationship between that factor and both the trait and the
observable manifestation of that trait. This is certainly the case in physics. For example, Bray
(1992, p. 3) observes in relation to mass that the “factors that affect standard stability and are apt
to cause measurable mass variations are: mechanical wear, chemical reactions, degassing, chemi-
cal and physical adsorption and desorption.” Conventionally, discrimination is treated essentially
as incidental and primarily of interest only for accurately modeling data and estimating param-
eters. This position is questionable given that empirical effects on the unit of measurement in
physics usually reflect causal empirical relationships, such as physical adsorption and chemical
reactions causing mass variation in a standard or prototype.
Although not focusing on the unit, Rasch (1977) did consider, in some depth, the need to
control related empirical factors in the process of measurement. For example, he examined the
experimental control of temperature or volume and subsequent observation of the effects on pres-
sure to make comparisons. He referred in this analysis to the ideal gas law and its implications
for comparisons in a two-way frame of reference for measurement. He also demonstrated that
invariance is characteristic of measurement in physics. He employed, as an example, a two-way
experimental frame of reference, in which instruments exert mechanical forces upon solid bodies
to produce accelerations. Rasch (1960/1980, pp. 112–113) stated of this context: “Generally: If
for any two objects we find a certain ratio of their accelerations produced by one instrument, then
the same ratio will be found for any other of the instruments.” It is readily shown that Newton's second law entails that such acceleration ratios equal the inverse ratio of the masses of the bodies (see the worked equation below). Invariance is the basis for sufficiency in the class of probabilistic measurement models that
he identified. This is the reason it is necessary to show that sufficiency is preserved to reconcile
the differing perspectives regarding the use of a discrimination parameter in IRT.
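As a brief worked equation illustrating the invariance Rasch described (the notation here is introduced only for the illustration), if instrument i exerts force F_i on body v of mass m_v, Newton's second law gives

$$
a_{vi} = \frac{F_i}{m_v}, \qquad \frac{a_{vi}}{a_{wi}} = \frac{F_i/m_v}{F_i/m_w} = \frac{m_w}{m_v},
$$

so the ratio of the accelerations of two bodies is the same whichever instrument produces them, being the inverse ratio of their masses.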
One might object that in the social sciences the knowledge available to physicists for the pur-
pose of formulating and testing quantitative hypotheses is not generally available. The basis for
the objection is correct; however, accepting the objection as a reason not to formulate and test
hypotheses would, somewhat paradoxically, serve to reinforce the validity of Michell’s argument
that psychometrics is pathological. Michell argues that, in the main, psychometricians fail to rec-
ognize the hypothesis central to the discipline, which is that latent traits are actually quantitative.
Because the unit is central to the standard definition of measurement in physics, accepting the
objection would also reinforce the validity of Michell’s argument that the acceptance of Stevens’
definition serves to deflect attention from the central hypothesis of the discipline.
Conducting tests of specific hypotheses is beyond the scope of this article. It is, nevertheless,
instructive to refer to an example for which there is existing empirical support for the purpose of
illustrating precisely how experimental factors may be controlled in a way that affects the unit.
Consider an example from psychophysics: Pollack (1949) showed that the loudness of
speech in the presence of white noise is proportional to the ratio of the loudness of the speech
in the quiet to the level of noise. These, and directly analogous results in visual perception,
were reported by Stevens (1986). Pollack (1949, p. 256) observed that “the effect of the noise
on the loudness of the speech is to a first degree of approximation a function of the speech-to-
noise ratio rather than the level of the speech alone or of the noise alone.” On the basis of these
results, it can be hypothesized that the sensory unit of the loudness of a stimulus is affected by
the loudness of ambient noise, particularly when the ambient noise approximates white noise.
This hypothesis can be tested, for example, by asking participants to compare the loudness of
pairs of stimuli under different levels of background noise. In this context, there is a substantive
theoretical basis for the hypothesis that the empirical factor will affect the sensory unit, namely,
sensory inhibition. The same kind of hypothesis applies to other kinds of sensory inhibition, such
as glare in vision.
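One way of writing this hypothesis in the terms of this article is as follows, where the subscripts index the background-noise condition and where it is assumed (as a reading of Equation (6), which is not reproduced here) that a larger discrimination corresponds to a smaller unit:

$$
\rho_{\text{quiet}} > \rho_{\text{noise}} \quad \Longleftrightarrow \quad [u_{\text{quiet}}] < [u_{\text{noise}}],
$$

that is, loudness comparisons are made in a coarser sensory unit as the level of ambient (approximately white) noise increases.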
Whatever kind of empirical factor is hypothesized to affect discrimination, the classifications or levels of that factor must be defined as part of the scientific hypothesis. This is analogous to the formation of classifications or levels in experimental investigations to which ANOVA is applied. As a second example, it is reasonable to hypothesize that engagement and effort will affect discrimination on attainment tests. Able students may be more likely to answer a question correctly given sufficient effort, whereas students who genuinely have very little ability cannot greatly improve their chances of correctly
answering relatively difficult questions. Engagement and effort can be manipulated and con-
trolled by various means to test effects on the unit. As empirical factors, it is reasonable to
hypothesize that engagement and effort are causally related to the latent trait of ability.
It is important to test fit to the model, and established tests of fit may be used for this purpose.
In addition, however, it is also important to state hypotheses regarding the effects of empirical
factors on the unit and to test these effects. Michell (2008) argues that tests of order relations are
required to establish additive structure, stating that “the reason I am confident that psychometrics
is pathological is that the theoretical and analytic work necessary to undertake tests of the kind
[indicated in the paper] has not yet been done . . . This, by itself, does not make psychometrics
pathological, but it does when conjoined with the presumption that psychological attributes are
quantitative” (p. 20). Michell's conclusion and the basis for his argument are also accepted in this article. Clearly, though, physics did not actually progress based on tests of cancellation condi-
tions. The difference is that in physics quantitative relationships are well established, and it is
not necessary to test that one attribute is quantitative if a quantitative relationship with another is
clearly established, for reasons that have been discussed. As Michell (1997, p. 359) observes:
Measurement always presupposes theory: the claim that an attribute is quantitative is, itself, always
a theory and that claim is generally embedded within a much wider quantitative theory involving the
hypothesis that specific quantitative relationships between attributes obtain.
Hypotheses regarding quantitative relationships are also important so that the factors can be
deliberately controlled in the way empirical factors must be controlled to measure relative to
standard units in the physical sciences. Terrien (1980, p. 766) states: “The practically useful
definition of a kind of quantity is a set of rules for measuring the ratio of two instances of that
kind of quantity.” To realize definitions of standard units in practice, it is usually necessary both to understand quantitative relationships and to control empirical conditions. For example, the SI definition of the meter depends on the definition of velocity, which is a relation between distance and time. The definition also depends on the speed of light in a vacuum being a constant,
which leads to a specific empirical condition for universally realizing measurement relative to
the meter. This and other definitions of base SI units clearly draw upon the relationships between
different kinds of quantity under specified empirical conditions to permit the precise realization
of standard units.
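For reference, the empirical condition can be stated explicitly. The SI fixes the speed of light in vacuum at exactly c = 299 792 458 m s^-1, so that

$$
1\ \mathrm{m} \;=\; \text{the length of the path travelled by light in vacuum in}\ \tfrac{1}{299\,792\,458}\ \mathrm{s},
$$

and realizing the meter therefore requires both the quantitative relation between distance, time, and velocity and the specified empirical condition of propagation in a vacuum.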
One of the principal reasons it is possible to define derived quantities in the manner required
to establish a coherent system of units is that “some physical measures are related to others
by laws” (Krantz et al., 1971, p. 455; see also Emerson, 2004b). Measures are also related by
definitions. Newton explicitly defined mass and momentum each as arising conjointly from two
related quantities, as follows:
Definition 1: The quantity of matter is the measure of the same arising from its density and bulk
conjointly.
Definition 2: The quantity of motion is the measure of the same, arising from the velocity and
quantity of matter conjointly.
(Cropper, 2001, p. 32)
Definitions of this kind form the basis of the definitions of derived units. Thus, Maxwell
(1952, art. SLVII) said that “the unit of force is that force which, acting on the unit of mass for
the unit of time, generates the unit of velocity.” The SI unit of force, the newton, is also stated in
precisely such terms.
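In SI terms, the same definitional form reads

$$
1\ \mathrm{N} \;=\; 1\ \mathrm{kg\,m\,s^{-2}},
$$

the force that gives a body of mass 1 kg an acceleration of 1 m s^-2.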
The classical definition was explicitly assumed by Fechner, the founder of quantitative psy-
chology (Michell, 1986). It was also explicitly assumed by Rasch. Rasch (1980) referred to
Newton’s second law, in his analysis of invariance, and to Maxwell’s account of the definition
of the unit of force above. Both Newton and Maxwell explicitly stated the classical definition of
measurement (Emerson, 2008; Michell, 1993; Michell, 1999; Terrien, 1980).
CONCLUSION
In physics, the practical realization of units depends on the expression of definitions and laws,
and on the classical definition of measurement. There is no reason to presume this does not apply
to psychometrics. If there is a reason, it has certainly not been identified to date. Given that
systems of units depend upon theory and law (de Boer, 1994/95; Krantz et al., 1971), and given
that there is no system of units in the social sciences, it is not surprising that Michell (1999,
p. 217) should argue:
Psychology might be on the way to becoming a successful quantitative science, but as a body of
workable, quantitative theories and laws, it is so far short of the example set by physics that no one
yet has a clear idea of what a successful quantitative psychology would look like. The history of
science teaches us many things, but I do not think that one of them is that we can expect to make
progress by ignoring pertinent matters.
The pertinent matter is the definition of measurement. Not only is there a lack of standard
units, there is also a lack of progress toward the systematic scientific study of the effects of
empirical factors on units of measurement.
As a first step, the article has examined item response models in terms of quantity calculus.
By formulating the relationship between discrimination and the unit in terms of quantity cal-
culus, the article clarified how to conduct systematic scientific investigations of the effects of
empirical factors on units of measurement in item response models, and it reconciled differing
perspectives regarding the use of a discrimination parameter.
However, in concluding, it is stressed that if the quantitative social sciences are to proceed in the manner that physics did, we will need to establish quantitative empirical relations before we can
properly establish that it is possible to measure posited psychological quantities. This entails
something more than the use of psychometric models stated in terms of ordinary algebra: it
entails specific definitions with the form of the two definitions stated by Newton above (see
Bureau International des Poids et Mesures [2006] for the many definitions of SI units). If we are
to successfully define units in this way in psychometrics, we must make specific quantities and
their relations explicit in the process of positing and testing (a) the scientific hypothesis that a
quantitative attribute exists and (b) the contingent hypothesis that it is possible to measure the
attribute in a specific unit. There must be a substantive, theoretical explanation for the dimen-
sional equivalence of specific quantitative relations, as there is an explanation of the equivalence
of physical dimensions in the quantity equations that underpin the SI. The examination of the
role of the unit presented in the paper shows that these matters have received virtually no consid-
eration in psychometrics. Given this, the broader implication is that a substantial shift is needed in the way we understand and experimentally approach measurement in the social sciences.
REFERENCES
Allisy, A. (1980). Physical quantities. In A. F. Milone & P. Giacomo (Eds.), Proceedings of the International School
of Physics “Enrico Fermi” Course LXVIII, Metrology and Fundamental Constants (pp. 14–17). Amsterdam,
Netherlands: North-Holland.
Andersen, E. B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42, 69–81.
Andrich, D. (2004). Controversy and the Rasch model: A characteristic of incompatible paradigms? Medical Care, 42,
7–16.
Andrich, D., & Luo, G. (2003). Conditional pairwise estimation in the Rasch model for ordered response categories using
principal components. Journal of Applied Measurement, 4, 205–221.
Baker, F. (1983). Comparison of ability metrics obtained under two latent trait theory procedures. Applied Psychological
Measurement, 7(1), 97–110.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R.
Novick (Eds.), Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Bray, A. (1992). The evolution of metrology during the last decade. In L. Crovini & T. J. Quinn (Eds.), Proceedings of the
International School of Physics “Enrico Fermi” Course CX, Metrology at the Frontiers of Physics and Technology
(pp. 1–9). Amsterdam, Netherlands: North-Holland.
Brink, N. E. (1971). Effect of item discrimination in the Rasch model. Proceedings of the Annual Convention of the
American Psychological Association, 6(1), 101–102.
Bureau International des Poids et Mesures (BIPM). (2006). The international system of units (SI). Sevres, France:
Organisation Intergouvernementale de la Convention du Mètre.
Bureau International des Poids et Mesures (BIPM). (2008). International vocabulary of metrology – basic and general
concepts and associated terms (VIM). General Conference on Weights and Measures. Retrieved 25 Feb 2010
from http://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2008.pdf
Cropper, W. H. (2001). Great physicists. Oxford, United Kingdom: Oxford University Press.
de Boer, J. (1994/95). On the history of quantity calculus and the international system. Metrologia, 31, 405–429.
Duncan, O. D. (1984). Notes on social measurement: Historical and critical. New York: Russell Sage Foundation.
Embretson, S., & Reise, S. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Emerson, W. H. (2004a). One as a “unit” in expressing the magnitudes of quantities. Metrologia, 41, 26–28.
Emerson, W. H. (2004b). On the algebra of quantities and their units. Metrologia, 45, 134–138.
Emerson, W. H. (2008). On quantity calculus and units of measurement. Metrologia, 45, 134–138.
Humphry, S. M. (2010). Modeling the effects of person group factors on discrimination. Educational and Psychological
Measurement, 70, 215–231.
Humphry, S. M., & Andrich, D. (2008). Understanding the unit in the Rasch model. Journal of Applied Measurement, 9,
249–264.
Jansen, P. G. W., van den Wollenberg, A., & Wierda, F. W. (1988). Correcting unconditional parameter estimates in the
Rasch model for inconsistency. Applied Psychological Measurement, 12, 297–306.
Krantz, D. H., Luce, R. D., Suppes, P., & Tversky, A. (1971). Foundations of Measurement (Vol. I). New York, NY:
Academic Press.
Kuhn, T. S. (1961). The function of measurement in modern physical science. ISIS, 52, 161–193.
Kyngdon, A. (2008). The Rasch model from the perspective of the representational theory of measurement. Theory and
Psychology, 18, 89–101.
Lord, F. M. (1968). An analysis of the Verbal Scholastic Aptitude Test using Birnbaum’s three-parameter model.
Educational and Psychological Measurement, 28, 989–1020.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Mahwah, NJ: Erlbaum.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Luce, R. D., & Krumhansl, C. (1988). Measurement, scaling, and psychophysics. In R. C. Atkinson, R. J. Herrnstein, G.
Lindzey, & R. D. Luce (Eds.), Stevens’ handbook of experimental psychology (pp. 1–74). New York, NY: Wiley.
Luce, R. D., & Tukey, J. W. (1964). Simultaneous conjoint measurement: A new scale type of fundamental measurement.
Journal of Mathematical Psychology, 1, 1–27.
Massey, B. S. (1971). Units, dimensional analysis and physical similarity. London, United Kingdom: Van Nostrand
Reinhold.
Maxwell, J. C. (1952). Matter and motion. New York, NY: Dover. (Originally published in 1876).
Michell, J. (1986). Measurement scales and statistics: A clash of paradigms. Psychological Bulletin, 100, 398–407.
Michell, J. (1993). The origins of the representational theory of measurement: Helmholtz, Hölder, and Russell. Studies
in History and Philosophy of Science, 24, 185–206.
Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology,
88, 355–383.
Michell, J. (1999). Measurement in psychology. Cambridge, United Kingdom: Cambridge University Press.
Michell, J. (2007). Measurement. In S. Turner & M. Risjord (Eds.), Philosophy of anthropology and sociology (pp.
71–119). Amsterdam: Elsevier.
Michell, J. (2008). Is psychometrics pathological science? Measurement: Interdisciplinary Research and Perspectives,
6, 7–24.
Perline, R., Wright, B. D., & Wainer, H. (1979). The Rasch model as additive conjoint measurement. Applied
Psychological Measurement, 3, 237–256.
Petley, B. W. (1992). The continuing evolution in the definitions and realizations of the SI units of measurement. In L.
Crovini & T. J. Quinn (Eds.), Proceedings of the International School of Physics, Course CX. Amsterdam, Netherlands:
North-Holland.
Pollack, I. (1949). The effect of white noise on the loudness of speech of assigned level. The Journal of the Acoustical
Society of America, 21, 255–258.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests (expanded edition). Chicago, IL:
University of Chicago Press. (Originally published in 1960).
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In J. Neyman (Ed.), Proceedings
of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, IV (pp. 321–334). Berkeley, CA:
University of California Press.
Rasch, G. (1977). On specific objectivity: An attempt at formalizing the request for generality and validity of scientific
statements. The Danish Yearbook of Philosophy, 14, 58–93.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680.
Stevens, S. S. (1986). Psychophysics: Introduction to its perceptual, neural, and social prospects. New York, NY: Wiley.
Stocking, M. L. (1989). Empirical estimation errors in item response theory as a function of test properties. Princeton,
NJ: Educational Testing Service.
Terrien, J. (1980). The practical importance of systems of units; their trend parallels progress in physics. In A. F. Milone
& P. Giacomo (Eds.), Proceedings of the International School of Physics “Enrico Fermi” Course LXVIII, Metrology
and Fundamental Constants (pp. 765–769). Amsterdam, Netherlands: North-Holland.
Verhelst, N. D., & Glas, C. A. A. (1995). The one parameter logistic model. In G. H. Fischer & I. W. Molenaar (Eds.),
Rasch models: Foundations, recent developments and applications (pp. 215–237). New York, NY: Springer-Verlag.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450.
Wingersky, M. S. (1983). LOGIST: A program for computing maximum likelihood procedures for logistic test models.
In R. K. Hambleton (Ed.), Applications of item response theory. Vancouver, BC: Educational Research Institute of
British Columbia.
Wood, R. (1978). Fitting the Rasch model—A heady tale. British Journal of Mathematical and Statistical Psychology,
31, 27–32.
Wright, B. D. (1985). Additivity in psychological measurement. In E. E. Roskam (Ed.), Measurement and personality
assessment. Selected papers, XXIII International Congress of Psychology, Volume 8, (pp. 101–111). Amsterdam,
Netherlands: North-Holland.
Wright, B. D. (1997). A history of social science measurement. Educational Measurement: Issues and Practice, 16,
33–45.
APPENDIX
From Equation (2), the likelihood of $\mathbf{x}_n = (x_{sn1}, \ldots, x_{sni}, \ldots, x_{SnI})$ is given by

$$
\Pr\{\mathbf{x}_n \mid \theta_n^{(s)}, b_i^{(s)}\}
= \prod_{s}\prod_{i\in s} \frac{\exp\!\left[x_{ni}\left(\theta_n^{(s)} - b_i^{(s)}\right)\right]}{1 + \exp\!\left(\theta_n^{(s)} - b_i^{(s)}\right)}
= \frac{\exp\!\left(\sum_{s} r_{sn}\theta_n^{(s)}\right)\exp\!\left(-\sum_{s}\sum_{i\in s} x_{ni}\,b_i^{(s)}\right)}{\prod_{s}\prod_{i\in s}\left[1 + \exp\!\left(\theta_n^{(s)} - b_i^{(s)}\right)\right]},
\tag{A1}
$$

where $\mathbf{x}_n$ is the vector of responses of person $n$ across items $i$ in set $s$.

Thus, it follows that

$$
\Pr\{\mathbf{x}_n \mid \mathbf{r}_n;\, b_i^{(s)}\}
= \frac{\dfrac{\exp\!\left(\sum_{s} r_{sn}\theta_n^{(s)}\right)\exp\!\left(-\sum_{s}\sum_{i\in s} x_{ni}\,b_i^{(s)}\right)}{\prod_{s}\prod_{i\in s}\left[1+\exp\!\left(\theta_n^{(s)}-b_i^{(s)}\right)\right]}}
{\sum\limits_{(\mathbf{x})\mid \mathbf{r}} \dfrac{\exp\!\left(\sum_{s} r_{sn}\theta_n^{(s)}\right)\exp\!\left(-\sum_{s}\sum_{i\in s} x_{ni}\,b_i^{(s)}\right)}{\prod_{s}\prod_{i\in s}\left[1+\exp\!\left(\theta_n^{(s)}-b_i^{(s)}\right)\right]}}
= \frac{\exp\!\left(-\sum_{s}\sum_{i\in s} x_{ni}\,b_i^{(s)}\right)}{\sum\limits_{(\mathbf{x})\mid \mathbf{r}} \exp\!\left(-\sum_{s}\sum_{i\in s} x_{ni}\,b_i^{(s)}\right)},
\tag{A2}
$$

in which the vector of person parameters is eliminated by partitioning the response space in terms of the score vector $\mathbf{r}_n = (r_{1n}, \ldots, r_{Sn})$, where $r_{sn} = \sum_{i\in s} x_{ni}$.
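To make Equation (A2) concrete, the following minimal sketch in Python computes the conditional probability of a response vector given its score vector by brute-force enumeration; the item difficulties and set structure are invented for illustration, and the sketch is not the conditional estimation procedure used in the simulations.

from itertools import product
from math import exp

# b[i] plays the role of b_i^(s); set_of[i] assigns item i to set s (values invented).
b = [-0.5, 0.0, 0.8, -0.3, 0.4, 1.1]
set_of = ["A", "A", "A", "B", "B", "B"]
sets = sorted(set(set_of))

def score_vector(x):
    # r_sn = sum of x_ni over items i in set s, for each set s
    return tuple(sum(xi for xi, s in zip(x, set_of) if s == name) for name in sets)

def kernel(x):
    # exp(-sum_s sum_{i in s} x_ni * b_i^(s)), the numerator of Equation (A2)
    return exp(-sum(xi * bi for xi, bi in zip(x, b)))

def conditional_probability(x):
    # Pr{x_n | r_n}: person parameters cancel within the class of patterns
    # sharing the same score vector, as in Equation (A2).
    r = score_vector(x)
    denominator = sum(kernel(y)
                      for y in product((0, 1), repeat=len(b))
                      if score_vector(y) == r)
    return kernel(x) / denominator

x_obs = (1, 0, 1, 0, 1, 0)
print(round(conditional_probability(x_obs), 4))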