Contents
1. Introduction
11. Summary
12. Conclusion
References
1. Introduction
3. Definition of key concepts
Our discussion begins with an overview of the following concepts:
Validity, for example, often pertains to the content of a test or measuring
instrument. If a test is meant to measure a particular personality trait, then
it is expected that the test actually measures what it is supposed to measure;
otherwise it is invalid.
Grade equivalents are quite similar to age norms. However, instead of
checking the aptitude or ability of a similar age group, grade norms describe
the standard of test performance for each grade represented in the normative
sample. This means that school performance is measured against a normative
sample drawn from the same class or grade in the school. This is a convenient
way to compare a child or scholar's academic performance against that of the
equivalent grade (Gregory, 2000, p. 71).
3.2 Percentiles
A percentile is a relative measure that can range from 0 to 100. A percentile
describes where a score falls in the distribution: the percentage of sample
scores lying below it. For example, the 50th percentile (also referred to as
the "median") is the score below which 50% of the sample falls and above
which the other 50% lies (Rosnow & Rosenthal, 1999, p. 233).
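The idea can be sketched in a few lines of Python. The scores below are hypothetical, and the percentile rank is computed in its simplest textbook form (percentage of sample scores strictly below a value); other conventions exist.

```python
from statistics import median

def percentile_rank(scores, value):
    """Percentage of scores in the sample that fall below `value`."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)

# Hypothetical sample of test scores:
scores = [48, 52, 55, 60, 61, 63, 67, 70, 74, 80]
mid = median(scores)                  # the 50th percentile, here 62
rank = percentile_rank(scores, mid)   # 50.0: half the sample lies below it
```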
3.4 Deviation IQ
This norm is often used to scale the scores obtained on intelligence tests.
However, this kind of scale is often misinterpreted by inexperienced persons
wanting an easy "labelling" system to describe a person's intelligence as
above or below the reference point of 100. It would be foolish to treat a
single IQ test as a final verdict on one's intelligence or character. For
example, using obsolete tests can either inflate or deflate a participant's
IQ scores.
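A deviation IQ is conventionally computed by rescaling a standard score (z-score) onto a scale with mean 100 and, on many modern tests, a standard deviation of 15. A minimal sketch, with hypothetical norm-group figures:

```python
def deviation_iq(raw, mean, sd, iq_sd=15):
    """Convert a raw score to a deviation IQ: 100 + iq_sd * z,
    where z is the raw score's distance from the norm-group mean
    in standard-deviation units."""
    z = (raw - mean) / sd
    return 100 + iq_sd * z

# A raw score one standard deviation above a hypothetical norm-group mean:
iq = deviation_iq(raw=130, mean=100, sd=30)   # z = 1.0, so IQ = 115
```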
4.1 Expectancy tables
This kind of table gives a practical, at-a-glance view of the relationship
between candidates' predictor results and a specific criterion. For example,
an expectancy table relates an individual's test scores to what he or she is
able to accomplish later on in life, whether in a certain career path or in
an upcoming college entrance exam. However, such tables also have
limitations, mainly because they reflect the results of large representative
groups, whose scores mirror the social or school standards of their time. In
fact, such tables, like most other norm-based tests, require constant
updating and checking in order to preserve the reliability and validity of
their results (Gregory, 2000, p. 72).
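An expectancy table can be sketched as a cross-tabulation: predictor scores are grouped into bands, and for each band we record the share of candidates who later met the criterion. The data and band boundaries below are hypothetical.

```python
from collections import defaultdict

def expectancy_table(pairs, bands):
    """For each predictor-score band (lo, hi), the proportion of
    candidates who later met the criterion (1 = met, 0 = not met)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for score, success in pairs:
        for lo, hi in bands:
            if lo <= score <= hi:
                totals[(lo, hi)] += 1
                hits[(lo, hi)] += success
                break
    return {band: hits[band] / totals[band] for band in totals}

# Hypothetical entrance-exam scores paired with later pass (1) / fail (0):
data = [(45, 0), (50, 0), (55, 1), (62, 1), (66, 0), (71, 1), (78, 1), (84, 1)]
table = expectancy_table(data, bands=[(40, 59), (60, 79), (80, 100)])
# Higher score bands show a higher proportion of later success.
```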
In other words, test items can come in different formats and styles, such
as multiple-choice questions, true-false items, forced-choice items,
closed/open-response items and so forth.
The item difficulty value compares the number of students who answer a
single test question correctly against the total number of test takers. If
only a small percentage get an item wrong, then the item is clearly too easy
and should be adjusted. The reverse is also true.
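The item difficulty value (often called the item's p-value) is simply the proportion of test takers who answer the item correctly. A minimal sketch with hypothetical responses:

```python
def item_difficulty(responses):
    """Proportion of test takers answering the item correctly.
    Values near 1.0 mean the item is too easy; near 0.0, too hard."""
    return sum(responses) / len(responses)

# 1 = correct, 0 = wrong, for one item across ten students:
p = item_difficulty([1, 1, 1, 0, 1, 1, 1, 1, 0, 1])   # 0.8
```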
6. Reliability of a psychological measure
Reliability, according to Gregory (2000), "expresses the relative influence of
true and error scores on obtained test scores." To understand what reliability
means, imagine a scale weighing a kilo of grapes. The greengrocer weighs the
grapes twice in a row, and each time he gets a slightly different reading,
never quite the same as the first weighing. In other words, reliability is not
an absolute measure. There will always be a slight inconsistency between the
first test and the second test, but such fluctuations are a matter of degree.
Repeating results helps the tester to confirm some degree of accuracy in
scores, but this again will not mean much without validity, which is discussed
later in this essay.
To test the reliability of psychological test scores, the same test should be
taken twice, using the test-retest method. We can then compare the variance in
the obtained scores with the variance in true scores.
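In practice, test-retest reliability is usually estimated by correlating the two sets of scores. A minimal sketch with hypothetical first- and second-administration scores; a correlation near 1.0 indicates consistent measurement.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores from two administrations of the same test:
first = [10, 12, 14, 16, 18]
retest = [11, 12, 13, 17, 19]
r = pearson_r(first, retest)   # high, but (as the essay notes) not exactly 1
```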
6.3 Reliability coefficient
The reliability coefficient is the proportion of true score variance (the
factors which are consistent) to the total variance of test results. In plain
terms, the total variance is the true score variance (the stable attribute we
are testing) plus the error score variance, or errors of measurement; the
reliability coefficient is the true score share of that total.
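This definition translates directly into a one-line formula, r_xx = true variance / (true variance + error variance). The variance figures below are hypothetical:

```python
def reliability_coefficient(true_var, error_var):
    """Proportion of total score variance that is true-score variance:
    r_xx = true_var / (true_var + error_var)."""
    return true_var / (true_var + error_var)

# Hypothetical decomposition: 80 units of true variance, 20 of error.
r_xx = reliability_coefficient(true_var=80, error_var=20)   # 0.8
```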
6.5 True variance and error variance in a test score
Briefly, the true variance reflects a more homogeneous, internally consistent
set of items than the error variance. Error variance results from poor
content sampling, as in alternate-form and split-half reliability, as well as
from heterogeneity of the traits under observation. On the other hand, high
interitem consistency indicates a more homogeneous variance with little
inconsistency. For example, if two half-tests show two different results, we
speak of error variance: the two half-tests are inconsistent with one
another.
8. Internal consistency of a test – advantages and limitations
Apart from alternate-forms reliability and test-retest reliability, there are
other methods to test items for consistency, for example split-half
reliability, the Kuder-Richardson 20 and Cronbach's alpha.
Of course there are advantages to this method, such as lengthening the test
to produce higher reliability or sampling a larger behaviour domain. But
there are also limitations to how one can "split" the items of a single test.
One can try dividing even- and odd-numbered items, or separating easy and
difficult items. However, this becomes a problem when the test developer has
to split drawings or comprehension texts.
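The split-half procedure can be sketched as follows: correlate the two half-test scores, then step the half-test correlation up to an estimate for the full-length test with the Spearman-Brown formula. The half-test scores below are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) *
                  sqrt(sum((b - my) ** 2 for b in y)))

def spearman_brown(half_r):
    """Estimate full-test reliability from the half-test correlation."""
    return 2 * half_r / (1 + half_r)

# Hypothetical scores on the odd- and even-numbered halves of one test:
odd_half = [8, 11, 13, 15, 18]
even_half = [9, 10, 14, 16, 17]
full_test_reliability = spearman_brown(pearson_r(odd_half, even_half))
```

The Spearman-Brown step matters because each half is only half as long as the real test, and shorter tests are less reliable; the correction estimates what the full-length instrument would achieve.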
8.2 Kuder-Richardson 20
We use the Kuder-Richardson 20 or KR-20 (1937) formula if one wants to find
the internal consistency of a single administration of one test, as discussed
in the split-half procedure. What this formula actually does is score
individual test items as 0 for wrong and 1 for right. However, when tests go
beyond the scope of the KR-20 formula, such as in the testing of
heterogeneous items, we use coefficient alpha (Cronbach, 1951). This formula
is suitable, for example, for attitude scales where test persons must rate
their answers as strongly agree, disagree, and so forth (Gregory, 2000,
p. 86).
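A minimal sketch of KR-20 for dichotomous (0/1) items, using a tiny hypothetical data set; population variance is used here, though some texts use the sample variance instead:

```python
def kr20(item_matrix):
    """KR-20 internal consistency for dichotomously scored items.
    item_matrix: one row per person, one column per item (1 = correct)."""
    n = len(item_matrix)            # number of persons
    k = len(item_matrix[0])         # number of items
    # Sum of p*q across items, where p = proportion correct per item
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n
        pq += p * (1 - p)
    # Variance of total scores (population variance)
    totals = [sum(row) for row in item_matrix]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - pq / var_t)

# Hypothetical responses: four persons, three items.
data = [[1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0]]
alpha = kr20(data)   # 0.75 for this data set
```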
9. Validity of a psychological measure
Validity can be described as the degree to which a measure does what it is
supposed to do. In other words, the psychological measure should give a
well-grounded indication of the relation between the trait being tested and
the operational definition of the construct. Furthermore, the measuring
instrument must test, and only test, what it was designed to test. For
example, it is no use designing an instrument for intelligence scales and
then using the same measure to test "running speed" (Blanche & Durrheim,
2002, p. 83). The following validity procedures will be discussed:
- content validity
- criterion-related validity
- predictive validity
- concurrent validity
Content validation sometimes runs into difficulties when abstract traits,
such as personality and aptitudes, have to be tested. It is difficult to give
an accurate test description of something like racism or morals, as these
traits do not fit snugly between the pages of a subject book (Blanche et al.,
2002, p. 85).
Face validity is another matter to consider. For instance, how does the test
appear to others? Does it look too complicated, or does it have an
unprofessional appearance? Face validity needs to be taken seriously if the
measure is to be accepted by persons in authority, particularly from a legal
and educational point of view (PSY498-/8102).
9.2 Criterion-related validity
Criterion-related validity is normally established by correlating a measure
with other similar tests or research. In other words, a researcher who
discovers a new form of "job mobbing" in corporate and industrial settings
will compare previous studies in this field with his or her new findings.
There are two types of validity measures used to test for criterion validity,
namely predictive validity and concurrent validity.
We are looking at both theoretical and empirical methods of establishing
construct validity. Several such methods are discussed in what follows.
back many "untidy" test variables to a more efficient, economical set of
common traits.
10.5 Experimental interventions
Any form of experimental intervention a researcher performs will involve
"control" of the test situation. This is done in order to "isolate" common
treatment factors and remove any unwanted interferences that could invalidate
results. There are numerous research designs to choose from, such as a
standard one-group pretest-posttest design for testing construct validation
in a scholastic test.
Then there are other designs, such as the equivalent time-series design, that
spread out over lengthy time periods (Neumann, 1997, pp. 183-197). Whatever
design is chosen, there will always be a certain amount of experimental
interference. The researcher seeks solutions to problems, or tries to find a
better experimental method to test different hypotheses for present and
future generations.
11. Summary
The goal of this essay was to explain what is meant by the psychometric
properties of psychological assessment measures. The principles central to
psychometric theory were discussed, namely the different types of norms,
criterion-referenced tests, psychological measures, and the reliability and
validity of these measures, as well as the advantages and limitations of each
test.
12. Conclusion
Psychometric testing of psychological measurements is an extensive procedure.
There are a number of processes involved in assessing human data, and none of
them can be carried out in a vacuum. People are human constructs that do not
remain stable over time, which is why researchers are always testing and
retesting their instruments against human dynamics. It is therefore safe to
conclude that no test is a complete test. As this essay has demonstrated,
there are always advantages and limitations to assessment measures. What
works for one test may not necessarily work for another. Sometimes it is not
a matter of degree whether a test measures what it is supposed to measure,
but of how the test sample relates to the real world of people.
References
Neumann, W.L. (1997). Social Research Methods. 3rd edition. Needham Heights:
Allyn & Bacon.