UECM3253 Applied Nonparametric Statistics

Applied Nonparametric Statistics


UECM3253
References:
1) Wayne W. Daniel (1990), Applied Nonparametric Statistics, 2nd edition, Canada:
Duxbury Thomson Learning.
2) W.J. Conover (1999), Practical Nonparametric Statistics, 3rd edition, New York: John
Wiley & Sons, Inc.

CHAPTER 1
INTRODUCTION

1.0) Preliminaries
The subject of statistics encompasses a wide variety of activities, ideas and results.
Practitioners of the science of statistics usually acknowledge that it has two broad subdivisions:
descriptive statistics and inferential statistics (inductive statistics).

Descriptive statistics

• Relate only to the calculation or presentation of figures (visual or conceptual) to
summarize or characterize a set of data.
• No assumptions are made.
• No question of legitimacy of techniques.
• E.g.: mean, median, variance, range, histogram etc.

Inductive statistics (Statistical inference)

• Based on fairly specific assumptions regarding the nature of the underlying
population distribution. Its form and some parameter values must be stated.
• Able to make evaluations or probability statements concerning the accuracy of
an estimate or reliability of a decision.
• E.g.: estimation and hypothesis testing.

The term parameter is generally employed to connote a characteristic of the population. It is often an
unspecified constant appearing in a family of probability distributions.

Example 1.1: the mean $\mu$ and variance $\sigma^2$ are parameters of a normal distribution $X \sim N(\mu, \sigma^2)$.

1.1) Nonparametric statistics


The first use of what we would now call a nonparametric statistical procedure seems to
have been reported in 1710 by John Arbuthnot. Uses of such procedures were conspicuously
sparse until the 1940s. The word nonparametric appeared for the first time in 1942 in a paper by
Wolfowitz. Since then, the growth of interest in both the theory and the application of
nonparametric statistics has been rapid. Nonparametric statistics is currently one of the most
important branches of statistics. The techniques that fall within this category of statistics are used
in most, if not all, of the physical, biological, and social sciences.

A nonparametric procedure is a statistical procedure that has desirable properties that
hold under relatively mild assumptions regarding the underlying population(s) from which the
data are obtained. The rapid development of nonparametric statistical procedures may be traced
in part to the following:

(i) Nonparametric methods require few assumptions about the underlying populations
from which the data are obtained. In particular, nonparametric procedures forgo the
traditional assumption that the underlying populations are normal.
(ii) Nonparametric techniques are often easier to apply than their normal theory
counterparts.
(iii) Nonparametric procedures are often quite easy to understand.
(iv) Nonparametric procedures are applicable in situations where the normal theory
procedures cannot be utilized. For example, many of the procedures require not the
actual magnitudes of the observations, but rather, their ranks.
(v) Although at first glance most nonparametric procedures seem to sacrifice too much
of the basic information in the samples, theoretical investigations have shown that
this is not the case. More often than not, the nonparametric procedures are only
slightly less efficient than their normal theory counterparts when the underlying
populations are normal, and they can be mildly or wildly more efficient than these
competitors when the underlying populations are not normal.

Nonparametric (Distribution Free) Procedures: the methods are based on functions of the
sample observations whose corresponding random variable has a distribution which does not
depend on the specific distribution function of the population from which the sample was drawn.
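
As a minimal sketch of what "distribution free" means, consider the sign statistic: the number of sample observations above a hypothesized median. The sign test is used here only as a familiar illustration; it is not named in the definition above. If the population is continuous with that median, the count follows a Binomial(n, 1/2) distribution whatever the shape of the population, so an exact p-value needs no distributional assumption. The function name `sign_test_p_value`, the hypothesized median `m0`, and the sample values are hypothetical.

```python
from math import comb


def sign_test_p_value(data, m0):
    """Two-sided exact sign test of H0: the population median equals m0.

    Observations equal to m0 are dropped, which is a common convention
    (an assumption here, not something stated in the notes).
    """
    plus = sum(1 for x in data if x > m0)
    minus = sum(1 for x in data if x < m0)
    n = plus + minus
    if n == 0:
        return 1.0
    k = min(plus, minus)
    # Under H0 the number of plus signs is Binomial(n, 1/2), regardless
    # of the population's distribution function.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical sample: nine values above 10.0 and one below.
sample = [12.1, 15.3, 9.8, 14.2, 13.7, 16.0, 11.4, 14.9, 15.5, 13.1]
p = sign_test_p_value(sample, m0=10.0)
print(p)
```

Only the signs of the differences from `m0` enter the calculation, so any other sample with nine values above and one below `m0` would give the same p-value.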

The term nonparametric test implies a test for a hypothesis which is not a statement about parameter
values.

Nonparametric inference generally relates to some function of the actual magnitudes of the
random variables in the sample.

1.1.1) Advantages of nonparametric statistics:


(i) Since most nonparametric procedures depend on a minimum of assumptions, the
chance of their being improperly used is small.
(ii) For some nonparametric procedures, the computations can be quickly and easily
performed, especially if calculations are done by hand. Thus using them saves
computation time. This can be an important consideration if results are needed in a
hurry or if high-powered calculation devices are not available.
(iii) Researchers with minimum preparation in mathematics and statistics usually find the
concepts and methods of nonparametric procedures easy to understand.

(iv) Nonparametric procedures may be applied when the data are measured on a weak
measurement scale, as when only count data or rank data are available for analysis.

1.1.2) Disadvantages of nonparametric statistics:


(i) Because the calculations needed for most nonparametric procedures are simple and
rapid, these procedures are sometimes used when parametric procedures are more
appropriate. Such a practice often wastes information.
(ii) Although nonparametric procedures have a reputation for requiring only simple
calculations, the arithmetic in many instances is tedious and laborious, especially
when samples are large and a computer is not handy.

1.1.3) When to use NP procedures:


(i) The hypothesis to be tested does not involve a population parameter.
(ii) The data have been measured on a scale weaker than that required for the parametric
procedure that would otherwise be employed. For example, the data may consist of
count data or rank data.
(iii) The assumptions necessary for the valid use of a parametric procedure are not met. In
many instances, the design of a research project may suggest a certain parametric
procedure. Examination of the data, however, may reveal that one or more
assumptions underlying the test are grossly violated. In that case, a nonparametric
procedure is frequently the only alternative.
(iv) Results are needed in a hurry, a computer is not readily available, and calculations
must be done by hand.

1.2) Some important terminology

1.2.1) Measurement Scales


Stevens defines four types of measurement scale: nominal, ordinal, interval and ratio.

(i) Nominal Scale:


The nominal scale is the weakest of the four measurement scales. As its name implies,
the nominal scale distinguishes one object or event from another on the basis of a name. Thus,
we may classify (name) items coming off an assembly line as defective or nondefective. A
newborn infant is male or female. Patients in a tuberculosis hospital may be classified as normal, TB
affected, cured, or unclassified.
Frequently we use arbitrary numbers, rather than names in the usual sense, to distinguish
among objects or events on the basis of some characteristic. For example, we may use the
number 1 to designate defective items coming off an assembly line and 0 to designate
nondefective items. Usually we use the nominal scale when we are interested in the number of
objects falling into each of the various nominal categories. For example, we may want to know
how many patients in a tuberculosis hospital are diagnosed as cured. Data of this type are
frequently referred to as count data, frequency data, or categorical data.
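
The counting described above might be sketched as follows. The patient labels and item list are made-up illustrative data, and the variable names are assumptions. The point is that nominal values support only counting per category; the numeric codes 0 and 1 are labels, not magnitudes.

```python
from collections import Counter

# Hypothetical nominal labels for eight patients.
patients = [
    "cured", "affected", "normal", "cured",
    "unclassified", "cured", "affected", "normal",
]

# Counting per category is the natural operation on nominal data.
counts = Counter(patients)
n_cured = counts["cured"]

# Arbitrary numeric codes, as in the assembly-line example:
# 1 designates a defective item, 0 a nondefective item.
codes = {"nondefective": 0, "defective": 1}
items = ["defective", "nondefective", "nondefective", "defective", "nondefective"]
coded = [codes[item] for item in items]

# Summing works here only because "defective" happens to be coded 1;
# the codes themselves carry no order or magnitude.
n_defective = sum(coded)

print(dict(counts))
print(n_cured, n_defective)
```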

(ii) Ordinal Scale


The next-most-precise measurement scale is the ordinal scale. We distinguish objects or
events measured on the ordinal scale from one another on the basis of the relative amounts of
some characteristic they possess. The ordinal measurement scale makes it possible for objects to be
ranked. Salespersons, for example, can be ranked from “poorest” to “best” on the basis of their
personalities. Beauty contestants can be ranked from least beautiful to most beautiful. Illnesses
can be ranked from least severe to most severe. If we are to rank n objects on the basis of some
trait, we may assign the number 1 to the object having the least amount of that trait, the number 2
to the object containing the next-smallest amount, and so on to n, the object with the largest
amount of the trait under consideration. Data of this type are frequently referred to as rank data.
The differences between rankings are not necessarily equal. For example, three students
taking an examination may be ranked first, second and third on the basis of the order in which
they complete the examination. This does not mean, however, that the time elapsing between
completion of the examination by number 1 and by number 2 is the same as that between number
2 and number 3. The student finishing first may, for example, finish 5 minutes before the second
student, who, in turn, may finish 8 minutes before the third. If we have only the ranks available
for analysis, we do not know the magnitudes of the differences between measurements that are
ranked.
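
A short sketch of the point about unequal differences: two sets of completion times with very different gaps produce identical ranks. The helper `ranks` and the times (in minutes) are hypothetical, and the helper assumes there are no ties.

```python
def ranks(values):
    """Assign ranks 1..n from smallest to largest (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Hypothetical completion times in minutes.
# In times_a the gaps are 5 and 8 minutes, as in the example above.
times_a = [40, 45, 53]
# In times_b the gaps are 1 and 49 minutes.
times_b = [40, 41, 90]

ranks_a = ranks(times_a)
ranks_b = ranks(times_b)

# Both samples yield the same ranks, so the ranks alone cannot
# recover the magnitudes of the differences.
print(ranks_a, ranks_b)
```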

(iii) Interval Scale


When objects or events can be distinguished one from another and ranked, and when the
differences between measurements also have meaning or there is a fixed unit of measurement,
the interval scale of measurement is applicable. The true interval scale has a zero point, but it is
arbitrary. A familiar example of interval measurement is the measurement of temperature in degrees
Celsius or Fahrenheit. The zero point on these scales does not indicate an absence of temperature,
the trait being measured.
Suppose, for example, that 4 objects A, B, C, and D are assigned scores of 20, 30, 60 and
70, respectively, where measurement is on the interval scale. Since we used an interval scale, we
can say that the difference between 20 and 30 is equal to the difference between 60 and 70; that
is, equal distances between the members of each of two pairs of scores indicate equal differences
in the amount of the trait being measured. The interval scale, however, does not permit us to
speak meaningfully about the ratios of two scores. In our example, we cannot say that a score of
60 for C and a score of 30 for B means that C has twice as much of the trait as B.
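
One way to see why ratios are not meaningful on an interval scale is to change the arbitrary zero point. The sketch below treats the four scores as Celsius temperatures, which is an assumption made purely for illustration, and converts them to Fahrenheit. Equal differences stay equal, but the ratio of two scores changes.

```python
def celsius_to_fahrenheit(c):
    """Standard Celsius-to-Fahrenheit conversion (shifts the zero point)."""
    return c * 9 / 5 + 32


# Scores for A, B, C, D from the example, read as degrees Celsius
# (an illustrative assumption, not part of the original example).
c_scores = [20.0, 30.0, 60.0, 70.0]
f_scores = [celsius_to_fahrenheit(c) for c in c_scores]

# Differences: equal on one scale implies equal on the other.
diff_ab = f_scores[1] - f_scores[0]
diff_cd = f_scores[3] - f_scores[2]

# Ratios: 60 is "twice" 30 in Celsius, but not after conversion.
ratio_c = c_scores[2] / c_scores[1]
ratio_f = f_scores[2] / f_scores[1]

print(diff_ab, diff_cd)
print(ratio_c, round(ratio_f, 3))
```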

(iv) Ratio Scale


When measurements have the properties of the first three scales and the additional
property that their ratios are meaningful, the scale of measurement is the ratio scale. A property
of the ratio scale is a true zero, indicating a complete absence of the trait being measured. The
familiar measurements of height and weight are examples of measurement on the ratio scale. We
can say that a person who weighs 180 pounds weighs 60 pounds more than a person who weighs
120 pounds. With a ratio scale, we can also say that a 180-pound person weighs twice as much as
a 90-pound person. The ratio scale represents the highest level of measurement.
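
By contrast with the interval scale, a change of units on a ratio scale leaves ratios unchanged, because the zero point is fixed. A minimal sketch using the weights from the example; the function name `pounds_to_kg` is hypothetical, and the conversion factor is the standard definition of the international pound.

```python
LB_TO_KG = 0.45359237  # kilograms per international pound


def pounds_to_kg(lb):
    """Convert pounds to kilograms; zero stays zero (a true zero point)."""
    return lb * LB_TO_KG


heavy_lb, light_lb = 180.0, 90.0

ratio_lb = heavy_lb / light_lb
ratio_kg = pounds_to_kg(heavy_lb) / pounds_to_kg(light_lb)

# The ratio is the same in either unit, so "twice as heavy" is a
# statement about the trait and not about the choice of unit.
print(ratio_lb, ratio_kg)
```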

-- End of Chapter 1 --
