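$S^2$
Proof. If $\bar{y}$ is the mean of a simple random sample of size $n$,
$$V(\bar{y}) = \frac{N-n}{N}\,\frac{S^2}{n}$$
From (8.1), $V(\bar{y}_{sy}) < V(\bar{y})$ if and only if
$$\frac{N-1}{N}S^2 - \frac{k(n-1)}{N}S^2_{wsy} < \frac{N-n}{N}\,\frac{S^2}{n} \tag{8.3}$$
that is, if
$$k(n-1)S^2_{wsy} > \left(N-1-\frac{N-n}{n}\right)S^2 = k(n-1)S^2 \tag{8.4}$$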
This important result, which applies to cluster sampling in general, states that
systematic sampling is more precise than simple random sampling if the variance
within the systematic samples is larger than the population variance as a whole.
Systematic sampling is precise when units within the same sample are heterogene-
ous and is imprecise when they are homogeneous. The result is obvious intuitively.
If there is little variation within a systematic sample relative to that in the
population, the successive units in the sample are repeating more or less the same
information.
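As a rough numerical check of theorem 8.1, the sketch below uses the 40-unit artificial population that appears later in this chapter as Table 8.3 (k = 10, n = 4). Two things are assumptions rather than text from this section: the stratum I row, taken here as each column total minus the other three entries, and the definition of the within-sample variance S2_wsy as the pooled sum of squares about the sample means divided by k(n - 1). All variable names are illustrative.

```python
from statistics import mean

# Rows are strata I-IV, columns are the k = 10 systematic samples (Table 8.3).
# Row I is an assumption: column totals minus rows II-IV.
strata = [
    [0, 1, 1, 2, 5, 4, 7, 7, 8, 6],
    [6, 8, 9, 10, 13, 12, 15, 16, 16, 17],
    [18, 19, 20, 20, 24, 23, 25, 28, 29, 27],
    [26, 30, 31, 31, 33, 32, 35, 37, 38, 38],
]
k, n = 10, 4
N = k * n
samples = [list(col) for col in zip(*strata)]      # one list per systematic sample
population = [y for row in strata for y in row]
Y_bar = mean(population)

# Population variance and (assumed definition) variance within systematic samples.
S2 = sum((y - Y_bar) ** 2 for y in population) / (N - 1)
S2_wsy = sum((y - mean(s)) ** 2 for s in samples for y in s) / (k * (n - 1))

# Variance of the systematic mean (direct) and of the simple random mean.
V_sy = sum((mean(s) - Y_bar) ** 2 for s in samples) / k
V_ran = (N - n) / N * S2 / n

print(f"S2 = {S2:.2f}, S2_wsy = {S2_wsy:.2f}")
print(f"V_sy = {V_sy:.2f}, V_ran = {V_ran:.2f}")
```

For this population the within-sample variance exceeds the population variance, and the systematic mean is correspondingly the more precise of the two, as the theorem requires.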
Another form for the variance is given in theorem 8.2.
SYSTEMATIC SAMPLING 209
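Theorem 8.2.
$$V(\bar{y}_{sy}) = \frac{S^2}{n}\left(\frac{N-1}{N}\right)[1+(n-1)\rho_w] \tag{8.5}$$
where $\rho_w$ is the correlation coefficient between pairs of units that are in the same
systematic sample. It is defined as
$$\rho_w = \frac{E(y_{ij}-\bar{Y})(y_{iu}-\bar{Y})}{E(y_{ij}-\bar{Y})^2} \tag{8.6}$$
where the numerator is averaged over all $kn(n-1)/2$ distinct pairs, and the
denominator over all $N$ values of $y_{ij}$. Since the denominator is $(N-1)S^2/N$, this
gives
$$\rho_w = \frac{2}{(n-1)(N-1)S^2}\sum_{i=1}^{k}\sum_{j<u}(y_{ij}-\bar{Y})(y_{iu}-\bar{Y}) \tag{8.7}$$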
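Proof.
$$n^2kV(\bar{y}_{sy}) = n^2\sum_{i=1}^{k}(\bar{y}_{i\cdot}-\bar{Y})^2$$
$$= \sum_{i=1}^{k}[(y_{i1}-\bar{Y})+(y_{i2}-\bar{Y})+\cdots+(y_{in}-\bar{Y})]^2$$
The squared terms amount to the total sum of squares of deviations from $\bar{Y}$, that
is, to $(N-1)S^2$. This gives
$$n^2kV(\bar{y}_{sy}) = (N-1)S^2 + 2\sum_{i=1}^{k}\sum_{j<u}(y_{ij}-\bar{Y})(y_{iu}-\bar{Y}) \tag{8.8}$$
$$= (N-1)S^2 + (n-1)(N-1)S^2\rho_w \tag{8.9}$$
Hence
$$V(\bar{y}_{sy}) = \frac{S^2}{n}\left(\frac{N-1}{N}\right)[1+(n-1)\rho_w] \tag{8.10}$$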
This result shows that positive correlation between units in the same sample
inflates the variance of the sample mean. Even a small positive correlation may
have a large effect, because of the multiplier (n — 1).
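The identity in theorem 8.2 can be checked numerically. The sketch below computes the intraclass correlation from its pairwise form (8.7) and compares the variance given by the theorem with the variance obtained directly from the k sample means. The data are the Table 8.3 population from later in the chapter; the stratum I row is an assumption (column totals minus the other three strata), and the names are illustrative.

```python
from itertools import combinations

# Rows are strata I-IV, columns are the systematic samples (Table 8.3).
# Row I is an assumption: column totals minus rows II-IV.
strata = [
    [0, 1, 1, 2, 5, 4, 7, 7, 8, 6],
    [6, 8, 9, 10, 13, 12, 15, 16, 16, 17],
    [18, 19, 20, 20, 24, 23, 25, 28, 29, 27],
    [26, 30, 31, 31, 33, 32, 35, 37, 38, 38],
]
k, n = 10, 4
N = k * n
samples = [list(col) for col in zip(*strata)]
population = [y for s in samples for y in s]
Y_bar = sum(population) / N
S2 = sum((y - Y_bar) ** 2 for y in population) / (N - 1)

# Sum of cross products over all pairs j < u within each systematic sample.
cross = sum((a - Y_bar) * (b - Y_bar)
            for s in samples for a, b in combinations(s, 2))
rho_w = 2 * cross / ((n - 1) * (N - 1) * S2)          # pairwise form (8.7)

V_theorem = S2 / n * (N - 1) / N * (1 + (n - 1) * rho_w)
V_direct = sum((sum(s) / n - Y_bar) ** 2 for s in samples) / k

print(f"rho_w = {rho_w:.3f}")
print(f"V from theorem = {V_theorem:.4f}, V direct = {V_direct:.4f}")
```

Here the correlation about the overall mean comes out negative (roughly -0.22), which is why systematic sampling beats simple random sampling on this population.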
The two preceding theorems express $V(\bar{y}_{sy})$ in terms of $S^2$, hence relate it to the
variance for a simple random sample. There is an analogue of theorem 8.2 that
expresses $V(\bar{y}_{sy})$ in terms of the variance for a stratified random sample in which
the strata are composed of the first $k$ units, the second $k$ units, and so on. In our
notation the subscript $j$ in $y_{ij}$ denotes the stratum. The stratum mean is written $\bar{y}_{\cdot j}$.
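210 SAMPLING TECHNIQUES
Theorem 8.3.
$$V(\bar{y}_{sy}) = \frac{S^2_{wst}}{n}\left(\frac{N-n}{N}\right)[1+(n-1)\rho_{wst}] \tag{8.11}$$
where
$$S^2_{wst} = \frac{1}{n(k-1)}\sum_{j=1}^{n}\sum_{i=1}^{k}(y_{ij}-\bar{y}_{\cdot j})^2 \tag{8.12}$$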
This is the variance among units that lie in the same stratum. The divisor n(k —1)
is used because each of the n strata contributes (k—1) degrees of freedom.
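Furthermore,
$$\rho_{wst} = \frac{E(y_{ij}-\bar{y}_{\cdot j})(y_{iu}-\bar{y}_{\cdot u})}{E(y_{ij}-\bar{y}_{\cdot j})^2} \tag{8.13}$$
This quantity is the correlation between the deviations from the stratum means of
pairs of items that are in the same systematic sample.
$$\rho_{wst} = \frac{2}{n(n-1)(k-1)}\sum_{i=1}^{k}\sum_{j<u}\frac{(y_{ij}-\bar{y}_{\cdot j})(y_{iu}-\bar{y}_{\cdot u})}{S^2_{wst}} \tag{8.14}$$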
The proof is similar to that of theorem 8.2.
Corollary. A systematic sample has the same precision as the corresponding
stratified random sample, with one unit per stratum, if $\rho_{wst} = 0$. This
follows because for this type of stratified random sample $V(\bar{y}_{st})$ is (theorem 5.3,
corollary 3)
$$V(\bar{y}_{st}) = \left(\frac{N-n}{N}\right)\frac{S^2_{wst}}{n} \tag{8.15}$$
Other formulas for $V(\bar{y}_{sy})$, appropriate to an autocorrelated population, have
been given by W. G. and L. H. Madow (1944), who made the first theoretical study
of the precision of systematic sampling.
Example. The data in Table 8.3 are for a small artificial population that exhibits a fairly
steady rising trend. We have N= 40, k = 10, n =4. Each column represents a systematic
sample, and the rows are the strata. The example illustrates the situation in which the
“within-stratum” correlation is positive. For instance, in the first sample each of the four
numbers 0, 6, 18, and 26 lies below the mean of the stratum to which it belongs. This is
consistently true, with a few exceptions, in the first five systematic samples. In the last five
samples, deviations from the strata means are mostly positive. Thus the cross-product
terms in $\rho_{wst}$ are predominantly positive. From theorem 8.3 we expect systematic sampling
to be less precise than stratified random sampling with one unit per stratum.
The variance $V(\bar{y}_{sy})$ is found directly from the systematic sample totals as
$$V(\bar{y}_{sy}) = V_{sy} = \frac{1}{k}\sum_{i=1}^{k}(\bar{y}_{i\cdot}-\bar{Y})^2$$
$$= \frac{1}{160}\left[(50)^2+(58)^2+\cdots+(88)^2-\frac{(727)^2}{10}\right] = 11.63$$
For random and stratified random sampling, we need an analysis of variance of the
population into “between rows” and “within rows.” This is presented in Table 8.4.

TABLE 8.3
DATA FOR 10 SYSTEMATIC SAMPLES WITH n = 4, N = kn = 40

                 Systematic sample numbers             Strata
Strata     1   2   3   4   5   6   7   8   9  10      means
I          0   1   1   2   5   4   7   7   8   6        4.1
II         6   8   9  10  13  12  15  16  16  17       12.2
III       18  19  20  20  24  23  25  28  29  27       23.3
IV        26  30  31  31  33  32  35  37  38  38       33.1
Totals    50  58  61  63  75  71  82  88  91  88       72.7

TABLE 8.4
ANALYSIS OF VARIANCE

                          df       ss       ms
Between rows (strata)      3   4828.3
Within strata             36    485.5    13.49 = $S^2_{wst}$
Totals                    39   5313.8   136.25 = $S^2$

Hence the variances of the estimated means from simple random and stratified random samples
are as follows.
$$V_{ran} = \left(\frac{N-n}{N}\right)\frac{S^2}{n} = \frac{9}{10}\cdot\frac{136.25}{4} = 30.66$$
$$V_{st} = \left(\frac{N-n}{N}\right)\frac{S^2_{wst}}{n} = \frac{9}{10}\cdot\frac{13.49}{4} = 3.04$$
Both stratified random sampling and systematic sampling are much more
effective than simple random sampling but, as anticipated, systematic sampling is
less precise than stratified random sampling.
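The three variances in this example can be reproduced with a few lines of code. The sketch below rebuilds the analysis of variance and the comparison for the Table 8.3 population. The stratum I row is an assumption, recovered as each column total minus the entries for strata II to IV; variable names are illustrative.

```python
# Rows are strata I-IV, columns are the 10 systematic samples (Table 8.3).
# Row I is an assumption: column totals minus rows II-IV.
strata = [
    [0, 1, 1, 2, 5, 4, 7, 7, 8, 6],
    [6, 8, 9, 10, 13, 12, 15, 16, 16, 17],
    [18, 19, 20, 20, 24, 23, 25, 28, 29, 27],
    [26, 30, 31, 31, 33, 32, 35, 37, 38, 38],
]
n, k = len(strata), len(strata[0])       # n = 4 strata, k = 10 units per stratum
N = n * k
population = [y for row in strata for y in row]
Y_bar = sum(population) / N

# Analysis of variance: total and within-strata sums of squares.
total_ss = sum((y - Y_bar) ** 2 for y in population)
within_ss = sum((y - sum(row) / k) ** 2 for row in strata for y in row)
S2 = total_ss / (N - 1)
S2_wst = within_ss / (n * (k - 1))

fpc = (N - n) / N                        # finite population correction
V_ran = fpc * S2 / n
V_st = fpc * S2_wst / n
V_sy = sum((sum(col) / n - Y_bar) ** 2 for col in zip(*strata)) / k

print(f"total ss = {total_ss:.1f}, within ss = {within_ss:.1f}")
print(f"V_ran = {V_ran:.2f}, V_sy = {V_sy:.2f}, V_st = {V_st:.3f}")
```

With unrounded mean squares the stratified variance is about 3.03; the value 3.04 results when the rounded mean square 13.49 is used.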
Table 8.5 shows the same data, with the order of the observations reversed in
the second and fourth strata. This has the effect of making $\rho_{wst}$ negative, because it
makes the majority of the cross products between deviations from the strata
means negative for pairs of observations that lie in the same systematic sample. In
the first systematic sample, for instance, the deviations from the strata means are
now −4.1, +4.8, −5.3, +4.9. Of the six products of pairs of deviations, four are
negative. Roughly the same situation applies in every systematic sample.
This change does not affect $V_{ran}$ and $V_{st}$. With systematic sampling, it brings
about a dramatic increase in precision, as is seen when the systematic sample totals
in Table 8.5 are compared with those in Table 8.3. We now have
$$V_{sy} = \frac{1}{160}\left[(73)^2+(74)^2+\cdots+(65)^2-\frac{(727)^2}{10}\right] = 0.46$$

TABLE 8.5
DATA IN TABLE 8.3, WITH THE ORDER REVERSED IN STRATA II AND IV

                 Systematic sample numbers             Strata
Strata     1   2   3   4   5   6   7   8   9  10      means
I          0   1   1   2   5   4   7   7   8   6        4.1
II        17  16  16  15  12  13  10   9   8   6       12.2
III       18  19  20  20  24  23  25  28  29  27       23.3
IV        38  38  37  35  32  33  31  31  30  26       33.1
Totals    73  74  74  72  73  73  73  75  75  65       72.7

It is sometimes possible to exploit this result by numbering the units to create
negative correlations within strata. Accurate knowledge of the trends within the
population is required. However, as will be seen later, the situation in Table 8.5 is
one in which it is difficult to obtain from the sample a good estimate of the
standard error of $\bar{y}_{sy}$.
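The change of sign in the within-stratum correlation can be verified directly. The sketch below computes the correlation of deviations from stratum means for the original ordering (Table 8.3) and for the ordering with strata II and IV reversed (Table 8.5), and checks theorem 8.3 against the variance obtained from the sample means. The stratum I row is an assumption (column totals minus the other strata), and the function name is hypothetical.

```python
from itertools import combinations

# Rows are strata I-IV, columns are the systematic samples (Table 8.3).
# Row I is an assumption: column totals minus rows II-IV.
table_8_3 = [
    [0, 1, 1, 2, 5, 4, 7, 7, 8, 6],
    [6, 8, 9, 10, 13, 12, 15, 16, 16, 17],
    [18, 19, 20, 20, 24, 23, 25, 28, 29, 27],
    [26, 30, 31, 31, 33, 32, 35, 37, 38, 38],
]
# Table 8.5: same data with the order reversed in strata II and IV.
table_8_5 = [row[::-1] if j in (1, 3) else row for j, row in enumerate(table_8_3)]

def within_stratum_summary(strata):
    """Return (rho_wst, V_sy from theorem 8.3, V_sy computed directly)."""
    n, k = len(strata), len(strata[0])
    N = n * k
    # Deviations of every unit from its own stratum mean.
    dev = [[y - sum(row) / k for y in row] for row in strata]
    S2_wst = sum(d * d for row in dev for d in row) / (n * (k - 1))
    # Cross products over pairs of strata within each systematic sample.
    cross = sum(a * b for col in zip(*dev) for a, b in combinations(col, 2))
    rho = 2 * cross / (n * (n - 1) * (k - 1) * S2_wst)
    V_theorem = S2_wst / n * (N - n) / N * (1 + (n - 1) * rho)
    Y_bar = sum(map(sum, strata)) / N
    V_direct = sum((sum(col) / n - Y_bar) ** 2 for col in zip(*strata)) / k
    return rho, V_theorem, V_direct

rho_3, Vt_3, Vd_3 = within_stratum_summary(table_8_3)
rho_5, Vt_5, Vd_5 = within_stratum_summary(table_8_5)
print(f"Table 8.3: rho_wst = {rho_3:.3f}, V_sy = {Vd_3:.2f}")
print(f"Table 8.5: rho_wst = {rho_5:.3f}, V_sy = {Vd_5:.2f}")
```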
8.4 COMPARISON OF SYSTEMATIC WITH STRATIFIED
RANDOM SAMPLING
The performance of systematic sampling in relation to that of stratified or
simple random sampling is greatly dependent on the properties of the population.
There are populations for which systematic sampling is extremely precise and
others for which it is less precise than simple random sampling. For some
populations and some values of $n$, $V(\bar{y}_{sy})$ may even increase when a larger sample
is taken—a startling departure from good behavior. Thus it is difficult to give
general advice about the situations in which systematic sampling is to be recom-
mended. A knowledge of the structure of the population is necessary for its most
effective use.
Two lines of research on this problem have been followed. One is to compare
the different types of sampling on artificial populations in which $y_i$ is some simple
function of i. The other is to make the comparisons for natural populations. Some
of the principal results are presented in the succeeding sections.
8.5 POPULATIONS IN “RANDOM” ORDER
Systematic sampling is sometimes used, for its convenience, in populations in
which the numbering of the units is effectively random. This is so in sampling from
a file arranged alphabetically by surnames, if the item that is being measured has
no relation to the surname of the individual. There will then be no trend or
stratification in $y_i$ as we proceed along the file and no correlation between
neighboring values.
In this situation we would expect systematic sampling to be essentially equival-
ent to simple random sampling and to have the same variance. For any single finite
population, with given values of $n$ and $k$, this is not exactly true, because $V_{sy}$,
which is based on only $k$ degrees of freedom, is rather erratic when $k$ is small and
may turn out to be either greater or smaller than $V_{ran}$. There are two results which
show that on the average the two variances are equal.
Theorem 8.4. Consider all $N!$ finite populations that are formed by the $N!$
permutations of any set of numbers $y_1, y_2, \ldots, y_N$. Then, on the average over
these finite populations,
$$E(V_{sy}) = V_{ran} \tag{8.16}$$
Note that $V_{ran}$ is the same for all permutations.
This result, proved by W. G. and L. H. Madow (1944), shows that if the order of
the items in a specific finite population can be regarded as drawn at random from
the N! permutations, systematic sampling is on the average equivalent to simple
random sampling.
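For a very small population theorem 8.4 can be confirmed by brute force. The sketch below enumerates all N! orderings of six hypothetical values (N = 6, k = 3, n = 2), computes the systematic-sampling variance for each ordering in exact rational arithmetic, and compares the average with the simple random sampling variance. The values and function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def v_sy(pop, k):
    """Variance of the systematic sample mean over the k possible samples."""
    n = len(pop) // k
    Y_bar = Fraction(sum(pop), len(pop))
    means = [Fraction(sum(pop[u::k]), n) for u in range(k)]
    return sum((m - Y_bar) ** 2 for m in means) / k

def v_ran(pop, n):
    """Variance of the mean of a simple random sample of size n."""
    N = len(pop)
    Y_bar = Fraction(sum(pop), N)
    S2 = sum((y - Y_bar) ** 2 for y in pop) / (N - 1)
    return Fraction(N - n, N) * S2 / n

values = [3, 1, 4, 1, 5, 9]          # hypothetical population values
k, n = 3, 2

all_v_sy = [v_sy(p, k) for p in permutations(values)]   # all 6! = 720 orderings
average_v_sy = sum(all_v_sy) / len(all_v_sy)

print("average V_sy over all orderings:", average_v_sy)
print("V_ran:", v_ran(values, n))
print("range of V_sy:", min(all_v_sy), "to", max(all_v_sy))
```

The spread between the smallest and largest values of the systematic variance shows how erratic it can be for any single ordering, even though the average agrees exactly with the simple random sampling variance.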
The second approach is to regard the finite population as drawn at random from
an infinite superpopulation which has certain properties. The result that is proved
does not apply to any single finite population (i.e., to any specific set of values
$y_1, y_2, \ldots, y_N$) but to the average of all finite populations that can be drawn from
the infinite population.
The symbol $\mathcal{E}$ denotes averages over all finite populations that can be drawn
from this superpopulation.
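Theorem 8.5. If the variates $y_i$ $(i = 1, 2, \ldots, N)$ are drawn at random from a
superpopulation in which
$$\mathcal{E}y_i = \mu, \qquad \mathcal{E}(y_i-\mu)(y_j-\mu) = 0 \quad (i \neq j), \qquad \mathcal{E}(y_i-\mu)^2 = \sigma_i^2$$
Then
$$\mathcal{E}V_{sy} = \mathcal{E}V_{ran}$$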
The crucial conditions are that all $y_i$ have the same mean $\mu$, that is, there is no
trend, and that no linear correlation exists between the values $y_i$ and $y_j$ at two
different points. The variance $\sigma_i^2$ may change from point to point in the series.
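Proof. For any specific finite population,
$$V_{ran} = \frac{N-n}{Nn}\,\frac{\sum_{i=1}^{N}(y_i-\bar{Y})^2}{N-1}$$
Now
$$\sum_{i=1}^{N}(y_i-\bar{Y})^2 = \sum_{i=1}^{N}[(y_i-\mu)-(\bar{Y}-\mu)]^2$$
$$= \sum_{i=1}^{N}(y_i-\mu)^2 - N(\bar{Y}-\mu)^2$$
Since $y_i$ and $y_j$ are uncorrelated $(i \neq j)$,
$$\mathcal{E}(\bar{Y}-\mu)^2 = \frac{1}{N^2}\sum_{i=1}^{N}\sigma_i^2 \tag{8.17}$$
Hence
$$\mathcal{E}V_{ran} = \frac{N-n}{Nn(N-1)}\left(\sum_{i=1}^{N}\sigma_i^2 - \frac{\sum_{i=1}^{N}\sigma_i^2}{N}\right) \tag{8.18}$$
This gives
$$\mathcal{E}V_{ran} = \frac{N-n}{N^2 n}\sum_{i=1}^{N}\sigma_i^2 \tag{8.19}$$
Turning to $V_{sy}$, let $\bar{y}_u$ denote the mean of the $u$th systematic sample. For any
specific finite population,
$$V_{sy} = \frac{1}{k}\sum_{u=1}^{k}(\bar{y}_u-\bar{Y})^2 \tag{8.20}$$
$$= \frac{1}{k}\left[\sum_{u=1}^{k}(\bar{y}_u-\mu)^2 - k(\bar{Y}-\mu)^2\right] \tag{8.21}$$
By the theorem for the variance of the mean of an uncorrelated sample from an
infinite population,
$$\mathcal{E}V_{sy} = \frac{1}{k}\left[\frac{1}{n^2}\sum_{i=1}^{N}\sigma_i^2 - \frac{k}{N^2}\sum_{i=1}^{N}\sigma_i^2\right] \tag{8.22}$$
$$= \frac{N-n}{N^2 n}\sum_{i=1}^{N}\sigma_i^2 = \mathcal{E}V_{ran} \tag{8.23}$$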
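The equality of the two expected variances can also be checked exactly, without simulation. Both variances are quadratic forms in the $y_i$ that vanish when all $y_i$ are equal, so under the conditions of theorem 8.5 (common mean, no correlation) the superpopulation average of either one is the sum of $\sigma_i^2$ times the value of the form at the $i$th unit vector. The sketch below uses that fact; the variances $\sigma_i^2$ are hypothetical, and the function names are illustrative.

```python
from fractions import Fraction

def v_ran(y, n):
    """Simple random sampling variance of the mean for a finite population y."""
    N = len(y)
    Y_bar = Fraction(sum(y), N)
    S2 = sum((v - Y_bar) ** 2 for v in y) / (N - 1)
    return Fraction(N - n, N) * S2 / n

def v_sy(y, k):
    """Systematic sampling variance of the mean for a finite population y."""
    N = len(y)
    n = N // k
    Y_bar = Fraction(sum(y), N)
    return sum((Fraction(sum(y[u::k]), n) - Y_bar) ** 2 for u in range(k)) / k

def expected_value(Q, sigma2):
    """Average of the quadratic form Q over uncorrelated y_i with a common mean.

    Uses E[Q(y)] = sum_i sigma_i^2 * Q(e_i), valid because Q(constant) = 0.
    """
    N = len(sigma2)
    total = Fraction(0)
    for i, s2 in enumerate(sigma2):
        e = [0] * N
        e[i] = 1                      # unit vector picks out the ith diagonal term
        total += s2 * Q(e)
    return total

n, k = 3, 4
sigma2 = [1, 4, 2, 7, 3, 3, 9, 1, 5, 2, 6, 8]   # hypothetical, varies along the series
N = n * k

E_ran = expected_value(lambda y: v_ran(y, n), sigma2)
E_sy = expected_value(lambda y: v_sy(y, k), sigma2)
closed_form = Fraction(N - n, N * N * n) * sum(sigma2)   # right side of (8.19)

print(E_ran, E_sy, closed_form)
```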
8.6 POPULATIONS WITH LINEAR TREND
If the population consists solely of a linear trend, as illustrated in Fig. 8.2, it is
fairly easy to guess the nature of the results. From Fig. 8.2, it looks as if $V_{sy}$ and $V_{st}$
(with one unit per stratum) will both be smaller than $V_{ran}$. Furthermore, $V_{sy}$ will be
larger than $V_{st}$, for if the systematic sample is too low in one stratum it is too low in
all strata, whereas stratified random sampling gives an opportunity for within-stratum
errors to cancel.
[Fig. 8.2 Systematic sampling in a population with linear trend. Legend: × = systematic sample; ○ = stratified random sample.]
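To examine the effects mathematically, we may assume that $y_i = i$. We have
$$\sum_{i=1}^{N} i = \frac{N(N+1)}{2}, \qquad \sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6}$$
The population variance $S^2$ is given by
$$S^2 = \frac{1}{N-1}\left(\sum y_i^2 - N\bar{Y}^2\right) = \frac{1}{N-1}\left[\frac{N(N+1)(2N+1)}{6} - \frac{N(N+1)^2}{4}\right] = \frac{N(N+1)}{12} \tag{8.24}$$
Hence the variance of the mean of a simple random sample is
$$V_{ran} = \frac{N-n}{N}\,\frac{S^2}{n} = \frac{n(k-1)}{N}\cdot\frac{N(N+1)}{12n} = \frac{(k-1)(N+1)}{12} \tag{8.25}$$
To find the variance within strata, $S_w^2$, we need only replace $N$ by $k$ in (8.24).
This gives
$$V_{st} = \frac{N-n}{N}\,\frac{S_w^2}{n} = \frac{n(k-1)}{nk}\cdot\frac{k(k+1)}{12n} = \frac{k^2-1}{12n} \tag{8.26}$$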
For systematic sampling, the mean of the second sample exceeds that of the first
by 1; the mean of the third exceeds that of the second by 1, and so on. Thus the
means $\bar{y}_u$ may be replaced by the numbers $1, 2, \ldots, k$. Hence, by a further
application of (8.24),
$$\sum_{u=1}^{k}(\bar{y}_u-\bar{Y})^2 = \frac{k(k^2-1)}{12}$$
This gives
$$V_{sy} = \frac{1}{k}\sum_{u=1}^{k}(\bar{y}_u-\bar{Y})^2 = \frac{k^2-1}{12} \tag{8.27}$$
From the formulas (8.25), (8.26), and (8.27) we deduce, as anticipated,
$$V_{st} = \frac{k^2-1}{12n} \le V_{sy} = \frac{k^2-1}{12} \le V_{ran} = \frac{(k-1)(N+1)}{12} \tag{8.28}$$
Equality occurs only when n = 1. Thus, for removing the effect of a linear trend,
suspected or unsuspected, the systematic sample is much more effective than the
simple random sample but less effective than the stratified random sample.
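The three closed forms for a pure linear trend are easy to confirm by direct computation. The sketch below builds the population $y_i = i$, computes each variance from its definition in exact arithmetic, and compares the results with $(k^2-1)/12n$, $(k^2-1)/12$, and $(k-1)(N+1)/12$. The function name and the choice n = 4, k = 10 are illustrative.

```python
from fractions import Fraction

def linear_trend_variances(n, k):
    """Return (V_st, V_sy, V_ran) for the population y_i = i, i = 1..nk."""
    N = n * k
    y = list(range(1, N + 1))
    Y_bar = Fraction(sum(y), N)
    S2 = sum((v - Y_bar) ** 2 for v in y) / (N - 1)

    # Strata are the first k units, the second k units, and so on.
    strata = [y[j * k:(j + 1) * k] for j in range(n)]
    S2_w = sum((v - Fraction(sum(s), k)) ** 2
               for s in strata for v in s) / (n * (k - 1))

    fpc = Fraction(N - n, N)
    V_ran = fpc * S2 / n
    V_st = fpc * S2_w / n                      # one unit per stratum
    V_sy = sum((Fraction(sum(y[u::k]), n) - Y_bar) ** 2 for u in range(k)) / k
    return V_st, V_sy, V_ran

n, k = 4, 10
N = n * k
V_st, V_sy, V_ran = linear_trend_variances(n, k)
print("V_st  =", V_st, " formula:", Fraction(k * k - 1, 12 * n))
print("V_sy  =", V_sy, " formula:", Fraction(k * k - 1, 12))
print("V_ran =", V_ran, " formula:", Fraction((k - 1) * (N + 1), 12))
```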
8.7 METHODS FOR POPULATIONS WITH LINEAR TRENDS
The performance of systematic sampling in the presence of a linear trend can be
improved in several ways. One is to use a centrally located sample. Another is to
change the estimate from an unweighted to a weighted mean in which all internal
members of the sample have weight unity (before division by n) but different
weights are given to the first and last members. If the random number drawn
between 1 and $k$ is $i$, these weights are
$$1 \pm \frac{n(2i-k-1)}{2(n-1)k} \tag{8.29}$$
the + sign being used for the first member, the − sign for the last. For any $i$, the two
weights obviously add to 2. The reader may verify that if the population consists of
a linear trend and $N = nk$ the weighted sample mean gives the correct population
mean. The performance of these end corrections has been examined by Yates
(1948), to whom they are due.
Bellhouse and Rao (1975) have extended the Yates corrections to the case
$N \neq nk$ when the systematic sample is drawn by Lahiri’s circular method (section
8.1), which guarantees constant $n$. As before, the weights different from 1 are
applied to the first and last sample numbers in the original serial order of the
population. For example, if the starting random number in drawing the sample is
19 with $N = 23$, $n = 5$, units 19, 1, 6, 11, 16 constituting the sample, the first and
last members are $y_1$ and $y_{19}$. Two cases arise.
Case 1. Small $i$ for which $i + (n-1)k \le N$, so that the $n$ units are obtained
without passing over $y_N$. The weights for the first (+) and last (−) members are
$$1 \pm \frac{n[2i+(n-1)k-(N+1)]}{2(n-1)k} \tag{8.30}$$
Case 2. $i + (n-1)k > N$. Let $n_2$ be the number of sample units obtained after
passing over $y_N$. Thus, with $i = 19$, $n_2 = 4$. The weights for the first (+) and last (−)
members are
$$1 \pm \frac{n}{2(N-k)}\left[2i+(n-1)k-(N+1)-\frac{2n_2N}{n}\right] \tag{8.31}$$
In both cases the internal sample members receive weight 1 in the sample total.
With $N = 23$, $n = k = 5$, $i = 19$, $n_2 = 4$, the first and last weights are $1 \pm (-7/18)$.
Hence $y_1$ receives a weight 11/18, while $y_{19}$ receives 25/18.
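The end corrections can be exercised on a population that is exactly linear. The sketch below is written under stated assumptions: the case 1 weight offset is taken as n[2i + (n-1)k - (N+1)] / [2(n-1)k], which reduces to the Yates form n(2i - k - 1) / [2(n-1)k] when N = nk, and the case 2 offset is taken as (n / [2(N-k)]) [2i + (n-1)k - (N+1) - 2 n2 N / n], a reading that reproduces the weights 11/18 and 25/18 of the worked example. The function name and the test populations are hypothetical.

```python
from fractions import Fraction

def end_corrected_mean(y, n, k, i):
    """Weighted mean of the circular systematic sample with start i (1-based).

    y holds y_1..y_N in serial order. Returns (weighted mean, weights by unit).
    The two weight formulas are the assumed forms described in the lead-in.
    """
    N = len(y)
    units = [(i - 1 + j * k) % N + 1 for j in range(n)]     # 1-based unit labels
    n2 = sum(1 for j in range(n) if i + j * k > N)           # units after passing y_N
    if n2 == 0:                                              # case 1
        w = Fraction(n * (2 * i + (n - 1) * k - (N + 1)), 2 * (n - 1) * k)
    else:                                                    # case 2
        w = Fraction(n, 2 * (N - k)) * (
            2 * i + (n - 1) * k - (N + 1) - Fraction(2 * n2 * N, n))
    weights = {u: Fraction(1) for u in units}
    weights[min(units)] += w        # first member in serial order gets 1 + w
    weights[max(units)] -= w        # last member in serial order gets 1 - w
    total = sum(weights[u] * y[u - 1] for u in units)
    return total / n, weights

# Worked example from the text: N = 23, n = k = 5, i = 19.
linear_23 = [3 + 2 * t for t in range(1, 24)]               # hypothetical linear trend
mean_19, weights_19 = end_corrected_mean(linear_23, 5, 5, 19)
print(sorted(weights_19.items()))

# Yates case, N = nk = 40: every start i = 1..k recovers the population mean.
linear_40 = [3 + 2 * t for t in range(1, 41)]
yates_means = [end_corrected_mean(linear_40, 4, 10, i)[0] for i in range(1, 11)]
circular_means = [end_corrected_mean(linear_23, 5, 5, i)[0] for i in range(1, 24)]
```

In both settings the weighted mean equals the population mean for every possible start, which is the property the corrections are designed to deliver for a linear trend.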
Two alternative methods attempt to change the method of sample selection so
that the sample mean is unaffected by a linear trend. With N= nk and n even, a
method suggested by Sethi (1965) divides the population into n/2 strata of size 2k,
choosing two units equidistant from the end of each stratum. With starting
random number i, the n/2 pairs of units are those numbered
$$[i+2jk,\; 2(j+1)k-i+1], \qquad j = 0, 1, 2, \ldots, \tfrac{1}{2}n-1 \tag{8.32}$$
This selection removes the effect of a linear trend in any stratum of 2k units,
even if the linear slope varies from stratum to stratum. Murthy (1967) has called
the method balanced systematic sampling.
The modified method of Singh et al. (1968) chooses pairs of units equidistant
from the ends of the population. With $n$ even, the $n/2$ equidistant pairs that start
with unit $i$ $(i = 1, 2, \ldots, k)$ are
$$[i+jk,\; (N-jk)-i+1], \qquad j = 0, 1, 2, \ldots, \tfrac{1}{2}n-1 \tag{8.33}$$
With $n$ odd in these methods, $j$ goes up to $\tfrac{1}{2}(n-1)-1$ in (8.32) and (8.33). The
balanced method (8.32) adds the remaining sample member near the end at
$[i+(n-1)k]$; the modified method near the middle at $[i+\tfrac{1}{2}(n-1)k]$. The effect of
a linear trend is not completely eliminated in $\bar{y}$ for $n$ odd.
Comparisons of the performances of these two methods with Yates’ corrections
and with ordinary systematic sampling have been made on superpopulation
models representing linear and parabolic trends, periodic and autocorrelated
variation (Bellhouse and Rao, 1975), and on a few small natural populations by
these authors and by Singh (personal communication). In general the three
methods (Yates, balanced, modified) performed similarly, being superior to
ordinary systematic sampling in the presence of a linear or parabolic trend.
The population in Table 8.3, p. 211, for example, is one on which these methods
should perform very well. Ordinary systematic sampling gave $V_{sy} = 11.63$. Comparable
variances for the other methods ($n = 4$, $k = 10$) are: Yates, 1.29; Sethi
(balanced), 0.46; Singh (modified), 0.34. The balanced method happens to be that
obtained in Table 8.5 by reversal of strata II and IV in Table 8.3.
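The balanced and modified selections are simple index rules, so their variances on the Table 8.3 population can be recomputed. The sketch below numbers the 40 units in serial order (stratum I is units 1 to 10, stratum II is units 11 to 20, and so on), selects the pairs [i + 2jk, 2(j+1)k - i + 1] for the balanced method and [i + jk, (N - jk) - i + 1] for the modified method, and averages the squared error of the sample mean over the k starts. The stratum I values are an assumption (column totals minus the other strata), and the function names are hypothetical.

```python
from fractions import Fraction

# Table 8.3 read row by row gives the population in serial order.
# Row I is an assumption: column totals minus rows II-IV.
strata = [
    [0, 1, 1, 2, 5, 4, 7, 7, 8, 6],
    [6, 8, 9, 10, 13, 12, 15, 16, 16, 17],
    [18, 19, 20, 20, 24, 23, 25, 28, 29, 27],
    [26, 30, 31, 31, 33, 32, 35, 37, 38, 38],
]
population = [y for row in strata for y in row]
N, n, k = 40, 4, 10

def ordinary_units(i):
    return [i + j * k for j in range(n)]

def balanced_units(i):      # pairs equidistant from the ends of each 2k-unit stratum
    return [u for j in range(n // 2) for u in (i + 2 * j * k, 2 * (j + 1) * k - i + 1)]

def modified_units(i):      # pairs equidistant from the ends of the population
    return [u for j in range(n // 2) for u in (i + j * k, (N - j * k) - i + 1)]

def variance_of_mean(select):
    """Mean squared deviation of the sample mean over the k possible starts."""
    Y_bar = Fraction(sum(population), N)
    means = [Fraction(sum(population[u - 1] for u in select(i)), n)
             for i in range(1, k + 1)]
    return sum((m - Y_bar) ** 2 for m in means) / k

V_ordinary = variance_of_mean(ordinary_units)
V_balanced = variance_of_mean(balanced_units)
V_modified = variance_of_mean(modified_units)
print(f"ordinary {float(V_ordinary):.2f}, balanced {float(V_balanced):.2f}, "
      f"modified {float(V_modified):.2f}")
```

The Yates figure is not recomputed here; only the two selection rules are checked.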
8.8 POPULATIONS WITH PERIODIC VARIATION
If the population consists of a periodic trend, for example, a simple sine curve,
the effectiveness of the systematic sample depends on the value of $k$. This may be
seen pictorially in Fig. 8.3. In this representation the height of the curve is the
observation $y_i$. The sample points A represent the case least favorable to the
systematic sample. This case holds whenever k is equal to the period of the sine
curve or is an integral multiple of the period. Every observation within the
systematic sample is exactly the same, so that the sample is no more precise than a
single observation taken at random from the population.
Fig. 8.3 Periodic variation.
The most favorable case (sample B) occurs when $k$ is an odd multiple of the
half-period. Every systematic sample has a mean exactly equal to the true
population mean, since successive deviations above and below the middle line
cancel. The sampling variance of the mean is therefore zero. Between these two
cases the sample has various degrees of effectiveness, depending on the relation
between k and the wavelength.
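The two extreme cases can be illustrated with an exact sine curve. The sketch below assumes a hypothetical population of N = 120 points with period 12 and compares a sampling interval equal to the period (k = 12) with one equal to the half-period (k = 6, so that each sample contains an even number of points). The function name and the population are illustrative.

```python
import math

def v_sy(y, k):
    """Variance of the systematic sample mean over the k possible samples."""
    N = len(y)
    n = N // k
    Y_bar = sum(y) / N
    return sum((sum(y[u::k]) / n - Y_bar) ** 2 for u in range(k)) / k

period = 12
N = 120
y = [math.sin(2 * math.pi * i / period) for i in range(1, N + 1)]

Y_bar = sum(y) / N
# Variance of a single observation drawn at random from the population.
single_observation_variance = sum((v - Y_bar) ** 2 for v in y) / N

worst = v_sy(y, k=12)    # interval equals the period: all units in a sample agree
best = v_sy(y, k=6)      # interval equals the half-period: deviations cancel

print(f"single observation: {single_observation_variance:.4f}")
print(f"k = 12: {worst:.4f}")
print(f"k = 6:  {best:.2e}")
```

Up to floating-point error, the least favorable interval gives the variance of a single random observation, and the most favorable interval gives zero.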
Populations that exhibit an exact sine curve are not likely to be encountered in
practice. Populations with a more or less definite periodic trend are, however, not
uncommon. Examples are the flow of road traffic past a point on a road over 24
hours of the day and store sales over seven days of the week. For estimating an
average over a time period, a systematic sample daily at 4 p.m. or every Tuesday
would obviously be unwise. Instead, the strategy is to stagger the sample over the
periodic curve, for example, by seeing that every weekday is equally represented
in the case of store sales.
Some populations have a kind of periodic effect that is less obvious. A series of
weekly payrolls in a small sector of a factory may always list the workers in the
same order and may contain between 19 and 23 names every week. A systematic
sample of 1 in 20 names over a period of weeks might consist mainly of the records
of one worker or of the records of two or three workers. Similarly, a systematic
sample of names from a city directory might contain too many heads of house-
holds, or too many children. If there is time to study the periodic structure, a
systematic sample can usually be designed to capitalize on it. Failing this, a simple
or stratified random sample is preferable when a periodic effect is suspected but
not well known.
In some natural populations quasiperiodic variation may be present that would
be difficult to anticipate. L. H. Madow (1946) found evidence pointing this way in
a bed of hardwood seedling stock in a rather small population (N = 420). Finney
(1950) discussed a similar phenomenon in timber volume per strip in the Dehra
Dun forest, although in a reexamination of the data Milne (1959) suggested that
the apparent periodicity might have been produced by the process of measure-
ment. The effect of quasiperiodicity is that systematic sampling performs poorly at
some values of n and particularly well for others. Whether this effect occurs
frequently is not known. Matérn (1960) cites examples in which natural forces
(e.g., tides) might produce a spatial periodic variation, but he is of the opinion that
no clear case has been found in forest surveys.
8.9 AUTOCORRELATED POPULATIONS
With many natural populations, there is reason to expect that two observations
$y_i$, $y_j$ will be more nearly alike when $i$ and $j$ are close together in the series than
when they are distant. This happens whenever natural forces induce a slow change
as we proceed along the series. In a mathematical model for this effect we may
suppose that $y_i$ and $y_j$ are positively correlated, the correlation between them
being a function solely of their distance apart, i — j, and diminishing as this distance
increases. Although this model is oversimplified, it may represent one of the
salient features of many natural populations.
In order to investigate whether this model does apply to a population, we can
calculate the set of correlations $\rho_u$ for pairs of items that are $u$ units apart and plot
this correlation against u. This curve, or the function it represents, is called a
correlogram. Even if the model is valid, the correlogram will not be a smooth
function for any finite population because irregularities are introduced by the
finite nature of the population. In a comparison of systematic with stratified
random sampling for this model these irregularities make it difficult to derive
results for any single finite population. The comparison can be made over the
average of a whole series of finite populations, which are drawn at random from an
infinite superpopulation to which the model applies. This technique has already
been applied in theorem 8.5 and in sections 6.7, 7.8.
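A correlogram for a finite series might be computed as in the sketch below. The estimator is an assumption, since the text does not give a formula: the average cross product of deviations from the overall mean over the N - u pairs that are u units apart, divided by the average squared deviation. It is applied, purely for illustration, to the Table 8.3 population read in serial order (stratum I values taken as column totals minus the other strata); that series is dominated by a trend, so it shows how the calculation runs rather than how a stationary autocorrelated population behaves.

```python
def correlogram(y, max_lag):
    """Return [rho_0, rho_1, ..., rho_max_lag] for the series y (assumed estimator)."""
    N = len(y)
    Y_bar = sum(y) / N
    variance = sum((v - Y_bar) ** 2 for v in y) / N
    rho = []
    for u in range(max_lag + 1):
        # Average cross product over the N - u pairs that are u units apart.
        cov = sum((y[i] - Y_bar) * (y[i + u] - Y_bar) for i in range(N - u)) / (N - u)
        rho.append(cov / variance)
    return rho

# Table 8.3 in serial order; the first ten values are an assumption.
series = [0, 1, 1, 2, 5, 4, 7, 7, 8, 6,
          6, 8, 9, 10, 13, 12, 15, 16, 16, 17,
          18, 19, 20, 20, 24, 23, 25, 28, 29, 27,
          26, 30, 31, 31, 33, 32, 35, 37, 38, 38]

rho = correlogram(series, max_lag=10)
for u, r in enumerate(rho):
    print(f"u = {u:2d}  rho_u = {r:.3f}")
```

The printed values are not smooth in u, which is the kind of irregularity that the finite size of the population introduces.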
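Thus we assume that the observations $y_i$ $(i = 1, 2, \ldots, N)$ are drawn from a
superpopulation in which
$$\mathcal{E}y_i = \mu, \qquad \mathcal{E}(y_i-\mu)^2 = \sigma^2, \qquad \mathcal{E}(y_i-\mu)(y_{i+u}-\mu) = \rho_u\sigma^2 \tag{8.34}$$
where
$\rho_u \ge \rho_v \ge 0$, whenever $u$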