
PROBABILITY PROPORTIONAL TO SIZE (PPS) SAMPLING WITH REPLACEMENT


CONTENTS
1. INTRODUCTION
2. PPS SAMPLE SELECTION METHODS WITH REPLACEMENT
2.1 Cumulative Total Method
2.2 Lahiri's Method
3. ESTIMATION PROCEDURES
3.1 Theorem: Estimator of population total and its variance
3.2 Estimate of variance of $\hat{y}_{pps}$, i.e. of $\bar{z}_n$
4. GAIN DUE TO PPS SAMPLING
5. NUMERICAL ILLUSTRATION: PPS (wr)
6. FORMULAE TO REMEMBER
7. EXERCISE
8. REFERENCES

Prepared by Dr. V. K. Dwivedi, Department of Statistics, UB for STA 453: Sampling Theory and
Applications


1. INTRODUCTION
In simple random sampling (SRS) the selection probabilities are equal for all units of the population. However, when the units vary considerably in size, SRS does not take into account the possible importance of the larger units. Ancillary information about the size of the units can be used in selecting the sample so as to obtain more efficient estimators of the population parameters. One such method is to assign unequal selection probabilities to the different units of the population.
For example, villages with a larger geographical area are likely to have a larger area under food crops, so in estimating crop production it would be desirable to adopt a sampling scheme in which villages are selected with probability proportional to geographical area.
Definition: Probability Proportional to Size (PPS) Sampling: When units vary in size and the variable under study is directly related to the size of the unit, the selection probabilities may be assigned proportional to the sizes of the units. This type of sampling, in which the probability of selection is proportional to the size of the unit, is known as PPS sampling.

2. PPS SAMPLE SELECTION METHODS WITH REPLACEMENT


2.1 Cumulative Total Method
Let the size of the i-th unit be $X_i$ ($i = 1, 2, \dots, N$).
Step 1: Associate the numbers 1 to $X_1$ with the first unit, the numbers $X_1 + 1$ to $X_1 + X_2$ with the second unit, and so on, so that the total of the numbers so associated is $X = X_1 + X_2 + \dots + X_N$.
Step 2: Choose a random number $R$ from 1 to $X$ (the total of the sizes); the unit with which this number is associated is selected. For a sample of n units with replacement, Step 2 is repeated n times.
Example: The following table shows eight households and their respective sizes. Select a sample of 3 households using PPS with replacement.

Household No.     Household    Probability of        Cumulative   Range of random numbers
(sampling unit)   size (X_i)   selection at each     total        leading to selection
                               draw (p_i)
1                 10           10/50                 10           1-10
2                 8            8/50                  18           11-18
3                 7            7/50                  25           19-25
4                 6            6/50                  31           26-31
5                 6            6/50                  37           32-37
6                 5            5/50                  42           38-42
7                 4            4/50                  46           43-46
8                 4            4/50                  50           47-50
Total             X = 50       1

Now we select three random numbers R between 1 and 50. Suppose the random numbers selected are 22, 48 and 03. Since 22 lies in the range 19-25, 48 in 47-50 and 03 in 1-10, the households associated with these three numbers are the 3rd, 8th and 1st, respectively. Hence the selected sample consists of households 1, 3 and 8.
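The two steps above can be sketched in Python as follows. This is a minimal illustration, assuming integer sizes as in the table; the function names (`cumulative_totals`, `unit_for_number`, `pps_wr_cumulative`) are hypothetical. Instead of scanning the ranges by eye, it finds the unit whose range contains R by a binary search over the cumulative totals, and it reproduces the example's selection of households 3, 8 and 1 for R = 22, 48 and 03.

```python
import bisect
import itertools
import random


def cumulative_totals(sizes):
    """Running totals X_1, X_1 + X_2, ..., X (Step 1)."""
    return list(itertools.accumulate(sizes))


def unit_for_number(r, cum):
    """1-based unit whose range (previous total, cum[i]] contains r."""
    # bisect_left returns the first index whose cumulative total is >= r
    return bisect.bisect_left(cum, r) + 1


def pps_wr_cumulative(sizes, n, rng=random):
    """Draw n units PPS with replacement: repeat Step 2 n times."""
    cum = cumulative_totals(sizes)
    return [unit_for_number(rng.randint(1, cum[-1]), cum) for _ in range(n)]


sizes = [10, 8, 7, 6, 6, 5, 4, 4]            # household sizes from the table
cum = cumulative_totals(sizes)
print(cum)                                    # [10, 18, 25, 31, 37, 42, 46, 50]
print([unit_for_number(r, cum) for r in (22, 48, 3)])   # [3, 8, 1]
print(pps_wr_cumulative(sizes, 3, random.Random(1)))    # a fresh random sample
```

Passing a seeded `random.Random` object makes a draw reproducible. Because each draw is independent, the same household may appear more than once, as PPS with replacement allows.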


2.2 Lahiri's Method


The main drawback of the cumulative total method is that it requires writing down the successive cumulative totals, which is time-consuming and tedious, especially when the number of units in the population is large. Lahiri (1951) suggested an alternative procedure that avoids writing down cumulative totals.
Lahiri's method consists of the following steps:
Step 1: Select a pair of random numbers, say $(i, j)$, such that $1 \le i \le N$ and $1 \le j \le X_{\max}$, where $X_{\max}$ is the maximum of the sizes of the N units in the population.
Step 2: If $j \le X_i$, the i-th unit is selected; otherwise the pair is rejected and another pair of random numbers is chosen.
Step 3: To select a sample of n units with probability proportional to size with replacement (PPS wr), Steps 1 and 2 are repeated until n units are selected.
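Steps 1 to 3 translate directly into a rejection loop. The sketch below is a hypothetical Python rendering (function names are illustrative), assuming integer sizes so that j can be drawn from 1 to $X_{\max}$. A unit i is proposed with probability $1/N$ and accepted with probability $X_i/X_{\max}$, so each accepted draw selects unit i with probability $X_i/X$; the frequency check at the end illustrates this empirically.

```python
import random
from collections import Counter


def lahiri_draw(sizes, rng=random):
    """One PPS draw by Lahiri's method; returns a 1-based unit number."""
    n_units, x_max = len(sizes), max(sizes)
    while True:
        i = rng.randint(1, n_units)   # Step 1: random unit number, 1 <= i <= N
        j = rng.randint(1, x_max)     # Step 1: random size number, 1 <= j <= X_max
        if j <= sizes[i - 1]:         # Step 2: accept unit i if j <= X_i
            return i
        # otherwise the pair is rejected and a new pair is drawn


def pps_wr_lahiri(sizes, n, rng=random):
    """Step 3: repeat Steps 1-2 until n units are selected (with replacement)."""
    return [lahiri_draw(sizes, rng) for _ in range(n)]


sizes = [10, 8, 7, 6, 6, 5, 4, 4]
draws = pps_wr_lahiri(sizes, 50_000, random.Random(2024))
freq = Counter(draws)
for unit, x in enumerate(sizes, start=1):
    # observed share of draws vs. target probability X_i / X
    print(unit, round(freq[unit] / len(draws), 3), x / sum(sizes))
```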
Example: The following table shows the same eight households and their sizes, together with the random pairs drawn. Using Lahiri's method, select a sample of 3 households using PPS with replacement.
Solution
Step 1: Select a pair of random numbers, say $(i, j)$, such that $1 \le i \le 8$ and $1 \le j \le 10$, where $X_{\max} = 10$ is the maximum of the sizes of the N = 8 units in the population.
Step 2: If $j \le X_i$, the i-th unit is selected; otherwise the pair is rejected and another pair of random numbers is chosen. The procedure is repeated until 3 households are selected.

Household No.     Household    Pair of random     Status
(sampling unit)   size (X_i)   numbers (i, j)
1                 10           (1, 8)             Selected (8 ≤ 10)
2                 8            (2, 10)            Not selected (10 > 8)
3                 7            (3, 7)             Selected (7 ≤ 7)
4                 6
5                 6            (5, 7)             Not selected (7 > 6)
6                 5
7                 4
8                 4            (8, 3)             Selected (3 ≤ 4)

Hence the selected sample consists of households 1, 3 and 8.
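The accept/reject decisions in the table can be checked mechanically. In the Python sketch below (the helper name `accepted` is hypothetical), Step 2 is applied to the five pairs drawn in the example. It also computes the probability that a single random pair is accepted, $X/(N X_{\max}) = 50/(8 \times 10) = 5/8$, so on average about 8 pairs are needed for every 5 units selected; here 5 pairs yielded 3 units.

```python
from fractions import Fraction

sizes = [10, 8, 7, 6, 6, 5, 4, 4]                   # X_i from the example table
pairs = [(1, 8), (2, 10), (3, 7), (5, 7), (8, 3)]   # (i, j) pairs drawn


def accepted(pair, sizes):
    """Step 2 of Lahiri's method: keep unit i when j <= X_i."""
    i, j = pair
    return j <= sizes[i - 1]


status = ["Selected" if accepted(pr, sizes) else "Not selected" for pr in pairs]
sample = [pr[0] for pr in pairs if accepted(pr, sizes)]

# Chance that one random pair is accepted: sum_i (1/N)(X_i/X_max) = X / (N * X_max)
accept_prob = Fraction(sum(sizes), len(sizes) * max(sizes))

print(status)       # ['Selected', 'Not selected', 'Selected', 'Not selected', 'Selected']
print(sample)       # [1, 3, 8]
print(accept_prob)  # 5/8
```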

3. ESTIMATION PROCEDURES
Consider a population of N units, and let $y_i$ be the value of the characteristic under study for the unit $U_i$ of the population ($i = 1, 2, \dots, N$). Suppose further that $p_i = X_i/X$ is the probability that unit $U_i$ is selected at a draw, so that
$$\sum_{i=1}^{N} p_i = 1.$$
Let n independent selections be made with replacement, and let the value of $y_i$ be observed for each selected unit.
Further, let $(y_i, p_i)$ be the value and the selection probability of the i-th unit in the sample. The random variates $y_i/p_i$ ($i = 1, 2, \dots, n$) are independently and identically distributed.
Remark: If $p_i = 1/N$, PPS sampling reduces to simple random sampling with replacement. This shows that simple random sampling (with replacement) is a particular case of PPS sampling.
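As a quick check of this remark, substituting $p_i = 1/N$ into the estimator and variance derived in Section 3.1 recovers the familiar SRS with-replacement results. Here $\bar{y}$ is the sample mean, $\bar{Y} = Y_N/N$ and $\sigma_y^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - \bar{Y})^2$ (notation introduced only for this check):

```latex
\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{1/N} = N\cdot\frac{1}{n}\sum_{i=1}^{n} y_i = N\bar{y},
\qquad
\frac{1}{n}\sum_{i=1}^{N}\frac{1}{N}\left(N y_i - Y_N\right)^2
  = \frac{N}{n}\sum_{i=1}^{N}\left(y_i - \bar{Y}\right)^2
  = \frac{N^2\sigma_y^2}{n}.
```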


3.1 Theorem: Estimator of population total and its variance


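In PPS sampling with replacement, an unbiased estimator of the population total $Y_N$ is given by
$$\hat{y}_{pps} = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i}{p_i} \qquad (1)$$
with sampling variance
$$V(\hat{y}_{pps}) = \frac{1}{n}\sum_{i=1}^{N} p_i \left(\frac{y_i}{p_i} - Y_N\right)^2 \qquad (2)$$
where $p_i = X_i/X$ and $X = \sum_{i=1}^{N} X_i$.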

Proof:
Let us define the random variates
$$z_i = \frac{y_i}{p_i} \qquad (i = 1, 2, \dots, n),$$
which are independently and identically distributed. Then
$$\hat{y}_{pps} = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i}{p_i} = \frac{1}{n}\sum_{i=1}^{n} z_i = \bar{z}_n.$$
For sampling with replacement, at each draw,
$$E(z_i) = \sum_{i=1}^{N} p_i z_i = \sum_{i=1}^{N} p_i \frac{y_i}{p_i} = Y_N = \bar{z} \qquad (3)$$
It follows from (3) that
$$E(\bar{z}_n) = \frac{1}{n}\sum_{i=1}^{n} E(z_i) = Y_N = \bar{z} \qquad (4)$$
This shows that in PPS sampling with replacement, $\hat{y}_{pps}$ is an unbiased estimator of the population total $Y_N$.

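Variance of $\hat{y}_{pps}$, i.e. of $\bar{z}_n$:
$$V(\hat{y}_{pps}) = V(\bar{z}_n) = E(\bar{z}_n^2) - [E(\bar{z}_n)]^2 = E\left(\frac{1}{n}\sum_{i=1}^{n} z_i\right)^2 - \bar{z}^2, \quad \text{since } E(\bar{z}_n) = \bar{z}$$
$$= \frac{1}{n^2} E\left[\sum_{i=1}^{n} z_i^2 + \sum_{i=1}^{n}\sum_{j(\ne i)=1}^{n} z_i z_j\right] - \bar{z}^2$$
$$= \frac{1}{n^2}\left[\sum_{i=1}^{n} E(z_i^2) + \sum_{i=1}^{n}\sum_{j(\ne i)=1}^{n} E(z_i z_j)\right] - \bar{z}^2 \qquad (5)$$
Now
$$E(z_i^2) = \sum_{i=1}^{N} p_i z_i^2 \qquad (6)$$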

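and, since draws are made independently,
$$E(z_i z_j) = E(z_i)\,E(z_j) \qquad (i \ne j).$$
On substituting for $E(z_i)E(z_j)$ from (3) we have
$$E(z_i z_j) = \bar{z}^2 \qquad (7)$$
Substituting (6) and (7) in (5), we obtain
$$V(\bar{z}_n) = \frac{1}{n^2}\left[n\sum_{i=1}^{N} p_i z_i^2 + n(n-1)\bar{z}^2\right] - \bar{z}^2 = \frac{1}{n}\left[\sum_{i=1}^{N} p_i z_i^2 - \bar{z}^2\right] = \frac{\sigma_z^2}{n} \qquad (8)$$
where
$$\sigma_z^2 = \sum_{i=1}^{N} p_i z_i^2 - \bar{z}^2 = \sum_{i=1}^{N} p_i (z_i - \bar{z})^2 \qquad (9)$$
From (4) and (8) we see that, in terms of y, an unbiased estimator of $Y_N$ and its variance are given by
$$\bar{z}_n = \hat{y}_{pps} = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i}{p_i} \qquad (10)$$
$$V(\bar{z}_n) = V(\hat{y}_{pps}) = \frac{1}{n}\sum_{i=1}^{N} p_i \left(\frac{y_i}{p_i} - Y_N\right)^2 \qquad (11)$$
Q.E.D.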

Remark: The finite population correction $(1 - f)$ does not enter the expression for the variance of the estimator when sampling is carried out with replacement.
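The theorem can be checked exactly on a small population by enumerating every ordered with-replacement sample. The Python sketch below does this with exact rational arithmetic; the sizes are those of the household example, but the y values are invented purely for illustration, and the function names are hypothetical. For n = 2 and n = 3 it confirms that the expectation of (1) equals $Y_N$ and that its variance equals formula (2).

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: sizes X_i from the household example, y_i invented
X = [10, 8, 7, 6, 6, 5, 4, 4]
y = [25, 19, 16, 15, 13, 12, 9, 10]
p = [Fraction(x, sum(X)) for x in X]          # p_i = X_i / X, summing to 1
Y_N = sum(y)                                   # population total


def y_pps(draw):
    """Estimator (1): mean of y_i / p_i over the drawn unit indices."""
    return sum(Fraction(y[i]) / p[i] for i in draw) / len(draw)


def exact_moments(n):
    """Exact mean and variance of y_pps over all N**n ordered wr samples."""
    outcomes = []
    for draw in product(range(len(X)), repeat=n):
        prob = Fraction(1)
        for i in draw:
            prob *= p[i]                       # draws are independent
        outcomes.append((prob, y_pps(draw)))
    mean = sum(pr * est for pr, est in outcomes)
    var = sum(pr * (est - mean) ** 2 for pr, est in outcomes)
    return mean, var


def formula_variance(n):
    """Formula (2): (1/n) * sum_i p_i * (y_i / p_i - Y_N)**2."""
    return sum(pi * (Fraction(yi) / pi - Y_N) ** 2 for pi, yi in zip(p, y)) / n


for n in (2, 3):
    mean, var = exact_moments(n)
    print(n, mean == Y_N, var == formula_variance(n))   # both True
```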
3.2 Estimate of variance of $\hat{y}_{pps}$, i.e. of $\bar{z}_n$


$$\text{Est.}\,V(\bar{z}_n) = v(\bar{z}_n) = \frac{\text{Est.}(\sigma_z^2)}{n} \qquad (12)$$
$$v(\bar{z}_n) = \frac{s_z^2}{n} \qquad (13)$$
where
$$s_z^2 = \frac{1}{n-1}\sum_{i=1}^{n} (z_i - \bar{z}_n)^2 \qquad (14)$$

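An estimator of $V(\hat{y}_{pps})$ in terms of y is given by
$$v(\hat{y}_{pps}) = \frac{\sum_{i=1}^{n}\left(\frac{y_i}{p_i} - \hat{y}_{pps}\right)^2}{n(n-1)} = \frac{\sum_{i=1}^{n}\left(\frac{y_i}{p_i}\right)^2 - n\,\hat{y}_{pps}^2}{n(n-1)} \qquad (15)$$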
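Formulas (1) and (15) can be computed from a sample in a few lines. The sketch below uses hypothetical helpers (not from the source) that take the observed $y_i$ and $p_i$ of the n draws, compute $z_i = y_i/p_i$, and evaluate both forms of (15). The two forms agree algebraically, although the deviation form is numerically safer when the $z_i$ are large and close together.

```python
def pps_wr_estimates(y, p):
    """Return (y_pps, v(y_pps)) from the n draws of a PPS-wr sample.

    y: observed values y_i of the drawn units (a unit may repeat)
    p: selection probabilities p_i of the same drawn units
    """
    n = len(y)
    if n < 2:
        raise ValueError("at least two draws are needed to estimate the variance")
    z = [yi / pi for yi, pi in zip(y, p)]                    # z_i = y_i / p_i
    y_pps = sum(z) / n                                       # estimator (1)
    v = sum((zi - y_pps) ** 2 for zi in z) / (n * (n - 1))   # (15), first form
    return y_pps, v


def v_shortcut(y, p):
    """Second (computational) form of (15): (sum z_i^2 - n*y_pps^2) / (n(n-1))."""
    n = len(y)
    z = [yi / pi for yi, pi in zip(y, p)]
    y_pps = sum(z) / n
    return (sum(zi * zi for zi in z) - n * y_pps ** 2) / (n * (n - 1))


# Hypothetical three-draw sample, giving z = [200, 300, 200]
y_hat, v = pps_wr_estimates([40, 30, 30], [0.2, 0.1, 0.15])
print(round(y_hat, 2), round(v, 2), round(v_shortcut([40, 30, 30], [0.2, 0.1, 0.15]), 2))
# 233.33 1111.11 1111.11
```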

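Theorem
Show that $E(s_z^2) = \sigma_z^2$, i.e. $s_z^2$ is an unbiased estimator of $\sigma_z^2$.

Proof
Expanding and taking expectations, we get
$$E(s_z^2) = \frac{1}{n-1}E\left[\sum_{i=1}^{n} z_i^2 - n\bar{z}_n^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(z_i^2) - nE(\bar{z}_n^2)\right] \qquad (16)$$
Now, by definition,
$$V(\bar{z}_n) = E(\bar{z}_n^2) - \bar{z}^2,$$
so that, using (8),
$$E(\bar{z}_n^2) = V(\bar{z}_n) + \bar{z}^2 = \frac{\sigma_z^2}{n} + \bar{z}^2 \qquad (17)$$
Also
$$E(z_i^2) = \sum_{i=1}^{N} p_i z_i^2 \qquad (18)$$
Substituting (17) and (18) in (16), we get
$$E(s_z^2) = \frac{1}{n-1}\left[n\sum_{i=1}^{N} p_i z_i^2 - n\left(\frac{\sigma_z^2}{n} + \bar{z}^2\right)\right]$$
$$= \frac{1}{n-1}\left[n\left(\sum_{i=1}^{N} p_i z_i^2 - \bar{z}^2\right) - \sigma_z^2\right]$$
$$= \frac{1}{n-1}\left(n\sigma_z^2 - \sigma_z^2\right) = \frac{(n-1)\sigma_z^2}{n-1} = \sigma_z^2$$
Q.E.D.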
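This result can be confirmed exactly on a small population. The Python sketch below uses the household sizes from the example with invented y values (hypothetical, for illustration only; function names are also illustrative). It computes $\sigma_z^2$ from (9) and the exact expectation of $s_z^2$ from (14) over all ordered with-replacement samples of size n = 2 and n = 3.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: sizes from the household example, y_i invented
X = [10, 8, 7, 6, 6, 5, 4, 4]
y = [25, 19, 16, 15, 13, 12, 9, 10]
p = [Fraction(x, sum(X)) for x in X]
z = [Fraction(yi) / pi for yi, pi in zip(y, p)]                  # z_i = y_i / p_i

z_bar = sum(pi * zi for pi, zi in zip(p, z))                     # equals Y_N, by (3)
sigma2_z = sum(pi * (zi - z_bar) ** 2 for pi, zi in zip(p, z))   # (9)


def s2_z(values):
    """(14): sample variance of the drawn z values, divisor n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)


def expected_s2_z(n):
    """Exact E(s_z^2) over all N**n ordered with-replacement samples."""
    total = Fraction(0)
    for draw in product(range(len(X)), repeat=n):
        prob = Fraction(1)
        for i in draw:
            prob *= p[i]
        total += prob * s2_z([z[i] for i in draw])
    return total


print(z_bar == sum(y), expected_s2_z(2) == sigma2_z, expected_s2_z(3) == sigma2_z)
# True True True
```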

Remark: The estimator of the population mean is obtained simply by dividing the estimator of the population total ($\hat{y}_{pps}$) by N, the size of the population. The corresponding variance and estimate of variance are obtained by dividing $V(\hat{y}_{pps})$ and $v(\hat{y}_{pps})$ by $N^2$, i.e.
$$\bar{y}_{pps} = \hat{y}_{pps}/N, \qquad v(\bar{y}_{pps}) = \frac{v(\hat{y}_{pps})}{N^2}$$

4. GAIN DUE TO PPS SAMPLING


One may ask whether the gain due to PPS sampling, compared with simple random sampling, can be estimated from the PPS sample itself. It can: the percentage gain in efficiency of PPS sampling over SRS without replacement (SRS wor) can be estimated by
$$\text{Percent Gain} = \left[\frac{v_{pps}(\bar{y}_{srs})}{v(\bar{y}_{pps})} - 1\right] \times 100 \qquad (19)$$
where
$$v(\bar{y}_{pps}) = \frac{v(\hat{y}_{pps})}{N^2}, \qquad v(\hat{y}_{pps}) = v(\bar{z}_n) = \frac{s_z^2}{n} \qquad (20)$$
and
$$s_z^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} z_i^2 - n\bar{z}_n^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n}\left(\frac{y_i}{p_i}\right)^2 - n\,\hat{y}_{pps}^2\right]$$

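Here $v_{pps}(\bar{y}_{srs})$ is an unbiased estimator, computed from the PPS sample, of the variance that the sample mean would have under SRS wor with the same sample size:
$$v_{pps}(\bar{y}_{srs}) = \frac{N-n}{Nn(N-1)}\left[\frac{1}{n}\sum_{i=1}^{n}\frac{y_i^2}{p_i} - \frac{1}{N}\left(\hat{y}_{pps}^2 - v(\hat{y}_{pps})\right)\right] \qquad (21)$$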
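The unbiasedness of (21) can also be verified exactly by enumeration. The Python sketch below uses the household sizes with invented y values (hypothetical, for illustration only) and compares the exact expectation of (21) over all ordered PPS-wr samples of size n with the SRS wor variance of the sample mean, $\frac{N-n}{Nn}S_y^2$ with $S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{Y})^2$. The check works because $\frac{1}{n}\sum y_i^2/p_i$ is unbiased for $\sum_{i=1}^{N} y_i^2$ and $\hat{y}_{pps}^2 - v(\hat{y}_{pps})$ is unbiased for $Y_N^2$.

```python
from fractions import Fraction
from itertools import product

X = [10, 8, 7, 6, 6, 5, 4, 4]         # sizes (household example)
y = [25, 19, 16, 15, 13, 12, 9, 10]   # hypothetical study values
N = len(X)
p = [Fraction(x, sum(X)) for x in X]
Y_N = sum(y)


def v_srs_from_pps(draw):
    """(21) evaluated on one ordered PPS-wr sample of unit indices."""
    n = len(draw)
    z = [Fraction(y[i]) / p[i] for i in draw]
    y_pps = sum(z) / n
    v_y_pps = sum((zi - y_pps) ** 2 for zi in z) / (n * (n - 1))     # (15)
    mean_y2_over_p = sum(Fraction(y[i]) ** 2 / p[i] for i in draw) / n
    return Fraction(N - n, N * n * (N - 1)) * (
        mean_y2_over_p - (y_pps ** 2 - v_y_pps) / N)


def srswor_variance_of_mean(n):
    """V(ybar_srs) = (N - n)/(N n) * S_y^2, with S_y^2 using divisor N - 1."""
    y_bar = Fraction(Y_N, N)
    S2 = sum((yi - y_bar) ** 2 for yi in y) / (N - 1)
    return Fraction(N - n, N * n) * S2


def expected_estimate(n):
    """Exact expectation of (21) over all N**n ordered wr samples."""
    total = Fraction(0)
    for draw in product(range(N), repeat=n):
        prob = Fraction(1)
        for i in draw:
            prob *= p[i]
        total += prob * v_srs_from_pps(draw)
    return total


print(expected_estimate(2) == srswor_variance_of_mean(2))   # True
```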

5. NUMERICAL ILLUSTRATION: PPS (wr)


A sample survey was conducted to study the yield of maize in a district. A sample of 10 farms was taken from a total of 100, with probability proportional to the area under maize and with replacement. The area under the crop ($X_i$) and the yield ($y_i$) were recorded in hectares and quintals per hectare, respectively. The total area (X) under maize in the region was 450 hectares.
(i) Estimate the average yield per farm, and (ii) its standard error.
(iii) Estimate the percentage gain in efficiency due to PPS sampling compared with SRS wor.
Solution:
Given N = 100, X = 450, n = 10.

Farm   Area under   Yield   p_i = X_i/X   z_i = y_i/p_i   z_i^2          y_i^2/p_i
No.    crop (X_i)   (y_i)
1      5.2          28      0.0116        2423.08         5871301.78     67846.15
2      5.9          29      0.0131        2211.86         4892344.15     64144.07
3      3.9          30      0.0087        3461.54         11982248.52    103846.15
4      4.2          22      0.0093        2357.14         5556122.45     51857.14
5      4.7          22      0.0104        2106.38         4436849.25     46340.43
6      4.8          25      0.0107        2343.75         5493164.06     58593.75
7      4.9          28      0.0109        2571.43         6612244.90     72000.00
8      6.8          37      0.0151        2448.53         5995296.28     90595.59
9      4.7          26      0.0104        2489.36         6196921.68     64723.40
10     5.7          32      0.0127        2526.32         6382271.47     80842.11
Total                                     24939.39        63418764.54    700788.79

(The $p_i$ are shown rounded to four decimals; $z_i$ and the columns after it are computed from the unrounded $p_i = X_i/450$.)

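(i) Estimate of the average yield per farm: $\bar{y}_{pps} = \hat{y}_{pps}/N$, where
$$\hat{y}_{pps} = \bar{z}_n = \frac{1}{n}\sum_{i=1}^{n} z_i = \frac{24939.39}{10} = 2493.94 \text{ quintals}$$
so that
$$\bar{y}_{pps} = \frac{\hat{y}_{pps}}{N} = \frac{2493.94}{100} = 24.94 \text{ quintals per farm}$$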

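(ii) Estimate of the standard error of the mean:
$$SE(\bar{y}_{pps}) = \sqrt{v(\bar{y}_{pps})}$$
Now, the estimate of the variance of $\hat{y}_{pps}$ is
$$v(\hat{y}_{pps}) = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n} z_i^2 - n\bar{z}_n^2\right] = \frac{63418764.54 - 10 \times (2493.94)^2}{10 \times 9} \approx 13571.57$$
(the final figure is obtained by carrying the unrounded value of $\bar{z}_n$ through the calculation). Hence
$$v(\bar{y}_{pps}) = \frac{v(\hat{y}_{pps})}{N^2} = \frac{13571.57}{100^2} = 1.357 \approx 1.36$$
and the standard error of $\bar{y}_{pps}$ is
$$SE(\bar{y}_{pps}) = \sqrt{1.357} \approx 1.16$$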

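(iii) Percentage gain in efficiency:
$$\text{Percent Gain} = \left[\frac{v_{pps}(\bar{y}_{srs})}{v(\bar{y}_{pps})} - 1\right] \times 100$$
where
$$v_{pps}(\bar{y}_{srs}) = \frac{N-n}{Nn(N-1)}\left[\frac{1}{n}\sum_{i=1}^{n}\frac{y_i^2}{p_i} - \frac{1}{N}\left(\hat{y}_{pps}^2 - v(\hat{y}_{pps})\right)\right]$$
$$= \frac{100-10}{100 \times 10 \times (100-1)}\left[\frac{1}{10} \times 700788.79 - \frac{1}{100}\left\{(2493.94)^2 - 13571.57\right\}\right]$$
$$= \frac{1}{1100}\left[70078.88 - 62061.61\right] = \frac{8017.27}{1100} = 7.29$$
Therefore
$$\text{Percent Gain} = \left[\frac{7.29}{1.36} - 1\right] \times 100 = 436.03\%$$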
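The whole illustration can be reproduced from the raw farm data. The Python sketch below (variable names are illustrative) computes $p_i$, $z_i$, (15), (20), (21) and (19) without intermediate rounding. It should match the hand-computed estimates above to the figures shown, except that, because it does not round $v(\bar{y}_{pps})$ to 1.36 before dividing, the percentage gain comes out at roughly 437% rather than 436.03%.

```python
import math

# Maize illustration: N = 100 farms, X = 450 ha, n = 10 PPS-wr draws
N, X_total = 100, 450.0
area = [5.2, 5.9, 3.9, 4.2, 4.7, 4.8, 4.9, 6.8, 4.7, 5.7]   # X_i (hectares)
yields = [28, 29, 30, 22, 22, 25, 28, 37, 26, 32]           # y_i
n = len(area)

p = [a / X_total for a in area]                   # p_i = X_i / X, unrounded
z = [yi / pi for yi, pi in zip(yields, p)]        # z_i = y_i / p_i

y_pps = sum(z) / n                                                     # (1)
v_y_pps = (sum(zi ** 2 for zi in z) - n * y_pps ** 2) / (n * (n - 1))  # (15)

mean_pps = y_pps / N                              # average yield per farm
v_mean_pps = v_y_pps / N ** 2                     # (20)
se_mean = math.sqrt(v_mean_pps)

sum_y2_over_p = sum(yi ** 2 / pi for yi, pi in zip(yields, p))
v_srs_mean = (N - n) / (N * n * (N - 1)) * (
    sum_y2_over_p / n - (y_pps ** 2 - v_y_pps) / N)                    # (21)
gain = (v_srs_mean / v_mean_pps - 1) * 100                             # (19)

print(f"total {y_pps:.2f}, mean {mean_pps:.2f}, v(total) {v_y_pps:.2f}, SE(mean) {se_mean:.2f}")
print(f"v_pps(srs mean) {v_srs_mean:.2f}, percent gain {gain:.1f}%")
```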
1.36


6. FORMULAE TO REMEMBER
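PPS WR

Sample total (estimator of the population total):
(a) Estimator: $\hat{y}_{pps} = \bar{z}_n = \frac{1}{n}\sum_{i=1}^{n} z_i$, where $z_i = y_i/p_i$
    Estimate of variance: $v(\hat{y}_{pps}) = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n} z_i^2 - n\bar{z}_n^2\right]$
(b) Estimate of standard error: $SE(\hat{y}_{pps}) = \sqrt{v(\hat{y}_{pps})}$

Sample mean (estimator of the population mean):
(c) Estimator: $\bar{y}_{pps} = \hat{y}_{pps}/N$
    Estimate of variance: $v(\bar{y}_{pps}) = v(\hat{y}_{pps})/N^2$
(d) Estimate of standard error of the mean: $SE(\bar{y}_{pps}) = \sqrt{v(\bar{y}_{pps})}$

GAIN DUE TO PPS SAMPLING
$$\text{Percent Gain} = \left[\frac{v_{pps}(\bar{y}_{srs})}{v(\bar{y}_{pps})} - 1\right] \times 100$$
where
$$v_{pps}(\bar{y}_{srs}) = \frac{N-n}{Nn(N-1)}\left[\frac{1}{n}\sum_{i=1}^{n}\frac{y_i^2}{p_i} - \frac{1}{N}\left(\hat{y}_{pps}^2 - v(\hat{y}_{pps})\right)\right]$$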

7. EXERCISE
Q1.

What is PPS sampling?

Q2.

The following table shows eight households and their respective sizes.
(a) Select a sample of 4 households using PPS with replacement by (i) the cumulative total method, and (ii) Lahiri's method.

Household No.     Household
(sampling unit)   size (X_i)
1                 20
2                 16
3                 14
4                 12
5                 12
6                 12
7                 8
8                 8
Total             X = 102

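Q3.
In PPS sampling with replacement, show that (i)
$$\hat{y}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}$$
is an unbiased estimator of the population total $Y_N$, and (ii) its sampling variance is given by
$$V(\hat{y}_{pps}) = \frac{1}{n}\sum_{i=1}^{N} p_i\left(\frac{y_i}{p_i} - Y_N\right)^2$$
where $p_i = X_i/X$ and $\sum_{i=1}^{N} X_i = X$.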

Q4.
Show that $E(s_z^2) = \sigma_z^2$, i.e. $s_z^2$ is an unbiased estimator of $\sigma_z^2$, where
$$s_z^2 = \frac{1}{n-1}\sum_{i=1}^{n}(z_i - \bar{z}_n)^2 \quad \text{and} \quad \sigma_z^2 = \sum_{i=1}^{N} p_i (z_i - \bar{z})^2$$

Q5. A sample survey was conducted to study the yield of maize in a district. A sample of 10 farms was taken from a total of 100, with probability proportional to the area under maize and with replacement. The area under the crop ($X_i$) and the yield ($y_i$) were recorded in hectares and quintals per hectare, respectively. The total area (X) under maize in the region was 450 hectares.
(a) Estimate the average yield per farm and its standard error.
(b) Estimate the percentage gain in efficiency due to PPS sampling compared with SRS wor.
It is given that:
N = 100; X = 450; n = 10
$\sum_{i=1}^{n} z_i = \sum_{i=1}^{n} y_i/p_i = 24939.39$
$\sum_{i=1}^{n} z_i^2 = 63418764.54$
$\sum_{i=1}^{n} y_i^2/p_i = 700788.79$

8. REFERENCES
Singh, D. and Chaudhary, F. S. (1986). Theory and Analysis of Sample Survey Designs. Wiley Eastern Limited, New Delhi, India.
Sukhatme, P. V. and Sukhatme, B. V. (1977). Sampling Theory of Surveys with Applications. Asia Publishing House, New Delhi. (Call no. 519.52SUK)
Munoz, J. (2002). A guide for data management of household surveys. In: Household Surveys in Developing and Transition Countries: Design, Implementation and Analysis (Chapter 15). United Nations Statistics Division (UNSD) Publication, New York. Available free of charge at http://unstats.un.org/unsd/HHsurveys
