Prepared by Dr. V. K. Dwivedi, Department of Statistics, UB for STA 453: Sampling Theory and
Applications
Household No.   Household size (Xi)   Probability of selection in draw (pi)   Cumulative total   Range of random numbers leading to selection
1               10                    10/50                                   10                 1-10
2               8                     8/50                                    18                 11-18
3               7                     7/50                                    25                 19-25
4               6                     6/50                                    31                 26-31
5               6                     6/50                                    37                 32-37
6               5                     5/50                                    42                 38-42
7               4                     4/50                                    46                 43-46
8               4                     4/50                                    50                 47-50
Total           X = 50                1
Now we select three random numbers (R) between 1 and 50. The random numbers selected
are 22, 48 and 03. These fall in the ranges of the 3rd, 8th and 1st households respectively.
Hence the sample so selected contains the households with serial numbers 1, 3 and 8.
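The selection step above can be sketched in Python. This is a minimal illustration, not part of the original notes: the function name `select_by_cumulative_total` is made up for the example, and the random numbers are passed in rather than generated so that the result can be compared with the worked example.

```python
from bisect import bisect_left
from itertools import accumulate

def select_by_cumulative_total(sizes, random_numbers):
    """Return the 1-based unit numbers picked by the cumulative total method."""
    cumulative = list(accumulate(sizes))  # 10, 18, 25, ..., 50 for the table above
    selected = []
    for r in random_numbers:
        if not 1 <= r <= cumulative[-1]:
            raise ValueError(f"random number {r} is outside 1..{cumulative[-1]}")
        # The selected unit is the first one whose cumulative total is >= r.
        selected.append(bisect_left(cumulative, r) + 1)
    return selected

sizes = [10, 8, 7, 6, 6, 5, 4, 4]
print(select_by_cumulative_total(sizes, [22, 48, 3]))  # [3, 8, 1]
```

Because the numbers are drawn with replacement, the same household may appear more than once in the returned list.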
Household sizes (Xi): 10, 8, 7, 6, 6, 5, 4, 4

Pair of random numbers   Status
(missing)                Selected
(missing)                Not selected
(missing)                Selected
(5,7)                    Not selected
(8,3)                    Selected
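The two legible pairs above, (5,7) and (8,3), are consistent with Lahiri's method as it is usually described: draw a unit number i between 1 and N and a second number j between 1 and M, where M is at least the largest size, and accept unit i only if j does not exceed Xi. The sketch below assumes that reading of the pairs and takes M = 10, the largest household size; the function names are hypothetical.

```python
import random

def lahiri_accepts(sizes, unit, number):
    """True if the pair (unit, number) selects the unit, i.e. number <= X_unit."""
    return number <= sizes[unit - 1]

def lahiri_sample(sizes, n, rng):
    """Draw n units (1-based, with replacement) by repeated Lahiri trials."""
    bound = max(sizes)  # assumption: M is taken as the largest size
    sample = []
    while len(sample) < n:
        unit = rng.randint(1, len(sizes))
        number = rng.randint(1, bound)
        if lahiri_accepts(sizes, unit, number):
            sample.append(unit)
    return sample

sizes = [10, 8, 7, 6, 6, 5, 4, 4]
print(lahiri_accepts(sizes, 5, 7))  # False: 7 exceeds X5 = 6
print(lahiri_accepts(sizes, 8, 3))  # True: 3 does not exceed X8 = 4
print(lahiri_sample(sizes, 3, random.Random(0)))
```

Rejected trials are simply discarded and a new pair is drawn, so no cumulative totals are needed.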
3. ESTIMATION PROCEDURES
Consider a population of N units and let $y_i$ be the value of the characteristic under study
for the unit $U_i$ of the population ($i = 1, 2, 3, \ldots, N$). Suppose further that $p_i = X_i/X$ is the
probability of selecting the unit $U_i$ at each draw, so that $\sum_{i=1}^{N} p_i = 1$.
Let n independent selections be made with replacement, and let the value of $y_i$ be observed for
each selected unit.
Further, let $(y_i, p_i)$ be the value and the probability of selection of the i-th unit of the sample.
The random variates $y_i/p_i$ ($i = 1, 2, 3, \ldots, n$) are independently and identically distributed.
Remark: If $p_i = 1/N$, the scheme gives rise to a simple random sample. This shows that simple random
sampling is a particular case of pps sampling.
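In pps sampling wr, an unbiased estimator of the population total $Y_N$ is

$$\hat{y}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i} \tag{1}$$

with its sampling variance

$$V(\hat{y}_{pps}) = \frac{1}{n}\sum_{i=1}^{N} p_i\left(\frac{y_i}{p_i} - Y_N\right)^2 \tag{2}$$

where $p_i = X_i/X$ and $X = \sum_{i=1}^{N} X_i$.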
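Proof:
Let us define the random variates $z_i = y_i/p_i$ ($i = 1, 2, 3, \ldots, n$), which are independently and
identically distributed. Hence,

$$\hat{y}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i} = \frac{1}{n}\sum_{i=1}^{n} z_i = \bar{z}_n$$

Now, for sampling with replacement, at each draw

$$E(z_i) = \sum_{i=1}^{N} p_i z_i = \sum_{i=1}^{N} p_i\,\frac{y_i}{p_i} = Y_N = \bar{z} \tag{3}$$

so that

$$E(\bar{z}_n) = \frac{1}{n}\sum_{i=1}^{n} E(z_i) = Y_N = \bar{z} \tag{4}$$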
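This shows that in pps sampling wr, $\hat{y}_{pps}$ is an unbiased estimator of the
population total $Y_N$. Its variance is

$$
\begin{aligned}
V(\hat{y}_{pps}) = V(\bar{z}_n) &= E\left[\bar{z}_n - E(\bar{z}_n)\right]^2 \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n} z_i - \bar{z}\right]^2, \quad \text{since } E(\bar{z}_n) = \bar{z} \\
&= \frac{1}{n^2}\,E\left[\sum_{i=1}^{n} z_i^2 + \sum_{i=1}^{n}\sum_{j(\neq i)=1}^{n} z_i z_j\right] - \bar{z}^2 \\
&= \frac{1}{n^2}\left[\sum_{i=1}^{n} E(z_i^2) + \sum_{i=1}^{n}\sum_{j(\neq i)=1}^{n} E(z_i z_j)\right] - \bar{z}^2
\end{aligned}
\tag{5}
$$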
Now

$$E(z_i^2) = \sum_{i=1}^{N} p_i z_i^2 \tag{6}$$

and

$$E(z_i z_j) = E(z_i)\,E(z_j)$$

since draws are made independently. On substituting for $E(z_i)$ and $E(z_j)$ from (3) we have

$$E(z_i z_j) = \bar{z}^2 \tag{7}$$
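Substituting (6) and (7) in (5),

$$
\begin{aligned}
V(\bar{z}_n) &= \frac{1}{n^2}\left[n\sum_{i=1}^{N} p_i z_i^2 + n(n-1)\,\bar{z}^2\right] - \bar{z}^2 \\
&= \frac{1}{n}\left[\sum_{i=1}^{N} p_i z_i^2 - \bar{z}^2\right] = \frac{\sigma_z^2}{n}
\end{aligned}
\tag{8}
$$

where

$$\sigma_z^2 = \sum_{i=1}^{N} p_i z_i^2 - \bar{z}^2 = \sum_{i=1}^{N} p_i (z_i - \bar{z})^2 \tag{9}$$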
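From (4) and (8) we see that, in terms of y, an unbiased estimator of $Y_N$ and its variance are
given by

$$\bar{z}_n = \hat{y}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i} \tag{10}$$

$$V(\bar{z}_n) = V(\hat{y}_{pps}) = \frac{1}{n}\sum_{i=1}^{N} p_i\left(\frac{y_i}{p_i} - Y_N\right)^2 \tag{11}$$

Q.E.D.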
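The result just proved can be checked numerically on a very small population by listing every ordered with-replacement sample together with its probability. The sketch below is only an illustration: the three-unit population (sizes 5, 3, 2 and values 12, 7, 6) is invented for the check and does not come from the notes, and the function names are hypothetical.

```python
from itertools import product

def pps_total_estimate(sample, y, p):
    """Estimator (1): average of y_i / p_i over the drawn units (0-based indices)."""
    return sum(y[i] / p[i] for i in sample) / len(sample)

def exact_mean_and_variance(y, p, n):
    """Exact mean and variance over all ordered with-replacement samples of size n."""
    mean = 0.0
    second_moment = 0.0
    for sample in product(range(len(y)), repeat=n):
        prob = 1.0
        for i in sample:
            prob *= p[i]  # draws are independent, so probabilities multiply
        estimate = pps_total_estimate(sample, y, p)
        mean += prob * estimate
        second_moment += prob * estimate ** 2
    return mean, second_moment - mean ** 2

# Hypothetical population, invented for this check.
x = [5, 3, 2]
y = [12.0, 7.0, 6.0]
p = [x_i / sum(x) for x_i in x]
n = 2

mean, variance = exact_mean_and_variance(y, p, n)
total = sum(y)
# Variance formula (11) evaluated directly on the population.
formula_variance = sum(p_i * (y_i / p_i - total) ** 2 for y_i, p_i in zip(y, p)) / n

print(mean, total)                 # the two values should agree
print(variance, formula_variance)  # the two values should agree
```

Full enumeration is only practical for tiny N and n, since the number of ordered samples is N to the power n.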
Remark: Finite multiplier (1-f) does not enter into the expression for the variance of the
estimate when sampling is carried out with replacement.
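As a quick check of this remark and of the earlier remark on $p_i = 1/N$, the estimator of the total and its variance can be specialised to equal selection probabilities. The symbols $\bar{y}_n$ (sample mean) and $\bar{Y}_N = Y_N/N$ (population mean) are introduced here only for this sketch.

```latex
\hat{y}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{1/N} = N\,\bar{y}_n ,
\qquad
V(\hat{y}_{pps}) = \frac{1}{n}\sum_{i=1}^{N}\frac{1}{N}\left(N y_i - Y_N\right)^2
 = \frac{N^2}{n}\cdot\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \bar{Y}_N\right)^2 .
```

This is the familiar variance of the expansion estimator under simple random sampling with replacement, and no factor $(1 - f)$ appears in it.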
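An estimator of this variance is obtained from the sample variance of the $z_i$:

$$s_z^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(z_i - \bar{z}_n\right)^2 \tag{12}$$

$$v(\hat{y}_{pps}) = \frac{s_z^2}{n} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i} - \hat{y}_{pps}\right)^2 \tag{13}$$

$$= \frac{1}{n(n-1)}\left[\sum_{i=1}^{n}\left(\frac{y_i}{p_i}\right)^2 - n\,\hat{y}_{pps}^2\right] \tag{14}$$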
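Theorem
Show that

$$E(s_z^2) = \sigma_z^2 \tag{15}$$

i.e. $s_z^2$ is an unbiased estimator of $\sigma_z^2$.
Proof
Expanding and taking expectations, we get

$$
\begin{aligned}
E(s_z^2) &= \frac{1}{n-1}\,E\left[\sum_{i=1}^{n} z_i^2 - n\,\bar{z}_n^2\right] \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n} E(z_i^2) - n\,E(\bar{z}_n^2)\right]
\end{aligned}
$$

Now by definition,

$$V(\bar{z}_n) = E(\bar{z}_n^2) - \bar{z}^2 \tag{16}$$

So that

$$E(\bar{z}_n^2) = V(\bar{z}_n) + \bar{z}^2 = \frac{\sigma_z^2}{n} + \bar{z}^2 \tag{17}$$

Also

$$E(z_i^2) = \sum_{i=1}^{N} p_i z_i^2 \tag{18}$$

Hence

$$
\begin{aligned}
E(s_z^2) &= \frac{1}{n-1}\left[n\sum_{i=1}^{N} p_i z_i^2 - n\,\bar{z}^2 - \sigma_z^2\right] \\
&= \frac{1}{n-1}\left[n\left(\sum_{i=1}^{N} p_i z_i^2 - \bar{z}^2\right) - \sigma_z^2\right] \\
&= \frac{1}{n-1}\left\{n\,\sigma_z^2 - \sigma_z^2\right\} \\
&= \frac{(n-1)}{n-1}\,\sigma_z^2 = \sigma_z^2
\end{aligned}
$$

Q.E.D.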
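The unbiasedness of the sample variance of the z values can also be confirmed by exact enumeration on a tiny population. As before, this is a sketch under stated assumptions: the three-unit population is invented for the check, and the helper names are hypothetical.

```python
from itertools import product

def sample_variance(values):
    """Sample variance with divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def expected_sample_variance(z, p, n):
    """Exact expectation of the sample variance over all ordered WR samples of size n."""
    expectation = 0.0
    for sample in product(range(len(z)), repeat=n):
        prob = 1.0
        for i in sample:
            prob *= p[i]  # independent draws
        expectation += prob * sample_variance([z[i] for i in sample])
    return expectation

# Hypothetical population, invented for this check.
x = [5, 3, 2]
y = [12.0, 7.0, 6.0]
p = [x_i / sum(x) for x_i in x]
z = [y_i / p_i for y_i, p_i in zip(y, p)]

z_bar = sum(p_i * z_i for p_i, z_i in zip(p, z))                   # equals the population total
sigma2 = sum(p_i * (z_i - z_bar) ** 2 for p_i, z_i in zip(p, z))   # probability-weighted variance
expected = expected_sample_variance(z, p, 3)

print(expected, sigma2)  # the two values should agree
```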
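Remark: The estimator of the population mean can simply be obtained by dividing the
estimator of the population total ($\hat{y}_{pps}$) by N, the size of the population. The corresponding
variance and estimate of variance can be obtained by dividing $V(\hat{y}_{pps})$ and $v(\hat{y}_{pps})$ by $N^2$,
i.e.

$$\hat{\bar{y}}_{pps} = \frac{\hat{y}_{pps}}{N}, \qquad v(\hat{\bar{y}}_{pps}) = \frac{v(\hat{y}_{pps})}{N^2}$$

The percentage gain in efficiency due to pps sampling compared to SRS is

$$\text{Percent Gain} = \left[\frac{v_{pps}(\bar{y}_{srs})}{v(\hat{\bar{y}}_{pps})} - 1\right] \times 100 \tag{19}$$

Where,

$$v(\hat{\bar{y}}_{pps}) = \frac{v(\hat{y}_{pps})}{N^2}, \qquad v(\hat{y}_{pps}) = v(\bar{z}_n) = \frac{s_z^2}{n},$$

and

$$s_z^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} z_i^2 - n\,\bar{z}_n^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n}\left(\frac{y_i}{p_i}\right)^2 - n\,\hat{y}_{pps}^2\right] \tag{20}$$

An unbiased estimator, obtained from the pps sample, of the variance of the estimate based on an
SRS wor sample is

$$v_{pps}(\bar{y}_{srs}) = \frac{(N-n)}{Nn(N-1)}\left[\frac{1}{n}\sum_{i=1}^{n}\frac{y_i^2}{p_i} - \frac{1}{N}\left\{\hat{y}_{pps}^2 - v(\hat{y}_{pps})\right\}\right] \tag{21}$$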
Area under crop (Xi)   Yield (yi)   pi = Xi/X   zi = yi/pi   zi^2           yi^2/pi
5.2                    28           0.0116      2423.08      5871301.78     67846.15
5.9                    29           0.0131      2211.86      4892344.15     64144.07
3.9                    30           0.0087      3461.54      11982248.52    103846.15
4.2                    22           0.0093      2357.14      5556122.45     51857.14
4.7                    22           0.0104      2106.38      4436849.25     46340.43
4.8                    25           0.0107      2343.75      5493164.06     58593.75
4.9                    28           0.0109      2571.43      6612244.90     72000.00
6.8                    37           0.0151      2448.53      5995296.28     90595.59
4.7                    26           0.0104      2489.36      6196921.68     64723.40
5.7                    32           0.0127      2526.32      6382271.47     80842.11
Total                                           24939.39     63418764.54    700788.79
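The derived columns of the table above can be recomputed from the areas and yields alone. The sketch below assumes X = 450 hectares, the total area quoted in the exercise statement for these data; the variable names are chosen for the example.

```python
areas = [5.2, 5.9, 3.9, 4.2, 4.7, 4.8, 4.9, 6.8, 4.7, 5.7]   # Xi, hectares
yields = [28, 29, 30, 22, 22, 25, 28, 37, 26, 32]             # yi
X = 450.0                                                     # total area, from the exercise

rows = []
for x_i, y_i in zip(areas, yields):
    p_i = x_i / X            # selection probability
    z_i = y_i / p_i          # expanded value yi / pi
    rows.append((p_i, z_i, z_i ** 2, y_i ** 2 / p_i))

sum_z = sum(r[1] for r in rows)
sum_z2 = sum(r[2] for r in rows)
sum_y2_over_p = sum(r[3] for r in rows)

for p_i, z_i, z2_i, y2p_i in rows:
    print(f"{p_i:.4f} {z_i:10.2f} {z2_i:14.2f} {y2p_i:12.2f}")
print(f"totals {sum_z:.2f} {sum_z2:.2f} {sum_y2_over_p:.2f}")
```

The probabilities are kept unrounded in the calculation; rounding them to four decimals first would change the later columns slightly.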
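The estimate of the population total is $\hat{y}_{pps} = \sum z_i / n = 24939.39/10 = 2493.94$, so the
estimate of the average yield per farm is

$$\hat{\bar{y}}_{pps} = \hat{y}_{pps}/N = 2493.94/100 = 24.94$$

Its standard error is $SE(\hat{\bar{y}}_{pps}) = \sqrt{v(\hat{\bar{y}}_{pps})}$, where $v(\hat{\bar{y}}_{pps}) = v(\hat{y}_{pps})/N^2$ and $v(\hat{y}_{pps})$ is

$$v(\hat{y}_{pps}) = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n} z_i^2 - n\,\bar{z}_n^2\right] = 13571.57$$

Hence $v(\hat{\bar{y}}_{pps}) = 13571.57/100^2 = 1.36$ and

$$SE(\hat{\bar{y}}_{pps}) = \sqrt{1.36} = 1.16$$

$$\text{Percent Gain} = \left[\frac{v_{pps}(\bar{y}_{srs})}{v(\hat{\bar{y}}_{pps})} - 1\right] \times 100$$

Where,

$$
\begin{aligned}
v_{pps}(\bar{y}_{srs}) &= \frac{(N-n)}{Nn(N-1)}\left[\frac{1}{n}\sum_{i=1}^{n}\frac{y_i^2}{p_i} - \frac{1}{N}\left\{\hat{y}_{pps}^2 - v(\hat{y}_{pps})\right\}\right] \\
&= \frac{(100-10)}{100 \times 10 \times 99}\left[\frac{1}{10} \times 700788.79 - \frac{1}{100}\left\{(2493.94)^2 - 13571.57\right\}\right] \\
&= \frac{1}{1100}\left[70078.88 - 62061.61\right] = \frac{8017.27}{1100} \approx 7.29
\end{aligned}
$$

so that Percent Gain $= \left[7.29/1.36 - 1\right] \times 100 \approx 436$.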
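The arithmetic of the worked example above can be reproduced from the three sample totals. The sketch below is an illustration with a hypothetical function name, `pps_summary`; it keeps unrounded intermediate values, so the last digits may differ slightly from a hand calculation that rounds at each step.

```python
from math import sqrt

def pps_summary(N, n, sum_z, sum_z2, sum_y2_over_p):
    """Estimates for pps sampling wr from the totals of zi, zi^2 and yi^2/pi."""
    total_hat = sum_z / n                                         # estimate of the total
    var_total = (sum_z2 - n * total_hat ** 2) / (n * (n - 1))     # its variance estimate
    mean_hat = total_hat / N                                      # estimate of the mean
    var_mean = var_total / N ** 2
    # Estimated variance of the SRS wor mean, computed from the pps sample.
    var_srs = (N - n) / (N * n * (N - 1)) * (
        sum_y2_over_p / n - (total_hat ** 2 - var_total) / N
    )
    return {
        "total_hat": total_hat,
        "var_total": var_total,
        "mean_hat": mean_hat,
        "se_mean": sqrt(var_mean),
        "var_srs": var_srs,
        "percent_gain": (var_srs / var_mean - 1) * 100,
    }

result = pps_summary(N=100, n=10, sum_z=24939.39,
                     sum_z2=63418764.54, sum_y2_over_p=700788.79)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```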
6. FORMULAE TO REMEMBER
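PPS WR

Sample indicator: Estimator

(a) Sample total:

$$\hat{y}_{pps} = \bar{z}_n = \sum_{i=1}^{n} z_i / n, \qquad v(\hat{y}_{pps}) = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n} z_i^2 - n\,\bar{z}_n^2\right]$$

Where $z_i = y_i/p_i$.

(b) Estimate of standard error of the total:

$$SE(\hat{y}_{pps}) = \sqrt{v(\hat{y}_{pps})}$$

(c) Sample mean:

$$\hat{\bar{y}}_{pps} = \hat{y}_{pps}/N, \qquad v(\hat{\bar{y}}_{pps}) = v(\hat{y}_{pps})/N^2$$

Estimate of standard error of the mean:

$$SE(\hat{\bar{y}}_{pps}) = \sqrt{v(\hat{\bar{y}}_{pps})}$$

Percent gain:

$$\text{Percent Gain} = \left[\frac{v_{pps}(\bar{y}_{srs})}{v(\hat{\bar{y}}_{pps})} - 1\right] \times 100$$

Where,

$$v_{pps}(\bar{y}_{srs}) = \frac{(N-n)}{Nn(N-1)}\left[\frac{1}{n}\sum_{i=1}^{n}\frac{y_i^2}{p_i} - \frac{1}{N}\left\{\hat{y}_{pps}^2 - v(\hat{y}_{pps})\right\}\right]$$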
7. EXERCISE
Q1.
Q2.
The following table shows the households and their respective sizes.
(a) Select a sample of 4 households using PPS with replacement by (i) the cumulative
total method, and (ii) the Lahiri method.

Household No. (sampling unit)   Household size (Xi)
1                               20
2                               16
3                               14
4                               12
5                               12
6                               12
7                               8
8                               8
Total                           X = 100
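Q 3.
In pps sampling wr show that (i)

$$\hat{y}_{pps} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}$$

is an unbiased estimator of the population total $Y_N$,
and (ii) its sampling variance is given by

$$V(\hat{y}_{pps}) = \frac{1}{n}\sum_{i=1}^{N} p_i\left(\frac{y_i}{p_i} - Y_N\right)^2$$

Where $p_i = X_i/X$ and $\sum_{i=1}^{N} X_i = X$.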
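Q 4.
Show that $E(s_z^2) = \sigma_z^2$,
i.e. $s_z^2$ is an unbiased estimator of $\sigma_z^2$,
where

$$s_z^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(z_i - \bar{z}_n\right)^2$$

and

$$\sigma_z^2 = \sum_{i=1}^{N} p_i\left(z_i - \bar{z}\right)^2$$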
Q 5. A sample survey was conducted to study the yield of Maize in a district. A sample
of 10 farms from a total of 100 was taken with probability proportional to area under
Maize crop with replacement method. The area under crop (Xi) and yield (y) were noted
in hectares and quintals per hectare respectively. The total area (X) under Maize crop in
the region was 450 hectares.
(a) Estimate the average yield per farm, and its standard error.
(b) Estimate the percentage gain in efficiency due to pps sampling compared to
SRS wor.
It is given that:
N = 100; X = 450; n = 10
$\sum z_i = \sum (y_i/p_i) = 24939.39$
$\sum z_i^2 = 63418764.54$
$\sum (y_i^2/p_i) = 700788.79$
8. REFERENCES
Singh, Daroga and Chaudhary, F. S. (1986): Theory and Analysis of Sample Survey
Designs, Wiley Eastern Limited, New Delhi, India.
Sukhatme, P. V. and Sukhatme, B. V. (1977): Sampling Theory of Surveys with Applications,
Asia Publishing House, New Delhi (Call #. 519.52SUK).
Munoz Junan (2002): A guide for data management of household surveys, in Households
Surveys in Developing and Transition Countries: Design, Implementation and
Analysis (Chapter 15), United Nations Statistics Division (UNSD) Publication,
New York. This publication can be downloaded free of cost from the internet:
http://unstats.un.org/unsd/HHsurveys