CLASS NOTE
SANGKAR ROY
JAHANGIRNAGAR UNIVERSITY,BANGLADESH.
EMAIL:sankar1604@gmail.com
Estimation ~ 1 of 22
Estimation
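Sufficient Statistic: Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(\cdot\,;\theta)$, where $\theta$ may be a vector. A statistic $T = t(X_1, X_2, \dots, X_n)$ is defined to be a sufficient statistic if and only if the conditional distribution of $X_1, X_2, \dots, X_n$ given $T = t$ does not depend on $\theta$ for any value $t$ of $T$.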
Note: This definition of a sufficient statistic is not very workable. First, it does not tell us which statistic is likely to be sufficient. Second, it requires us to derive a conditional distribution, which may not be easy, especially for continuous random variables. For this reason we may use the Factorization Criterion, which can help us find sufficient statistics.
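Another Definition: Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(\cdot\,;\theta)$. A statistic $S = s(X_1, X_2, \dots, X_n)$ is defined to be a sufficient statistic if and only if the conditional distribution of $T$ given $S$ does not depend on $\theta$ for any statistic $T = t(X_1, X_2, \dots, X_n)$.
Note: This definition is particularly useful for showing that a particular statistic is not sufficient. For instance, to prove that a statistic $T' = t'(X_1, X_2, \dots, X_n)$ is not sufficient, one needs only to find another statistic $T = t(X_1, X_2, \dots, X_n)$ for which the conditional distribution of $T$ given $T'$ depends on $\theta$.
Jointly Sufficient Statistics: Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(\cdot\,;\theta)$. The statistics $T_1, \dots, T_r$ are defined to be jointly sufficient if and only if the conditional distribution of $X_1, X_2, \dots, X_n$ given $T_1 = t_1, \dots, T_r = t_r$ does not depend on $\theta$.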
Concept of Sufficient Statistic: A sufficient statistic is a particular kind of statistic. It condenses the sample space in such a way that no "information about $\theta$" is lost. The only information about the parameter $\theta$ in the density $f(\cdot\,;\theta)$ from which we sampled is contained in the sample $X_1, X_2, \dots, X_n$. So when we say that a statistic loses no information, we mean that it contains all the information about $\theta$ that is contained in the sample. The type of information we mean is the information about $\theta$ contained in the sample given that we know the form of the density. That is, we know the function $f(\cdot\,;\cdot)$ in $f(\cdot\,;\theta)$, and the parameter $\theta$ is the only unknown. We are not speaking of information in the sample that might be useful in checking the validity of our assumption that the density does indeed have the form $f(\cdot\,;\cdot)$.
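Example: Let $X_1, X_2, X_3$ be a sample of size 3 from the Bernoulli distribution with parameter $p$. Consider the two statistics $S = s(X_1, X_2, X_3) = X_1 + X_2 + X_3$ and $T = t(X_1, X_2, X_3) = X_1 X_2 + X_3$. We have to show that $s(\cdot,\cdot,\cdot)$ is sufficient and $t(\cdot,\cdot,\cdot)$ is not. The conditional distributions of the sample are: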
(x1, x2, x3)   Value of S   Value of T   f(x1, x2, x3 | S)   f(x1, x2, x3 | T)
(0, 0, 0)      0            0            1                   (1 - p)/(1 + p)
(0, 0, 1)      1            1            1/3                 (1 - p)/(1 + 2p)
(0, 1, 0)      1            0            1/3                 p/(1 + p)
(1, 0, 0)      1            0            1/3                 p/(1 + p)
(0, 1, 1)      2            1            1/3                 p/(1 + 2p)
(1, 0, 1)      2            1            1/3                 p/(1 + 2p)
(1, 1, 0)      2            1            1/3                 p/(1 + 2p)
(1, 1, 1)      3            2            1                   1
For example,
$$f_{X_1,X_2,X_3 \mid T=0}(0,1,0 \mid 0) = \frac{P[X_1=0,\ X_2=1,\ X_3=0,\ T=0]}{P[T=0]} = \frac{(1-p)^2 p}{(1-p)^3 + 2p(1-p)^2} = \frac{p}{1-p+2p} = \frac{p}{1+p}.$$
The conditional distribution of the sample given the values of S is independent of p ; so S is a sufficient
statistic. However, the conditional distribution of the sample given the values of T depends on p ; so T is
not sufficient.
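The table can be double-checked by brute force. The short Python sketch below enumerates all eight outcomes with exact fractions and compares the conditional distributions at two values of $p$. The helper name `conditional_table` and the two trial values $p = 1/4$ and $p = 1/2$ are illustrative choices, not part of the notes.

```python
from fractions import Fraction
from itertools import product

def S(x):
    return x[0] + x[1] + x[2]          # S = X1 + X2 + X3

def T(x):
    return x[0] * x[1] + x[2]          # T = X1 X2 + X3

def conditional_table(stat, p):
    """Map each outcome x to P(X = x | stat(X) = stat(x)) for three Bernoulli(p) trials."""
    p = Fraction(p)
    joint = {x: p ** sum(x) * (1 - p) ** (3 - sum(x)) for x in product((0, 1), repeat=3)}
    marginal = {}                      # P(stat(X) = s) for each value s
    for x, prob in joint.items():
        marginal[stat(x)] = marginal.get(stat(x), 0) + prob
    return {x: prob / marginal[stat(x)] for x, prob in joint.items()}

for name, stat in (("S", S), ("T", T)):
    same = conditional_table(stat, Fraction(1, 4)) == conditional_table(stat, Fraction(1, 2))
    print(name, "conditional distribution free of p:", same)
```

Running it prints `True` for $S$ and `False` for $T$, matching the conclusion above.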
Factorization Theorem (Single Sufficient Statistic): Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ from the density $f(\cdot\,;\theta)$, where $\theta$ may be a vector. A statistic $T = t(X_1, \dots, X_n)$ is sufficient if and only if the joint density of $X_1, X_2, \dots, X_n$, which is $\prod_{i=1}^{n} f(x_i;\theta)$, factors as
$$f_{X_1,\dots,X_n}(x_1,\dots,x_n;\theta) = g\{t(x_1,x_2,\dots,x_n);\theta\}\cdot h(x_1,x_2,\dots,x_n) = g(t;\theta)\cdot h(x_1,x_2,\dots,x_n),$$
where the function $h(x_1,x_2,\dots,x_n)$ is nonnegative and does not involve the parameter $\theta$, and the function $g\{t(x_1,x_2,\dots,x_n);\theta\}$ is nonnegative and depends on $x_1,\dots,x_n$ only through the function $t(\cdot,\dots,\cdot)$.
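Factorization Theorem (Jointly Sufficient Statistics): Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ from the density $f(\cdot\,;\theta)$. The statistics $T_1 = t_1(X_1,\dots,X_n), \dots, T_r = t_r(X_1,\dots,X_n)$ are jointly sufficient if and only if the joint density of $X_1, X_2, \dots, X_n$, which is $\prod_{i=1}^{n} f(x_i;\theta)$, can be factored as
$$f_{X_1,\dots,X_n}(x_1,\dots,x_n;\theta) = g\{t_1(x_1,\dots,x_n),\dots,t_r(x_1,\dots,x_n);\theta\}\cdot h(x_1,\dots,x_n) = g(t_1,\dots,t_r;\theta)\cdot h(x_1,\dots,x_n),$$
where the function $h(x_1,\dots,x_n)$ is nonnegative and does not involve the parameter $\theta$, and the function $g$ is nonnegative and depends on $x_1,\dots,x_n$ only through $t_1(\cdot),\dots,t_r(\cdot)$.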
N.B.: For more on this topic, see Mood, Graybill and Boes, Introduction to the Theory of Statistics, pp. 300-311.
Efficient Estimator: Let $x_1, x_2, \dots, x_n$ be a sample drawn from a population with density $f(x;\theta)$, and let $t$ be an unbiased consistent estimator of $\theta$. If the variance of $t$ is less than the variance of every other such estimator, then $t$ is said to be the most efficient estimator of $\theta$, or simply the efficient estimator of $\theta$. The efficiency of a given estimator can be written as
$$e = \frac{\operatorname{Var}(\text{most efficient estimator})}{\operatorname{Var}(\text{given estimator})}.$$
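The efficiency ratio can also be estimated by simulation. The sketch below is an illustration not taken from the notes. It assumes a normal population and compares the sample median with the sample mean, which is the most efficient estimator of the normal mean. For large $n$ this ratio is known to approach $2/\pi \approx 0.64$. The sample size, number of replications and seed are arbitrary choices.

```python
import random
import statistics

def simulated_efficiency(n=101, reps=2000, seed=1):
    """Monte Carlo estimate of Var(sample mean) / Var(sample median) for N(0, 1) samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # efficiency of the median = Var(most efficient) / Var(given estimator)
    return statistics.variance(means) / statistics.variance(medians)

print(simulated_efficiency())   # roughly 2/pi, i.e. about 0.64
```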
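Regular Distribution: Let $X \sim f(x;\theta)$, $\theta \in \Omega$. The p.d.f. is said to be regular with respect to its first $\theta$ derivative if differentiation under the integral sign is permitted, so that
$$\int_{-\infty}^{\infty} f(x;\theta)\,dx = 1$$
$$\Rightarrow \int_{-\infty}^{\infty} \frac{\partial f(x;\theta)}{\partial\theta}\,dx = 0$$
$$\Rightarrow \int_{-\infty}^{\infty} \frac{\partial f(x;\theta)/\partial\theta}{f(x;\theta)}\, f(x;\theta)\,dx = 0$$
$$\Rightarrow \int_{-\infty}^{\infty} \frac{\partial \ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx = 0,$$
that is, $E\left[\partial \ln f(X;\theta)/\partial\theta\right] = 0$. Such a distribution is called a regular distribution.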
Regular Estimator and Regularity Conditions: Let $X_1, \dots, X_n$ be random variables having the joint density $f_\theta(x)$, $\theta \in \Theta$. If the following regularity conditions hold, then the statistic $t(X_1, \dots, X_n)$ is called a regular estimator of $\theta \in \Theta$.
i) $\theta$ lies in a non-degenerate open interval $\Theta$ of the real line; $\Theta$ may be infinite;
ii) $\partial f_\theta(x)/\partial\theta$ exists for all $\theta \in \Theta$;
iii) $\int f_\theta(x)\,dx$ can be differentiated with respect to $\theta$ under the integral sign;
iv) $\int t(x) f_\theta(x)\,dx$ can be differentiated with respect to $\theta$ under the integral sign;
v) $E_\theta\left[\left(\partial \ln f_\theta(X)/\partial\theta\right)^2\right]$ exists and is positive for all $\theta \in \Theta$.
Best Regular Unbiased Estimator (BRUE): In any regular estimation case, the efficiency of an unbiased regular estimator $t_n(X_1, \dots, X_n)$ is
$$e_\theta(t_n) = \frac{\left\{ n\,E_\theta\left[\left(\dfrac{\partial \ln f(X\mid\theta)}{\partial\theta}\right)^2\right]\right\}^{-1}}{\operatorname{Var}_\theta(t_n)}.$$
Note: In any regular estimation case, $0 \le e_\theta(t_n) \le 1$, and $e_\theta(t_n) \equiv 1$ iff $\operatorname{Var}_\theta(t_n)$ attains the lower bound for all $\theta$. Such an estimator is the BRUE. In any regular estimation case, the asymptotic efficiency of an unbiased regular estimator $t_n(X_1, \dots, X_n)$ is $\lim_{n\to\infty} e_\theta(t_n)$.
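Here is a concrete regular case. The Python sketch below assumes a Bernoulli($p$) population, which the notes do not use at this point. Using exact fractions, it computes four quantities: the mean of the score (regularity forces it to be 0), the information $E[(\partial \ln f/\partial p)^2]$, the Rao-Cramer bound $1/(nI)$, and the efficiency of $\bar{X}$. The efficiency comes out as 1. The function names are illustrative.

```python
from fractions import Fraction

def score(x, p):
    """d/dp ln f(x; p) for the Bernoulli p.m.f. f(x; p) = p^x (1 - p)^(1 - x)."""
    return Fraction(x) / p - Fraction(1 - x) / (1 - p)

def bernoulli_summary(p, n):
    """Exact mean score, information, Rao-Cramer bound and efficiency of X-bar."""
    p = Fraction(p)
    pmf = {0: 1 - p, 1: p}
    mean_score = sum(score(x, p) * q for x, q in pmf.items())   # regularity: equals 0
    info = sum(score(x, p) ** 2 * q for x, q in pmf.items())    # E[(d ln f / dp)^2]
    lower_bound = 1 / (n * info)                                # bound for unbiased estimators of p
    var_mean = p * (1 - p) / n                                  # Var(X-bar)
    return mean_score, info, lower_bound, lower_bound / var_mean

print(bernoulli_summary(Fraction(3, 10), 20))
```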
N.B.: From the chapter Consistency and Efficient Estimator of the Third Year Note, read the examples of efficient and sufficient estimators, Fisher's information, the Rao-Cramer inequality and related topics.
Generalized Rao-Cramer Inequality: See the chapter Asymptotically Most Efficient Estimator of the Third Year Note. (Ref. Kendall and Stuart, The Advanced Theory of Statistics, p. 12)
Bhattacharyya Inequality: See the chapter Asymptotically Most Efficient Estimator of the Third Year Note. (Ref. Kendall and Stuart, The Advanced Theory of Statistics, pp. 12-15)
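Chapman, Robbins and Kiefer Inequality: This inequality gives a lower bound for the variance of an estimate. Unlike the Rao-Cramer inequality, it does not require regularity conditions.
Statement: Suppose that $X = (X_1, X_2, \dots, X_n)$ has joint density or frequency function $f_\theta(x)$, $\theta \in \Omega$. Let $T(X)$ be an unbiased estimate of $\tau(\theta)$ with $E_\theta T^2 < \infty$ for all $\theta \in \Omega$. If $\theta \ne \psi$, assume that $f_\theta$ and $f_\psi$ are different. Assume further that there exists a $\psi \in \Omega$ such that $S(\psi) \subseteq S(\theta)$, where $S(\theta) = \{x : f_\theta(x) > 0\}$. Then
$$\operatorname{Var}_\theta\{T(X)\} \ge \sup_{\{\psi:\ S(\psi)\subseteq S(\theta),\ \psi\ne\theta\}} \frac{[\tau(\psi)-\tau(\theta)]^2}{\operatorname{Var}_\theta\left\{ f_\psi(X)/f_\theta(X)\right\}}.$$
Proof: Since $T$ is unbiased for $\tau(\psi)$ as well as for $\tau(\theta)$,
$$\int_{S(\theta)} T(x)\left[\frac{f_\psi(x)-f_\theta(x)}{f_\theta(x)}\right] f_\theta(x)\,dx = \tau(\psi)-\tau(\theta),$$
which gives
$$\operatorname{Cov}_\theta\left\{T(X),\ \frac{f_\psi(X)}{f_\theta(X)} - 1\right\} = \tau(\psi)-\tau(\theta),$$
that is,
$$E\left[\{T(X)-\tau(\theta)\}\left\{\frac{f_\psi(X)}{f_\theta(X)} - 1\right\}\right] = \tau(\psi)-\tau(\theta), \qquad \text{since } E\left\{\frac{f_\psi(X)}{f_\theta(X)} - 1\right\} = 0.$$
Since $\rho^2 \le 1$,
$$\operatorname{Cov}_\theta^2\left\{T(X),\ \frac{f_\psi(X)}{f_\theta(X)} - 1\right\} \le V\{T(X)\}\cdot V\left\{\frac{f_\psi(X)}{f_\theta(X)} - 1\right\} = V\{T(X)\}\cdot V\left\{\frac{f_\psi(X)}{f_\theta(X)}\right\}$$
$$\Rightarrow [\tau(\psi)-\tau(\theta)]^2 \le V\{T(X)\}\cdot V\left\{\frac{f_\psi(X)}{f_\theta(X)}\right\}$$
$$\Rightarrow V\{T(X)\} \ge \frac{[\tau(\psi)-\tau(\theta)]^2}{V\left\{f_\psi(X)/f_\theta(X)\right\}}.$$
Taking the supremum over admissible $\psi$ gives the result. (Proved)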
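Example: Let $X$ be $U[0,\theta]$. Then
$$f_\theta(x) = \begin{cases} 1/\theta, & 0 \le x \le \theta, \\ 0, & \text{otherwise}. \end{cases}$$
Thus we get $E_\theta\left[\left(\partial \ln f_\theta(X)/\partial\theta\right)^2\right] = 1/\theta^2$, so the Rao-Cramer lower bound (for a sample of size $n$) would be $\theta^2/n$. However, the support of $f_\theta$ depends on $\theta$, so the regularity conditions fail. For $\psi < \theta$ we have $S(\psi) \subseteq S(\theta)$. The ratio $f_\psi(X)/f_\theta(X)$ equals $\theta/\psi$ on $[0,\psi]$ and 0 otherwise, so $V_\theta\{f_\psi(X)/f_\theta(X)\} = \theta/\psi - 1$. Hence, for any unbiased estimate $T(X)$ of $\theta$,
$$\operatorname{Var}_\theta\{T(X)\} \ge \sup_{\{\psi:\ \psi<\theta\}} \frac{(\psi-\theta)^2}{\theta/\psi - 1} = \sup_{\{\psi:\ \psi<\theta\}} \{\psi(\theta-\psi)\}.$$
Now $\psi(\theta-\psi)$ increases as long as $\psi < \theta/2$ and decreases if $\psi > \theta/2$, so it attains its maximum value at $\psi = \theta/2$. Therefore
$$\operatorname{Var}_\theta\{T(X)\} \ge \frac{\theta}{2}\left(\theta - \frac{\theta}{2}\right) = \frac{\theta^2}{4}.$$
This is the lower bound for any unbiased estimate $T(X)$ of $\theta$.
Now, $X$ is a complete sufficient statistic and $2X$ is unbiased for $\theta$, so $T(X) = 2X$ is the UMVUE. Also
$$\operatorname{Var}_\theta\{2X\} = 4\operatorname{Var}(X) = \frac{\theta^2}{3} > \frac{\theta^2}{4}.$$
Thus the lower bound $\theta^2/4$ of the Chapman, Robbins and Kiefer (CRK) inequality is not achieved by any unbiased estimate of $\theta$.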
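The supremum in this example can be checked numerically. The Python sketch below uses the variance $\theta/\psi - 1$ of the likelihood ratio derived above and searches a simple grid over $\psi \in (0, \theta)$. The grid size, the value $\theta = 3$ and the function names are arbitrary choices.

```python
def crk_ratio(theta, psi):
    """(psi - theta)^2 / Var_theta{f_psi(X)/f_theta(X)} for X ~ U[0, theta] and 0 < psi < theta."""
    var_ratio = theta / psi - 1.0      # variance of the likelihood ratio, derived in the text
    return (psi - theta) ** 2 / var_ratio

def crk_bound(theta, grid=10_000):
    """Approximate the supremum over psi in (0, theta) on an evenly spaced grid."""
    return max(crk_ratio(theta, theta * k / grid) for k in range(1, grid))

theta = 3.0
print(crk_bound(theta))       # about theta**2 / 4 = 2.25
print(theta ** 2 / 3)         # Var(2X) = theta**2 / 3, strictly larger
```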
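Example: Let $X$ have p.m.f.
$$P_N\{X = k\} = \begin{cases} 1/N, & k = 1, 2, \dots, N, \\ 0, & \text{otherwise}, \end{cases}$$
and let $\Omega = \{N : N \ge M\}$, where $M > 1$ is given. Take $\tau(N) = N$. The p.m.f. does not satisfy the regularity conditions, so we use the CRK inequality:
$$\operatorname{Var}_N(T) \ge \sup_{M \le N' < N} \frac{(N-N')^2}{\operatorname{Var}_N\left\{P_{N'}(X)/P_N(X)\right\}}.$$
Now,
$$\frac{P_{N'}(x)}{P_N(x)} = \begin{cases} N/N', & x = 1, 2, \dots, N',\ N' < N, \\ 0, & \text{otherwise}, \end{cases}$$
so that
$$E_N\left\{\frac{P_{N'}(X)}{P_N(X)}\right\} = \sum_{x=1}^{N'} \frac{N}{N'}\cdot\frac{1}{N} = 1 \quad \text{and} \quad E_N\left[\left\{\frac{P_{N'}(X)}{P_N(X)}\right\}^2\right] = \sum_{x=1}^{N'} \left(\frac{N}{N'}\right)^2 \frac{1}{N} = \frac{N}{N'},$$
$$\therefore \operatorname{Var}_N\left\{\frac{P_{N'}(X)}{P_N(X)}\right\} = \frac{N}{N'} - 1 > 0 \quad \text{for } N > N'.$$
It follows that
$$\operatorname{Var}_N(T) \ge \sup_{M \le N' < N} \frac{(N-N')^2}{N/N' - 1} = \sup_{M \le N' < N} \left[N'(N-N')\right].$$
Let $K(N') = \dfrac{N'(N-N')}{(N'-1)(N-N'+1)}$. Then $K(N') > 1$ iff $N' < (N+1)/2$. Therefore $N'(N-N')$ increases as long as $N' < (N+1)/2$ and decreases if $N' > (N+1)/2$. The maximum over integers is achieved at $N' = [(N+1)/2]$ (integer part), provided this value is admissible, i.e. $M \le (N+1)/2$. Hence
$$\operatorname{Var}_N(T) \ge \begin{cases} \left[\dfrac{N+1}{2}\right]\left(N - \left[\dfrac{N+1}{2}\right]\right), & M \le \dfrac{N+1}{2}, \\[2ex] M(N-M), & M > \dfrac{N+1}{2}. \end{cases}$$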
N.B.: References: Rohatgi, V. K., An Introduction to Probability Theory and Mathematical Statistics, p. 365; and Rohatgi and Saleh, An Introduction to Probability and Statistics, p. 397.
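The integer maximisation can be verified by brute force. The Python sketch below works with exact fractions. It computes the variance of $P_{N'}(X)/P_N(X)$ directly from its definition and takes the largest CRK ratio over admissible $N'$ with $M \le N' < N$. The function name is hypothetical.

```python
from fractions import Fraction

def crk_discrete_uniform(N, M):
    """CRK lower bound for unbiased estimators of N when X is uniform on {1, ..., N}, N >= M."""
    best = Fraction(0)
    for Np in range(M, N):                              # admissible N' with M <= N' < N
        # likelihood ratio P_N'(x) / P_N(x) at each support point x = 1, ..., N
        ratio = [Fraction(N, Np) if x <= Np else Fraction(0) for x in range(1, N + 1)]
        mean = sum(ratio) / N                           # E_N of the ratio; equals 1
        var = sum((r - mean) ** 2 for r in ratio) / N   # equals N/N' - 1
        best = max(best, Fraction((N - Np) ** 2) / var)
    return best

print(crk_discrete_uniform(11, 2), crk_discrete_uniform(11, 8))   # 30 24
```

With $N = 11$ and $M = 2$ the bound is $(N^2-1)/4 = 30$. With $M = 8 > (N+1)/2$ it is $M(N-M) = 24$, as in the two cases above.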
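Uniformly Minimum Variance Unbiased Estimator (UMVUE): An estimator $T^*$ of $\tau(\theta)$ is called a UMVUE if
a) $T^*$ is unbiased for $\tau(\theta)$;
b) for any other unbiased estimator $T$ of $\tau(\theta)$, $V(T^*) \le V(T)$ for all $\theta \in \Omega$.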
Concept of Rao-Blackwell Theorem: The Rao-Blackwell theorem gives a very powerful method for finding a minimum variance estimator, whether or not the MVB is attained. It says that if we look for an MVE of $\tau(\theta)$, we need only inspect estimators which are functions of a sufficient statistic. If an unbiased estimator is not a function of a sufficient statistic, we can construct an unbiased estimator with no larger variance by taking its conditional expectation given a sufficient statistic. However, this raises the question of which sufficient statistic to use to compute the conditional expectation. For example, suppose that $S$ is an unbiased estimator of $\tau(\theta)$ with finite variance, and let $T_1$ and $T_2$ both be sufficient statistics for $\theta$, with $T_2 = h(T_1)$ for some function $h$. Let us define $S_1^* = E(S \mid T_1)$ and $S_2^* = E(S \mid T_2)$. By the Rao-Blackwell theorem the variances of $S_1^*$ and $S_2^*$ cannot exceed $V(S)$. However, it is not obvious which of $S_1^*$ and $S_2^*$ has the smaller variance.
Rao-Blackwell Theorem: Let $S$ be an unbiased estimator of $\tau(\theta)$ with finite variance, that is, $E(S) = \tau(\theta)$, and let $T$ be a sufficient statistic for $\theta$. If $S^* = E(S \mid T)$, then
a) $S^*$ is an unbiased estimator of $\tau(\theta)$;
b) $V(S^*) \le V(S)$. Moreover, $V(S^*) < V(S)$ unless $P(S = S^*) = 1$.
N.B.: For more, see the chapter Asymptotically Most Efficient Estimator of the Third Year Note.
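Rao-Blackwellization can be seen in a small exact computation. This illustration is not taken from the notes. The sketch assumes $X_1, \dots, X_n$ iid Bernoulli($p$) and starts from the crude unbiased estimator $S = X_1$ of $p$. It conditions on the sufficient statistic $T = \sum X_i$ by enumerating all outcomes, then compares means and variances. $S^* = E(X_1 \mid T)$ turns out to be $T/n$. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def rao_blackwell_bernoulli(n, p):
    """Return ((E S, Var S), (E S*, Var S*), {t: E(X1 | T = t)}) for S = X1, T = X1 + ... + Xn."""
    p = Fraction(p)
    outcomes = list(product((0, 1), repeat=n))
    prob = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in outcomes}

    # E(X1 | T = t) = (sum of x1 * P(x) over outcomes with sum t) / P(T = t)
    num, den = {}, {}
    for x in outcomes:
        t = sum(x)
        num[t] = num.get(t, 0) + x[0] * prob[x]
        den[t] = den.get(t, 0) + prob[x]
    s_star = {t: num[t] / den[t] for t in den}

    def mean_var(value):
        """Exact mean and variance of the statistic value(x)."""
        m = sum(value(x) * prob[x] for x in outcomes)
        return m, sum((value(x) - m) ** 2 * prob[x] for x in outcomes)

    return mean_var(lambda x: x[0]), mean_var(lambda x: s_star[sum(x)]), s_star

(s_mean, s_var), (star_mean, star_var), s_star = rao_blackwell_bernoulli(4, Fraction(1, 3))
print(s_var, star_var)   # 2/9 1/18
```

Both estimators have mean $1/3$. Conditioning reduces the variance from $p(1-p)$ to $p(1-p)/n$.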
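Minimal Sufficient Statistic: If $X_1, \dots, X_n$ is a random sample from $N(\mu, \sigma^2)$ with both parameters unknown, then $\bar{x}$ and $s^2$ are jointly sufficient statistics. Another set of sufficient statistics is the set of order statistics, $X_1, X_2, \dots, X_n \Rightarrow Y_1 < Y_2 < \dots < Y_n$. Both sets of sufficient statistics condense the data.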
A set of sufficient statistics is minimal if no other set of sufficient statistics condenses the data more. Formally, a set of jointly sufficient statistics is defined to be minimal sufficient iff it is a function of every other set of sufficient statistics. That is, among a number of sufficient statistics we should choose one, $t_0$ (say), which condenses the data more than any other sufficient statistic. Then $t_0$ is a minimal sufficient statistic.
A sufficient statistic always exists (the sample itself is sufficient), but a minimal sufficient statistic may not always exist.
Way of Finding a Minimal Sufficient Statistic: Let $f(x;\theta)$ be the p.d.f. of $X$. Suppose that there exists a function $T(x)$ such that, for any two points $x$ and $y$, the ratio $f(x;\theta)/f(y;\theta)$ is independent of $\theta$ iff $T(x) = T(y)$. Then $T(X)$ is a minimal sufficient statistic for $\theta$.
In terms of the likelihood: if $t(\cdot)$ is a sufficient statistic, then whenever $t(x) = t(y)$,
$$\frac{L(x;\theta)}{L(y;\theta)} = \frac{g(t;\theta)\,h(x)}{g(t;\theta)\,h(y)} = \frac{h(x)}{h(y)},$$
which is independent of $\theta$. If, conversely, the likelihood ratio is independent of $\theta$ only when $t(x) = t(y)$, then $t(X)$ is a minimal sufficient statistic for $\theta$.
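Example: Let $X_1, \dots, X_n$ be a random sample from $N(\theta, \sigma^2)$, where $\sigma^2$ is known. Then
$$L(x\mid\theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum (x_i-\theta)^2} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum x_i^2}\, e^{-\frac{1}{2\sigma^2}\left(-2\theta\sum x_i + n\theta^2\right)}.$$
By the Neyman factorization theorem we can say that $\sum x_i$ (equivalently $\bar{x}$) is sufficient for $\theta$. Similarly
$$L(y\mid\theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum y_i^2}\, e^{-\frac{1}{2\sigma^2}\left(-2\theta\sum y_i + n\theta^2\right)},$$
so that
$$\frac{L(x\mid\theta)}{L(y\mid\theta)} = e^{-\frac{1}{2\sigma^2}\left(\sum x_i^2 - \sum y_i^2\right)}\cdot e^{\frac{\theta}{\sigma^2}\left(\sum x_i - \sum y_i\right)}.$$
This ratio is independent of $\theta$ iff
$$\sum x_i = \sum y_i \;\Rightarrow\; t(x) = t(y), \quad t(x) = \sum x_i.$$
Hence $\sum X_i$ (equivalently $\bar{X}$) is a minimal sufficient statistic for $\theta$.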
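Example: Suppose we have $n = 2$ independent observations from the Cauchy distribution with p.d.f.
$$f_X(x) = \frac{1}{\pi}\cdot\frac{1}{1+(x-\theta)^2}, \quad -\infty < x < \infty.$$
Thus we have
$$\frac{L(x\mid\theta)}{L(y\mid\theta)} = \frac{\left[\{1+(x_1-\theta)^2\}\{1+(x_2-\theta)^2\}\right]^{-1}}{\left[\{1+(y_1-\theta)^2\}\{1+(y_2-\theta)^2\}\right]^{-1}} = \frac{\{1+(y_1-\theta)^2\}\{1+(y_2-\theta)^2\}}{\{1+(x_1-\theta)^2\}\{1+(x_2-\theta)^2\}}.$$
This ratio depends on $\theta$ unless $(y_1, y_2)$ is a permutation of $(x_1, x_2)$. Hence the data cannot be reduced to a single (one-dimensional) sufficient statistic. The minimal sufficient statistic is the pair of order statistics $(X_{(1)}, X_{(2)})$.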
For any two points $x$ and $y$ in the same set of the partition below, the ratio $L(x\mid\theta)/L(y\mid\theta)$ is independent of $\theta$, so a minimal sufficient statistic exists. Thus we can partition the sample space into the sets $\{-3, 0\}$, $\{6, 13, 52\}$, $\{60\}$, $\{68\}$, and a minimal sufficient statistic is
$$t(x) = \begin{cases} c_1, & x = -3 \text{ or } 0, \\ c_2, & x = 6,\ 13 \text{ or } 52, \\ c_3, & x = 60, \\ c_4, & x = 68, \end{cases}$$
where $c_1, c_2, c_3$ and $c_4$ are distinct constants. The distribution of $t(X)$ is
$$P\{t(X) = w\} = \begin{cases} 5\theta/6, & w = c_1, \\ 1 - 2\theta, & w = c_2, \\ \theta^2 + \theta/6, & w = c_3, \\ \theta - \theta^2, & w = c_4. \end{cases}$$
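Example: Let $X$ be a single observation from the probability function
$$P(x\mid\theta) = \begin{cases} \theta^2, & x = -1, 3, \\ \dfrac{1}{2} - \dfrac{\theta^2}{2}, & x = 0, \\ -\dfrac{\theta^2}{2} + \dfrac{\theta}{2}, & x = 2, 4, \\ \dfrac{1}{2} - \dfrac{\theta^2}{2} - \theta, & x = 1, \end{cases}$$
where $\theta$ is an unknown number between $0$ and $\sqrt{2} - 1$. Find a minimal sufficient statistic for $\theta$.
Solution: The ratio $P(x\mid\theta)/P(y\mid\theta)$ is independent of $\theta$ only when $x$ and $y$ both lie in $\{-1, 3\}$ or both lie in $\{2, 4\}$. Hence we partition the sample space into the sets $\{-1, 3\}$, $\{0\}$, $\{2, 4\}$, $\{1\}$, and the minimal sufficient statistic is
$$t(x) = \begin{cases} c_1, & x = -1 \text{ or } 3, \\ c_2, & x = 0, \\ c_3, & x = 2 \text{ or } 4, \\ c_4, & x = 1, \end{cases}$$
where $c_1, \dots, c_4$ are distinct constants.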
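The partition can be checked with a short Python sketch. It evaluates $P(x\mid\theta)/P(y\mid\theta)$ exactly at a few values of $\theta$ in $(0, \sqrt{2}-1)$ and groups points whose ratio does not change. The trial $\theta$ values and the helper names are arbitrary choices. Agreement at finitely many $\theta$ only suggests that a ratio is free of $\theta$; it does not prove it.

```python
from fractions import Fraction

def pmf(x, th):
    """P(x | theta) from the example; valid for 0 < theta < sqrt(2) - 1."""
    if x in (-1, 3):
        return th ** 2
    if x == 0:
        return Fraction(1, 2) - th ** 2 / 2
    if x in (2, 4):
        return -th ** 2 / 2 + th / 2
    if x == 1:
        return Fraction(1, 2) - th ** 2 / 2 - th
    raise ValueError(x)

def minimal_partition(points, thetas):
    """Group points x, y whose ratio P(x|theta)/P(y|theta) is the same for every theta tried."""
    groups = []
    for x in points:
        for group in groups:
            y = group[0]
            if len({pmf(x, th) / pmf(y, th) for th in thetas}) == 1:
                group.append(x)
                break
        else:                                   # no existing group matched
            groups.append([x])
    return groups

thetas = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10)]
print(minimal_partition([-1, 0, 1, 2, 3, 4], thetas))   # [[-1, 3], [0], [1], [2, 4]]
```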
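Best Asymptotically Normal Estimator (BAN Estimator): A sequence of estimators $T_1', T_2', \dots, T_n', \dots$ of $\tau(\theta)$ is defined to be best asymptotically normal (BAN) if and only if the following four conditions are satisfied:
a) The distribution of $\sqrt{n}\,[T_n' - \tau(\theta)]$ approaches $N(0, \sigma'^2(\theta))$ as $n \to \infty$.
b) For every $\varepsilon > 0$, $\lim_{n\to\infty} P_\theta\left[\,|T_n' - \tau(\theta)| > \varepsilon\,\right] = 0$ for each $\theta$.
c) Let $\{T_n\}$ be any other sequence of simple consistent estimators for which the distribution of $\sqrt{n}\,[T_n - \tau(\theta)]$ approaches $N(0, \sigma^2(\theta))$.
d) $\sigma^2(\theta)$ is not less than $\sigma'^2(\theta)$ for all $\theta$ in any open interval.
Example: Let $x_1, x_2, \dots, x_n$ be a random sample from $N(\mu, \theta^2)$. Then $T_n' = \sum x_i / n = \bar{x}_n$ is a BAN estimator of $\mu$.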
Completeness: The family of density or probability functions $f(x\mid\theta)$, $\theta \in \Omega$, is called complete if, for any function $u(\cdot)$, $E_\theta[u(X)] = 0$ for all $\theta \in \Omega$ implies $P_\theta[u(X) = 0] = 1$ for all $\theta \in \Omega$. We can express this by saying that there are no nontrivial unbiased estimators of zero. In particular, it means that two different functions of $T$ cannot have the same expected value. For example, if
$$E\{T(X)\} = \theta \quad \text{and} \quad E\{K(X)\} = \theta,$$
$$\therefore E\{T(X) - K(X)\} = 0 \ \text{ for all } \theta \;\Rightarrow\; T(X) - K(X) = 0 \ \text{ (with probability 1)}.$$
That is, an unbiased estimator is unique. In this sense, we are primarily interested in knowing that the family of density functions of a sufficient statistic is complete. In that case an unbiased function of the sufficient statistic is unique, and by the Rao-Blackwell theorem it must be a uniformly minimum variance unbiased estimator.
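Example: Suppose $f(x) = \dfrac{1}{\sqrt{2\pi}} e^{-x^2/2}$, $-\infty < x < \infty$. Check whether it is complete or not.
Solution: Let us consider the function $\phi(x) = x$. Now,
$$E\{\phi(X)\} = E(X) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{0} x e^{-x^2/2}\,dx + \frac{1}{\sqrt{2\pi}}\int_{0}^{\infty} x e^{-x^2/2}\,dx$$
$$= -\frac{1}{\sqrt{2\pi}}\int_{0}^{\infty} x e^{-x^2/2}\,dx + \frac{1}{\sqrt{2\pi}}\int_{0}^{\infty} x e^{-x^2/2}\,dx$$
$$\therefore E\{\phi(X)\} = 0.$$
But $\phi(x)$ is not identically zero. Hence $f(x)$ is not complete.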
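Example: Let $f(x) = \dfrac{1}{\beta\sqrt{2\pi}} e^{-x^2/(2\beta^2)}$, $-\infty < x < \infty$, $\beta > 0$. Check whether it is complete or not.
Solution: Let us consider the function $\phi(x) = x$. Now,
$$E\{\phi(X)\} = E(X) = \frac{1}{\beta\sqrt{2\pi}}\int_{-\infty}^{\infty} x e^{-x^2/(2\beta^2)}\,dx = \frac{1}{\beta\sqrt{2\pi}}\int_{-\infty}^{0} x e^{-x^2/(2\beta^2)}\,dx + \frac{1}{\beta\sqrt{2\pi}}\int_{0}^{\infty} x e^{-x^2/(2\beta^2)}\,dx$$
$$= -\frac{1}{\beta\sqrt{2\pi}}\int_{0}^{\infty} x e^{-x^2/(2\beta^2)}\,dx + \frac{1}{\beta\sqrt{2\pi}}\int_{0}^{\infty} x e^{-x^2/(2\beta^2)}\,dx$$
$$\therefore E\{\phi(X)\} = 0 \quad \text{for every } \beta > 0.$$
But $\phi(x)$ is not identically zero. Hence the family $f(x)$ is not complete.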
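Example: Let $x_1, x_2, \dots, x_n$ be independent random variables, each with distribution $P(\lambda)$, $\lambda > 0$. Find a minimal sufficient statistic for $\lambda$ and show that it is complete.
Solution:
$$L(x;\lambda) = \frac{e^{-n\lambda}\lambda^{\sum x_i}}{\prod_{i=1}^{n} x_i!} = \left(\prod_{i=1}^{n}\frac{1}{x_i!}\right) e^{-n\lambda}\lambda^{\sum x_i}.$$
By the Neyman factorization theorem, we can say that $\sum x_i$ is a sufficient statistic. Now,
$$\frac{L(x\mid\lambda)}{L(y\mid\lambda)} = \frac{e^{-n\lambda}\lambda^{\sum x_i}\big/\prod_{i=1}^{n} x_i!}{e^{-n\lambda}\lambda^{\sum y_i}\big/\prod_{i=1}^{n} y_i!} = \lambda^{\sum x_i - \sum y_i}\,\frac{\prod y_i!}{\prod x_i!},$$
which is independent of $\lambda$ iff $\sum x_i = \sum y_i$. Hence $\sum X_i$ is a minimal sufficient statistic, and its distribution is Poisson with mean $n\lambda$:
$$P\left(\sum X_i = x\right) = \frac{e^{-n\lambda}(n\lambda)^x}{x!}, \quad x = 0, 1, 2, \dots$$
Hence showing that the family of probability functions of the minimal sufficient statistic $\sum X_i$ is complete is the same as showing that the Poisson family is complete. Let $K \sim P(\lambda)$ and suppose $E_\lambda[u(K)] = \sum_{k=0}^{\infty} u(k)\,e^{-\lambda}\lambda^k/k! = 0$ for all $\lambda > 0$. Then the power series $\sum_k u(k)\lambda^k/k!$ vanishes identically in $\lambda$, so each coefficient $u(k)/k!$ is zero. Since $e^{-\lambda} \ne 0$ and $k! \ne 0$, $u(k) = 0$ for all $k$. Hence the Poisson family of distributions is complete, and $\sum X_i$ is a complete minimal sufficient statistic for $\lambda$.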
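Example: Let $X_1, \dots, X_n$ be a random sample from $U(0,\theta)$, so that
$$f(x) = \frac{1}{\theta}, \quad 0 < x < \theta,$$
$$\therefore L(x) = \frac{1}{\theta^n}\, I\left(x_{(n)} \le \theta\right).$$
Hence $x_{(n)}$ is a sufficient statistic for $\theta$. We know that
$$f_{n:n}(x) = n\{F(x)\}^{n-1} f(x) = \frac{n x^{n-1}}{\theta^n}, \quad 0 < x < \theta, \qquad \left[\because F(x) = \int_0^x \frac{1}{\theta}\,dt = \frac{x}{\theta}\right].$$
Let $u(\cdot)$ be any function. Then
$$E_\theta\left[u(X_{(n)})\right] = 0 \;\Rightarrow\; \int_0^\theta u(x)\,\frac{n x^{n-1}}{\theta^n}\,dx = 0 \;\Rightarrow\; \int_0^\theta u(x)\,x^{n-1}\,dx = 0 \quad \text{for all } \theta > 0.$$
Differentiating with respect to $\theta$ gives $u(\theta)\theta^{n-1} = 0$ for all $\theta > 0$, so $u \equiv 0$ (almost everywhere). Hence $X_{(n)}$ is a complete sufficient statistic for $\theta$.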
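Thus, unlike a sufficient statistic, an ancillary statistic does not contain any information about the parameter $\theta$. In such cases intuition suggests the following. The sufficient statistic $T(X_1, X_2, \dots, X_n)$ contains all the information about $\theta$ and the ancillary statistic contains none, so the two should be independent. Basu's theorem confirms this when $T$ is also complete.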
Example: Let $X_1, X_2, \dots, X_n$ be a random sample from $N(0, \sigma^2)$. Then the statistic $U(X) = \bar{X}$ follows $N(0, n^{-1}\sigma^2)$ and is not ancillary with respect to the parameter $\sigma^2$.
Example: Let $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ be the order statistics of a random sample from the p.d.f. $f(x-\theta)$, where $\theta \in \mathbb{R}$. Then the statistic $U(X) = \left(X_{(2)} - X_{(1)}, \dots, X_{(n)} - X_{(1)}\right)$ is ancillary for $\theta$.
Example: Let $X_1, X_2, \dots, X_n$ be iid random variables with distribution
$$f(x;\mu,\theta) = \frac{1}{2\theta}, \quad \mu - \theta \le x \le \mu + \theta.$$
Then the statistic $R = X_{(n)} - X_{(1)}$ is not an ancillary statistic for $\theta$, because the distribution of $R$ is
$$f_R(r) = \frac{n(n-1)\,r^{n-2}}{(2\theta)^{n-1}}\left(1 - \frac{r}{2\theta}\right), \quad 0 \le r \le 2\theta,$$
which depends on $\theta$.
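Key step of Basu's theorem (discrete case): Let $T$ be a complete sufficient statistic and $U$ an ancillary statistic. Fix $u$ and put $g(t) = P\{U = u \mid T = t\}$, which is free of $\theta$ because $T$ is sufficient. Then
$$E_\theta\{g(T)\} = \sum_t P\{U=u \mid T=t\}\,P\{T=t\} = \sum_t \frac{P\{U=u,\ T=t\}}{P\{T=t\}}\,P\{T=t\} = \sum_t P\{U=u,\ T=t\} = P\{U=u\}.$$
Since $U$ is ancillary, $P\{U = u\}$ does not depend on $\theta$, so $E_\theta\{g(T) - P\{U=u\}\} = 0$ for all $\theta$. Completeness of $T$ then gives
$$P\{U=u \mid T=t\} = P\{U=u\} \quad \text{for all } t,$$
that is, $U$ and $T$ are independent.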
Example: Let $X_1$ and $X_2$ be independent random variables, each $N(\mu, \sigma^2)$, with $\sigma^2$ known and $\mu$ unknown. Let $T(X_1, X_2) = X_1 + X_2$ and $U(X_1, X_2) = X_1 - X_2$. Then $U(X_1, X_2)$ is $N(0, 2\sigma^2)$ and its distribution does not depend on $\mu$, so $U$ is ancillary. Since $T$ is a complete sufficient statistic for $\mu$, it follows from Basu's theorem that $X_1 - X_2$ and $X_1 + X_2$ are independent random variables.
Example: Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ from the uniform distribution on $[0, \theta]$, and let $Y_1 < Y_2 < \dots < Y_n$ denote the corresponding order statistics. Show that $Y_1/Y_n$ and $Y_n$ are independent random variables.
Solution: Since $Y_n$ is a complete sufficient statistic for $\theta$, it suffices (by Basu's theorem) to show that the distribution of $Y_1/Y_n$ does not depend on $\theta$ (i.e. that $Y_1/Y_n$ is an ancillary statistic). Write $X_i = \theta U_i$ with $U_i$ iid $U[0, 1]$. Then for $0 < t \le 1$,
$$P\left(\frac{Y_1}{Y_n} \le t\right) = P\left(\frac{U_{(1)}}{U_{(n)}} \le t\right),$$
which is free of $\theta$.
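Theorem: Suppose that $X = (X_1, X_2, \dots, X_n)$ has a joint density or joint frequency function that is a $k$-parameter exponential family, $f(x;\theta) = \exp\left[\sum_{i=1}^{k} C_i(\theta)T_i(x) - d(\theta) + S(x)\right]$. Then $(T_1(X), \dots, T_k(X))$ is sufficient for $\theta$.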
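Example: Show that if $X_1, X_2, \dots, X_n$ are independent random variables, each $N(\mu, \sigma^2)$, where $-\infty < \mu < \infty$ and $\sigma^2 > 0$ are both unknown, then the joint density of $X_1, X_2, \dots, X_n$ is a member of the two-parameter exponential family.
$$L(x;\mu,\sigma^2) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum (x_i-\mu)^2} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\left\{\sum x_i^2 - 2\mu\sum x_i + n\mu^2\right\}}$$
$$= \exp\left[n\ln\frac{1}{\sigma} - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\left\{\sum x_i^2 - 2\mu\sum x_i + n\mu^2\right\}\right]$$
$$= \exp\left[-n\ln\left(\sigma\sqrt{2\pi}\right) - \frac{1}{2\sigma^2}\sum x_i^2 + \frac{\mu}{\sigma^2}\sum x_i - \frac{n\mu^2}{2\sigma^2}\right]$$
$$= \exp\left[\frac{\mu}{\sigma^2}\sum x_i + \left(-\frac{1}{2\sigma^2}\right)\sum x_i^2 - \left\{\frac{n\mu^2}{2\sigma^2} + n\ln\left(\sigma\sqrt{2\pi}\right)\right\}\right] \qquad (1)$$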
So there are jointly sufficient statistics for $\mu$ and $\sigma^2$, namely $\sum x_i$ and $\sum x_i^2$. The joint density of $X_1, X_2, \dots, X_n$ is said to be a member of the exponential family, or a member of the Koopman-Darmois class, or is said to have Koopman-Darmois form, if
$$L(x;\theta) = \exp\left[\sum_{i=1}^{k} C_i(\theta)T_i(x) - d(\theta) + S(x)\right]. \qquad (2)$$
Comparing (1) with (2), we have $k = 2$ and
$$T_1(x) = \sum x_i, \qquad T_2(x) = \sum x_i^2,$$
$$C_1(\mu,\sigma^2) = \frac{\mu}{\sigma^2}, \qquad C_2(\mu,\sigma^2) = -\frac{1}{2\sigma^2},$$
$$d(\mu,\sigma^2) = \frac{n\mu^2}{2\sigma^2} + n\ln\left(\sigma\sqrt{2\pi}\right), \qquad S(x) = 0.$$
So the joint density of $X_1, X_2, \dots, X_n$ is a member of the two-parameter exponential family.
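Equation (1) can be sanity-checked numerically. The Python sketch below evaluates the normal log-density two ways: directly, and in the form (2) with the $T_i$, $C_i$, $d$ and $S$ identified above. The two values should agree up to rounding. The sample values, parameter values and function names are arbitrary.

```python
import math

def normal_loglik(x, mu, sigma2):
    """ln L(x; mu, sigma^2) computed directly from the N(mu, sigma^2) density."""
    n = len(x)
    return -n / 2 * math.log(2 * math.pi * sigma2) - sum((xi - mu) ** 2 for xi in x) / (2 * sigma2)

def exp_family_loglik(x, mu, sigma2):
    """The same quantity written as C1*T1 + C2*T2 - d + S, as in equation (2)."""
    n = len(x)
    t1, t2 = sum(x), sum(xi ** 2 for xi in x)        # T1 = sum x_i, T2 = sum x_i^2
    c1, c2 = mu / sigma2, -1 / (2 * sigma2)          # C1 = mu/sigma^2, C2 = -1/(2 sigma^2)
    d = n * mu ** 2 / (2 * sigma2) + n * math.log(math.sqrt(sigma2) * math.sqrt(2 * math.pi))
    s = 0.0                                          # S(x) = 0
    return c1 * t1 + c2 * t2 - d + s

data = [1.2, -0.4, 2.5, 0.3]
print(normal_loglik(data, 0.7, 1.8), exp_family_loglik(data, 0.7, 1.8))
```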
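Lehmann-Scheffe Theorem: This theorem gives a simple criterion for the existence of a uniformly minimum variance unbiased estimator when a complete and sufficient statistic exists.
Statement: Let $X_1, \dots, X_n$ be a random sample from a density $f(\cdot\,;\theta)$. If $S = s(X_1, \dots, X_n)$ is a complete sufficient statistic and if $T^* = t^*(S)$, a function of $S$, is an unbiased estimator of $\tau(\theta)$, then $T^*$ is a UMVUE of $\tau(\theta)$.
Proof: Let $T'$ be any unbiased estimator of $\tau(\theta)$ which is a function of $S$; that is, $T' = t'(S)$. Then $E_\theta[T^* - T'] = 0$ for all $\theta \in \Omega$. Since $T^* - T'$ is a function of the complete statistic $S$, $P_\theta[t^*(S) = t'(S)] = 1$ for all $\theta \in \Omega$. Hence there is only one unbiased estimator of $\tau(\theta)$ that is a function of $S$. Now let $T$ be any unbiased estimator of $\tau(\theta)$. $T^*$ must be equal to $E[T \mid S]$, since $E[T \mid S]$ is an unbiased estimator of $\tau(\theta)$ depending on $S$. By the Rao-Blackwell theorem, $V_\theta[T^*] \le V_\theta[T]$ for all $\theta \in \Omega$; so $T^*$ is a UMVUE.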
Explanation: This theorem states that if a complete sufficient statistic $S$ exists and there is an unbiased estimator of $\tau(\theta)$, then there is a UMVUE of $\tau(\theta)$. It also simplifies the search for unbiased estimators. If a complete sufficient statistic $S$ exists and there is no function $h$ such that $E[h(S)] = \tau(\theta)$, then no unbiased estimator of $\tau(\theta)$ exists. The Rao-Blackwell and Lehmann-Scheffe theorems suggest two approaches to finding a UMVUE when a complete and sufficient statistic exists.
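Note:
a) Find a function $h$ such that $E[h(S)] = \tau(\theta)$; then $h(S)$ is the unique UMVUE of $\tau(\theta)$.
b) Find any unbiased estimator $T$ of $\tau(\theta)$; then $E[T \mid S]$ is the UMVUE of $\tau(\theta)$.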
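Example: Let $X_1, \dots, X_n$ be iid Bernoulli random variables with parameter $\theta$. By the factorization theorem $S = \sum X_i$ is sufficient, and it is also complete. To estimate $\theta^2$ we look for a function $h$ with
$$E[h(S)] = \sum_{k=0}^{n} h(k)\binom{n}{k}\theta^k(1-\theta)^{n-k} = \theta^2.$$
For $n = 2$ this reads
$$\theta^2 = h(0)(1-\theta)^2 + 2h(1)\,\theta(1-\theta) + h(2)\,\theta^2, \qquad (1)$$
which gives $h(0) = h(1) = 0$ and $h(2) = 1$. Thus
$$h(S) = \frac{S(S-1)}{2}$$
is the UMVUE of $\theta^2$ if $n = 2$. For $n > 2$, start from the unbiased estimator $T = I(X_1 + X_2 = 2)$ of $\theta^2$; then
$$T^* = E\left[I(X_1 + X_2 = 2) \mid S\right] = \frac{S(S-1)}{n(n-1)}$$
is the UMVUE of $\theta^2$.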
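Both claims can be verified exactly for a particular $n$. The Python sketch below does two things with fractions. First, it computes $E[S(S-1)/\{n(n-1)\}]$ under Binomial($n$, $\theta$). Second, it obtains $E[I(X_1 + X_2 = 2) \mid S = s]$ by counting, using the fact that, given $S = s$, all arrangements of the $s$ ones are equally likely. The function names and trial values are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

def expected_h(n, theta):
    """E[S(S-1) / (n(n-1))] with S ~ Binomial(n, theta), computed exactly."""
    theta = Fraction(theta)
    return sum(Fraction(k * (k - 1), n * (n - 1)) * comb(n, k) * theta ** k * (1 - theta) ** (n - k)
               for k in range(n + 1))

def rao_blackwellized(n, s):
    """E[I(X1 + X2 = 2) | S = s], counting the equally likely arrangements with sum s."""
    arrangements = [x for x in product((0, 1), repeat=n) if sum(x) == s]
    hits = sum(1 for x in arrangements if x[0] + x[1] == 2)
    return Fraction(hits, len(arrangements))

print(expected_h(5, Fraction(2, 7)))                   # 4/49, i.e. theta^2
print([rao_blackwellized(5, s) for s in range(6)])     # s(s-1)/20 for s = 0, ..., 5
```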
Note: A UMVUE of $\tau(\theta)$ can be found if a complete and sufficient statistic exists. However, in many cases we cannot find a complete and sufficient statistic, so we cannot apply the Lehmann-Scheffe theorem to find a UMVUE. In that case the Cramer-Rao inequality gives a lower bound for the variance of an unbiased estimator of $\tau(\theta)$. If the variance of some unbiased estimator achieves this lower bound, then that estimator is a UMVUE of $\tau(\theta)$.
Modal Unbiased Estimate: Let $X_i$ $(i = 1, \dots, n)$ be iid random variables with common p.d.f. $f(x;\theta)$, and let $t(X_1, \dots, X_n)$ be a statistic such that the mode of the density function of $t$ is $\theta$. Then $t(X_1, \dots, X_n)$ is called a modal unbiased estimate of $\theta$.
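Median Unbiased Estimate: Let $X_i$ $(i = 1, \dots, n)$ be iid random variables with common p.d.f. $f(x;\theta)$, and let $t(X_1, \dots, X_n)$ be a statistic such that the median of the density function of $t$ is $\theta$. Then $t(X_1, \dots, X_n)$ is called a median unbiased estimate of $\theta$.
Example: Let $X_1, \dots, X_{2n+1}$ be a random sample from
$$f(x,\theta) = \frac{1}{\theta}e^{-x/\theta}, \quad x > 0,\ \theta > 0.$$
Find a median unbiased estimate of $\theta$.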
Solution: We get
$$F(x) = \int_0^x \frac{1}{\theta}e^{-t/\theta}\,dt = \left[-e^{-t/\theta}\right]_0^x = 1 - e^{-x/\theta}.$$
Let $Y_1 = \min_{1 \le i \le 2n+1} X_i$. Then
$$f_{Y_1}(y) = (2n+1)\left[1 - F(y)\right]^{2n+1-1} f(y) = (2n+1)\left[1 - 1 + e^{-y/\theta}\right]^{2n}\frac{1}{\theta}e^{-y/\theta} = \frac{2n+1}{\theta}e^{-(2n+1)y/\theta}, \quad 0 \le y < \infty.$$
The median $m$ of $Y_1$ satisfies
$$\int_0^m \frac{2n+1}{\theta}e^{-(2n+1)y/\theta}\,dy = \frac{1}{2}$$
$$\Rightarrow \left[-e^{-(2n+1)y/\theta}\right]_0^m = \frac{1}{2}$$
$$\Rightarrow 1 - e^{-(2n+1)m/\theta} = \frac{1}{2}$$
$$\Rightarrow m = \frac{\theta}{2n+1}\ln 2.$$
Therefore
$$\operatorname{Med}\left[\frac{2n+1}{\ln 2}\,Y_1\right] = \frac{2n+1}{\ln 2}\cdot\frac{\theta\ln 2}{2n+1} = \theta.$$
Thus $\dfrac{2n+1}{\ln 2}\,y_1$ is a median unbiased estimate of $\theta$.
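Here is a quick numerical check in the setting of this example. The exact probability that $(2n+1)Y_1/\ln 2$ falls below $\theta$ is $1/2$, and a simulated median of the estimator lands near $\theta$. By contrast, its mean is $\theta/\ln 2$, so the estimator is median unbiased but not mean unbiased. The values $n = 3$ and $\theta = 2$, the replication count, the seed and the function names are arbitrary choices.

```python
import math
import random
import statistics

def prob_estimate_below_theta(n, theta):
    """P[(2n+1) Y1 / ln 2 <= theta], using F_{Y1}(y) = 1 - exp(-(2n+1) y / theta)."""
    y = theta * math.log(2) / (2 * n + 1)            # the median m of Y1 derived above
    return 1 - math.exp(-(2 * n + 1) * y / theta)

def simulated_median(n, theta, reps=20_000, seed=7):
    """Median of (2n+1) Y1 / ln 2 over simulated samples of size 2n+1."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # expovariate takes the rate, i.e. 1/theta for mean theta
        y1 = min(rng.expovariate(1 / theta) for _ in range(2 * n + 1))
        estimates.append((2 * n + 1) * y1 / math.log(2))
    return statistics.median(estimates)

print(prob_estimate_below_theta(3, 2.0))   # 0.5 up to rounding
print(simulated_median(3, 2.0))            # close to theta = 2.0
```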
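Example: Suppose $X \sim N(\theta, \sigma^2)$, so that $\bar{X} \sim N(\theta, \sigma^2/n)$. Then $\bar{X}$ is a modal unbiased estimate of $\theta$.
Solution: We have
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^2}, \quad -\infty < x < \infty,$$
$$\ln L(\theta) = n\ln\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2}\sum\left(\frac{x_i-\theta}{\sigma}\right)^2$$
$$\Rightarrow \frac{\partial \ln L(\theta)}{\partial\theta} = 0 - \frac{1}{2}\cdot 2\sum\left(\frac{x_i-\theta}{\sigma}\right)\left(-\frac{1}{\sigma}\right) = 0$$
$$\Rightarrow \theta = \bar{X}.$$
It can be shown that the second derivative is negative, so $\bar{X}$ is the modal value of the likelihood. Moreover, the density of $\bar{X}$, namely $N(\theta, \sigma^2/n)$, is symmetric and unimodal about $\theta$, so its mode is $\theta$. Again, $\bar{X}$ is also an unbiased estimate of $\theta$. Thus we can say that $\bar{X}$ is a modal unbiased estimate of $\theta$.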
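Theorem: If $m$ is a median of a discrete density $p(x)$ and $g(r) = E|X - r| = \sum_x |x - r|\,p(x)$, then $g(r)$ is minimized at $r = m$, provided that the sum exists for at least one $r \in \mathbb{R}$ (the real line).
Proof: Let $r_0 \in \mathbb{R}$ with $r_0 > m$ (the case $r_0 < m$ is symmetric). It suffices to show that $E|X - r_0| - E|X - m| \ge 0$. Now
$$|x - r_0| - |x - m| = \begin{cases} r_0 - m, & x \le m, \\ r_0 + m - 2x \ \ (\ge m - r_0), & m < x < r_0, \\ m - r_0, & x \ge r_0, \end{cases}$$
so
$$E|X - r_0| - E|X - m| = \sum_{x \le m}(r_0 - m)p(x) + \sum_{m < x < r_0}(r_0 + m - 2x)p(x) + \sum_{x \ge r_0}(m - r_0)p(x) \ge (r_0 - m)P(X \le m) - (r_0 - m)P(X > m).$$
Since $m$ is a median, $P(X \le m) \ge \frac{1}{2}$, so
$$\sum_{x \le m}(r_0 - m)p(x) \ge \frac{1}{2}(r_0 - m) \quad \text{and} \quad (r_0 - m)P(X > m) \le \frac{1}{2}(r_0 - m),$$
and therefore
$$E|X - r_0| - E|X - m| \ge (r_0 - m)\left[2P(X \le m) - 1\right] \ge 0.$$
Hence $g(r)$ is minimized at $r = m$.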
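The theorem can be illustrated on a small discrete distribution. The p.m.f. below is a hypothetical example, not one from the notes. The sketch evaluates $g(r) = E|X - r|$ exactly on a grid of $r$ values and compares the minimiser with the median.

```python
from fractions import Fraction

# Hypothetical p.m.f. chosen only for illustration
pmf = {0: Fraction(1, 10), 1: Fraction(3, 10), 2: Fraction(2, 10), 5: Fraction(4, 10)}

def g(r):
    """g(r) = E|X - r| = sum over x of |x - r| p(x)."""
    return sum(abs(x - r) * p for x, p in pmf.items())

def medians(dist):
    """Support points m with P(X <= m) >= 1/2 and P(X >= m) >= 1/2."""
    half = Fraction(1, 2)
    return [m for m in dist
            if sum(p for x, p in dist.items() if x <= m) >= half
            and sum(p for x, p in dist.items() if x >= m) >= half]

grid = [Fraction(k, 4) for k in range(25)]     # r = 0, 1/4, ..., 6
best = min(grid, key=g)
print(medians(pmf), best, g(best))             # [2] 2 17/10
```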
Example: Let $X_i$ $(i = 1, \dots, n)$ be independent random samples from a continuous uniform distribution with p.d.f. $f(x,\theta) = 1/\theta$, $0 < x \le \theta$. We have to show that $Y = \max_i X_i$ is a modal unbiased estimate of $\theta$.
The density function of $Y$ is
$$\phi(y;\theta) = \frac{n y^{n-1}}{\theta^n}, \quad 0 < y \le \theta,$$
which is increasing in $y$, so the mode of this distribution is $\theta$. Hence $Y$ is a modal unbiased estimate of $\theta$.
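Example: Let $X_1, \dots, X_n$ be iid with
$$f(x,\theta) = \frac{1}{\theta}e^{-x/\theta}, \quad x > 0,\ \theta > 0.$$
For this distribution, the statistics
$$Y_1 = \frac{X_{(n)}}{\ln n}, \ \text{ where } X_{(n)} = \max_i X_i, \qquad Y_2 = \frac{S_n}{n-1}, \ \text{ where } S_n = \sum_{i=1}^{n} X_i,$$
are modal unbiased estimates of $\theta$.
Solution: We have
$$F(x) = \int_0^x \frac{1}{\theta}e^{-t/\theta}\,dt = \left[-e^{-t/\theta}\right]_0^x = 1 - e^{-x/\theta}.$$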
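The p.d.f. of $X_{(n)}$ is
$$g(x_{(n)},\theta) = \frac{n}{\theta}\left(1 - e^{-x_{(n)}/\theta}\right)^{n-1}e^{-x_{(n)}/\theta}, \quad x_{(n)} > 0,\ \theta > 0.$$
Now,
$$Y_1 = \frac{X_{(n)}}{\ln n} \;\Rightarrow\; dX_{(n)} = (\ln n)\,dY_1.$$
Hence the p.d.f. of $Y_1$ is
$$g(y_1,\theta) = \frac{n\ln n}{\theta}\left(1 - e^{-\frac{y_1\ln n}{\theta}}\right)^{n-1}e^{-\frac{y_1\ln n}{\theta}} = \frac{n\ln n}{\theta}\left(1 - \left\{n^{-1/\theta}\right\}^{y_1}\right)^{n-1}\left\{n^{-1/\theta}\right\}^{y_1}, \quad y_1 > 0,\ \theta > 0.$$
Putting $n^{-1/\theta} = a$,
$$g(y_1,\theta) = \frac{n\ln n}{\theta}\left(1 - a^{y_1}\right)^{n-1}a^{y_1}. \qquad (i)$$
Then
$$\frac{\partial g(y_1,\theta)}{\partial y_1} = \frac{n\ln n}{\theta}\left(1 - a^{y_1}\right)^{n-2}(n-1)\left(-a^{y_1}\ln a\right)a^{y_1} + \frac{n\ln n}{\theta}\left(1 - a^{y_1}\right)^{n-1}a^{y_1}\ln a = 0$$
$$\Rightarrow \frac{(n-1)\,a^{y_1}}{1 - a^{y_1}} = 1$$
$$\Rightarrow n\,a^{y_1} = 1 \;\Rightarrow\; n\cdot n^{-y_1/\theta} = 1 \qquad \left[\because n^{-1/\theta} = a\right]$$
$$\Rightarrow y_1 = \theta.$$
It can be shown that the second derivative of (i) at $y_1 = \theta$ is negative, and hence $Y_1$ is a modal unbiased estimate of $\theta$.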
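Now, the p.d.f. of $S_n$ is
$$h(s_n,\theta) = \frac{s_n^{n-1}e^{-s_n/\theta}}{\theta^n\,\Gamma(n)}, \quad s_n > 0,\ \theta > 0.$$
Now,
$$Y_2 = \frac{S_n}{n-1} \;\Rightarrow\; dS_n = (n-1)\,dY_2,$$
so the p.d.f. of $Y_2$ is
$$k(y_2,\theta) = \frac{(n-1)^n\,y_2^{n-1}\,e^{-(n-1)y_2/\theta}}{\theta^n\,\Gamma(n)}, \quad y_2 > 0.$$
Setting the derivative of $\ln k(y_2,\theta)$ equal to zero,
$$(n-1)\left(y_2^{-1} - \frac{1}{\theta}\right) = 0 \;\Rightarrow\; y_2 = \theta,$$
and the second derivative $-(n-1)/y_2^2$ is negative, so $Y_2$ is also a modal unbiased estimate of $\theta$.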
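Both modes can be located numerically. The Python sketch below evaluates the density (i) of $Y_1$ and the density of $Y_2$ on a fine grid and returns the grid point with the largest density; both should sit at $\theta$. The values $\theta = 2$ and $n = 6$, the grid, and the function names are arbitrary choices.

```python
import math

def density_y1(y, theta, n):
    """p.d.f. (i) of Y1 = X(n) / ln n for an exponential sample with mean theta."""
    a = n ** (-1 / theta)
    return n * math.log(n) / theta * (1 - a ** y) ** (n - 1) * a ** y

def density_y2(y, theta, n):
    """p.d.f. of Y2 = S_n / (n - 1), where S_n ~ Gamma(n, theta)."""
    return ((n - 1) ** n * y ** (n - 1) * math.exp(-(n - 1) * y / theta)
            / (theta ** n * math.gamma(n)))

def grid_mode(density, theta, n, upper=10.0, steps=100_000):
    """Grid point in (0, upper] where the density is largest."""
    ys = [upper * k / steps for k in range(1, steps + 1)]
    return max(ys, key=lambda y: density(y, theta, n))

theta, n = 2.0, 6
print(grid_mode(density_y1, theta, n), grid_mode(density_y2, theta, n))   # both near theta
```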
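Theorem: Let $X$ be a random variable with density function $f(x;\theta)$, let $Y = g(X)$, and let $\phi(y)$ be the density function of $Y$ such that
(i) $\dfrac{\partial\phi(y)}{\partial y} = 0$ at $y = \theta$;
(ii) $\dfrac{\partial^2\phi(y)}{\partial y^2} < 0$ at $y = \theta$;
(iii) $y = g(x)$ is a one-to-one (increasing) transformation from $x$ to $y$ and from $y$ to $x$.
Then the solution of the differential equation (a), subject to condition (b), is a modal unbiased estimate of $\theta$:
$$(a)\quad \frac{\partial f(x,\theta)}{\partial x}\left(\frac{\partial x}{\partial y}\right)^2 + f(x,\theta)\,\frac{\partial^2 x}{\partial y^2} = 0 \quad \text{at } y = \theta,$$
$$(b)\quad \frac{\partial^2 f(x,\theta)}{\partial x^2}\left(\frac{\partial x}{\partial y}\right)^3 + 3\,\frac{\partial f(x,\theta)}{\partial x}\cdot\frac{\partial x}{\partial y}\cdot\frac{\partial^2 x}{\partial y^2} + f(x,\theta)\,\frac{\partial^3 x}{\partial y^3} < 0 \quad \text{at } y = \theta.$$
Proof:
$$\phi(y) = \frac{\partial}{\partial y}F_Y(y) = \frac{\partial}{\partial y}F_X\left[g^{-1}(y)\right] \qquad \left[\text{since } y = g(x) \Rightarrow x = g^{-1}(y)\right]$$
$$= \frac{\partial}{\partial x}F_X(x)\,\frac{\partial x}{\partial y}$$
$$\therefore \phi(y) = f(x;\theta)\,\frac{\partial x}{\partial y}$$
$$\Rightarrow \frac{\partial\phi(y)}{\partial y} = \frac{\partial f(x;\theta)}{\partial x}\left(\frac{\partial x}{\partial y}\right)^2 + f(x,\theta)\,\frac{\partial^2 x}{\partial y^2} = 0 \quad \text{at } y = \theta \text{ by our hypothesis},$$
$$\Rightarrow \frac{\partial^2\phi(y)}{\partial y^2} = \frac{\partial^2 f(x,\theta)}{\partial x^2}\left(\frac{\partial x}{\partial y}\right)^3 + 3\,\frac{\partial f(x,\theta)}{\partial x}\cdot\frac{\partial x}{\partial y}\cdot\frac{\partial^2 x}{\partial y^2} + f(x,\theta)\,\frac{\partial^3 x}{\partial y^3} < 0$$
at $y = \theta$, according to our hypothesis.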
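Example: Let $X \sim N(\theta, \sigma^2)$ and take $Y = g(X) = X$.
Solution: We have
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^2}, \quad -\infty < x < \infty,$$
$$\therefore \frac{\partial f(x,\theta)}{\partial x} = -\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^2}\,\frac{2(x-\theta)}{2\sigma^2}.$$
Now, condition (a) becomes
$$\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^2}\frac{\partial^2 x}{\partial y^2} - \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^2}\frac{2(x-\theta)}{2\sigma^2}\left(\frac{\partial x}{\partial y}\right)^2 = 0$$
$$\Rightarrow \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^2}\left[\frac{\partial^2 x}{\partial y^2} - \frac{x-\theta}{\sigma^2}\left(\frac{\partial x}{\partial y}\right)^2\right] = 0$$
$$\therefore \frac{\partial^2 x}{\partial y^2} - \frac{x-\theta}{\sigma^2}\left(\frac{\partial x}{\partial y}\right)^2 = 0 \quad \text{at } y = \theta.$$
We have $Y = g(x) = x$, so $\partial Y/\partial x = 1$ and $\partial^2 Y/\partial x^2 = 0$; hence $\partial x/\partial y = 1$ and $\partial^2 x/\partial y^2 = 0$. Therefore
$$0 - \frac{x-\theta}{\sigma^2}(1)^2 = 0 \;\Rightarrow\; x = \theta \quad \text{at } y = \theta,$$
so $X$ is a modal unbiased estimate of $\theta$.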
Jackknife Estimator and Correction for Bias
Jackknife Estimator: The jackknife estimator was introduced by Quenouille in 1949 and named by Tukey
in 1958. The jackknife technique’s purpose is to decrease the bias of an estimator and provide an approximate
confidence interval for the parameter of interest.
If a parameter has a UMVUE associated with it, then clearly there is no chance of improving such an estimator's bias. However, MLEs are often biased, and hence improvement may be possible in the sense of an estimator with lower bias. Jackknifing is an important technique for accomplishing such bias reduction.
Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ from a population with real-valued parameter $\theta$, and let $\hat{\theta}$ be an estimator of $\theta$. Divide the random sample into $N$ groups of equal size, $m = n/N$ observations each ($N$ is one of the factors of $n$). Delete one group at a time and estimate $\theta$ from the remaining $(N-1)m$ observations, using the same estimation procedure as for the full sample of size $n$. Denote the estimator of $\theta$ obtained with the $i$th group deleted by $\hat{\theta}_i$ $(i = 1, 2, \dots, N)$; it is called a jackknife statistic. Define
$$J_i = N\hat{\theta} - (N-1)\hat{\theta}_i$$
and consider
$$J(\hat{\theta}) = \frac{1}{N}\sum_{i=1}^{N}\left[N\hat{\theta} - (N-1)\hat{\theta}_i\right] = N\hat{\theta} - (N-1)\bar{\hat{\theta}}, \quad \text{where } \bar{\hat{\theta}} = \frac{1}{N}\sum_{i=1}^{N}\hat{\theta}_i.$$
$J(\hat{\theta})$ is called the jackknife estimator of $\theta$.
Generally we take $m = 1$. Then $n = N$, since $m = 1 = n/N \Rightarrow n = N$, and the commonly used jackknife estimate is
$$J(\hat{\theta}) = n\hat{\theta} - (n-1)\bar{\hat{\theta}}.$$
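Note:
$$J(\hat{\theta}) = N\hat{\theta} + \hat{\theta} - \hat{\theta} - N\bar{\hat{\theta}} + \bar{\hat{\theta}} = \hat{\theta} + \hat{\theta}(N-1) - \bar{\hat{\theta}}(N-1) = \hat{\theta} + (N-1)\left(\hat{\theta} - \bar{\hat{\theta}}\right),$$
which shows that the estimator $J(\hat{\theta})$ is an adjustment of $\hat{\theta}$. The amount of adjustment depends on the difference $\hat{\theta} - \bar{\hat{\theta}}$.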
Correction for Bias: If we have a biased estimator, we can sometimes make a simple adjustment to obtain an unbiased estimator. But when the expected value is a rather complicated function of the parameter, it is very difficult to find a simple factor that turns a biased estimator into an unbiased one.
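Let $t_n$ denote a biased estimator of $\theta$ based on $n$ observations. Let $t_{n-1,i}$ denote the estimated value of $\theta$ based on the $(n-1)$ observations that remain when the $i$th observation is omitted. Then we have $t_{n-1,1}, t_{n-1,2}, \dots, t_{n-1,n}$, and $\bar{t}_{n-1}$ is the average of these $n$ estimated values, each based on $(n-1)$ observations. Define
$$t_n' = n t_n - (n-1)\,\bar{t}_{n-1} \qquad \left[\text{cf. } J(\hat{\theta}) = n\hat{\theta} - (n-1)\bar{\hat{\theta}};\ \text{here } t_n = \hat{\theta} \text{ and } t_{n-1,i} = \hat{\theta}_i\right]$$
$$= t_n + (n-1)\left(t_n - \bar{t}_{n-1}\right).$$
Suppose the bias of $t_n$ can be expanded in powers of $1/n$ as
$$E(t_n) = \theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r},$$
where the $a_r$ do not depend on $n$. Then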
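$$E(t_n') = \theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r} + (n-1)\left(\theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r} - E\left\{\frac{1}{n}\sum_{i=1}^{n} t_{n-1,i}\right\}\right)$$
$$= \theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r} + (n-1)\left(\theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r} - \theta - \sum_{r=1}^{\infty}\frac{a_r}{(n-1)^r}\right)$$
$$= \theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r} + (n-1)\sum_{r=1}^{\infty}\frac{a_r}{n^r} - (n-1)\sum_{r=1}^{\infty}\frac{a_r}{(n-1)^r}$$
$$= \theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r} + n\sum_{r=1}^{\infty}\frac{a_r}{n^r} - \sum_{r=1}^{\infty}\frac{a_r}{n^r} - \sum_{r=1}^{\infty}\frac{a_r}{(n-1)^{r-1}}$$
$$= \theta + a_1 + \sum_{r=2}^{\infty}\frac{a_r}{n^{r-1}} - a_1 - \sum_{r=2}^{\infty}\frac{a_r}{(n-1)^{r-1}}$$
$$= \theta + \sum_{r=2}^{\infty}\frac{a_r}{n^{r-1}} - \sum_{r=2}^{\infty}\frac{a_r}{(n-1)^{r-1}}$$
$$= \theta + a_2\left(\frac{1}{n} - \frac{1}{n-1}\right) + a_3\left(\frac{1}{n^2} - \frac{1}{(n-1)^2}\right) + a_4\left(\frac{1}{n^3} - \frac{1}{(n-1)^3}\right) + \cdots$$
$$\Rightarrow E(t_n') = \theta - \frac{a_2}{n^2} + O\left(\frac{1}{n^3}\right).$$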
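That is, $t_n'$ is biased only to order $1/n^2$, whereas $t_n$ has bias of order $1/n$; i.e. $t_n'$ reduces the bias. Similarly, we can take another statistic
$$t_n'' = \frac{n^2 t_n' - (n-1)^2\,\bar{t}_{n-1}'}{n^2 - (n-1)^2},$$
where $\bar{t}_{n-1}'$ is the average of the $n$ values of $t'$ computed from the samples of size $n-1$ obtained by omitting one observation at a time. Then
$$E(t_n'') = \theta + O\left(\frac{1}{n^3}\right).$$
That is, the bias is of order $1/n^3$. The bias becomes smaller at every step, so by repeating the procedure we can reduce the bias to any required order.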
N.B.: Explain the jackknife method and discuss how it reduces the bias.
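The first-order correction $t_n'$ can be illustrated on a case where the bias expansion is known exactly. This illustration is not from the notes. Suppose $t_n = \bar{x}^2$ is used to estimate $\mu^2$. Then $E(t_n) = \mu^2 + \sigma^2/n$, so $a_1 = \sigma^2$ and $a_r = 0$ for $r \ge 2$. A short calculation shows that $t_n'$ then equals $\bar{x}^2 - s^2/n$, where $s^2$ is the usual unbiased sample variance, and this is exactly unbiased for $\mu^2$. The sketch checks that identity on arbitrary data; the function names are illustrative.

```python
import statistics

def first_order_correction(data, t):
    """t'_n = n*t_n - (n - 1)*t_bar_{n-1}, where t_bar_{n-1} averages the n leave-one-out values."""
    n = len(data)
    t_n = t(data)
    t_bar = sum(t(data[:i] + data[i + 1:]) for i in range(n)) / n
    return n * t_n - (n - 1) * t_bar

def mean_squared(xs):
    """t_n = x_bar^2, which is biased for mu^2."""
    return statistics.fmean(xs) ** 2

data = [2.1, 3.4, 1.9, 4.7, 3.3, 2.8]
n = len(data)
print(first_order_correction(data, mean_squared))
print(mean_squared(data) - statistics.variance(data) / n)   # x_bar^2 - s^2/n
```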
Example: Let $x_1, x_2, \dots, x_n$ be a random sample of size $n$ with the probability density function
$$f(x;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x, \mu < \infty,\ \sigma^2 > 0.$$
Find the jackknife estimator of $\mu$.
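Solution: From the log-likelihood,
$$\frac{\partial\log L}{\partial\mu} = -\frac{1}{2\sigma^2}\sum_{i=1}^{n} 2(x_i-\mu)(-1) = 0 \;\Rightarrow\; \sum_{i=1}^{n}(x_i-\mu) = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}.$$
So $\hat{\theta} = \bar{x}$. Deleting the $i$th observation,
$$\hat{\theta}_i = \frac{1}{n-1}\left(\sum_{j=1}^{n}x_j - x_i\right) = \frac{1}{n-1}\left(n\bar{x} - x_i\right).$$
So
$$\bar{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_i = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{1}{n-1}\left(n\bar{x} - x_i\right)\right] = \frac{1}{n(n-1)}\left(n^2\bar{x} - n\bar{x}\right)$$
$$\Rightarrow \bar{\hat{\theta}} = \bar{x}. \qquad (+++)$$
So the jackknife estimator is given by
$$J(\hat{\theta}) = n\hat{\theta} - (n-1)\bar{\hat{\theta}} = n\bar{x} - (n-1)\bar{x} = \bar{x}.$$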
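Example: Let $x_1, x_2, \dots, x_n$ be a random sample of size $n$ with the probability density function
$$f(x;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x, \mu < \infty,\ \sigma^2 > 0.$$
Find the jackknife estimator of $\sigma^2$.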
∂ log L n
∑ 2 ( xi − µ )( −1) = 0
1
⇒ =− 2
∂µ 2σ i =1
n
⇒ ∑ ( xi − µ ) = 0
i =1
n
∑ xi
1
⇒ µˆ = " " " (1)
n i =1
1 n
µˆ = x σˆ 2 = ∑
n i =1
( xi − µˆ )2
1 n
∑ ( xi − x ) (+)
2
⇒ θˆ = S 2 = σˆ 2 = " " "
n i =1
1 n
∑ ⎡⎣( xi − µ ) − ( x − µ ) ⎤⎦
2
=
n i =1
1⎡ n 2⎤
⎢ ∑ ( xi − µ ) − n ( x − µ ) ⎥
2
=
n ⎢⎣ i =1 ⎥⎦
1 n
= ∑
n i =1
( xi − µ )2 − ( x − µ )2
1 n
∑ E ( xi − E ( xi ) ) − E ( x − E ( x ) )
2 2
=
n i =1
⇒ () ( ) 1 n
E θˆ = E σˆ 2 = ∑ Var ( xi ) − Var ( x )
n i =1
() ( ) σ 2
⇒ E θˆ = E σˆ 2 = σ 2 − " " " (+ +)
n
n
So, we can say that θˆ = σˆ 2 = 1 ∑ ( xi − x )2 is a biased estimator of σ 2 . But, we can remove the bias by the
n i =1
n ˆ n
So, we can say that θ= σˆ 2 is an unbiased estimator of θ = σ 2 . Thus, the jackknife estimator is not to
n −1 n −1
be found here.
But, we can find the jackknife estimator by taking
2
⎛ n
n ⎞
∑ ⎜ x 2j
xj ∑ ⎟
j ≠ i =1 ⎜ j ≠ i =1 ⎟
θi = σˆ i =
ˆ 2
−
n −1 ⎜ n −1 ⎟
⎜ ⎟
⎜ ⎟
⎝ ⎠
2
n ⎛ n ⎞
∑ xi2 − xi2 ⎜ ∑ xi − xi ⎟
⇒ θˆi = σˆ i2 = i =1
−⎜ i =1 ⎟
n −1 ⎜ n −1 ⎟
⎜ ⎟
⎜ ⎟
⎝ ⎠
n
∑ xi2 − xi2 ⎛ nx − xi ⎞
2
i =1
= −⎜ ⎟
n −1 ⎝ n −1 ⎠
⎡ n 2 ⎤
⎢ ∑ xi − xi
2
2⎥
⎛ nx − xi ⎞ ⎥
n n
θˆi = ∑ θˆi = ∑ ⎢ i =1
1 1
So, −⎜ ⎟
n i =1 n i =1 ⎢ n − 1 ⎝ n −1 ⎠ ⎥
⎢ ⎥
⎢⎣ ⎥⎦
⎡ n 2 n 2⎤ n
⎢ n∑ xi − ∑ xi ⎥ − 2 ∑
1 1
= ( nx − xi )2
n ( n − 1) ⎣ i =1 i =1 ⎦ n ( n − 1) i =1
⎡ n 2 ⎤
∑ ( n2 x 2 − 2nxi x − xi2 )
n
⎢ ∑ xi ( n − 1) ⎥ −
1 1
=
n ( n − 1) ⎣ i =1 ⎦ n ( n − 1)
2
i =1
n
∑ xi2 ⎡ 3 2 2⎤
n n
⎢ n x − 2nx ∑ xi + ∑ xi ⎥
i =1 1
= −
n ( n − 1)
2
n ⎣ i =1 i =1 ⎦
n
∑ xi2 n 2 − 2n + 1 − 1 n ( n − 2 ) x
2
i =1
= × −
n ( n − 1)2 ( n − 1)2
⎛ n ⎛ n ⎞
2 ⎞
⎜
n ( n − 2 ) ⎜ i =1
∑
xi2 ⎜ xi ∑ ⎟ ⎟
⎟
= − ⎜ i =1 ⎟
⎜ ⎟
( n − 1)2 ⎜ n ⎜⎜ n ⎟
⎟ ⎟
⎜⎜ ⎜ ⎟ ⎟⎟
⎝ ⎝ ⎠ ⎠
n ( n − 2) ˆ n ( n − 2) 2
⇒ θˆi = θ= σˆ " " " (+ + + +)
( n − 1)2 ( n − 1)2
So, the jackknife estimator is given by
()
J θˆ = nθˆ − ( n − 1)θˆi
n ( n − 2)
⇒ J (θˆ ) = nσˆ 2
− ( n − 1) σˆ 2 ⎡⎣ from ( + ) and ( + + + + ) ⎤⎦
( n − 1)2
⎡ ( n − 2) ⎤
⇒ J (θˆ ) = nσˆ 2 ⎢1 − ⎥
⎣⎢ ( n − 1) ⎦⎥
⇒ ()
J θˆ =
n
n −1
σˆ 2 " " " ( A)
⇒
⎣ ()
E ⎡ J θˆ ⎤ =
n
⎦ n −1
E σˆ 2 ( )
⎛1 n ⎞
∑ ( xi − x )
n 2
== E⎜ ⎟⎟
n − 1 ⎜⎝ n i =1 ⎠
n ⎛ 2 σ2 ⎞
= ⎜σ − ⎟
n − 1 ⎜⎝ n ⎟⎠
n ⎛ n −1 2 ⎞
= ⎜ σ ⎟ =σ2
n −1 ⎝ n ⎠
()
n
∑ ( xi − x )
n ˆ n 1
So, we can say that J θˆ = θ= σˆ 2 =
2
is an unbiased and uniformly minimum variance
n −1 n −1 n − 1 i =1
unbiased estimator θ = σ 2 .
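A quick numerical check of (++++) and (A): the sketch below computes the leave-one-out values directly and compares them with the closed forms. In Python's standard library, `statistics.pvariance` uses the divisor $n$ (the estimator $\hat\sigma^2$) and `statistics.variance` uses the divisor $n-1$; the helper name `jackknife` and the sample values are illustrative only.

```python
import statistics


def jackknife(data, estimator):
    """Return (J, theta_dot), where theta_dot is the mean of the n leave-one-out
    estimates and J = n * theta_hat - (n - 1) * theta_dot."""
    n = len(data)
    theta_hat = estimator(data)
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    theta_dot = sum(leave_one_out) / n
    return n * theta_hat - (n - 1) * theta_dot, theta_dot


if __name__ == "__main__":
    x = [2.0, 3.5, 1.0, 7.0, 4.5]  # illustrative sample
    n = len(x)
    sigma2_hat = statistics.pvariance(x)  # (1/n) * sum (x_i - xbar)^2
    J, theta_dot = jackknife(x, statistics.pvariance)
    print(theta_dot, n * (n - 2) / (n - 1) ** 2 * sigma2_hat)  # equation (++++)
    print(J, statistics.variance(x))                            # equation (A): J = s^2
```

Both printed pairs agree up to floating-point rounding, matching the algebraic result that jackknifing the divisor-$n$ variance gives the divisor-$(n-1)$ variance.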
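Example: Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ with the probability function
$$f(x;p) = p^x(1-p)^{1-x};\quad x = 0, 1$$
Find the jackknife estimator of $\theta = pq$, where $q = 1 - p$.
Solution: The likelihood function is
$$L(x;p) = p^{\sum_{i=1}^{n}x_i}(1-p)^{n - \sum_{i=1}^{n}x_i}$$
$$\Rightarrow \ln L(x;p) = \sum_{i=1}^{n}x_i\ln p + \left(n - \sum_{i=1}^{n}x_i\right)\ln(1-p)$$
$$\Rightarrow \frac{\partial \ln L(x;p)}{\partial p} = \frac{\sum_{i=1}^{n}x_i}{p} - \frac{n - \sum_{i=1}^{n}x_i}{1-p} = 0$$
$$\Rightarrow \frac{\sum_{i=1}^{n}x_i - p\sum_{i=1}^{n}x_i - np + p\sum_{i=1}^{n}x_i}{p(1-p)} = 0$$
$$\Rightarrow \hat p = \frac{\sum_{i=1}^{n}x_i}{n} = \frac{y}{n}, \qquad \left[\text{Let } y = \sum_{i=1}^{n}x_i \sim B(n,p)\right]$$
So, the maximum likelihood estimator of $p$ is $\hat p = \frac{y}{n}$.
And, we know that if $\hat\theta$ is the maximum likelihood estimator of $\theta$, then $g(\hat\theta)$ is the maximum likelihood estimator of $g(\theta)$ (invariance property). So, the maximum likelihood estimator of $\theta = pq$ is given by
$$\hat\theta = \hat p\hat q = \frac{y}{n}\left(1 - \frac{y}{n}\right) \qquad (+)$$
$$\Rightarrow E(\hat\theta) = E(\hat p\hat q) = E\left(\frac{y}{n}\right) - E\left(\frac{y}{n}\right)^2 = \frac{1}{n}E(y) - \frac{1}{n^2}\left[Var(y) + \{E(y)\}^2\right]$$
$$= \frac{1}{n}np - \frac{1}{n^2}\left(npq + n^2p^2\right) \qquad (\text{since } y \sim B(n,p))$$
$$= p - \frac{pq}{n} - p^2$$
$$\Rightarrow E(\hat\theta) = E(\hat p\hat q) = pq - \frac{pq}{n} \qquad (++)$$
$$\Rightarrow E\left(\frac{n}{n-1}\hat\theta\right) = E\left(\frac{n}{n-1}\hat p\hat q\right) = pq \qquad (+++)$$
So, we can say that $\frac{n}{n-1}\hat\theta = \frac{n}{n-1}\hat p\hat q$ is an unbiased estimator of $\theta = pq$. Thus, the jackknife estimator is not needed here.
But we can still find the jackknife estimator. If the deleted $i$th observation is a success ($x_i = 1$), then $\hat\theta_i = \frac{y-1}{n-1}\cdot\frac{n-y}{n-1}$; if it is a failure ($x_i = 0$), then $\hat\theta_i = \frac{y}{n-1}\cdot\frac{n-y-1}{n-1}$. Since there are $y$ successes and $n - y$ failures to be removed,
$$\hat\theta_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_i = \frac{1}{n}\left[\frac{y(y-1)(n-y)}{(n-1)^2} + \frac{y(n-y)(n-y-1)}{(n-1)^2}\right]$$
$$= \frac{y(n-y)(n-2)}{n(n-1)^2} = \frac{y(n-y)}{n^2}\cdot\frac{n(n-2)}{(n-1)^2} = \frac{y}{n}\left(1 - \frac{y}{n}\right)\frac{n(n-2)}{(n-1)^2}$$
$$\Rightarrow \hat\theta_{(\cdot)} = \frac{n(n-2)}{(n-1)^2}\hat p\hat q \qquad (++++)$$
So, the jackknife estimator is given by
$$J(\hat\theta) = n\hat\theta - (n-1)\hat\theta_{(\cdot)} = n\hat p\hat q - (n-1)\frac{n(n-2)}{(n-1)^2}\hat p\hat q \qquad [\text{from } (+) \text{ and } (++++)]$$
$$= n\hat p\hat q\left[1 - \frac{n-2}{n-1}\right] = \frac{n}{n-1}\hat p\hat q \qquad (A)$$
$$\Rightarrow E\left[J(\hat\theta)\right] = \frac{n}{n-1}E(\hat p\hat q) = \frac{n}{n-1}\left(pq - \frac{pq}{n}\right) \qquad [\text{from } (++)]$$
$$= \frac{n}{n-1}\cdot\frac{n-1}{n}pq = pq$$
So, by (A), $J(\hat\theta) = \frac{n}{n-1}\hat p\hat q$ is an unbiased estimator of $\theta = pq$.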
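The result (A) can be checked with exact fractions by literally deleting one trial at a time. This is a sketch: the function names `pq_hat` and `jackknife` and the particular 0/1 sample are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction


def pq_hat(trials):
    """MLE of pq for Bernoulli trials: (y/n)(1 - y/n)."""
    n = len(trials)
    p_hat = Fraction(sum(trials), n)
    return p_hat * (1 - p_hat)


def jackknife(trials, estimator):
    """J = n * theta_hat - (n - 1) * (average of the n leave-one-out estimates)."""
    n = len(trials)
    loo = [estimator(trials[:i] + trials[i + 1:]) for i in range(n)]
    return n * estimator(trials) - (n - 1) * sum(loo) / n


if __name__ == "__main__":
    trials = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # y = 3 successes in n = 10 trials
    n = len(trials)
    print(jackknife(trials, pq_hat))            # 7/30
    print(Fraction(n, n - 1) * pq_hat(trials))  # n/(n-1) * p_hat * q_hat = 7/30
```

Exact arithmetic avoids any rounding, so the two printed values are identical fractions, as (A) predicts.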
Example: Let $x$ be the number of successes in $n$ independent Bernoulli trials $x_1, x_2, \ldots, x_n$ (so that $x = \sum_{i=1}^{n}x_i$), with probability function
$$f(x;n,p) = \binom{n}{x}p^x(1-p)^{n-x};\quad x = 0, 1, \ldots, n$$
Find the jackknife estimator of $\theta = p^2$.
Solution: Here, we have that $x \sim B(n,p)$, so $E(x) = np$ and
$$E(x^2) = Var(x) + \{E(x)\}^2 = npq + n^2p^2 \qquad (**)$$
$$\Rightarrow E\left(\frac{x^2}{n^2}\right) = \frac{pq}{n} + p^2$$
$$\Rightarrow E(\hat\theta) = E\left[\left(\frac{x}{n}\right)^2\right] = p^2 + \frac{p(1-p)}{n}, \qquad \left[\text{where } \hat\theta = \left(\frac{x}{n}\right)^2\right] \qquad (+)$$
So, $\hat\theta = \left(\frac{x}{n}\right)^2$ is a biased estimator of $\theta = p^2$. Since the bias $\frac{p(1-p)}{n}$ depends on $p$, we cannot remove it by the application of a simple adjustment. So, the jackknife estimator is needed here.
We find the jackknife estimator by taking
$$\hat\theta_i = \begin{cases}\left(\dfrac{x-1}{n-1}\right)^2; & \text{if } x_i = 1 \text{ (that is, if the } i\text{th trial is a success)}\\[2ex] \left(\dfrac{x}{n-1}\right)^2; & \text{if } x_i = 0 \text{ (that is, if the } i\text{th trial is a failure)}\end{cases}$$
Now, since there are $x$ successes and $n - x$ failures to be removed, we have that
$$\hat\theta_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_i = \frac{1}{n}\left[x\left(\frac{x-1}{n-1}\right)^2 + (n-x)\left(\frac{x}{n-1}\right)^2\right] = \frac{1}{n(n-1)^2}\left[x^3 - 2x^2 + x + nx^2 - x^3\right]$$
$$\Rightarrow \hat\theta_{(\cdot)} = \frac{1}{n(n-1)^2}\left[nx^2 - 2x^2 + x\right] \qquad (++)$$
So, the jackknife estimator is given by
$$J(\hat\theta) = n\hat\theta - (n-1)\hat\theta_{(\cdot)} = \frac{x^2}{n} - \frac{nx^2 - 2x^2 + x}{n(n-1)} = \frac{(n-1)x^2 - \left[nx^2 - 2x^2 + x\right]}{n(n-1)} = \frac{x(x-1)}{n(n-1)} \qquad (A)$$
$$\Rightarrow E\left[J(\hat\theta)\right] = \frac{E(x^2) - E(x)}{n(n-1)} = \frac{npq + n^2p^2 - np}{n(n-1)} \qquad [\text{from } (**)]$$
$$= \frac{np - np^2 + n^2p^2 - np}{n(n-1)} = \frac{np^2(n-1)}{n(n-1)} = p^2$$
$$\therefore E\left[J(\hat\theta)\right] = E\left(\frac{x^2 - x}{n(n-1)}\right) = p^2 \qquad [\text{from } (A)]$$
So, we can say that $J(\hat\theta) = \frac{x(x-1)}{n(n-1)}$ is an unbiased and uniformly minimum variance unbiased estimator of $\theta = p^2$.
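Both the leave-one-out computation and the unbiasedness of $x(x-1)/\{n(n-1)\}$ can be verified exactly by summing over the binomial probability function. The sketch below uses the illustrative values $n = 6$, $p = 1/3$; the function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb


def jackknife_p_squared(x, n):
    """Closed form (A): J = x(x - 1) / (n(n - 1)) for x successes in n trials."""
    return Fraction(x * (x - 1), n * (n - 1))


def jackknife_direct(x, n):
    """Leave-one-out: deleting a success gives ((x-1)/(n-1))^2, a failure gives (x/(n-1))^2."""
    theta_hat = Fraction(x, n) ** 2
    loo_mean = (x * Fraction(x - 1, n - 1) ** 2 + (n - x) * Fraction(x, n - 1) ** 2) / n
    return n * theta_hat - (n - 1) * loo_mean


def expectation(estimator, n, p):
    """Exact E[estimator(X, n)] for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) * estimator(k, n) for k in range(n + 1))


if __name__ == "__main__":
    n, p = 6, Fraction(1, 3)
    print(all(jackknife_direct(x, n) == jackknife_p_squared(x, n) for x in range(n + 1)))  # True
    print(expectation(lambda k, m: Fraction(k, m) ** 2, n, p))  # p^2 + p(1-p)/n: biased
    print(expectation(jackknife_p_squared, n, p))               # 1/9 = p^2: unbiased
```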
Example: Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ with the probability density function
$$f(x;\theta) = e^{-(x-\theta)};\quad x > \theta$$
Find the jackknife estimator of $\theta$.
Solution: Here, we have that $f(x;\theta) = e^{-(x-\theta)};\ x > \theta$, so that
$$\theta \le x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)} < \infty$$
The likelihood $e^{-\sum(x_i - \theta)}$ increases with $\theta$, and the maximum value of $\theta$ consistent with the sample is $x_{(1)}$, the smallest observation. So the maximum likelihood estimator is $\hat\theta = x_{(1)}$. The density of the $r$th order statistic is
$$f_{x_{(r)}}(x) = \frac{n!}{(r-1)!(n-r)!}\left[F(x)\right]^{r-1}\left[1 - F(x)\right]^{n-r}f(x) \qquad (2)$$
so that, for $r = 1$,
$$f_{x_{(1)}}(x) = n\left[1 - F(x)\right]^{n-1}f(x) \qquad (3)$$
Now, we have that $f(x;\theta) = e^{-(x-\theta)};\ x > \theta$, so
$$F(x) = \int_\theta^x e^{-(u-\theta)}du = -\left[e^{-(u-\theta)}\right]_\theta^x = -\left[e^{-(x-\theta)} - 1\right]$$
$$\Rightarrow F(x) = 1 - e^{-(x-\theta)} \qquad (4)$$
So, from equations (3) and (4), we have that
$$f_{x_{(1)}}(x) = n\left[1 - 1 + e^{-(x-\theta)}\right]^{n-1}e^{-(x-\theta)} = n\left[e^{-(x-\theta)}\right]^{n-1}e^{-(x-\theta)} = ne^{-n(x-\theta)};\quad x > \theta$$
$$\therefore E(x_{(1)}) = n\int_\theta^\infty xe^{-n(x-\theta)}dx \qquad (***)$$
$$= ne^{n\theta}\left[\left. x\frac{e^{-nx}}{-n}\right|_\theta^\infty - \int_\theta^\infty\frac{e^{-nx}}{-n}dx\right] = ne^{n\theta}\left[0 + \frac{\theta e^{-n\theta}}{n} + \frac{e^{-n\theta}}{n^2}\right]$$
$$\therefore E(\hat\theta) = E(x_{(1)}) = \theta + \frac{1}{n} \qquad (++)$$
So, $\hat\theta = x_{(1)}$ is a biased estimator of $\theta$. And we cannot remove the bias by a simple multiplicative adjustment (since $E(c\,x_{(1)}) = c(\theta + \frac{1}{n}) \ne \theta$), so we find the jackknife estimator.
Deleting any observation other than $x_{(1)}$ leaves the minimum $x_{(1)}$, while deleting $x_{(1)}$ makes $x_{(2)}$ the minimum. Hence
$$\hat\theta_{(\cdot)} = \frac{1}{n}\left[(n-1)x_{(1)} + x_{(2)}\right]$$
and the jackknife estimator is
$$J(\hat\theta) = n\hat\theta - (n-1)\hat\theta_{(\cdot)} = nx_{(1)} - \frac{(n-1)^2}{n}x_{(1)} - \frac{n-1}{n}x_{(2)}$$
$$= \left(\frac{2n-1}{n}\right)x_{(1)} - \frac{n-1}{n}x_{(2)} = \left(\frac{n + n - 1}{n}\right)x_{(1)} - \frac{n-1}{n}x_{(2)} = \frac{n}{n}x_{(1)} + \left(\frac{n-1}{n}\right)x_{(1)} - \frac{n-1}{n}x_{(2)}$$
$$= x_{(1)} + \left(\frac{n-1}{n}\right)\left(x_{(1)} - x_{(2)}\right) \qquad (A)$$
$$\therefore E\left[J(\hat\theta)\right] = E(x_{(1)}) + \left(\frac{n-1}{n}\right)\left[E(x_{(1)}) - E(x_{(2)})\right] \qquad (B)$$
Now, to find the above expected value, we first find the expected value of the second order statistic. From equation (2), we have that
$$f_{x_{(2)}}(x) = \frac{n!}{(2-1)!(n-2)!}\left[F(x)\right]^{2-1}\left[1 - F(x)\right]^{n-2}f(x) = n(n-1)F(x)\left[1 - F(x)\right]^{n-2}f(x) \qquad (5)$$
Now, from equations (5) and (4), we have that
$$f_{x_{(2)}}(x) = n(n-1)\left[1 - e^{-(x-\theta)}\right]\left[1 - 1 + e^{-(x-\theta)}\right]^{n-2}e^{-(x-\theta)} = n(n-1)\left[1 - e^{-(x-\theta)}\right]\left[e^{-(x-\theta)}\right]^{n-1}$$
$$= n(n-1)\left[e^{-(x-\theta)}\right]^{n-1} - n(n-1)\left[e^{-(x-\theta)}\right]^{n} = n(n-1)e^{-(n-1)(x-\theta)} - (n-1)\,ne^{-n(x-\theta)}$$
$$\Rightarrow E(x_{(2)}) = n(n-1)\int_\theta^\infty xe^{-(n-1)(x-\theta)}dx - (n-1)\,n\int_\theta^\infty xe^{-n(x-\theta)}dx$$
$$= n(n-1)\int_\theta^\infty xe^{-(n-1)(x-\theta)}dx - (n-1)E(x_{(1)}) \qquad [\text{from } (***)]$$
$$= n(n-1)e^{(n-1)\theta}\int_\theta^\infty xe^{-(n-1)x}dx - (n-1)\left(\theta + \frac{1}{n}\right)$$
$$= n(n-1)e^{(n-1)\theta}\left[\left. x\frac{e^{-(n-1)x}}{-(n-1)}\right|_\theta^\infty - \int_\theta^\infty\frac{e^{-(n-1)x}}{-(n-1)}dx\right] - (n-1)\left(\theta + \frac{1}{n}\right)$$
$$= n\left[\theta + \frac{1}{n-1}\right] - (n-1)\left(\theta + \frac{1}{n}\right) = n\theta + \frac{n}{n-1} - n\theta + \theta - \frac{n-1}{n}$$
$$= \theta + \left(\frac{n}{n-1} - \frac{n-1}{n}\right) = \theta + \frac{n^2 - (n-1)^2}{n(n-1)}$$
$$= \theta + \frac{2n-1}{n(n-1)} \qquad (C)$$
So, from (B), (++) and (C),
$$E\left[J(\hat\theta)\right] = \left(\theta + \frac{1}{n}\right) + \left(\frac{n-1}{n}\right)\left[\left(\theta + \frac{1}{n}\right) - \left(\theta + \frac{2n-1}{n(n-1)}\right)\right]$$
$$= \theta + \frac{1}{n} + \frac{n-1}{n}\left(\frac{n - 1 - 2n + 1}{n(n-1)}\right) = \theta + \frac{1}{n} + \frac{n-1}{n}\left(\frac{-n}{n(n-1)}\right) = \theta$$
$$\therefore E\left[J(\hat\theta)\right] = E\left[x_{(1)} + \left(\frac{n-1}{n}\right)\left(x_{(1)} - x_{(2)}\right)\right] = \theta$$
So, we can say that $J(\hat\theta) = x_{(1)} + \left(\frac{n-1}{n}\right)\left(x_{(1)} - x_{(2)}\right)$ is an unbiased estimator of $\theta$.
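The identity (A) and the unbiasedness $E[J(\hat\theta)] = \theta$ can be spot-checked by simulation. This sketch assumes the model $f(x;\theta) = e^{-(x-\theta)},\ x > \theta$, generates each observation as $\theta$ plus a standard exponential variable, and uses the illustrative values $\theta = 2$, $n = 5$; the function names are hypothetical.

```python
import random


def jackknife_min(xs):
    """Generic jackknife of theta_hat = min(xs): n*min - (n - 1) * mean of leave-one-out minima."""
    n = len(xs)
    loo = [min(xs[:i] + xs[i + 1:]) for i in range(n)]
    return n * min(xs) - (n - 1) * sum(loo) / n


def closed_form(xs):
    """Equation (A): x_(1) + ((n - 1)/n) * (x_(1) - x_(2))."""
    n = len(xs)
    x1, x2 = sorted(xs)[:2]
    return x1 + (n - 1) / n * (x1 - x2)


def simulate(theta, n, reps, seed=0):
    """Monte Carlo averages of x_(1) and of J for samples from f(x; theta) = exp(-(x - theta)), x > theta."""
    rng = random.Random(seed)
    sum_min = sum_j = 0.0
    for _ in range(reps):
        xs = [theta + rng.expovariate(1.0) for _ in range(n)]
        sum_min += min(xs)
        sum_j += closed_form(xs)
    return sum_min / reps, sum_j / reps


if __name__ == "__main__":
    mean_min, mean_j = simulate(theta=2.0, n=5, reps=100_000)
    print(mean_min)  # should be close to theta + 1/n = 2.2
    print(mean_j)    # should be close to theta = 2.0
```

The first average sits near $\theta + 1/n$, reflecting the bias in (++), while the jackknife average sits near $\theta$.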
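Location invariant
An estimator $T = t(X_1, X_2, \ldots, X_n)$ is defined to be location invariant if and only if
$$t(x_1 + c, x_2 + c, \ldots, x_n + c) = t(x_1, x_2, \ldots, x_n) + c$$
for all values $x_1, x_2, \ldots, x_n$ and all $c$.
Example: Show that $\frac{Y_1 + Y_n}{2}$ is location invariant, where $Y_1$ is the smallest order statistic and $Y_n$ is the largest order statistic.
Solution: Let $t(x_1, x_2, \ldots, x_n) = \frac{\min(x_1, x_2, \ldots, x_n) + \max(x_1, x_2, \ldots, x_n)}{2}$. Then we have that
$$t(x_1 + c, \ldots, x_n + c) = \frac{\min(x_1, \ldots, x_n) + c + \max(x_1, \ldots, x_n) + c}{2} = t(x_1, x_2, \ldots, x_n) + c$$
So, we can say that $\frac{Y_1 + Y_n}{2}$ is location invariant.
Example: Show that $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X_n)^2$ is not location invariant.
Solution: Let $t(x_1, x_2, \ldots, x_n) = s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x_n)^2$. Then we have that
$$t(x_1 + c, x_2 + c, \ldots, x_n + c) = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i + c - \frac{1}{n}\sum_{j=1}^{n}(x_j + c)\right)^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x_n)^2$$
$$\Rightarrow t(x_1 + c, x_2 + c, \ldots, x_n + c) = t(x_1, x_2, \ldots, x_n) \ne t(x_1, x_2, \ldots, x_n) + c \quad \text{for } c \ne 0$$
So, we can say that $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X_n)^2$ is not location invariant.
Example: Show that the sample range $Y_n - Y_1$ is not location invariant.
Solution: Let $t(x_1, x_2, \ldots, x_n) = Y_n - Y_1 = \max(x_1, x_2, \ldots, x_n) - \min(x_1, x_2, \ldots, x_n)$. Then we have that
$$t(x_1 + c, \ldots, x_n + c) = \max(x_1 + c, \ldots, x_n + c) - \min(x_1 + c, \ldots, x_n + c)$$
$$= \max(x_1, \ldots, x_n) + c - \min(x_1, \ldots, x_n) - c = \max(x_1, \ldots, x_n) - \min(x_1, \ldots, x_n)$$
$$\Rightarrow t(x_1 + c, x_2 + c, \ldots, x_n + c) = t(x_1, x_2, \ldots, x_n)$$
So, like $s^2$, the range is unchanged (rather than shifted by $c$) when every observation is shifted by $c$; hence $Y_n - Y_1$ is not location invariant.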
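The definition can be spot-checked numerically by shifting a sample by a few constants $c$. A check on finitely many shifts can only refute location invariance, never prove it, so this is a sketch for illustration; the function names and the data are hypothetical.

```python
import statistics


def is_location_invariant(t, xs, shifts=(-3.0, 0.5, 10.0), tol=1e-9):
    """Spot check t(x_1 + c, ..., x_n + c) == t(x_1, ..., x_n) + c for a few shifts c."""
    base = t(xs)
    return all(abs(t([x + c for x in xs]) - (base + c)) <= tol for c in shifts)


def midrange(xs):
    return (min(xs) + max(xs)) / 2


def sample_range(xs):
    return max(xs) - min(xs)


if __name__ == "__main__":
    xs = [4.2, -1.0, 3.3, 0.7, 2.5]  # illustrative data
    print(is_location_invariant(midrange, xs))             # True
    print(is_location_invariant(statistics.variance, xs))  # False: s^2 is unchanged by a shift
    print(is_location_invariant(sample_range, xs))         # False: Y_n - Y_1 is unchanged by a shift
```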
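Location parameter
Let $\{f(\cdot\,;\theta);\ \theta \in \Omega\}$ be a family of densities indexed by a parameter $\theta$. The parameter $\theta$ is defined to be a location parameter if and only if the density $f(x;\theta)$ can be written as a function of $(x - \theta)$. That is, $f(x;\theta) = h(x - \theta)$ for some function $h(\cdot)$. Equivalently, $\theta$ is a location parameter for the density $f_X(x;\theta)$ of a random variable $X$ if and only if the distribution of $X - \theta$ does not depend on $\theta$.
We note that if $\theta$ is a location parameter for the family of densities $\{f(\cdot\,;\theta);\ \theta \in \Omega\}$, then the function $h(\cdot)$ of the definition is itself a density, namely $h(\cdot) = f(\cdot\,;0)$.
Example: If $f(x;\theta) = \phi_{\theta,1}(x)$, the $N(\theta, 1)$ density, show that $\theta$ is a location parameter.
Solution: Here, we have that
$$f(x;\theta) = \phi_{\theta,1}(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x-\theta)^2} = \phi_{0,1}(x - \theta) = h(x - \theta)$$
Or, we can say that if $X$ is distributed normally with mean $\theta$ and variance 1, then $(X - \theta)$ has a standard normal distribution, which does not depend on $\theta$. So, we can say that $\theta$ is a location parameter.
Example: If $f(x;\theta) = I_{\left(\theta - \frac{1}{2},\,\theta + \frac{1}{2}\right)}(x)$, show that $\theta$ is a location parameter.
Solution: Here the density is constant on its support, with height $\frac{1}{\left(\theta + \frac{1}{2}\right) - \left(\theta - \frac{1}{2}\right)} = 1$, so
$$f(x;\theta) = I_{\left(\theta - \frac{1}{2},\,\theta + \frac{1}{2}\right)}(x) = I_{\left(-\frac{1}{2},\,\frac{1}{2}\right)}(x - \theta) = h(x - \theta)$$
Hence, the distribution of $(X - \theta)$ is independent of $\theta$. So, we can say that $\theta$ is a location parameter.
Example: If $f(x;\theta) = \dfrac{1}{\pi\left[1 + (x - \theta)^2\right]}$, then show that $\theta$ is a location parameter.
Solution: Here, we have that
$$f(x;\theta) = \frac{1}{\pi\left[1 + (x - \theta)^2\right]} = h(x - \theta)$$
Hence, the distribution of $(X - \theta)$ is independent of $\theta$. So, we can say that $\theta$ is a location parameter.
Example: If $f(x;\theta) = \phi_{\theta,9}(x)$, the $N(\theta, 9)$ density, show that $\theta$ is a location parameter.
Solution: Here, we have that
$$f(x;\theta) = \phi_{\theta,9}(x) = \frac{1}{3\sqrt{2\pi}}e^{-\frac{1}{2\times 9}(x-\theta)^2} = \phi_{0,9}(x - \theta) = h(x - \theta)$$
Or, we can say that if $X$ is distributed normally with mean $\theta$ and variance 9, then $(X - \theta)$ has a normal distribution with mean 0 and variance 9, which does not depend on $\theta$. So, we can say that $\theta$ is a location parameter.
Let $X_1, X_2, \ldots, X_n$ be a random sample from a density $f(\cdot\,;\theta)$, where $\theta$ is a location parameter and $\Omega$ is the real line. Then the estimator
$$t(X_1, X_2, \ldots, X_n) = \frac{\int \theta\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}$$
is the estimator of $\theta$ which has uniformly smallest mean-squared error within the class of location-invariant estimators.
The estimator given in the above equation is defined to be the Pitman estimator for location.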
Pitman Estimator for Location Parameter ~ 3 of 8
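Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean $\theta$ and variance unity, where $\theta$ is a location parameter. Find the Pitman estimator of $\theta$.
Solution:
We know that the Pitman estimator for $\theta$ is given by
$$t(X_1, X_2, \ldots, X_n) = \frac{\int\theta\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int\prod_{i=1}^{n}f(X_i;\theta)\,d\theta} = \frac{\int\theta\left(\frac{1}{\sqrt{2\pi}}\right)^n\exp\left[-\frac{1}{2}\sum_{i=1}^{n}(X_i - \theta)^2\right]d\theta}{\int\left(\frac{1}{\sqrt{2\pi}}\right)^n\exp\left[-\frac{1}{2}\sum_{i=1}^{n}(X_i - \theta)^2\right]d\theta}$$
$$= \frac{\int\theta\exp\left[-\frac{1}{2}\left(\sum_{i=1}^{n}X_i^2 - 2\theta\sum_{i=1}^{n}X_i + n\theta^2\right)\right]d\theta}{\int\exp\left[-\frac{1}{2}\left(\sum_{i=1}^{n}X_i^2 - 2\theta\sum_{i=1}^{n}X_i + n\theta^2\right)\right]d\theta}$$
The factor $\exp\left[-\frac{1}{2}\sum X_i^2\right]$ does not involve $\theta$ and cancels, so
$$= \frac{\int\theta\exp\left[-\frac{1}{2}\left(-2n\bar X_n\theta + n\theta^2\right)\right]d\theta}{\int\exp\left[-\frac{1}{2}\left(-2n\bar X_n\theta + n\theta^2\right)\right]d\theta} = \frac{\int\theta\exp\left[-\frac{n}{2}\left(\theta^2 - 2\bar X_n\theta + \bar X_n^2\right)\right]d\theta}{\int\exp\left[-\frac{n}{2}\left(\theta^2 - 2\bar X_n\theta + \bar X_n^2\right)\right]d\theta}$$
(multiplying numerator and denominator by $e^{-n\bar X_n^2/2}$ to complete the square)
$$= \frac{\int\theta\exp\left[-\frac{1}{2}\left(\frac{\theta - \bar X_n}{1/\sqrt{n}}\right)^2\right]d\theta}{\int\exp\left[-\frac{1}{2}\left(\frac{\theta - \bar X_n}{1/\sqrt{n}}\right)^2\right]d\theta} = \frac{\int\theta\frac{1}{\sqrt{2\pi}/\sqrt{n}}\exp\left[-\frac{1}{2}\left(\frac{\theta - \bar X_n}{1/\sqrt{n}}\right)^2\right]d\theta}{\int\frac{1}{\sqrt{2\pi}/\sqrt{n}}\exp\left[-\frac{1}{2}\left(\frac{\theta - \bar X_n}{1/\sqrt{n}}\right)^2\right]d\theta}$$
The denominator is the integral of the $N\left(\bar X_n, \frac{1}{n}\right)$ density in $\theta$, which equals 1, and the numerator is the mean of that density. Hence
$$\Rightarrow t(X_1, X_2, \ldots, X_n) = E(\theta) = \bar X_n$$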
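The Pitman ratio can also be approximated by direct numerical integration, which is a useful check on the completing-the-square argument. This sketch uses the midpoint rule over a finite range $[lo, hi]$ that is assumed wide enough to hold essentially all of the likelihood mass; the helper names and the sample are illustrative.

```python
import math


def pitman_location(xs, log_f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of  ∫ θ Π f(x_i; θ) dθ / ∫ Π f(x_i; θ) dθ  over [lo, hi]."""
    h = (hi - lo) / steps
    thetas = [lo + (k + 0.5) * h for k in range(steps)]
    log_lik = [sum(log_f(x, t) for x in xs) for t in thetas]
    top = max(log_lik)  # rescale by the largest value to avoid underflow; it cancels in the ratio
    w = [math.exp(v - top) for v in log_lik]
    return sum(t * wt for t, wt in zip(thetas, w)) / sum(w)


def log_normal_unit_var(x, theta):
    """log of the N(theta, 1) density at x."""
    return -0.5 * (x - theta) ** 2 - 0.5 * math.log(2 * math.pi)


if __name__ == "__main__":
    xs = [1.2, -0.4, 2.5, 0.9, 1.8]
    print(pitman_location(xs, log_normal_unit_var, lo=-10.0, hi=12.0))  # numerical Pitman estimate
    print(sum(xs) / len(xs))                                            # sample mean
```

The two printed numbers agree to many decimal places, consistent with $t(X_1, \ldots, X_n) = \bar X_n$.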
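Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from a uniform distribution over the interval $\left(\theta - \frac{1}{2}, \theta + \frac{1}{2}\right)$, where $\theta$ is a location parameter. Find the Pitman estimator for $\theta$.
Solution:
We know that the Pitman estimator for $\theta$ is given by
$$t(X_1, X_2, \ldots, X_n) = \frac{\int\theta\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int\prod_{i=1}^{n}f(X_i;\theta)\,d\theta} = \frac{\int\theta\prod_{i=1}^{n}I_{\left(\theta - \frac{1}{2},\,\theta + \frac{1}{2}\right)}(X_i)\,d\theta}{\int\prod_{i=1}^{n}I_{\left(\theta - \frac{1}{2},\,\theta + \frac{1}{2}\right)}(X_i)\,d\theta}$$
Now $\prod_{i=1}^{n}I_{\left(\theta - \frac{1}{2},\,\theta + \frac{1}{2}\right)}(X_i) = 1$ if and only if $\theta - \frac{1}{2} < Y_1$ and $Y_n < \theta + \frac{1}{2}$, that is, if and only if $Y_n - \frac{1}{2} < \theta < Y_1 + \frac{1}{2}$, where $Y_1$ and $Y_n$ are the smallest and largest order statistics. Hence
$$t(X_1, X_2, \ldots, X_n) = \frac{\int_{Y_n - \frac{1}{2}}^{Y_1 + \frac{1}{2}}\theta\,d\theta}{\int_{Y_n - \frac{1}{2}}^{Y_1 + \frac{1}{2}}d\theta} = \frac{\frac{1}{2}\left[\left(Y_1 + \frac{1}{2}\right)^2 - \left(Y_n - \frac{1}{2}\right)^2\right]}{\left(Y_1 + \frac{1}{2}\right) - \left(Y_n - \frac{1}{2}\right)} = \frac{Y_1 + Y_n}{2}$$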
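For the uniform model the product of indicators equals 1 exactly when $Y_n - \frac{1}{2} < \theta < Y_1 + \frac{1}{2}$, so the Pitman ratio is the midpoint of that interval, the midrange $(Y_1 + Y_n)/2$. The sketch below confirms this by brute-force midpoint integration over a range assumed to contain the whole interval; the function name and the data (which must satisfy $Y_n - Y_1 < 1$ to be consistent with the model) are illustrative.

```python
def pitman_uniform_location(xs, steps=100_000):
    """Midpoint-rule approximation of the Pitman ratio for f(x; θ) = I_(θ-1/2, θ+1/2)(x).

    The integrand is θ (numerator) or 1 (denominator) wherever every x_i lies in
    (θ - 1/2, θ + 1/2), and 0 elsewhere."""
    lo, hi = min(xs) - 1.0, max(xs) + 1.0  # contains (max(xs) - 1/2, min(xs) + 1/2)
    h = (hi - lo) / steps
    num = den = 0.0
    for k in range(steps):
        theta = lo + (k + 0.5) * h
        if all(theta - 0.5 < x < theta + 0.5 for x in xs):
            num += theta
            den += 1.0
    if den == 0.0:
        raise ValueError("data are inconsistent with the model (range must be below 1)")
    return num / den


if __name__ == "__main__":
    xs = [3.1, 3.5, 2.9, 3.3]
    print(pitman_uniform_location(xs))  # approximately the midrange
    print((min(xs) + max(xs)) / 2)      # midrange (Y_1 + Y_n)/2
```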
Theorem
A Pitman estimator for location is a function of sufficient statistics.
Proof:
We know that if $S_1 = s_1(X_1, X_2, \ldots, X_n), \ldots, S_k = s_k(X_1, X_2, \ldots, X_n)$ is a set of sufficient statistics, then by the factorization criterion
$$\prod_{i=1}^{n}f(x_i;\theta) = g(s_1, s_2, \ldots, s_k;\theta)\,h(x_1, x_2, \ldots, x_n)$$
So
$$t(X_1, X_2, \ldots, X_n) = \frac{\int\theta\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int\prod_{i=1}^{n}f(X_i;\theta)\,d\theta} = \frac{\int\theta\,g(S_1, S_2, \ldots, S_k;\theta)\,h(X_1, X_2, \ldots, X_n)\,d\theta}{\int g(S_1, S_2, \ldots, S_k;\theta)\,h(X_1, X_2, \ldots, X_n)\,d\theta}$$
Since $h$ does not involve $\theta$, it cancels:
$$\Rightarrow t(X_1, X_2, \ldots, X_n) = \frac{\int\theta\,g(S_1, S_2, \ldots, S_k;\theta)\,d\theta}{\int g(S_1, S_2, \ldots, S_k;\theta)\,d\theta}$$
The above is a function of the sufficient statistics only. So, we can say that a Pitman estimator is a function of the sufficient statistics.
Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean $\theta$ and variance 9, where $\theta$ is a location parameter. Find the Pitman estimator of $\theta$ when $n = 15$ and $\sum_{i=1}^{15}x_i = 225$.
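Solution:
We know that the Pitman estimator for $\theta$ is given by
$$t(X_1, X_2, \ldots, X_n) = \frac{\int\theta\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int\prod_{i=1}^{n}f(X_i;\theta)\,d\theta} = \frac{\int\theta\left(\frac{1}{3\sqrt{2\pi}}\right)^n\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{X_i - \theta}{3}\right)^2\right]d\theta}{\int\left(\frac{1}{3\sqrt{2\pi}}\right)^n\exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{X_i - \theta}{3}\right)^2\right]d\theta}$$
$$= \frac{\int\theta\exp\left[-\frac{1}{2\times 9}\left(\sum_{i=1}^{n}X_i^2 - 2\theta\sum_{i=1}^{n}X_i + n\theta^2\right)\right]d\theta}{\int\exp\left[-\frac{1}{2\times 9}\left(\sum_{i=1}^{n}X_i^2 - 2\theta\sum_{i=1}^{n}X_i + n\theta^2\right)\right]d\theta} = \frac{\int\theta\exp\left[-\frac{n}{2\times 9}\left(-2\bar X_n\theta + \theta^2\right)\right]d\theta}{\int\exp\left[-\frac{n}{2\times 9}\left(-2\bar X_n\theta + \theta^2\right)\right]d\theta}$$
$$= \frac{\int\theta\exp\left[-\frac{n}{2\times 9}\left(\theta^2 - 2\bar X_n\theta + \bar X_n^2\right)\right]d\theta}{\int\exp\left[-\frac{n}{2\times 9}\left(\theta^2 - 2\bar X_n\theta + \bar X_n^2\right)\right]d\theta} = \frac{\int\theta\frac{1}{\frac{3}{\sqrt{n}}\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\theta - \bar X_n}{3/\sqrt{n}}\right)^2\right]d\theta}{\int\frac{1}{\frac{3}{\sqrt{n}}\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\theta - \bar X_n}{3/\sqrt{n}}\right)^2\right]d\theta}$$
The denominator is the integral of the $N\left(\bar X_n, \frac{9}{n}\right)$ density in $\theta$, which equals 1, so
$$\Rightarrow t(X_1, X_2, \ldots, X_n) = E(\theta) = \bar X_n = \frac{1}{n}\sum_{i=1}^{n}X_i$$
So, we can say that $\bar X_n$ is the Pitman estimator. Now, when $\sum_{i=1}^{15}x_i = 225$, the Pitman estimate of $\theta$ is given by
$$\bar x_n = \frac{1}{15}\sum_{i=1}^{15}x_i = \frac{225}{15} = 15$$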
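Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density
$$f(x;\theta) = e^{-(x-\theta)}I_{(\theta,\infty)}(x) \quad \text{for } -\infty < \theta < \infty$$
Find the Pitman estimator of $\theta$.
Solution:
We know that the Pitman estimator for $\theta$ is given by
$$t(X_1, X_2, \ldots, X_n) = \frac{\int\theta\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int\prod_{i=1}^{n}f(X_i;\theta)\,d\theta} = \frac{\int\theta\prod_{i=1}^{n}\exp\left[-(X_i - \theta)\right]I_{(\theta,\infty)}(X_i)\,d\theta}{\int\prod_{i=1}^{n}\exp\left[-(X_i - \theta)\right]I_{(\theta,\infty)}(X_i)\,d\theta}$$
Since $\prod_{i=1}^{n}I_{(\theta,\infty)}(X_i) = 1$ if and only if $\theta < Y_1 = \min(X_1, \ldots, X_n)$,
$$= \frac{\int_{-\infty}^{Y_1}\theta\exp\left[-\sum_{i=1}^{n}(X_i - \theta)\right]d\theta}{\int_{-\infty}^{Y_1}\exp\left[-\sum_{i=1}^{n}(X_i - \theta)\right]d\theta} = \frac{\int_{-\infty}^{Y_1}\theta\exp\left[-\sum_{i=1}^{n}X_i + n\theta\right]d\theta}{\int_{-\infty}^{Y_1}\exp\left[-\sum_{i=1}^{n}X_i + n\theta\right]d\theta} = \frac{\int_{-\infty}^{Y_1}\theta e^{n\theta}d\theta}{\int_{-\infty}^{Y_1}e^{n\theta}d\theta}$$
$$= \frac{\left[\theta\frac{e^{n\theta}}{n}\right]_{-\infty}^{Y_1} - \int_{-\infty}^{Y_1}\frac{e^{n\theta}}{n}d\theta}{\left[\frac{e^{n\theta}}{n}\right]_{-\infty}^{Y_1}} = \frac{\frac{Y_1e^{nY_1}}{n} - \frac{e^{nY_1}}{n^2}}{\frac{e^{nY_1}}{n}}$$
$$\therefore t(X_1, X_2, \ldots, X_n) = Y_1 - \frac{1}{n}$$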
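A numerical check of $t = Y_1 - \frac{1}{n}$: for $\theta < Y_1$ the likelihood is proportional to $e^{n(\theta - Y_1)}$ and it is 0 for $\theta \ge Y_1$. The sketch replaces the lower limit $-\infty$ by $Y_1 - 40/n$, an assumption under which the neglected tail is of relative size about $e^{-40}$; the function name and data are illustrative.

```python
import math


def pitman_shifted_exponential(xs, steps=100_000):
    """Midpoint-rule approximation of the Pitman ratio for f(x; θ) = exp(-(x - θ)) I_(θ, ∞)(x)."""
    n, y1 = len(xs), min(xs)
    width = 40.0 / n  # truncate -∞ at Y1 - 40/n; exp(-40) is negligible
    lo = y1 - width
    h = width / steps
    num = den = 0.0
    for k in range(steps):
        theta = lo + (k + 0.5) * h
        w = math.exp(n * (theta - y1))  # likelihood divided by its maximum value at θ = Y1
        num += theta * w
        den += w
    return num / den


if __name__ == "__main__":
    xs = [2.3, 4.1, 2.9, 3.6, 5.2]
    print(pitman_shifted_exponential(xs))  # numerical Pitman estimate
    print(min(xs) - 1 / len(xs))           # Y_1 - 1/n
```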
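Solution:
Here $f(x;\theta) = \theta^x(1-\theta)^{1-x}$, $x = 0, 1$, with $0 < \theta < 1$, so the integrals run over $(0, 1)$. We know that the Pitman estimator for $\theta$ is given by
$$t(X_1, X_2, \ldots, X_n) = \frac{\int\theta\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int\prod_{i=1}^{n}f(X_i;\theta)\,d\theta} = \frac{\int_0^1\theta\prod_{i=1}^{n}\theta^{X_i}(1-\theta)^{1-X_i}\,d\theta}{\int_0^1\prod_{i=1}^{n}\theta^{X_i}(1-\theta)^{1-X_i}\,d\theta} = \frac{\int_0^1\theta\cdot\theta^{\sum X_i}(1-\theta)^{n-\sum X_i}\,d\theta}{\int_0^1\theta^{\sum X_i}(1-\theta)^{n-\sum X_i}\,d\theta}$$
$$= \frac{\int_0^1\theta^{\sum X_i + 1}(1-\theta)^{n-\sum X_i}\,d\theta}{\int_0^1\theta^{\sum X_i}(1-\theta)^{n-\sum X_i}\,d\theta} = \frac{\int_0^1\theta^{\sum X_i + 2 - 1}(1-\theta)^{n - \sum X_i + 1 - 1}\,d\theta}{\int_0^1\theta^{\sum X_i + 1 - 1}(1-\theta)^{n - \sum X_i + 1 - 1}\,d\theta} = \frac{\beta\left(\sum_{i=1}^{n}X_i + 2,\ n - \sum_{i=1}^{n}X_i + 1\right)}{\beta\left(\sum_{i=1}^{n}X_i + 1,\ n - \sum_{i=1}^{n}X_i + 1\right)}$$
$$= \frac{\Gamma\left(\sum X_i + 2\right)\Gamma\left(n - \sum X_i + 1\right)}{\Gamma(n+3)}\times\frac{\Gamma(n+2)}{\Gamma\left(\sum X_i + 1\right)\Gamma\left(n - \sum X_i + 1\right)}$$
$$\therefore t(X_1, X_2, \ldots, X_n) = \frac{\sum_{i=1}^{n}X_i + 1}{n+2}$$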
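Solution:
We know that the Pitman estimator for $\theta$ is given by
$$t(X_1, X_2, \ldots, X_n) = \frac{\int\theta\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int\prod_{i=1}^{n}f(X_i;\theta)\,d\theta} = \frac{\int_0^\infty\theta\prod_{i=1}^{n}\theta e^{-\theta}\,d\theta}{\int_0^\infty\prod_{i=1}^{n}\theta e^{-\theta}\,d\theta} = \frac{\int_0^\infty\theta\,\theta^n e^{-n\theta}\,d\theta}{\int_0^\infty\theta^n e^{-n\theta}\,d\theta}$$
$$= \frac{\int_0^\infty\theta^{n+1}e^{-n\theta}\,d\theta}{\int_0^\infty\theta^n e^{-n\theta}\,d\theta} = \frac{\int_0^\infty e^{-n\theta}\theta^{n+2-1}\,d\theta}{\int_0^\infty e^{-n\theta}\theta^{n+1-1}\,d\theta} = \frac{\Gamma(n+2)/n^{n+2}}{\Gamma(n+1)/n^{n+1}} = \frac{n+1}{n}$$
$$\therefore t(X_1, X_2, \ldots, X_n) = 1 + \frac{1}{n}$$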
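Scale invariant
An estimator $T = t(X_1, X_2, \ldots, X_n)$ is defined to be scale invariant if and only if
$$t(cx_1, cx_2, \ldots, cx_n) = c\,t(x_1, x_2, \ldots, x_n)$$
for all values $x_1, x_2, \ldots, x_n$ and all $c > 0$.
Example: Show that $\bar X_n = \frac{1}{n}\sum_{i=1}^{n}X_i$ is scale invariant.
Solution: Let $t(x_1, x_2, \ldots, x_n) = \bar x_n = \frac{1}{n}\sum_{i=1}^{n}x_i$. Then
$$t(cx_1, cx_2, \ldots, cx_n) = \frac{1}{n}\sum_{i=1}^{n}cx_i = c\,\bar x_n = c\,t(x_1, x_2, \ldots, x_n)$$
So, we can say that $\bar X_n$ is scale invariant.
Example: Show that $\frac{Y_1 + Y_n}{2}$ is scale invariant, where $Y_1$ is the smallest order statistic and $Y_n$ is the largest order statistic.
Solution: For $c > 0$, $\min(cx_1, \ldots, cx_n) = c\min(x_1, \ldots, x_n)$ and $\max(cx_1, \ldots, cx_n) = c\max(x_1, \ldots, x_n)$, so
$$t(cx_1, cx_2, \ldots, cx_n) = \frac{\min(cx_1, \ldots, cx_n) + \max(cx_1, \ldots, cx_n)}{2} = c\,\frac{\min(x_1, \ldots, x_n) + \max(x_1, \ldots, x_n)}{2}$$
$$\Rightarrow t(cx_1, cx_2, \ldots, cx_n) = c\,t(x_1, x_2, \ldots, x_n)$$
So, we can say that $\frac{Y_1 + Y_n}{2}$ is scale invariant.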
Pitman Estimator for Scale Parameter ~ 1 of 5
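Example: Is $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X_n)^2$ scale invariant?
Solution:
Let $t(x_1, x_2, \ldots, x_n) = s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x_n)^2$. Then we have that
$$t(cx_1, cx_2, \ldots, cx_n) = \frac{1}{n-1}\sum_{i=1}^{n}\left(cx_i - \frac{1}{n}\sum_{j=1}^{n}cx_j\right)^2 = c^2\,\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x_n)^2$$
$$\Rightarrow t(cx_1, cx_2, \ldots, cx_n) = c^2\,t(x_1, x_2, \ldots, x_n)$$
So $s^2$ is multiplied by $c^2$, not $c$, and is therefore not scale invariant. The sample standard deviation $s = \sqrt{s^2}$ satisfies $s(cx_1, \ldots, cx_n) = c\,s(x_1, \ldots, x_n)$ for $c > 0$, so $s$ is scale invariant.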
Scale parameter
Let $\{f(\cdot\,;\theta);\ \theta > 0\}$ be a family of densities indexed by a real parameter $\theta$. The parameter $\theta$ is defined to be a scale parameter if and only if the density $f(x;\theta)$ can be written as $\frac{1}{\theta}$ times a function of $\frac{x}{\theta}$. That is,
$$f(x;\theta) = \frac{1}{\theta}h\left(\frac{x}{\theta}\right)$$
for some function $h(\cdot)$. Equivalently, $\theta$ is a scale parameter for the density $f_X(x;\theta)$ of a random variable $X$ if and only if the distribution of $\frac{X}{\theta}$ does not depend on $\theta$.
We note that if $\theta$ is a scale parameter for the family of densities $\{f(\cdot\,;\theta);\ \theta > 0\}$, then the function $h(\cdot)$ of the definition is itself a density, namely $h(\cdot) = f(\cdot\,;1)$.
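Example: If $X$ is distributed normally with mean 0 and variance $\sigma^2$, show that $\theta = \sigma$ is a scale parameter.
Solution: Here, with $\theta = \sigma$, we have that
$$f(x;\theta) = \phi_{0,\sigma^2}(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2} = \frac{1}{\sigma}\phi_{0,1}\left(\frac{x}{\sigma}\right) = \frac{1}{\sigma}h\left(\frac{x}{\sigma}\right)$$
Or, we can say that if $X$ is distributed normally with mean 0 and variance $\sigma^2$, then $\frac{X}{\sigma}$ has a standard normal distribution. Hence, the distribution of $\frac{X}{\theta}$ is independent of $\theta$.
So, we can say that $\theta = \sigma$ is a scale parameter.
Example: If $f(x;\theta) = \frac{1}{\theta}e^{-\frac{x}{\theta}}I_{(0,\infty)}(x)$, then show that $\theta$ is a scale parameter.
Solution: Here, we have that
$$f(x;\theta) = \frac{1}{\theta}e^{-\frac{x}{\theta}}I_{(0,\infty)}(x) = \frac{1}{\theta}h\left(\frac{x}{\theta}\right), \quad \text{with } h(z) = e^{-z}I_{(0,\infty)}(z)$$
Hence, the distribution of $\frac{X}{\theta}$ is independent of $\theta$. So, we can say that $\theta$ is a scale parameter.
Example: If $f(x;\theta) = \frac{1}{\theta}I_{(0,\theta)}(x)$, then show that $\theta$ is a scale parameter.
Solution: Here, we have that
$$f(x;\theta) = \frac{1}{\theta}I_{(0,\theta)}(x) = \frac{1}{\theta}I_{(0,1)}\left(\frac{x}{\theta}\right) = \frac{1}{\theta}h\left(\frac{x}{\theta}\right)$$
Hence, the distribution of $\frac{X}{\theta}$ is independent of $\theta$. So, we can say that $\theta$ is a scale parameter.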
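Let $X_1, X_2, \ldots, X_n$ be a random sample from a density $f(\cdot\,;\theta)$, where $\theta > 0$ is a scale parameter. Assume that $f(x;\theta) = 0$ for $x \le 0$; that is, the random variables $X_i$ assume only positive values. Then the estimator
$$t(X_1, X_2, \ldots, X_n) = \frac{\int_0^\infty\left(\frac{1}{\theta^2}\right)\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int_0^\infty\left(\frac{1}{\theta^3}\right)\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}$$
is the estimator of $\theta$ which has uniformly smallest risk within the class of scale-invariant estimators for the loss function
$$l(t;\theta) = \frac{(t - \theta)^2}{\theta^2}.$$
The estimator given in the above equation is defined to be the Pitman estimator for scale.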
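Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density
$$f(x;\theta) = \frac{1}{\theta}I_{(0,\theta)}(x)$$
Find the Pitman estimator of $\theta$ for the scale parameter.
Solution:
We know that the Pitman estimator for $\theta$ is given by
$$t(X_1, X_2, \ldots, X_n) = \frac{\int_0^\infty\left(\frac{1}{\theta^2}\right)\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int_0^\infty\left(\frac{1}{\theta^3}\right)\prod_{i=1}^{n}f(X_i;\theta)\,d\theta} = \frac{\int_0^\infty\left(\frac{1}{\theta^2}\right)\prod_{i=1}^{n}\frac{1}{\theta}I_{(0,\theta)}(X_i)\,d\theta}{\int_0^\infty\left(\frac{1}{\theta^3}\right)\prod_{i=1}^{n}\frac{1}{\theta}I_{(0,\theta)}(X_i)\,d\theta}$$
Since $\prod_{i=1}^{n}I_{(0,\theta)}(X_i) = 1$ if and only if $\theta > Y_n = \max(X_1, \ldots, X_n)$,
$$= \frac{\int_{Y_n}^\infty\frac{1}{\theta^2}\left(\frac{1}{\theta}\right)^n d\theta}{\int_{Y_n}^\infty\frac{1}{\theta^3}\left(\frac{1}{\theta}\right)^n d\theta} = \frac{\int_{Y_n}^\infty\theta^{-(n+2)}\,d\theta}{\int_{Y_n}^\infty\theta^{-(n+3)}\,d\theta} = \frac{\left[\frac{\theta^{-(n+2)+1}}{-(n+2)+1}\right]_{Y_n}^\infty}{\left[\frac{\theta^{-(n+3)+1}}{-(n+3)+1}\right]_{Y_n}^\infty} = \frac{\frac{1}{-(n+1)}\left(0 - Y_n^{-n-1}\right)}{\frac{1}{-(n+2)}\left(0 - Y_n^{-n-2}\right)}$$
$$\therefore t(X_1, X_2, \ldots, X_n) = \frac{n+2}{n+1}\times Y_n$$
Note: We know that $Y_n$ is a complete sufficient statistic and $E(Y_n) = \frac{n}{n+1}\theta$. So, by the Lehmann-Scheffe theorem, $\frac{n+1}{n}Y_n$ is the UMVUE of $\theta$.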
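The scale Pitman ratio can be approximated numerically by substituting $\theta = e^s$ (so $d\theta = \theta\,ds$) and using the midpoint rule in $s$. This sketch assumes the integration range starts at $\log Y_n$, where the product of indicators switches on, and stops $12$ units higher in $s$, where the neglected tail is negligible; the helper names and the sample are illustrative, and `math.prod` needs Python 3.8 or later.

```python
import math


def pitman_scale(xs, density, s_lo, s_hi, steps=100_000):
    """Approximate  ∫ θ^-2 Π f(x_i; θ) dθ / ∫ θ^-3 Π f(x_i; θ) dθ  over θ in (e^s_lo, e^s_hi)."""
    h = (s_hi - s_lo) / steps
    num = den = 0.0
    for k in range(steps):
        theta = math.exp(s_lo + (k + 0.5) * h)
        lik = math.prod(density(x, theta) for x in xs)
        num += theta ** -2 * lik * theta  # extra factor θ comes from dθ = θ ds
        den += theta ** -3 * lik * theta
    return num / den


def uniform_density(x, theta):
    """f(x; θ) = (1/θ) I_(0, θ)(x)."""
    return 1.0 / theta if 0.0 < x < theta else 0.0


if __name__ == "__main__":
    xs = [0.8, 2.6, 1.9, 0.3]
    n, y_n = len(xs), max(xs)
    approx = pitman_scale(xs, uniform_density, math.log(y_n), math.log(y_n) + 12.0)
    print(approx, (n + 2) / (n + 1) * y_n)  # numerical value vs (n+2)/(n+1) * Y_n
```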
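Solution:
Here $f(x;\theta) = \frac{1}{\theta}e^{-\frac{x}{\theta}}I_{(0,\infty)}(x)$. We know that the Pitman estimator of $\theta$ for the scale parameter is given by
$$t(X_1, X_2, \ldots, X_n) = \frac{\int_0^\infty\left(\frac{1}{\theta^2}\right)\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int_0^\infty\left(\frac{1}{\theta^3}\right)\prod_{i=1}^{n}f(X_i;\theta)\,d\theta} = \frac{\int_0^\infty\left(\frac{1}{\theta^2}\right)\prod_{i=1}^{n}\frac{1}{\theta}e^{-\frac{X_i}{\theta}}\,d\theta}{\int_0^\infty\left(\frac{1}{\theta^3}\right)\prod_{i=1}^{n}\frac{1}{\theta}e^{-\frac{X_i}{\theta}}\,d\theta} = \frac{\int_0^\infty\theta^{-(n+2)}e^{-\sum X_i/\theta}\,d\theta}{\int_0^\infty\theta^{-(n+3)}e^{-\sum X_i/\theta}\,d\theta}$$
Substituting $Z = \frac{\sum X_i}{\theta}$, so that $\theta = \frac{\sum X_i}{Z}$ and $d\theta = -\frac{\sum X_i}{Z^2}dZ$,
$$= \left(\sum_{i=1}^{n}X_i\right)\frac{\int_0^\infty Z^n e^{-Z}\,dZ}{\int_0^\infty Z^{n+1}e^{-Z}\,dZ} = \left(\sum_{i=1}^{n}X_i\right)\frac{\int_0^\infty e^{-Z}Z^{n+1-1}\,dZ}{\int_0^\infty e^{-Z}Z^{n+2-1}\,dZ} = \left(\sum_{i=1}^{n}X_i\right)\frac{\Gamma(n+1)}{\Gamma(n+2)}$$
$$\therefore t(X_1, X_2, \ldots, X_n) = \frac{\sum_{i=1}^{n}X_i}{n+1}$$
Note: It can be shown that the UMVUE of $\theta$ is $\frac{\sum_{i=1}^{n}X_i}{n}$. Again, note that $\frac{\sum_{i=1}^{n}X_i}{n}$ is a scale-invariant estimator, and since $\frac{\sum_{i=1}^{n}X_i}{n+1}$ is the scale-invariant estimator having uniformly smallest risk for the loss function $l(t;\theta) = \frac{(t - \theta)^2}{\theta^2}$, the risk of $\frac{\sum_{i=1}^{n}X_i}{n+1}$ is uniformly smaller than the risk of $\frac{\sum_{i=1}^{n}X_i}{n}$. Also, since here the risk equals $\frac{1}{\theta^2}$ times the MSE, the MSE of $\frac{\sum_{i=1}^{n}X_i}{n+1}$ is uniformly smaller than the MSE of $\frac{\sum_{i=1}^{n}X_i}{n}$.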
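The MSE comparison in the note can be made concrete. With $S = \sum X_i$ we have $E(S) = n\theta$ and $Var(S) = n\theta^2$, so $MSE(S/c) = n\theta^2/c^2 + (n\theta/c - \theta)^2$, which gives $\theta^2/n$ for $c = n$ and $\theta^2/(n+1)$ for $c = n + 1$. The sketch below evaluates these exact values and a Monte Carlo check; the illustrative choices $n = 4$, $\theta = 2$ and the function names are assumptions.

```python
import random


def exact_mse(n, theta):
    """Exact MSEs of S/n and S/(n+1), where S is a sum of n exponentials with mean θ."""
    def mse(c):
        return n * theta ** 2 / c ** 2 + (n * theta / c - theta) ** 2  # variance + bias^2
    return mse(n), mse(n + 1)


def simulated_mse(n, theta, reps=100_000, seed=1):
    """Monte Carlo MSEs of the same two estimators."""
    rng = random.Random(seed)
    err_n = err_n1 = 0.0
    for _ in range(reps):
        s = sum(rng.expovariate(1.0 / theta) for _ in range(n))  # expovariate takes the rate 1/θ
        err_n += (s / n - theta) ** 2
        err_n1 += (s / (n + 1) - theta) ** 2
    return err_n / reps, err_n1 / reps


if __name__ == "__main__":
    print(exact_mse(4, 2.0))      # (θ²/n, θ²/(n+1)) = (1.0, 0.8) up to rounding
    print(simulated_mse(4, 2.0))  # close to the exact values
```

Dividing each MSE by $\theta^2$ gives the risk under $l(t;\theta) = (t-\theta)^2/\theta^2$, namely $1/n$ versus $1/(n+1)$, neither of which depends on $\theta$.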
Decision Function
A decision function $\delta(x)$ is a statistic that takes values in $D$, the set of possible decisions; that is, $\delta$ is a Borel measurable function that maps $R^n$ into $D$.
Prior Distribution
Let $f(\theta)$ be the probability distribution of the parameter $\theta$, which summarizes the information about $\theta$ available prior to obtaining the sample observations. Then $f(\theta)$ is called the prior distribution of $\theta$.
Posterior Distribution
Consider a random variable $X$ whose distribution is denoted by $f(x \mid \theta)$. This distribution depends on $\theta$, which is itself treated as a random variable with prior distribution $f(\theta)$. Let $x_1, x_2, \ldots, x_n$ be a random sample; then the joint distribution of the sample and $\theta$ can be written as
$$f(x_1, x_2, \ldots, x_n, \theta) = f(\theta)\,f(x_1, x_2, \ldots, x_n \mid \theta)$$
The posterior distribution of $\theta$ is the conditional distribution of $\theta$ given the sample values. So,
$$f(\theta \mid x_1, x_2, \ldots, x_n) = \frac{f(x_1, x_2, \ldots, x_n, \theta)}{f(x_1, x_2, \ldots, x_n)} = \frac{f(x_1, x_2, \ldots, x_n \mid \theta)\,f(\theta)}{f(x_1, x_2, \ldots, x_n)}$$
where $f(x_1, x_2, \ldots, x_n, \theta)$ is the joint distribution of the sample and $\theta$.
Thus $f(\theta \mid x_1, x_2, \ldots, x_n)$ is known as the posterior distribution of $\theta$.
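Example: The time to failure of a transistor is known to be exponentially distributed with parameter $\theta$, having the density function
$$f(x \mid \theta) = \theta e^{-\theta x};\quad x > 0$$
Assume that the prior distribution of $\Theta$ is given by
$$g_\Theta(\theta) = ke^{-k\theta};\quad \theta > 0$$
That is, $\Theta$ is also exponentially distributed over the interval $(0, \infty)$. Find the posterior distribution of $\Theta$.
Solution: We know that the posterior distribution of $\Theta$ is given by
$$f_{\Theta \mid X_1 = x_1, \ldots, X_n = x_n}(\theta \mid x_1, \ldots, x_n) = \frac{f_{X_1, \ldots, X_n, \Theta}(x_1, \ldots, x_n, \theta)}{f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)} = \frac{\prod_{i=1}^{n}\left[f(x_i \mid \theta)\right]g_\Theta(\theta)}{\int\prod_{i=1}^{n}\left[f(x_i \mid \theta)\right]g_\Theta(\theta)\,d\theta}$$
$$= \frac{\theta^n e^{-\theta\sum x_i}ke^{-k\theta}}{\int_0^\infty\theta^n e^{-\theta\sum x_i}ke^{-k\theta}\,d\theta} = \frac{\theta^n e^{-\theta\left(\sum x_i + k\right)}}{\int_0^\infty\theta^n e^{-\theta\left(\sum x_i + k\right)}\,d\theta} = \frac{\theta^n e^{-\theta\left(\sum x_i + k\right)}}{\int_0^\infty e^{-\left(\sum x_i + k\right)\theta}\theta^{n+1-1}\,d\theta}$$
$$= \frac{\theta^n e^{-\theta\left(\sum x_i + k\right)}}{\Gamma(n+1)\big/\left(\sum x_i + k\right)^{n+1}} = \frac{\left(\sum_{i=1}^{n}x_i + k\right)^{n+1}}{\Gamma(n+1)}e^{-\left(\sum x_i + k\right)\theta}\theta^{n+1-1}$$
$$\Rightarrow f_{\Theta \mid X_1 = x_1, \ldots, X_n = x_n}(\theta \mid x_1, \ldots, x_n) = \text{Gamma}\left(n + 1,\ \sum_{i=1}^{n}x_i + k\right);\quad \theta \ge 0$$
that is, a gamma distribution with shape $n + 1$ and rate $\sum_{i=1}^{n}x_i + k$.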
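Example: Let $X_1, X_2, \ldots, X_n$ denote a random sample from a normal distribution with the density
$$f(x \mid \theta) = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}(x - \theta)^2\right];\quad -\infty < x < \infty$$
Assume that the prior distribution of $\Theta$ is given by
$$g_\Theta(\theta) = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\theta^2\right];\quad -\infty < \theta < \infty$$
That is, $\Theta$ is standard normal. Find the posterior distribution of $\Theta$.
Solution:
We know that the posterior distribution of $\Theta$ is given by
$$f_{\Theta \mid X_1 = x_1, \ldots, X_n = x_n}(\theta \mid x_1, \ldots, x_n) = \frac{f_{X_1, \ldots, X_n, \Theta}(x_1, \ldots, x_n, \theta)}{f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)} = \frac{\prod_{i=1}^{n}\left[f(x_i \mid \theta)\right]g_\Theta(\theta)}{\int\prod_{i=1}^{n}\left[f(x_i \mid \theta)\right]g_\Theta(\theta)\,d\theta}$$
$$= \frac{\left(\frac{1}{\sqrt{2\pi}}\right)^n\exp\left[-\frac{1}{2}\sum_{i=1}^{n}(x_i - \theta)^2\right]\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\theta^2\right]}{\int_{-\infty}^{\infty}\left(\frac{1}{\sqrt{2\pi}}\right)^n\exp\left[-\frac{1}{2}\sum_{i=1}^{n}(x_i - \theta)^2\right]\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\theta^2\right]d\theta}$$
Since $\sum(x_i - \theta)^2 + \theta^2 = (n+1)\theta^2 - 2n\bar x\theta + \sum x_i^2$, the factors free of $\theta$ cancel and, completing the square,
$$= \frac{\exp\left[-\frac{n+1}{2}\left(\theta^2 - 2\theta\frac{n}{n+1}\bar x + \left(\frac{n}{n+1}\bar x\right)^2\right)\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac{n+1}{2}\left(\theta^2 - 2\theta\frac{n}{n+1}\bar x + \left(\frac{n}{n+1}\bar x\right)^2\right)\right]d\theta} = \frac{\exp\left[-\frac{1}{2}\left(\frac{\theta - \frac{n}{n+1}\bar x}{1/\sqrt{n+1}}\right)^2\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac{1}{2}\left(\frac{\theta - \frac{n}{n+1}\bar x}{1/\sqrt{n+1}}\right)^2\right]d\theta}$$
$$= \frac{\frac{1}{\sqrt{2\pi}/\sqrt{n+1}}\exp\left[-\frac{1}{2}\left(\frac{\theta - \frac{n}{n+1}\bar x}{1/\sqrt{n+1}}\right)^2\right]}{\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}/\sqrt{n+1}}\exp\left[-\frac{1}{2}\left(\frac{\theta - \frac{n}{n+1}\bar x}{1/\sqrt{n+1}}\right)^2\right]d\theta}$$
where the denominator equals 1.
$$\Rightarrow f_{\Theta \mid X_1 = x_1, \ldots, X_n = x_n}(\theta \mid x_1, \ldots, x_n) = N\left(\theta;\ \frac{n}{n+1}\bar x,\ \frac{1}{n+1}\right);\quad -\infty < \theta < \infty$$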
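Example:
Let $X_1, X_2, \ldots, X_n$ denote a random sample from a Poisson distribution with the density
$$f(x \mid \theta) = \frac{e^{-\theta}\theta^x}{x!};\quad x = 0, 1, 2, \ldots$$
Assume that the prior distribution of $\Theta$ is given by
$$g_\Theta(\theta) = \frac{\left(\frac{1}{\beta}\right)^\alpha}{\Gamma(\alpha)}e^{-\frac{1}{\beta}\theta}\theta^{\alpha-1};\quad \theta > 0$$
That is, $\Theta$ has a gamma distribution with shape $\alpha$ and scale $\beta$. Find the posterior distribution of $\Theta$.
Solution: We know that the posterior distribution of $\Theta$ is given by
$$f_{\Theta \mid X_1 = x_1, \ldots, X_n = x_n}(\theta \mid x_1, \ldots, x_n) = \frac{f_{X_1, \ldots, X_n, \Theta}(x_1, \ldots, x_n, \theta)}{f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)} = \frac{\prod_{i=1}^{n}\left[f(x_i \mid \theta)\right]g_\Theta(\theta)}{\int\prod_{i=1}^{n}\left[f(x_i \mid \theta)\right]g_\Theta(\theta)\,d\theta}$$
$$= \frac{\dfrac{e^{-n\theta}\theta^{\sum x_i}}{\prod_{i=1}^{n}(x_i!)}\times\dfrac{\left(\frac{1}{\beta}\right)^\alpha}{\Gamma(\alpha)}e^{-\frac{1}{\beta}\theta}\theta^{\alpha-1}}{\displaystyle\int_0^\infty\dfrac{e^{-n\theta}\theta^{\sum x_i}}{\prod_{i=1}^{n}(x_i!)}\times\dfrac{\left(\frac{1}{\beta}\right)^\alpha}{\Gamma(\alpha)}e^{-\frac{1}{\beta}\theta}\theta^{\alpha-1}\,d\theta} = \frac{e^{-\left(n + \frac{1}{\beta}\right)\theta}\theta^{\left(\sum x_i + \alpha - 1\right)}}{\int_0^\infty e^{-\left(n + \frac{1}{\beta}\right)\theta}\theta^{\left(\sum x_i + \alpha - 1\right)}\,d\theta}$$
$$= \frac{e^{-\left(n + \frac{1}{\beta}\right)\theta}\theta^{\left(\sum x_i + \alpha - 1\right)}}{\Gamma\left(\sum x_i + \alpha\right)\big/\left(n + \frac{1}{\beta}\right)^{\sum x_i + \alpha}} = \frac{\left(n + \frac{1}{\beta}\right)^{\sum x_i + \alpha}}{\Gamma\left(\sum_{i=1}^{n}x_i + \alpha\right)}e^{-\left(n + \frac{1}{\beta}\right)\theta}\theta^{\left(\sum x_i + \alpha - 1\right)};\quad \theta > 0$$
$$\Rightarrow f_{\Theta \mid X_1 = x_1, \ldots, X_n = x_n}(\theta \mid x_1, \ldots, x_n) = \text{Gamma}\left(\sum_{i=1}^{n}x_i + \alpha,\ n + \frac{1}{\beta}\right)$$
that is, a gamma distribution with shape $\sum_{i=1}^{n}x_i + \alpha$ and rate $n + \frac{1}{\beta}$.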
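A simple way to check a conjugate update is to confirm that the log of (likelihood × prior) differs from the claimed posterior log-density only by a constant in $\theta$. The sketch does this for the Poisson-gamma case above, with the prior parameterized by shape $\alpha$ and scale $\beta$ and the posterior by shape and rate; the function names and numbers are illustrative.

```python
import math


def posterior_params(xs, alpha, beta):
    """Gamma(shape, rate) posterior for a Poisson mean under a Gamma(shape α, scale β) prior."""
    return sum(xs) + alpha, len(xs) + 1.0 / beta


def log_unnormalised_posterior(theta, xs, alpha, beta):
    """log[Π f(x_i | θ) g(θ)] with terms free of θ dropped."""
    return (-len(xs) * theta + sum(xs) * math.log(theta)
            - theta / beta + (alpha - 1) * math.log(theta))


def gamma_log_pdf(theta, shape, rate):
    """log of the Gamma(shape, rate) density at θ."""
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1) * math.log(theta) - rate * theta


if __name__ == "__main__":
    xs, alpha, beta = [2, 0, 3, 1, 4], 2.0, 0.5
    shape, rate = posterior_params(xs, alpha, beta)
    print(shape, rate)  # 12.0 7.0
    for t in (0.5, 1.0, 2.0, 3.0):
        # the same number on every line means the kernels match
        print(log_unnormalised_posterior(t, xs, alpha, beta) - gamma_log_pdf(t, shape, rate))
```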
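Posterior Bayes estimator
Let $X_1, X_2, \ldots, X_n$ be a random sample from a density $f(x \mid \theta)$, where $\theta$ is the value of a random variable $\Theta$ with known density $g_\Theta(\cdot)$. The posterior Bayes estimator of $\tau(\theta)$ with respect to the prior density $g_\Theta(\cdot)$ is defined to be
$$E\left[\tau(\Theta) \mid X_1, X_2, \ldots, X_n\right] = \frac{\int\tau(\theta)\prod_{i=1}^{n}\left[f(X_i \mid \theta)\right]g_\Theta(\theta)\,d\theta}{\int\prod_{i=1}^{n}\left[f(X_i \mid \theta)\right]g_\Theta(\theta)\,d\theta}$$
One might note the similarity between the posterior Bayes estimator of $\tau(\theta) = \theta$ and the Pitman estimator of a location parameter: the Pitman estimator has the same form with $g_\Theta(\theta)$ replaced by the constant 1.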
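Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from the Bernoulli density
$$f(x \mid \theta) = \theta^x(1-\theta)^{1-x}I_{\{0,1\}}(x) \quad \text{for } 0 \le \theta \le 1$$
and assume that the prior distribution of $\Theta$ is
$$g_\Theta(\theta) = I_{(0,1)}(\theta)$$
That is, $\Theta$ is uniformly distributed over the interval $(0,1)$. Find the posterior distribution of $\Theta$ and find the Bayes estimators of $\theta$ and $\theta(1-\theta)$.
Solution:
We know that the posterior distribution of $\Theta$ is given by
$$f_{\Theta \mid X_1 = x_1, \ldots, X_n = x_n}(\theta \mid x_1, \ldots, x_n) = \frac{f_{X_1, \ldots, X_n, \Theta}(x_1, \ldots, x_n, \theta)}{f_{X_1, \ldots, X_n}(x_1, \ldots, x_n)} = \frac{\prod_{i=1}^{n}\left[f(x_i \mid \theta)\right]g_\Theta(\theta)}{\int\prod_{i=1}^{n}\left[f(x_i \mid \theta)\right]g_\Theta(\theta)\,d\theta} = \frac{\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}I_{(0,1)}(\theta)}{\int\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}I_{(0,1)}(\theta)\,d\theta}$$
$$= \frac{\theta^{\sum x_i + 1 - 1}(1-\theta)^{n - \sum x_i + 1 - 1}}{\beta\left(\sum_{i=1}^{n}x_i + 1,\ n - \sum_{i=1}^{n}x_i + 1\right)}$$
$$\Rightarrow f_{\Theta \mid X_1 = x_1, \ldots, X_n = x_n}(\theta \mid x_1, \ldots, x_n) = \text{Beta (first kind)}\left(\theta;\ \sum_{i=1}^{n}x_i + 1,\ n - \sum_{i=1}^{n}x_i + 1\right);\quad 0 \le \theta \le 1$$
Again, the posterior Bayes estimator of $\theta$ with respect to the prior distribution $g_\Theta(\theta) = I_{(0,1)}(\theta)$ is given by
$$\frac{\int\theta\prod_{i=1}^{n}\left[f(x_i \mid \theta)\right]g_\Theta(\theta)\,d\theta}{\int\prod_{i=1}^{n}\left[f(x_i \mid \theta)\right]g_\Theta(\theta)\,d\theta} = \frac{\int\theta\,\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}I_{(0,1)}(\theta)\,d\theta}{\int\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}I_{(0,1)}(\theta)\,d\theta} = \frac{\int_0^1\theta^{\sum x_i + 1}(1-\theta)^{n-\sum x_i}\,d\theta}{\int_0^1\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,d\theta}$$
$$= \frac{\beta\left(\sum x_i + 2,\ n - \sum x_i + 1\right)}{\beta\left(\sum x_i + 1,\ n - \sum x_i + 1\right)} = \frac{\Gamma\left(\sum x_i + 2\right)}{\Gamma\left(\sum x_i + 1\right)}\cdot\frac{\Gamma(n+2)}{\Gamma(n+3)}$$
$$\therefore E\left[\Theta \mid X_1 = x_1, \ldots, X_n = x_n\right] = \frac{\sum_{i=1}^{n}x_i + 1}{n+2}$$
Again, the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the prior distribution $g_\Theta(\theta) = I_{(0,1)}(\theta)$ is given by
$$\int\theta(1-\theta)f_{\Theta \mid X_1 = x_1, \ldots, X_n = x_n}(\theta \mid x_1, \ldots, x_n)\,d\theta = \frac{\int\theta(1-\theta)\prod_{i=1}^{n}\left[f(x_i \mid \theta)\right]g_\Theta(\theta)\,d\theta}{\int\prod_{i=1}^{n}\left[f(x_i \mid \theta)\right]g_\Theta(\theta)\,d\theta} = \frac{\int\theta(1-\theta)f_{X_1, \ldots, X_n, \Theta}(x_1, \ldots, x_n, \theta)\,d\theta}{\int f_{X_1, \ldots, X_n, \Theta}(x_1, \ldots, x_n, \theta)\,d\theta}$$
$$= \frac{\int\theta(1-\theta)\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}I_{(0,1)}(\theta)\,d\theta}{\int\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}I_{(0,1)}(\theta)\,d\theta} = \frac{\int_0^1\theta^{\sum x_i + 1}(1-\theta)^{n-\sum x_i + 1}\,d\theta}{\int_0^1\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,d\theta} = \frac{\beta\left(\sum x_i + 2,\ n - \sum x_i + 2\right)}{\beta\left(\sum x_i + 1,\ n - \sum x_i + 1\right)}$$
$$= \frac{\Gamma\left(\sum x_i + 2\right)\Gamma\left(n - \sum x_i + 2\right)}{\Gamma(n+4)}\cdot\frac{\Gamma(n+2)}{\Gamma\left(\sum x_i + 1\right)\Gamma\left(n - \sum x_i + 1\right)}$$
$$\Rightarrow E\left[\Theta(1-\Theta) \mid X_1 = x_1, \ldots, X_n = x_n\right] = \frac{\left(\sum_{i=1}^{n}x_i + 1\right)\left(n - \sum_{i=1}^{n}x_i + 1\right)}{(n+3)(n+2)}$$
Hence, the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the uniform prior distribution is $\frac{\left(\sum x_i + 1\right)\left(n - \sum x_i + 1\right)}{(n+3)(n+2)}$.
The posterior Bayes estimator of $\theta$ obtained above is not unbiased: since $E\left(\sum X_i\right) = n\theta$,
$$E\left[\frac{\sum_{i=1}^{n}X_i + 1}{n+2}\right] = \frac{n\theta + 1}{n+2} \ne \theta \quad \text{unless } \theta = \frac{1}{2}.$$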
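The two Beta-function ratios and the bias statement can be verified with exact fractions. The sketch writes $B(a,b) = (a-1)!(b-1)!/(a+b-1)!$ for positive integers and averages the estimator over $S = \sum X_i \sim B(n,\theta)$; the function names and the choice $n = 7$ are illustrative, and `math.comb` requires Python 3.8 or later.

```python
import math
from fractions import Fraction


def beta_fn(a, b):
    """B(a, b) = (a-1)!(b-1)!/(a+b-1)! for positive integers a, b."""
    return Fraction(math.factorial(a - 1) * math.factorial(b - 1), math.factorial(a + b - 1))


def bayes_theta(s, n):
    """Posterior mean of Θ under the uniform prior: (s + 1)/(n + 2), where s = Σ x_i."""
    return Fraction(s + 1, n + 2)


def bayes_theta_one_minus_theta(s, n):
    """Posterior mean of Θ(1 - Θ): (s + 1)(n - s + 1) / ((n + 3)(n + 2))."""
    return Fraction((s + 1) * (n - s + 1), (n + 3) * (n + 2))


def frequentist_mean(estimator, n, theta):
    """E_θ[estimator(S, n)] with S ~ B(n, θ), used to check (un)biasedness."""
    return sum(math.comb(n, s) * theta ** s * (1 - theta) ** (n - s) * estimator(s, n)
               for s in range(n + 1))


if __name__ == "__main__":
    n = 7
    print(all(bayes_theta(s, n) == beta_fn(s + 2, n - s + 1) / beta_fn(s + 1, n - s + 1)
              for s in range(n + 1)))  # True
    print(all(bayes_theta_one_minus_theta(s, n) == beta_fn(s + 2, n - s + 2) / beta_fn(s + 1, n - s + 1)
              for s in range(n + 1)))  # True
    print(frequentist_mean(bayes_theta, n, Fraction(1, 4)))  # (nθ + 1)/(n + 2) = 11/36, not 1/4
```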
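The following remark states that, in general, a posterior Bayes estimator is not unbiased.
Remark: Let $T_G^* = t_G^*(X_1, X_2, \ldots, X_n)$ denote the posterior Bayes estimator of $\tau(\theta)$ with respect to a prior distribution $G(\cdot)$. If $T_G^*$ is an unbiased estimator of $\tau(\theta)$ (and $T_G^*$ and $\tau(\Theta)$ have finite variances), then
$$var\left[T_G^* \mid \theta\right] = 0.$$
Proof:
Let us suppose that $T_G^*$ is an unbiased estimator of $\tau(\theta)$. That is,
$$E\left(T_G^* \mid \theta\right) = \tau(\theta)$$
Then
$$Var\left(T_G^*\right) = E\left[Var\left(T_G^* \mid \Theta\right)\right] + Var\left[E\left(T_G^* \mid \Theta\right)\right] = E\left[Var\left(T_G^* \mid \Theta\right)\right] + Var\left[\tau(\Theta)\right] \qquad (1)$$
And, since $T_G^* = E\left[\tau(\Theta) \mid X_1, \ldots, X_n\right]$,
$$Var\left[\tau(\Theta)\right] = E\left[Var\left(\tau(\Theta) \mid X_1, \ldots, X_n\right)\right] + Var\left(T_G^*\right)$$
so that, substituting $Var\left(T_G^*\right)$ from (1),
$$E\left[Var\left(T_G^* \mid \Theta\right)\right] + Var\left[\tau(\Theta)\right] = Var\left[\tau(\Theta)\right] - E\left[Var\left(\tau(\Theta) \mid X_1, X_2, \ldots, X_n\right)\right]$$
$$\Rightarrow E\left[Var\left(T_G^* \mid \Theta\right)\right] + E\left[Var\left(\tau(\Theta) \mid X_1, X_2, \ldots, X_n\right)\right] = 0$$
Now, since both $E\left[Var\left(T_G^* \mid \Theta\right)\right]$ and $E\left[Var\left(\tau(\Theta) \mid X_1, X_2, \ldots, X_n\right)\right]$ are non-negative and their sum is zero, both are zero. In particular, $E\left[Var\left(T_G^* \mid \Theta\right)\right] = 0$, and since $Var\left(T_G^* \mid \Theta\right)$ is non-negative and has zero expectation,
$$Var\left[T_G^* \mid \theta\right] = 0$$
(for almost all $\theta$ with respect to the prior).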
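Loss Function
Consider estimating g(θ), and let t = t(x1, x2, ..., xn) denote an estimate of g(θ). The loss function, denoted by l(t; θ), is a real-valued function satisfying
(i) l(t; θ) ≥ 0 for all possible estimates t and all allowable θ;
(ii) l(t; θ) = 0 for t = g(θ).
l(t; θ) equals the loss incurred if one estimates g(θ) to be t when θ is the true parameter value. The word 'loss' is used in place of 'error', and the loss function is used as a measure of the error. Some commonly used loss functions are:
1) l1(t; θ) = [t − g(θ)]^2 (squared-error loss);
2) l2(t; θ) = |t − g(θ)| (absolute-error loss);
3) $$l_3(t;\theta)=\begin{cases}A & \text{if } |t-g(\theta)|>\varepsilon\\ 0 & \text{if } |t-g(\theta)|\le\varepsilon\end{cases},\qquad A>0;$$
4) l4(t; θ) = ρ(θ)|t − g(θ)|^r, where ρ(θ) ≥ 0 and r > 0.
Note that both l1 and l2 increase as the error t − g(θ) increases in magnitude. l3 says that we lose nothing if the estimate t is within ε units of g(θ), and otherwise we lose the amount A. l4 is a general loss function that includes both l1 and l2 as special cases: take ρ(θ) = 1 with r = 2 for l1, and with r = 1 for l2.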
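The four loss functions translate directly into Python functions. In this sketch the argument g stands for the value g(θ) and rho for ρ(θ). These parameter names are illustrative.

```python
def l1(t: float, g: float) -> float:
    """Squared-error loss [t - g(theta)]^2."""
    return (t - g) ** 2


def l2(t: float, g: float) -> float:
    """Absolute-error loss |t - g(theta)|."""
    return abs(t - g)


def l3(t: float, g: float, eps: float, A: float) -> float:
    """No loss within eps of g(theta); a constant loss A > 0 otherwise."""
    return A if abs(t - g) > eps else 0.0


def l4(t: float, g: float, rho: float, r: float) -> float:
    """General loss rho(theta) * |t - g(theta)|**r with rho >= 0 and r > 0."""
    return rho * abs(t - g) ** r


print(l1(3.0, 1.0), l2(3.0, 1.0), l3(1.05, 1.0, 0.1, 5.0), l4(3.0, 1.0, 1.0, 2.0))
```

With rho = 1, l4 reproduces l1 when r = 2 and l2 when r = 1.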
Risk Function
For a given loss function l(·;·), the risk function, denoted by R_t(θ), of an estimator T = t(X1, X2, ..., Xn) is defined to be
$$R_t(\theta)=E_\theta\left[l(T;\theta)\right].$$
The risk function is the average loss. The expectation in the above equation can be taken in two ways. It can be taken with respect to the joint density of the sample,
$$R_t(\theta)=\int\cdots\int l\left(t(x_1,\dots,x_n);\theta\right)\prod_{i=1}^n f(x_i;\theta)\,dx_i ,$$
or with respect to the density of the estimator itself,
$$R_t(\theta)=E_\theta\left[l(T;\theta)\right]=\int l(t;\theta)\,f_T(t)\,dt ,$$
where f_T(t) is the density of the estimator T. In either case, the expectation averages out the values of x1, x2, ..., xn.
1) Corresponding to the loss function l1(t; θ) = [t − g(θ)]^2, the risk function is
$$R_t(\theta)=E_\theta\left[l_1(T;\theta)\right]=E_\theta\left[T-g(\theta)\right]^2 ,$$
which is the mean-squared error of T.
4) Corresponding to the loss function l4(t; θ) = ρ(θ)|t − g(θ)|^r, with ρ(θ) ≥ 0 and r > 0, the risk function is
$$R_t(\theta)=E_\theta\left[l_4(T;\theta)\right]=\rho(\theta)\,E_\theta\left[\,\left|T-g(\theta)\right|^r\right].$$
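The next sketch approximates the risk as an average loss by Monte Carlo. For illustration only, it assumes a sample of size n from N(θ, 1), the estimator T = X̄, and g(θ) = θ. Under l1 the exact risk of X̄ is 1/n, and the simulation should come close to that value. The helper names are hypothetical.

```python
import random


def mc_risk(estimator, loss, theta, n, reps=20000, seed=0):
    """Monte Carlo approximation of R_t(theta) = E_theta[l(T; theta)] for N(theta, 1) samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        total += loss(estimator(sample), theta)
    return total / reps


def sample_mean(xs):
    return sum(xs) / len(xs)


def squared_error(t, g):
    return (t - g) ** 2


# For the mean of n = 10 observations the exact risk under l1 is 1/n = 0.1.
print(mc_risk(sample_mean, squared_error, theta=2.0, n=10))
```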
Bayes and Minimax Estimation ~ 8 of 20
When is a loss function said to be convex or strictly convex?
Consider a real-valued function L(t; θ), regarded as a function of t and defined over an open interval I = (a, b) with −∞ ≤ a < b ≤ ∞. It is said to be convex if, for any a < t < t* < b and any 0 < γ < 1,
$$L\left[\gamma t+(1-\gamma)t^*\right]\le\gamma L(t)+(1-\gamma)L(t^*)\qquad(1)$$
The function is said to be strictly convex if strict inequality holds in (1) for all indicated values of t, t* and γ.
Convexity is a very strong condition. It implies, for example, that L is continuous in (a, b) and has a left and a right derivative at every point of (a, b).
Determination of Convexity
Whether or not a loss function is convex is often easy to determine with the help of the following two criteria.
a) If L is defined and differentiable on (a, b), then a necessary and sufficient condition for L to be convex is that
$$L'(t)\le L'(t^*)\quad\text{for all } a<t<t^*<b\qquad(1)$$
The function is strictly convex iff (1) is strict for all t < t*.
b) If L is twice differentiable on (a, b), then (1) is equivalent to L″(t) ≥ 0 for all a < t < b. The strict inequality L″(t) > 0 is sufficient, but not necessary, for strict convexity.
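Inequality (1) can also be probed numerically. The sketch below uses a hypothetical helper, not a standard routine. It searches a finite grid of points a < t < t* < b and several values of γ for a violation. Finding a violation shows that L is not convex, while finding none is only evidence, not a proof. The helper is applied to l1, l2 and l3 with g(θ) = 0.

```python
def violates_convexity(L, a, b, steps=40, gammas=(0.1, 0.25, 0.5, 0.75, 0.9), tol=1e-12):
    """Return a triple (t, t_star, gamma) violating inequality (1) on a grid, or None.

    A grid search can refute convexity but cannot prove it.
    """
    pts = [a + (b - a) * (k + 0.5) / steps for k in range(steps)]
    for i, t in enumerate(pts):
        for t_star in pts[i + 1:]:
            for g in gammas:
                lhs = L(g * t + (1 - g) * t_star)
                rhs = g * L(t) + (1 - g) * L(t_star)
                if lhs > rhs + tol:
                    return t, t_star, g
    return None


def squared(t):   # l1 with g(theta) = 0
    return t ** 2


def absolute(t):  # l2 with g(theta) = 0
    return abs(t)


def step(t):      # l3 with g(theta) = 0, eps = 0.5, A = 1
    return 1.0 if abs(t) > 0.5 else 0.0


print(violates_convexity(squared, -2, 2))   # None
print(violates_convexity(absolute, -2, 2))  # None
print(violates_convexity(step, -2, 2) is not None)  # True: l3 is not convex
```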
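The Bayes risk of a decision function d is
$$B(d)=E\left[R(d,\theta)\right]=\int R(d,\theta)\,f(\theta)\,d\theta=\int\left[\int l\{d(x_1,\dots,x_n);\theta\}\,f(x_1,\dots,x_n\mid\theta)\,dx\right]f(\theta)\,d\theta .$$
We have f(x1, ..., xn, θ) = f(x1, ..., xn | θ) f(θ) and
$$f(\theta\mid x_1,\dots,x_n)=\frac{f(x_1,\dots,x_n\mid\theta)\,f(\theta)}{f(x_1,\dots,x_n)} .$$
Using these, the order of integration can be interchanged to give
$$B(d)=\int f(x_1,\dots,x_n)\left[\int l\{d(x);\theta\}\,f(\theta\mid x_1,\dots,x_n)\,d\theta\right]dx .$$
Because f(x1, ..., xn) ≥ 0, B(d) is minimized by minimizing the expression inside the square brackets for every set of x values. That is, the Bayes estimator of θ is the function d of x1, ..., xn that minimizes
$$Y=\int l\{d(x);\theta\}\,f(\theta\mid x_1,\dots,x_n)\,d\theta\quad(\text{say}).$$
For the squared-error loss l{d(x); θ} = [d(x) − θ]^2, this becomes
$$Y=\left[d(x)\right]^2\int f(\theta\mid x_1,\dots,x_n)\,d\theta-2\,d(x)\int\theta\,f(\theta\mid x_1,\dots,x_n)\,d\theta+\int\theta^2 f(\theta\mid x_1,\dots,x_n)\,d\theta .$$
Setting ∂Y/∂[d(x)] = 0,
$$2\,d(x)\int f(\theta\mid x_1,\dots,x_n)\,d\theta-2\int\theta\,f(\theta\mid x_1,\dots,x_n)\,d\theta=0$$
$$\Rightarrow\;d(x)=\frac{\int\theta\,f(\theta\mid x_1,\dots,x_n)\,d\theta}{\int f(\theta\mid x_1,\dots,x_n)\,d\theta}=E(\theta\mid x_1,\dots,x_n),$$
which is the mean of the posterior distribution. Since ∂²Y/∂[d(x)]² = 2 > 0, this is a minimum. Hence, d(x) is the Bayes estimate of θ when the loss function is squared error.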
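The result can be checked on a concrete posterior. For illustration, assume a Bernoulli θ with a uniform prior and 2 successes in 9 trials. The posterior is then Beta(3, 8), whose mean is 3/11. The sketch discretizes this posterior on a grid and minimizes Y over a grid of candidate values d. The helper names are hypothetical.

```python
def beta_posterior_grid(a, b, m=1000):
    """Midpoint grid on (0, 1) with normalized weights proportional to a Beta(a, b) density."""
    thetas = [(k + 0.5) / m for k in range(m)]
    w = [t ** (a - 1) * (1 - t) ** (b - 1) for t in thetas]
    total = sum(w)
    return thetas, [wi / total for wi in w]


def posterior_expected_loss(d, thetas, w, loss):
    """Y(d): the loss averaged over the discretized posterior distribution."""
    return sum(wi * loss(d, t) for t, wi in zip(thetas, w))


def squared(d, t):
    return (d - t) ** 2


thetas, w = beta_posterior_grid(3, 8)
post_mean = sum(t * wi for t, wi in zip(thetas, w))
candidates = [k / 1000 for k in range(1001)]
best_d = min(candidates, key=lambda d: posterior_expected_loss(d, thetas, w, squared))
print(best_d, post_mean)  # both close to 3/11 = 0.2727...
```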
Some advantages of the Bayesian approach are:
a) We make inferences about the unknown parameters given the observed data. In the classical approach we instead look at long-run behavior, e.g. in 95% of repeated experiments the computed interval (p′, p″) will contain p.
b) The posterior distribution tells the whole story. If a point estimate or an interval estimate is desired, it can immediately be obtained from the posterior distribution.
c) The Bayesian approach provides solutions for problems which do not have solutions from the classical point of view.
Note:
a) A decision rule δ is said to be uniformly better than a decision rule δ′ if R(δ, θ) ≤ R(δ′, θ) ∀ θ ∈ Θ, with strict inequality for at least one θ.
b) A decision rule δ* is said to be uniformly best in a class of decision rules D if δ* is uniformly better than every other decision rule δ ∈ D.
c) A decision rule δ is said to be admissible in a class D if there exists no other decision rule in D which is uniformly better than δ.
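Example: Let X1, X2, ..., Xn be independent N(µ, σ^2) variables, where µ is unknown but σ^2 is known. Let the prior distribution of µ be N(θ, σ0^2). Find the Bayes estimate of µ.
Solution:
The joint conditional distribution of the sample given µ is
$$f(x_1,\dots,x_n\mid\mu)=\left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu)^2\right]=\left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left[-\frac{n}{2\sigma^2}(\bar x-\mu)^2-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\bar x)^2\right]$$
$$\therefore\;f(x_1,\dots,x_n\mid\mu)\propto\exp\left[-\frac{n}{2\sigma^2}(\bar x-\mu)^2\right]$$
Multiplying by the prior density f(µ) ∝ exp[−(µ − θ)^2/(2σ0^2)], the posterior density is
$$f(\mu\mid x)\propto\exp\left[-\frac{n}{2\sigma^2}(\bar x-\mu)^2-\frac{1}{2\sigma_0^2}(\mu-\theta)^2\right]\propto\exp\left[-\frac12\left(\frac{n\sigma_0^2+\sigma^2}{\sigma_0^2\sigma^2}\right)\left(\mu-\frac{n\bar x\sigma_0^2+\theta\sigma^2}{n\sigma_0^2+\sigma^2}\right)^2\right]$$
$$\therefore\;f(\mu\mid x)\sim N\left[\frac{n\bar x\sigma_0^2+\theta\sigma^2}{n\sigma_0^2+\sigma^2},\;\frac{\sigma_0^2\sigma^2}{n\sigma_0^2+\sigma^2}\right]$$
If the loss function is squared error, the Bayes estimator of µ is the posterior mean
$$\frac{n\bar x\sigma_0^2+\theta\sigma^2}{n\sigma_0^2+\sigma^2}.$$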
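The posterior parameters translate directly into code. This is a minimal sketch in which the function and argument names are hypothetical: sigma2 = σ², theta0 = θ, sigma0_2 = σ0². The returned mean is the Bayes estimate under squared-error loss.

```python
def normal_posterior(xs, sigma2, theta0, sigma0_2):
    """Posterior mean and variance of mu for N(mu, sigma2) data and a N(theta0, sigma0_2) prior."""
    n = len(xs)
    xbar = sum(xs) / n
    denom = n * sigma0_2 + sigma2
    mean = (n * xbar * sigma0_2 + theta0 * sigma2) / denom  # Bayes estimate under squared error
    var = sigma0_2 * sigma2 / denom
    return mean, var


mean, var = normal_posterior([1.0, 2.0, 3.0], sigma2=4.0, theta0=0.0, sigma0_2=1.0)
print(mean, var)  # 6/7 and 4/7
# A very diffuse prior (large sigma0_2) pulls the estimate toward the sample mean x-bar = 2.
print(normal_posterior([1.0, 2.0, 3.0], 4.0, 0.0, 1e9)[0])
```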
Theorem
Let X1, X2, ..., Xn be a random sample from the density f(x | θ), and let gΘ(θ) be the density of Θ. Further, let l(t; θ) be the loss function for estimating τ(θ). The Bayes estimator of τ(θ) is that estimator t*(·, ..., ·) which minimizes
$$\int_\Theta l\left(t(x_1,\dots,x_n);\theta\right)f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\,d\theta$$
as a function of t(·, ..., ·).
Proof
For a general loss function l(t; θ), we seek the estimator, say t*(·, ..., ·), which minimizes the expression
$$\int_\Theta R_t(\theta)\,g_\Theta(\theta)\,d\theta=\int_\Theta E_\theta\left[l\left(t(X_1,\dots,X_n);\theta\right)\right]g_\Theta(\theta)\,d\theta$$
$$=\int_\Theta\left[\int_{\mathbb R^n}l\left(t(x_1,\dots,x_n);\theta\right)f_{X_1,\dots,X_n\mid\Theta=\theta}(x_1,\dots,x_n\mid\theta)\prod_{i=1}^n dx_i\right]g_\Theta(\theta)\,d\theta$$
$$=\int_{\mathbb R^n}\left[\int_\Theta l\left(t(x_1,\dots,x_n);\theta\right)\frac{f_{X_1,\dots,X_n\mid\Theta=\theta}(x_1,\dots,x_n\mid\theta)\,g_\Theta(\theta)}{f_{X_1,\dots,X_n}(x_1,\dots,x_n)}\,d\theta\right]f_{X_1,\dots,X_n}(x_1,\dots,x_n)\prod_{i=1}^n dx_i$$
$$=\int_{\mathbb R^n}\left[\int_\Theta l\left(t(x_1,\dots,x_n);\theta\right)f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\,d\theta\right]f_{X_1,\dots,X_n}(x_1,\dots,x_n)\prod_{i=1}^n dx_i$$
The inner integral and the marginal density f_{X1,...,Xn} are both non-negative. The double integral is therefore minimized if the expression within the brackets, sometimes called the posterior risk, is minimized for each x1, x2, ..., xn.
So, in general, the Bayes estimator of τ(θ) with respect to the loss function l(·;·) and prior density g(·) is the estimator which minimizes the posterior risk. The posterior risk is the expected loss with respect to the posterior distribution of Θ given the observations x1, x2, ..., xn:
$$\int_\Theta l\left(t(x_1,\dots,x_n);\theta\right)f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\,d\theta .$$
Hence, the theorem is proved.
Theorem
Let X1, X2, ..., Xn be a random sample from the density f(x | θ), and let gΘ(θ) be the density of Θ. Further, let
$$l(t;\theta)=\left[t(x_1,\dots,x_n)-\tau(\theta)\right]^2 .$$
Then the Bayes estimator of τ(θ) is given by
$$E\left[\tau(\theta)\mid X_1=x_1,\dots,X_n=x_n\right]=\frac{\int\tau(\theta)\prod_{i=1}^n f(x_i\mid\theta)\,g_\Theta(\theta)\,d\theta}{\int\prod_{i=1}^n f(x_i\mid\theta)\,g_\Theta(\theta)\,d\theta}.$$
Proof
We know that the Bayes estimator of τ(θ) with respect to the loss function l(·;·) and prior density g(·) is the estimator which minimizes the posterior risk. The posterior risk is the expected loss with respect to the posterior distribution of Θ given the observations x1, x2, ..., xn:
$$\int_\Theta l\left(t(x_1,\dots,x_n);\theta\right)f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\,d\theta .$$
Here, the loss function is the squared-error loss function. So the Bayes estimator of τ(θ) is the estimator which minimizes the expectation of [τ(Θ) − t(x1, ..., xn)]^2 with respect to the posterior distribution of Θ given X1 = x1, ..., Xn = xn. Since E[(Y − c)^2] = Var(Y) + [E(Y) − c]^2 is minimized at c = E(Y), this expectation is minimized as a function of t(x1, ..., xn) by taking t*(x1, ..., xn) equal to the conditional expectation of τ(Θ) with respect to the posterior distribution of Θ given X1 = x1, ..., Xn = xn:
$$E\left[\tau(\theta)\mid X_1=x_1,\dots,X_n=x_n\right]=\frac{\int\tau(\theta)\prod_{i=1}^n f(x_i\mid\theta)\,g_\Theta(\theta)\,d\theta}{\int\prod_{i=1}^n f(x_i\mid\theta)\,g_\Theta(\theta)\,d\theta}.$$
Theorem
Let X1, X2, ..., Xn be a random sample from the density f(x | θ), and let gΘ(θ) be the density of Θ. Further, let
$$l(t;\theta)=\left|t(x_1,\dots,x_n)-\tau(\theta)\right| .$$
Then the Bayes estimator of τ(θ) is given by the median of the posterior distribution of τ(Θ) given X1 = x1, ..., Xn = xn.
Proof
We know that the Bayes estimator of τ(θ) with respect to the loss function l(·;·) and prior density g(·) is the estimator which minimizes the posterior risk. The posterior risk is the expected loss with respect to the posterior distribution of Θ given the observations x1, x2, ..., xn:
$$\int_\Theta l\left(t(x_1,\dots,x_n);\theta\right)f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\,d\theta .$$
Here, the loss function is the absolute-error loss function. So the Bayes estimator of τ(θ) is the estimator which minimizes the expectation of |t(x1, ..., xn) − τ(Θ)| with respect to the posterior distribution of Θ given X1 = x1, ..., Xn = xn. Since E|Y − c| is minimized when c is a median of Y, this expectation is minimized as a function of t(x1, ..., xn) by taking t*(x1, ..., xn) equal to the median of the posterior distribution of τ(Θ) given X1 = x1, ..., Xn = xn.
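A grid check like the one for squared error illustrates the absolute-error case. Assume again an illustrative Beta(3, 8) posterior, with τ(θ) = θ. The minimizer of the posterior expected absolute loss then lands at the posterior median rather than at the posterior mean 3/11. The helper names are hypothetical.

```python
def beta_posterior_grid(a, b, m=1000):
    """Midpoint grid on (0, 1) with normalized weights proportional to a Beta(a, b) density."""
    thetas = [(k + 0.5) / m for k in range(m)]
    w = [t ** (a - 1) * (1 - t) ** (b - 1) for t in thetas]
    total = sum(w)
    return thetas, [wi / total for wi in w]


def posterior_expected_loss(d, thetas, w, loss):
    """Loss averaged over the discretized posterior."""
    return sum(wi * loss(d, t) for t, wi in zip(thetas, w))


def absolute(d, t):
    return abs(d - t)


def grid_median(thetas, w):
    """First grid point at which the cumulative posterior weight reaches 1/2."""
    acc = 0.0
    for t, wi in zip(thetas, w):
        acc += wi
        if acc >= 0.5:
            return t
    return thetas[-1]


thetas, w = beta_posterior_grid(3, 8)
post_median = grid_median(thetas, w)
post_mean = sum(t * wi for t, wi in zip(thetas, w))
candidates = [k / 1000 for k in range(1001)]
best_d = min(candidates, key=lambda d: posterior_expected_loss(d, thetas, w, absolute))
print(best_d, post_median, post_mean)  # best_d sits next to the median, below the mean 3/11
```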
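Example: Let X1, X2, ..., Xn denote a random sample from the normal distribution with density
$$f(x\mid\theta)=\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12(x-\theta)^2\right];\quad-\infty<x<\infty .$$
Assume that the prior distribution of Θ is given by
$$g_\Theta(\theta)=\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12(\theta-\mu_0)^2\right];\quad-\infty<\theta<\infty .$$
That is, Θ is normal with mean µ0 and unit variance. Write µ0 = x0 when convenient. Find the Bayes estimator of τ(θ) = θ with respect to the squared-error loss function.
Solution:
$$E\left[\tau(\theta)\mid X_1=x_1,\dots,X_n=x_n\right]=\frac{\int\tau(\theta)\prod_{i=1}^n f(x_i\mid\theta)\,g_\Theta(\theta)\,d\theta}{\int\prod_{i=1}^n f(x_i\mid\theta)\,g_\Theta(\theta)\,d\theta}$$
Since ∏ f(xi | θ) gΘ(θ) ∝ exp[−½ Σ_{i=0}^n (xi − θ)^2], the posterior distribution of Θ is normal with mean Σ_{i=0}^n xi/(n + 1) and variance 1/(n + 1). Hence
$$E\left[\theta\mid X_1=x_1,\dots,X_n=x_n\right]=\int_{-\infty}^{\infty}\theta\,\frac{1}{\sqrt{2\pi}\sqrt{\frac{1}{n+1}}}\exp\left[-\frac12\left(\frac{\theta-\frac{\sum_{i=0}^n x_i}{n+1}}{\frac{1}{\sqrt{n+1}}}\right)^2\right]d\theta .$$
Now, let
$$z=\frac{\theta-\frac{\sum_{i=0}^n x_i}{n+1}}{\frac{1}{\sqrt{n+1}}}\;\Rightarrow\;\theta=\frac{\sum_{i=0}^n x_i}{n+1}+\frac{z}{\sqrt{n+1}}\;\Rightarrow\;d\theta=\frac{dz}{\sqrt{n+1}} .$$
Then we have
$$E\left[\theta\mid X_1=x_1,\dots,X_n=x_n\right]=\int_{-\infty}^{\infty}\left(\frac{\sum_{i=0}^n x_i}{n+1}+\frac{z}{\sqrt{n+1}}\right)\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12z^2\right]dz=\frac{\sum_{i=0}^n x_i}{n+1}+\frac{1}{\sqrt{n+1}}\int_{-\infty}^{\infty}z\,\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12z^2\right]dz .$$
The last integral is the mean of a standard normal variable, which is zero. Hence the Bayes estimator is
$$\frac{\sum_{i=0}^n x_i}{n+1}=\frac{\mu_0+\sum_{i=1}^n x_i}{n+1}.$$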
Example: Let X1, X2, ..., Xn denote a random sample from the uniform distribution with density
$$f(x\mid\theta)=\frac1\theta\,I_{(0,\theta)}(x).$$
Assume that the prior distribution of Θ is given by gΘ(θ) = I_(0,1)(θ), that is, Θ is standard uniform. Find the Bayes estimator of τ(θ) = θ with respect to the weighted squared-error loss function
$$l(t;\theta)=\frac{(t-\theta)^2}{\theta^2}.$$
Solution:
We know that the Bayes estimator of τ(θ) with respect to a general loss function, such as this one, is the estimator which minimizes
$$\int_\Theta l\left(t(x_1,\dots,x_n);\theta\right)f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\,d\theta .$$
We know that the posterior distribution of Θ is given by
$$f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)=\frac{\prod_{i=1}^n f(x_i\mid\theta)\,g_\Theta(\theta)}{\int\prod_{i=1}^n f(x_i\mid\theta)\,g_\Theta(\theta)\,d\theta}=\frac{\left(\frac1\theta\right)^n\prod_{i=1}^n I_{(0,\theta)}(x_i)\,I_{(0,1)}(\theta)}{\int_0^1\left(\frac1\theta\right)^n\prod_{i=1}^n I_{(0,\theta)}(x_i)\,d\theta}=\frac{\left(\frac1\theta\right)^n I_{(y_n,1)}(\theta)}{\int_{y_n}^1\left(\frac1\theta\right)^n d\theta},$$
where yn = max(x1, ..., xn), since ∏ I_(0,θ)(xi) = 1 exactly when θ > yn. Because ∫_{yn}^1 θ^{−n} dθ = [yn^{−(n−1)} − 1]/(n − 1), the Bayes estimator is the estimator which minimizes
$$\int_{y_n}^1\frac{(t(y_n)-\theta)^2}{\theta^2}\cdot\frac{\left(\frac1\theta\right)^n}{\frac{1}{n-1}\left[\frac{1}{y_n^{n-1}}-1\right]}\,d\theta .$$
The denominator does not involve t. Equivalently, then, the Bayes estimator is the estimator which minimizes
$$\int_{y_n}^1\frac{(t(y_n)-\theta)^2}{\theta^{n+2}}\,d\theta=\left[t(y_n)\right]^2\int_{y_n}^1\frac{d\theta}{\theta^{n+2}}-2t(y_n)\int_{y_n}^1\frac{d\theta}{\theta^{n+1}}+\int_{y_n}^1\frac{d\theta}{\theta^{n}}\qquad(A)$$
Here, (A) is a quadratic in t(·), which attains its minimum at
$$t^*(y_n)=\frac{\int_{y_n}^1\theta^{-(n+1)}\,d\theta}{\int_{y_n}^1\theta^{-(n+2)}\,d\theta}=\frac{\frac1n\left[y_n^{-n}-1\right]}{\frac{1}{n+1}\left[y_n^{-(n+1)}-1\right]}=\frac{n+1}{n}\times\frac{y_n^n-1}{y_n^{n+1}-1}\times y_n .$$
So, the Bayes estimator of τ(θ) = θ with respect to the loss function
$$l(t;\theta)=\frac{(t-\theta)^2}{\theta^2}$$
is given by
$$t^*(y_n)=\frac{n+1}{n}\times\frac{y_n^n-1}{y_n^{n+1}-1}\times y_n .$$
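The closed form for t*(yn) can be cross-checked by minimizing the posterior risk numerically. The sketch assumes illustrative values yn = 0.8 and n = 5 and evaluates the integral in (A) by the midpoint rule. The normalizing constant is dropped because it does not involve t. A ternary search then locates the minimum, which is valid because the risk is a convex quadratic in t. Function names are hypothetical.

```python
def t_star(yn: float, n: int) -> float:
    """Closed-form Bayes estimate under l(t; theta) = (t - theta)^2 / theta^2."""
    return (n + 1) / n * (yn ** n - 1) / (yn ** (n + 1) - 1) * yn


def posterior_risk(t: float, yn: float, n: int, m: int = 4000) -> float:
    """Midpoint-rule value of the integral of (t - theta)^2 / theta^(n + 2) over (yn, 1)."""
    h = (1 - yn) / m
    total = 0.0
    for k in range(m):
        th = yn + (k + 0.5) * h
        total += (t - th) ** 2 / th ** (n + 2) * h
    return total


def minimize_risk(yn: float, n: int, iters: int = 60) -> float:
    """Ternary search on [yn, 1]; valid because the risk is convex in t."""
    lo, hi = yn, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if posterior_risk(m1, yn, n) < posterior_risk(m2, yn, n):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


yn, n = 0.8, 5  # hypothetical largest observation and sample size
print(t_star(yn, n), minimize_risk(yn, n))  # both about 0.8747
```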
Example: Using the squared-error loss function l(t, θ) = (t − θ)^2, estimators for the location parameter of a normal distribution given a sample of size n are the sample mean t1(x) = x̄, the sample median t2(x) = m, the weighted
Inadmissibility of an Estimator
An estimator t is said to be inadmissible if there exists another estimator t′ which dominates it, that is, such that R(θ, t′) ≤ R(θ, t) for all θ, with strict inequality for at least one θ.
Let the range of the estimand τ(θ) be [a, b], and let the loss function satisfy L(θ, t) ≥ 0. Suppose also that, for any fixed θ, L(θ, t) increases as t moves away from τ(θ) in either direction. Then any estimator taking on values outside the closed interval [a, b] with positive probability is inadmissible.
a) If the loss function L is strictly convex, then every admissible estimator must be non-randomized.
b) Suppose L is strictly convex and t is an admissible estimator of τ(θ). If t′ is another estimator with the same risk, i.e. R(θ, t) = R(θ, t′) for all θ, then t = t′ with probability 1.
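Minimax Estimator
An estimator T* = t*(X1, ..., Xn) is defined to be a minimax estimator if and only if
$$\sup_\theta R\left(\theta,t^*\right)\le\sup_\theta R(\theta,t)$$
for every estimator t.
a) One appealing feature of the minimax estimator is that it does not depend on the particular parameterization.
b) Suppose t_g is a Bayes estimator with respect to a prior distribution g(·), and its Bayes risk equals sup_θ R(θ, t_g), as happens, for example, when t_g has constant risk. Then
i. t_g is minimax;
ii. if t_g is the unique Bayes solution with respect to g(·), it is the unique minimax procedure.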
Example: Suppose that Θ = {θ1, θ2}, where θ1 corresponds to oil and θ2 to no oil. Let A = {a1, a2, a3}, where ai corresponds to choice i, i = 1, 2, 3. Suppose that the following table gives the losses for the decision problem.
If there is oil and we drill, the loss is zero; if there is no oil and we drill, the loss is 12; and so on.
An experiment is conducted to obtain information about θ. It results in the random variable X, whose possible values are coded as 0 and 1, with probabilities

              x = 0    x = 1
Oil θ1         0.3      0.7
No oil θ2      0.6      0.4

When there is oil, 0 occurs with probability 0.3 and 1 occurs with probability 0.7.
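The nine possible non-randomized decision rules δi are:

Rule i     1    2    3    4    5    6    7    8    9
δi(0)      a1   a1   a1   a2   a2   a2   a3   a3   a3
δi(1)      a1   a2   a3   a1   a2   a3   a1   a2   a3

The risk of a rule δ is
$$R(\theta,\delta)=E_\theta\left[l(\theta,\delta(X))\right]=l(\theta,a_1)P_\theta(\delta(X)=a_1)+l(\theta,a_2)P_\theta(\delta(X)=a_2)+l(\theta,a_3)P_\theta(\delta(X)=a_3)$$
Compute R(θ1, δi) and R(θ2, δi) for i = 1, ..., 9, and choose the rule whose larger risk is smallest. The minimax solution is rule 4 of the table above:
$$\delta_4(x)=\begin{cases}a_2 & \text{if } x=0\\ a_1 & \text{if } x=1\end{cases}$$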
Suppose that in our oil-drilling example an expert thinks the chance of finding oil is 0.2. Then we treat the parameter as a random variable θ with possible values θ1, θ2 and frequency function π(θ1) = 0.2, π(θ2) = 0.8. The Bayes risk of a rule δ is
$$r(\delta)=E\left[R(\theta,\delta)\right]=0.2\,R(\theta_1,\delta)+0.8\,R(\theta_2,\delta).$$
In the Bayesian framework, δ is preferable to δ′ if and only if it has smaller Bayes risk. A rule δ* which attains the minimum Bayes risk, i.e. such that
$$r(\delta^*)=\min_\delta r(\delta),$$
is called a Bayes rule. In this example the minimum is r(δ5) = 2.8, so δ5 is the unique Bayes rule for our prior distribution.
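The risk and Bayes-risk computations can be sketched in Python. The loss table for this example did not survive in these notes. Only the two "drill" (a1) losses, 0 and 12, are stated. The a2 and a3 losses used below are assumed, hypothetical values, chosen so that the output agrees with rule 4 being minimax and with the Bayes risk 2.8 of rule 5 quoted above. The probabilities P_θ(X = x) come from the table in the example.

```python
# Loss table l(theta, a). Only l(oil, a1) = 0 and l(no oil, a1) = 12 appear in the notes;
# the a2 and a3 entries are ASSUMED illustrative values.
LOSS = {"oil": {"a1": 0, "a2": 10, "a3": 5},
        "no oil": {"a1": 12, "a2": 1, "a3": 6}}
PX = {"oil": {0: 0.3, 1: 0.7}, "no oil": {0: 0.6, 1: 0.4}}  # P_theta(X = x)
ACTIONS = ("a1", "a2", "a3")
# Rule i = 1..9 is the pair (delta_i(0), delta_i(1)), in the order of the table of rules.
RULES = [(d0, d1) for d0 in ACTIONS for d1 in ACTIONS]


def risk(state, rule):
    """R(theta, delta) = sum over x of l(theta, delta(x)) * P_theta(X = x)."""
    return sum(PX[state][x] * LOSS[state][rule[x]] for x in (0, 1))


risks = {i: (risk("oil", r), risk("no oil", r)) for i, r in enumerate(RULES, start=1)}
minimax_rule = min(risks, key=lambda i: max(risks[i]))

prior_oil = 0.2
bayes_risk = {i: prior_oil * r1 + (1 - prior_oil) * r2 for i, (r1, r2) in risks.items()}
bayes_rule = min(bayes_risk, key=bayes_risk.get)

print(minimax_rule, RULES[minimax_rule - 1], max(risks[minimax_rule]))  # 4 ('a2', 'a1') about 5.4
print(bayes_rule, bayes_risk[bayes_rule])  # 5 and about 2.8
```

Under these assumed losses, the minimax rule 4 has maximum risk 5.4, and rule 5 has the smallest Bayes risk.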
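Example: Let X be a single Bernoulli observation with P(X = 1) = p, where p is either p1 = 1/4 or p2 = 1/2. Suppose that the losses l(p, a) for the two actions a1, a2 are

              a1    a2
p1 = 1/4       1     4
p2 = 1/2       3     2

The four non-randomized decision rules are
δ1(0) = δ1(1) = a1
δ2(0) = a1, δ2(1) = a2
δ3(0) = a2, δ3(1) = a1
δ4(0) = δ4(1) = a2
Their risks, R(p, δ) = (1 − p) l(p, δ(0)) + p l(p, δ(1)), are

Rule    R(p1, δ)    R(p2, δ)    max
δ1      1           3           3
δ2      7/4         5/2         5/2
δ3      13/4        5/2         13/4
δ4      4           2           4

The smallest maximum risk, 5/2, is attained by δ2, so the minimax rule is
$$\delta_2(x)=\begin{cases}a_1 & \text{if } x=0\\ a_2 & \text{if } x=1\end{cases}$$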
To find a characteristic θ(F) of a known distribution function F(·), we can proceed in two ways: (1) evaluate θ(F) exactly; or (2) simulate to estimate θ(F).
For example, suppose that X is a N(0, 1) random variable, so that F(x) = Φ(x), and that we wish to know the 8th moment of X, so that
$$\theta(F)=\int_{-\infty}^{\infty}x^8\,d\Phi(x)=\int_{-\infty}^{\infty}x^8\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx .$$
One way we may proceed is (1) to evaluate the integral exactly; here θ(F) = 7·5·3·1 = 105.
If no simple way to evaluate the integral exists, a second way we may proceed is (2) to simulate to estimate θ(F). Generating independent N(0, 1) random variables X1, X2, ..., Xn, we can estimate θ(F) using the principles of estimation. Thus, we might estimate θ(F) by
$$\frac1n\left(X_1^8+X_2^8+\cdots+X_n^8\right),$$
which is a consistent estimator.
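The following sketch implements approach (2) for this example using Python's `random` module. The exact value is E[X^8] = 7·5·3·1 = 105. Convergence is slow because X^8 has a very large variance: E[X^16] − 105² = 2027025 − 11025 = 2016000. The standard error is therefore roughly 1420/√n. The function name is illustrative.

```python
import random


def mc_eighth_moment(n: int, seed: int = 1) -> float:
    """Estimate theta(F) = E[X^8], X ~ N(0, 1), by the average of X_i^8 over n draws."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) ** 8 for _ in range(n)) / n


exact = 7 * 5 * 3 * 1  # = 105
for n in (1_000, 100_000):
    print(n, mc_eighth_moment(n), exact)
```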
The preceding discussion assumed that we knew the distribution function F(·). Often, however, the distribution function F(·) for which we wish to know θ(F) is not known, and we have only a random sample X1, X2, ..., Xn from it.
Now suppose we are taking approach (2) and wish to specify the variance of our estimator. With F(·) known, we can proceed as follows.
Step 1: Generate X1, X2, ..., Xn and estimate θ(F) from these random variables; call the estimate θ̂1.
Step 2: Generate X_{n+1}, X_{n+2}, ..., X_{2n} and estimate θ(F) from these new random variables, which are to be independent of all random variables previously generated; call the estimate θ̂2.
Step 3: Generate X_{2n+1}, X_{2n+2}, ..., X_{3n} and estimate θ(F) from these new random variables, which are to be independent of all random variables previously generated; call the estimate θ̂3.
⋮
Step N: Generate X_{(N−1)n+1}, X_{(N−1)n+2}, ..., X_{Nn} and estimate θ(F) from these new random variables, which are to be independent of all random variables previously generated; call the estimate θ̂N.
Then θ̂1, θ̂2, θ̂3, ..., θ̂N are N independent and identically distributed random variables, each estimating θ(F). The variance of the estimator can be estimated by their sample variance
$$\hat\sigma^2=\frac{1}{N-1}\sum_{i=1}^N\left(\hat\theta_i-\bar{\hat\theta}\right)^2,\qquad\bar{\hat\theta}=\frac1N\sum_{i=1}^N\hat\theta_i .$$
We call this the Statistical Simulation Procedure SSP(θ, F, n, N).
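SSP(θ, F, n, N) is straightforward to sketch when F can be sampled. As an illustrative assumption, take F = N(0, 1), let θ(F) be the median, and use the sample median as the estimator. For large n the variance of the sample median is approximately π/(2n), and the simulated variance can be compared with this value. The helper names are hypothetical.

```python
import math
import random
import statistics


def ssp_variance(estimator, sampler, n, N, seed=0):
    """Steps 1..N: N independent samples of size n, one estimate each, then their sample variance."""
    rng = random.Random(seed)
    estimates = [estimator([sampler(rng) for _ in range(n)]) for _ in range(N)]
    return statistics.variance(estimates)  # divisor N - 1


def std_normal(rng):
    return rng.gauss(0.0, 1.0)


n, N = 101, 2000
v = ssp_variance(statistics.median, std_normal, n, N)
print(v, math.pi / (2 * n))  # simulated variance of the median vs. its large-sample value
```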
If F(·) is unknown, SSP(θ, F, n, N) cannot be used. In that case, however, one has a random sample X1, X2, ..., Xn taken from the unknown distribution function F(·). Using the random sample, the distribution function may be estimated, say by some estimator F̂. Step 1, Step 2, ..., Step N are then followed with F̂ in place of F. We call this the General Statistical Simulation Procedure SSP(θ, F̂, X1, ..., Xn, N).
There are many ways to choose the estimate F̂ of F in SSP(θ, F̂, X1, ..., Xn, N). One of the very simplest is to take F̂ to be the empiric distribution function based on the sample X1, X2, ..., Xn. When this is done, the procedure is called the bootstrap.
Bootstrap Sampling
Bootstrap sampling is a method of selecting a sample of size n, with replacement, from a data set of n points X1, X2, ..., Xn. It is equivalent to the following physical procedure:
1. Record the value of each data point on a ping-pong ball and place the balls in a box.
2. Select a ball at random, record its value and replace the ball.
3. Repeat step 2 n times.
Doing this n times maintains the original sample size n. With the bootstrap method the basic sample is treated as the population. Thus the bootstrap estimation procedure consists of the following steps.
Step 1: Using the original data set, calculate some statistic of interest to estimate the characteristic of the population of interest. Call this B0.
Step 2: Take a bootstrap sample of size n from the original data set, which produces a new data set X1*, X2*, ..., Xn*. Calculate the same statistic of interest from this new data set; call it B1.
Step 3: Repeat Step 2 N times in all, producing B1, B2, ..., BN.
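For example, the bootstrap estimate of the variance V(X̄) of the sample mean is obtained as follows.
Step 1: Take a sample of size n (with replacement) from {X1, X2, ..., Xn}, say {X11, X12, ..., X1n}, and calculate its sample mean X̄1.
Step 2: Repeat Step 1 independently N − 1 additional times, finding X̄1, X̄2, ..., X̄N. The bootstrap estimate of V(X̄) is
$$\hat\sigma^2=\frac{\sum_{i=1}^N\left(\bar X_i-\bar X_\cdot\right)^2}{N-1},\qquad\bar X_\cdot=\frac{\bar X_1+\bar X_2+\cdots+\bar X_N}{N}.$$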
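Here is a sketch of these two steps in Python, using hypothetical data values. The resampling is from the empiric distribution function. The bootstrap estimate therefore approaches (1/n²)Σ(xi − x̄)² as N grows, and this value is printed for comparison.

```python
import random
import statistics


def bootstrap_var_of_mean(data, N=4000, seed=0):
    """Bootstrap estimate of V(X-bar): N resamples of size n drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    means = [sum(rng.choices(data, k=n)) / n for _ in range(N)]
    return statistics.variance(means)  # sum (X-bar_i - X-bar_.)^2 / (N - 1)


data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]  # hypothetical sample
boot = bootstrap_var_of_mean(data)
plug_in = statistics.pvariance(data) / len(data)  # exact V(X-bar) when sampling from the empiric d.f.
print(boot, plug_in)
```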
Suppose that, based on a random sample X1, X2, ..., Xn, some quantity θ of interest is estimated by θ̂. The estimator θ̂ has some bias b = E(θ̂) − θ. To estimate the bias, consider use of SSP(θ, F̂, X1, ..., Xn, N) with F̂ taken to be the empiric distribution function, which gives a bootstrap estimate. From N bootstrap samples of size n we obtain estimates θ̂1, ..., θ̂N. The bootstrap estimate of the bias is
$$\hat b=\bar{\hat\theta}-\hat\theta,\qquad\bar{\hat\theta}=\frac1N\sum_{i=1}^N\hat\theta_i .$$
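Example:
Let X1, X2, ..., Xn be a random sample of size n from a Poisson distribution with unknown mean λ. Suppose the parameter of interest is θ = P(X ≤ 1) = e^{−λ}(1 + λ). Its MLE is e^{−X̄}(1 + X̄), which is biased. To reduce the bias, let Xij (i = 1, ..., N; j = 1, ..., n) be the N bootstrap samples, that is, samples taken at random with replacement from X1, ..., Xn, and let
$$\hat\theta_i=e^{-\bar X_i}\left(1+\bar X_i\right)-\frac{\text{Number of }X_{i1},X_{i2},\dots,X_{in}\text{ that are}\le1}{n},$$
where X̄i is the mean of the i-th bootstrap sample. The proportion of a bootstrap sample that is ≤ 1 has bootstrap expectation equal to the corresponding proportion in the original sample. The average of the θ̂i therefore estimates the bias of the MLE, and the bootstrap estimate of the bias is simply
$$\hat b=\frac1N\sum_{i=1}^N\hat\theta_i .$$
Then one might use e^{−X̄}(1 + X̄) − b̂ to estimate θ.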
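The bias estimate of this example can be sketched as follows, using hypothetical counts. Each bootstrap replicate contributes the MLE computed on the bootstrap sample minus the proportion of that bootstrap sample that is ≤ 1, as in the definition of θ̂i above. The function names are illustrative.

```python
import math
import random


def mle_theta(xs):
    """MLE of theta = P(X <= 1) = exp(-lambda) * (1 + lambda) for Poisson data."""
    xbar = sum(xs) / len(xs)
    return math.exp(-xbar) * (1 + xbar)


def bootstrap_bias(xs, N=2000, seed=0):
    """Average over bootstrap samples of (MLE on the sample) - (proportion of the sample <= 1)."""
    rng = random.Random(seed)
    n = len(xs)
    total = 0.0
    for _ in range(N):
        b = rng.choices(xs, k=n)
        total += mle_theta(b) - sum(1 for v in b if v <= 1) / n
    return total / N


xs = [0, 2, 1, 3, 1, 0, 2, 4, 1, 2]  # hypothetical Poisson counts
b_hat = bootstrap_bias(xs)
print(mle_theta(xs), b_hat, mle_theta(xs) - b_hat)  # MLE, estimated bias, bias-corrected estimate
```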
Remark:
An approximate 100(1 − α)% confidence interval for θ can be constructed using bootstrap methods, as follows. Let θ̄ = (1/N)Σθ̂i be the bootstrap estimate of θ, and let σ̂^2 be the sample variance of θ̂1, θ̂2, ..., θ̂N. For N large we may use
$$\left\{\hat\theta-\Phi^{-1}\left(1-\frac\alpha2\right)\hat\sigma,\;\hat\theta+\Phi^{-1}\left(1-\frac\alpha2\right)\hat\sigma\right\},$$
where θ̂ is the original estimate. Note that we use the original estimate of θ, not the bootstrap estimate. The bootstrap procedure has been used only to estimate the standard error σ̂.
Exactly the same details apply to the more general statistical simulation procedure SSP(θ, F̂, X1, ..., Xn, N). The only difference there is which estimate of F is being sampled from.
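Here is a sketch of this interval, with `statistics.NormalDist` supplying Φ^{-1}. The data and the choice of the sample median as the estimator are illustrative assumptions. As noted above, the interval is centred at the original estimate, and the bootstrap only supplies σ̂.

```python
import random
import statistics
from statistics import NormalDist


def bootstrap_normal_ci(data, estimator, alpha=0.05, N=2000, seed=0):
    """theta-hat -/+ Phi^{-1}(1 - alpha/2) * sigma-hat, with sigma-hat from N bootstrap replicates."""
    rng = random.Random(seed)
    n = len(data)
    reps = [estimator(rng.choices(data, k=n)) for _ in range(N)]
    sigma_hat = statistics.stdev(reps)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    theta_hat = estimator(data)  # the original estimate, not the bootstrap average
    return theta_hat - z * sigma_hat, theta_hat + z * sigma_hat


data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]  # hypothetical sample
lo, hi = bootstrap_normal_ci(data, statistics.median)
print(lo, hi)
```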
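iv) Differentiation under the integral sign is permissible:
$$\frac{\partial}{\partial\theta_i}\int_A t_j\,L(x;\theta)\,dx=\int_A t_j\,\frac{\partial}{\partial\theta_i}L(x;\theta)\,dx,$$
where t_j is the estimator of θ_j.
v) The elements of the matrix ∆θ = (δij(θ)), where
$$\delta_{ij}(\theta)=E_\theta\left[\frac{\partial\ln L(x;\theta)}{\partial\theta_i}\,\frac{\partial\ln L(x;\theta)}{\partial\theta_j}\right],$$
exist and are such that ∆θ is positive definite.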
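Theorem
In any regular estimation case, let T1, ..., Tk be unbiased estimators of θ1, ..., θk, and let Σθ be the matrix of their variances and covariances σij(θ) = covθ(Ti, Tj). Then
$$u'\Sigma_\theta u\ge u'\Delta_\theta^{-1}u\quad\text{for every } u=(u_1,\dots,u_k)',$$
that is, Σθ − ∆θ^{−1} is non-negative definite.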
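Proof
Let the same symbol λi(θ) denote ∂ log fθ(X1, ..., Xn)/∂θi and also ∂ log fθ(x1, ..., xn)/∂θi. In this situation, the regularity conditions give
$$\int_A\lambda_i(\theta)\,f_\theta(x_1,\dots,x_n)\,dx=0,\qquad(i)$$
and, since each Ti is unbiased for θi,
$$\int_A t_i\,\lambda_j(\theta)\,f_\theta(x_1,\dots,x_n)\,dx=\frac{\partial\theta_i}{\partial\theta_j}=\begin{cases}1&\text{if } i=j\\0&\text{otherwise}\end{cases}\qquad(ii)$$
For any constants u1, ..., uk,
$$\int_A\sum_{i=1}^k u_it_i\,f_\theta(x_1,\dots,x_n)\,dx=\sum_{i=1}^k u_i\theta_i\quad(\text{for all }\theta\in\Theta),$$
so that, on differentiating with respect to θj,
$$\int_A\sum_{i=1}^k u_it_i\,\lambda_j(\theta)\,f_\theta(x_1,\dots,x_n)\,dx=u_j\quad\text{because of (ii)}.$$
Because of (i) again,
$$\int_A\sum_{i=1}^k u_i\left[t_i-\theta_i\right]\lambda_j(\theta)\,f_\theta(x_1,\dots,x_n)\,dx=u_j .$$
The left-hand side is covθ(Σ uiTi, λj(θ)). Multiplying by constants cj and summing over j shows that the covariance between Σ uiTi and Σ ciλi(θ) equals Σ ciui. Hence, by the Cauchy-Schwarz inequality,
$$\left[\operatorname{cov}_\theta\left(\sum_{i=1}^k u_iT_i,\sum_{i=1}^k c_i\lambda_i(\theta)\right)\right]^2\le\operatorname{var}_\theta\left(\sum_{i=1}^k u_iT_i\right)\operatorname{var}_\theta\left(\sum_{i=1}^k c_i\lambda_i(\theta)\right)$$
$$\Rightarrow\;\operatorname{var}_\theta\left(\sum_{i=1}^k u_iT_i\right)\ge\frac{\left(\sum_{i=1}^k c_iu_i\right)^2}{\operatorname{var}_\theta\left(\sum_{i=1}^k c_i\lambda_i(\theta)\right)} .$$
Let us now maximize the right-hand side with respect to the c's. The right-hand side remains unchanged if the c's are multiplied by a common number. The maximizing c's are those for which Σ ciλi(θ) is the best linear approximation to Σ ui[Ti − θi], that is, for which Σ ui[Ti − θi] − Σ ciλi(θ) is uncorrelated with each λj(θ). This gives
$$u_j\equiv\sum_{i=1}^k u_i\operatorname{cov}_\theta\left(T_i,\lambda_j(\theta)\right)=\sum_{i=1}^k c_i\operatorname{cov}_\theta\left(\lambda_i(\theta),\lambda_j(\theta)\right)\equiv\sum_{i=1}^k c_i\delta_{ij}(\theta).$$
Hence the maximizing c's are such that, in matrix notation,
$$\Delta_\theta c=u\;\Rightarrow\;c=\Delta_\theta^{-1}u\quad(\text{because of (v)}),$$
and
$$c'u=u'\Delta_\theta^{-1}u,\qquad\operatorname{var}_\theta\left(\sum_{i=1}^k c_i\lambda_i(\theta)\right)=c'\Delta_\theta c=u'\Delta_\theta^{-1}u .$$
Hence
$$\operatorname{var}_\theta\left(\sum_{i=1}^k u_iT_i\right)\ge\frac{\left(u'\Delta_\theta^{-1}u\right)^2}{u'\Delta_\theta^{-1}u}=u'\Delta_\theta^{-1}u,$$
that is,
$$u'\Sigma_\theta u\ge u'\Delta_\theta^{-1}u .$$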
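Example: Let X1, X2, ..., Xn be a random sample from N(θ1, θ2), where both θ1 and θ2 are unknown.
Here,
$$f(x)=\frac{1}{\sqrt{2\pi\theta_2}}\exp\left[-\frac{(x-\theta_1)^2}{2\theta_2}\right];\quad-\infty<x<\infty .$$
Now the likelihood function is as follows:
$$L(x;\theta_1,\theta_2)=\left(\frac{1}{2\pi\theta_2}\right)^{n/2}\exp\left[-\frac{1}{2\theta_2}\sum_{i=1}^n(x_i-\theta_1)^2\right]$$
$$\Rightarrow\;\ln L=-\frac n2\ln(2\pi\theta_2)-\frac{1}{2\theta_2}\sum_{i=1}^n(x_i-\theta_1)^2\qquad(i)$$
Now we differentiate (i) with respect to θ1 and get
$$\frac{\partial\ln L}{\partial\theta_1}=-\frac{1}{2\theta_2}\sum_{i=1}^n2(x_i-\theta_1)(-1)=\frac{1}{\theta_2}\sum_{i=1}^n(x_i-\theta_1),$$
and
$$E\left[\left(\frac{\partial\ln L}{\partial\theta_1}\right)^2\right]=\frac{1}{\theta_2^2}E\left[\left(\sum_{i=1}^n(X_i-\theta_1)\right)^2\right]=\frac{1}{\theta_2^2}\sum_{i=1}^nE(X_i-\theta_1)^2=\frac{n\theta_2}{\theta_2^2}=\frac{n}{\theta_2},$$
the cross terms vanishing by independence. Again we differentiate (i) with respect to θ2 and get
$$\frac{\partial\ln L}{\partial\theta_2}=-\frac{n}{2\theta_2}+\frac{1}{2\theta_2^2}\sum_{i=1}^n(x_i-\theta_1)^2=\frac{1}{2\theta_2}\left[-n+\frac{1}{\theta_2}\sum_{i=1}^n(x_i-\theta_1)^2\right].$$
Since Σ(Xi − θ1)^2/θ2 has a chi-square distribution with n degrees of freedom, with mean n and variance 2n,
$$E\left[\left(\frac{\partial\ln L}{\partial\theta_2}\right)^2\right]=\frac{2n}{4\theta_2^2}=\frac{n}{2\theta_2^2},\qquad E\left[\frac{\partial\ln L}{\partial\theta_1}\,\frac{\partial\ln L}{\partial\theta_2}\right]=0 .$$
Hence ∆θ = diag(n/θ2, n/(2θ2^2)). The lower bounds for the variances of unbiased estimators of θ1 and θ2 are therefore
$$\delta^{11}(\theta)=\frac{\theta_2}{n}\quad\text{and}\quad\delta^{22}(\theta)=\frac{2\theta_2^2}{n},$$
respectively, where δ^{ij}(θ) denotes the ij-th element of ∆θ^{−1}.
The traditional unbiased estimators for θ1 and θ2 are X̄ and
$$S^2=\frac{\sum_{i=1}^n(X_i-\bar X)^2}{n-1}.$$
Since Varθ[X̄] = θ2/n while Varθ[S^2] = 2θ2^2/(n − 1), the lower bound in the first case is attained but not that in the second.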
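Both variance comparisons can be checked by simulation. The sketch uses illustrative values θ1 = 0, θ2 = 4 and n = 10. For these values Var[S²] = 2θ2²/(n − 1) ≈ 3.56, which exceeds the bound δ^{22} = 2θ2²/n = 3.2. The helper names are hypothetical.

```python
import random


def sample_var(xs):
    """S^2 with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulated_var_of_s2(theta1, theta2, n, reps=20000, seed=0):
    """Monte Carlo Var_theta[S^2] for samples of size n from N(theta1, theta2)."""
    rng = random.Random(seed)
    sd = theta2 ** 0.5
    s2 = [sample_var([rng.gauss(theta1, sd) for _ in range(n)]) for _ in range(reps)]
    return sample_var(s2)


theta1, theta2, n = 0.0, 4.0, 10  # illustrative parameter values
v = simulated_var_of_s2(theta1, theta2, n)
print(v, 2 * theta2 ** 2 / (n - 1))  # simulated vs. exact Var[S^2] (about 3.56)
print(2 * theta2 ** 2 / n)           # lower bound delta^{22} = 3.2
```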
Vector of Parameters
Let us assume that a random sample X1, X2, ..., Xn of size n from the density f(x; θ1, θ2, ..., θk) is available, where the parameter θ = (θ1, θ2, ..., θk) and the parameter space Θ are k-dimensional. We want to simultaneously
but this need not be the case. An important special case is the estimation of θ = (θ1, θ2, ..., θk) itself; then r = k. An estimator of (τ1(θ), ..., τr(θ)) is a vector of statistics, say (T1, ..., Tr), where Tj = tj(X1, ..., Xn) and Tj is an estimator of τj(θ).
For a single estimator, we take the variance of the estimator as a measure of its closeness to the real-valued function of the population parameter being estimated. Here, we seek a generalization of the notion of variance to r dimensions. Several such generalizations have been proposed; we consider here only four of them:
i) Vector of variances.
ii) Linear combination of variances.
iii) Ellipsoid of concentration.
iv) Wilks’ generalized variance.
1. Vector of Variances
Let the vector (varθ[T1], ..., varθ[Tr]) be a measure of the closeness of the estimator (T1, ..., Tr) to (τ1(θ), ..., τr(θ)). Its main advantage is its simplicity. Its disadvantage is that the measure is vector-valued and consequently sometimes difficult to work with.
2. Linear Combination of Variances
Let Σ_{j=1}^r aj varθ[Tj] be a measure of closeness, for suitably chosen aj ≥ 0.
Both of these generalizations, (1) and (2), embody only the variances of the Tj, j = 1, ..., r. But the Tj are likely to be correlated, so one should also incorporate the covariances of the Tj's when measuring closeness.
3. Ellipsoid of Concentration
Let (T1, ..., Tr) be an unbiased estimator of (τ1(θ), ..., τr(θ)). Let σ^{ij}(θ) be the ij-th element of the inverse of the covariance matrix of (T1, ..., Tr), where the ij-th element of the covariance matrix is σij(θ) = covθ[Ti, Tj]. The ellipsoid of concentration of (T1, ..., Tr) is defined as the interior and boundary of the ellipsoid
$$\sum_{i=1}^r\sum_{j=1}^r\sigma^{ij}(\theta)\left[t_i-\tau_i(\theta)\right]\left[t_j-\tau_j(\theta)\right]=r+2 .$$
The ellipsoid of concentration measures how concentrated the distribution of (T1, ..., Tr) is about (τ1(θ), ..., τr(θ)). Suppose the ellipsoid of concentration of an estimator (T1, ..., Tr) is contained within the ellipsoid of concentration of another estimator (T1′, ..., Tr′). Then the distribution of (T1, ..., Tr) is more highly concentrated about (τ1(θ), ..., τr(θ)).
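The risk is
$$R_d(\theta)=\text{expected loss}=E\left[w\left(\theta,d(X)\right)\right],$$
and a smaller risk is desired: the smaller the risk, the better the estimator.
Minimax Estimator
If a random variable X has a density function f(x; θ) and d(x) is some estimate of θ, then the risk function is
$$R_d(\theta)=E\left[w\left(\theta,d(X)\right)\right].$$
An estimator T* whose maximum risk is smallest among all estimators, i.e. sup_θ R_{T*}(θ) ≤ sup_θ R_d(θ) for every d, is called a minimax estimator.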
ii) Suppose L is strictly convex and t is admissible. If t′ is another estimator with the same risk, i.e. R(θ, t) = R(θ, t′) for all θ, then t = t′ with probability 1.
iii) Any unique Bayes estimator is admissible. Here uniqueness means that any two Bayes estimators differ only on a set N with Pθ(N) = 0 ∀ θ ∈ Θ.
Problem: Let x1, ..., xn be n independent normal random variables with distribution N(θ, 1), and let the loss function be squared error. Find the minimax estimator of the mean θ.
Solution
Consider a sequence of prior distributions for θ, each normal with mean 0 and variance σ^2. The posterior density of θ is
$$p(\theta\mid x_1,\dots,x_n)=\frac{p(x_1,\dots,x_n\mid\theta)\,p(\theta)}{p(x_1,\dots,x_n)},$$
so for n = 1, x = x1,
$$p(\theta\mid x)=\frac{N(x;\theta,1)\,N(\theta;0,\sigma^2)}{\int_{-\infty}^{\infty}N(x;\theta,1)\,N(\theta;0,\sigma^2)\,d\theta}=\sqrt{\frac{1+\sigma^2}{2\pi\sigma^2}}\exp\left[-\frac12\,\frac{1+\sigma^2}{\sigma^2}\left(\theta-\frac{\sigma^2}{1+\sigma^2}x\right)^2\right].$$
Hence the Bayes estimator under squared-error loss is
$$E(\theta\mid x)=d(x)=\frac{x\sigma^2}{1+\sigma^2}.$$
Estimation & Confidence Interval ~ 5 of 17
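The posterior variance, which is the Bayes risk of d(x) for this prior, is
$$V(\theta\mid x)=\frac{\sigma^2}{1+\sigma^2},\qquad\sup_\sigma V(\theta\mid x)=\sup_\sigma\frac{1}{1+\frac{1}{\sigma^2}}=\lim_{\sigma\to\infty}\frac{1}{1+\frac{1}{\sigma^2}}=1,$$
and
$$\lim_{\sigma\to\infty}d(x)=\lim_{\sigma\to\infty}\frac{\sigma^2x}{1+\sigma^2}=\lim_{\sigma\to\infty}\frac{x}{1+\frac{1}{\sigma^2}}=x .$$
The estimator x has constant risk E(X − θ)^2 = 1, which equals the limit of the Bayes risks. Hence x is a minimax estimator.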
Problem: Find the minimax estimator of θ in sampling from the Bernoulli distribution using a squared-error loss function.
Solution
With a beta prior distribution with parameters a and b, a Bayes estimator (the posterior mean) is given by
$$\frac{\int_0^1\theta\cdot\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\frac{1}{B(a,b)}\theta^{a-1}(1-\theta)^{b-1}\,d\theta}{\int_0^1\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\frac{1}{B(a,b)}\theta^{a-1}(1-\theta)^{b-1}\,d\theta}=\frac{\int_0^1\theta^{\sum x_i+a}(1-\theta)^{n-\sum x_i+b-1}\,d\theta}{\int_0^1\theta^{\sum x_i+a-1}(1-\theta)^{n-\sum x_i+b-1}\,d\theta}$$
$$=\frac{B\left(\sum x_i+a+1,\;n-\sum x_i+b\right)}{B\left(\sum x_i+a,\;n-\sum x_i+b\right)}=\frac{\Gamma\left(\sum x_i+a+1\right)}{\Gamma\left(\sum x_i+a\right)}\cdot\frac{\Gamma(n+a+b)}{\Gamma(n+a+b+1)}=\frac{\sum x_i+a}{n+a+b}.$$
So, the Bayes estimator with respect to a beta prior distribution with parameters a and b is given by
$$t^*(x_1,x_2,\dots,x_n)=\frac{\sum x_i+a}{n+a+b}=\frac{\sum x_i}{n+a+b}+\frac{a}{n+a+b}\qquad(i)$$
$$\Rightarrow\;t^*_{AB}(x_1,x_2,\dots,x_n)=A\sum x_i+B,\qquad A=\frac{1}{n+a+b},\;B=\frac{a}{n+a+b}.$$
Its risk is
$$\mathcal R_{t^*_{AB}}(\theta)=E\left[\left(A\sum X_i+B-\theta\right)^2\right]=E\left[\left\{A\left(\sum X_i-n\theta\right)+\left(B-\theta+nA\theta\right)\right\}^2\right]$$
$$=A^2E\left[\left(\sum X_i-n\theta\right)^2\right]+\left(B-\theta+nA\theta\right)^2+2\left(B-\theta+nA\theta\right)A\left\{\sum E(X_i)-n\theta\right\}$$
$$=A^2\,n\theta(1-\theta)+\left(B+\theta(nA-1)\right)^2,$$
since E(ΣXi) = nθ and Var(ΣXi) = nθ(1 − θ). Expanding in powers of θ,
$$\mathcal R_{t^*_{AB}}(\theta)=\theta^2\left[(nA-1)^2-nA^2\right]+\theta\left[nA^2+2B(nA-1)\right]+B^2 .$$
The risk is constant in θ if both coefficients vanish. From the coefficient of θ^2,
$$A^2\left(n^2-n\right)-2nA+1=0\;\Rightarrow\;A=\frac{2n\pm\sqrt{4n^2-4\left(n^2-n\right)}}{2\left(n^2-n\right)}=\frac{n\pm\sqrt n}{\left(n+\sqrt n\right)\left(n-\sqrt n\right)}=\frac{1}{n\mp\sqrt n}.$$
The other root would make a + b negative, so we take A = 1/(n + √n) = 1/(√n(√n + 1)). The coefficient of θ then gives
$$nA^2+2B(nA-1)=0\;\Rightarrow\;B=\frac{nA^2}{2(1-nA)}=\frac{\frac{1}{(\sqrt n+1)^2}}{\frac{2}{\sqrt n+1}}=\frac{1}{2\left(\sqrt n+1\right)}.$$
Now, A = 1/(n + a + b), so
$$\frac{1}{\sqrt n\left(\sqrt n+1\right)}=\frac{1}{n+a+b}\;\Rightarrow\;n+a+b=n+\sqrt n\;\Rightarrow\;a+b=\sqrt n .$$
Again, B = a/(n + a + b), so
$$\frac{1}{2\left(\sqrt n+1\right)}=\frac{a}{n+\sqrt n}\;\Rightarrow\;a=\frac{n+\sqrt n}{2\left(\sqrt n+1\right)}=\frac{\sqrt n\left(\sqrt n+1\right)}{2\left(\sqrt n+1\right)}=\frac{\sqrt n}{2},$$
and hence b = √n/2 as well. So our estimator is
$$\frac{\sum x_i+a}{n+a+b}=\frac{\sum x_i+\frac{\sqrt n}{2}}{n+\sqrt n}.$$
This is a Bayes estimator with constant risk B^2 = 1/(4(√n + 1)^2). Hence it is the minimax estimator.
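The constant-risk property can be verified exactly by summing over the Binomial(n, θ) distribution of Σxi. This sketch uses the illustrative choice n = 16, so that a = b = √n/2 = 2. The risk should then equal 1/(4(√n + 1)²) = 0.01 for every θ. The sample mean (a = b = 0) is included for contrast. The function name is illustrative.

```python
from math import comb, sqrt


def risk(theta, n, a, b):
    """Exact risk E[((sum X_i + a)/(n + a + b) - theta)^2] with sum X_i ~ Binomial(n, theta)."""
    return sum(comb(n, s) * theta ** s * (1 - theta) ** (n - s)
               * ((s + a) / (n + a + b) - theta) ** 2 for s in range(n + 1))


n = 16
a = b = sqrt(n) / 2  # the minimax choice
minimax_risks = [risk(k / 10, n, a, b) for k in range(11)]
mle_risks = [risk(k / 10, n, 0.0, 0.0) for k in range(11)]  # a = b = 0 gives the sample mean
print(minimax_risks)   # every entry is 0.01 up to rounding
print(max(mle_risks))  # theta(1 - theta)/n peaks at 0.25/16 = 0.015625
```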
Bayesian Confidence Interval
Example: Assume that each item coming off a production line either is or is not defective, so we can call each item a Bernoulli trial. Assume again that the trials are independent with P[defective] = θ for each trial, and take a beta prior distribution for θ with parameters a and b.
For example, for our production line suppose our prior information suggests
$$E[\theta]=0.01,\qquad\operatorname{Var}[\theta]=0.0001 .$$
The larger we take Var(θ), the less sure we are of our prior information. From these two values we determine a and b:
$$\frac{a}{a+b}=0.01\;\Rightarrow\;a=\frac{b}{99}\approx0.0101\,b,$$
$$\frac{ab}{(a+b)^2(a+b+1)}=0.0001\;\Rightarrow\;\frac{0.01\times0.99}{a+b+1}=0.0001\;\Rightarrow\;a+b=98,$$
so that a = 0.98 and b = 97.02.
Now suppose we observe ΣXi = Σxi from the sample. The posterior distribution of θ is again a beta distribution:
$$f\left(\theta\mid x\right)=\frac{\Gamma(a+b+n)}{\Gamma\left(a+\sum x_i\right)\Gamma\left(b+n-\sum x_i\right)}\,\theta^{a+\sum x_i-1}(1-\theta)^{b+n-\sum x_i-1},\quad0<\theta<1 .$$
Thus the Bayes estimator of θ is the mean of this posterior distribution, i.e.,
$$\theta^*=E\left[\theta\mid\sum x_i\right]=\frac{a+\sum x_i}{a+\sum x_i+b+n-\sum x_i}=\frac{a+\sum x_i}{a+b+n}=\frac{\sum x_i+0.98}{n+97.02+0.98}=\frac{\sum x_i+0.98}{n+98}.$$
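Solving the two moment equations in general gives a + b = m(1 − m)/v − 1 and a = m(a + b), where m and v are the prior mean and variance. Here is a sketch in Python. The data of 3 defectives in 200 items are hypothetical and serve only to show the Bayes estimate. The function names are illustrative.

```python
def beta_from_mean_var(m, v):
    """Solve a/(a+b) = m and ab/((a+b)^2 (a+b+1)) = v for (a, b); requires 0 < v < m(1 - m)."""
    total = m * (1 - m) / v - 1  # a + b
    return m * total, (1 - m) * total


def bayes_estimate(successes, n, a, b):
    """Posterior mean (a + sum x_i)/(a + b + n) of the beta posterior."""
    return (a + successes) / (a + b + n)


a, b = beta_from_mean_var(0.01, 0.0001)
print(a, b)                          # about 0.98 and 97.02
print(bayes_estimate(3, 200, a, b))  # 3.98/298
```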
Bayesian Interval
Given a random sample of a random variable, a confidence interval can be evaluated, and in a sense we are 100(1 − α)% sure that the observed confidence interval covers the true unknown parameter value. Very similar manipulations can be carried out with the Bayesian approach. Suppose we are given a random sample of a random variable X whose distribution depends on an unknown parameter θ, and θ has a prior density fθ(θ). Once the sample values x1, x2, ..., xn are known, we can compute the posterior distribution f_{θ|x}(θ | x), which summarizes all the current information about θ. Then, if c1 < c2 are two constants such that
$$P\left[c_1\le\theta\le c_2\mid x\right]=1-\alpha,$$
we are 100(1 − α)% sure that (c1, c2) includes θ, given the sample values. We will call such an interval (c1, c2) a 100(1 − α)% Bayesian interval for θ.
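Suppose an asymptotically normal estimator Tn exists, with (Tn − θ)/σn(θ) approximately N(0, 1). Then (Tn − θ)/σn(θ) may be taken as a pivotal quantity, and a large-sample 100(1 − α)% confidence interval for θ is obtained from
$$\left[T_n+z_{\alpha/2}\,\sigma_n(\theta),\;T_n+z_{1-\alpha/2}\,\sigma_n(\theta)\right],$$
where z_p denotes the p-th quantile of N(0, 1). This method provides a large-sample confidence interval so long as the inequality z_{α/2} ≤ (Tn − θ)/σn(θ) ≤ z_{1−α/2} can be inverted for θ.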
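Solution:
The probability density function is
$$f\left(x;0,\sigma^2\right)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac12\left(\frac{x}{\sigma}\right)^2}.$$
Writing θ = σ^2, the likelihood function is given by
$$L=\left(\frac{1}{2\pi\sigma^2}\right)^{n/2}e^{-\frac12\sum\frac{x_i^2}{\sigma^2}}=\left(\frac{1}{2\pi\theta}\right)^{n/2}e^{-\frac{1}{2\theta}\sum x_i^2},$$
$$\ln L=-\frac n2\ln2\pi-\frac n2\ln\theta-\frac12\,\frac{\sum x_i^2}{\theta}.$$
Now,
$$\frac{\partial\ln L}{\partial\theta}=0\;\Rightarrow\;-\frac n2\,\frac1\theta+\frac12\,\frac{\sum x_i^2}{\theta^2}=0\;\Rightarrow\;\hat\theta=\frac{\sum x_i^2}{n}.$$
Again,
$$\frac{\partial^2\ln L}{\partial\theta^2}=\frac{n}{2\theta^2}-\frac{2\sum x_i^2}{2\theta^3},\qquad E\left[\frac{\partial^2\ln L}{\partial\theta^2}\right]=\frac{n}{2\theta^2}-\frac{2n\theta}{2\theta^3}=-\frac{n}{2\theta^2},$$
$$\therefore\;-E\left[\frac{\partial^2\ln L}{\partial\theta^2}\right]=\frac{n}{2\theta^2},\qquad\sigma_n^2=\frac{1}{-E\left[\frac{\partial^2\ln L}{\partial\theta^2}\right]}=\frac{2\theta^2}{n}.$$
Take Tn = θ̂ and σn(θ) = θ√(2/n), and let z_p denote the p-th quantile of N(0, 1). The pivot (θ̂ − θ)/(θ√(2/n)) is approximately N(0, 1), and
$$z_{\alpha/2}\le\frac{\hat\theta-\theta}{\theta\sqrt{2/n}}\le z_{1-\alpha/2}\iff\theta\left(1+z_{\alpha/2}\sqrt{\tfrac2n}\right)\le\hat\theta\le\theta\left(1+z_{1-\alpha/2}\sqrt{\tfrac2n}\right).$$
Inverting this, with θ̂ = Σxi^2/n, gives the large-sample 100(1 − α)% confidence interval for θ = σ^2:
$$\therefore\;\left[\frac{\sum x_i^2/n}{1+z_{1-\alpha/2}\sqrt{\frac2n}},\;\frac{\sum x_i^2/n}{1+z_{\alpha/2}\sqrt{\frac2n}}\right]$$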
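This C.I. is not invariant under transformation of parameters. If the same method is applied with σ as the parameter, the resulting interval is not the square root of the interval above. Thus taking square roots of this C.I. will not give the C.I. for σ.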
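The lack of invariance can be seen numerically. The sketch below applies the same large-sample construction twice: once with θ = σ², which gives the interval above, and once with σ itself. For σ, −E[∂² ln L/∂σ²] = 2n/σ², so σn(σ) = σ/√(2n). The σ version and the sample are assumptions made for illustration, and the function names are hypothetical.

```python
from statistics import NormalDist


def ml_interval_variance(xs, alpha=0.05):
    """Interval for theta = sigma^2 from the pivot (theta_hat - theta) / (theta * sqrt(2/n))."""
    n = len(xs)
    theta_hat = sum(x * x for x in xs) / n
    c = NormalDist().inv_cdf(1 - alpha / 2) * (2 / n) ** 0.5
    return theta_hat / (1 + c), theta_hat / (1 - c)


def ml_interval_sd(xs, alpha=0.05):
    """Same construction with sigma as parameter: sigma_hat = sqrt(theta_hat), sigma_n = sigma / sqrt(2n)."""
    n = len(xs)
    sigma_hat = (sum(x * x for x in xs) / n) ** 0.5
    c = NormalDist().inv_cdf(1 - alpha / 2) / (2 * n) ** 0.5
    return sigma_hat / (1 + c), sigma_hat / (1 - c)


xs = [0.3, -1.2, 0.8, 2.1, -0.4, -1.7, 0.9, 0.2, -0.6, 1.4,
      -0.9, 0.5, 1.1, -2.3, 0.7, -0.1, 1.6, -0.8, 0.4, -1.0]  # hypothetical N(0, sigma^2) sample
lo_v, hi_v = ml_interval_variance(xs)
lo_s, hi_s = ml_interval_sd(xs)
print((lo_v, hi_v), (lo_s ** 2, hi_s ** 2))  # squaring the sigma interval does not reproduce the sigma^2 interval
```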
Now we will consider the construction of large-sample confidence intervals which are invariant under transformation of parameters.
Under the usual regularity conditions, the random variables ∂ ln f(Xi; θ)/∂θ (i = 1, ..., n) are independent with mean zero and variance K^2 = −E[∂^2 ln f(X; θ)/∂θ^2]. Therefore, by the central limit theorem, their sample mean satisfies
$$\frac1n\,\frac{\partial\ln L}{\partial\theta}\;\dot\sim\;N\left(0,\frac{K^2}{n}\right),\quad\text{i.e.,}\quad\frac{\partial\ln L/\partial\theta}{\sqrt{-E\left(\frac{\partial^2\ln L}{\partial\theta^2}\right)}}\;\dot\sim\;N(0,1).\qquad(i)$$
Using this property one can get a large-sample C.I. for θ. Note that the maximum likelihood estimate of θ has not been used here.
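Now let Φ = Φ(θ) be a one-to-one (monotone) function of θ. Then
$$\frac{\partial\ln L}{\partial\theta}=\frac{\partial\ln L}{\partial\Phi}\,\frac{d\Phi}{d\theta},\qquad\frac{\partial^2\ln L}{\partial\theta^2}=\frac{\partial^2\ln L}{\partial\Phi^2}\left(\frac{d\Phi}{d\theta}\right)^2+\frac{\partial\ln L}{\partial\Phi}\,\frac{d^2\Phi}{d\theta^2}.$$
Hence, taking expectations,
$$E\left[\frac{\partial^2\ln L}{\partial\theta^2}\right]=\left(\frac{d\Phi}{d\theta}\right)^2E\left[\frac{\partial^2\ln L}{\partial\Phi^2}\right]\qquad\left[\text{since }E\left(\frac{\partial\ln L}{\partial\Phi}\right)=0\right].$$
Substituting these in (i) shows that the statistic in (i) is the same (up to sign) whether it is computed in terms of θ or of Φ. Therefore, if (θ1, θ2) is a C.I. for θ, then {Φ(θ1), Φ(θ2)} is a C.I. for Φ(θ).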
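Hence, in the example above (θ = σ²),
$$\frac{\sum X_i^2/\sigma^2-n}{\sqrt{2n}}\sim N(0,1).$$
A central 100(1 − α)% confidence interval for σ is, therefore,
$$\left\{\left(\frac{\sum X_i^2}{n+z_{\alpha/2}\sqrt{2n}}\right)^{1/2},\;\left(\frac{\sum X_i^2}{n-z_{\alpha/2}\sqrt{2n}}\right)^{1/2}\right\},\qquad(1)$$
where z_{α/2} here denotes the upper α/2-point of N(0, 1).
If the variance σ² is treated as the parameter, then the method yields the 100(1 − α)% confidence interval for σ² as
$$\left\{\frac{\sum X_i^2/n}{1+z_{\alpha/2}\sqrt{2/n}},\;\frac{\sum X_i^2/n}{1-z_{\alpha/2}\sqrt{2/n}}\right\},$$
which is exactly the square of interval (1).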
It may be noted that large-sample confidence intervals based on maximum likelihood estimators will, on average, be shorter than the large-sample confidence intervals based on any other estimator.
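A minimal numerical sketch of the two examples above, assuming NumPy and SciPy are available; the function names, the simulated N(0, 4) sample and n = 200 are illustrative choices, not part of the notes. It shows that squaring the end-points of the score-based interval (1) for σ reproduces the interval for σ² obtained by solving the pivot inequality for θ, whereas the interval with θ replaced by θ̂ in σ_n(θ) is a different interval.

```python
import numpy as np
from scipy.stats import norm

def sigma2_intervals(x, alpha=0.05):
    """Large-sample intervals for theta = sigma^2 when X ~ N(0, sigma^2).

    plug_in : theta replaced by its MLE in sigma_n(theta) = theta * sqrt(2/n)
    inverted: pivot inequality solved for theta
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    theta_hat = np.sum(x ** 2) / n                  # MLE of sigma^2
    c = np.sqrt(2.0 / n)
    z_lo, z_hi = norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2)   # lower points, z_lo < 0
    plug_in = (theta_hat * (1 + z_lo * c), theta_hat * (1 + z_hi * c))
    inverted = (theta_hat / (1 + z_hi * c), theta_hat / (1 + z_lo * c))
    return plug_in, inverted

def sigma_score_interval(x, alpha=0.05):
    """Interval (1) for sigma, from (sum X^2 / sigma^2 - n) / sqrt(2n) ~ N(0, 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = np.sum(x ** 2)
    z = norm.isf(alpha / 2)                         # upper alpha/2-point
    return np.sqrt(s / (n + z * np.sqrt(2 * n))), np.sqrt(s / (n - z * np.sqrt(2 * n)))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=200)                  # simulated data, true sigma = 2
plug_in, inverted = sigma2_intervals(x)
lo, hi = sigma_score_interval(x)
print("sigma^2, plug-in :", plug_in)
print("sigma^2, inverted:", inverted)
print("sigma, score     :", (lo, hi))
print("squared          :", (lo ** 2, hi ** 2))     # matches the inverted sigma^2 interval
```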
Confidence Belt
Let T be a statistic whose distribution depends on θ , preferably a sufficient statistic for θ . For each θ , let us
Where, α1 + α 2 = α . Supposing Θ is a
θ 2 ( t ) and that of the point of intersection of the line with C2 by θ1 ( t ) , so that θ1 ( t ) < θ 2 ( t ) .
Consider now the two random variables θ1(T) and θ2(T), which are so defined that for T = t, θ1(T) = θ1(t) and θ2(T) = θ2(t). From the way θ1(T) and θ2(T) have been obtained, it is obvious that
θ1(T) ≤ θ ≤ θ2(T) iff t1(θ) ≤ T ≤ t2(θ).
Therefore
$$P_\theta\left[\theta_1(T)\le\theta\le\theta_2(T)\right]=P_\theta\left[t_1(\theta)\le T\le t_2(\theta)\right]=1-\alpha\quad\forall\,\theta\in\Theta.$$
Hence, given a set of observations X, if t denotes the corresponding value of T, then θ1(t) and θ2(t) are a pair
The region in the (T, θ)-plane which is bounded by the two curves C1 and C2 is called a confidence belt for θ.
Example: Suppose (X_i, Y_i), i = 1, 2, …, 20, is a random sample drawn from a bivariate normal distribution BVN(µ_x, µ_y, σ_x², σ_y², ρ), where all the parameters are unknown. We want to set confidence limits for ρ.
Solution
The distribution of the sample correlation coefficient r depends on ρ only. Let α = 0.05. From the tables of the correlation coefficient by F. N. David et al., we may obtain, for each ρ, the values r1(ρ) and r2(ρ) of r such that
$$P_\rho\left[r<r_1(\rho)\right]=P_\rho\left[r>r_2(\rho)\right]=0.025.$$
These are shown in the following table for the values of ρ from −0.9 to 0.9 at intervals of 0.1.
ρ        r1(ρ)       r2(ρ)
−0.9     −0.97065    −0.77222
−0.8     −0.92223    −0.56661
−0.7     −0.92289    −0.38984
−0.6     −0.83500    −0.22886
−0.5     −0.78095    −0.08607
−0.4     −0.72585     0.04226
−0.3     −0.71366     0.15830
−0.2     −0.59586     0.26394
−0.1     −0.52565     0.35862
 0       −0.44486     0.44486
 0.1     −0.35862     0.52565
 0.2     −0.26394     0.59586
 0.3     −0.15880     0.71366
 0.4     −0.04226     0.72585
 0.5      0.08607     0.78095
 0.6      0.22886     0.83500
 0.7      0.38984     0.92289
 0.8      0.56661     0.92223
 0.9      0.77222     0.97065
Now, given the observed value of r for a particular random sample of size 20 from the bivariate normal distribution,
we can obtain the confidence limits to ρ with confidence coefficient 1 − 2 × 0.025 = 0.95 . Suppose, e.g., the
observed value of r is 0.55. Treating 0.55 as a value of r2(ρ), we find, by inverse interpolation, the
corresponding value of ρ to be 0.135 . Similarly, treating 0.55 as a value of r1 ( ρ ) , the corresponding value of ρ
is found to be 0.791 . Hence for this value of r , the 95% confidence limits to ρ are 0.135 and 0.791.
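The inverse interpolation step can be sketched in plain Python. The table values are copied from the notes; the helper name `inverse_interp` is hypothetical. Because a few tabulated entries (near ρ = ±0.7, ±0.8) are not monotone, the sketch scans for the first pair of adjacent entries that brackets the observed r instead of assuming a sorted column.

```python
# r1(rho) and r2(rho) as tabulated in the notes (n = 20, alpha = 0.05)
RHO = [-0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0.0,
       0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
R1 = [-0.97065, -0.92223, -0.92289, -0.83500, -0.78095, -0.72585, -0.71366,
      -0.59586, -0.52565, -0.44486, -0.35862, -0.26394, -0.15880, -0.04226,
      0.08607, 0.22886, 0.38984, 0.56661, 0.77222]
R2 = [-0.77222, -0.56661, -0.38984, -0.22886, -0.08607, 0.04226, 0.15830,
      0.26394, 0.35862, 0.44486, 0.52565, 0.59586, 0.71366, 0.72585,
      0.78095, 0.83500, 0.92289, 0.92223, 0.97065]

def inverse_interp(r_obs, r_col, rho=RHO):
    """Linear inverse interpolation: the rho at which the tabulated curve r_col equals r_obs."""
    for i in range(len(rho) - 1):
        ra, rb = r_col[i], r_col[i + 1]
        if ra != rb and min(ra, rb) <= r_obs <= max(ra, rb):
            return rho[i] + (rho[i + 1] - rho[i]) * (r_obs - ra) / (rb - ra)
    raise ValueError("r_obs lies outside the tabulated range")

r_obs = 0.55
lower = inverse_interp(r_obs, R2)   # treat r_obs as a value of r2(rho)
upper = inverse_interp(r_obs, R1)   # treat r_obs as a value of r1(rho)
print(round(lower, 3), round(upper, 3))   # 0.135 0.791
```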
in such a way that α1 = α2 = α/2. However, it is clear that α1 and α2 may be chosen in infinitely many ways, each satisfying the conditions αi ≥ 0 and α1 + α2 = α. Let us consider a particular function
$$\psi(T,\theta)=\frac{\sqrt n(\bar X-\theta)}{\sigma}.$$
Even when α is fixed, we get infinitely many confidence intervals, one for each admissible choice of α1 and α2. So we need some criterion for making a choice among this infinite set of confidence intervals. An obvious method of selecting one of the possible confidence intervals is based on the width of the interval.
Suppose the statistics T1 and T2 satisfy
$$P_\theta\left[T_1\le\tau(\theta)\le T_2\right]=1-\alpha\quad\forall\,\theta\in\Theta.\qquad(i)$$
Then the confidence interval given by T1 and T2 will be said to be better than the interval given by another pair T1′ and T2′ satisfying (i) if
$$T_2-T_1\le T_2'-T_1'\quad\forall\,\theta\in\Theta.\qquad(ii)$$
If (ii) holds for every other pair of statistics T1′, T2′ satisfying P_θ[T1′ ≤ τ(θ) ≤ T2′] = 1 − α for all θ ∈ Θ, then the confidence interval given by T1 and T2 will be called the uniformly shortest confidence interval for τ(θ) based on the statistic T.
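Example: Consider X ~ N(θ, σ²), where σ² is known. Find the shortest confidence interval for θ.
Solution
With τ_p denoting the upper p-point of N(0, 1), we have
$$P\left[\tau_{1-\alpha_1}\le\frac{\sqrt n(\bar X-\theta)}{\sigma}\le\tau_{\alpha_2}\right]=1-\alpha$$
$$\Rightarrow P\left[\bar X-\tau_{\alpha_2}\frac{\sigma}{\sqrt n}\le\theta\le\bar X-\tau_{1-\alpha_1}\frac{\sigma}{\sqrt n}\right]=1-\alpha.$$
The length of this interval is L = (τ_{α2} − τ_{1−α1})σ/√n. So we have to minimize L, i.e., minimize τ_{α2} − τ_{1−α1}, subject to the conditions α1 ≥ 0, α2 ≥ 0 and α1 + α2 = α.
Due to the symmetry (and unimodality) of the distribution of √n(X̄ − θ)/σ about zero, the difference will be minimum when
$$\tau_{1-\alpha_1}=-\tau_{\alpha_2},\quad\text{i.e.,}\quad\alpha_1=\alpha_2=\frac\alpha2.$$
Hence this interval is in fact the shortest confidence interval based on the distribution of X̄.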
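A quick numerical check of this argument, assuming NumPy and SciPy are available: the hypothetical helper below evaluates the interval length for a grid of splits α1 + α2 = α (with illustrative values σ = 1 and n = 25) and reports where it is smallest.

```python
import numpy as np
from scipy.stats import norm

def ci_length(alpha1, alpha=0.05, sigma=1.0, n=25):
    """Length of [Xbar - tau_{alpha2} sigma/sqrt(n), Xbar - tau_{1-alpha1} sigma/sqrt(n)].

    tau_p is the upper p-point of N(0, 1), i.e. norm.isf(p); alpha2 = alpha - alpha1."""
    alpha2 = alpha - alpha1
    return (norm.isf(alpha2) - norm.isf(1 - alpha1)) * sigma / np.sqrt(n)

alpha = 0.05
grid = np.linspace(0.001, alpha - 0.001, 97)       # admissible splits with alpha1, alpha2 > 0
lengths = np.array([ci_length(a1, alpha) for a1 in grid])
best = grid[np.argmin(lengths)]
print(f"best alpha1 = {best:.4f}, shortest length = {lengths.min():.4f}")
# best alpha1 = 0.0250, shortest length = 0.7840
```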
In some situations the length of the confidence interval may involve some function of the sample observations, e.g., when, under the normal set-up, the confidence interval for µ is obtained from the t-distribution of the statistic √n(X̄ − µ)/S, or when, under the same set-up, the confidence interval for σ² is obtained from the χ²-distribution of Σ(X_i − X̄)²/σ². Here, in order to make a choice among all possible confidence intervals with the same confidence coefficient, we may make use of the average or expected length of the confidence interval.
For the statistics T1 and T2, the expected length is
$$E_\theta(T_2-T_1).$$
The interval for which this expected length is minimum may be called the interval with the shortest expected length or shortest average length.
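Example: Let X1, X2, …, Xn be a random sample drawn from N(µ, σ²), where both µ and σ² are unknown. We have to find the confidence interval for µ with the shortest expected length.
Solution
With t_{p,n−1} denoting the upper p-point of the t-distribution with n − 1 d.f., we have
$$P\left[t_{1-\alpha_1,n-1}\le\frac{\sqrt n(\bar X-\mu)}{S}\le t_{\alpha_2,n-1}\right]=1-\alpha$$
$$\Rightarrow P\left[\bar X-t_{\alpha_2,n-1}\frac{S}{\sqrt n}\le\mu\le\bar X-t_{1-\alpha_1,n-1}\frac{S}{\sqrt n}\right]=1-\alpha.$$
The expected length of the confidence interval is
$$\frac{E_\theta(S)}{\sqrt n}\left(t_{\alpha_2,n-1}-t_{1-\alpha_1,n-1}\right)=k\sigma\left[t_{\alpha_2,n-1}-t_{1-\alpha_1,n-1}\right],$$
where k is a constant that depends on n alone. So we have to minimize t_{α2,n−1} − t_{1−α1,n−1} subject to the conditions α1 ≥ 0, α2 ≥ 0 and α1 + α2 = α.
Due to the symmetry of the t-distribution around zero, the difference t_{α2,n−1} − t_{1−α1,n−1} will be minimum if t_{1−α1,n−1} = −t_{α2,n−1}, i.e., α1 = α2 = α/2.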
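Example: Let X ~ N(µ, σ²), where µ is known. Find the confidence interval for σ².
Solution
With χ²_{p,n} denoting the upper p-point of the χ²-distribution with n d.f., the inequalities
$$P_\theta\left[\frac{\sum(X_i-\mu)^2}{\theta}<\chi^2_{1-\alpha_1,n}\right]=\alpha_1,\qquad P_\theta\left[\frac{\sum(X_i-\mu)^2}{\theta}>\chi^2_{\alpha_2,n}\right]=\alpha_2\qquad(\text{here }\theta=\sigma^2)$$
lead to the result
$$P_\theta\left[\frac{\sum(X_i-\mu)^2}{\chi^2_{\alpha_2,n}}\le\theta\le\frac{\sum(X_i-\mu)^2}{\chi^2_{1-\alpha_1,n}}\right]=1-\alpha.$$
The corresponding confidence interval has the length
$$\sum(X_i-\mu)^2\left[\frac{1}{\chi^2_{1-\alpha_1,n}}-\frac{1}{\chi^2_{\alpha_2,n}}\right],$$
which, since E[Σ(X_i − µ)²] = nθ, has the expected value
$$n\theta\left[\frac{1}{\chi^2_{1-\alpha_1,n}}-\frac{1}{\chi^2_{\alpha_2,n}}\right].$$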
The minimization of this expected length amounts to the minimization of
$$\frac{1}{\chi_1^2}-\frac{1}{\chi_2^2}$$
subject to the condition
$$\int_{\chi_1^2}^{\chi_2^2}f(\chi^2)\,d(\chi^2)=1-\alpha,$$
where χ1² = χ²_{1−α1,n}, χ2² = χ²_{α2,n}, and f is the p.d.f. of the χ²-distribution with n degrees of freedom.
Using Lagrange's method of undetermined multipliers, which involves the partial differentiation of
$$\frac{1}{\chi_1^2}-\frac{1}{\chi_2^2}+\lambda\left[\int_{\chi_1^2}^{\chi_2^2}f(\chi^2)\,d(\chi^2)-(1-\alpha)\right]$$
with respect to χ1² and χ2² (here χ⁴ denotes (χ²)²), we get
$$\frac{1}{\chi_1^4}+\lambda f(\chi_1^2)=0\quad\text{and}\quad\frac{1}{\chi_2^4}+\lambda f(\chi_2^2)=0$$
$$\Rightarrow 1+\chi_1^4\,\lambda f(\chi_1^2)=0\quad(1)\qquad\text{and}\qquad 1+\chi_2^4\,\lambda f(\chi_2^2)=0.\quad(2)$$
Now from equations (1) and (2) we can write
$$\chi_1^4 f(\chi_1^2)=\chi_2^4 f(\chi_2^2),$$
so the optimal χ1² and χ2² are those for which this equation is satisfied, besides the equation
$$\int_{\chi_1^2}^{\chi_2^2}f(\chi^2)\,d(\chi^2)=1-\alpha.$$
The actual determination of the values χ1² and χ2² will, of course, be pretty difficult. In practice, one takes χ1² and χ2² such that α1 = α2 = α/2. But this may make the average length too big.
For example, if n = 10, α = 0.05, α1 = α2 = 0.025, then χ1² = 3.247 and χ2² = 20.483. So the average length of the interval is
$$10\,\theta\left[\frac{1}{3.247}-\frac{1}{20.483}\right]=2.5916\,\theta.$$
On the other hand, if we take α1 = 0.05, α2 = 0, then χ1² = 3.940 and χ2² = ∞, and the average length of the interval is
$$10\,\theta\left[\frac{1}{3.940}-0\right]=2.5381\,\theta.$$
Thus this second procedure, where the confidence interval is of the form
$$\left[0,\ \frac{v}{\chi_1^2}\right],\qquad v=\sum_{i=1}^{n}(x_i-\mu)^2,$$
would seem to be preferable to the equal-tails procedure. This interval is, in fact, not only shorter on the average, but shorter in every case, since both lengths are fixed multiples of v (0.2538 v against 0.2592 v).
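The two expected lengths can be reproduced with SciPy's χ² quantiles; the sketch below assumes NumPy and SciPy are available, and the helper name `expected_length_factor` is hypothetical. It also evaluates a crude grid of splits (α1, α2), only as an illustration of how the expected length varies, not as the exact Lagrange solution.

```python
import numpy as np
from scipy.stats import chi2

def expected_length_factor(alpha1, alpha, n):
    """E(length)/theta for [v / chi2_{alpha2,n}, v / chi2_{1-alpha1,n}], v = sum (x_i - mu)^2.

    chi2_{p,n} is the upper p-point, i.e. chi2.isf(p, n); chi2.isf(0, n) is inf, so the
    second term vanishes when alpha2 = 0."""
    alpha2 = alpha - alpha1
    lower_pt = chi2.isf(1 - alpha1, n)      # chi^2_{1-alpha1, n}
    upper_pt = chi2.isf(alpha2, n)          # chi^2_{alpha2, n}
    return n * (1.0 / lower_pt - 1.0 / upper_pt)

n, alpha = 10, 0.05
equal_tails = expected_length_factor(alpha / 2, alpha, n)
one_sided = expected_length_factor(alpha, alpha, n)
print(f"equal tails: {equal_tails:.4f} theta")   # about 2.5916 theta
print(f"one-sided  : {one_sided:.4f} theta")     # about 2.5379 theta

grid = np.linspace(0.0005, alpha, 100)           # alpha1 values; the last one gives alpha2 = 0
factors = [expected_length_factor(a1, alpha, n) for a1 in grid]
print("grid minimum at alpha1 =", grid[int(np.argmin(factors))])
```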
procedures, we immediately face a difficulty. In this case we cannot hope to get for each α ( 0 < α < 1) a confidence
The actual determination of the confidence intervals may be carried out by drawing confidence belts.
Example: Let X1, X2, …, X10 be a random sample from a point binomial (Bernoulli) distribution with p.m.f.
$$f_\theta(x)=\begin{cases}\theta^x(1-\theta)^{1-x}, & x=0,1,\\ 0, & \text{otherwise},\end{cases}\qquad 0\le\theta\le1.$$
For obtaining confidence limits to θ with a confidence coefficient at least equal to 0.90, we may first determine, for a suitable set of values of θ, the values t1(θ) and t2(θ) of the sufficient statistic T = Σ_i X_i such that
For values of θ from 0.1 to 0.9 (taken at intervals of 0.1), these numbers t1(θ) and t2(θ) are as shown in the table below:
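The defining condition and the table for this example are not recoverable from these notes. As a sketch under an assumed convention (not necessarily the one used in the original table), the code below takes t1(θ) as the largest t with P_θ(T < t) ≤ 0.05 and t2(θ) as the smallest t with P_θ(T > t) ≤ 0.05, where T ~ Binomial(10, θ); this guarantees P_θ(t1 ≤ T ≤ t2) ≥ 0.90. It assumes SciPy is available, and `belt_limits` is a hypothetical name.

```python
from scipy.stats import binom

def belt_limits(theta, n=10, tail=0.05):
    """Assumed convention: t1 = largest t with P(T < t) <= tail,
    t2 = smallest t with P(T > t) <= tail, where T ~ Binomial(n, theta)."""
    t1 = 0
    while t1 < n and binom.cdf(t1, n, theta) <= tail:    # P(T < t1 + 1) = P(T <= t1)
        t1 += 1
    t2 = n
    while t2 > 0 and binom.sf(t2 - 1, n, theta) <= tail:  # P(T > t2 - 1)
        t2 -= 1
    return t1, t2

for k in range(1, 10):
    theta = k / 10
    t1, t2 = belt_limits(theta)
    coverage = binom.cdf(t2, 10, theta) - binom.cdf(t1 - 1, 10, theta)
    print(f"theta={theta:.1f}  t1={t1}  t2={t2}  P(t1<=T<=t2)={coverage:.4f}")
```

Plotting t1(θ) and t2(θ) against θ gives the two step curves of the confidence belt; because T is discrete, the coverage is at least, rather than exactly, 0.90.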
Let S be a subset of the parameter space Θ; then we shall write 'S c θ' to mean that this set covers or includes θ, so that S c θ ⇔ θ ∈ S.
Definition
A family of subsets S(X) of the parameter space Θ, for varying x ∈ ℑ, is said to be a family of confidence sets at the level α if
$$P_\theta\left[S(X_1,X_2,\dots,X_n)\ c\ \theta\right]=1-\alpha\quad\text{for all }\theta\in\Theta.$$
A family of sets S0(X) is said to be uniformly most accurate (UMA) at level α if
$$P_\theta\left[S_0(X_1,X_2,\dots,X_n)\ c\ \theta\right]=1-\alpha\quad\text{for all }\theta\in\Theta\qquad(i)$$
$$\text{and}\quad P_{\theta'}\left[S_0(X_1,\dots,X_n)\ c\ \theta\right]\le P_{\theta'}\left[S(X_1,\dots,X_n)\ c\ \theta\right]\quad\text{for all }\theta,\theta'\in\Theta\ (\theta\ne\theta'),\qquad(ii)$$
whatever the other family of sets S satisfying (i) may be. The implication of (ii) is that S0 has a smaller probability of including a wrong value or set of values of the parameter θ than any other family of sets at the same level. In this sense S0(x) is the smallest confidence set of level α corresponding to the set of observations x. In most cases a family of UMA sets cannot be obtained. Hence we introduce the concept of unbiasedness.
Definition
A family of sets S(X), for different values of x ∈ ℑ, of Θ is said to constitute a family of unbiased confidence sets of level α if
$$P_{\theta'}\left[S(X_1,X_2,\dots,X_n)\ c\ \theta\right]\le1-\alpha\quad\text{for all }\theta,\theta'\in\Theta,\ \theta\ne\theta'.$$
Hence S(X) is a family of unbiased sets iff the probability for S(X1, X2, …, Xn) to cover θ when some alternative value θ′ is true does not exceed the same probability for the case when θ itself is true. Surely this is a desirable feature of a family of confidence sets.
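$$P_\theta\left[S_0(X_1,X_2,\dots,X_n)\ c\ \theta\right]=1-\alpha\quad\text{for all }\theta\in\Theta,$$
$$P_{\theta'}\left[S_0(X_1,X_2,\dots,X_n)\ c\ \theta\right]\le1-\alpha\quad\text{for all }\theta,\theta'\in\Theta\ (\theta\ne\theta'),$$
$$\text{and}\quad P_{\theta'}\left[S_0(X_1,X_2,\dots,X_n)\ c\ \theta\right]\le P_{\theta'}\left[S(X_1,X_2,\dots,X_n)\ c\ \theta\right]\quad\text{for all }\theta,\theta'\in\Theta\ (\theta\ne\theta').$$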
$$P(x\in w\mid H_0)=\int_w L_0\,dx=\alpha\qquad(1)$$
test of level α.
H1: θ ≠ θ0, i.e., against H1: θ = θ1 ≠ θ0, if
$$P(x\in w\mid H_0)=\int_w L_0\,dx=\alpha\qquad(1)$$
on it is said to be unbiased if the power of the test is at least the size of the critical region, i.e.,
Power of the test ≥ Size of the C.R.
$$\Rightarrow 1-\beta\ge\alpha\ \Rightarrow\ P_{\theta_1}(w)\ge P_{\theta_0}(w)\ \Rightarrow\ P[x:x\in w\mid H_1]\ge P[x:x\in w\mid H_0].$$
In other words, the critical region w is said to be unbiased if
$$P_{\theta_1}(w)\ge P_{\theta_0}(w)\quad\forall\,\theta_1(\ne\theta_0)\in\Omega,$$
i.e., in terms of the test function φ,
(i) E{φ(x) | θ0} = P(x ∈ w | θ0) = α,
(ii) E{φ(x) | θ1} ≥ E{φ(x) | θ0} ∀ θ1 ∈ Ω.
Suppose that for every other test φ* satisfying the conditions (i) and (ii) we have
$$E\{\phi(x)\mid\theta_1\}\ge E\{\phi^*(x)\mid\theta_1\}\quad\forall\,\theta_1\in\Omega.$$
Hypothesis-I ~ 1 of 11
UMPU Type A1 Test
Let φ be an unbiased test (or w a critical region) of size α for testing H0: θ = θ0 against H1: θ = θ1 ≠ θ0, θ1 ∈ Ω, i.e.,
(i) E{φ(x) | θ0} = P(x ∈ w | θ0) = α,
(ii) E{φ(x) | θ1} ≥ E{φ(x) | θ0} ∀ θ1 ∈ Ω,
$$\text{(iii)}\quad\left.\frac{\partial}{\partial\theta_1}E\{\phi(x)\mid\theta_1\}\right|_{\theta_1=\theta_0}=0.$$
Then φ is called a UMPU type A1 test. For a UMPU test it is not required that the power curve should have a regular minimum at θ0, but the name UMPU test is often used to imply a type A1 test.
Show that 1 − β ≥ α.
Let w be a BCR of size α for testing a simple H0 against a simple H1. Then, by definition, we have
$$P(x\in w\mid H_0)=\int_w L(x\mid H_0)\,dx=\alpha,$$
$$\frac{L(x\mid H_0)}{L(x\mid H_1)}\le K\quad\text{if }x\in w\qquad(i)$$
$$\text{and}\quad\frac{L(x\mid H_0)}{L(x\mid H_1)}\ge K\quad\text{if }x\in(S-w),\qquad(ii)$$
where K > 0 is a constant.
From (i) we have K·L(x | H1) ≥ L(x | H0)
$$\Rightarrow K\int_w L(x\mid H_1)\,dx\ge\int_w L(x\mid H_0)\,dx\ \Rightarrow\ K(1-\beta)\ge\alpha.\qquad(iii)$$
Again, from (ii) we have K·L(x | H1) ≤ L(x | H0)
$$\Rightarrow K\int_{S-w}L(x\mid H_1)\,dx\le\int_{S-w}L(x\mid H_0)\,dx\ \Rightarrow\ K\beta\le1-\alpha.\qquad(iv)$$
Multiplying (iii) by (1 − α) and (iv) by α, we get K(1 − α)(1 − β) ≥ α(1 − α) ≥ Kαβ, so that
$$(1-\alpha)(1-\beta)\ge\alpha\beta\ \Rightarrow\ 1-\alpha-\beta\ge0\ \Rightarrow\ 1-\beta\ge\alpha.\quad(\text{Proved})$$
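Example: Let x1, x2, …, xn be a random sample drawn from N(µ, 1). For testing H0: µ = µ0 against H1: µ = µ1, find the BCR of size α.
Solution
Since x1, x2, …, xn are drawn from N(µ, 1), we have
$$L(x\mid H_0)=\left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac12\sum(x_i-\mu_0)^2}\quad\text{and}\quad L(x\mid H_1)=\left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac12\sum(x_i-\mu_1)^2}.$$
According to the Neyman-Pearson lemma, the BCR is given by
$$\frac{L(x\mid H_0)}{L(x\mid H_1)}\le K$$
$$\Rightarrow\exp\left[-\frac n2\left\{(\bar x-\mu_0)^2-(\bar x-\mu_1)^2\right\}\right]\le K\qquad\left[\text{since }\sum(x_i-\mu_0)^2-\sum(x_i-\mu_1)^2=n\left\{(\bar x-\mu_0)^2-(\bar x-\mu_1)^2\right\}\right]$$
$$\Rightarrow\exp\left[-\frac n2\left\{\bar x^2-2\bar x\mu_0+\mu_0^2-\bar x^2+2\bar x\mu_1-\mu_1^2\right\}\right]\le K$$
$$\Rightarrow\exp\left[-\frac n2\left\{2\bar x(\mu_1-\mu_0)+\mu_0^2-\mu_1^2\right\}\right]\le K$$
$$\Rightarrow-\frac n2\left\{2\bar x(\mu_1-\mu_0)+\mu_0^2-\mu_1^2\right\}\le\ln K$$
$$\Rightarrow 2\bar x(\mu_1-\mu_0)+\mu_0^2-\mu_1^2\ge-\frac2n\ln K$$
$$\Rightarrow 2\bar x(\mu_1-\mu_0)\ge\mu_1^2-\mu_0^2-\frac2n\ln K$$
$$\Rightarrow\bar x(\mu_1-\mu_0)\ge\frac{\mu_1^2-\mu_0^2}{2}-\frac1n\ln K.\qquad(i)$$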
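If µ1 > µ0, then dividing (i) by µ1 − µ0 > 0,
$$\bar x\ge\frac{\mu_1+\mu_0}{2}+\frac{1}{n(\mu_0-\mu_1)}\ln K$$
$$\Rightarrow\bar x\ge\lambda_1\ (\text{say}).\qquad(ii)$$
We know that
$$P(\bar x\ge\lambda_1\mid H_0)=\alpha\ \Rightarrow\ \int_{\lambda_1}^{\infty}f(\bar x)\,d\bar x=\alpha\quad\text{under }H_0:\mu=\mu_0$$
$$\Rightarrow\sqrt{\frac{n}{2\pi}}\int_{\lambda_1}^{\infty}e^{-\frac n2(\bar x-\mu_0)^2}\,d\bar x=\alpha\qquad\left[\text{since }\bar x\sim N\!\left(\mu,\frac1n\right)\right]$$
$$\Rightarrow\frac{1}{\sqrt{2\pi}}\int_{\sqrt n(\lambda_1-\mu_0)}^{\infty}e^{-z^2/2}\,dz=\alpha\qquad\left[z=\sqrt n(\bar x-\mu_0)\right]$$
$$\Rightarrow\frac{1}{\sqrt{2\pi}}\int_{z_\alpha}^{\infty}e^{-z^2/2}\,dz=\alpha,$$
where z_α is the upper α-point of N(0, 1). We have
$$z_\alpha=\frac{\lambda_1-\mu_0}{1/\sqrt n}\ \Rightarrow\ \lambda_1=\mu_0+\frac{z_\alpha}{\sqrt n}.$$
Hence, from (ii), the BCR is
$$\bar x\ge\mu_0+\frac{z_\alpha}{\sqrt n}.$$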
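Again, if µ1 < µ0, then multiplying (i) by −1 we have
$$\bar x(\mu_0-\mu_1)\le\frac{\mu_0^2-\mu_1^2}{2}+\frac1n\ln K$$
$$\Rightarrow\bar x\le\frac{\mu_0+\mu_1}{2}+\frac{1}{n(\mu_0-\mu_1)}\ln K$$
$$\Rightarrow\bar x\le\lambda_2\ (\text{say}).\qquad(iii)$$
Again, we know that
$$P(\bar x\le\lambda_2\mid H_0)=\alpha\ \Rightarrow\ \int_{-\infty}^{\lambda_2}f(\bar x)\,d\bar x=\alpha\quad\text{under }H_0:\mu=\mu_0$$
$$\Rightarrow\sqrt{\frac{n}{2\pi}}\int_{-\infty}^{\lambda_2}e^{-\frac n2(\bar x-\mu_0)^2}\,d\bar x=\alpha$$
$$\Rightarrow\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\sqrt n(\lambda_2-\mu_0)}e^{-z^2/2}\,dz=\alpha\qquad\left[z=\sqrt n(\bar x-\mu_0),\ d\bar x=\frac{dz}{\sqrt n}\right]$$
$$\Rightarrow\frac{1}{\sqrt{2\pi}}\int_{\sqrt n(\lambda_2-\mu_0)}^{\infty}e^{-z^2/2}\,dz=1-\alpha\ \Rightarrow\ \frac{1}{\sqrt{2\pi}}\int_{z_{1-\alpha}}^{\infty}e^{-z^2/2}\,dz=1-\alpha$$
$$\therefore\ z_{1-\alpha}=\frac{\lambda_2-\mu_0}{1/\sqrt n}\ \Rightarrow\ \lambda_2=\mu_0+\frac{z_{1-\alpha}}{\sqrt n}.$$
By symmetry of the normal distribution, we have z_{1−α} = −z_α,
$$\therefore\ \lambda_2=\mu_0-\frac{z_\alpha}{\sqrt n},$$
and the BCR is
$$\bar x\le\mu_0-\frac{z_\alpha}{\sqrt n}.$$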
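The two BCRs, and the earlier result 1 − β ≥ α, can be checked numerically. This is a sketch assuming NumPy and SciPy are available; the helpers `bcr` and `rejection_prob`, the sample size n = 16 and the alternative values µ1 are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def bcr(mu0, mu1, n, alpha=0.05):
    """BCR for H0: mu = mu0 vs H1: mu = mu1 (mu1 != mu0), sample from N(mu, 1).

    Returns (side, cut): reject H0 when xbar >= cut (side '>=') or xbar <= cut (side '<=')."""
    z_alpha = norm.isf(alpha)                     # upper alpha-point of N(0, 1)
    if mu1 > mu0:
        return ">=", mu0 + z_alpha / np.sqrt(n)
    return "<=", mu0 - z_alpha / np.sqrt(n)

def rejection_prob(side, cut, mu, n):
    """P(xbar in the BCR | true mean mu), using xbar ~ N(mu, 1/n)."""
    sd = 1.0 / np.sqrt(n)
    return norm.sf(cut, loc=mu, scale=sd) if side == ">=" else norm.cdf(cut, loc=mu, scale=sd)

mu0, n = 0.0, 16
for mu1 in (-0.5, -0.1, 0.1, 0.5):
    side, cut = bcr(mu0, mu1, n)
    size = rejection_prob(side, cut, mu0, n)      # equals alpha
    power = rejection_prob(side, cut, mu1, n)     # 1 - beta, never below alpha
    print(f"mu1={mu1:+.1f}: reject if xbar {side} {cut:.4f}, size={size:.4f}, power={power:.4f}")
```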
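For testing H0: µ = µ0 against the two-sided alternative H1: µ = µ1 ≠ µ0, consider the critical region
$$w:\ \bar x\le\bar x_{\alpha_2}\ \text{ or }\ \bar x\ge\bar x_{\alpha_1},\qquad\bar x_{\alpha_1}=\mu_0+\frac{z_{\alpha_1}}{\sqrt n},\quad\bar x_{\alpha_2}=\mu_0-\frac{z_{\alpha_2}}{\sqrt n},$$
where z is N(0, 1) and z_{α1}, z_{α2} are its upper α1- and α2-points:
$$P\left[z\ge z_{\alpha_1}\right]=\alpha_1,\qquad P\left[z\ge z_{\alpha_2}\right]=\alpha_2,\qquad\alpha_1+\alpha_2=\alpha.$$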
For a given µ1, the power function of w is
$$P(\bar x\le\bar x_{\alpha_2}\mid\mu_1)+P(\bar x\ge\bar x_{\alpha_1}\mid\mu_1)=F\left[\sqrt n(\bar x_{\alpha_2}-\mu_1)\right]+1-F\left[\sqrt n(\bar x_{\alpha_1}-\mu_1)\right],$$
where F is the standard normal c.d.f. Using x̄_{α2} = µ0 − z_{α2}/√n and x̄_{α1} = µ0 + z_{α1}/√n, and writing ∆ = µ1 − µ0, the power is
$$P(\Delta)=F\left(-\sqrt n\,\Delta-z_{\alpha_2}\right)+F\left(\sqrt n\,\Delta-z_{\alpha_1}\right)=2-F\left(\sqrt n\,\Delta+z_{\alpha_2}\right)-F\left(z_{\alpha_1}-\sqrt n\,\Delta\right).$$
For the power curve to have its minimum at µ1 = µ0 (∆ = 0), we need
$$\left.\frac{dP}{d\Delta}\right|_{\Delta=0}=0\ \Rightarrow\ -\frac{\sqrt n}{\sqrt{2\pi}}e^{-\frac12 z_{\alpha_2}^2}+\frac{\sqrt n}{\sqrt{2\pi}}e^{-\frac12 z_{\alpha_1}^2}=0$$
$$\Rightarrow e^{-\frac12 z_{\alpha_2}^2}=e^{-\frac12 z_{\alpha_1}^2}\ \Rightarrow\ z_{\alpha_2}^2=z_{\alpha_1}^2\ \Rightarrow\ z_{\alpha_2}=z_{\alpha_1}\quad(\text{both are positive upper points, as }\alpha_1,\alpha_2<\tfrac12)$$
$$\therefore\ \alpha_1=\alpha_2.$$
Thus we see that the power curve has its minimum at µ1 = µ0 if and only if α1 = α2. Otherwise the minimum occurs at some µ1 ≠ µ0, implying that the probability of rejecting H0 is actually smaller when H0 is false than when it is true, i.e., the test is biased.
Evidently the two curves (b) and (c), representing one-sided UMP tests, are biased. Power curve (a) represents a most powerful test among all unbiased tests, but not the most powerful among all tests.
test. This test is also called uniformly most powerful unbiased test of type A . The critical region associated with this
test is called unbiased critical region of type A .
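The bias of an unequal split can be seen directly from the power function P(∆) derived above. The sketch assumes NumPy and SciPy are available; `two_sided_power`, n = 25 and the split α1 = 0.04, α2 = 0.01 are illustrative choices, not from the notes.

```python
import numpy as np
from scipy.stats import norm

def two_sided_power(delta, alpha1, alpha2, n):
    """Power of w: xbar <= mu0 - z_{alpha2}/sqrt(n) or xbar >= mu0 + z_{alpha1}/sqrt(n),
    for a sample from N(mu, 1) and delta = mu1 - mu0; z_p is the upper p-point of N(0, 1)."""
    z1, z2 = norm.isf(alpha1), norm.isf(alpha2)
    rn = np.sqrt(n)
    return norm.cdf(-rn * delta - z2) + norm.cdf(rn * delta - z1)

n = 25
deltas = np.linspace(-0.3, 0.3, 61)
for a1, a2 in [(0.025, 0.025), (0.04, 0.01)]:
    powers = two_sided_power(deltas, a1, a2, n)
    print(f"alpha1={a1}, alpha2={a2}: min power {powers.min():.4f} "
          f"at delta={deltas[np.argmin(powers)]:.2f}")
```

With the equal split the smallest power, α, is attained at ∆ = 0; with the unequal split the power dips below α for small negative ∆.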
The region w is said to be a type A critical region of size α for testing H0: θ = θ0 against H1: θ = θ1 ≠ θ0 if
(i) P(x ∈ w | H0) = α,
(ii) P(x ∈ w | H1) ≥ α,
(iii) P(x ∈ w | H1) ≥ P(x ∈ w* | H1),
$$\text{(iv)}\quad\left.\frac{\partial}{\partial\theta_1}P(x\in w\mid H_1)\right|_{\theta_1=\theta_0}=0,$$
$$\text{(v)}\quad\left.\frac{\partial^2}{\partial\theta_1^2}P(x\in w\mid H_1)\right|_{\theta_1=\theta_0}\ge\left.\frac{\partial^2}{\partial\theta_1^2}P(x\in w^*\mid H_1)\right|_{\theta_1=\theta_0},$$
where w* is any other critical region of size α satisfying (iv).
We must choose a critical region for which the power is largest in the neighbourhood of H0: θ = θ0. This condition is imposed by (v); conditions (i), (ii) and (iii) control the first type of error and unbiasedness, and condition (iv) makes the region locally unbiased. This test is recommended only when H0 and H1 are close to each other. Also, condition (v) states that, in the neighbourhood of θ0, the power curve of w rises away from its minimum more steeply than that of w*.
Theorem
If w is an MP region for testing H0: θ = θ0 against H1: θ = θ1, then it is necessarily unbiased. Similarly, if w is a UMP region, it is necessarily unbiased.
Proof
If w is an MP region of size α for testing H0 against H1, then for some non-negative constant k,
$$w=\{x:L_1(x)>kL_0(x)\},\qquad\int_w L_0(x)\,dx=\alpha,\qquad(i)$$
where L0(x) and L1(x) are the likelihood functions under H0 and H1 respectively. Since L1(x) > kL0(x) on w and L1(x) ≤ kL0(x) on S − w,
$$\int_w L_1(x)\,dx\ge k\alpha\quad\text{and}\quad\int_{S-w}L_1(x)\,dx\le k(1-\alpha).\qquad(ii)$$
If k ≥ 1, the first inequality in (ii) gives ∫_w L1(x)dx ≥ α.
If k < 1, then from (ii) we have
$$1-\int_w L_1(x)\,dx\le k(1-\alpha)<1-\alpha,$$
which implies ∫_w L1(x)dx > α.
Hence w is unbiased.
In case w is a UMP region of size α, the above argument holds good if for θ1 we read any θ ∈ Ω. So we have
P_θ(w) ≥ α for all θ ∈ Ω.
So here also w is unbiased.
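Example: Consider the case of a random sample from N(θ, σ²), where θ is unknown (−∞ < θ < ∞) and σ² is known. Find the type A region of size α for testing H0: θ = θ0.
Solution
$$L(x)=\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n\exp\left[-\frac{\sum(x_i-\theta)^2}{2\sigma^2}\right],\qquad\ln L(x)=-\frac n2\ln(2\pi\sigma^2)-\frac{\sum(x_i-\theta)^2}{2\sigma^2}.$$
Hence
$$\phi=\left.\frac{\partial\ln L(x)}{\partial\theta}\right|_{\theta=\theta_0}=\frac{1}{\sigma^2}\sum(x_i-\theta_0)=\frac{n(\bar x-\theta_0)}{\sigma^2},$$
$$\therefore\ \phi'=\frac{\partial^2\ln L(x)}{\partial\theta_0^2}=-\frac{n}{\sigma^2}=a+b\phi\ (\text{say}),\qquad a=-\frac{n}{\sigma^2},\ b=0.$$
$$w=\{x:\phi<c_1\ \cup\ \phi>c_2\}=\{x:\bar x<d_1\ \cup\ \bar x>d_2\}\ (\text{say}),$$
where c1 and c2 (or d1 and d2) are constants such that
$$\int_w L_{\theta_0}(x)\,dx=\alpha\quad\text{and}\quad\int_w\phi\,L_{\theta_0}(x)\,dx=0.$$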
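In terms of the density g_{θ0} of x̄, these conditions become
$$\int_{d_1}^{d_2}g_{\theta_0}(\bar x)\,d\bar x=1-\alpha\qquad(*)$$
$$\text{and}\quad\int_{d_1}^{d_2}\phi\,g_{\theta_0}(\bar x)\,d\bar x=0\qquad\left[\text{since }\int_{-\infty}^{\infty}\phi\,g_{\theta_0}(\bar x)\,d\bar x=0\right].\qquad(**)$$
Putting y = √n(x̄ − θ0)/σ, so that φ = √n y/σ,
$$\frac{1}{\sqrt{2\pi}}\int_{\sqrt n(d_1-\theta_0)/\sigma}^{\sqrt n(d_2-\theta_0)/\sigma}e^{-y^2/2}\,dy=1-\alpha\quad\text{and}\quad\int_{\sqrt n(d_1-\theta_0)/\sigma}^{\sqrt n(d_2-\theta_0)/\sigma}y\,e^{-y^2/2}\,dy=0$$
$$\Rightarrow\left[-e^{-y^2/2}\right]_{\sqrt n(d_1-\theta_0)/\sigma}^{\sqrt n(d_2-\theta_0)/\sigma}=0\qquad(***)$$
$$\Rightarrow e^{-\frac12\left\{\frac{\sqrt n(d_2-\theta_0)}{\sigma}\right\}^2}=e^{-\frac12\left\{\frac{\sqrt n(d_1-\theta_0)}{\sigma}\right\}^2}.$$
Solving (***), and since d1 < d2, we have
$$\frac{\sqrt n(d_1-\theta_0)}{\sigma}=-\frac{\sqrt n(d_2-\theta_0)}{\sigma},$$
and since
$$\frac{1}{\sqrt{2\pi}}\int_{-\sqrt n(d_2-\theta_0)/\sigma}^{\sqrt n(d_2-\theta_0)/\sigma}e^{-y^2/2}\,dy=1-\alpha,$$
we have, with τ_{α/2} the upper α/2-point of N(0, 1),
$$\frac{\sqrt n(d_2-\theta_0)}{\sigma}=\tau_{\alpha/2}\ \Rightarrow\ d_2=\theta_0+\tau_{\alpha/2}\frac{\sigma}{\sqrt n},\qquad-\frac{\sqrt n(d_1-\theta_0)}{\sigma}=\tau_{\alpha/2}\ \Rightarrow\ d_1=\theta_0-\tau_{\alpha/2}\frac{\sigma}{\sqrt n}.$$
As such, the type A region of size α is
$$w=\left\{x:\bar x<\theta_0-\tau_{\alpha/2}\frac{\sigma}{\sqrt n}\ \cup\ \bar x>\theta_0+\tau_{\alpha/2}\frac{\sigma}{\sqrt n}\right\}=\left\{x:\frac{\sqrt n\,|\bar x-\theta_0|}{\sigma}>\tau_{\alpha/2}\right\}.$$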
Similar Region (Testing Composite Hypothesis)
Let X be a random variable distributed as f(x; θ1, …, θk). A hypothesis of the form
Since the parameters θ_{r+1}, …, θk are unspecified by H0, α given in (2) is in general a function of these parameters and hence cannot be uniquely determined. If α does not depend on the unspecified parameters, the region ω for which equation (2) is true is called a region similar to the sample space with respect to the parameters
Let ω be any critical region of size α. Now we define an indicator function I_ω of the critical region ω, with I_ω = 1 if x ∈ ω and I_ω = 0 otherwise. Then
$$\int_\omega L(x\mid H)\,dx=\int_S I_\omega\,L(x\mid H)\,dx=E(I_\omega\mid H)$$
= expected value of I_ω when H is true, and
$$E(I_\omega\mid H)=\begin{cases}\alpha, & \text{if }H=H_0,\\ 1-\beta, & \text{if }H=H_1.\end{cases}$$
If the parameter θ admits a sufficient statistic t, the likelihood function factorizes into
$$L(x\mid\theta)=g(t,\theta)\,h(x,t),$$
where g(t, θ) is the frequency function of the sufficient statistic t, and h(x, t) is a function of the sample values
$$E(I_\omega\mid t)=E(I_\omega)$$
If t is sufficient for θ, then whether H0 or H1 is true, equation (4) implies that there is a region based on t, similar to the sample space, with size and power exactly equal to those of the original critical region ω.
Neyman Structure
A test with critical region ω is said to have Neyman structure with respect to t if E(I_ω | t) is the same for almost all values of t, i.e., a test satisfying E(I_ω | t) = α almost everywhere is said to have Neyman structure with respect to t.
Example: Let x1, …, xn be a random sample drawn from N(µ, σ²), where both µ and σ² are unknown. Test H0: µ = µ0 against H1: µ = µ1.
Solution
The hypothesis H0 has one degree of freedom, the parameter σ² being unspecified. We have
$$L(x\mid H_0)=\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n\exp\left[-\frac{1}{2\sigma^2}\sum(x_i-\mu_0)^2\right].$$
Under H0 the statistic V = Σ_{i=1}^n (x_i − µ0)² is sufficient for σ², and it is also a complete sufficient statistic. Consider simple H0 and H1 as
H0: µ = µ0, σ² = σ0²
H1: µ = µ1, σ² = σ1².
According to the Neyman-Pearson lemma, we have
$$\frac{L(x\mid H_0)}{L(x\mid H_1)}=\left(\frac{\sigma_1}{\sigma_0}\right)^n\exp\left[-\frac{1}{2\sigma_0^2}\sum(x_i-\mu_0)^2+\frac{1}{2\sigma_1^2}\sum(x_i-\mu_1)^2\right]\le\text{Constant}.$$
With this we can find the MP critical region of size α for testing simple H0 against simple H1 as
$$\omega_0:\ L(x\mid\mu_1,\sigma_1^2)>k(v)\,L(x\mid\mu_0,\sigma_0^2),\qquad(1)$$
where k(v) is such that the conditional size of ω0 given V = v is α, which implies that
Case I:
If µ1 > µ0, then, for fixed V = v, condition (1) is equivalent to
$$(\bar x-\mu_0)>k_2(v)\ \Rightarrow\ \frac{\sqrt n(\bar x-\mu_0)}{\sqrt v}>k_3(v)\ (\text{say}),$$
since Σ(x_i − µ1)² = v − 2n(µ1 − µ0)(x̄ − µ0) + n(µ1 − µ0)².
So, as such, we can write
$$\omega_0=\left\{x:\frac{\sqrt n(\bar x-\mu_0)}{\sqrt v}>k_3(v)\right\}.$$
Here √n(X̄ − µ0)/√V and V are independent under H0 (V is complete sufficient for σ², and the distribution of √n(X̄ − µ0)/√V does not depend on σ²). So the conditional distribution of √n(X̄ − µ0)/√V given V = v is the same as its unconditional distribution, and k3(v) will be independent of v. Hence we can write
$$P\left[\frac{\sqrt n(\bar X-\mu_0)}{\sqrt V}>k_3\right]=\alpha.$$
We know that
$$\frac{\sqrt n(\bar X-\mu_0)}{\sqrt V}=\frac{\sqrt n(\bar X-\mu_0)}{\sqrt{n(\bar X-\mu_0)^2+\sum(x_i-\bar X)^2}}=\frac{t}{\sqrt{t^2+n-1}},\qquad t=\frac{\sqrt n(\bar X-\mu_0)}{S}\sim t_{n-1},$$
where S² = Σ(x_i − X̄)²/(n − 1). Since t/√(t² + n − 1) is an increasing function of t, √n(x̄ − µ0)/√v > k3 iff t > k4 (say), and we may also write
$$\omega_0=\left\{x:\frac{\sqrt n(\bar x-\mu_0)}{\sqrt v}>k_3\right\}=\{x:t>k_4\},\qquad\text{where }P_{\theta_0}[t>k_4]=\alpha,$$
i.e.,
$$\omega_0=\left\{x:\frac{\sqrt n(\bar x-\mu_0)}{S}>t_{\alpha,n-1}\right\}.$$
Since this is independent of σ0² and σ1², it is the MP similar region of size α for testing H0 against H1.
Case II:
If µ1 < µ0, in this case we have as before
$$(\mu_1-\mu_0)(\bar x-\mu_0)>k_1(v)\ \Rightarrow\ (\bar x-\mu_0)<k_2'(v).$$
So, proceeding as before, the MP similar region of size α for testing H0 against H1 is
$$\omega_0'=\left\{x:\frac{\sqrt n(\bar x-\mu_0)}{S}<-t_{\alpha,n-1}\right\}.$$
Since ω0 is independent of µ1, i.e., it is the same for all µ1 > µ0, it is in fact the UMP similar region of size α for testing H0: µ = µ0 against H1: µ > µ0.
Similarly, ω0′ is the UMP similar region of size α for testing H0: µ = µ0 against H1: µ < µ0.
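The "similar" property of ω0 (its size does not depend on the unspecified σ²) can be illustrated by simulation. This is a sketch assuming NumPy and SciPy are available; the helper `t_statistic`, the seed, n = 10, µ0 = 5 and the two σ values are illustrative choices.

```python
import numpy as np
from scipy.stats import t as student_t

def t_statistic(x, mu0):
    """t = sqrt(n) (xbar - mu0) / S along the last axis, with S using divisor n - 1."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    return np.sqrt(n) * (x.mean(axis=-1) - mu0) / x.std(axis=-1, ddof=1)

n, mu0, alpha, reps = 10, 5.0, 0.05, 20000
crit = student_t.isf(alpha, df=n - 1)              # t_{alpha, n-1}, upper alpha-point
rng = np.random.default_rng(1)
rates = {}
for sigma in (0.5, 5.0):                           # the nuisance parameter changes ...
    samples = rng.normal(mu0, sigma, size=(reps, n))
    rates[sigma] = np.mean(t_statistic(samples, mu0) > crit)   # ... the size of omega_0 does not
    print(f"sigma = {sigma}: rejection rate of omega_0 under H0 = {rates[sigma]:.4f}")
```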
Likelihood Ratio Test
Introduction
Neyman and Pearson (1928) developed a simpler method of testing hypotheses, called the method of likelihood ratio. Just as the method of maximum likelihood yields an estimate of a parameter, the likelihood ratio method yields a test statistic rather easily.
Definition
Let θ ∈ Ω be a vector of parameters and let X = (x1, …, xn) be a random vector with p.d.f. f_θ, θ ∈ Ω. Consider the problem of testing the null hypothesis H0: X ~ f_θ, θ ∈ Ω0 against the alternative hypothesis H1: X ~ f_θ, θ ∈ Ω1 = Ω − Ω0. The likelihood ratio test statistic for testing H0 against H1 is defined as the ratio
$$\lambda=\lambda(X)=\lambda(x_1,\dots,x_n)=\frac{\sup_{\theta\in\Omega_0}f_\theta(x_1,\dots,x_n)}{\sup_{\theta\in\Omega}f_\theta(x_1,\dots,x_n)}=\frac{L(\hat\Omega_0)}{L(\hat\Omega)},$$
and the test is of the form: reject H0 iff λ(X) < C, where C is some constant determined from the size α (the level of significance, 0 < α < 1), i.e.,
$$\sup_{\theta\in\Omega_0}P_\theta\{x:\lambda(x)<C\}=\alpha.$$
Remarks
The numerator of the likelihood ratio λ is the best explanation of X that H0 can provide, and the denominator is the best possible explanation of X. H0 is rejected if there is a much better explanation of X than the best one provided by H0.
Properties of LRT
LRT has some desirable properties, especially large-sample properties. LRT is generally UMP if a UMP test exists. We state below some properties of LRT.
i) The likelihood ratio λ is a function of x only, and hence λ is a statistic which does not depend on θ.
ii) Since λ is the ratio of the conditional maximum of the likelihood function to its unconditional maximum, 0 ≤ λ ≤ 1.
iii) The critical region is 0 < λ < λ0, where ∫_0^{λ0} h(λ)dλ = α, the level of significance, h(λ) being the p.d.f. of λ under H0.
iv) λ is always a function of the sufficient statistic.
v) If the null hypothesis H0 is composite, the distribution of λ may not always be unique.
Let us consider two independent random variables X1 and X2 following the normal distributions N(µ1, σ1²) and N(µ2, σ2²) respectively. We want to test the hypothesis
Let x1i (i = 1, …, m) and x2j (j = 1, …, n) be two independent random samples of sizes m and n from the populations N(µ1, σ1²) and N(µ2, σ2²) respectively. Then the likelihood function is given by
$$L=\left(\frac{1}{2\pi\sigma_1^2}\right)^{m/2}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^m(x_{1i}-\mu_1)^2\right]\left(\frac{1}{2\pi\sigma_2^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma_2^2}\sum_{j=1}^n(x_{2j}-\mu_2)^2\right]\qquad(1)$$
Now
$$L(\hat\Omega)=\left(\frac{1}{2\pi s_1^2}\right)^{m/2}\left(\frac{1}{2\pi s_2^2}\right)^{n/2}e^{-\frac{m+n}{2}},$$
where ms1² = Σ_i (x1i − x̄1)² and ns2² = Σ_j (x2j − x̄2)².
Under H0 (µ1 = µ2 = µ), the likelihood function is given by
$$L(\Omega_0)=\left(\frac{1}{2\pi\sigma_1^2}\right)^{m/2}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^m(x_{1i}-\mu)^2\right]\left(\frac{1}{2\pi\sigma_2^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma_2^2}\sum_{j=1}^n(x_{2j}-\mu)^2\right].$$
To obtain the maximum value of L(Ω0) for variation in µ, σ1², σ2², it will be seen that the estimate of µ is obtained as the root of
$$\frac{m^2(\bar x_1-\hat\mu)}{\sum_{i=1}^m(x_{1i}-\hat\mu)^2}+\frac{n^2(\bar x_2-\hat\mu)}{\sum_{j=1}^n(x_{2j}-\hat\mu)^2}=0,$$
and thus µ̂ is a complicated function of the sample observations. It is impossible to obtain the critical region 0 < λ < λ0 for given α exactly, since the distribution of λ depends on the ratio of the population variances, which is ordinarily unknown. As an approximate test, −2 ln λ
Now
$$L(\hat\Omega)=\left\{\frac{m+n}{2\pi\left[ms_1^2+ns_2^2\right]}\right\}^{\frac{m+n}{2}}e^{-\frac{m+n}{2}}\qquad\left[\text{substituting the values of }\hat\mu_1,\hat\mu_2,\hat\sigma^2\text{ in (1)}\right]$$
Under H0 the likelihood function is
$$L(\Omega_0)=\left(\frac{1}{2\pi\sigma^2}\right)^{\frac{m+n}{2}}\exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^m(x_{1i}-\mu)^2+\sum_{j=1}^n(x_{2j}-\mu)^2\right\}\right]$$
$$\Rightarrow\ln L(\Omega_0)=C-\frac{m+n}{2}\ln\sigma^2-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^m(x_{1i}-\mu)^2+\sum_{j=1}^n(x_{2j}-\mu)^2\right\},$$
where C is a constant independent of µ and σ². The likelihood equation for estimating µ is
$$\frac{\partial\ln L}{\partial\mu}=0\ \Rightarrow\ \frac{1}{\sigma^2}\left\{\sum_{i=1}^m(x_{1i}-\mu)+\sum_{j=1}^n(x_{2j}-\mu)\right\}=0\ \Rightarrow\ (m\bar x_1+n\bar x_2)-(m+n)\mu=0\ \Rightarrow\ \hat\mu=\frac{m\bar x_1+n\bar x_2}{m+n}.$$
Also,
$$\frac{\partial\ln L}{\partial\sigma^2}=0\ \Rightarrow\ -\frac{m+n}{2\sigma^2}+\frac{1}{2\sigma^4}\left\{\sum_{i=1}^m(x_{1i}-\mu)^2+\sum_{j=1}^n(x_{2j}-\mu)^2\right\}=0$$
$$\Rightarrow\hat\sigma^2=\frac{1}{m+n}\left\{\sum_{i=1}^m(x_{1i}-\hat\mu)^2+\sum_{j=1}^n(x_{2j}-\hat\mu)^2\right\}.$$
But
$$\sum_{i=1}^m(x_{1i}-\hat\mu)^2=\sum_{i=1}^m(x_{1i}-\bar x_1+\bar x_1-\hat\mu)^2=\sum_{i=1}^m(x_{1i}-\bar x_1)^2+m(\bar x_1-\hat\mu)^2=ms_1^2+m\left(\bar x_1-\frac{m\bar x_1+n\bar x_2}{m+n}\right)^2=ms_1^2+\frac{mn^2(\bar x_1-\bar x_2)^2}{(m+n)^2}.$$
Similarly
$$\sum_{j=1}^n(x_{2j}-\hat\mu)^2=ns_2^2+\frac{nm^2(\bar x_2-\bar x_1)^2}{(m+n)^2}.$$
$$\therefore\ \hat\sigma^2=\frac{1}{m+n}\left\{ms_1^2+\frac{mn^2(\bar x_1-\bar x_2)^2}{(m+n)^2}+ns_2^2+\frac{nm^2(\bar x_2-\bar x_1)^2}{(m+n)^2}\right\}=\frac{1}{m+n}\left\{ms_1^2+ns_2^2+\frac{mn(\bar x_1-\bar x_2)^2}{m+n}\right\}.$$
Likelihood Ratio Test ~ 3 of 9
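$$\therefore\ L(\hat\Omega_0)=\left\{\frac{m+n}{2\pi\left[ms_1^2+ns_2^2+\frac{mn}{m+n}(\bar x_1-\bar x_2)^2\right]}\right\}^{\frac{m+n}{2}}e^{-\frac{m+n}{2}}$$
$$\therefore\ \lambda=\frac{L(\hat\Omega_0)}{L(\hat\Omega)}=\left\{\frac{ms_1^2+ns_2^2}{ms_1^2+ns_2^2+\frac{mn}{m+n}(\bar x_1-\bar x_2)^2}\right\}^{\frac{m+n}{2}}=\left\{1+\frac{mn(\bar x_1-\bar x_2)^2}{(m+n)(ms_1^2+ns_2^2)}\right\}^{-\frac{m+n}{2}}.$$
We know that, under H0: µ1 = µ2, the test statistic
$$t=\frac{\bar x_1-\bar x_2}{S\sqrt{\frac1m+\frac1n}},\qquad\text{where }S^2=\frac{1}{m+n-2}\left(ms_1^2+ns_2^2\right),$$
follows Student's t-distribution with m + n − 2 d.f. Since mn(x̄1 − x̄2)²/{(m + n)(ms1² + ns2²)} = t²/(m + n − 2), we have
$$\lambda=\left(1+\frac{t^2}{m+n-2}\right)^{-\frac{m+n}{2}},$$
a decreasing function of |t|, so the critical region 0 < λ < λ0 is equivalent to |t| > constant. Hence, for testing
H0: µ1 = µ2 = µ; σ1² = σ2² = σ² > 0
against H1: µ1 ≠ µ2; σ1² = σ2² = σ² > 0,
if
$$|t|=\frac{|\bar x_1-\bar x_2|}{S\sqrt{\frac1m+\frac1n}}>t_{m+n-2}(\alpha/2),$$
reject H0; otherwise H0 may be accepted.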
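A minimal sketch checking the relation between λ and the pooled t statistic on simulated data, assuming NumPy and SciPy are available (`scipy.stats.ttest_ind` with `equal_var=True` computes the pooled-variance t). The helper `lrt_two_means`, the seed and the sample sizes are illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind, t as student_t

def lrt_two_means(x1, x2):
    """lambda for H0: mu1 = mu2 (common unknown variance), from the closed form above."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    m, n = x1.size, x2.size
    ss = np.sum((x1 - x1.mean()) ** 2) + np.sum((x2 - x2.mean()) ** 2)   # m s1^2 + n s2^2
    extra = m * n * (x1.mean() - x2.mean()) ** 2 / (m + n)
    return (ss / (ss + extra)) ** ((m + n) / 2)

rng = np.random.default_rng(2)
x1 = rng.normal(10.0, 2.0, size=12)
x2 = rng.normal(11.0, 2.0, size=15)
m, n = x1.size, x2.size
lam = lrt_two_means(x1, x2)
t_stat, p_value = ttest_ind(x1, x2, equal_var=True)     # pooled-variance t-test
lam_from_t = (1 + t_stat ** 2 / (m + n - 2)) ** (-(m + n) / 2)
crit = student_t.isf(0.025, df=m + n - 2)               # t_{m+n-2}(alpha/2) with alpha = 0.05
print(lam, lam_from_t)                                  # equal up to rounding
print("reject H0" if abs(t_stat) > crit else "do not reject H0", p_value)
```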
Likelihood Ratio Test for Testing the Equality of Variances of Two Populations
If x1i (i = 1, …, m) and x2j (j = 1, …, n) are independent random samples of sizes m and n from N(µ1, σ1²) and N(µ2, σ2²) respectively, then
$$L=\left(\frac{1}{2\pi\sigma_1^2}\right)^{m/2}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^m(x_{1i}-\mu_1)^2\right]\left(\frac{1}{2\pi\sigma_2^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma_2^2}\sum_{j=1}^n(x_{2j}-\mu_2)^2\right]\qquad(1)$$
In this case,
$$L(\hat\Omega)=\left(\frac{1}{2\pi s_1^2}\right)^{m/2}\left(\frac{1}{2\pi s_2^2}\right)^{n/2}e^{-\frac{m+n}{2}}.$$
Under H0 (σ1² = σ2² = σ²), the likelihood function is given by
$$L(\Omega_0)=\left(\frac{1}{2\pi\sigma^2}\right)^{\frac{m+n}{2}}\exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^m(x_{1i}-\mu_1)^2+\sum_{j=1}^n(x_{2j}-\mu_2)^2\right\}\right]\qquad(2)$$
The maximum likelihood estimates under H0 are
$$\hat\mu_1=\frac1m\sum_{i=1}^m x_{1i}=\bar x_1,\qquad\hat\mu_2=\frac1n\sum_{j=1}^n x_{2j}=\bar x_2,$$
$$\hat\sigma^2=\frac{1}{m+n}\left[\sum_{i=1}^m(x_{1i}-\hat\mu_1)^2+\sum_{j=1}^n(x_{2j}-\hat\mu_2)^2\right]=\frac{1}{m+n}\left[\sum_{i=1}^m(x_{1i}-\bar x_1)^2+\sum_{j=1}^n(x_{2j}-\bar x_2)^2\right]=\frac{ms_1^2+ns_2^2}{m+n}.$$
$$\therefore\ L(\hat\Omega_0)=\left\{\frac{m+n}{2\pi\left[ms_1^2+ns_2^2\right]}\right\}^{\frac{m+n}{2}}e^{-\frac{m+n}{2}}\qquad\left[\text{substituting the values of }\hat\mu_1,\hat\mu_2,\hat\sigma^2\text{ in (2)}\right]$$
$$\therefore\ \lambda=\frac{L(\hat\Omega_0)}{L(\hat\Omega)}=(m+n)^{\frac{m+n}{2}}\,\frac{(s_1^2)^{m/2}(s_2^2)^{n/2}}{(ms_1^2+ns_2^2)^{\frac{m+n}{2}}}=\frac{(m+n)^{\frac{m+n}{2}}}{m^{m/2}\,n^{n/2}}\cdot\frac{(ms_1^2)^{m/2}(ns_2^2)^{n/2}}{(ms_1^2+ns_2^2)^{\frac{m+n}{2}}}\qquad(3)$$
We know that, under H0, the statistic
$$F=\frac{\sum(x_{1i}-\bar x_1)^2/(m-1)}{\sum(x_{2j}-\bar x_2)^2/(n-1)}$$
follows the F-distribution with (m − 1, n − 1) d.f., and also
$$F=\frac{m(n-1)s_1^2}{n(m-1)s_2^2}\ \Rightarrow\ \frac{ms_1^2}{ns_2^2}=\frac{m-1}{n-1}F.$$
Substituting in (3) and simplifying, we get
$$\lambda=\frac{(m+n)^{\frac{m+n}{2}}}{m^{m/2}\,n^{n/2}}\cdot\frac{\left(\frac{m-1}{n-1}F\right)^{m/2}}{\left(1+\frac{m-1}{n-1}F\right)^{\frac{m+n}{2}}}.$$
Thus λ is a function of F alone (it first increases and then decreases as F increases), and hence the test can be carried out with F as the test statistic. The critical region 0 < λ < λ0 can be given by the pair of intervals F ≤ F1 and F ≥ F2, where F1 and F2 are determined so that under H0
$$P(F\ge F_2)=\frac\alpha2\quad\text{and}\quad P(F\ge F_1)=1-\frac\alpha2,$$
i.e.,
$$F_2=F_{m-1,n-1}(\alpha/2)\quad\text{and}\quad F_1=F_{m-1,n-1}(1-\alpha/2),$$
where F_{m,n}(α) is the upper α-point of the F-distribution with (m, n) d.f.
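A sketch computing λ both from (3) and from F, and the equal-tail F limits, assuming NumPy and SciPy are available; the helper names, the seed and the simulated samples are illustrative. Logarithms are used because (m + n)^{(m+n)/2} overflows quickly. The last test line confirms that λ reaches its maximum 1 at ms1²/(ns2²) = m/n, which is why the critical region in F has two tails.

```python
import numpy as np
from scipy.stats import f as f_dist

def lrt_variances(x1, x2):
    """lambda from (3) and the F statistic for H0: sigma1^2 = sigma2^2."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    m, n = x1.size, x2.size
    a = np.sum((x1 - x1.mean()) ** 2)     # m s1^2
    b = np.sum((x2 - x2.mean()) ** 2)     # n s2^2
    log_lam = (0.5 * (m + n) * np.log(m + n) - 0.5 * m * np.log(m) - 0.5 * n * np.log(n)
               + 0.5 * m * np.log(a) + 0.5 * n * np.log(b) - 0.5 * (m + n) * np.log(a + b))
    F = (a / (m - 1)) / (b / (n - 1))
    return np.exp(log_lam), F

def lam_from_F(F, m, n):
    """lambda written as a function of F alone."""
    u = (m - 1) / (n - 1) * F             # = m s1^2 / (n s2^2)
    log_c = 0.5 * (m + n) * np.log(m + n) - 0.5 * m * np.log(m) - 0.5 * n * np.log(n)
    return np.exp(log_c + 0.5 * m * np.log(u) - 0.5 * (m + n) * np.log1p(u))

rng = np.random.default_rng(3)
x1, x2 = rng.normal(0.0, 1.0, 9), rng.normal(0.0, 1.5, 13)
lam, F = lrt_variances(x1, x2)
m, n, alpha = x1.size, x2.size, 0.05
F2 = f_dist.isf(alpha / 2, m - 1, n - 1)          # F_{m-1,n-1}(alpha/2), upper point
F1 = f_dist.isf(1 - alpha / 2, m - 1, n - 1)      # F_{m-1,n-1}(1 - alpha/2)
print(lam, lam_from_F(F, m, n))                   # the two routes agree
print("reject H0" if (F <= F1 or F >= F2) else "do not reject H0")
```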
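Example: Let x1, …, xn be a random sample from f(x; θ) = θe^{−θx} I_{(0,∞)}(x), where Θ = {θ: θ > 0}. Test H0: θ ≤ θ0 against H1: θ > θ0.
Solution
The likelihood is L(θ) = θ^n e^{−θΣx_i}, and
$$\sup_{\theta\in\Theta}L(\theta)=\sup_{\theta>0}\theta^n e^{-\theta\sum x_i}=\left(\frac{n}{\sum x_i}\right)^n e^{-n}\qquad\left[\text{since }\hat\theta=\frac{n}{\sum x_i}=\frac1{\bar x}\right].$$
If 1/x̄ ≤ θ0, the supremum over θ ≤ θ0 is the same and λ = 1; otherwise it is attained at θ = θ0, so that
$$\lambda=(\theta_0\bar x)^n\exp\left[-\theta_0\sum x_i+n\right]\quad\text{if }\theta_0\bar x<1.$$
Reject H0 if λ ≤ λ0, i.e., reject H0 if
$$\frac{\sum x_i}{n}<\frac1{\theta_0}\quad\text{and}\quad\left(\frac{\theta_0\sum x_i}{n}\right)^n\exp\left[-\theta_0\sum x_i+n\right]\le\lambda_0.$$
Let y = θ0x̄; then y^n exp[−n(y − 1)] has a maximum at y = 1 and is increasing for y < 1. Hence, for y < 1, λ ≤ λ0 iff y ≤ k for some constant k < 1.
That is, reject H0 if x̄ is less than some constant multiple k of 1/θ0.
If the generalized likelihood ratio test having size α is desired, k is obtained as the solution to the equation
$$P_{\theta_0}\left[\theta_0\bar X<k\right]=\alpha.$$
Note that P_θ[θ0X̄ < k] ≤ P_{θ0}[θ0X̄ < k] for θ ≤ θ0.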
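The constant k can be obtained in closed form from the gamma distribution: under θ0, θ0ΣX_i ~ Gamma(n, 1), so θ0X̄ ~ Gamma(shape n, scale 1/n). The sketch below, assuming NumPy and SciPy are available, computes k and checks by simulation that the rejection rate is about α at θ = θ0, smaller for θ < θ0 and larger for θ > θ0; n = 20, θ0 = 2 and the seed are illustrative choices.

```python
import numpy as np
from scipy.stats import gamma

def glrt_k(n, alpha=0.05):
    """k with P_{theta0}(theta0 * Xbar < k) = alpha.

    Under theta0, theta0 * sum(X_i) ~ Gamma(n, 1), so theta0 * Xbar ~ Gamma(shape=n, scale=1/n)."""
    return gamma.ppf(alpha, a=n, scale=1.0 / n)

n, theta0, alpha = 20, 2.0, 0.05
k = glrt_k(n, alpha)
rng = np.random.default_rng(4)
rates = {}
for theta in (1.0, 2.0, 3.0):
    # numpy parametrizes the exponential by its mean, scale = 1/theta
    xs = rng.exponential(scale=1.0 / theta, size=(20000, n))
    rates[theta] = np.mean(theta0 * xs.mean(axis=1) < k)   # reject H0 when theta0 * xbar < k
    print(f"theta = {theta}: rejection rate = {rates[theta]:.4f}")
```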
Uses of LRT
λ can be used for the determination of the rejection region, as λ is a positive monotonic function. It is used
H0: σ² = σ0² against H1: σ² ≠ σ0² (σ0² specified)
5) Test for the equality of variances of two normal populations
H 0 : σ12 = σ 22
H1 : σ12 ≠ σ 22
holds, the probability of rejecting H 0 tends to 1 as sample size tends to infinity. If c* is the CR and X the sample
The LRT is a consistent test. Under very generally satisfied conditions, the MLE $\hat\theta$ of a parameter vector $\theta$ is consistent. If we are dealing with a situation in which all the MLEs are consistent, we see from the definition of the LRT statistic that, as the sample size increases,
$$\lambda \to \frac{L(x \mid \theta_{r0}, \theta_s)}{L(x \mid \theta_r, \theta_s)} \qquad \cdots (1)$$
where $\theta_r, \theta_s$ are the true values of the parameters and $\theta_{r0}$ is the hypothetical value of $\theta_r$ being tested. Thus, when $H_0$ holds,
$$\lambda \to 1 \quad \text{in probability},$$
and the critical region
$$\lambda \le c_\alpha$$
will therefore have its boundary $c_\alpha$ approaching 1. When $H_0$ does not hold, the limiting value of $\lambda$ in (1) will be some $k$ with
$$0 \le k < 1,$$
and thus we have
$$P[\lambda \le c_\alpha] \to 1.$$
Therefore, the LRT is consistent.
$$= \exp\left[-\frac{1}{2\sigma^2}\left\{n\bar x^2 - 2n\mu_0\bar x + n\mu_0^2\right\}\right] = \exp\left[-\frac{n}{2\sigma^2}\left\{\bar x - \mu_0\right\}^2\right]$$
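$$\Rightarrow \ln\lambda = -\frac{n}{2\sigma^2}\{\bar x - \mu_0\}^2$$
$$\Rightarrow -2\ln\lambda = \frac{n}{\sigma^2}\{\bar x-\mu_0\}^2 = \frac{\{\bar x-\mu_0\}^2}{\sigma^2/n}$$
Under $H_0$, $\bar x \sim N(\mu_0, \sigma^2/n)$, so $\dfrac{\{\bar x-\mu_0\}^2}{\sigma^2/n} \sim \chi^2_{(1)}$. Here the result is exact; in general problems $-2\ln\lambda$ is only approximately $\chi^2$ when $n$ is large.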
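The identity $-2\ln\lambda = \{\bar x-\mu_0\}^2/(\sigma^2/n)$ can be checked numerically by computing $\lambda$ directly from the two log-likelihoods (at $\mu_0$ and at $\hat\mu = \bar x$, with $\sigma$ known). The sample values in this Python sketch are made up for illustration.

```python
import math

def normal_loglik(xs, mu, sigma):
    """Log-likelihood of an i.i.d. N(mu, sigma^2) sample."""
    n = len(xs)
    return -n * math.log(math.sqrt(2 * math.pi) * sigma) \
        - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2)

def minus_two_log_lambda(xs, mu0, sigma):
    """-2 ln(lambda) with lambda = L(mu0) / L(xbar), sigma known."""
    xbar = sum(xs) / len(xs)
    return -2 * (normal_loglik(xs, mu0, sigma) - normal_loglik(xs, xbar, sigma))

xs = [4.8, 5.6, 5.1, 6.0, 4.9, 5.7]   # illustrative data
mu0, sigma = 5.0, 0.8
n = len(xs)
xbar = sum(xs) / n
z_sq = (xbar - mu0) ** 2 / (sigma ** 2 / n)
print(minus_two_log_lambda(xs, mu0, sigma), z_sq)   # the two numbers agree
```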
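$$L(\theta_0) = h(t \mid \theta_0)\,k(x) \quad\text{and}\quad L(\theta_1) = h(t \mid \theta_1)\,k(x)$$
$$\frac{L(\theta_0)}{L(\theta_1)} = \frac{h(t\mid\theta_0)\,k(x)}{h(t\mid\theta_1)\,k(x)} \le k \;\Rightarrow\; \frac{h(t\mid\theta_0)}{h(t\mid\theta_1)} \le k,$$
so the BCR depends on the sample only through the sufficient statistic $t$.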
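A likelihood $L(x\mid\theta)$ is said to have a monotone likelihood ratio (MLR) in the statistic $T = t(x)$ if, for any two values of the parameter $\theta_1 < \theta_2$, the ratio
$$\frac{L(x\mid\theta_2)}{L(x\mid\theta_1)}$$
depends on $x$ only through the function $t(x)$, and this ratio is a non-decreasing function of $t(x)$.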
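Example
Let $X \sim b(m, \theta)$ and let $x_1, \ldots, x_n$ be a random sample; then we have
$$L(x\mid\theta) = \left\{\prod_{i=1}^{n}\binom{m}{x_i}\right\}\theta^{\sum x_i}(1-\theta)^{mn-\sum x_i}.$$
If $\theta_2 > \theta_1$, then
$$\frac{L(x\mid\theta_2)}{L(x\mid\theta_1)} = \frac{\theta_2^{\sum x_i}(1-\theta_2)^{mn-\sum x_i}}{\theta_1^{\sum x_i}(1-\theta_1)^{mn-\sum x_i}} = \left(\frac{\theta_2}{\theta_1}\right)^{\sum x_i}\left\{\frac{1-\theta_2}{1-\theta_1}\right\}^{mn-\sum x_i}$$
$$= \left(\frac{\theta_2(1-\theta_1)}{\theta_1(1-\theta_2)}\right)^{\sum x_i}\left\{\frac{1-\theta_2}{1-\theta_1}\right\}^{mn}.$$
Since $\theta_2(1-\theta_1) > \theta_1(1-\theta_2)$, the base of the first factor exceeds 1, so the ratio is an increasing function of $\sum x_i$. Hence $L(x\mid\theta)$ has MLR in $\sum x_i$.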
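A quick numerical illustration of the binomial MLR property: the Python sketch below (helper name chosen here) evaluates the log of the likelihood ratio as a function of $t = \sum x_i$ for illustrative values $\theta_1 = 0.3 < \theta_2 = 0.6$ and checks that it increases with $t$. The binomial coefficients cancel in the ratio, so they are omitted.

```python
import math

def binom_log_ratio(t, m, n, theta1, theta2):
    """log[L(x|theta2) / L(x|theta1)] for n binomial(m, theta) observations with sum t."""
    return t * math.log(theta2 / theta1) + (m * n - t) * math.log((1 - theta2) / (1 - theta1))

m, n = 5, 4
theta1, theta2 = 0.3, 0.6
values = [binom_log_ratio(t, m, n, theta1, theta2) for t in range(m * n + 1)]
# Successive differences should all equal log[theta2(1-theta1) / (theta1(1-theta2))] > 0
diffs = [b - a for a, b in zip(values, values[1:])]
print(diffs[0], math.log(theta2 * (1 - theta1) / (theta1 * (1 - theta2))))
```

The constant step size in $t$ is exactly the logarithm of the base $\theta_2(1-\theta_1)/\{\theta_1(1-\theta_2)\}$ appearing in the last expression.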
Uses
A family having MLR provides a UMP test for testing a simple $H_0$ against a one-sided $H_1$.
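Example
Let $X \sim \exp(\theta)$, with density $f(x) = \dfrac{1}{\theta}e^{-x/\theta}$; $\theta > 0$, $x > 0$. Then
$$L(x\mid\theta) = \frac{1}{\theta^n}e^{-\sum x_i/\theta},$$
$$\frac{L(x\mid\theta_2)}{L(x\mid\theta_1)} = \frac{\theta_2^{-n}\exp\left(-\sum x_i/\theta_2\right)}{\theta_1^{-n}\exp\left(-\sum x_i/\theta_1\right)} = \left(\frac{\theta_1}{\theta_2}\right)^n \exp\left(\sum x_i\,\frac{\theta_2-\theta_1}{\theta_1\theta_2}\right).$$
For $\theta_2 > \theta_1$, $\dfrac{L(x\mid\theta_2)}{L(x\mid\theta_1)}$ is a non-decreasing function of $\sum x_i$. So $L(x\mid\theta)$ has MLR in $\sum x_i$.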
Monotone Likelihood Ratio (MLR) ~ 1 of 10
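Example
Let $X \sim N(\theta, 1)$; then we have
$$f(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\{x-\theta\}^2\right),$$
$$L(x\mid\theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left(-\frac{1}{2}\sum\{x_i-\theta\}^2\right).$$
$$\therefore \frac{L(x\mid\theta_2)}{L(x\mid\theta_1)} = \frac{\left(\frac{1}{\sqrt{2\pi}}\right)^n\exp\left(-\frac12\sum\{x_i-\theta_2\}^2\right)}{\left(\frac{1}{\sqrt{2\pi}}\right)^n\exp\left(-\frac12\sum\{x_i-\theta_1\}^2\right)}$$
$$= \exp\left(-\frac12\sum x_i^2 + \theta_2\sum x_i - \frac n2\theta_2^2 + \frac12\sum x_i^2 - \theta_1\sum x_i + \frac n2\theta_1^2\right)$$
$$= \exp\left(\sum x_i(\theta_2-\theta_1) - \frac n2\left(\theta_2^2-\theta_1^2\right)\right),$$
which, for $\theta_2 > \theta_1$, is a non-decreasing function of $\sum x_i$. So $L(x\mid\theta)$ has MLR in $\sum x_i$.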
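Example
Let $x_1, \ldots, x_n$ be a random sample from $U(0, \theta)$, $\theta > 0$; then we have
$$f(x) = \frac1\theta, \qquad L(x\mid\theta) = \frac{1}{\theta^n}; \quad 0 \le \max x_i \le \theta.$$
For $\theta_2 > \theta_1$, $\dfrac{L(x\mid\theta_2)}{L(x\mid\theta_1)} = \left(\dfrac{\theta_1}{\theta_2}\right)^n$ when $\max x_i \le \theta_1$; define it to be $\infty$ when $\theta_1 < \max x_i \le \theta_2$. It follows that $\dfrac{L(x\mid\theta_2)}{L(x\mid\theta_1)}$ is a non-decreasing function of $\max_{1\le i\le n} x_i$, and $L(x\mid\theta)$ has MLR in $\max_{1\le i\le n} x_i$.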
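Proof
For $\theta_2 > \theta_1$, $Q(\theta_2) > Q(\theta_1)$ ($Q$ being increasing), and thus, since the factors not involving $\theta$ cancel,
$$\frac{L(x\mid\theta_2)}{L(x\mid\theta_1)} = \exp\left\{T(x)\left[Q(\theta_2)-Q(\theta_1)\right] + \left[D(\theta_2)-D(\theta_1)\right]\right\},$$
which is a non-decreasing function of $T(x)$. Hence the exponential family has MLR in $T(x)$.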
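Example
Let $X \sim C(1, \theta)$ (Cauchy with location $\theta$ and scale 1); then we have
$$\frac{L(x\mid\theta_2)}{L(x\mid\theta_1)} = \frac{1+(x-\theta_1)^2}{1+(x-\theta_2)^2} \to 1 \quad \text{as } x \to \pm\infty.$$
Since the ratio is not constant in $x$ yet tends to the same limit at both ends, it cannot be monotone in $x$; hence the Cauchy location family does not have MLR in $x$.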
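To see concretely that the Cauchy ratio is not monotone, the Python sketch below evaluates it on a grid for the illustrative values $\theta_1 = 0$, $\theta_2 = 1$ (our choice, not from the notes). For these values the ratio has a minimum at $x = (1-\sqrt5)/2$ and a maximum at $x = (1+\sqrt5)/2 \approx 1.618$.

```python
def cauchy_ratio(x, theta1, theta2):
    """L(x|theta2) / L(x|theta1) for one observation from a Cauchy location family, scale 1."""
    return (1 + (x - theta1) ** 2) / (1 + (x - theta2) ** 2)

theta1, theta2 = 0.0, 1.0
grid = [-50, -2, -1, 0, 0.5, 1, 1.618, 2, 3, 50]
ratios = [cauchy_ratio(x, theta1, theta2) for x in grid]
for x, r in zip(grid, ratios):
    print(f"x = {x:7.3f}  ratio = {r:.4f}")
```

The ratio first falls, then rises, then falls back towards 1, so no ordering of $x$ makes it monotone.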
Theorem: If a joint p.d.f. $L(x\mid\theta)$ has MLR in the statistic $T = t(x)$, then there exists a UMP test for testing $H_0: \theta = \theta_0$ against $H_1: \theta > \theta_0$.
Proof
We know that, for testing a simple $H_0: \theta = \theta_0$ against a simple $H_1: \theta = \theta_1\ (> \theta_0)$, there exists a BCR $\omega_0$ such that
$$\frac{L(x\mid H_1)}{L(x\mid H_0)} \ge \text{a constant} \qquad \cdots (1)$$
Since the likelihood ratio is a non-decreasing function of $t(x)$ for $\theta_1 > \theta_0$, the BCR determined by (1) is also given by
$$t(x) \ge k_1, \qquad \text{with } P(\theta_0) = \alpha, \qquad \cdots (2)$$
where $P(\theta)$ denotes the probability that the sample falls in the CR when the parameter value is $\theta$. Now take any other alternative $\theta_2 > \theta_0$. The BCR for testing $\theta = \theta_0$ against $\theta = \theta_2$ is
$$\frac{L(x\mid\theta_2)}{L(x\mid\theta_0)} \ge \text{a constant}$$
$$\Rightarrow t(x) \ge k_2 \qquad \cdots (3)$$
If we take $k_1 = k_2$ in (3), the CR obtained is identical with $\omega_0$ defined in (2), and it is still most powerful for testing $\theta = \theta_0$ against $\theta = \theta_2\ (> \theta_0)$, with the same size $\alpha = P(\theta_0)$. As the test is most powerful, $P(\theta_2) > P(\theta_0)$. Therefore, for testing $\theta = \theta_0$, the critical region defined by (2) can be used with size $\alpha$. The power of the test for any alternative $\theta_1 > \theta_0$ is maximum, and this is so for all alternatives greater than $\theta_0$. Hence the critical region given by (2) is UMP for testing $\theta = \theta_0$ against $\theta > \theta_0$.
$$L(x, \mu) = \left(\frac{1}{\sqrt{2\pi}}\right)^n\exp\left(-\frac12\sum\{x_i-\mu\}^2\right)$$
For $\mu_1 > \mu_0$,
$$\frac{L(x\mid\mu_1)}{L(x\mid\mu_0)} = \exp\left(n\bar x(\mu_1-\mu_0) - \frac n2\left(\mu_1^2-\mu_0^2\right)\right).$$
This is an increasing function of $\bar x$. So there exists a UMP test for testing $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$.
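Because the ratio is increasing in $\bar x$, the UMP test rejects $H_0$ when $\bar x > \mu_0 + z_\alpha/\sqrt n$, where $z_\alpha$ is the upper $\alpha$ point of $N(0,1)$ (here $\sigma = 1$). The Python sketch below uses the standard library's `statistics.NormalDist` to compute this cut-off and the power function $P_\mu(\bar X > \text{cut-off}) = 1 - \Phi(z_\alpha - \sqrt n(\mu-\mu_0))$; the numbers are illustrative.

```python
import math
from statistics import NormalDist

def ump_cutoff(mu0, n, alpha):
    """Reject H0: mu = mu0 in favour of mu > mu0 when xbar exceeds this value (sigma = 1)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)    # upper alpha point of N(0, 1)
    return mu0 + z_alpha / math.sqrt(n)

def power(mu, mu0, n, alpha):
    """P_mu(Xbar > cut-off), with Xbar ~ N(mu, 1/n)."""
    cutoff = ump_cutoff(mu0, n, alpha)
    return 1 - NormalDist(mu, 1 / math.sqrt(n)).cdf(cutoff)

mu0, n, alpha = 0.0, 25, 0.05
for mu in (0.0, 0.2, 0.4, 0.6):
    print(f"mu = {mu:.1f}  power = {power(mu, mu0, n, alpha):.3f}")
```

The power equals $\alpha$ at $\mu_0$ and increases with $\mu$, and the same cut-off serves every alternative $\mu > \mu_0$, which is what makes the test uniformly most powerful.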
$$f(x;\theta) = C(\theta)\,h(x)\exp\left[q(\theta)\,l(x)\right] \qquad \text{(one-parameter exponential family form)}$$
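Example
Consider a random sample of size $n$ from a Poisson population with parameter $\mu$; then we have
$$L(x\mid\mu) = \frac{e^{-n\mu}\mu^{\sum x_i}}{\prod x_i!}; \quad x_i = 0, 1, \ldots$$
$$= e^{-n\mu}\left(\prod x_i!\right)^{-1}\exp\left[\ln\mu\left(\sum x_i\right)\right]$$
where
$$q(\mu) = \ln\mu, \quad t(x) = \sum x_i, \quad C(\mu) = e^{-n\mu}, \quad h(x) = \left(\prod x_i!\right)^{-1}.$$
Since $q(\mu) = \ln\mu$ is increasing in $\mu$, there exists a UMP test of size $\alpha$ for testing $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$, which rejects $H_0$ for large values of $T = \sum x_i$, with $k$ chosen so that
$$P\left[T = \sum x_i \ge k \mid H_0\right] = \alpha.$$
Because $T$ is discrete, this equation usually has no exact solution, and size exactly $\alpha$ then requires randomizing when $T = k$.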
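Randomized Test
A test $\gamma$ of a hypothesis $H$ is defined to be a randomized test if $\gamma$ is defined by the function
$$\psi_\gamma(x_1, \ldots, x_n) = P\left[H \text{ is rejected} \mid (x_1, \ldots, x_n) \text{ is observed}\right].$$
For example, if the test $\gamma$ of $H$ is "toss a coin and reject $H$ iff a head appears", then $\gamma$ is a randomized test.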
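Non-Randomized Test
Let a test $\gamma$ of a statistical hypothesis $H$ be defined as follows: reject $H$ if and only if $(x_1, \ldots, x_n) \in c_r$, where $c_r$ is a subset of the sample space $\chi$; then $\gamma$ is called a non-randomized test, and $c_r$ is its critical region. For example, if $H: \theta < 17$ and the test $\gamma$ is: reject $H$ if and only if $\bar x > 17 + \dfrac{5}{\sqrt n}$, then $\gamma$ is non-randomized and
$$c_r = \left\{(x_1, \ldots, x_n) : \bar x > 17 + \frac{5}{\sqrt n}\right\}.$$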
Theorem: Let $x_1, \ldots, x_n$ be a random sample of size $n$ from a p.d.f. $f(x;\theta)$ which depends continuously on a single parameter $\theta$ belonging to a parameter space $\Omega$, i.e. $\theta \in \Omega$. Let the likelihood function $L(x\mid\theta)$ have MLR in $T(x) = t(x_1, \ldots, x_n)$. Then, for testing $H_0: \theta = \theta_0$ against $H_1: \theta > \theta_0$, there exists a UMP test $\phi(x_1, \ldots, x_n)$ of size $\alpha$ given by
$$\phi(x_1,\ldots,x_n) = \begin{cases}1; & \text{if } T(x_1,\ldots,x_n) > k\\ \gamma; & \text{if } T(x_1,\ldots,x_n) = k\\ 0; & \text{if } T(x_1,\ldots,x_n) < k\end{cases}$$
where $k$ and $\gamma$ are chosen so that $E_{\theta_0}[\phi(X_1,\ldots,X_n)] = \alpha$.
Proof
Since $L(x\mid\theta)$ has MLR in $T(x_1, \ldots, x_n)$, for any $\theta_1 > \theta_0$ and a constant $c$,
$$\frac{L(x\mid\theta_1)}{L(x\mid\theta_0)} \begin{cases} > c\\ = c\\ < c\end{cases} \qquad \cdots (1)$$
is equivalent to
$$T(x_1,\ldots,x_n) \begin{cases} > k\\ = k\\ < k\end{cases}$$
for some constant $k$. Hence, by the Neyman-Pearson lemma, the test
$$\phi(x_1,\ldots,x_n) = \begin{cases}1; & \text{if } T(x_1,\ldots,x_n) > k\\ \gamma; & \text{if } T(x_1,\ldots,x_n) = k\\ 0; & \text{if } T(x_1,\ldots,x_n) < k\end{cases}$$
is most powerful of size $\alpha$ for testing $H_0: \theta = \theta_0$ against any simple alternative $\theta = \theta_1$ with $\theta_1 > \theta_0$; note that $\phi$ does not depend on the particular $\theta_1$.
Furthermore, for any pair $(\theta', \theta'')$ with $\theta' < \theta''$, the same test $\phi$ is most powerful for testing a simple $H_0: \theta = \theta'$ against a simple $H_1: \theta = \theta''$ at its own size $E_{\theta'}[\phi]$. Since one and the same test is most powerful against every alternative $\theta_1 > \theta_0$, it is UMP for testing $\theta = \theta_0$ against $\theta > \theta_0$.
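Example
Solution
We have
$$L(x\mid\sigma) = \left[\frac{1}{\sqrt{2\pi}\,\sigma}\right]^n e^{-\frac{1}{2\sigma^2}\sum x_i^2},$$
so that, for $\sigma_2^2 > \sigma_1^2$,
$$\frac{L(x\mid\sigma_2)}{L(x\mid\sigma_1)} = \left(\frac{\sigma_1}{\sigma_2}\right)^n\exp\left[\frac12\left(\frac{1}{\sigma_1^2}-\frac{1}{\sigma_2^2}\right)\sum x_i^2\right],$$
which is a non-decreasing function of $\sum x_i^2$. So there exists an MLR in $\sum x_i^2$, and the UMP test is
$$\phi(x_1,\ldots,x_n) = \begin{cases}1; & \text{if } \sum x_i^2 > c\\ \gamma; & \text{if } \sum x_i^2 = c\\ 0; & \text{if } \sum x_i^2 < c\end{cases}$$
where $\gamma$ and $c$ are constants (since $\sum x_i^2$ is continuous, $\gamma$ plays no role). Then
$$P\left[\sum x_i^2 \ge c \mid H_0\right] = \alpha$$
$$\Rightarrow P\left[\frac{\sum x_i^2}{\sigma_0^2} \ge \frac{c}{\sigma_0^2} \,\Big|\, H_0\right] = \alpha$$
$$\Rightarrow P\left[\chi^2_{(n)} \ge \frac{c}{\sigma_0^2}\right] = \alpha.$$
Thus $c/\sigma_0^2$ may be read from the $\chi^2$ table with $n$ d.f., and $c$ determined. If $\sum x_i^2 \ge c$, $H_0$ is rejected at the significance level $\alpha$; otherwise $H_0$ is accepted. For instance,
$$\frac{c}{\sigma_0^2} = 18.307 \;\Rightarrow\; c = 18.307 \times 2 = 36.614,$$
where 18.307 is the upper 5% point of $\chi^2$ with 10 d.f. (so here $n = 10$, $\alpha = 0.05$ and $\sigma_0^2 = 2$).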
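For even degrees of freedom $2r$, the $\chi^2$ tail probability has the closed form $P(\chi^2_{2r} \ge x) = e^{-x/2}\sum_{j=0}^{r-1}(x/2)^j/j!$. The Python sketch below uses it to confirm that 18.307 is (to table accuracy) the upper 5% point for 10 d.f., and recomputes $c = 18.307\,\sigma_0^2$ with $\sigma_0^2 = 2$ as in the example; the function name is ours.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi^2_df >= x) for even df, via the Poisson-sum identity."""
    if df % 2:
        raise ValueError("this closed form needs even degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0            # running terms (x/2)^j / j!, starting at j = 0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

tail = chi2_upper_tail_even_df(18.307, 10)
sigma0_sq = 2
c = 18.307 * sigma0_sq
print(tail, c)
```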
Theorem: Let $f(x\mid\theta)$ be a continuous density function of a random variable $x$. Suppose the likelihood function $L(x;\theta)$ of $n$ independent observations is differentiable with respect to $\theta$ under the sign of integration, and that the derivative $L'(x;\theta)$ of $L(x;\theta)$ with respect to $\theta$ is everywhere continuous in $\theta$ and does not vanish identically in the sub-space. Then, for testing a simple $H_0: \theta = \theta_0$ against the two-sided family of alternatives, there does not exist a UMP test for both negative and positive values of $(\theta - \theta_0)$.
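Proof
Let $H_1: \theta = \theta_1$ be a simple alternative. The likelihood functions under $H_0$ and $H_1$ are $L(x\mid H_0)$ and $L(x\mid H_1)$, and by the Neyman-Pearson lemma the BCR is given by
$$\frac{L(x\mid\theta_1)}{L(x\mid\theta_0)} \ge k(\theta_1) \quad \text{within the C.R.} \qquad \cdots (2)$$
Here $k$ depends on $\alpha$ and the sample size; keeping these fixed, we regard $k$ as depending on $\theta_1$ only. Now, by Taylor's theorem, $L(x\mid\theta_1) = L(x\mid\theta_0) + (\theta_1-\theta_0)L'(x\mid\theta')$, where $\theta'$ lies between $\theta_0$ and $\theta_1$, so we have
$$\frac{L(x\mid\theta_1)}{L(x\mid\theta_0)} = 1 + (\theta_1-\theta_0)\frac{L'(x\mid\theta')}{L(x\mid\theta_0)}$$
$$\Rightarrow 1 + (\theta_1-\theta_0)\frac{L'(x\mid\theta')}{L(x\mid\theta_0)} \ge k(\theta_1) \quad [\text{by } (2)] \qquad \cdots (3)$$
When $\theta_1 = \theta_0$, we can write $k(\theta_0) = 1$. Therefore, expanding $k(\theta_1)$ about $\theta_0$ again by Taylor's theorem,
$$k(\theta_1) = 1 + (\theta_1-\theta_0)k'(\theta''), \qquad \cdots (4)$$
where $\theta''$ lies between $\theta_0$ and $\theta_1$, (3) becomes
$$1 + (\theta_1-\theta_0)\frac{L'(x\mid\theta')}{L(x\mid\theta_0)} \ge 1 + (\theta_1-\theta_0)k'(\theta'')$$
$$\Rightarrow (\theta_1-\theta_0)\left[\frac{L'(x\mid\theta')}{L(x\mid\theta_0)} - k'(\theta'')\right] \ge 0 \qquad \cdots (5)$$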
If $x$ denotes a point on the boundary of the BCR defined by (2), then
$$\frac{L(x\mid\theta_1)}{L(x\mid\theta_0)} = k(\theta_1),$$
so that
$$\frac{L'(x\mid\theta_1)}{L(x\mid\theta_0)} = k'(\theta_1) \quad [\text{by differentiating w.r.t. } \theta_1].$$
Similarly,
$$\frac{L'(x\mid\theta'')}{L(x\mid\theta_0)} = k'(\theta''),$$
and substituting in (5),
$$\Rightarrow (\theta_1-\theta_0)\left[\frac{L'(x\mid\theta')}{L(x\mid\theta_0)} - \frac{L'(x\mid\theta'')}{L(x\mid\theta_0)}\right] \ge 0 \qquad \cdots (6)$$
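For the C.R. to be UMP, (6) must hold good for all $\theta_1$. Therefore (6) must be true identically for all values of $\theta_1$ and of $x$ within the BCR. Since $(\theta_1-\theta_0)$ can assume both positive and negative values and (6) must hold good for all these values, the expression
$$\left[\frac{L'(x\mid\theta')}{L(x\mid\theta_0)} - \frac{L'(x\mid\theta'')}{L(x\mid\theta_0)}\right]$$
must vanish within the BCR. Outside the BCR,
$$\frac{L(x\mid H_1)}{L(x\mid H_0)} < k(\theta_1), \qquad \cdots (7)$$
so the reverse of (6) is true for both positive and negative values of $(\theta_1-\theta_0)$ outside the BCR, and hence the expression is zero outside the BCR also.
Thus
$$\frac{L'(x\mid\theta')}{L(x\mid\theta_0)} - \frac{L'(x\mid\theta'')}{L(x\mid\theta_0)} = 0$$
throughout the sample space; that is,
$$\frac{L'(x\mid\theta')}{L(x\mid\theta_0)} = \frac{L'(x\mid\theta'')}{L(x\mid\theta_0)}.$$
Letting $\theta_1 \to \theta_0$ (so that $\theta', \theta'' \to \theta_0$), this requires that
$$\frac{L'(x\mid\theta_0)}{L(x\mid\theta_0)} = \left.\frac{\partial\ln L(x;\theta)}{\partial\theta}\right|_{\theta=\theta_0}$$
be a constant, and this is the essential condition for the existence of a UMP test for two-sided alternatives. But we have
$$\int_S L(x\mid\theta)\,dx = 1,$$
and differentiating under the integral sign,
$$\int_S \left.\frac{\partial\ln L(x;\theta)}{\partial\theta}\right|_{\theta=\theta_0} L(x\mid\theta_0)\,dx = 0.$$
Since the first factor of the integrand is a constant, this gives
$$\frac{L'(x\mid\theta_0)}{L(x\mid\theta_0)} = 0 \;\Rightarrow\; L'(x\mid\theta_0) = 0,$$
i.e. $L'$ vanishes identically, contradicting the hypothesis. Hence no UMP test exists for two-sided alternatives.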
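Example
Let us consider $f(x\mid\theta) = e^{-(x-\theta)}$, $x \ge \theta$, for testing $H_0: \theta = \theta_0$ against two-sided alternatives. Here
$$L(x\mid\theta) = e^{-\sum(x_i-\theta)}$$
$$\Rightarrow \ln L(x\mid\theta) = -\sum(x_i-\theta)$$
$$\Rightarrow \frac{\partial\ln L(x\mid\theta)}{\partial\theta} = n,$$
which is a non-zero constant. This does not contradict the theorem above, because the range of $x$ depends on $\theta$, so $L$ cannot be differentiated under the integral sign.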
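Writing $\mu$ for the location parameter, the smallest observation $x_{(1)}$ of the sample is sufficient for $\mu$. Under $H_1: \mu = \mu_1\ (> \mu_0)$ the probability that $x_{(1)} < \mu_1$ is zero. Thus
$$\frac{L(x\mid H_0)}{L(x\mid H_1)} = \begin{cases}\infty; & x_{(1)} < \mu_1\\ e^{n(\mu_0-\mu_1)}; & \text{otherwise}\end{cases}$$
The BCR is given by
$$e^{n(\mu_0-\mu_1)} \le k \qquad \cdots (1)$$
where $k$ is so chosen as to make its size equal to $\alpha$. The left-hand side of (1) is a constant (it does not involve the observations). Hence (1) will be satisfied by every C.R. of size $\alpha$ with $x_{(1)} \ge \mu_1$. Thus every such C.R. is of equal power and is therefore a BCR.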
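$$\frac{L(x\mid H_0)}{L(x\mid H_1)} = \begin{cases}\infty; & \mu_0 \le x_{(1)} < \mu_1\\ e^{n(\mu_0-\mu_1)} < 1; & x_{(1)} \ge \mu_1 > \mu_0\\ e^{n(\mu_0-\mu_1)} > 1; & x_{(1)} \ge \mu_0 > \mu_1\\ 0; & \mu_1 \le x_{(1)} < \mu_0\end{cases}$$
Now consider the critical region
$$(x_{(1)}-\mu_0) < 0 \quad\text{or}\quad (x_{(1)}-\mu_0) > c_1.$$
When $H_0$ holds, the probability that $(x_{(1)}-\mu_0) < 0$ is zero, and the value of $c_1$ is so chosen as to satisfy the condition
$$P\{(x_{(1)}-\mu_0) > c_1 \mid H_0\} = \alpha.$$
This C.R. is a BCR for all alternatives $\mu_1 \ne \mu_0$ and is therefore UMP with respect to these alternatives.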
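Under $H_0$, $x_{(1)} - \mu_0$ (with $x_{(1)}$ the smallest observation) is exponential with rate $n$, so $P\{(x_{(1)}-\mu_0) > c_1 \mid H_0\} = e^{-nc_1}$ and $c_1 = -\ln(\alpha)/n$. The Python sketch below, with illustrative names and values, computes $c_1$ and the power function of the test "reject if $x_{(1)} < \mu_0$ or $x_{(1)} > \mu_0 + c_1$", using $P_\mu(x_{(1)} > a) = e^{-n(a-\mu)}$ for $a \ge \mu$.

```python
import math

def c1_value(n, alpha):
    """Solve exp(-n * c1) = alpha."""
    return -math.log(alpha) / n

def min_survival(a, mu, n):
    """P_mu(X_(1) > a) when X_i - mu are i.i.d. standard exponential."""
    return 1.0 if a <= mu else math.exp(-n * (a - mu))

def power(mu, mu0, n, alpha):
    """P_mu(X_(1) < mu0) + P_mu(X_(1) > mu0 + c1)."""
    c1 = c1_value(n, alpha)
    below = 1.0 - min_survival(mu0, mu, n)
    above = min_survival(mu0 + c1, mu, n)
    return below + above

mu0, n, alpha = 0.0, 10, 0.05
for mu in (-0.2, -0.05, 0.0, 0.1, 0.3):
    print(f"mu = {mu:5.2f}  power = {power(mu, mu0, n, alpha):.4f}")
```

The power is $\alpha$ at $\mu_0$ and grows as $\mu$ moves away from $\mu_0$ in either direction, reaching 1 once $\mu \ge \mu_0 + c_1$.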
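Example
Solution
If $H_1: \theta = \theta_1, \sigma = \sigma_1$ is any simple alternative to the simple $H_0: \theta = \theta_0, \sigma = \sigma_0$ for a normal population, then the BCR is given by
$$\frac{L(x\mid H_1)}{L(x\mid H_0)} = \left(\frac{\sigma_0}{\sigma_1}\right)^n\exp\left[-\frac12\left\{\frac{\sum(x_i-\theta_1)^2}{\sigma_1^2} - \frac{\sum(x_i-\theta_0)^2}{\sigma_0^2}\right\}\right] \ge k.$$
Writing $\sum(x_i-\theta)^2 = nS^2 + n(\bar x-\theta)^2$, where $S^2 = \frac1n\sum(x_i-\bar x)^2$, this becomes
$$S^2\left(\frac{1}{\sigma_1^2}-\frac{1}{\sigma_0^2}\right) + \frac{(\bar x-\theta_1)^2}{\sigma_1^2} - \frac{(\bar x-\theta_0)^2}{\sigma_0^2} \le \frac2n\ln\left[\frac1k\left(\frac{\sigma_0}{\sigma_1}\right)^n\right]$$
$$\Rightarrow S^2\left(\frac{1}{\sigma_1^2}-\frac{1}{\sigma_0^2}\right) + \bar x^2\left(\frac{1}{\sigma_1^2}-\frac{1}{\sigma_0^2}\right) + 2\bar x\left(\frac{\theta_0}{\sigma_0^2}-\frac{\theta_1}{\sigma_1^2}\right) \le \text{constant}$$
$$\Rightarrow \left(\frac{1}{\sigma_1^2}-\frac{1}{\sigma_0^2}\right)\left[S^2 + (\bar x-\delta)^2\right] \le \text{constant}, \qquad \delta = \frac{\theta_1\sigma_0^2 - \theta_0\sigma_1^2}{\sigma_0^2 - \sigma_1^2}$$
$$\Rightarrow (\sigma_0^2 - \sigma_1^2)\sum(x_i-\delta)^2 \le \text{constant}.$$
This means that if $\sigma_0 > \sigma_1$, the BCR is bounded by a hypersphere centred at $(\delta, \ldots, \delta)$, where $\delta$ itself depends on $H_1$. When $\sigma_1 > \sigma_0$, the BCR lies outside this sphere. In both cases the BCR changes with the alternative.
Therefore, there does not exist any UMP test for this family of alternatives.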
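The completing-the-square step can be checked numerically: for fixed $(\theta_0,\sigma_0)$ and $(\theta_1,\sigma_1)$, the quantity $\ln[L(x\mid H_1)/L(x\mid H_0)] + \frac{\sigma_0^2-\sigma_1^2}{2\sigma_0^2\sigma_1^2}\sum(x_i-\delta)^2$ should be the same for every sample of a given size. The Python sketch below, with arbitrary parameter values and samples chosen here, evaluates it for two different samples.

```python
import math

def log_lr(xs, theta0, sigma0, theta1, sigma1):
    """ln[L(x|H1) / L(x|H0)] for i.i.d. normal observations."""
    n = len(xs)
    s1 = sum((x - theta1) ** 2 for x in xs) / sigma1 ** 2
    s0 = sum((x - theta0) ** 2 for x in xs) / sigma0 ** 2
    return n * math.log(sigma0 / sigma1) - 0.5 * (s1 - s0)

def sphere_invariant(xs, theta0, sigma0, theta1, sigma1):
    """log LR plus the hypersphere term; should not depend on the sample values."""
    delta = (theta1 * sigma0 ** 2 - theta0 * sigma1 ** 2) / (sigma0 ** 2 - sigma1 ** 2)
    coef = (sigma0 ** 2 - sigma1 ** 2) / (2 * sigma0 ** 2 * sigma1 ** 2)
    return log_lr(xs, theta0, sigma0, theta1, sigma1) + coef * sum((x - delta) ** 2 for x in xs)

params = (0.0, 2.0, 1.0, 1.0)          # theta0, sigma0, theta1, sigma1
a = sphere_invariant([0.3, -1.2, 2.5, 0.8], *params)
b = sphere_invariant([5.0, 4.1, -3.3, 0.0], *params)
print(a, b)   # equal up to rounding
```

Since the log likelihood ratio equals a constant minus a positive multiple of $\sum(x_i-\delta)^2$ when $\sigma_0 > \sigma_1$, the BCR is the inside of a sphere centred at $(\delta, \ldots, \delta)$, as stated above.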
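Example
Examine for what values of $\lambda$ there exists a UMP test for $H_0: \mu = \mu_0, \lambda = \lambda_0$ in the distribution
$$f(x;\mu,\lambda) = \frac1\lambda e^{-\frac{1}{\lambda}(x-\mu)}; \quad \mu \le x < \infty.$$
Solution
If $H_1: \mu = \mu_1, \lambda = \lambda_1$ is any simple hypothesis and if $\bar x = \frac1n\sum x_i$, then
$$\frac{L(x\mid H_1)}{L(x\mid H_0)} = \left(\frac{\lambda_0}{\lambda_1}\right)^n\exp\left[n\left\{\frac{\bar x-\mu_0}{\lambda_0} - \frac{\bar x-\mu_1}{\lambda_1}\right\}\right]$$
$$= \left(\frac{\lambda_0}{\lambda_1}\right)^n\exp\left[n\bar x\left\{\frac{1}{\lambda_0}-\frac{1}{\lambda_1}\right\} + n\left(\frac{\mu_1}{\lambda_1}-\frac{\mu_0}{\lambda_0}\right)\right].$$
Hence the BCR $L(x\mid H_1)/L(x\mid H_0) \ge k$ is equivalent to
$$\bar x\left\{\frac{1}{\lambda_0}-\frac{1}{\lambda_1}\right\} \ge \text{constant},$$
i.e. $\bar x \ge$ constant when $\lambda_1 > \lambda_0$ and $\bar x \le$ constant when $\lambda_1 < \lambda_0$. Thus, UMP tests exist separately for $\lambda_1 > \lambda_0$ and $\lambda_1 < \lambda_0$, irrespective of the value of $\mu_1$.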