
STATISTICAL INFERENCE

CLASS NOTE

SANGKAR ROY
JAHANGIRNAGAR UNIVERSITY,BANGLADESH.
EMAIL:sankar1604@gmail.com

Estimation

Sufficient Statistic: Let X1 , X 2 , , Xn be a random sample from the density f (⋅ ; θ ) , where θ may

be a vector. A statistic T = t ( X1 , X 2 , , Xn ) is defined to be a sufficient statistic if and only if the

conditional distribution of X1 , X 2 , , Xn given T=t does not depend on θ for any value t of T .
Note: This definition of a sufficient statistic is not very workable. First, it does not tell us which
statistic is likely to be sufficient and second, it requires us to derive a conditional distribution which
may not be easy, especially for continuous random variables. For this reason we may use
Factorization Criterion that may aid us in finding sufficient statistics.
Another Definition: Let X1 , X 2 , , Xn be a random sample from the density f (⋅ ; θ ) . A statistic

T = t ( X1 , X 2 , , Xn ) is defined to be a sufficient statistic if and only if the conditional distribution of


S given T does not depend on θ for any statistic S = s ( X 1 , X 2 , , Xn ) .

Note: This definition is particularly useful in showing that a particular statistic is not sufficient. For instance,
to prove that a statistic T ′ = t ′ ( X 1 , X 2 , , X n ) is not sufficient, one needs only to find another statistic

T = t ( X1 , X 2 , , X n ) for which the conditional distribution of T given T ′ depends on θ .

Jointly Sufficient Statistics: Let X1 , X 2 , …, X n be a random sample from the density f ( ⋅ ; θ ) . The

statistics T1 , , Tr are defined to be jointly sufficient if and only if the conditional distribution of

X1 , X 2 , , X n given T1 = t1 , , Tr = tr does not depend on θ .

Concept of Sufficient Statistic: A sufficient statistic is a particular kind of statistic. It is a statistic that
condenses the sample in such a way that no “information about θ ” is lost. The only information about the parameter
θ in the density f ( ⋅ ; θ ) from which we sampled is contained in the sample X 1 , X 2 , , X n ; so, when we say

that a statistic loses no information, we mean that it contains all the information about θ that is contained in
the sample. We emphasize that the type of information of which we are speaking is that information about θ
contained in the sample given that we know the form of the density; that is, we know the function f ( ⋅ ; ⋅) in

f ( ⋅ ; θ ) , and the parameter θ is the only unknown. We are not speaking of information in the sample that

might be useful in checking the validity of our assumption that the density does indeed have form f ( ⋅ ; ⋅) .

Example: Let X1 , X 2 , X 3 be a sample of size 3 from the Bernoulli distribution. Consider the two statistics
S = s ( X 1 , X 2 , X 3 ) = X 1 + X 2 + X 3 and T = t ( X 1 , X 2 , X 3 ) = X1 X 2 + X 3 . We have to show that s ( ⋅, ⋅, ⋅) is

sufficient and t ( ⋅, ⋅, ⋅) is not.

Values of (x1, x2, x3)    Value of S    Value of T    f(x1, x2, x3 | S)    f(x1, x2, x3 | T)
(0, 0, 0)                  0             0             1                    (1 − p)/(1 + p)
(0, 0, 1)                  1             1             1/3                  (1 − p)/(1 + 2p)
(0, 1, 0)                  1             0             1/3                  p/(1 + p)
(1, 0, 0)                  1             0             1/3                  p/(1 + p)
(0, 1, 1)                  2             1             1/3                  p/(1 + 2p)
(1, 0, 1)                  2             1             1/3                  p/(1 + 2p)
(1, 1, 0)                  2             1             1/3                  p/(1 + 2p)
(1, 1, 1)                  3             2             1                    1

Now, we have
  f_{X1,X2,X3 | S=1}(0, 1, 0 | 1) = P[X1 = 0, X2 = 1, X3 = 0 | S = 1]
    = P[X1 = 0, X2 = 1, X3 = 0, S = 1] / P[S = 1]
    = (1 − p) p (1 − p) / { C(3,1) p (1 − p)² } = 1/3
and
  f_{X1,X2,X3 | T=0}(0, 1, 0 | 0) = P[X1 = 0, X2 = 1, X3 = 0, T = 0] / P[T = 0]
    = (1 − p)² p / { (1 − p)³ + 2p(1 − p)² }
    = p / (1 − p + 2p) = p / (1 + p)

The conditional distribution of the sample given the values of S is independent of p ; so S is a sufficient
statistic. However, the conditional distribution of the sample given the values of T depends on p ; so T is
not sufficient.
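A quick numerical illustration of this example (a sketch added for these notes, not in the original; the values of p are arbitrary): enumerating all 2³ sample points shows that the conditional distribution given S is the same for every p, while the one given T changes with p.

    from itertools import product

    def conditional_given_stat(p, stat):
        """Conditional distribution of (x1, x2, x3) given the value of stat, for Bernoulli(p) trials."""
        prob = {x: p ** sum(x) * (1 - p) ** (3 - sum(x)) for x in product((0, 1), repeat=3)}
        totals = {}
        for x, pr in prob.items():
            totals[stat(x)] = totals.get(stat(x), 0.0) + pr
        return {x: round(pr / totals[stat(x)], 4) for x, pr in prob.items()}

    S = lambda x: x[0] + x[1] + x[2]      # sufficient statistic
    T = lambda x: x[0] * x[1] + x[2]      # not sufficient

    for p in (0.3, 0.7):
        print("p =", p, "given S:", conditional_given_stat(p, S))   # identical for both p
        print("p =", p, "given T:", conditional_given_stat(p, T))   # changes with p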
Factorization Theorem (Single Sufficient Statistic): Let X1 , X 2 , …, X n be a random sample of size n
from the density f ( ⋅ ; θ ) , where the parameter θ may be a vector. A statistic T = t ( X1 , X 2 , …, X n ) is
sufficient if and only if the joint density of X1 , X 2 , …, X n , which is ∏_{i=1}^{n} f ( xi ; θ ) , factors as
  f_{X1,…,Xn}( x1 , …, xn ; θ ) = g{ t ( x1 , x2 , …, xn ) ; θ } ⋅ h( x1 , x2 , …, xn ) = g{ t ; θ } ⋅ h( x1 , x2 , …, xn ) ,
where the function h( x1 , x2 , …, xn ) is nonnegative and does not involve the parameter θ , and the function
g{ t ( x1 , x2 , …, xn ) ; θ } is nonnegative and depends on x1 , …, xn only through the function t ( ⋅, …, ⋅ ) .
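For example (a worked illustration added here, not part of the original notes), for a Bernoulli(p) sample the factorization is immediate:
  ∏_{i=1}^{n} p^{xi}(1 − p)^{1−xi} = p^{Σxi}(1 − p)^{n−Σxi} ⋅ 1 = g{ t ( x1 , …, xn ) ; p } ⋅ h( x1 , …, xn ) ,
with t ( x1 , …, xn ) = Σxi and h ≡ 1, so T = ΣXi is sufficient for p.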

Factorization Theorem (Jointly Sufficient Statistics): Let X1 , X 2 , …, X n be a random sample of size
n from the density f ( ⋅ ; θ ) , where the parameter θ may be a vector. The statistics
T1 = t1 ( X1 , X 2 , …, X n ) , …, Tr = tr ( X1 , X 2 , …, X n ) are jointly sufficient if and only if the joint density of
X1 , X 2 , …, X n , which is ∏_{i=1}^{n} f ( xi ; θ ) , can be factored as
  f_{X1,…,Xn}( x1 , …, xn ; θ ) = g{ t1 ( x1 , …, xn ) , …, tr ( x1 , …, xn ) ; θ } ⋅ h( x1 , …, xn ) = g{ t1 , …, tr ; θ } ⋅ h( x1 , …, xn ) ,
where the function h( x1 , …, xn ) is nonnegative and does not involve the parameter θ , and the function
g{ t1 , …, tr ; θ } is nonnegative and depends on x1 , …, xn only through the functions t1 ( ⋅, …, ⋅ ) , …, tr ( ⋅, …, ⋅ ) .

N.B.: To get more about this topic, see Mood, Graybill, Boes; Introduction to the Theory of Statistics, P-300-
311.
Efficient Estimator: Let x1 , x2 , …, xn be a sample drawn from a population with density f ( x ; θ ) and let t be
an unbiased, consistent estimator of θ . If the variance of t is less than the variance of every other such estimator, then
t is said to be the most efficient estimator of θ , or simply the efficient estimator of θ . The efficiency of a
given estimator can be written as
  Efficiency = Var ( most efficient estimator ) / Var ( given estimator ) .

Regular Distribution: The p.d.f. of the X ’s is said to be regular with respect to its first θ-derivative,
where X ~ f ( x ; θ ) , θ ∈ Ω , if differentiation under the integral sign is valid in
  ∫_{−∞}^{∞} f ( x ; θ ) dx = 1
  ⇒ ∫_{−∞}^{∞} ∂f ( x ; θ )/∂θ dx = 0
  ⇒ ∫_{−∞}^{∞} [ ∂f ( x ; θ )/∂θ ] ⋅ [ 1/f ( x ; θ ) ] f ( x ; θ ) dx = 0
  ⇒ ∫_{−∞}^{∞} [ ∂ ln f ( x ; θ )/∂θ ] f ( x ; θ ) dx = 0 .
Such a distribution is called a regular distribution.
Regular Estimator and Regularity Conditions: Let X1 , …, X n be random variables having the joint
p.d.f. fθ ( x1 , …, xn ) , θ ∈ Θ . If the statistic t ( X1 , …, X n ) is such that Eθ { t ( X1 , …, X n ) } = ψ (θ ) ∀ θ , and if
the following regularity conditions hold, then the statistic t ( X1 , …, X n ) is known as a regular estimator of
θ ∈ Θ .
i) θ lies in a non-degenerate open interval Θ of the real line; Θ may be infinite;
ii) ∂fθ ( x )/∂θ exists ∀ θ ∈ Θ ;
iii) ∫ fθ ( x ) dx can be differentiated with respect to θ under the integral sign;
iv) ∫ t ( x ) fθ ( x ) dx can be differentiated with respect to θ under the integral sign;
v) Eθ [ ∂ ln fθ ( x )/∂θ ]² exists and is positive ∀ θ ∈ Θ .
Best Regular Unbiased Estimator (BRUE): In any regular estimation case, the efficiency of an
unbiased regular estimator tn ( X1 , …, X n ) is
  eθ ( tn ) = { 1 / [ n Eθ ( ∂ ln f ( X | θ )/∂θ )² ] } / Varθ ( tn ) .
If eθ ( tn ) ≡ 1 , then tn is called efficient and is a Best Regular Unbiased Estimator (BRUE).
Note: In any regular estimation case, 0 ≤ eθ ( tn ) ≤ 1 . We have eθ ( tn ) ≡ 1 iff Varθ ( tn ) achieves the lower
bound for all θ . In any regular estimation case, the asymptotic efficiency of an unbiased regular estimator
tn ( X1 , …, X n ) is lim_{n→∞} eθ ( tn ) .
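A small numerical check of this efficiency (a sketch added here, not in the original notes; the parameter values, sample size and seed are arbitrary): for N(θ, σ²) with σ² known, ∂ ln f/∂θ = (x − θ)/σ², so n Eθ(∂ ln f/∂θ)² = n/σ² and the bound is σ²/n, which the sample mean attains.

    import numpy as np

    rng = np.random.default_rng(0)
    theta, sigma, n, reps = 2.0, 1.5, 20, 200_000

    xbar = rng.normal(theta, sigma, size=(reps, n)).mean(axis=1)
    bound = sigma**2 / n                           # 1 / { n * E[(d ln f / d theta)^2] }
    print(bound, xbar.var(), bound / xbar.var())   # efficiency e(xbar) is close to 1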

N.B.: From the chapter Consistency and Efficient Estimators of the Third Year Note, we should also read the examples
of efficient and sufficient estimators, Fisher's information, the Rao-Cramér inequality and related results.
Generalized Rao-Cramér Inequality: See the chapter Asymptotically Most Efficient Estimator of the
Third Year Note. (Ref. Kendall and Stuart, The Advanced Theory of Statistics, p. 12)
Bhattacharyya Inequality: See the chapter Asymptotically Most Efficient Estimator of the Third Year
Note. (Ref. Kendall and Stuart, The Advanced Theory of Statistics, pp. 12-15)
Chapman, Robbins and Kiefer Inequality: This inequality gives a lower bound for the variance of an
estimator but does not require regularity conditions like the Rao-Cramér inequality.
Statement: Suppose that X = ( x1 , x2 , …, xn ) are random variables with joint density or frequency function
f ( x ; θ ) , where θ is a one-dimensional parameter belonging to the parameter space Ω . Let T be an unbiased
estimate of τ (θ ) with Eθ ( T ² ) < ∞ for all θ ∈ Ω . Assume that there exists a ψ ∈ Ω with ψ ≠ θ such that fθ
and fψ are different and S (θ ) = { x : fθ ( x ) > 0 } ⊃ S (ψ ) = { x : fψ ( x ) > 0 } . Then
  Varθ ( T ( X ) ) ≥ sup_{ψ : S(ψ) ⊂ S(θ), ψ ≠ θ} [ τ (ψ ) − τ (θ ) ]² / Varθ { fψ ( X )/fθ ( X ) }   ∀ θ ∈ Ω .
Proof: Since T is unbiased for τ (θ ) , Eψ ( T ( X ) ) = τ (ψ ) ∀ ψ ∈ Ω . Hence, for ψ ≠ θ ,
  ∫_{S(θ)} T ( x ) [ ( fψ ( x ) − fθ ( x ) ) / fθ ( x ) ] fθ ( x ) dx = τ (ψ ) − τ (θ ) ,
which gives
  Covθ { T ( X ) , fψ ( X )/fθ ( X ) − 1 } = τ (ψ ) − τ (θ )
  ⇒ E[ { T ( X ) − τ (θ ) } { fψ ( X )/fθ ( X ) − 1 } ] = τ (ψ ) − τ (θ ) ,   since E{ fψ ( X )/fθ ( X ) − 1 } = 0 .
Since ρ² ≤ 1 ,
  Covθ²{ T ( X ) , fψ ( X )/fθ ( X ) − 1 } ≤ V{ T ( X ) } ⋅ V{ fψ ( X )/fθ ( X ) − 1 }
  ⇒ Covθ²{ T ( X ) , fψ ( X )/fθ ( X ) − 1 } ≤ V{ T ( X ) } ⋅ V{ fψ ( X )/fθ ( X ) }
  ⇒ [ τ (ψ ) − τ (θ ) ]² ≤ V{ T ( X ) } ⋅ V{ fψ ( X )/fθ ( X ) }
  ⇒ V{ T ( X ) } ≥ [ τ (ψ ) − τ (θ ) ]² / V{ fψ ( X )/fθ ( X ) }   ( Proved )
Example: Let X be U[ 0, θ ] . Then
  fθ ( x ) = 1/θ if 0 ≤ x ≤ θ , and 0 otherwise.
Here Eθ [ ∂ ln fθ ( X )/∂θ ]² = 1/θ² , so the formal lower bound of the Rao-Cramér inequality would be θ²/n ; but the
regularity conditions do not hold here (the support depends on θ ), so that bound does not apply.
Let τ (θ ) = θ . If ψ < θ , then S (ψ ) ⊂ S (θ ) . Also,
  Eθ [ fψ ( X )/fθ ( X ) ]² = ∫_0^ψ (θ/ψ )² (1/θ ) dx = (θ/ψ ²)[ x ]_0^ψ = θ/ψ ,
so Varθ { fψ ( X )/fθ ( X ) } = θ/ψ − 1 . Thus
  Varθ { T ( X ) } ≥ sup_{ψ : ψ < θ} [ τ (ψ ) − τ (θ ) ]² / Varθ { fψ ( X )/fθ ( X ) }
                 = sup_{ψ : ψ < θ} (ψ − θ )² / (θ/ψ − 1)
                 = sup_{ψ : ψ < θ} { ψ (θ − ψ ) } .
Now ψ (θ − ψ ) , as a function of ψ on (0, θ ) , increases as long as ψ < θ/2 and decreases if ψ > θ/2 , so it attains
its maximum at ψ = θ/2 .
  ∴ Varθ { T ( X ) } ≥ sup_{ψ : ψ < θ} { ψ (θ − ψ ) } = (θ/2)(θ − θ/2) .
So Varθ { T ( X ) } ≥ θ²/4 .
This is the lower bound for any unbiased estimate T ( X ) of θ .
Now, X is a complete sufficient statistic and 2X is unbiased for θ , so T ( X ) = 2X is the UMVUE. Also
  Varθ { 2X } = 4 Var ( X ) = 4θ²/12 = θ²/3 > θ²/4 .
Thus the lower bound θ²/4 of the Chapman, Robbins and Kiefer (CRK) inequality is not achieved by any
unbiased estimate of θ .
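A Monte Carlo check of these two numbers (a sketch added here, not part of the original notes; θ and the number of replications are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    theta, reps = 3.0, 500_000

    x = rng.uniform(0.0, theta, size=reps)
    t = 2 * x                                    # UMVUE of theta
    print(t.mean(), theta)                       # unbiased
    print(t.var(), theta**2 / 3, theta**2 / 4)   # Var(2X) = theta^2/3 exceeds the CRK bound theta^2/4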
Example: Let X have p.m.f.
  P_N { X = k } = 1/N if k = 1, 2, …, N ; 0 otherwise.
Let Ω = { N : N ≥ M , M > 1 given } and take τ ( N ) = N . The p.m.f. does not satisfy the regularity conditions, so the
CRK inequality is the one to apply. Now for N ≠ N′ ∈ Ω ,
  S ( N ) = { 1, 2, …, N } ⊃ S ( N′ ) = { 1, 2, …, N′ }   if N′ < N .
Also, P_N and P_{N′} are different for N ≠ N′ . Thus
  Var_N ( T ) ≥ sup_{N′ < N} ( N − N′ )² / Var_N { P_{N′}( X )/P_N ( X ) } .
Now
  P_{N′}( x )/P_N ( x ) = N/N′ for x = 1, 2, …, N′ ( N′ < N ), and 0 otherwise,
  E_N { P_{N′}( X )/P_N ( X ) } = Σ_{x=1}^{N′} ( N/N′ )(1/N ) = 1   and   E_N { P_{N′}( X )/P_N ( X ) }² = Σ_{x=1}^{N′} ( N/N′ )²(1/N ) = N/N′ ,
  ∴ Var_N { P_{N′}( X )/P_N ( X ) } = N/N′ − 1 > 0 for N > N′ .
It follows that
  Var_N ( T ) ≥ sup_{N′ < N} ( N − N′ )² / ( N/N′ − 1 ) = sup_{N′ < N} N′( N − N′ ) .
Now
  k ( N − k ) / { ( k − 1)( N − k + 1) } > 1   iff   k < ( N + 1)/2 ,
so N′( N − N′ ) increases as long as N′ < ( N + 1)/2 and decreases if N′ > ( N + 1)/2 ; the maximum is achieved
at N′ = ( N + 1)/2 when that value is admissible.
  ∴ Var_N ( T ) ≥ [ ( N + 1)/2 ][ N − ( N + 1)/2 ] ,
and Var_N ( T ) ≥ M ( N − M ) if M > ( N + 1)/2 , since then N′( N − N′ ) is decreasing over the admissible values
N′ ≥ M and the supremum is attained at N′ = M .
N.B.: Reference Rohatgi V K; An Introduction to Probability Theory and Mathematical Statistics, p-365.
And Rohatgi, Saleh; an Introduction to Probability and Statistics, p-397.

Uniformly Minimum Variance Unbiased Estimator (UMVUE): Let X1 , X 2 , , X n be a random

sample from f ( x, θ ) . An estimator T * = t * ( X1 , , X n ) of τ (θ ) is defined to be a uniformly minimum

variance unbiased estimator of τ (θ ) if and only if

a) T * is unbiased for τ (θ ) ;
b) for any other unbiased estimator T of τ (θ ) , V ( T * ) ≤ V ( T ) for all θ ∈ Ω .

Concept of the Rao-Blackwell Theorem: A very powerful method for finding minimum variance estimators,
irrespective of whether the MVB is attained or not, is provided by the Rao-Blackwell theorem.
The theorem says that if we look for an MVE of τ (θ ) , we need only inspect estimators which are functions of a
sufficient statistic; any unbiased estimator should be a function of a sufficient statistic. If it is not, we can construct
an estimator with smaller variance by taking the conditional expectation given a sufficient statistic. However, this
raises the question of which sufficient statistic to use to compute the conditional expectation. For example, suppose
that S is an unbiased estimator of τ (θ ) with finite variance, and let T1 and T2 both be sufficient statistics for θ ,
with T2 = h ( T1 ) for some function h . Define
  S1* = E ( S | T1 )   and   S2* = E ( S | T2 ) .
By the Rao-Blackwell theorem the variances of S1* and S2* cannot exceed V ( S ) . However, it is not obvious
which estimator will have the smaller variance.

Statement of the Rao-Blackwell Theorem: Suppose that X = ( X1 , …, X n ) has a joint distribution depending on some
unknown parameter θ and that T = T ( X ) is a sufficient statistic for θ . Let S = S ( X ) be a statistic such that
E ( S ) = τ (θ ) , and let S * = E ( S | T ) . Then
a) S * is an unbiased estimator of τ (θ ) ;
b) V ( S * ) ≤ V ( S ) ; moreover, V ( S * ) < V ( S ) unless P ( S = S * ) = 1 .
N.B.: For more on this, see the chapter Asymptotically Most Efficient Estimator of the Third Year Note.

Minimal Sufficient Statistics: Suppose X1 , X 2 , …, X n ~ N (θ , σ ² ) where both θ and σ ² are unknown;
then ( x̄ , s² ) are jointly sufficient statistics. Another set of sufficient statistics is given by the order statistics
Y1 < Y2 < … < Yn . These sufficient statistics condense the data to different degrees.
A set of sufficient statistics is minimal if no other set of sufficient statistics condenses the data more. A set of
jointly sufficient statistics is defined to be minimal sufficient iff it is a function of every other set of sufficient
statistics. That is, among a number of sufficient statistics we should choose the one, say t0 , which condenses the
data more than any other sufficient statistic; then t0 is a minimal sufficient statistic.
A statistic T ( x1 , x2 , …, xn ) is a minimal sufficient statistic if T ( ⋅ ) is a sufficient statistic and a function of
every other sufficient statistic, that is,
  T ( x1 , x2 , …, xn ) = ψ { t ( x1 , x2 , …, xn ) }   for every sufficient statistic t .
A sufficient statistic always exists, but a minimal sufficient statistic may not always exist.
Way of Finding a Minimal Sufficient Statistic: Let f ( x ; θ ) be the p.d.f. of X and suppose that there
exists a function T ( x ) such that, for any two sample points x and y , the ratio f ( x ; θ )/f ( y ; θ ) is independent of θ
iff t ( x ) = t ( y ) ; then T ( x ) is a minimal sufficient statistic for θ .
Equivalently, if the likelihood ratio L ( x ; θ )/L ( y ; θ ) is independent of θ when t ( x ) = t ( y ) for some sufficient
statistic t ( ⋅ ) , so that
  L ( x ; θ )/L ( y ; θ ) = g ( t ; θ ) h ( x ) / { g ( t ; θ ) h ( y ) } = h ( x )/h ( y ) ,
then T ( x ) is a minimal sufficient statistic for θ .

Example: Suppose x1 , x2 , …, xn are independent random variables each N (θ , σ ² ) , where θ is
unknown and σ ² is known. Find a minimal sufficient statistic for θ .
Solution: Here we have
  L ( x | θ ) = ( 1/(σ √(2π)) )ⁿ exp{ −(1/2σ ²) Σ( xi − θ )² }
             = ( 1/(σ √(2π)) )ⁿ exp{ −(1/2σ ²) Σ xi² } exp{ −(1/2σ ²)( −2θ Σ xi + nθ ² ) } .
By the Neyman factorization theorem, Σ xi (equivalently x̄ ) is sufficient for θ . Similarly
  L ( y | θ ) = ( 1/(σ √(2π)) )ⁿ exp{ −(1/2σ ²) Σ yi² } exp{ −(1/2σ ²)( −2θ Σ yi + nθ ² ) } ,
so
  L ( x | θ )/L ( y | θ ) = exp{ −(1/2σ ²)( Σ xi² − Σ yi² ) } ⋅ exp{ (θ/σ ²)( Σ xi − Σ yi ) } ,
which is independent of θ iff
  Σ xi = Σ yi , i.e. t ( x ) = t ( y ) .
Hence t ( x1 , x2 , …, xn ) = Σ xi is a minimal sufficient statistic for θ .
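A quick numerical check (a sketch added here, not in the original notes; the two samples below are arbitrary but chosen to have equal sums): the log-likelihood ratio log L(x | θ) − log L(y | θ) stays constant in θ exactly when Σxi = Σyi.

    import numpy as np

    def loglik(x, theta, sigma=1.0):
        """Log-likelihood of a N(theta, sigma^2) sample."""
        return -0.5 * np.sum((x - theta) ** 2) / sigma**2 - len(x) * np.log(sigma * np.sqrt(2 * np.pi))

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([0.5, 2.5, 3.0])   # same sum as x
    for theta in (-1.0, 0.0, 2.0, 5.0):
        print(theta, loglik(x, theta) - loglik(y, theta))   # the difference does not change with theta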

Example: Suppose we have n = 2 independent observations from the Cauchy distribution with p.d.f.
  f_X ( x ) = (1/π) ⋅ 1/[ 1 + ( x − θ )² ] ;   −∞ < x < ∞ .
Show that no nontrivial sufficient statistic exists.
Solution: Since n = 2 , we consider two points x = ( x1 , x2 ) and y = ( y1 , y2 ) . Hence we get
  L ( x | θ ) = (1/π ²) ⋅ 1/{ [ 1 + ( x1 − θ )² ][ 1 + ( x2 − θ )² ] }
and similarly
  L ( y | θ ) = (1/π ²) ⋅ 1/{ [ 1 + ( y1 − θ )² ][ 1 + ( y2 − θ )² ] } .
Thus we have
  L ( x | θ )/L ( y | θ ) = { [ 1 + ( y1 − θ )² ][ 1 + ( y2 − θ )² ] } / { [ 1 + ( x1 − θ )² ][ 1 + ( x2 − θ )² ] } ,
which depends on θ unless { x1 , x2 } = { y1 , y2 } . Hence the data cannot be condensed beyond the order statistics,
and no single (nontrivial) sufficient statistic for θ exists.

Example: Let X be a single observation from the probability function
  f ( x | θ ) = θ/2          if x = −3
             = θ/3          if x = 0
             = (1 − 2θ )/3  if x = 6, 13, 52
             = θ ² + θ/6    if x = 60
             = θ − θ ²      if x = 68
where 0 < θ < 1/2 . Find a minimal sufficient statistic for θ .
Solution: At least one sufficient statistic always exists, namely the identity statistic
t ( x1 , x2 , …, xn ) = ( x1 , x2 , …, xn ) ; hence X itself is a sufficient statistic for θ . To find a minimal sufficient
statistic we group together the points x and y for which f ( x | θ )/f ( y | θ ) is independent of θ .
Thus we can partition the sample space into the sets {−3, 0} , {6, 13, 52} , {60} , {68} , and a minimal sufficient
statistic is
  t ( x ) = c1 if x = −3 or 0
         = c2 if x = 6, 13 or 52
         = c3 if x = 60
         = c4 if x = 68
where c1 , c2 , c3 and c4 are distinct constants.
The probability distribution of t ( X ) is
  P{ t ( X ) = w } = 5θ/6     if w = c1
                  = 1 − 2θ    if w = c2
                  = θ ² + θ/6 if w = c3
                  = θ − θ ²   if w = c4

Example: Let X be a single observation from the probability function
  P ( x | θ ) = θ ²              if x = −1, 3
             = 1/2 − θ ²/2      if x = 0
             = −θ ²/2 + θ/2     if x = 2, 4
             = 1/2 − θ ²/2 − θ  if x = 1
where θ is an unknown number between zero and √2 − 1 . Find a minimal sufficient statistic for θ .
Solution: Here we partition the sample space into the sets {−1, 3} , {0} , {2, 4} , {1} . Hence the minimal
sufficient statistic is
  t ( x ) = c1 if x = −1 or 3
         = c2 if x = 0
         = c3 if x = 2 or 4
         = c4 if x = 1
where c1 , c2 , c3 , c4 are distinct constants.

Best Asymptotically Normal Estimator (BAN Estimator): A sequence of estimators T1′, T2′, …, Tn′ of
τ (θ ) is defined to be best asymptotically normal (BAN) if and only if the following four conditions are
satisfied:
a) the distribution of √n [ Tn′ − τ (θ ) ] approaches N ( 0, σ ′²(θ ) ) as n → ∞ ;
b) for every ε > 0 , lim_{n→∞} Pθ [ | Tn′ − τ (θ ) | > ε ] = 0 for each θ ∈ Ω ;
c) if { Tn } is any other sequence of simple consistent estimators for which the distribution of
√n [ Tn − τ (θ ) ] approaches N ( 0, σ ²(θ ) ) , then
d) σ ²(θ ) is not less than σ ′²(θ ) for all θ in any open interval.
BAN is sometimes replaced by consistent asymptotically normal efficient (CANE).
Example: Let x1 , x2 , …, xn be a random sample from N ( µ , σ ² ) . Then Tn′ = Σ xi /n = x̄n is a BAN estimator of
µ , since the limiting distribution of √n [ x̄n − µ ] is N ( 0, σ ² ) and no other estimator can have a smaller limiting
variance in any interval of µ values.
Best Consistent Unbiased Asymptotically Normal (BCUAN): A CAN estimator Tn of g (θ ) is said
to be the best consistent unbiased asymptotically normal estimator if it is unbiased and the variance of the
limiting distribution of √n [ Tn − g (θ ) ] has the least possible value.

Completeness: The family of density or probability functions f ( x | θ ) , θ ∈ Ω , is called complete if, for
every function u ( x ) , the identity Eθ { u ( X ) } = 0 for all θ ∈ Ω implies Pθ { u ( X ) = 0 } = 1 for all θ ∈ Ω . This is
sometimes expressed by saying that there are no (nontrivial) unbiased estimators of zero. In particular it means that
two different functions of a complete statistic T cannot have the same expected value. For example, if
  E{ T ( X ) } = θ and E{ K ( X ) } = θ ,
then E{ T ( X ) − K ( X ) } = 0 , and completeness forces T ( X ) = K ( X ) (with probability one).
That is, any unbiased estimator that is a function of a complete statistic is unique. In this sense we are primarily
interested in knowing that the family of density functions of a sufficient statistic is complete, since in that case an
unbiased function of the sufficient statistic is unique and must be a uniformly minimum variance unbiased estimator
by the Rao-Blackwell theorem.
Example: Suppose f ( x ) = (1/√(2π)) e^{−x²/2} ; −∞ < x < ∞ . Check whether this family is complete.
Solution: Consider the function φ ( x ) = x . Now,
  E{ φ ( X ) } = E ( X ) = (1/√(2π)) ∫_{−∞}^{∞} x e^{−x²/2} dx
             = (1/√(2π)) ∫_{−∞}^{0} x e^{−x²/2} dx + (1/√(2π)) ∫_{0}^{∞} x e^{−x²/2} dx
             = −(1/√(2π)) ∫_{0}^{∞} x e^{−x²/2} dx + (1/√(2π)) ∫_{0}^{∞} x e^{−x²/2} dx = 0 .
∴ E{ φ ( X ) } = 0 , but φ ( x ) is not identically zero. Hence f ( x ) is not complete.

Example: Let f ( x ) = ( 1/(β √(2π)) ) e^{−x²/(2β²)} ; −∞ < x < ∞ , β > 0 . Check whether this family is complete.
Solution: Consider the function φ ( x ) = x . Now,
  E{ φ ( X ) } = E ( X ) = ( 1/(β √(2π)) ) ∫_{−∞}^{∞} x e^{−x²/(2β²)} dx
             = −( 1/(β √(2π)) ) ∫_{0}^{∞} x e^{−x²/(2β²)} dx + ( 1/(β √(2π)) ) ∫_{0}^{∞} x e^{−x²/(2β²)} dx = 0 .
∴ E{ φ ( X ) } = 0 , but φ ( x ) is not identically zero. Hence the family { f ( x ; β ) : β > 0 } is not complete.

Example: Let x1 , x2 , …, xn be independent random variables each distributed as P ( λ ) , λ > 0 . Find a minimal
sufficient statistic for λ and check if it is complete.
Solution: The joint distribution of x1 , x2 , …, xn is
  L ( x ; λ ) = e^{−nλ} λ^{Σ xi} / ∏_{i=1}^{n} xi! = ( ∏_{i=1}^{n} 1/xi! ) e^{−nλ} λ^{Σ xi} .
By the Neyman factorization theorem, Σ xi is a sufficient statistic. Now,
  L ( x | λ )/L ( y | λ ) = [ e^{−nλ} λ^{Σ xi} / ∏ xi! ] / [ e^{−nλ} λ^{Σ yi} / ∏ yi! ] = ( ∏ yi! / ∏ xi! ) λ^{Σ xi − Σ yi} ,
which is independent of λ iff Σ xi = Σ yi ; so Σ xi is a minimal sufficient statistic. The distribution of Σ xi is also
Poisson, with p.m.f.
  e^{−nλ} ( nλ )^x / x! ,   x = 0, 1, 2, …
Hence showing that the family of probability functions of the minimal sufficient statistic Σ xi is complete is
equivalent to showing that the Poisson family is complete.
Let u ( ⋅ ) be any function with Eλ [ u ( X ) ] = 0 for all λ > 0 . Then
  Σ_{k=0}^{∞} u ( k ) e^{−λ} λ^k / k! = 0
  ⇒ Σ_{k=0}^{∞} u ( k ) λ^k / k! = 0   [ since e^{−λ} > 0 ] .
A power series that vanishes for all λ > 0 has all its coefficients equal to zero; since k! ≠ 0 , it follows that
u ( k ) = 0 for every k . Hence the Poisson family of distributions is complete, and Σ xi is a complete minimal
sufficient statistic for λ .

Example: Let x1 , x2 , …, xn be independent random variables each U ( 0, θ ) , θ > 0 . Find a sufficient
statistic for θ and show that it is complete.
Solution: Since x1 , x2 , …, xn are independent U ( 0, θ ) random variables,
  f ( x ) = 1/θ ,   0 < x ≤ θ ,   so   L ( x ) = 1/θⁿ   for 0 < x(n) ≤ θ ,
and by factorization x(n) is a sufficient statistic for θ . We know that
  f_{n:n}( x ) = n { F ( x ) }^{n−1} f ( x ) ,
so the p.d.f. of x(n) is
  n x^{n−1}/θⁿ ,   0 < x ≤ θ   [ since F ( x ) = ∫_0^x (1/θ ) dx = x/θ ] .
Let u ( ⋅ ) be any function. Then, for all θ ,
  Eθ [ u ( X(n) ) ] = 0
  ⇒ ∫_0^θ u ( x ) n x^{n−1}/θⁿ dx = 0
  ⇒ ∫_0^θ u ( x ) x^{n−1} dx = 0
  ⇒ u (θ ) θ^{n−1} = 0   [ differentiating both sides with respect to θ ]
  ∴ u (θ ) = 0 ∀ θ   [ since θ^{n−1} ≠ 0 ] .
Therefore x(n) is a complete sufficient statistic for θ .

Ancillary Statistic: A statistic U ( X1 , X 2 , , X n ) is called first order ancillary if Eθ {U ( X1 , X 2 , , X n )} is

a constant independent of θ . U ( X1 , X 2 , , X n ) is called an ancillary statistic for θ if the distribution

function of U ( X1 , X 2 , , X n ) does not depend on θ .

Thus, unlike a sufficient statistic, an ancillary statistic does not contain any information about the parameter
θ . In such cases, intuition suggests that (since the sufficient statistic T ( X1 , X 2 , , X n ) contains all the

information about θ ) the ancillary statistic should be independent of T ( X 1 , X 2 , , Xn ) .

Example: Let X1 , X 2 , …, X n be a random sample from N ( µ , 1 ) . Then the statistic
  U ( X ) = ( n − 1 ) S² = Σ_{i=1}^{n} ( Xi − X̄ )²
is ancillary, since ( n − 1 ) S² ~ χ²_{(n−1)} , which is free of µ . Some other ancillary
statistics are X1 − X̄ , X(n) − X(1) and Σ_{i=1}^{n} | Xi − X̄ | .

Example: Let X1 , X 2 , ( )
, X n be a random sample from N 0, σ 2 . Then the statistic U ( X ) = X follows

( )
N 0, n −1σ 2 and is not ancillary with respect to the parameter σ 2 .

Example: Let X (1) , X ( 2) , , X ( n ) be the order statistics of a random sample from the p.d . f . f ( x − θ ) , where

(
θ ∈ ℜ , then the statistic U ( X ) = X ( 2 ) − X (1) , , X ( n ) − X (1) ) is ancillary for θ .
Example: Let X1 , X 2 , …, X n be iid random variables with distribution
  f ( x ; µ , θ ) = 1/(2θ ) ;   µ − θ ≤ x ≤ µ + θ .
Then the statistic R = X(n) − X(1) is not an ancillary statistic for θ , because the distribution of R is
  f_R ( r ) = [ n ( n − 1 ) r^{n−2}/(2θ )^{n−1} ] ( 1 − r/(2θ ) ) ;   0 ≤ r ≤ 2θ ,
which depends on θ .

Basu’s Theorem: Let T ( X1 , X 2 , …, X n ) be a complete sufficient statistic and U ( X1 , X 2 , …, X n ) an
ancillary statistic. Then T and U are independent random variables.
Proof: Fix u as an arbitrary value of U and let g ( t ) = P{ U = u | T = t } . Then
  Eθ { g ( T ) } = Σ_t P{ U = u | T = t } P{ T = t } = Σ_t P{ U = u , T = t } = P{ U = u } .
So Eθ [ g ( T ) − P{ U = u } ] = 0 for all θ . By completeness of T , g ( t ) − P{ U = u } = 0 for all t , that is,
  P{ U = u | T = t } = P{ U = u } for all t .
Hence U and T are independent random variables.

Example: Let X1 & X 2 be independent random variables each N ( µ , σ 2 ) with σ 2 known and µ unknown.

( )
Let T ( X1 , X 2 ) = X 1 + X 2 , U ( X 1 , X 2 ) = X1 − X 2 . Then U ( X1 , X 2 ) is N 0, 2σ 2 and its distribution does not

depend on µ . Hence it is an ancillary statistic for µ . Since T ( X1 , X 2 ) is a complete sufficient statistic, it
follows from Basu’s theorem that X1 − X 2 and X1 + X 2 are independent random variables.

Example: Let X1 , X 2 , …, X n be a random sample of size n from the uniform distribution on [ 0, θ ] , and let
Y1 < Y2 < … < Yn denote the corresponding order statistics. Show that Y1/Yn and Yn are independent random
variables.
Solution: Since Yn is a complete sufficient statistic for θ , it suffices (by Basu’s theorem) to show that
the distribution of Y1/Yn does not depend on θ (i.e. that Y1/Yn is an ancillary statistic). This follows since,
for 0 < t ≤ 1 ,
  F_{Y1/Yn}( t ) = P( Y1/Yn ≤ t ) = P( Y1 ≤ tYn )
               = ∫_0^θ P( Y1 ≤ ty ) f_{Yn}( y ) dy
               = ∫_0^θ { 1 − ( 1 − ty/θ )ⁿ } ( n y^{n−1}/θⁿ ) dy
               = ∫_0^θ ( n y^{n−1}/θⁿ ) dy − ∫_0^θ ( 1 − ty/θ )ⁿ n ( y/θ )^{n−1} (1/θ ) dy
               = 1 − ∫_0^1 n ( 1 − tx )ⁿ x^{n−1} dx   [ taking x = y/θ ] ,
which is independent of θ . So Y1/Yn and Yn are independent random variables.
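A Monte Carlo illustration of this independence (a sketch added here, not in the original notes; n, θ and the seed are arbitrary): the distribution of Y1/Yn does not change with θ, and the ratio is essentially uncorrelated with Yn.

    import numpy as np

    rng = np.random.default_rng(2)
    n, reps = 5, 200_000

    for theta in (1.0, 10.0):
        u = rng.uniform(0.0, theta, size=(reps, n))
        y1, yn = u.min(axis=1), u.max(axis=1)
        ratio = y1 / yn
        print(theta, ratio.mean(), np.corrcoef(ratio, yn)[0, 1])   # same mean, correlation near 0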

Theorem: Suppose that X = ( X1 , X 2 , …, X n ) has a joint density or joint frequency function that belongs to a
k-parameter exponential family,
  f ( x ; θ ) = exp[ Σ_{i=1}^{k} Ci (θ ) Ti ( x ) − d (θ ) + S ( x ) ] .
Then the statistic { T1 ( x ) , T2 ( x ) , …, Tk ( x ) } is complete as well as sufficient for θ .

Example: Show that if X1 , X 2 , …, X n are independent random variables each
N ( µ , σ ² ) , −∞ < µ < ∞ , σ ² > 0 , with both parameters unknown, then the joint density of X1 , X 2 , …, X n is a
member of the two-parameter exponential family.
Solution: The joint density of X1 , X 2 , …, X n is
  L ( x ; µ , σ ² ) = ( 1/(σ √(2π)) )ⁿ exp{ −(1/2σ ²) Σ( xi − µ )² }
                  = ( 1/(σ √(2π)) )ⁿ exp{ −(1/2σ ²)( Σ xi² − 2µ Σ xi + nµ ² ) }
                  = exp[ −n ln(σ √(2π)) − (1/2σ ²) Σ xi² + (µ/σ ²) Σ xi − nµ ²/(2σ ²) ]
                  = exp[ (µ/σ ²) Σ xi + ( −1/(2σ ²) ) Σ xi² − { nµ ²/(2σ ²) + n ln(σ √(2π)) } ]   (1)
So there are joint sufficient statistics for µ and σ ² . The joint density of X1 , X 2 , …, X n is said to be a
member of the exponential family, or a member of the Koopman-Darmois class, or to have the Koopman-Darmois
form, if
  L ( x ; θ ) = exp[ Σ_{i=1}^{k} Ci (θ ) Ti ( x ) − d (θ ) + S ( x ) ]   (2)
Now, comparing (1) and (2), we have
  T1 ( x ) = Σ xi   T2 ( x ) = Σ xi²
  C1 ( µ , σ ² ) = µ/σ ²   C2 ( µ , σ ² ) = −1/(2σ ²)
  d ( µ , σ ² ) = nµ ²/(2σ ²) + n ln(σ √(2π))   S ( x ) = 0
So the joint density of X1 , X 2 , …, X n is a member of the two-parameter exponential family.

Lehmann-Scheffé Theorem: This theorem gives a simple criterion for the existence of a uniformly minimum
variance unbiased estimator when a complete sufficient statistic exists.
Statement: Let X1 , …, X n be a random sample from a density f ( ⋅ , θ ) . If S = s ( X1 , …, X n ) is a complete
sufficient statistic and if T * = t * ( S ) , a function of S , is an unbiased estimator of τ (θ ) , then T * is the
UMVUE of τ (θ ) .
Proof: Let T ′ be any unbiased estimator of τ (θ ) which is a function of S ; that is, T ′ = t ′ ( S ) . Then
Eθ [ T * − T ′ ] = 0 for all θ ∈ Ω and T * − T ′ is a function of S ; so by completeness of S , Pθ [ t * ( S ) = t ′ ( S ) ] ≡ 1
for all θ ∈ Ω . Hence there is only one unbiased estimator of τ (θ ) that is a function of S . For any unbiased
estimator T of τ (θ ) , T * must therefore equal E [ T | S ] , since E [ T | S ] is an unbiased estimator of τ (θ ) depending
only on S . By the Rao-Blackwell theorem, Vθ [ T * ] = Vθ [ E ( T | S ) ] ≤ Vθ [ T ] for all θ ∈ Ω ; so T * is the UMVUE.
Explanation: This theorem states that if a complete sufficient statistic S exists and if there is an unbiased
estimator for τ (θ ) , then there is a UMVUE for τ (θ ) . It also simplifies the search for unbiased estimators: if a
complete sufficient statistic S exists and there is no function h such that E [ h ( S ) ] = τ (θ ) , then no unbiased
estimator of τ (θ ) exists. The Rao-Blackwell and Lehmann-Scheffé theorems suggest two approaches to finding the
UMVUE when a complete sufficient statistic exists.
Note:
a) Find a function h such that E [ h ( S ) ] = τ (θ ) ; then h ( S ) is the unique UMVUE of τ (θ ) . The
function h can be determined by solving the equation E [ h ( S ) ] = τ (θ ) .
b) Given an unbiased estimator T of τ (θ ) , define an estimator by the Rao-Blackwell theorem as
T * = E [ T | S ] ; then this T * is the unique UMVUE of τ (θ ) .

Example: Let X1 , …, X n be iid Bernoulli random variables with parameter θ . By the factorization theorem
T = X1 + X 2 + … + X n is sufficient for θ , and since this is a one-parameter exponential family of distributions, T is
also complete. We want to find the UMVUE of θ ² .
Let n = 2 and write S = X1 + X 2 . If a UMVUE exists, it is a function of the form h ( S ) , where h satisfies
  θ ² = Σ_{k=0}^{2} h ( k ) C(2, k) θ ^k ( 1 − θ )^{2−k}
      = h ( 0 )( 1 − θ )² + 2 h ( 1 )θ ( 1 − θ ) + h ( 2 )θ ²   (1)
For equation (1) to hold for every θ we need h ( 0 ) = h ( 1 ) = 0 and h ( 2 ) = 1 .
Thus h ( S ) = S ( S − 1 )/2 is the UMVUE of θ ² when n = 2 . For n > 2 one can instead start from the unbiased
estimator T0 = I ( X1 + X 2 = 2 ) and Rao-Blackwellize: T * = E [ T0 | S ] = S ( S − 1 )/{ n ( n − 1 ) } is the UMVUE of θ ² .
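A Monte Carlo check of this estimator (a sketch added here, not part of the original notes; θ, n and the seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    theta, n, reps = 0.4, 6, 300_000

    s = rng.binomial(1, theta, size=(reps, n)).sum(axis=1)
    t_star = s * (s - 1) / (n * (n - 1))     # candidate UMVUE of theta^2
    print(t_star.mean(), theta**2)           # the two values should nearly coincide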
2
Note: The UMVUE of τ (θ ) can be found if a complete sufficient statistic exists. In many cases, however, we
cannot find a complete sufficient statistic, and then we cannot apply the Lehmann-Scheffé theorem to find the
UMVUE. The Cramér-Rao inequality provides a lower bound for the variance of an unbiased estimator of τ (θ ) ; if the
variance of some unbiased estimator achieves this lower bound, then that estimator is the UMVUE.

Goodness of the Estimator:


Modal Unbiased and Median Unbiased Estimators:
Modal Unbiased Estimate: Let X i ( i = 1, …, n ) be iid random variables with common p.d.f. f ( x ; θ )
and let t ( X1 , …, X n ) be a statistic such that the mode of the density function of t is θ . Then t ( X1 , …, X n ) is
said to be a modal unbiased estimate of θ .

Median Unbiased Estimate: Let X i ( i = 1, , n ) be iid random variables with common p.d . f . f ( x ; θ )

and let t ( X1 , , X n ) be a statistic such that the median of the density function of t is θ . Then t ( X1 , , Xn )

is said to be a median unbiased estimate of θ .


Example: Let X1 , …, X_{2n+1} be random variables with common p.d.f.
  f ( x , θ ) = (1/θ ) e^{−x/θ} ;   x > 0 , θ > 0 .
Find a median unbiased estimate of θ .
Solution: We get
  F ( x ) = ∫_0^x (1/θ ) e^{−x/θ} dx = [ −e^{−x/θ} ]_0^x = 1 − e^{−x/θ} .
Let Y1 = min_{1 ≤ i ≤ 2n+1} X i . Then
  f_{Y1}( y1 ) = ( 2n + 1 )[ 1 − F ( y1 ) ]^{2n} f ( y1 ) = ( ( 2n + 1 )/θ ) e^{−( 2n+1 ) y1/θ} ;   0 ≤ y1 < ∞ .
If m denotes the median of Y1 , we can write
  ∫_0^m ( ( 2n + 1 )/θ ) e^{−( 2n+1 ) x/θ} dx = 1/2
  ⇒ 1 − e^{−( 2n+1 ) m/θ} = 1/2
  ⇒ m = θ ln 2 / ( 2n + 1 ) .
Thus the median of Y1 is θ ln 2 / ( 2n + 1 ) , and hence the median of ( 2n + 1 ) Y1 / ln 2 is θ . So
( 2n + 1 ) Y1 / ln 2 is a median unbiased estimate of θ .
ln 2 ln 2
⎛ σ2 ⎞
Example: Suppose X ~ N (θ , σ 2 ) and X ~ N ⎜⎜ θ , ⎟ . Then the model X is unbiased estimate of θ .
⎝ n ⎟⎠
2
1 ⎛ x −θ ⎞
1 − ⎜ ⎟
Solution: We have f ( x) = e 2⎝ σ ⎠ ; −∞ < x < ∞
σ 2π
2
⎛ 1 ⎞ 1 ⎛ x −θ ⎞
ln L (θ ) = f ( x ) = n ln ⎜ ⎟ − ∑⎜ ⎟
⎝ σ 2π ⎠ 2 ⎝ σ ⎠
∂ ln L (θ ) 1 ⎛ x −θ ⎞
⇒ = 0 − ⋅ 2∑ ⎜ ⎟ ( −1) = 0
∂θ 2 ⎝ σ ⎠
⇒ θ=X
Estimation ~ 18 of 22
It can be shown that the second derivative is negative. So, X is the modal value of θ . Again, X is also an
unbiased estimate of θ . Thus we can say that the X is modal unbiased estimate of θ .
Theorem: If m is a median of a discrete density p ( x ) and g ( r ) = E | X − r | = Σ_x | x − r | p ( x ) , then g ( r ) is
minimized for r = m , provided that the sum exists for at least one r ∈ R (the real line).
Proof: Let r0 > m (the case r0 < m is symmetric). For x ≤ m we have | x − r0 | − | x − m | = r0 − m ; for x ≥ r0 we
have | x − r0 | − | x − m | = −( r0 − m ) ; and for m < x < r0 we have | x − r0 | − | x − m | = r0 + m − 2x ≥ −( r0 − m ) . Hence
  g ( r0 ) − g ( m ) = E | X − r0 | − E | X − m | ≥ ( r0 − m ) Σ_{x ≤ m} p ( x ) − ( r0 − m ) Σ_{x > m} p ( x )
                    = ( r0 − m )[ P( X ≤ m ) − P( X > m ) ] ≥ 0 ,
since P( X ≤ m ) ≥ 1/2 by the definition of a median. Thus g ( r ) is minimized when r = m .

Example: Let X i ( i = 1, …, n ) be independent observations from the continuous uniform distribution with
p.d.f. f ( x , θ ) = 1/θ ; 0 < x ≤ θ . We show that Y = max_i X i is a modal unbiased estimate of θ .
Consider Y = max_i X i . The density function of Y is
  φ ( y ; θ ) = n y^{n−1}/θⁿ ;   0 < y ≤ θ ,
which is increasing in y on ( 0, θ ] , so the mode of this distribution is θ . Hence Y is a modal unbiased estimate of θ .

Show that a modal unbiased estimate may not always be unique.
Let X i ( i = 1, …, n ) be independent random variables each having the exponential density
  f ( x , θ ) = (1/θ ) e^{−x/θ} ;   x > 0 , θ > 0 .
For this distribution the statistics
  Y1 = X(n)/ln n ,   where X(n) = max_i X i ,   and
  Y2 = Sn/( n − 1 ) ,   where Sn = Σ_{i=1}^{n} X i ,
are both modal unbiased for θ .
We get F ( x ) = ∫_0^x (1/θ ) e^{−x/θ} dx = 1 − e^{−x/θ} , so the p.d.f. of X(n) is
  g ( x(n) , θ ) = ( n/θ )( 1 − e^{−x(n)/θ} )^{n−1} e^{−x(n)/θ} ;   x(n) > 0 , θ > 0 .
Now Y1 = X(n)/ln n , so dX(n) = ( ln n ) dY1 and the p.d.f. of Y1 is
  g ( y1 , θ ) = ( n ln n/θ )( 1 − e^{−y1 ln n/θ} )^{n−1} e^{−y1 ln n/θ} ;   y1 > 0 , θ > 0 .   (i)
Setting ∂g ( y1 , θ )/∂y1 = 0 gives ( n − 1 ) e^{−y1 ln n/θ} = 1 − e^{−y1 ln n/θ} , i.e. e^{−y1 ln n/θ} = 1/n , i.e. y1 = θ .
It can be shown that the second derivative of (i) at y1 = θ is negative, and hence Y1 is a modal unbiased estimate of θ .
Now, the p.d.f. of Sn is the gamma density
  h ( sn , θ ) = sn^{n−1} e^{−sn/θ} / { θⁿ Γ( n ) } ;   sn > 0 , θ > 0 .
Since Y2 = Sn/( n − 1 ) , dSn = ( n − 1 ) dY2 , the p.d.f. of Y2 is
  h ( y2 , θ ) = ( n − 1 )ⁿ y2^{n−1} e^{−( n−1 ) y2/θ} / { θⁿ Γ( n ) } ;   y2 > 0 , θ > 0 .
Setting ∂h ( y2 , θ )/∂y2 = 0 gives ( n − 1 )/y2 − ( n − 1 )/θ = 0 , i.e. y2 = θ .
Again the second derivative of h ( y2 , θ ) at y2 = θ is negative. Hence Y2 is a modal unbiased estimate of θ .
Thus a modal unbiased estimate is not always unique.

Theorem: Let X be a random variable with density function f ( x ; θ ) , let Y = g ( X ) , and let φ ( y ) be the
density function of Y such that
(i) ∂φ ( y )/∂y = 0 at y = θ ;
(ii) ∂²φ ( y )/∂y² < 0 at y = θ ;
(iii) y = g ( x ) is a one-to-one transformation from x to y and from y to x .
Then the solution of the following differential equation is a modal unbiased estimate of θ :
(a) f ( x , θ ) ∂²x/∂y² + [ ∂f ( x , θ )/∂x ]( ∂x/∂y )² = 0   at y = θ ,
subject to
(b) [ ∂²f ( x , θ )/∂x² ]( ∂x/∂y )³ + 3[ ∂f ( x , θ )/∂x ]( ∂x/∂y )( ∂²x/∂y² ) + f ( x , θ ) ∂³x/∂y³ < 0   at y = θ .
Proof: Since y = g ( x ) , x = g⁻¹( y ) , and
  φ ( y ) = ( ∂/∂y ) F_Y ( y ) = ( ∂/∂y ) F_X ( g⁻¹( y ) ) = f ( x ; θ ) ∂x/∂y .
Hence
  ∂φ ( y )/∂y = [ ∂f ( x ; θ )/∂x ]( ∂x/∂y )² + f ( x , θ ) ∂²x/∂y² = 0 at y = θ , by our hypothesis, and
  ∂²φ ( y )/∂y² = [ ∂²f ( x , θ )/∂x² ]( ∂x/∂y )³ + 3[ ∂f ( x , θ )/∂x ]( ∂x/∂y )( ∂²x/∂y² ) + f ( x , θ ) ∂³x/∂y³ < 0 at y = θ ,
according to our hypothesis.
Hence the solution of the differential equation is a modal unbiased estimate of θ .

Example: Suppose X ~ N (θ , σ ² ) . Show that X is a modal unbiased estimate of θ .
Solution: We have
  f ( x ) = ( 1/(σ √(2π)) ) e^{−( x − θ )²/(2σ ²)} ;   −∞ < x < ∞ ,
  ∂f ( x , θ )/∂x = −( 1/(σ √(2π)) ) e^{−( x − θ )²/(2σ ²)} ( x − θ )/σ ² .
Take Y = g ( X ) = X , so that ∂x/∂y = 1 and ∂²x/∂y² = 0 . Condition (a) of the preceding theorem becomes
  f ( x , θ ) ⋅ 0 − ( 1/(σ √(2π)) ) e^{−( x − θ )²/(2σ ²)} ( x − θ )/σ ² ⋅ 1 = 0
  ⇒ ( x − θ )/σ ² = 0 ⇒ x = θ at y = θ .
So x = y is the solution of this equation, and, the second-derivative condition (b) being satisfied,
X is a modal unbiased estimate of θ .

Jackknife Estimator and Correction for Bias

Jackknife Estimator: The jackknife estimator was introduced by Quenouille in 1949 and named by Tukey
in 1958. The jackknife technique’s purpose is to decrease the bias of an estimator and provide an approximate
confidence interval for the parameter of interest.
If a parameter has a UMVUE associated with it, then clearly there is no chance of improving such as
estimator’s bias. However, MLE’s are often biased and hence improvement may be possible in the sense of an
estimator with lower bias. Jackknifing is an important technique for accomplishing such bias reduction.
Let X1 , X 2 , …, X n be a random sample of size n from a population with real-valued parameter θ , and let θˆ be
an estimator of θ . Divide the random sample into N groups of equal size m = n/N observations each ( N is
one of the factors of n ). Delete one group at a time and estimate θ from the remaining ( N − 1 ) m
observations, using the same estimation procedure previously used with a sample of size n . Denote the
estimator of θ obtained with the i-th group deleted by θˆi ( i = 1, 2, …, N ) ; each θˆi is called a jackknife statistic.
For i = 1, 2, …, N let us consider the new statistics
  Ji = Nθˆ − ( N − 1 )θˆi
and consider
  J (θˆ) = (1/N ) Σ_{i=1}^{N} [ Nθˆ − ( N − 1 )θˆi ] = Nθˆ − ( N − 1 )θˆ(·) ,   where θˆ(·) = (1/N ) Σ_{i=1}^{N} θˆi .
J (θˆ) is called the jackknife estimator of θ .
Generally we take m = 1 , so that N = n , and the commonly used jackknife estimate is J (θˆ) = nθˆ − ( n − 1 )θˆ(·) .
Note:
  J (θˆ) = Nθˆ + θˆ − θˆ − Nθˆ(·) + θˆ(·) = θˆ + ( N − 1 )(θˆ − θˆ(·)) ,
which shows that the estimator J (θˆ) is an adjustment of θˆ , with the amount of adjustment depending on the
difference between θˆ and θˆ(·) .
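A minimal delete-one (m = 1) implementation of this definition (a sketch added here, not part of the original notes; the estimator and data are illustrative only):

    import numpy as np

    def jackknife(data, estimator):
        """Delete-one jackknife (m = 1, so N = n): returns J(theta_hat) = n*theta_hat - (n-1)*mean(theta_hat_i)."""
        data = np.asarray(data)
        n = len(data)
        theta_hat = estimator(data)
        leave_one_out = np.array([estimator(np.delete(data, i)) for i in range(n)])
        return n * theta_hat - (n - 1) * leave_one_out.mean()

    # example: jackknifing the biased MLE of the variance reproduces the unbiased sample variance
    rng = np.random.default_rng(4)
    x = rng.normal(5.0, 2.0, size=30)
    print(np.var(x), jackknife(x, np.var), np.var(x, ddof=1))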

Correction for Bias: If we have a biased estimator, we may be able to make a simple adjustment to obtain an
unbiased estimator. But sometimes the expected value is a rather complicated function of the parameter, and then it
is very difficult to apply a simple correction that turns the biased estimator into an unbiased one.
Let tn denote the biased estimator of θ based on n observations, and suppose
  E ( tn ) = θ + Σ_{r=1}^{∞} ar/n^r   … (A)   [ where each ar is a function of θ but constant with respect to n ]
  ⇒ E ( tn ) − θ = Σ_{r=1}^{∞} ar/n^r .
Let tn−1,i ( i = 1, …, n ) denote the estimate of θ based on the ( n − 1 ) observations obtained by omitting the
i-th observation, and let t̄n−1 be the average of these n estimates, each based on ( n − 1 ) observations.
Let us define another new statistic as
  tn′ = n tn − ( n − 1 ) t̄n−1 = tn + ( n − 1 )( tn − t̄n−1 ) .
Then, since E ( t̄n−1 ) = θ + Σ_r ar/( n − 1 )^r ,
  E ( tn′ ) = n[ θ + Σ_r ar/n^r ] − ( n − 1 )[ θ + Σ_r ar/( n − 1 )^r ]
           = θ + Σ_r ar/n^{r−1} − Σ_r ar/( n − 1 )^{r−1}
           = θ + a2( 1/n − 1/( n − 1 ) ) + a3( 1/n² − 1/( n − 1 )² ) + …   [ the a1 terms cancel ]
  ⇒ E ( tn′ ) = θ − a2/{ n( n − 1 ) } + O(1/n³) = θ − a2/n² − O(1/n³) .
That is, tn′ has bias only of order 1/n² , whereas tn has bias of order 1/n ; i.e. tn′ reduces the bias. Similarly we can
take another statistic
  tn′′ = [ n² tn′ − ( n − 1 )² t̄n−1′ ] / [ n² − ( n − 1 )² ] ,
where t̄n−1′ is the average of the corresponding first-order jackknifed estimators based on ( n − 1 ) observations; then
  E ( tn′′ ) = θ + O(1/n³) ,
that is, a bias of order 1/n³ . At every step the remaining amount of bias is very small, so by this method we can
remove the bias completely or to any required degree.
N.B.: Explain the jackknife method and discuss how it reduces the bias.
Example: Let x1 , x2 , …, xn be a random sample of size n from the density
  f ( x ; µ , σ ² ) = ( 1/(σ √(2π)) ) e^{−( x − µ )²/(2σ ²)} ;   −∞ < x , µ < ∞ , σ ² > 0 .
Find the jackknife estimator of µ .
Solution: The likelihood function is
  L = ( 1/(σ √(2π)) )ⁿ exp{ −(1/2) Σ_{i=1}^{n} ( ( xi − µ )/σ )² } ,
  log L = −( n/2 ) log(2π) − ( n/2 ) log σ ² − (1/2σ ²) Σ_{i=1}^{n} ( xi − µ )² ,
  ∂ log L/∂µ = (1/σ ²) Σ( xi − µ ) = 0 ⇒ µˆ = (1/n ) Σ_{i=1}^{n} xi .
So the maximum likelihood estimate of the parameter µ is
  θˆ = µˆ = x̄   (+)
  ⇒ E ( µˆ ) = (1/n ) Σ E ( xi ) = µ   (++)
so µˆ is already unbiased and no jackknife bias correction is needed here. Nevertheless, we can compute the
jackknife estimator by taking
  θˆi = ( 1/( n − 1 ) ) Σ_{j ≠ i} xj = ( nx̄ − xi )/( n − 1 ) ,
so
  θˆ(·) = (1/n ) Σ_{i=1}^{n} θˆi = [ 1/{ n( n − 1 ) } ]( n² x̄ − n x̄ ) = x̄ .   (+++)
So the jackknife estimator is given by
  J (θˆ) = nθˆ − ( n − 1 )θˆ(·) = n x̄ − ( n − 1 ) x̄ = x̄   (A)   [ from (+) and (+++) ]
  ∴ E [ J (θˆ) ] = E ( x̄ ) = µ .
So we can say that J (θˆ) = x̄ is an unbiased and uniformly minimum variance unbiased estimator of µ .
Example: Let x1 , x2 , …, xn be a random sample of size n from the density
  f ( x ; µ , σ ² ) = ( 1/(σ √(2π)) ) e^{−( x − µ )²/(2σ ²)} ;   −∞ < x , µ < ∞ , σ ² > 0 .
Find the jackknife estimator of σ ² , where µ and σ ² are both unknown.
Solution: As before, log L = −( n/2 ) log(2π) − ( n/2 ) log σ ² − (1/2σ ²) Σ( xi − µ )² . The likelihood equations for
simultaneous estimation of µ and σ ² , ∂ log L/∂µ = 0 and ∂ log L/∂σ ² = 0 , give
  µˆ = x̄   and   θˆ = S² = σˆ² = (1/n ) Σ_{i=1}^{n} ( xi − x̄ )² .   (+)
Writing Σ( xi − x̄ )² = Σ( xi − µ )² − n( x̄ − µ )² , we get
  E (θˆ) = E ( σˆ² ) = (1/n ) Σ E ( xi − µ )² − E ( x̄ − µ )² = σ ² − σ ²/n .   (++)
So θˆ = σˆ² is a biased estimator of σ ² . (Here the bias could in fact be removed by the simple adjustment
E{ n σˆ²/( n − 1 ) } = σ ² ; we carry out the jackknife calculation anyway.)
For the jackknife we take the leave-one-out MLE
  θˆi = σˆi² = ( 1/( n − 1 ) ) Σ_{j ≠ i} ( xj − x̄(i) )² ,   where x̄(i) = ( nx̄ − xi )/( n − 1 )
      = [ Σ_{j ≠ i} xj² ]/( n − 1 ) − [ ( nx̄ − xi )/( n − 1 ) ]² .
Averaging over i and simplifying,
  θˆ(·) = (1/n ) Σ_{i=1}^{n} θˆi = [ n( n − 2 )/( n − 1 )² ] θˆ = [ n( n − 2 )/( n − 1 )² ] σˆ² .   (++++)
So the jackknife estimator is given by
  J (θˆ) = nθˆ − ( n − 1 )θˆ(·) = nσˆ² − ( n − 1 )[ n( n − 2 )/( n − 1 )² ] σˆ²   [ from (+) and (++++) ]
        = nσˆ² [ 1 − ( n − 2 )/( n − 1 ) ] = n σˆ²/( n − 1 )   (A)
        = ( 1/( n − 1 ) ) Σ_{i=1}^{n} ( xi − x̄ )² ,
and
  E [ J (θˆ) ] = [ n/( n − 1 ) ] E ( σˆ² ) = [ n/( n − 1 ) ]( σ ² − σ ²/n ) = σ ² .
So we can say that J (θˆ) = n σˆ²/( n − 1 ) = ( 1/( n − 1 ) ) Σ( xi − x̄ )² is an unbiased and uniformly minimum variance
unbiased estimator of θ = σ ² .
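A Monte Carlo check of this bias correction (a sketch added here, not part of the original notes; σ², n and the seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(5)
    sigma2, n, reps = 4.0, 8, 200_000

    x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    mle = x.var(axis=1, ddof=0)                # biased MLE; its mean is (n-1)/n * sigma^2
    jack = n / (n - 1) * mle                   # jackknife estimator J(theta_hat) derived above
    print(mle.mean(), (n - 1) / n * sigma2)
    print(jack.mean(), sigma2)                 # the jackknife estimate is (essentially) unbiased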
Example: Let, x1 , x2 , ..........., xn be a random sample of size n with the probability density function
1− x
f ( x ; p ) = p x (1 − p ) ; x=0 ,1

Find the jackknife estimator of Var ( x ) = pq .


1− x
Solution: Here, we have that f ( x ; p ) = p x (1 − p ) ; x=0 ,1

Thus the likelihood function is:



n
∑ xi n

L ( x; p ) = p i =1
(1 − p )n −∑ x
i =1
i

n ⎛ n ⎞
⇒ ln L ( x; p ) = ∑ xi ln p + ⎜⎜ n − ∑ xi ⎟⎟ ln (1 − p )
i =1 ⎝ i =1 ⎠
⎛ n ⎞ n

∂ ln L ( x; p ) i =1
xi ⎜⎜ n − xi ⎟⎟ ∑ ∑
+⎝ ⎠ =0
i =1
⇒ =
∂p p 1− p
n n n
∑ xi − p∑ xi − np + p∑ xi
i =1 i =1 i =1
⇒ =0
p (1 − p )
n
∑ xi ⎡ n ⎤
∑ xi = y ~ B ( n, p )⎥⎥
i =1 y
⇒ pˆ = = ⎢ Let ,
n n ⎣⎢ i =1 ⎦
So, the maximum likelihood estimator of p is:
n
∑ xi y
i =1
pˆ = = .
n n

And, we know that if θˆ is the maximum likelihood estimator of θ and g (θ ) is a one-to-one function of θ ,

⎡⎣That is, g (θ1 ) = g (θ 2 ) ()


⇔ θ1 = θ 2 ⎤⎦ , then g θˆ is the maximum likelihood estimator of g (θ ) .

So, from the above, we can say that the maximum likelihood estimator of θ = pq is given by:
y⎛ y⎞
θˆ = pq
ˆˆ= ⎜1 − ⎟ " " " (+)
n⎝ n⎠
2
⇒ ()
E θˆ = E ( pq
⎛ y⎞
ˆ ˆ) = E ⎜ ⎟ − E ⎜ ⎟
⎝n⎠
⎛ y⎞
⎝n⎠
= E ( y ) − 2 ⎡Var ( y ) + ⎡⎣ E ( y ) ⎤⎦ ⎤
1 1 2

n n ⎢⎣ ⎥⎦
1
n
1
= np − 2 npq + n 2 p 2
n
( )
( S in ce, y ~ B ( n, p ) )
pq
= p− − p2
n
⇒ ()
E θˆ = E ( pq
ˆ ˆ ) = pq −
pq
n
" " " (+ +)

ˆ ˆ is a biased estimator of θ = pq . But, we can remove the bias by the application of


So, we can say that θˆ = pq
a simple adjustment such as follows:
⎛ n −1⎞
()
E θˆ = E ( pq
ˆ ˆ) = ⎜
⎝ n ⎠
⎟ pq

⎛ n ˆ⎞ ⎛ n ⎞
⇒ E⎜ θ ⎟ = E⎜ ˆ ˆ ⎟ = pq
pq " " " (+ + +)
⎝ n −1 ⎠ ⎝ n −1 ⎠

n ˆ n
So, we can say that θ= ˆ ˆ is an unbiased estimator of θ = pq . Thus, the jackknife estimator is not to
pq
n −1 n −1
be found here.
But, we can find the jackknife estimator by taking



  θˆi = [ ( y − 1 )/( n − 1 ) ][ 1 − ( y − 1 )/( n − 1 ) ]   if xi = 1 (that is, if the i-th trial is a success)
  θˆi = [ y/( n − 1 ) ][ 1 − y/( n − 1 ) ]                   if xi = 0 (that is, if the i-th trial is a failure)
Now, since there are y successes and n − y failures to be removed, we have that
n
1 ⎡ y −1 ⎛ y −1 ⎞ y ⎛ y ⎞⎤
∑θˆi = n ⎢⎣ y n − 1 ⎜⎝1 − n − 1 ⎟⎠ + ( n − y ) n − 1 ⎜⎝1 − n − 1 ⎟⎠⎥⎦
1
θˆi =
n i =1

1 ⎡ y ( y − 1)( n − y ) y ( n − y )( n − y − 1) ⎤
= ⎢ + ⎥
n⎢
⎣ ( n − 1)2 ( n − 1)2 ⎥⎦
y ( n − y )( n − 2 )
=
n ( n − 1)
2

y ( n − y ) n ( n − 2)
=
n n ( n − 1)2
y⎛ y ⎞ n ( n − 2)
= ⎜1 − ⎟
n ⎝ n ⎠ ( n − 1)2
n ( n − 2)
⇒ θˆi = ˆˆ
pq " " " (+ + + +)
( n − 1)2
So, the jackknife estimator is given by

()
J θˆ = nθˆ − ( n − 1)θˆi
n ( n − 2)
ˆ ˆ − ( n − 1)
= npq ˆˆ
pq ⎣⎡ from ( + ) and ( + + + + ) ⎦⎤
( n − 1)2
⎡ ( n − 2) ⎤
= npq
ˆ ˆ ⎢1 − ⎥
⎢⎣ ( n − 1) ⎥⎦
n
= ˆˆ
pq " " " ( A)
n −1

⎣ () n
⎦ n −1 ( )
E ⎡ J θˆ ⎤ = ˆˆ
E pq

n ⎛ pq ⎞
= ⎜ pq − ⎟ ⎡⎣ from ( + + ) ⎤⎦
n −1 ⎝ n ⎠
n n −1
= pq
n −1 n
⇒ ⎡ ⎤
E J θ = pq

ˆ
⎦ () ⎡⎣ from ( A ) ⎤⎦

So, we can say that J θˆ = () n


n −1
ˆ ˆ is an unbiased and uniformly minimum variance unbiased estimator
pq

θ = pq .

Example: Let x1 , x2 , …, xn be a random sample of size n with the probability function
  f ( x ; n, p ) = C( n, x ) p^x ( 1 − p )^{n−x} ;   x = 0, 1, …, n .
Find the jackknife estimator of p² .
Solution: Here, we have that f ( x ; n, p ) = C( n, x ) p^x ( 1 − p )^{n−x} ; x = 0, 1, …, n .



So, for the binomial distribution, we know that
E ( x ) = np and Var ( x ) = npq

So, from the above we have that


⎛x⎞
E⎜ ⎟ = p
⎝n⎠
and ( )
Var ( x ) = E x 2 + ⎡⎣ E ( x ) ⎤⎦ = npq
2

⇒ ( )
E x 2 = npq + n 2 p 2 " " " (**)
⎛ x 2 ⎞ pq
⇒ E ⎜⎜ 2 ⎟⎟ = + p2
⎝ ⎠
n n
⎡⎛ x ⎞ 2⎤
p (1 − p ) ⎡ ⎛ x⎞ ⎤
2
⇒ ()
E θˆ = E ⎢⎜ ⎟
⎢⎣⎝ n ⎠
⎥= p +
⎥⎦
2
n
⎢ where, θˆ = ⎜ ⎟ ⎥
⎢⎣ ⎝ n ⎠ ⎥⎦
" " " (+)

2
So, we can say that θˆ = ⎛⎜ ⎞⎟ is a biased estimator of θ = p 2 . And, we cannot remove the bias by the
x
⎝n⎠
application of a simple adjustment. So, the jackknife estimator is needed to be found here.
And, we can find the jackknife estimator by taking
  θˆi = [ ( x − 1 )/( n − 1 ) ]²   if xi = 1 (that is, if the i-th trial is a success)
  θˆi = [ x/( n − 1 ) ]²           if xi = 0 (that is, if the i-th trial is a failure)
Now, since there are x successes and n − x failures to be removed, we have that
1 n ˆ 1 ⎡ ⎛ x −1⎞ ⎛ x ⎞ ⎤
2 2
θˆi = ∑
n i =1
θi = ⎢ x ⎜ ⎟ + (n − x)⎜
n ⎢⎣ ⎝ n − 1 ⎠
⎟ ⎥
⎝ n − 1 ⎠ ⎥⎦

1 ⎡ x ( x − 1)
2⎤
= ⎢
2
+
(n − x) x ⎥ =
1 ⎡ x3 − 2 x 2 + x + nx 2 − x3 ⎤
n ⎢ ( n − 1)2 ( n − 1)2 ⎥⎦ n ( n − 1) ⎣ ⎦
2

1
⇒ θˆi = ⎡ nx 2 − 2 x 2 + x ⎤ " " " (+ +)
n ( n − 1) ⎣ ⎦
2

So, the jackknife estimator is given by



()
J θˆ = nθˆ − ( n − 1)θˆi
2
⎛x⎞ 1
= n ⎜ ⎟ − ( n − 1) ⎡ nx 2 − 2 x 2 + x ⎤ ⎡⎣ from ( + ) and ( + + ) ⎤⎦
⎝ ⎠ n ( n − 1) ⎣ ⎦
n 2

( n − 1) x 2 − ⎡⎣ nx 2 − 2 x 2 + x ⎤⎦ x ( x − 1)
= = " " " ( A)
n ( n − 1) n ( n − 1)
E ( x ) − E ( x)
2
⇒ E ⎡ J (θˆ ) ⎤ =
⎣ ⎦ n ( n − 1)
npq + n 2 p 2 − np
= ⎡⎣ from (**) ⎤⎦
n ( n − 1)
np − np 2 + n 2 p 2 − np np 2 ( n − 1)
= =
n ( n − 1) n ( n − 1)
⎛ x −x ⎞
()
2
∴ E ⎡ J θˆ ⎤ = E ⎜ ⎟ = p2 ⎡⎣ from ( A)⎤⎦
⎣ ⎦ ⎜ n ( n − 1) ⎟
⎝ ⎠
x ( x − 1)
So, we can say that J θˆ = () is an unbiased and uniformly minimum variance unbiased estimator
n ( n − 1)

θ = pq .

Example: Let, x1 , x2 , ..........., xn be a random sample of size n with the probability density function

f ( x ;θ ) = e ( )
− x −θ
; x >θ

Find the jackknife estimator of θ .

f ( x ;θ ) = e ( )
− x −θ
Solution: Here, we have that ; x >θ

Thus the likelihood function is:


⎡ n ⎤

L ( x ;θ ) = exp ⎢ − ( xi − θ ) ⎥ " " " (1)
⎣⎢ i =1 ⎦⎥

Here, we have to choose θ so that L is maximum in equation (1) . Now, L is maximum if ( x − θ ) is

minimum. That is, L is maximum if θ is maximum.


Let, x(1) , x( 2 ) , " , x( n ) be the ordered sample of n independent observations from the given population so that

θ ≤ x(1) ≤ x( 2 ) ≤ ......... ≤ x( n ) ≤ ∞

Since, the maximum value of θ consistent with the sample is x(1) , the smallest observation, then we have that

θˆ = x(1) = The smallest observation

So, the maximum likelihood estimator of θ is

θˆ = x(1) = The smallest observation " " " (+)


Now, we know that the density function of the r th observation is given by:

( )
f x( r ) =
n!
( r − 1)!( n − r )! ⎣
r −1
⎡ F ( x ) ⎤⎦ ⎡⎣1 − F ( x ) ⎤⎦
n−r
f ( x) " " " ( 2)

So, from equation ( 2 ) , we have that



( )
f x(1) =
n! 1−1
⎡ F ( x ) ⎤⎦ ⎡⎣1 − F ( x ) ⎤⎦
(1 − 1)!( n − 1)! ⎣
n −1
f ( x)

n −1
= n ⎡⎣1 − F ( x ) ⎤⎦ f ( x) " " " ( 3)
Now, we have that
f ( x; θ ) = e ( )
− x −θ
; x >θ
x
F ( x ) = ∫ e ( ) dx
− x −θ

θ
x
= − ⎡e ( ) ⎤
− x −θ
⎣ ⎦0
= − ⎡ e ( ) − 1⎤
− x −θ
⎣ ⎦
F ( x) = 1− e ( )
− x −θ
⇒ " " " ( 4)
So, from equation ( 3) and ( 4 ) , we have that

( )
f x(1) = n ⎡⎣1 − F ( x ) ⎤⎦
n −1
f ( x)
n −1
= n ⎡1 − 1 + e ( ) ⎤
− x −θ
f ( x)
⎣ ⎦
n −1
= n ⎡e ( ) ⎤ e ( ) = ne ( )
− x −θ − x −θ − n x −θ
⎣ ⎦

∴ ( )
E x(1) = n ∫ xe ( ) dx
− n x −θ
" " " (***)
θ
⎡ e − nx ∞ ∞ e− nx ⎤
⎢ − n θ θ∫ −n

= ne ⎢ x − dx ⎥

⎣ ⎦
⎡ θ e − nθ 1 e− nx ⎤ ∞

= ne nθ ⎢ 0 − + ⎥
⎢ −n n −n θ ⎥
⎣ ⎦
⎡ − nθ ⎤ ⎡ θ e − nθ e− nθ ⎤
θe
= ne nθ ⎢
1
− 2 0 − e− nθ ⎥ = nenθ ⎢ ( + 2 ⎥ )
⎣ n n ⎦ ⎣ n n ⎦

∴ ()
E θˆ = E x(1) = θ +
1
n
( ) " " " (+ +)
So, we can say that θˆ = x(1) is a biased estimator of θ . And, we cannot remove the bias by the application of a

simple adjustment. So, the jackknife estimator is needed to be found here.


And, we can find the jackknife estimator by taking
⎧⎪ x(1) ; if xi ≠ x(1)
θˆi = ⎨
x ; if xi = x(1)
⎩⎪ ( 2 )
Now, since x(1) is removed in exactly one of the n leave-one-out deletions (leaving x(2) as the minimum) and remains the minimum in the other n − 1 deletions, we have that
1 n ˆ 1⎡
θˆi = ∑θi = n ⎣( n − 1) x(1) + x( 2) ⎤⎦
n i =1
" " " (+ + +)

So, the jackknife estimator is given by



()
J θˆ = nθˆ − ( n − 1)θˆi
1
= nx(1) − ( n − 1) ⎡( n − 1) x(1) + x( 2 ) ⎤ ⎣⎡ from ( + ) and ( + + + ) ⎦⎤
n⎣ ⎦
⎡ ( n − 1)2 ⎤ ( n − 1)
= ⎢n − ⎥ x(1) − x( 2 )
⎢⎣ n ⎥

n

⎛ 2n − 1 ⎞ ( n − 1)
=⎜ ⎟ x(1) − x( 2 )
⎝ n ⎠ n
⎛ n + n −1 ⎞ ( n − 1)
=⎜ ⎟ x(1) − x( 2 )
⎝ n ⎠ n
n ⎛ n −1 ⎞ ( n − 1)
= x(1) + ⎜ ⎟ x(1) − x( 2 )
n ⎝ n ⎠ n
⎛ n −1 ⎞
= x(1) + ⎜
⎝ n ⎠
(
⎟ x(1) − x( 2 ) ) " " " ( A)


⎣ () ⎦ ( )
E ⎡ J θˆ ⎤ = E x(1) + ⎜
⎛ n −1 ⎞ ⎡
⎝ n ⎠⎣
( ) ( )

⎟ E x(1) − E x( 2 ) ⎦ " " " ( B)

Now, to find the above expected value, first of all we have to find the expected value of the second order
statistics as follows:
From equation ( 2 ) , we have that

$$f\!\left(x_{(r)}\right) = \frac{n!}{(r-1)!\,(n-r)!}\,\big[F(x)\big]^{r-1}\big[1-F(x)\big]^{n-r} f(x)$$

$$\therefore\quad f\!\left(x_{(2)}\right) = \frac{n!}{1!\,(n-2)!}\,\big[F(x)\big]\big[1-F(x)\big]^{n-2} f(x)
 = n(n-1)\big[F(x)\big]\big[1-F(x)\big]^{n-2} f(x) \qquad (5)$$

Now, from equations $(5)$ and $(4)$, we have that

$$f\!\left(x_{(2)}\right) = n(n-1)\big[1-e^{-(x-\theta)}\big]\big[e^{-(x-\theta)}\big]^{n-2} e^{-(x-\theta)}
 = n(n-1)\big[e^{-(x-\theta)}\big]^{n-1} - n(n-1)\big[e^{-(x-\theta)}\big]^{n}$$

$$\Rightarrow\quad E\!\left(x_{(2)}\right) = n(n-1)\int_{\theta}^{\infty} x\,e^{-(n-1)(x-\theta)}\,dx - n(n-1)\int_{\theta}^{\infty} x\,e^{-n(x-\theta)}\,dx
 = n(n-1)\int_{\theta}^{\infty} x\,e^{-(n-1)(x-\theta)}\,dx - (n-1)\,E\!\left(x_{(1)}\right) \qquad \big[\text{from } (*\!*\!*)\big]$$

Integrating by parts as before,

$$E\!\left(x_{(2)}\right) = n(n-1)\,e^{(n-1)\theta}\left[\frac{\theta e^{-(n-1)\theta}}{n-1} + \frac{e^{-(n-1)\theta}}{(n-1)^{2}}\right] - (n-1)\left(\theta + \frac{1}{n}\right)
 = n\left[\theta + \frac{1}{n-1}\right] - (n-1)\left(\theta + \frac{1}{n}\right)$$

$$= \theta + \left(\frac{n}{n-1} - \frac{n-1}{n}\right)
 = \theta + \frac{n^{2}-(n-1)^{2}}{n(n-1)}
 = \theta + \frac{2n-1}{n(n-1)} \qquad (C)$$

Now, from equations $(++)$, $(B)$ and $(C)$, we have that

$$E\Big[J\big(\hat\theta\big)\Big] = E\!\left(x_{(1)}\right) + \frac{n-1}{n}\Big[E\!\left(x_{(1)}\right) - E\!\left(x_{(2)}\right)\Big]
 = \left(\theta + \frac{1}{n}\right) + \frac{n-1}{n}\left[\left(\theta + \frac{1}{n}\right) - \left(\theta + \frac{2n-1}{n(n-1)}\right)\right]$$

$$= \theta + \frac{1}{n} + \frac{n-1}{n}\cdot\frac{(n-1)-(2n-1)}{n(n-1)}
 = \theta + \frac{1}{n} + \frac{n-1}{n}\cdot\frac{-n}{n(n-1)} = \theta$$

$$\therefore\quad E\Big[J\big(\hat\theta\big)\Big] = E\left[x_{(1)} + \frac{n-1}{n}\big(x_{(1)} - x_{(2)}\big)\right] = \theta$$

So, we can say that $J\big(\hat\theta\big) = x_{(1)} + \dfrac{n-1}{n}\big(x_{(1)} - x_{(2)}\big)$ is an unbiased estimator of $\theta$.
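The two results above can be checked empirically. Below is a minimal simulation sketch (not part of the original notes; it assumes numpy is available and all names are illustrative) that draws repeated samples from the shifted exponential, confirms that $x_{(1)}$ carries bias $1/n$, and that the jackknifed estimator is (to Monte Carlo accuracy) unbiased.

```python
# Simulation check of E[x_(1)] = theta + 1/n and E[J(theta_hat)] = theta
# for the shifted exponential f(x; theta) = exp(-(x - theta)), x > theta.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 5.0, 10, 200_000

x = theta + rng.exponential(scale=1.0, size=(reps, n))   # each row is one sample
x.sort(axis=1)
x1, x2 = x[:, 0], x[:, 1]                                 # smallest and second smallest

mle = x1
jack = x1 + (n - 1) / n * (x1 - x2)                       # jackknife estimator (A)

print("bias of x_(1)        ~", mle.mean() - theta, "(theory: 1/n =", 1 / n, ")")
print("bias of J(theta_hat) ~", jack.mean() - theta, "(theory: 0)")
```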



Pitman Estimator for Location Parameter

Location invariant
An estimator T = t ( X1 , X 2 , " , X n ) is defined to be location invariant if and only if

t ( x1 + c, x2 + c, " , xn + c ) = t ( x1 , x2 , " , xn ) + c for all values x1 , x2 , " , xn and all c .


Example: Show that $\bar X_{n} = \dfrac{\sum_{i=1}^{n} X_{i}}{n}$ is location invariant.

Solution:

Let $t(x_{1}, \ldots, x_{n}) = \bar x_{n} = \dfrac{\sum_{i=1}^{n} x_{i}}{n}$. Then we have that

$$t(x_{1}+c, \ldots, x_{n}+c) = \frac{\sum_{i=1}^{n}(x_{i}+c)}{n} = \frac{\sum_{i=1}^{n} x_{i}}{n} + c = t(x_{1}, \ldots, x_{n}) + c$$

So, we can say that $\bar X_{n}$ is location invariant.

Example: Show that $\dfrac{Y_{1}+Y_{n}}{2}$ is location invariant, where $Y_{1}$ is the smallest order statistic and $Y_{n}$ is the largest order statistic.

Solution:

Let $t(x_{1}, \ldots, x_{n}) = \dfrac{y_{1}+y_{n}}{2} = \dfrac{\min(x_{1},\ldots,x_{n}) + \max(x_{1},\ldots,x_{n})}{2}$. Then we have that

$$t(x_{1}+c, \ldots, x_{n}+c) = \frac{\min(x_{1},\ldots,x_{n}) + c + \max(x_{1},\ldots,x_{n}) + c}{2}
 = \frac{\min(x_{1},\ldots,x_{n}) + \max(x_{1},\ldots,x_{n})}{2} + c = t(x_{1}, \ldots, x_{n}) + c$$

So, we can say that $\dfrac{Y_{1}+Y_{n}}{2}$ is location invariant.


Example: Show that $s^{2} = \dfrac{1}{n-1}\sum_{i=1}^{n}\big(X_{i}-\bar X_{n}\big)^{2}$ is not location invariant.

Solution:

Let $t(x_{1}, \ldots, x_{n}) = s^{2} = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar x_{n})^{2}$. Then we have that

$$t(x_{1}+c, \ldots, x_{n}+c) = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_{i}+c-\frac{1}{n}\sum_{i=1}^{n}(x_{i}+c)\right)^{2}
 = \frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar x_{n})^{2}$$

$$\Rightarrow\quad t(x_{1}+c, \ldots, x_{n}+c) = t(x_{1}, \ldots, x_{n}) \neq t(x_{1}, \ldots, x_{n}) + c$$

So, we can say that $s^{2} = \dfrac{1}{n-1}\sum_{i=1}^{n}\big(X_{i}-\bar X_{n}\big)^{2}$ is not location invariant.

Example: Show that $Y_{n} - Y_{1}$ is not location invariant.

Solution:
Let $t(x_{1}, \ldots, x_{n}) = Y_{n} - Y_{1} = \max(x_{1},\ldots,x_{n}) - \min(x_{1},\ldots,x_{n})$. Then we have that

$$t(x_{1}+c, \ldots, x_{n}+c) = \max(x_{1},\ldots,x_{n}) + c - \min(x_{1},\ldots,x_{n}) - c = t(x_{1}, \ldots, x_{n}) \neq t(x_{1}, \ldots, x_{n}) + c$$

So, we can say that $Y_{n} - Y_{1}$ is not location invariant.

Location parameter
Let { f (⋅ ; θ ) ; θ ∈ Ω} be a family of densities indexed by a parameter θ . The parameter θ is defined to be a

location parameter if and only if the density f ( x ; θ ) can be written as a function of ( x − θ ) . That is,

f ( x ; θ ) = h ( x − θ ) for some function h ( ⋅) . Equivalently θ is a location parameter for the density f X ( x ; θ ) of a

random variable X if and only if the distribution of ( X −θ ) does not depend on θ .

We note that if θ is a location parameter for the family of densities { f (⋅ ; θ ) ; θ ∈ Ω} , then the function h (⋅) of the

definition is a density function given by h ( ⋅) = f ( ⋅ ; 0 ) .

Example: If $f(x;\theta) = \phi_{\theta,1}(x)$, then show that $\theta$ is a location parameter.

Solution:

Here, we have that

$$f(x;\theta) = \phi_{\theta,1}(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(x-\theta)^{2}} = \phi_{0,1}(x-\theta) = h(x-\theta)$$

Or, we can say that if $X$ is distributed normally with mean $\theta$ and variance 1, then $(X-\theta)$ has a standard normal distribution. Hence, the distribution of $(X-\theta)$ is independent of $\theta$. So, we can say that $\theta$ is a location parameter.
Example: If $f(x;\theta) = I_{\left(\theta-\frac{1}{2},\,\theta+\frac{1}{2}\right)}(x)$, then show that $\theta$ is a location parameter.

Solution:
Here, we have that

$$f(x;\theta) = I_{\left(\theta-\frac{1}{2},\,\theta+\frac{1}{2}\right)}(x) = I_{\left(-\frac{1}{2},\,\frac{1}{2}\right)}(x-\theta) = h(x-\theta)$$

Hence, the distribution of $(X-\theta)$ is independent of $\theta$. So, we can say that $\theta$ is a location parameter.

Example: If $f(x;\theta) = \dfrac{1}{\pi}\cdot\dfrac{1}{1+(x-\theta)^{2}}$, then show that $\theta$ is a location parameter.

Solution:
Here, we have that $f(x;\theta) = \dfrac{1}{\pi}\cdot\dfrac{1}{1+(x-\theta)^{2}} = h(x-\theta)$.

Hence, the distribution of $(X-\theta)$ is independent of $\theta$. So, we can say that $\theta$ is a location parameter.

Example: If $f(x;\theta) = \phi_{\theta,9}(x)$, then show that $\theta$ is a location parameter.

Solution:
Here, we have that

$$f(x;\theta) = \phi_{\theta,9}(x) = \frac{1}{3\sqrt{2\pi}}\,e^{-\frac{1}{2\times 9}(x-\theta)^{2}} = \phi_{0,9}(x-\theta) = h(x-\theta)$$

Or, we can say that if $X$ is distributed normally with mean $\theta$ and variance 9, then $(X-\theta)$ has a normal distribution with mean 0 and variance 9. Hence, the distribution of $(X-\theta)$ is independent of $\theta$. So, we can say that $\theta$ is a location parameter.

Pitman estimator for location

Let $X_{1}, X_{2}, \ldots, X_{n}$ denote a random sample from the density $f(\cdot\,;\theta)$, where $\theta$ is a location parameter. Then the estimator

$$t(X_{1}, X_{2}, \ldots, X_{n}) = \frac{\displaystyle\int_{-\infty}^{\infty} \theta \prod_{i=1}^{n} f(X_{i};\theta)\,d\theta}{\displaystyle\int_{-\infty}^{\infty} \prod_{i=1}^{n} f(X_{i};\theta)\,d\theta}$$

is the estimator of $\theta$ which has uniformly smallest mean-squared error within the class of location-invariant estimators.

The estimator given in the above equation is defined to be the Pitman estimator for location.
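Since the defining ratio is just two one-dimensional integrals of the likelihood, it can be evaluated numerically for any density. The following is a minimal numerical sketch (not from the notes; it assumes numpy and scipy, and the density, data and integration limits shown are illustrative choices).

```python
# Numerical Pitman location estimate: ratio of two quadratures of the likelihood.
import numpy as np
from scipy.integrate import quad

def pitman_location(x, f):
    """Pitman location estimate for data x under density f(x, theta)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min() - 10.0, x.max() + 10.0      # likelihood is negligible outside
    lik = lambda th: np.prod(f(x, th))
    num, _ = quad(lambda th: th * lik(th), lo, hi)
    den, _ = quad(lik, lo, hi)
    return num / den

# Illustration: N(theta, 1) likelihood; the Pitman estimator should be x-bar.
f_norm = lambda x, th: np.exp(-0.5 * (x - th) ** 2) / np.sqrt(2 * np.pi)
x = [1.2, 0.7, 2.1, 1.5]
print(pitman_location(x, f_norm), np.mean(x))    # agree up to quadrature error
```

This works for small samples; for large $n$ the product of densities underflows and one would work with log-likelihoods instead.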
Example: Let, X 1 , X 2 , " , X n be a random sample from a normal distribution with mean θ and the variance unity,
where θ is a location parameter. Find the pitman estimator of θ .
Solution:
We know that the pitman estimator for θ is given by

⎡ 1 2⎤
n
n
⎛ 1 ⎞ n

∫ θ ∏ f ( X i ;θ )dθ ∫θ ⎜ ⎟ exp ⎢ −
⎝ 2π ⎠
∑( Xi −θ ) ⎥ dθ
t ( X1 , X 2 , " , X n ) = i =1
= ⎣⎢ 2 i =1 ⎦⎥
⎡ 1 2⎤
n n
⎛ 1 ⎞ n
∫ ∏ f ( X i ;θ )dθ ⎜ ∫
⎟ exp ⎢ − ∑( Xi −θ ) ⎥ dθ
i =1 ⎝ 2π ⎠ ⎢⎣ 2 i =1 ⎥⎦
⎡ 1⎛ n n ⎞⎤
∫ ∑ ∑
θ exp ⎢ − ⎜⎜ X i2 − 2θ X i + nθ 2 ⎟⎟ ⎥ dθ
= ⎣⎢ 2 ⎝ i =1 i =1 ⎠ ⎦⎥
⎡ 1⎛ n 2 n ⎞⎤
∫ ∑ ∑
exp ⎢ − ⎜ X i − 2θ X i + nθ 2 ⎟ ⎥ dθ
⎜ ⎟⎥
⎣⎢ 2 ⎝ i =1 i =1 ⎠⎦


⎡ 1
(
θ exp ⎢ − −2nX nθ + nθ 2 ⎥ dθ
⎣ 2


)
=

⎡ 1
(
exp ⎢ − −2nX nθ + nθ 2 ⎥ dθ
⎣ 2


)
⎡ ⎛ ⎞ ⎤
2
⎢ ⎜ ⎟ ⎥
1 θ−X ⎥
θ exp ⎢ − ⎜
∫ ⎟ dθ
⎢ 2⎜ 1 ⎟ ⎥
⎡ n 2
∫ θ exp ⎢−
⎣ 2
( ⎤
θ − 2 X nθ + X 2 ⎥ dθ

) ⎢


⎜ ⎟ ⎥
⎝ n ⎠ ⎦⎥
= =
⎡ n
( ⎤
) ⎡ ⎞ ⎤
2
∫ exp ⎢ − θ 2 − 2 X nθ + X 2 ⎥ dθ
⎣ 2 ⎦


1 ⎜θ − X ⎟ ⎥



exp − ⎜
⎢ 2⎜ 1 ⎟ ⎥
⎟ dθ
⎢ ⎜ n ⎟ ⎥
⎢⎣ ⎝ ⎠ ⎥⎦
⎡ ⎛ ⎞ ⎤
2
⎢ ⎜ ⎟ ⎥
1 ⎢ 1 θ−X ⎥

θ
1
exp − ⎜
⎢ 2⎜ 1 ⎟ ⎥
⎟ dθ
2π ⎢ ⎜ ⎟ ⎥
n ⎢⎣ ⎝ n ⎠ ⎥⎦
=
⎡ ⎛ ⎞ ⎤
2
⎢ ⎥
1 ⎜θ − X ⎟ ⎥
exp ⎢ − ⎜
1
∫1 ⎢ 2⎜ 1 ⎟ ⎥
⎟ dθ
2π ⎢ ⎜ n ⎟ ⎥
n ⎣⎢ ⎝ ⎠ ⎦⎥
⇒ t ( X 1 , X 2 , " , X n ) = E (θ ) = X n

So, we can say that X n is a pitman estimator.

Example: Let $X_{1}, X_{2}, \ldots, X_{n}$ be a random sample from a uniform distribution over the interval $\left(\theta-\frac{1}{2},\,\theta+\frac{1}{2}\right)$, where $\theta$ is a location parameter. Find the Pitman estimator of $\theta$.

Solution:
We know that the Pitman estimator of $\theta$ is given by

$$t(X_{1}, \ldots, X_{n}) = \frac{\int \theta \prod_{i=1}^{n} I_{\left(\theta-\frac{1}{2},\,\theta+\frac{1}{2}\right)}(X_{i})\,d\theta}{\int \prod_{i=1}^{n} I_{\left(\theta-\frac{1}{2},\,\theta+\frac{1}{2}\right)}(X_{i})\,d\theta}
 = \frac{\int \theta \prod_{i=1}^{n} I_{\left(X_{i}-\frac{1}{2},\,X_{i}+\frac{1}{2}\right)}(\theta)\,d\theta}{\int \prod_{i=1}^{n} I_{\left(X_{i}-\frac{1}{2},\,X_{i}+\frac{1}{2}\right)}(\theta)\,d\theta}$$

The product of indicators equals 1 exactly when $\theta - \frac{1}{2} \le Y_{1} \le Y_{2} \le \cdots \le Y_{n} \le \theta + \frac{1}{2}$, that is, when $Y_{n} - \frac{1}{2} \le \theta \le Y_{1} + \frac{1}{2}$, where $Y_{1}$ and $Y_{n}$ are the smallest and largest order statistics. Hence

$$t(X_{1}, \ldots, X_{n}) = \frac{\int_{Y_{n}-\frac{1}{2}}^{\,Y_{1}+\frac{1}{2}} \theta\,d\theta}{\int_{Y_{n}-\frac{1}{2}}^{\,Y_{1}+\frac{1}{2}} d\theta}
 = \frac{\frac{1}{2}\left[\left(Y_{1}+\frac{1}{2}\right)^{2} - \left(Y_{n}-\frac{1}{2}\right)^{2}\right]}{\left(Y_{1}+\frac{1}{2}\right) - \left(Y_{n}-\frac{1}{2}\right)}$$

$$\Rightarrow\quad t(X_{1}, X_{2}, \ldots, X_{n}) = \frac{1}{2}\big(Y_{1} + Y_{n}\big)$$
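The claim that the Pitman estimator has the smallest MSE among location-invariant estimators can be seen concretely. Here is a quick Monte Carlo sketch (illustrative, not from the notes; it assumes numpy) comparing the midrange $(Y_{1}+Y_{n})/2$ with another location-invariant estimator, the sample mean, for this uniform family.

```python
# MSE comparison: midrange (Pitman) vs sample mean for Uniform(theta - 1/2, theta + 1/2).
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 3.0, 10, 200_000
x = rng.uniform(theta - 0.5, theta + 0.5, size=(reps, n))

midrange = (x.min(axis=1) + x.max(axis=1)) / 2     # Pitman estimator
xbar = x.mean(axis=1)

print("MSE of midrange:", np.mean((midrange - theta) ** 2))
print("MSE of mean    :", np.mean((xbar - theta) ** 2))
```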

Theorem
A Pitman estimator for location is a function of the sufficient statistics.

Proof:
We know that if $S_{1} = s_{1}(X_{1},\ldots,X_{n}), \ldots, S_{k} = s_{k}(X_{1},\ldots,X_{n})$ is a set of sufficient statistics, then by the factorization criterion

$$\prod_{i=1}^{n} f(x_{i};\theta) = g(s_{1},\ldots,s_{k};\theta)\,h(x_{1},\ldots,x_{n})$$

So, the Pitman estimator can be written as

$$t(X_{1},\ldots,X_{n}) = \frac{\int \theta \prod_{i=1}^{n} f(X_{i};\theta)\,d\theta}{\int \prod_{i=1}^{n} f(X_{i};\theta)\,d\theta}
 = \frac{\int \theta\,g(S_{1},\ldots,S_{k};\theta)\,h(X_{1},\ldots,X_{n})\,d\theta}{\int g(S_{1},\ldots,S_{k};\theta)\,h(X_{1},\ldots,X_{n})\,d\theta}$$

$$\Rightarrow\quad t(X_{1},\ldots,X_{n}) = \frac{\int \theta\,g(S_{1},\ldots,S_{k};\theta)\,d\theta}{\int g(S_{1},\ldots,S_{k};\theta)\,d\theta}$$

The right-hand side is a function of the sufficient statistics only. So, we can say that a Pitman estimator is a function of the sufficient statistics.

Example: Let, X 1 , X 2 , " , X n be a random sample from a normal distribution with mean θ and the variance 9, where θ
15
is a location parameter. Find the pitman estimator of θ when ∑ xi = 225 .
i =1
Pitman Estimator for Location Parameter ~ 5 of 8
Solution:
We know that the pitman estimator for θ is given by
n

∫ θ ∏ f ( X i ; θ ) dθ
t ( X1 , X 2 , " , X n ) = i =1
n

∫∏
i =1
f ( X i ; θ ) dθ


1 ⎞
n
⎡ 1 n ⎛ X − θ ⎞2 ⎤
∫θ ⎜ π
⎟ exp ⎢− ⎜ 3 ⎟ ⎥ dθ
i

⎝ 3 2 ⎠ ⎢⎣ 2 i =1 ⎝ ⎠ ⎦⎥
=
⎛ 1 ⎞
n
⎡ 1 ⎛ Xi −θ ⎞ ⎤
n 2

⎜ ∫
⎝ 3 2π ⎠
⎟ exp ⎢ − ⎜
⎢⎣ 2 i =1 ⎝ 3 ⎠ ⎥⎦
⎟ ∑ ⎥ dθ

⎡ 1 ⎛ n 2 n
2⎞


θ exp ⎢ − ∑
⎜⎜ X i − 2θ X i + nθ ⎟⎟ ⎥ dθ ∑
= ⎣⎢ 2 × 9 ⎝ i =1 i =1 ⎠ ⎦⎥
⎡ 1 ⎛ n 2 n ⎞⎤

exp ⎢ − ∑
⎜⎜ X i − 2θ X i + nθ ⎟⎟ ⎥ dθ
2

⎣⎢ 2 × 9 ⎝ i =1 i =1 ⎠ ⎦⎥


⎡ n
θ exp ⎢ −
⎣ 2×9
(
−2 X nθ + θ 2 ⎥ dθ


)
=

⎡ n
exp ⎢ −
⎣ 2×9
(
−2 X nθ + θ 2 ⎥ dθ


)
⎡ ⎛ ⎞ ⎤
2
⎢ ⎜ ⎟ ⎥
1 θ−X ⎥
exp ⎢ − ⎜
1

θ
3 ⎢ 2⎜ 3 ⎟ ⎥
⎟ dθ
⎡ n
∫ θ exp ⎢ −
⎣ 2×9
( ⎤
θ 2 − 2 X nθ + X 2 ⎥ dθ

) n
2π ⎢


⎜ ⎟ ⎥
⎝ n ⎠ ⎦⎥
= =
⎡ n
( 2 ⎤
) ⎡ ⎞ ⎤
2
∫ exp ⎢ − θ − 2 X nθ + X ⎥ dθ ⎛
2
⎢ ⎥
⎣ 2×9 ⎦ 1 ⎜θ − X ⎟ ⎥
exp ⎢ − ⎜
1
∫ 3 ⎢ 2⎜ 3 ⎟ ⎥
⎟ dθ
2π ⎢ ⎜ n ⎟ ⎥
n ⎢⎣ ⎝ ⎠ ⎥⎦
⇒ t ( X1 , X 2 , " , X n ) = E (θ )
n
∑ Xi
1
= Xn =
n i =1
15
So, we can say that X n is a pitman estimator. Now, when ∑ xi = 225 , then the pitman estimator for θ is given by
i =1
15
∑ xi =
1 225
xn = = 15
15 i =1 15

Exercise: Let, X 1 , X 2 , " , X n be a random sample from the density

f ( x ; θ ) = e ( ) I(θ ,∞ ) ( x )
− x −θ
for − ∞ < θ < ∞

Then, find the pitman estimator for the location parameter θ .

Solution:
We know that the pitman estimator for θ is given by
n n

∫θ ∏ f ( X i ; θ ) dθ ∫θ ∏ exp ⎡⎣ − ( X i − θ ) ⎤⎦I (θ ,∞ ) ( x ) dθ
t ( X1 , X 2 , " , X n ) = i =1
n
= i =1
n

∫∏
i =1
f ( X i ; θ ) dθ ∫∏
i =1
exp ⎡⎣ − ( X i − θ ) ⎤⎦I (θ ,∞ ) ( x ) dθ

Pitman Estimator for Location Parameter ~ 6 of 8


⎡ ⎤ ⎡ ⎤
Y1 n Y1 n

∫ θ exp ⎢−∑ ( X i − θ )⎥ dθ
⎢⎣ i =1 ⎥⎦
∫ θ exp ⎢⎣⎢−∑
i =1
X i + nθ ⎥ dθ
⎦⎥
−∞ −∞
= =
Y1
⎡ n ⎤ Y1
⎡ n ⎤
∫ exp ⎢⎣⎢−∑ i =1
( X i − θ ) ⎥ dθ
⎦⎥
∫ exp ⎢⎣⎢−∑
i =1
X i + nθ ⎥ dθ
⎦⎥
−∞ −∞
Y1
enθ e nθ
Y1 Y1

∫ θ e dθ ∫

θ − dθ
n −∞
n
−∞ −∞
= Y1
= Y1
enθ
∫ enθ dθ
n
−∞ −∞
nY1
Y1 e 1 nY1
− e
∴ t ( X1 , X 2 , " , X n ) =
n n2 = Y1 −
1
nY1
e n
n
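This ties back to the jackknife section: for this family, $Y_{1}$ carries bias $1/n$ and $Y_{1} - 1/n$ removes it. A brief simulation sketch (illustrative; assumes numpy):

```python
# Shifted exponential: compare the Pitman estimator Y1 - 1/n with the MLE Y1.
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 8, 200_000
y1 = theta + rng.exponential(size=(reps, n)).min(axis=1)   # smallest order statistic

pitman, mle = y1 - 1 / n, y1
print("mean of Y1 - 1/n:", pitman.mean(), "  mean of Y1:", mle.mean())
print("MSE  of Y1 - 1/n:", np.mean((pitman - theta) ** 2),
      "  MSE of Y1:", np.mean((mle - theta) ** 2))
```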

Exercise: Let, X 1 , X 2 , " , X n be a random sample from the density


1− x
f ( x ; θ ) = θ x (1 − θ ) for x = 0, 1 and 0 <θ <1

Then, find the pitman estimator for the location parameter θ .

Solution:
We know that the pitman estimator for θ is given by
n n

∫ θ ∏ f ( X i ; θ ) dθ ∫ θ ∏θ (1 − θ )
1− x
x

t ( X1 , X 2 , " , X n ) = i =1
n
= i =1
n

∫∏ ∫∏
1− x
f ( X i ; θ ) dθ θ x (1 − θ ) dθ
i =1 i =1
n
1 ∑ Xi n

∫ θθ i =1
(1 − θ )n −∑ Xi =1
i

= 0
n
1 ∑ Xi n

∫θ i =1
(1 − θ )n −∑ X i =1
i

0
n
1 ∑ X i +1 n

∫ θ i=1 (1 − θ )n −∑ Xi =1
i

= 0
n
1 ∑ Xi n

∫θ i =1
(1 − θ )n −∑ X
i =1
i

0
n
1 ∑ X i + 2−1 n
n − ∑ X i +1−1 ⎛ n n ⎞
∫θ i =1
(1 − θ ) i =1 dθ β ⎜⎜ ∑ X i + 2, n − ∑ X i + 1⎟⎟
= 0
= ⎝ i =1 i =1 ⎠
⎛ n ⎞
n
n
∑ X i +1−1
∑ ∑
1 n
β ⎜⎜ X i + 1, n − X i + 1⎟⎟
∫θ i =1
(1 − θ )n −∑ X +1−1 dθ
i =1
i

⎝ i =1 i =1 ⎠
0
n n
∑ Xi + 2 n − ∑ Xi +1 n+2
i =1 i =1
= ×
n+3 n n
∑ Xi +1 n − ∑ Xi +1
i =1 i =1
n
∑ Xi +1
∴ t ( X1 , X 2 , " , X n ) = i =1
n+2

Pitman Estimator for Location Parameter ~ 7 of 8


Exercise: Let, X 1 , X 2 , " , X n be a random sample from the density

f ( x ; θ ) = θ e −θ for x>0 and θ >0


Then, find the pitman estimator for the location parameter θ .

Solution:
We know that the pitman estimator for θ is given by
n

∫ θ ∏ f ( X i ; θ ) dθ
t ( X1 , X 2 , " , X n ) = i =1
n

∫∏
i =1
f ( X i ; θ ) dθ

∞ n

∫θ ∏
−θ
θe dθ
i =1
= 0
∞ n

∫∏
−θ
θe dθ
0 i =1

∫ θθ
n − nθ
e dθ
= 0

∫θ
n − nθ
e dθ
0

∫θ
n +1 − nθ
e dθ
= 0

∫θ
n − nθ
e dθ
0

∫e
− nθ
θ n + 2 −1dθ
= 0

∫e
− nθ
θ n +1−1dθ
0

n+2
n+2
= n
n +1
n n +1
n +1
=
n
1
∴ t ( X1 , X 2 , " , X n ) = 1 +
n

Pitman Estimator for Location Parameter ~ 8 of 8


Pitman Estimator for Scale Parameter

Scale invariant
An estimator T = t ( X 1 , X 2 , " , X n ) is defined to be scale invariant if and only if

t ( cx1 , cx2 , " , cxn ) = ct ( x1 , x2 , " , xn )

for all values x1 , x2 , " , xn and all c > 0 .

Example: Show that $\bar X_{n} = \dfrac{\sum_{i=1}^{n} X_{i}}{n}$ is scale invariant.

Solution:
Let $t(x_{1}, \ldots, x_{n}) = \bar x_{n} = \dfrac{\sum_{i=1}^{n} x_{i}}{n}$. Then we have that

$$t(cx_{1}, \ldots, cx_{n}) = \frac{\sum_{i=1}^{n} cx_{i}}{n} = c\,\frac{\sum_{i=1}^{n} x_{i}}{n} = c\,t(x_{1}, \ldots, x_{n})$$

So, we can say that $\bar X_{n}$ is scale invariant.

Example: Show that $\dfrac{Y_{1}+Y_{n}}{2}$ is scale invariant, where $Y_{1}$ is the smallest order statistic and $Y_{n}$ is the largest order statistic.

Solution:
Let $t(x_{1}, \ldots, x_{n}) = \dfrac{y_{1}+y_{n}}{2} = \dfrac{\min(x_{1},\ldots,x_{n}) + \max(x_{1},\ldots,x_{n})}{2}$. Then, since $c > 0$,

$$t(cx_{1}, \ldots, cx_{n}) = \frac{\min(cx_{1},\ldots,cx_{n}) + \max(cx_{1},\ldots,cx_{n})}{2}
 = c\,\frac{\min(x_{1},\ldots,x_{n}) + \max(x_{1},\ldots,x_{n})}{2} = c\,t(x_{1}, \ldots, x_{n})$$

So, we can say that $\dfrac{Y_{1}+Y_{n}}{2}$ is scale invariant.
Example: Examine whether $s^{2} = \dfrac{1}{n-1}\sum_{i=1}^{n}\big(X_{i}-\bar X_{n}\big)^{2}$ is scale invariant.

Solution:
Let $t(x_{1}, \ldots, x_{n}) = s^{2} = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar x_{n})^{2}$. Then we have that

$$t(cx_{1}, \ldots, cx_{n}) = \frac{1}{n-1}\sum_{i=1}^{n}\left(cx_{i}-\frac{1}{n}\sum_{i=1}^{n}cx_{i}\right)^{2}
 = c^{2}\,\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar x_{n})^{2} = c^{2}\,t(x_{1}, \ldots, x_{n})$$

So $s^{2}$ scales by $c^{2}$ rather than by $c$; it is not scale invariant in the sense of the definition, although $s = \sqrt{s^{2}}$ is, since $s(cx_{1},\ldots,cx_{n}) = c\,s(x_{1},\ldots,x_{n})$.

Example: Show that $Y_{n} - Y_{1}$ is scale invariant.

Solution:
Let $t(x_{1}, \ldots, x_{n}) = Y_{n} - Y_{1} = \max(x_{1},\ldots,x_{n}) - \min(x_{1},\ldots,x_{n})$. Then we have that

$$t(cx_{1}, \ldots, cx_{n}) = \max(cx_{1},\ldots,cx_{n}) - \min(cx_{1},\ldots,cx_{n})
 = c\big[\max(x_{1},\ldots,x_{n}) - \min(x_{1},\ldots,x_{n})\big] = c\,t(x_{1}, \ldots, x_{n})$$

So, we can say that $Y_{n} - Y_{1}$ is scale invariant.

Scale Parameter
Let $\{f(\cdot\,;\theta);\ \theta > 0\}$ be a family of densities indexed by a real parameter $\theta$. The parameter $\theta$ is defined to be a scale parameter if and only if the density $f(x;\theta)$ can be written as

$$f(x;\theta) = \frac{1}{\theta}\,h\!\left(\frac{x}{\theta}\right)$$

for some function $h(\cdot)$. Equivalently, $\theta$ is a scale parameter for the density $f_{X}(x;\theta)$ of a random variable $X$ if and only if the distribution of $X/\theta$ does not depend on $\theta$.

We note that if $\theta$ is a scale parameter for the family of densities $\{f(\cdot\,;\theta);\ \theta > 0\}$, then the function $h(\cdot)$ of the definition is a density function given by $h(x) = f(x;1)$.

Example: If $f(x;\sigma) = \phi_{0,\sigma^{2}}(x)$, then show that $\sigma$ is a scale parameter.

Solution:
Here, we have that

$$f(x;\sigma) = \phi_{0,\sigma^{2}}(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^{2}} = \frac{1}{\sigma}\,\phi_{0,1}\!\left(\frac{x}{\sigma}\right) = \frac{1}{\sigma}\,h\!\left(\frac{x}{\sigma}\right)$$

Or, we can say that if $X$ is distributed normally with mean 0 and variance $\sigma^{2}$, then $X/\sigma$ has a standard normal distribution. Hence, the distribution of $X/\sigma$ is independent of $\sigma$. So, we can say that $\sigma$ is a scale parameter.


Example: If $f(x;\theta) = \dfrac{1}{\theta}\,I_{(\theta,\,2\theta)}(x)$, then show that $\theta$ is a scale parameter.

Solution:
Here, we have that

$$f(x;\theta) = \frac{1}{\theta}\,I_{(\theta,\,2\theta)}(x) = \frac{1}{\theta}\,I_{(1,\,2)}\!\left(\frac{x}{\theta}\right) = \frac{1}{\theta}\,h\!\left(\frac{x}{\theta}\right)$$

Hence, the distribution of $X/\theta$ is independent of $\theta$. So, we can say that $\theta$ is a scale parameter.

Example: If $f(x;\theta) = \dfrac{1}{\theta}\,e^{-x/\theta}\,I_{(0,\infty)}(x)$, then show that $\theta$ is a scale parameter.

Solution:
Here, we have that

$$f(x;\theta) = \frac{1}{\theta}\,e^{-x/\theta}\,I_{(0,\infty)}(x) = \frac{1}{\theta}\,h\!\left(\frac{x}{\theta}\right), \qquad h(u) = e^{-u}\,I_{(0,\infty)}(u)$$

Hence, the distribution of $X/\theta$ is independent of $\theta$. So, we can say that $\theta$ is a scale parameter.

Example: If $f(x;\theta) = \dfrac{1}{\theta}\,I_{(0,\theta)}(x)$, then show that $\theta$ is a scale parameter.

Solution:
Here, we have that

$$f(x;\theta) = \frac{1}{\theta}\,I_{(0,\theta)}(x) = \frac{1}{\theta}\,I_{(0,1)}\!\left(\frac{x}{\theta}\right) = \frac{1}{\theta}\,h\!\left(\frac{x}{\theta}\right)$$

Hence, the distribution of $X/\theta$ is independent of $\theta$. So, we can say that $\theta$ is a scale parameter.

Pitman estimator for scale

Let $X_{1}, X_{2}, \ldots, X_{n}$ denote a random sample from the density $f(\cdot\,;\theta)$, where $\theta > 0$ is a scale parameter. Assume that $f(x;\theta) = 0$ for $x \le 0$; that is, the random variables $X_{i}$ take only positive values. Then the estimator

$$t(X_{1}, X_{2}, \ldots, X_{n}) = \frac{\displaystyle\int_{0}^{\infty} \frac{1}{\theta^{2}} \prod_{i=1}^{n} f(X_{i};\theta)\,d\theta}{\displaystyle\int_{0}^{\infty} \frac{1}{\theta^{3}} \prod_{i=1}^{n} f(X_{i};\theta)\,d\theta}$$

is the estimator of $\theta$ which has uniformly smallest risk within the class of scale-invariant estimators for the loss function $\ell(t;\theta) = \dfrac{(t-\theta)^{2}}{\theta^{2}}$.

The estimator given in the above equation is defined to be the Pitman estimator for scale.
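As with the location case, the defining ratio can be evaluated numerically. Below is a minimal numerical sketch (not from the notes; it assumes numpy and scipy, and the density, data and integration limits are illustrative assumptions). The exponential example anticipates the closed form $\sum X_{i}/(n+1)$ derived later in this section.

```python
# Numerical Pitman scale estimate: ratio of two quadratures of the likelihood.
import numpy as np
from scipy.integrate import quad

def pitman_scale(x, f):
    """Pitman scale estimate for positive data x under density f(x, theta)."""
    x = np.asarray(x, dtype=float)
    lo, hi = 1e-6, 50.0 * x.max()                 # likelihood negligible outside
    lik = lambda th: np.prod(f(x, th))
    num, _ = quad(lambda th: lik(th) / th**2, lo, hi, limit=200)
    den, _ = quad(lambda th: lik(th) / th**3, lo, hi, limit=200)
    return num / den

# Illustration: exponential with scale theta; closed form is sum(x)/(n+1).
f_exp = lambda x, th: np.exp(-x / th) / th
x = [0.8, 2.3, 1.1, 0.4, 1.9]
print(pitman_scale(x, f_exp), sum(x) / (len(x) + 1))   # agree up to quadrature error
```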



Example: Let, X 1 , X 2 , " , X n be a random sample from a density function.

1
f (x ; θ ) = I ( x)
θ ( 0, θ )
Find the pitman estimator of θ for the scale parameter.

Solution:
We know that the pitman estimator for θ is given by
n
⎛ 1 ⎞
∫ ⎜⎝ θ 2 ⎟⎠ ∏ f ( X i ; θ ) dθ
t ( X1 , X 2 , " , X n ) = i =1
n
⎛ 1 ⎞
∫ ⎜⎝ θ 3 ⎟⎠ ∏
i =1
f ( X i ; θ ) dθ

∞ n
⎛ 1 ⎞
n 1 ⎛1⎞
∫ ⎜ 2⎟ ∏θ
1
I ( 0, θ ) ( X i ) dθ ∫ θ 2 ⎜⎝ θ ⎟⎠ dθ
⎝θ ⎠ i =1 Yn
= n
= ∞
⎛ 1 ⎞
n
1 ⎛1⎞
∏θ
1
∫ I ( 0, θ ) ( X i ) dθ
⎜ 3⎟
⎝θ ⎠ i =1
∫ ⎜ ⎟ dθ
θ3 ⎝θ ⎠
Yn

∞ ⎡ θ −( n + 2 ) +1 ⎤
−( n + 2 )
∫θ dθ ⎢ ⎥
⎢⎣ − ( n + 2 ) + 1 ⎥⎦Y
Yn
= ∞
= n

− ( n + 3) ⎡ θ −( n + 3) +1 ⎤
∫θ dθ ⎢ ⎥
Yn ⎢⎣ − ( n + 3) + 1 ⎥⎦Y
n

1
− ( n + 1)
0 − Yn− n −1( )
=
1
− ( n + 2)
(
0 − Yn− n − 2 )
n+2
∴ t ( X1 , X 2 , " , X n ) = × Yn
n +1
n
Note: We know that Yn is a complete sufficient statistic and E (Yn ) = θ . So, by the Lehmann-Scheffe theorem
n +1
n +1
Yn is the UMVUE of θ .
n
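The three rescalings of $Y_{n}$ can be compared under the scaled loss $(t-\theta)^{2}/\theta^{2}$ by simulation. A quick Monte Carlo sketch (illustrative; assumes numpy). The differences between the UMVUE and the Pitman estimator are small, so the same draws are reused for all three estimators to keep the comparison paired.

```python
# Scaled risk E[(t - theta)^2 / theta^2] for Yn, (n+1)/n*Yn (UMVUE), (n+2)/(n+1)*Yn (Pitman).
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 4.0, 6, 300_000
yn = rng.uniform(0, theta, size=(reps, n)).max(axis=1)

for name, t in [("Yn                   ", yn),
                ("UMVUE (n+1)/n * Yn   ", (n + 1) / n * yn),
                ("Pitman (n+2)/(n+1)*Yn", (n + 2) / (n + 1) * yn)]:
    print(name, "risk ~", np.mean((t - theta) ** 2) / theta ** 2)
```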

Example: Let, X 1 , X 2 , " , X n be a random sample from a density function.


x
1 −
f (x ; θ ) = e θ I ( 0, ∞ ) ( x )
θ
Find the pitman estimator of θ for the scale parameter.

Solution:
We know that the pitman estimator of θ for scale parameter is given by
n
⎛ 1 ⎞
∫ ⎜⎝ θ 2 ⎟⎠ ∏ f ( X i ; θ ) dθ
t ( X1 , X 2 , " , X n ) = i =1
n
.
⎛ 1 ⎞
∫ ⎜ 3⎟
⎝θ ⎠
∏ f ( X i ; θ ) dθ
i =1
n x
⎛ 1 ⎞ −
∫ ⎜⎝ θ 2 ⎟⎠ ∏
1
e θ I ( 0, ∞ ) ( x ) dθ
i =1
θ
= x
n
⎛ 1 ⎞ −
∏θ
1
∫ ⎜ 3⎟ e θ I ( 0, ∞ ) ( x ) dθ
⎝θ ⎠ i =1

Pitman Estimator for Scale Parameter ~ 4 of 5


⎛ n ⎞ ⎡ ⎤
∞ ⎜ ∑ X i ⎟⎟ ⎢ Let , ⎥
θ ( ) exp ⎜⎜ −
− n+2
∫ i =1
⎟ dθ
θ
⎢ n
⎢ n


0 ⎜






⎢ Xi ∑ Xi ⎥
= ⎢ i =1 =Z ⇒ θ = i =1 ⎥
⎛ n ⎞ ⎢ θ Z ⎥
∞ ⎜ ∑ Xi ⎟ ⎢ n ⎥
∫θ ( ) exp ⎜⎜ −
− n +3 i =1
θ ⎟
⎟ dθ ⎢
⎢ ∑Xi ⎥

0 ⎜ ⎟ ⎢ ⇒ dθ = − i =1 2 ⎥
⎜ ⎟ ⎢⎣ Z ⎥⎦
⎝ ⎠
−( n + 2 )
⎛ n ⎞ ⎛ n ⎞
∞⎜

∑ Xi ⎟

⎜ Xi ∑ ⎟
⎜∫ i =1
e− Z ⎜⎜ − i =1 2 ⎟ dz
0⎜
Z ⎟ Z ⎟
⎟ ⎜ ⎟
⎜ ⎟ ⎜ ⎟
= ⎝ ⎠ ⎝ ⎠
− ( n + 3)
⎛ n ⎞ ⎛ n ⎞
∞⎜

∑ Xi ⎟


⎜ − i =1

Xi ⎟
⎟ dz
⎜∫ i =1
Z ⎟
e− Z ⎜ Z2 ⎟
0⎜ ⎟ ⎜ ⎟
⎜ ⎟ ⎜ ⎟
⎝ ⎠ ⎝ ⎠

⎛ n ⎞ 0 ∫
Z n e− Z dz

⎜ ∑
= ⎜ Xi ⎟ ∞

⎝ i =1 ⎠

Z n +1e− Z dz
0

⎞∫
−Z
e Z n +1−1dz
⎛ n

⎜ ∑
= ⎜ X i ⎟ ∞0

⎝ i =1 ⎠
∫e
−Z
Z n + 2 −1dz
0

⎛n ⎞ n +1

= ⎜ Xi ⎟
⎜ ⎟
⎝ i =1 ⎠ n + 2
⎛ n ⎞

⎜⎜ X i ⎟⎟
t ( X1 , X 2 , " , X n ) = ⎝ ⎠
i =1

n +1

n n
∑ Xi ∑ Xi
i =1 i =1
Note: It can be shown that UMVUE of θ is . Again note that is a scale invariant estimator and hence
n n
n
∑ Xi ( t − θ )2
i =1
is a scale-invariant estimator having uniformly smallest risk for the loss function l ( t ; θ ) = , the risk
n +1 θ2
n n
∑ Xi ∑ Xi 1
i =1 i =1
of is uniformly smaller than the risk of . Also, since here risk equals times the MSE , the MSE
n +1 n θ2
n n
∑ Xi ∑ Xi
i =1 i =1
of is uniformly smaller than the MSE of .
n +1 n



Bayes and Minimax Estimation

Decision Function
A decision function $\delta(x)$ is a statistic that takes values in $D$; that is, $\delta$ is a Borel-measurable function that maps $\mathbb{R}^{n}$ into $D$.

Elements of a Decision Problem

The elements of a decision problem are:
• the choices available (alternatives or options);
• the states of nature;
• the payoffs.

Prior Distribution
Let $f(\theta)$ be a probability distribution for the parameter $\theta$ that summarizes the information available about $\theta$ prior to obtaining the sample observations. This distribution $f(\theta)$ is called the prior distribution of $\theta$.

Posterior Distribution
Consider a random variable $X$ whose distribution $f(x\mid\theta)$ depends on an unknown parameter $\theta$. Let $x_{1}, x_{2}, \ldots, x_{n}$ be a random sample; then the joint distribution can be written as

$$f(x_{1}, x_{2}, \ldots, x_{n}\mid\theta) = f(x_{1}\mid\theta)\cdots f(x_{n}\mid\theta)$$

The posterior distribution of $\theta$ is the conditional distribution of $\theta$ given the sample values. So,

$$f(\theta\mid x_{1}, \ldots, x_{n}) = \frac{f(x_{1}, \ldots, x_{n}, \theta)}{f(x_{1}, \ldots, x_{n})} = \frac{f(x_{1}, \ldots, x_{n}\mid\theta)\,f(\theta)}{f(x_{1}, \ldots, x_{n})}$$

where $f(x_{1}, \ldots, x_{n}, \theta) = f(\theta)\,f(x_{1}, \ldots, x_{n}\mid\theta)$ is the joint distribution of the sample and $\theta$. Thus $f(\theta\mid x_{1}, \ldots, x_{n})$ is known as the posterior distribution of $\theta$.
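In practice the normalising integral in the denominator rarely has to be done by hand; a posterior can be computed on a grid by multiplying likelihood and prior and renormalising. Below is a minimal sketch (not part of the notes; it assumes numpy, and the likelihood, prior and data are illustrative), applied to the exponential-likelihood, exponential-prior case worked out in the next example, whose posterior is $\mathrm{Gamma}(n+1, \sum x_{i}+k)$.

```python
# Grid approximation of a posterior: multiply likelihood by prior and normalise.
import numpy as np

def grid_posterior(theta_grid, prior, loglik, data):
    """Return the posterior density approximated on theta_grid."""
    logpost = np.log(prior(theta_grid)) + np.array([loglik(data, t) for t in theta_grid])
    logpost -= logpost.max()                       # guard against underflow
    w = np.exp(logpost)
    return w / (w.sum() * (theta_grid[1] - theta_grid[0]))

# Exponential data, exponential prior g(theta) = k e^{-k theta} with k = 1.
data = np.array([0.5, 1.2, 0.3, 0.9])
grid = np.linspace(1e-4, 10, 4000)
post = grid_posterior(grid, lambda t: np.exp(-t),
                      lambda x, t: len(x) * np.log(t) - t * x.sum(), data)
d = grid[1] - grid[0]
# Closed form from the notes: posterior mean (n+1)/(sum(x)+k) = 5/3.9 ~ 1.282
print("grid posterior mean ~", np.sum(grid * post) * d)
```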

Example: The time to failure of a transistor is known to be exponentially distributed with parameter $\theta$, having the density function

$$f(x\mid\theta) = \theta\,e^{-\theta x}, \qquad x > 0$$

Assume that the prior distribution of $\Theta$ is given by

$$g_{\Theta}(\theta) = k\,e^{-k\theta}, \qquad \theta > 0$$

That is, $\Theta$ is also exponentially distributed over the interval $(0, \infty)$. Find the posterior distribution of $\Theta$.



Solution:
We know that the posterior distribution of $\Theta$ is given by

$$f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})
 = \frac{\prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)}{\int \prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)\,d\theta}
 = \frac{\theta^{n}\,e^{-\theta\sum x_{i}}\,k\,e^{-k\theta}}{\int_{0}^{\infty} \theta^{n}\,e^{-\theta\sum x_{i}}\,k\,e^{-k\theta}\,d\theta}$$

$$= \frac{\theta^{n}\exp\!\left[-\theta\left(\sum_{i=1}^{n} x_{i}+k\right)\right]}{\int_{0}^{\infty} \exp\!\left[-\theta\left(\sum_{i=1}^{n} x_{i}+k\right)\right]\theta^{(n+1)-1}\,d\theta}
 = \frac{\left(\sum_{i=1}^{n} x_{i}+k\right)^{n+1}}{\Gamma(n+1)}\,\exp\!\left[-\theta\left(\sum_{i=1}^{n} x_{i}+k\right)\right]\theta^{(n+1)-1}$$

$$\Rightarrow\quad f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n}) = \mathrm{Gamma}\!\left(n+1,\; \sum_{i=1}^{n} x_{i}+k\right), \qquad \theta \ge 0$$

Example: Let X 1 , X 2 , " , X n denote a random sample from normal distribution with the density
⎡ 1 2⎤
f (x θ ) =
1
exp ⎢ − ( x − θ ) ⎥ ; −∞ ≤ x ≤ ∞
2π ⎣ 2 ⎦
Assume that the prior distribution of Θ is given by

1 ⎡ 1 ⎤
g Θ (θ ) = exp ⎢ − θ 2 ⎥ ; −∞ ≤θ ≤ ∞
2π ⎣ 2 ⎦
That is, Θ is standard normal. Find the posterior distribution of Θ .

Solution:
We know that the posterior distribution of $\Theta$ is given by

$$f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})
 = \frac{\prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)}{\int \prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)\,d\theta}
 = \frac{\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\!\left[-\frac{1}{2}\sum_{i=1}^{n}(x_{i}-\theta)^{2}\right]\frac{1}{\sqrt{2\pi}}\exp\!\left[-\frac{1}{2}\theta^{2}\right]}
        {\int_{-\infty}^{\infty}\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\!\left[-\frac{1}{2}\sum_{i=1}^{n}(x_{i}-\theta)^{2}\right]\frac{1}{\sqrt{2\pi}}\exp\!\left[-\frac{1}{2}\theta^{2}\right]d\theta}$$

Expanding the exponent and dropping factors free of $\theta$,

$$= \frac{\exp\!\left[-\frac{1}{2}\big((n+1)\theta^{2} - 2\theta n\bar x\big)\right]}{\int_{-\infty}^{\infty}\exp\!\left[-\frac{1}{2}\big((n+1)\theta^{2} - 2\theta n\bar x\big)\right]d\theta}
 = \frac{\exp\!\left[-\dfrac{1}{2}\,\dfrac{\left(\theta - \frac{n}{n+1}\bar x\right)^{2}}{1/(n+1)}\right]}{\int_{-\infty}^{\infty}\exp\!\left[-\dfrac{1}{2}\,\dfrac{\left(\theta - \frac{n}{n+1}\bar x\right)^{2}}{1/(n+1)}\right]d\theta}$$

$$\Rightarrow\quad f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n}) = N\!\left(\theta;\; \frac{n}{n+1}\,\bar x,\; \frac{1}{n+1}\right), \qquad -\infty < \theta < \infty$$

Example:
Let $X_{1}, X_{2}, \ldots, X_{n}$ denote a random sample from the Poisson distribution with density

$$f(x\mid\theta) = \frac{e^{-\theta}\theta^{x}}{x!}, \qquad x = 0, 1, \ldots$$

Assume that the prior distribution of $\Theta$ is given by

$$g_{\Theta}(\theta) = \frac{(1/\beta)^{\alpha}}{\Gamma(\alpha)}\,e^{-\theta/\beta}\,\theta^{\alpha-1}, \qquad \theta > 0$$

That is, $\Theta$ has a gamma distribution with parameters $\alpha$ and $\beta$. Find the posterior distribution of $\Theta$.


Solution:
We know that the posterior distribution of $\Theta$ is given by

$$f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})
 = \frac{\prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)}{\int \prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)\,d\theta}
 = \frac{\dfrac{e^{-n\theta}\theta^{\sum x_{i}}}{\prod_{i}(x_{i}!)}\cdot\dfrac{(1/\beta)^{\alpha}}{\Gamma(\alpha)}\,e^{-\theta/\beta}\theta^{\alpha-1}}
        {\displaystyle\int_{0}^{\infty}\dfrac{e^{-n\theta}\theta^{\sum x_{i}}}{\prod_{i}(x_{i}!)}\cdot\dfrac{(1/\beta)^{\alpha}}{\Gamma(\alpha)}\,e^{-\theta/\beta}\theta^{\alpha-1}\,d\theta}$$

$$= \frac{e^{-\left(n+\frac{1}{\beta}\right)\theta}\,\theta^{\sum x_{i}+\alpha-1}}{\displaystyle\int_{0}^{\infty} e^{-\left(n+\frac{1}{\beta}\right)\theta}\,\theta^{\sum x_{i}+\alpha-1}\,d\theta}
 = \frac{\left(n+\frac{1}{\beta}\right)^{\sum x_{i}+\alpha}}{\Gamma\!\left(\sum x_{i}+\alpha\right)}\,e^{-\left(n+\frac{1}{\beta}\right)\theta}\,\theta^{\sum x_{i}+\alpha-1}, \qquad \theta > 0$$

$$\Rightarrow\quad f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n}) = \mathrm{Gamma}\!\left(\sum_{i=1}^{n} x_{i}+\alpha,\; n+\frac{1}{\beta}\right)$$
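Because the update only changes the two Gamma parameters, it can be packaged as a tiny function. A sketch (illustrative; the parameterisation follows the notes, with the second Gamma argument a rate):

```python
# Poisson likelihood with Gamma(alpha, rate 1/beta) prior -> Gamma(sum(x)+alpha, n + 1/beta).
def poisson_gamma_posterior(x, alpha, beta):
    """Return (shape, rate) of the posterior Gamma for Poisson data x."""
    n, s = len(x), sum(x)
    return s + alpha, n + 1.0 / beta

shape, rate = poisson_gamma_posterior([2, 0, 3, 1, 2], alpha=2.0, beta=1.0)
print("posterior shape =", shape, " rate =", rate, " posterior mean =", shape / rate)
```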
Posterior Bayes estimator
Let $X_{1}, X_{2}, \ldots, X_{n}$ be a random sample from a density $f(x\mid\theta)$, where $\theta$ is the value of a random variable $\Theta$ with known density $g_{\Theta}(\cdot)$. The posterior Bayes estimator of $\tau(\theta)$ with respect to the prior density $g_{\Theta}(\cdot)$ is defined to be

$$E\big[\tau(\Theta)\mid X_{1}, X_{2}, \ldots, X_{n}\big]$$

Here, it is given that

$$E\big[\tau(\Theta)\mid X_{1}=x_{1}, \ldots, X_{n}=x_{n}\big]
 = \int \tau(\theta)\,f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})\,d\theta
 = \frac{\int \tau(\theta)\prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)\,d\theta}{\int \prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)\,d\theta}$$

One might note the similarity between the posterior Bayes estimator of $\tau(\theta) = \theta$ and the Pitman estimator of a location parameter.


Example:
Let $X_{1}, X_{2}, \ldots, X_{n}$ denote a random sample from the Bernoulli density

$$f(x\mid\theta) = \theta^{x}(1-\theta)^{1-x}\,I_{\{0,1\}}(x), \qquad 0 \le \theta \le 1$$

Assume that the prior distribution of $\Theta$ is given by $g_{\Theta}(\theta) = I_{(0,1)}(\theta)$; that is, $\Theta$ is uniformly distributed over the interval $(0,1)$. Find the posterior distribution of $\Theta$ and the posterior Bayes estimators of $\theta$ and of $\theta(1-\theta)$.

Solution:
We know that the posterior distribution of $\Theta$ is given by

$$f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})
 = \frac{\prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)}{\int \prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)\,d\theta}
 = \frac{\theta^{\sum x_{i}}(1-\theta)^{n-\sum x_{i}}\,I_{(0,1)}(\theta)}{\int_{0}^{1} \theta^{\sum x_{i}}(1-\theta)^{n-\sum x_{i}}\,d\theta}
 = \frac{\theta^{\sum x_{i}+1-1}(1-\theta)^{n-\sum x_{i}+1-1}}{\beta\!\left(\sum x_{i}+1,\; n-\sum x_{i}+1\right)}$$

$$\Rightarrow\quad f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n}) = \mathrm{Beta}\!\left(\theta;\; \sum_{i=1}^{n} x_{i}+1,\; n-\sum_{i=1}^{n} x_{i}+1\right), \qquad 0 \le \theta \le 1$$

Again, the posterior Bayes estimator of $\theta$ with respect to the prior $g_{\Theta}(\theta) = I_{(0,1)}(\theta)$ is

$$E\big[\theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}\big]
 = \frac{\int_{0}^{1} \theta^{\sum x_{i}+1}(1-\theta)^{n-\sum x_{i}}\,d\theta}{\int_{0}^{1} \theta^{\sum x_{i}}(1-\theta)^{n-\sum x_{i}}\,d\theta}
 = \frac{\beta\!\left(\sum x_{i}+2,\; n-\sum x_{i}+1\right)}{\beta\!\left(\sum x_{i}+1,\; n-\sum x_{i}+1\right)}
 = \frac{\sum_{i=1}^{n} x_{i}+1}{n+2}$$

Hence, the posterior Bayes estimator of $\theta$ with respect to the uniform prior distribution is $\dfrac{\sum_{i=1}^{n} x_{i}+1}{n+2}$.

Contrast this with the maximum likelihood estimator of $\theta$, which is $\dfrac{\sum_{i=1}^{n} x_{i}}{n}$. We know that $\dfrac{\sum x_{i}}{n}$ is unbiased and is the UMVUE, whereas the posterior Bayes estimator is not unbiased.
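The contrast can be seen numerically. A small simulation sketch (illustrative; assumes numpy) that reports the bias and MSE of the two estimators for one choice of $\theta$ and $n$:

```python
# Posterior Bayes estimator (s+1)/(n+2) vs the unbiased MLE s/n for Bernoulli data.
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 0.3, 10, 200_000
s = rng.binomial(n, theta, size=reps)           # number of successes in each sample

bayes, mle = (s + 1) / (n + 2), s / n
for name, t in [("Bayes (s+1)/(n+2)", bayes), ("MLE   s/n        ", mle)]:
    print(name, " bias ~", t.mean() - theta, " MSE ~", np.mean((t - theta) ** 2))
```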

Again, the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the prior $g_{\Theta}(\theta) = I_{(0,1)}(\theta)$ is

$$E\big[\theta(1-\theta)\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}\big]
 = \int_{0}^{1} \theta(1-\theta)\,f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})\,d\theta
 = \frac{\int_{0}^{1} \theta^{\sum x_{i}+1}(1-\theta)^{n-\sum x_{i}+1}\,d\theta}{\int_{0}^{1} \theta^{\sum x_{i}}(1-\theta)^{n-\sum x_{i}}\,d\theta}$$

$$= \frac{\beta\!\left(\sum x_{i}+2,\; n-\sum x_{i}+2\right)}{\beta\!\left(\sum x_{i}+1,\; n-\sum x_{i}+1\right)}
 = \frac{\left(\sum_{i=1}^{n} x_{i}+1\right)\left(n-\sum_{i=1}^{n} x_{i}+1\right)}{(n+3)(n+2)}$$

Hence, the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the uniform prior distribution is $\dfrac{\left(\sum x_{i}+1\right)\left(n-\sum x_{i}+1\right)}{(n+3)(n+2)}$. We note again that the posterior Bayes estimator obtained in this example is not unbiased.

The following remark states that, in general, a posterior Bayes estimator is not unbiased.

Remark: Let $T_{G}^{*} = t_{G}^{*}(X_{1},\ldots,X_{n})$ denote the posterior Bayes estimator of $\tau(\theta)$ with respect to a prior distribution $G(\cdot)$. If both $T_{G}^{*}$ and $\tau(\Theta)$ have finite variance, then either

$$\mathrm{Var}\big[T_{G}^{*}\mid\theta\big] = 0$$

or $T_{G}^{*}$ is not an unbiased estimator of $\tau(\theta)$. That is, either $T_{G}^{*}$ estimates $\tau(\theta)$ correctly with probability 1, or $T_{G}^{*}$ is not an unbiased estimator.

Proof:
Suppose that $T_{G}^{*}$ is an unbiased estimator of $\tau(\theta)$; that is,

$$E\big(T_{G}^{*}\mid\theta\big) = \tau(\theta)$$

By definition, $T_{G}^{*} = t_{G}^{*}(X_{1},\ldots,X_{n}) = E\big[\tau(\Theta)\mid X_{1},\ldots,X_{n}\big]$.

Now, using the unbiasedness assumption,

$$\mathrm{Var}\big(T_{G}^{*}\big) = E\Big[\mathrm{Var}\big(T_{G}^{*}\mid\Theta\big)\Big] + \mathrm{Var}\Big[E\big(T_{G}^{*}\mid\Theta\big)\Big]
 = E\Big[\mathrm{Var}\big(T_{G}^{*}\mid\Theta\big)\Big] + \mathrm{Var}\big[\tau(\Theta)\big] \qquad (1)$$

And, since $E\big[\tau(\Theta)\mid X_{1},\ldots,X_{n}\big] = T_{G}^{*}$,

$$\mathrm{Var}\big[\tau(\Theta)\big] = E\Big[\mathrm{Var}\big(\tau(\Theta)\mid X_{1},\ldots,X_{n}\big)\Big] + \mathrm{Var}\big[T_{G}^{*}\big]
\;\Rightarrow\;
\mathrm{Var}\big[T_{G}^{*}\big] = \mathrm{Var}\big[\tau(\Theta)\big] - E\Big[\mathrm{Var}\big(\tau(\Theta)\mid X_{1},\ldots,X_{n}\big)\Big] \qquad (2)$$

Now, from equations $(1)$ and $(2)$,

$$E\Big[\mathrm{Var}\big(T_{G}^{*}\mid\Theta\big)\Big] + \mathrm{Var}\big[\tau(\Theta)\big]
 = \mathrm{Var}\big[\tau(\Theta)\big] - E\Big[\mathrm{Var}\big(\tau(\Theta)\mid X_{1},\ldots,X_{n}\big)\Big]$$

$$\Rightarrow\quad E\Big[\mathrm{Var}\big(T_{G}^{*}\mid\Theta\big)\Big] + E\Big[\mathrm{Var}\big(\tau(\Theta)\mid X_{1},\ldots,X_{n}\big)\Big] = 0$$

Since both terms are non-negative and their sum is zero, both are zero. In particular, $E\Big[\mathrm{Var}\big(T_{G}^{*}\mid\Theta\big)\Big] = 0$, and since $\mathrm{Var}\big(T_{G}^{*}\mid\theta\big)$ is non-negative and has zero expectation, $\mathrm{Var}\big(T_{G}^{*}\mid\theta\big) = 0$.

Loss Function
Consider estimating g (θ ) , let t = t ( x1 , x2 , " , xn ) denote an estimate of g (θ ) . The loss function, denoted by

l ( t ; θ ) is defined to be a real valued function satisfying

1) l ( t ; θ ) ≥ 0 for all possible estimates t and all θ in Θ .

2) l ( t ; θ ) = 0 for t = g (θ ) .

l ( t ; θ ) equals the loss incurred if one estimates g (θ ) to be t when θ is the true parameter value.

The word ‘loss’ is used in place of ‘error’ and loss function is used as the measure of the ‘error’.



Several Possible Loss Functions

1) $\ell_{1}(t;\theta) = \big[t - g(\theta)\big]^{2}$. It is called the squared error loss function.

2) $\ell_{2}(t;\theta) = \big|t - g(\theta)\big|$. It is called the absolute error loss function.

3) $\ell_{3}(t;\theta) = \begin{cases} A & \text{if } |t - g(\theta)| > \varepsilon \\ 0 & \text{if } |t - g(\theta)| \le \varepsilon \end{cases}$, where $A > 0$.

4) $\ell_{4}(t;\theta) = \rho(\theta)\,\big|t - g(\theta)\big|^{r}$ for $\rho(\theta) \ge 0$ and $r > 0$.

Note that both $\ell_{1}$ and $\ell_{2}$ increase as the error $|t - g(\theta)|$ increases in magnitude. $\ell_{3}$ says that we lose nothing if the estimate $t$ is within $\varepsilon$ units of $g(\theta)$ and otherwise we lose the amount $A$. $\ell_{4}$ is a general loss function that includes both $\ell_{1}$ and $\ell_{2}$ as special cases.

Risk Function
For a given loss function $\ell(\cdot\,;\cdot)$, the risk function, denoted by $R_{t}(\theta)$, of an estimator $T = t(X_{1},\ldots,X_{n})$ is defined to be

$$R_{t}(\theta) = E_{\theta}\big[\ell(T;\theta)\big]$$

The risk function is the average loss. The expectation above can be computed in two ways. For example, if the density $f(x;\theta)$ from which we sampled is a probability density function, then

$$R(\theta, t) = E_{\theta}\big[\ell(T;\theta)\big] = \int\!\cdots\!\int \ell\big(t(x_{1},\ldots,x_{n});\theta\big)\prod_{i=1}^{n} f(x_{i};\theta)\,dx_{i}$$

Or, we can work with the random variable $T$ itself: if $f_{T}(t)$ is the density of the estimator $T$, then

$$R(\theta, t) = E_{\theta}\big[\ell(T;\theta)\big] = \int \ell(t;\theta)\,f_{T}(t)\,dt$$

In either case, the expectation averages out the values of $x_{1},\ldots,x_{n}$, so the risk is a function of $\theta$ alone; if, as in the Bayesian approach, $\theta$ is regarded as random, then the risk itself becomes a random variable.

Possible Risk Functions

1) Corresponding to the loss function $\ell_{1}(t;\theta) = \big[t - g(\theta)\big]^{2}$, the risk function is given by
   $R_{t}(\theta) = E_{\theta}\big[\ell_{1}(T;\theta)\big] = E_{\theta}\big[T - g(\theta)\big]^{2}$.

2) Corresponding to the loss function $\ell_{2}(t;\theta) = |t - g(\theta)|$, the risk function is given by
   $R_{t}(\theta) = E_{\theta}\big[\ell_{2}(T;\theta)\big] = E_{\theta}\big|T - g(\theta)\big|$. It is called the mean absolute error.

3) Corresponding to the loss function
   $$\ell_{3}(t;\theta) = \begin{cases} A & \text{if } |t - g(\theta)| > \varepsilon \\ 0 & \text{if } |t - g(\theta)| \le \varepsilon \end{cases}, \qquad A > 0,$$
   the risk function is given by $R_{t}(\theta) = E_{\theta}\big[\ell_{3}(T;\theta)\big] = A\,P_{\theta}\big[\,|T - g(\theta)| > \varepsilon\,\big]$.

4) Corresponding to the loss function $\ell_{4}(t;\theta) = \rho(\theta)\,|t - g(\theta)|^{r}$ for $\rho(\theta) \ge 0$ and $r > 0$, the risk function is given by
   $R_{t}(\theta) = E_{\theta}\big[\ell_{4}(T;\theta)\big] = \rho(\theta)\,E_{\theta}\big[\,|T - g(\theta)|^{r}\,\big]$.
When is a loss function said to be Convex and Strictly Convex?
A real valued function $L$ defined over an open interval $I = (a,b)$ is convex if, for all $a < t < t^{*} < b$ and any $0 < \gamma < 1$,

$$L\big[\gamma t + (1-\gamma)t^{*}\big] \le \gamma L(t) + (1-\gamma)L(t^{*}) \qquad (1)$$

The function is said to be strictly convex if strict inequality holds in $(1)$ for all indicated values of $t$, $t^{*}$ and $\gamma$.

Convexity is a very strong condition which implies, for example, that $L$ is continuous in $(a,b)$ and has a left and right derivative at every point of $(a,b)$.

Determination of Convexity
Determining whether or not a loss function is convex is often easy with the help of the following two criteria.

a) If $L$ is defined and differentiable on $(a,b)$, then a necessary and sufficient condition for $L$ to be convex is that

$$L'(t) \le L'(t^{*}) \qquad \text{for all } a < t < t^{*} < b \qquad (1)$$

The function is strictly convex if and only if the inequality in $(1)$ is strict for all $t < t^{*}$.

b) If $L$ is twice differentiable, then the necessary and sufficient condition $(1)$ is equivalent to

$$L''(t) \ge 0 \qquad \text{for all } a < t < b$$

with strict inequality sufficient for strict convexity.

Bayes Estimator With Respect to a Loss Function

The Bayes estimator of the parameter $\theta$ is the function $d$ of the sample observations $x_{1},\ldots,x_{n}$ that minimizes the expected risk, where the expected risk is defined as

$$B(d) = E\big[R(d,\theta)\big] = \int R(d,\theta)\,f(\theta)\,d\theta
 = \int\left[\int\!\cdots\!\int \ell\big\{d(x_{1},\ldots,x_{n});\theta\big\}\,f(x_{1},\ldots,x_{n}\mid\theta)\,dx_{1}\cdots dx_{n}\right]f(\theta)\,d\theta \qquad (*)$$

Now, interchanging the order of integration, we can write $(*)$ as

$$B(d) = \int\!\cdots\!\int\left[\int \ell\big\{d(x_{1},\ldots,x_{n});\theta\big\}\,f(x_{1},\ldots,x_{n}\mid\theta)\,f(\theta)\,d\theta\right]dx_{1}\cdots dx_{n} \qquad (**)$$

The function $B(d)$ will be minimized if we can find the function $d$ that minimizes the quantity within the brackets of equation $(**)$ for every set of $x$ values. That is, the Bayes estimator of $\theta$ is the function $d$ of $x_{1},\ldots,x_{n}$ that minimizes

$$\int \ell\big\{d(x_{1},\ldots,x_{n});\theta\big\}\,f(x_{1},\ldots,x_{n}\mid\theta)\,f(\theta)\,d\theta
 = \int \ell\big\{d(x);\theta\big\}\,f(x_{1},\ldots,x_{n},\theta)\,d\theta
 = f(x_{1},\ldots,x_{n})\int \ell\big\{d(x);\theta\big\}\,f(\theta\mid x_{1},\ldots,x_{n})\,d\theta$$

since $f(x_{1},\ldots,x_{n},\theta) = f(x_{1},\ldots,x_{n}\mid\theta)\,f(\theta)$ and $f(\theta\mid x_{1},\ldots,x_{n}) = \dfrac{f(x_{1},\ldots,x_{n}\mid\theta)\,f(\theta)}{f(x_{1},\ldots,x_{n})}$.

Thus the Bayes estimator of $\theta$ is the value $\hat\theta$ that minimizes

$$Y = \int \ell\big\{d(x);\theta\big\}\,f(\theta\mid x_{1},\ldots,x_{n})\,d\theta \quad \text{(say)}$$


If the loss function is the squared error, i.e. $\big[d(x) - \theta\big]^{2}$, then

$$Y = \int \big[d(x) - \theta\big]^{2} f(\theta\mid x_{1},\ldots,x_{n})\,d\theta
 = \big[d(x)\big]^{2}\int f(\theta\mid x_{1},\ldots,x_{n})\,d\theta
 - 2\,d(x)\int \theta\,f(\theta\mid x_{1},\ldots,x_{n})\,d\theta
 + \int \theta^{2} f(\theta\mid x_{1},\ldots,x_{n})\,d\theta$$

Minimizing $Y$ with respect to $d(x)$,

$$\frac{\partial Y}{\partial\,d(x)} = 0
\;\Rightarrow\; 2\,d(x)\int f(\theta\mid x_{1},\ldots,x_{n})\,d\theta - 2\int \theta\,f(\theta\mid x_{1},\ldots,x_{n})\,d\theta = 0$$

$$\Rightarrow\quad d(x) = \frac{\int \theta\,f(\theta\mid x_{1},\ldots,x_{n})\,d\theta}{\int f(\theta\mid x_{1},\ldots,x_{n})\,d\theta} = \text{the posterior mean}$$

Hence, $d(x)$ is the Bayes estimate of $\theta$ when the loss function is squared error.

Advantages of Bayesian Approach


Bayesian approach has the following advantages over classical approach.

a) We make inferences about the unknown parameters given the data whereas in the classical approach we look at
the long run behavior e.g. in 95% of experiments p will lie between p ′ and p ′′ .

b) The posterior distribution tells the whole story, and if a point estimate or a confidence interval is desired, it can immediately be obtained from the posterior distribution.
c) Bayesian approach provides solutions for problems which do not have solutions from the classical point of view.

Note:
a) A decision Rule δ is said to be uniformly better than a decision rule δ ′ if R (δ , θ ) ≤ R (δ ′, θ ) ∀ θ ∈ Θ with

strict inequality holding for some θ .

b) A decision rule δ * is said to be uniformly best in a class of decision rules D if δ * is uniformly better than any
other decision rule δ ∈ D .
c) A decision rule $\delta$ is said to be admissible in a class $D$ if there exists no other decision rule in $D$ which is uniformly better than $\delta$.

Example: Let $X_{1}, X_{2}, \ldots, X_{n}$ be independent $N(\mu, \sigma^{2})$ variables where $\mu$ is unknown but $\sigma^{2}$ is known. Let the prior distribution of $\mu$ be $N(\theta, \sigma_{0}^{2})$. Find the Bayes estimate of $\mu$.

Solution:
The joint conditional distribution of the sample given $\mu$ is

$$f(x_{1},\ldots,x_{n}\mid\mu) = \left(\frac{1}{\sqrt{2\pi\sigma^{2}}}\right)^{n}\exp\!\left[-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}\right]
 = \left(\frac{1}{\sqrt{2\pi\sigma^{2}}}\right)^{n}\exp\!\left[-\frac{n}{2\sigma^{2}}(\bar x-\mu)^{2} - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\bar x)^{2}\right]$$

$$\therefore\quad f(x_{1},\ldots,x_{n}\mid\mu) \propto \exp\!\left[-\frac{n}{2\sigma^{2}}(\bar x-\mu)^{2}\right]$$

The posterior distribution of $\mu$ given $x$ is

$$g(\mu\mid x_{1},\ldots,x_{n}) = \frac{f(\mu\mid\theta)\,f(x_{1},\ldots,x_{n}\mid\mu)}{f(x_{1},\ldots,x_{n})} \propto f(\mu\mid\theta)\,f(x_{1},\ldots,x_{n}\mid\mu)$$

$$\propto \exp\!\left[-\frac{n}{2\sigma^{2}}(\bar x-\mu)^{2} - \frac{1}{2\sigma_{0}^{2}}(\mu-\theta)^{2}\right]
 \propto \exp\!\left[-\frac{1}{2}\left(\frac{n\sigma_{0}^{2}+\sigma^{2}}{\sigma_{0}^{2}\sigma^{2}}\right)\left(\mu - \frac{n\bar x\,\sigma_{0}^{2} + \theta\sigma^{2}}{n\sigma_{0}^{2}+\sigma^{2}}\right)^{2}\right]$$

$$\therefore\quad f(\mu\mid x) \sim N\!\left[\frac{n\bar x\,\sigma_{0}^{2} + \theta\sigma^{2}}{n\sigma_{0}^{2}+\sigma^{2}},\; \frac{\sigma_{0}^{2}\sigma^{2}}{n\sigma_{0}^{2}+\sigma^{2}}\right]$$

If the loss function is squared error, the Bayes estimator of $\mu$ is $\dfrac{n\bar x\,\sigma_{0}^{2} + \theta\sigma^{2}}{n\sigma_{0}^{2}+\sigma^{2}}$.
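Because the posterior is again normal, the update reduces to two formulas. A small sketch (illustrative; argument names and values are assumptions):

```python
# Normal-normal update: posterior mean and variance of mu for N(mu, sigma2) data
# with a N(theta0, sigma0_2) prior, as derived above.
def normal_posterior(xbar, n, sigma2, theta0, sigma0_2):
    """Return (posterior mean, posterior variance) of mu."""
    mean = (n * xbar * sigma0_2 + theta0 * sigma2) / (n * sigma0_2 + sigma2)
    var = (sigma0_2 * sigma2) / (n * sigma0_2 + sigma2)
    return mean, var

print(normal_posterior(xbar=5.2, n=20, sigma2=4.0, theta0=0.0, sigma0_2=1.0))
```

Note how the posterior mean is a precision-weighted compromise between the sample mean and the prior mean; as $n$ grows it moves toward $\bar x$.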

Theorem
Let $X_{1}, X_{2}, \ldots, X_{n}$ be a random sample from the density $f(x\mid\theta)$ and let $g_{\Theta}(\theta)$ be the density of $\Theta$. Further, let $\ell(t;\theta)$ be the loss function for estimating $\tau(\theta)$. The Bayes estimator of $\tau(\theta)$ is that estimator $t^{*}(\cdot,\ldots,\cdot)$ which minimizes

$$\int_{\Theta} \ell\big(t(x_{1},\ldots,x_{n});\theta\big)\,f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})\,d\theta$$

as a function of $t(\cdot,\ldots,\cdot)$.

Proof
For a general loss function $\ell(t;\theta)$, we seek the estimator, say $t^{*}(\cdot,\ldots,\cdot)$, which minimizes

$$\int_{\Theta} R_{t}(\theta)\,g_{\Theta}(\theta)\,d\theta
 = \int_{\Theta} E_{\theta}\big[\ell\big(t(X_{1},\ldots,X_{n});\theta\big)\big]\,g_{\Theta}(\theta)\,d\theta
 = \int_{\Theta}\left[\int_{\mathbb{R}^{n}} \ell\big(t(x_{1},\ldots,x_{n});\theta\big)\,f_{X_{1},\ldots,X_{n}\mid\Theta=\theta}(x_{1},\ldots,x_{n}\mid\theta)\prod_{i=1}^{n} dx_{i}\right]g_{\Theta}(\theta)\,d\theta$$

$$= \int_{\mathbb{R}^{n}}\left[\int_{\Theta} \ell\big(t(x_{1},\ldots,x_{n});\theta\big)\,
   \frac{f_{X_{1},\ldots,X_{n}\mid\Theta=\theta}(x_{1},\ldots,x_{n}\mid\theta)\,g_{\Theta}(\theta)}{f_{X_{1},\ldots,X_{n}}(x_{1},\ldots,x_{n})}\,d\theta\right]
   f_{X_{1},\ldots,X_{n}}(x_{1},\ldots,x_{n})\prod_{i=1}^{n} dx_{i}$$

$$= \int_{\mathbb{R}^{n}}\left[\int_{\Theta} \ell\big(t(x_{1},\ldots,x_{n});\theta\big)\,
   f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})\,d\theta\right]
   f_{X_{1},\ldots,X_{n}}(x_{1},\ldots,x_{n})\prod_{i=1}^{n} dx_{i}$$

Since the integrand is non-negative, the double integral is minimized if the expression within the brackets, which is sometimes called the posterior risk, is minimized for each $x_{1},\ldots,x_{n}$.

So, in general, the Bayes estimator of $\tau(\theta)$ with respect to the loss function $\ell(\cdot\,;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk, that is, the expected loss with respect to the posterior distribution of $\Theta$ given the observations $x_{1},\ldots,x_{n}$. Hence, the theorem is proved.
Theorem
Let $X_{1}, X_{2}, \ldots, X_{n}$ be a random sample from the density $f(x\mid\theta)$ and let $g_{\Theta}(\theta)$ be the density of $\Theta$. Further, let $\ell(t;\theta)$ be the squared-error loss function for estimating $\tau(\theta)$; that is,

$$\ell(t;\theta) = \big[t(x_{1},\ldots,x_{n}) - \tau(\theta)\big]^{2}$$

Then the Bayes estimator of $\tau(\theta)$ is given by

$$E\big[\tau(\Theta)\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}\big]
 = \frac{\int \tau(\theta)\prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)\,d\theta}{\int \prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)\,d\theta}$$

Proof
We know that the Bayes estimator of $\tau(\theta)$ with respect to the loss function $\ell(\cdot\,;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk; that is, the estimator which minimizes

$$\int_{\Theta} \ell\big(t(x_{1},\ldots,x_{n});\theta\big)\,f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})\,d\theta$$

Here the loss function is the squared-error loss function, so the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes

$$\int_{\Theta} \big[\tau(\theta) - t(x_{1},\ldots,x_{n})\big]^{2}\,f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})\,d\theta$$

But this expression is the conditional expectation of $\big[\tau(\Theta) - t(x_{1},\ldots,x_{n})\big]^{2}$ with respect to the posterior distribution of $\Theta$ given $X_{1}=x_{1},\ldots,X_{n}=x_{n}$, which, as a function of $t(x_{1},\ldots,x_{n})$, is minimized by taking $t^{*}(x_{1},\ldots,x_{n})$ equal to the conditional expectation of $\tau(\Theta)$ with respect to that posterior distribution.

$$\Big[\text{Recall that } E(Z-a)^{2} \text{ is minimized as a function of } a \text{ for } a^{*} = E(Z).\Big]$$

Hence, the Bayes estimator of $\tau(\theta)$ with respect to the squared-error loss function is given by

$$E\big[\tau(\Theta)\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}\big]
 = \frac{\int \tau(\theta)\prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)\,d\theta}{\int \prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)\,d\theta}$$

Hence, the theorem is proved.


Theorem
Let $X_{1}, X_{2}, \ldots, X_{n}$ be a random sample from the density $f(x\mid\theta)$ and let $g_{\Theta}(\theta)$ be the density of $\Theta$. Further, let $\ell(t;\theta)$ be the absolute-error loss function for estimating $\tau(\theta)$; that is,

$$\ell(t;\theta) = \big|t(x_{1},\ldots,x_{n}) - \tau(\theta)\big|$$

Then the Bayes estimator of $\tau(\theta)$ is given by the median of the posterior distribution of $\Theta$ given $X_{1}=x_{1},\ldots,X_{n}=x_{n}$.

Proof
We know that the Bayes estimator of $\tau(\theta)$ with respect to the loss function $\ell(\cdot\,;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk; that is, the estimator which minimizes

$$\int_{\Theta} \ell\big(t(x_{1},\ldots,x_{n});\theta\big)\,f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})\,d\theta$$

Here the loss function is the absolute-error loss function, so the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes

$$\int_{\Theta} \big|t(x_{1},\ldots,x_{n}) - \tau(\theta)\big|\,f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})\,d\theta$$

But this expression is the conditional expectation of $\big|t(x_{1},\ldots,x_{n}) - \tau(\Theta)\big|$ with respect to the posterior distribution of $\Theta$ given $X_{1}=x_{1},\ldots,X_{n}=x_{n}$, which, as a function of $t(x_{1},\ldots,x_{n})$, is minimized by taking $t^{*}(x_{1},\ldots,x_{n})$ equal to the conditional median of $\tau(\Theta)$ with respect to that posterior distribution.

$$\Big[\text{Recall that } E|Z-a| \text{ is minimized as a function of } a \text{ for } a^{*} = \text{the median of } Z.\Big]$$

Hence, the Bayes estimator of $\tau(\theta)$ with respect to the absolute-error loss function is given by the median of the posterior distribution of $\Theta$ given $X_{1}=x_{1},\ldots,X_{n}=x_{n}$. (Proved)

Example: Let X 1 , X 2 , " , X n denote a random sample from normal distribution with the density

1 ⎡ 1 2⎤
f (x |θ ) = exp ⎢ − ( x − θ ) ⎥ ; −∞ ≤ x ≤ ∞
2π ⎣ 2 ⎦
Assume that the prior distribution of Θ is given by

1 ⎡ 1 2⎤
g Θ (θ ) = exp ⎢ − (θ − µ0 ) ⎥ ; −∞ ≤θ ≤ ∞
2π ⎣ 2 ⎦
That is, Θ is standard normal. Write µ0 = x0 when convenient. Find the Bayes estimator of τ (θ ) with respect to

the squared error loss function.

Bayes and Minimax Estimation ~ 13 of 20


Solution
We know that the Bayes estimator of $\theta$ with respect to the squared-error loss function is the posterior mean

$$E\big[\theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}\big]
 = \frac{\int \theta\prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)\,d\theta}{\int \prod_{i=1}^{n} f(x_{i}\mid\theta)\,g_{\Theta}(\theta)\,d\theta}$$

First we find the posterior distribution of $\Theta$:

$$f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})
 = \frac{\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\!\left[-\frac{1}{2}\sum_{i=1}^{n}(x_{i}-\theta)^{2}\right]\frac{1}{\sqrt{2\pi}}\exp\!\left[-\frac{1}{2}(\theta-\mu_{0})^{2}\right]}
        {\int_{-\infty}^{\infty}\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\!\left[-\frac{1}{2}\sum_{i=1}^{n}(x_{i}-\theta)^{2}\right]\frac{1}{\sqrt{2\pi}}\exp\!\left[-\frac{1}{2}(\theta-\mu_{0})^{2}\right]d\theta}$$

Writing $x_{0} = \mu_{0}$ and completing the square in $\theta$,

$$\Rightarrow\quad f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})
 = \frac{1}{\sqrt{2\pi}\sqrt{\tfrac{1}{n+1}}}\exp\!\left[-\frac{1}{2}\,\frac{\left(\theta - \dfrac{\sum_{i=0}^{n} x_{i}}{n+1}\right)^{2}}{\tfrac{1}{n+1}}\right]$$

So the Bayes estimator of $\theta$ with respect to the squared-error loss function is

$$E\big[\theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}\big] = \int_{-\infty}^{\infty}\theta\,f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})\,d\theta$$

Substituting $z = \dfrac{\theta - \frac{1}{n+1}\sum_{i=0}^{n} x_{i}}{\sqrt{1/(n+1)}}$, so that $\theta = \dfrac{\sum_{i=0}^{n} x_{i}}{n+1} + \dfrac{z}{\sqrt{n+1}}$,

$$E\big[\theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}\big]
 = \int_{-\infty}^{\infty}\left(\frac{\sum_{i=0}^{n} x_{i}}{n+1} + \frac{z}{\sqrt{n+1}}\right)\frac{1}{\sqrt{2\pi}}\,e^{-z^{2}/2}\,dz
 = \frac{\sum_{i=0}^{n} x_{i}}{n+1} + \frac{1}{\sqrt{n+1}}\int_{-\infty}^{\infty} z\,\frac{1}{\sqrt{2\pi}}\,e^{-z^{2}/2}\,dz$$

The last integral vanishes, since the contributions over $(-\infty,0)$ and $(0,\infty)$ cancel by symmetry, so

$$\therefore\quad E\big[\theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}\big] = \frac{\sum_{i=0}^{n} x_{i}}{n+1}$$

So the Bayes estimator of $\theta$ with respect to the squared-error loss is

$$\frac{\sum_{i=0}^{n} x_{i}}{n+1} = \frac{x_{0} + \sum_{i=1}^{n} x_{i}}{n+1} = \frac{\mu_{0} + \sum_{i=1}^{n} x_{i}}{n+1}$$

Since the posterior distribution of $\Theta$ is normal, its mean and median coincide. Hence $\dfrac{\mu_{0} + \sum_{i=1}^{n} x_{i}}{n+1}$ is also the Bayes estimator with respect to the absolute-error loss function.

Example: Let X 1 , X 2 , " , X n denote a random sample from normal distribution with the density

1
f (x |θ ) = I ( x)
θ ( 0, θ )
Assume that the prior distribution of Θ is given by g Θ (θ ) = I ( 0,1) (θ )

That is, Θ is standard uniform. Find the Bayes estimator of τ (θ ) with respect to the squared error loss function

( t − θ )2
l (t ; θ ) = .
θ2
Solution:
We know that the Bayes estimator of $\theta$ with respect to a general loss function such as $\ell(t;\theta) = \dfrac{(t-\theta)^{2}}{\theta^{2}}$ is that estimator which minimizes

$$\int_{\Theta} \ell\big(t(x_{1},\ldots,x_{n});\theta\big)\,f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})\,d\theta$$

The posterior distribution of $\Theta$ is

$$f_{\Theta\mid X_{1}=x_{1},\ldots,X_{n}=x_{n}}(\theta\mid x_{1},\ldots,x_{n})
 = \frac{\left(\frac{1}{\theta}\right)^{n}\prod_{i=1}^{n} I_{(0,\theta)}(x_{i})\,I_{(0,1)}(\theta)}{\int_{0}^{1}\left(\frac{1}{\theta}\right)^{n}\prod_{i=1}^{n} I_{(0,\theta)}(x_{i})\,d\theta}
 = \frac{\left(\frac{1}{\theta}\right)^{n} I_{(y_{n},1)}(\theta)}{\int_{y_{n}}^{1}\left(\frac{1}{\theta}\right)^{n} d\theta}
 = \frac{\left(\frac{1}{\theta}\right)^{n} I_{(y_{n},1)}(\theta)}{\dfrac{1}{n-1}\left[\dfrac{1}{y_{n}^{\,n-1}} - 1\right]}$$

where $y_{n}$ is the largest order statistic. The Bayes estimator is the value $t(y_{n})$ that minimizes

$$\int_{y_{n}}^{1}\frac{\big(t(y_{n})-\theta\big)^{2}}{\theta^{2}}\left(\frac{1}{\theta}\right)^{n} d\theta
 = \big[t(y_{n})\big]^{2}\int_{y_{n}}^{1}\frac{d\theta}{\theta^{n+2}} - 2\,t(y_{n})\int_{y_{n}}^{1}\frac{d\theta}{\theta^{n+1}} + \int_{y_{n}}^{1}\frac{d\theta}{\theta^{n}} \qquad (A)$$

(the positive normalising constant of the posterior does not affect the minimization). Equation $(A)$ is a quadratic in $t(\cdot)$, which attains its minimum at

$$t^{*}(y_{n}) = \frac{\int_{y_{n}}^{1}\theta^{-(n+1)}\,d\theta}{\int_{y_{n}}^{1}\theta^{-(n+2)}\,d\theta}
 = \frac{\dfrac{1}{n}\big[y_{n}^{-n}-1\big]}{\dfrac{1}{n+1}\big[y_{n}^{-(n+1)}-1\big]}
 = \frac{n+1}{n}\cdot\frac{y_{n}^{\,n}-1}{y_{n}^{\,n+1}-1}\cdot y_{n}$$

So, the Bayes estimator of $\theta$ with respect to the loss function $\ell(t;\theta) = \dfrac{(t-\theta)^{2}}{\theta^{2}}$ is

$$t^{*}(y_{n}) = \frac{n+1}{n}\cdot\frac{y_{n}^{\,n}-1}{y_{n}^{\,n+1}-1}\cdot y_{n}$$
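The closed form can be checked by minimising the posterior risk numerically on a grid. A sketch (illustrative; assumes numpy, with $n$ and $y_{n}$ chosen arbitrarily):

```python
# Grid-minimise the posterior risk in t and compare with the closed form above.
import numpy as np

n, yn = 5, 0.6
theta = np.linspace(yn, 1.0, 4001)               # posterior support (yn, 1)
w = theta ** (-n)                                 # unnormalised posterior (1/theta)^n

t_grid = np.linspace(yn, 1.0, 4001)
risk = [np.sum((t - theta) ** 2 / theta ** 2 * w) for t in t_grid]   # constant d-theta cancels
t_numeric = t_grid[int(np.argmin(risk))]

t_closed = (n + 1) / n * (yn ** n - 1) / (yn ** (n + 1) - 1) * yn
print(t_numeric, t_closed)                        # should agree to grid precision
```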



Admissible Estimator
For two estimators $T_{1} = t_{1}(X_{1},\ldots,X_{n})$ and $T_{2} = t_{2}(X_{1},\ldots,X_{n})$, the estimator $T_{1}$ is defined to be a better estimator than $T_{2}$ if and only if

$$R_{t_{1}}(\theta) \le R_{t_{2}}(\theta) \quad \text{for all } \theta \text{ in } \Theta, \quad \text{and} \quad R_{t_{1}}(\theta) < R_{t_{2}}(\theta) \quad \text{for at least one } \theta \text{ in } \Theta$$

An estimator $T = t(X_{1},\ldots,X_{n})$ is defined to be admissible if and only if there is no better estimator.

Example: Using the squared-error loss function $\ell(t,\theta) = (t-\theta)^{2}$, estimators for the location parameter of a normal distribution given a sample of size $n$ include the sample mean $t_{1}(x) = \bar x$, the sample median $t_{2}(x) = m$, and the weighted mean $t_{3}(x) = \sum w_{i}x_{i}$ with $\sum w_{i} = 1$. Their respective risk functions are

$$R_{1} = \frac{\sigma^{2}}{n}, \qquad R_{2} \approx 1.57\,\frac{\sigma^{2}}{n}, \qquad R_{3} = \sigma^{2}\left\{\frac{1}{n} + \sum\big(w_{i}-\bar w\big)^{2}\right\}$$

Since $R_{1} < R_{2}$ and $R_{1} \le R_{3}$, the sample mean is not dominated by either competitor; in fact $\bar x$ is an admissible estimator of the location parameter $\mu$ of the normal distribution.

Inadmissibility of an Estimator
An estimator $t$ is said to be inadmissible if there exists another estimator $t'$ which dominates it, that is,

$$R(\theta, t') \le R(\theta, t) \quad \text{for all } \theta \text{ in } \Theta, \quad \text{and} \quad R(\theta, t') < R(\theta, t) \quad \text{for some } \theta \text{ in } \Theta$$

Finding an Inadmissible Estimator

To establish the inadmissibility of an estimator $t$, we may use the following lemma.

Let the range of $\tau(\theta)$ be $[a, b]$ and let the loss function satisfy $L(\theta, t) \ge 0$, with $L(\theta, t)$, for any fixed $\theta$, increasing as $t$ moves away from $\tau(\theta)$ in either direction. Then any estimator taking on values outside the closed interval $[a, b]$ with positive probability is inadmissible.

Properties of Admissible Estimator


The properties of admissible estimators are as follows.

a) If the loss function L is strictly convex, then every admissible estimator must be non-randomized.

b) If L is strictly convex and t is an admissible estimator of τ (θ ) and if t ′ is another estimator with the same

risk function i.e. R (θ , t ) = R (θ , t ′ ) for all θ then t = t ′ with probability 1 .

c) Any unique Bayes estimator is admissible.

Minimax Estimator
An estimator $t^{*}$ is defined to be a minimax estimator if and only if

$$\sup_{\theta}\,R\big(\theta, t^{*}\big) \le \sup_{\theta}\,R(\theta, t) \qquad \text{for every estimator } t$$


Properties of Minimax Estimators
The properties of minimax estimators are given below.

a) One appealing feature of the minimax estimator is that it does not depend on the particular parameterization.

b) If $g(\cdot)$ is a prior distribution of $\theta$ such that $\int R(\theta, t_{g})\,g(\theta)\,d\theta = \sup_{\theta} R(\theta, t_{g})$, then
   i. $t_{g}$ is minimax;
   ii. if $t_{g}$ is the unique Bayes solution with respect to $g(\cdot)$, it is the unique minimax procedure.

c) If a Bayes estimator $t_{g}$ has constant risk, then it is minimax.

d) If $t'$ dominates a minimax estimator $t$, then $t'$ is also minimax.

e) If an estimator has constant risk and is admissible, it is minimax.

f) The best equivariant estimator frequently turns out to be minimax.

Example: Suppose that $\Theta = \{\theta_{1}, \theta_{2}\}$, where $\theta_{1}$ corresponds to oil and $\theta_{2}$ to no oil. Let $A = \{a_{1}, a_{2}, a_{3}\}$, where $a_{i}$ corresponds to choice $i$, $i = 1, 2, 3$. Suppose that the following table gives the losses for the decision problem.

                         Drill a1    Sell a2    Partial a3
  Oil    (theta1)            0          10           5
  No oil (theta2)           12           1           6

If there is oil and we drill, the loss is zero, while if there is no oil and we drill, the loss is 12, and so on.

An experiment is conducted to obtain information about $\theta$, resulting in the random variable $X$ with possible values coded as 0 and 1, given by

                         x = 0      x = 1
  Oil    (theta1)         0.3        0.7
  No oil (theta2)         0.6        0.4

When there is oil, 0 occurs with probability 0.3 and 1 occurs with probability 0.7:

$$P(X=0\mid\theta_{1}) = 0.3 \qquad \text{and} \qquad P(X=1\mid\theta_{1}) = 0.7$$

Now the possible decision rules $\delta_{i}(x)$ are

   i       1     2     3     4     5     6     7     8     9
  x = 0   a1    a1    a1    a2    a2    a2    a3    a3    a3
  x = 1   a1    a2    a3    a1    a2    a3    a1    a2    a3

Here, $\delta_{1}$ = take action $a_{1}$ regardless of the value of $X$,

$$\delta_{2} = \begin{cases} \text{take action } a_{1} & \text{if } X = 0 \\ \text{take action } a_{2} & \text{if } X = 1 \end{cases}$$

and so on.


Then the risk of $\delta$ at $\theta$ is

$$R(\theta, \delta) = E\big[\ell\big(\theta, \delta(X)\big)\big]
 = \ell(\theta, a_{1})\,P\big(\delta(X) = a_{1}\big) + \ell(\theta, a_{2})\,P\big(\delta(X) = a_{2}\big) + \ell(\theta, a_{3})\,P\big(\delta(X) = a_{3}\big)$$

Now, $R(\theta_{1}, \delta_{2}) = 0\times 0.3 + 10\times 0.7 = 7$ and $R(\theta_{2}, \delta_{2}) = 12\times 0.6 + 1\times 0.4 = 7.6$.

Thus we get

   i                                     1      2      3      4      5      6      7      8      9
  R(theta1, delta_i)                     0      7     3.5     3     10     6.5    1.5    8.5     5
  R(theta2, delta_i)                    12     7.6    9.6    5.4     1      3     8.4    4.0     6
  max of the two risks                  12     7.6    9.6    5.4    10     6.5    8.4    8.5     6

$$\min_{i}\,\max\big[R(\theta_{1},\delta_{i}),\,R(\theta_{2},\delta_{i})\big] = 5.4$$

Thus the minimax solution is

$$\delta_{4}(x) = \begin{cases} a_{2} & \text{if } x = 0 \\ a_{1} & \text{if } x = 1 \end{cases}$$

Again, $R(\theta_{1}, \delta_{4}) = 3 < R(\theta_{1}, \delta_{2}) = 7$ and $R(\theta_{2}, \delta_{4}) = 5.4 < R(\theta_{2}, \delta_{2}) = 7.6$, so $\delta_{2}$ is inadmissible.

Suppose that in our oil-drilling example an expert thinks the chance of finding oil is 0.2. Then we treat the parameter as a random variable $\theta$ with possible values $\theta_{1}, \theta_{2}$ and frequency function

$$\pi(\theta_{1}) = 0.2, \qquad \pi(\theta_{2}) = 0.8$$

so that the Bayes risk is

$$R(\delta) = E\big[R(\theta, \delta)\big] = 0.2\,R(\theta_{1}, \delta) + 0.8\,R(\theta_{2}, \delta)$$

$$\therefore\quad R(\delta_{1}) = 0.2\times 0 + 0.8\times 12 = 9.6, \qquad
R(\delta_{2}) = 0.2\times 7 + 0.8\times 7.6 = 7.48, \qquad
R(\delta_{3}) = 0.2\times 3.5 + 0.8\times 9.6 = 8.38,$$

and so on. So we compute the following table:

   i             1      2      3      4      5      6      7      8      9
  R(delta_i)    9.6   7.48   8.38   4.92    2.8    3.7   7.02    4.9    5.8

In the Bayesian framework $\delta$ is preferable to $\delta'$ if and only if it has smaller Bayes risk. If there is a rule $\delta^{*}$ which attains the minimum Bayes risk, i.e. such that $R(\delta^{*}) = \min_{\delta} R(\delta) = 2.8$, then it is called a Bayes rule. From this example we see that $\delta_{5}$, with Bayes risk 2.8, is the unique Bayes rule for our prior distribution.
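Both tables, the minimax rule and the Bayes rule can be recomputed in a few lines. A compact sketch (illustrative; assumes numpy, with the rules encoded as pairs of action indices):

```python
# Recompute the oil-drilling example: risks of the nine rules, minimax rule, Bayes rule.
import itertools
import numpy as np

loss = np.array([[0, 10, 5],       # rows: theta1 (oil), theta2 (no oil); columns: a1, a2, a3
                 [12, 1, 6]])
px   = np.array([[0.3, 0.7],       # P(X=0 | theta), P(X=1 | theta)
                 [0.6, 0.4]])
prior = np.array([0.2, 0.8])

rules = list(itertools.product(range(3), repeat=2))       # (action if x=0, action if x=1)
risk = np.array([[px[th] @ loss[th, list(rule)] for rule in rules] for th in range(2)])

print("max risk per rule:", risk.max(axis=0))
print("minimax rule     :", rules[int(risk.max(axis=0).argmin())])   # (1, 0): a2 if x=0, a1 if x=1
bayes = prior @ risk
print("Bayes risks      :", bayes)
print("Bayes rule       :", rules[int(bayes.argmin())])              # (1, 1): always a2
```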



Example: Let $X \sim b(1, p)$, $p \in \Theta = \left\{\frac{1}{4}, \frac{1}{2}\right\}$ and $A = \{a_{1}, a_{2}\}$. Let the loss function be defined as follows.

                        a1     a2
  p1 = 1/4               1      4
  p2 = 1/2               3      2

The set of decision rules includes four functions $\delta_{1}, \delta_{2}, \delta_{3}, \delta_{4}$, defined by

$$\delta_{1}(0) = \delta_{1}(1) = a_{1}; \qquad
\delta_{2}(0) = a_{1},\ \delta_{2}(1) = a_{2}; \qquad
\delta_{3}(0) = a_{2},\ \delta_{3}(1) = a_{1}; \qquad
\delta_{4}(0) = \delta_{4}(1) = a_{2}$$

The risk function takes the following values:

   i     R(p1, delta_i)    R(p2, delta_i)    max over p
   1            1                 3               3
   2           7/4               5/2             5/2    <- min over i of the max
   3          13/4               5/2            13/4
   4            4                 2               4

Thus the minimax solution is

$$\delta_{2}(x) = \begin{cases} a_{1} & \text{if } x = 0 \\ a_{2} & \text{if } x = 1 \end{cases}$$


The General Statistical Simulation Procedure (SSP) and Bootstrap Methods

Introduction to SSP and Bootstrap Method


If we know the distribution function F ( ⋅) of a random variable X and wish to evaluate some function of it, say

θ ( F ) , we can proceed in two ways: (1) evaluate θ ( F ) exactly; and (2) simulate to estimate θ ( F ) .

For example: Suppose that X is a random variable which is N ( 0, 1 ) so that F ( x ) = Φ ( x ) , and that we wish to know the 8th moment of X , so that θ ( F ) = ∫_{−∞}^{∞} x^8 dΦ ( x ) . Then one way we may proceed is to (1) evaluate the integral exactly.

If the integral involved in the preceding method is such that no simple way to evaluate it exists, a second way we

may proceed is (2) simulate to estimate θ ( F ) . Generating random variables X 1 , X 2 , " , X n that are N ( 0, 1) and

independent, we can estimate θ ( F ) using the principles of estimation. Thus, we might estimate θ ( F ) by

( X_1^8 + X_2^8 + ⋯ + X_n^8 ) / n , which is a consistent estimator.

The preceding discussion assumed that we knew the distribution function F ( ⋅) . However, often we have not a

known distribution function F ( ⋅) for which we wish to know θ ( F ) , but rather a random sample X 1 , X 2 , " , X n

drawn from distribution function F ( ⋅) with F ( ⋅) unknown, and wish to estimate θ ( F ) .

Now suppose we are taking approach (2) and wish to specify the variance of our estimator. With F ( ⋅) known, we

can proceed as follows to solve this problem.

Step 1 : Generate X 1 , X 2 , " , X n and estimate θ ( F ) ; call the estimate θˆ1 .

Step 2 : Generate X n +1 , X n + 2 , " , X 2 n and estimate θ ( F ) from these new random variables which are to be

independent of all random variables previously generated; call the estimate θˆ2 .

Step 3 : Generate X 2 n +1 , X 2 n + 2 , " , X 3n and estimate θ ( F ) from these new random variables which are to be

independent of all random variables previously generated; call the estimate θˆ3 .

# # # # # #

Step N : Generate X ( N −1) n +1 , X ( N −1) n + 2 , " , X Nn and estimate θ ( F ) from these new random variables which

are to be independent of all random variables previously generated; call the estimate θˆN .

Then θˆ1 , θˆ2 , θˆ3 , " , θˆN are N independent and identically distributed random variables each estimating θ ( F ) .

Their variance may be estimated by

σ̂ ² = ∑_{i=1}^{N} ( θ̂_i − θ̄ )² / ( N − 1 ) ,   where   θ̄ = ( θ̂_1 + θ̂_2 + ⋯ + θ̂_N ) / N

The General Statistical Simulation Procedure (SSP) and Bootstrap Methods ~ 1 of 3


And the variance of the estimator θ̄ is estimated by σ̂ ² / N . We call this procedure the Statistical Simulation Procedure SSP (θ , F , n, N ) .
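For concreteness, here is a small Python sketch of SSP (θ , F , n, N ) for the example above, θ ( F ) = E ( X^8 ) with X ~ N ( 0, 1 ) (whose exact value is 7·5·3·1 = 105); the values of n and N , the random seed and the variable names are illustrative choices of ours, and numpy is assumed to be available.

# Sketch of SSP(theta, F, n, N): N independent batches of size n, each giving one estimate of theta(F).
import numpy as np

rng = np.random.default_rng(0)
n, N = 1000, 200

theta_hats = np.array([np.mean(rng.normal(size=n) ** 8) for _ in range(N)])   # theta_hat_1, ..., theta_hat_N

theta_bar = theta_hats.mean()                      # combined estimate of theta(F)
sigma2_hat = theta_hats.var(ddof=1)                # estimates Var(theta_hat_i)
print(theta_bar, sigma2_hat / N)                   # Var(theta_bar) is estimated by sigma2_hat / N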

Now, if F ( ⋅) is unknown, the SSP (θ , F , n, N ) cannot be used. However, then one will have a random sample

X 1 , X 2 , " , X n taken from the unknown distribution function F ( ⋅) . Using the random sample, the distribution

function may be estimated, say by some estimator F̂ , and then Step 1, Step 2, " , Step N followed. We call this

(
the General Statistical Simulation Procedure SSP θ , Fˆ , X 1 , " , X n , N . )

( )
There are many ways to choose the estimate F̂ of F in the SSP θ , Fˆ , X 1 , " , X n , N . One of the very simplest

is to take F̂ to be the empiric distribution function based on the sample X 1 , X 2 , " , X n . If that is done, then the

procedure is called the bootstrap procedure.

Bootstrap Sampling
Bootstrap sampling is a method of selecting a sample of size n with replacement from a set of n data points X1 , X2 , ⋯ , Xn . This is equivalent to recording the value of each data point on a ping-pong ball and placing the balls in a box; a ball is then drawn at random, its value recorded and the ball replaced, and this is repeated n times. Drawing n times with replacement maintains the original sample size of n . With the bootstrap method the basic sample is treated as the population. Thus the bootstrap estimation procedure consists of the following steps.

Step 1 : Using the original data set calculate some statistic of interest to estimate the characteristics of
population of interest. Call this B0 .

Step 2 : Take a Bootstrap sample of size n from the original data set which produces a new data set

X 1* , X 2* , " , X n* . Calculate some statistic of interest to estimate the characteristic of your population of

interest and call this B1 .

Step 3 : Then we have to repeat N times the step-2 and we will produce B1 , B2 , " , BN .

Step 4 : Sort B1 , B2 , " , BN from smallest to largest.

Step 5 : We can estimate the bias of our original estimator by B̄ − B0 , where B̄ = ( B1 + B2 + ⋯ + BN ) / N .
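A minimal Python sketch of Steps 1 to 5, written for a generic statistic (the sample median is used purely for illustration); the helper name bootstrap, the sample, the seed and the value of N are our own choices, and numpy is assumed to be available.

# Sketch of the bootstrap steps above for an arbitrary statistic.
import numpy as np

def bootstrap(data, stat, N=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    B0 = stat(data)                                                # Step 1: statistic on the original data
    B = np.array([stat(rng.choice(data, size=n, replace=True))     # Steps 2-3: N bootstrap replicates
                  for _ in range(N)])
    B.sort()                                                       # Step 4: sorted replicates (for percentiles)
    bias_hat = B.mean() - B0                                       # Step 5: estimated bias of B0
    return B0, B, bias_hat

x = np.random.default_rng(1).exponential(size=30)
B0, B, bias_hat = bootstrap(x, np.median)
print(B0, bias_hat)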

Uses of Bootstrapping Method:


• This method is used for computation of sampling distribution of any statistic.
• This is very good for confidence interval and bias estimation but not for point estimation.
• This method is able to estimate measures of variability.
• It is able to calculate power.
• It can be employed in nonparametric and in parametric inference.
• When the data size is very small we use bootstrap sampling for increasing the data size.

Example: Estimating the Standard Error of X .


Let θ = E ( X ) and σ 2 = Var ( X ) . Then from a random sample X 1 , X 2 , " , X n with the same distribution function

as X , we find X̄ , and it has mean θ and V ( X̄ ) = σ ² / n .

The General Statistical Simulation Procedure (SSP) and Bootstrap Methods ~ 2 of 3


The bootstrap method of estimating V X ( ) (
is the SSP θ , Fˆ , X1 , " , X n , N ) with F̂ taken to be the empiric

distribution function and proceeds as follows:

Step 1 : Take a sample of size n (with replacement) from { X1 , X2 , ⋯ , Xn } , say { X11 , X12 , ⋯ , X1n } , and
calculate its sample mean X1 .
Step 2 : Repeat step 1 independently N − 1 additional times, finding X1 , X 2 , " , X N . The bootstrap estimate of
V ( X ) is
σ̂ ² = ∑_{i=1}^{N} ( X̄_i − X̄_⋅ )² / ( N − 1 ) ,   where   X̄_⋅ = ( X̄_1 + X̄_2 + ⋯ + X̄_N ) / N
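The following Python sketch carries out these two steps (the sample, the seed and the value of N are arbitrary choices made here only for illustration) and compares the bootstrap estimate of V ( X̄ ) with the usual s² / n :

# Sketch of the bootstrap estimate of V(X_bar) described above.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=40)          # plays the role of the observed sample X_1, ..., X_n
n, N = len(x), 2000

xbar_boot = np.array([rng.choice(x, size=n, replace=True).mean() for _ in range(N)])
v_boot = xbar_boot.var(ddof=1)                       # sum (X_bar_i - X_bar_dot)^2 / (N - 1)

print(v_boot, x.var(ddof=1) / n)                     # compare with the usual estimate s^2 / n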

Example: Estimating Bias

Suppose that, based on a random sample X 1 , X 2 , " , X n some quantity θ of interest is estimated by θˆ . The

estimator θ̂ has some bias b = E ( θ̂ ) − θ . To estimate the bias, consider use of SSP ( θ , F̂ , X1 , ⋯ , Xn , N ) with F̂ taken to be the empiric distribution function (so we have a bootstrap estimate). Based on N bootstrap samples of size n each, one finds the estimators θ̂1 , θ̂2 , ⋯ , θ̂N with θ̄ = ( θ̂1 + θ̂2 + ⋯ + θ̂N ) / N and estimates the bias of θ̂ by b̂ = θ̄ − θ̂ .
Example:
Let X 1 , X 2 , " , X n be a random sample of size n from a Poisson distribution with unknown mean λ . If the

parameter of interest is θ = P ( X ≤ 1 ) = e^(−λ) ( 1 + λ ) , the MLE is e^(−X̄) ( 1 + X̄ ) , which is biased. To reduce the bias, let

us investigate the bootstrap method.

Let X ij ( i = 1, " , N ; j = 1, " , n ) be the N bootstrap samples, that is, samples taken at random with

replacement from { X1 , X 2 , " , X n } and for i = 1, " , N

θ̂_i = e^(−X̄_i) ( 1 + X̄_i ) − ( Number of X_{i1} , X_{i2} , ⋯ , X_{in} that are ≤ 1 ) / n
Then the bootstrap estimate of the bias of θ̂ is simply b̂ = θ̄ , the average of θ̂_1 , ⋯ , θ̂_N .
Then, one might use e^(−X̄) ( 1 + X̄ ) − b̂ to estimate θ .
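A Python sketch of this bias correction; the Poisson mean, the sample size, the seed and the value of N are invented here purely for illustration, and numpy is assumed to be available.

# Bootstrap bias correction for theta = P(X <= 1) = e^(-lambda)(1 + lambda), Poisson example above.
import numpy as np

rng = np.random.default_rng(3)
x = rng.poisson(lam=2.0, size=25)                    # plays the role of the observed sample
n, N = len(x), 2000

mle = np.exp(-x.mean()) * (1 + x.mean())             # e^(-X_bar)(1 + X_bar)

# theta_hat_i = e^(-X_bar_i)(1 + X_bar_i) - (proportion of bootstrap values <= 1)
theta_hats = np.empty(N)
for i in range(N):
    xb = rng.choice(x, size=n, replace=True)
    theta_hats[i] = np.exp(-xb.mean()) * (1 + xb.mean()) - np.mean(xb <= 1)

b_hat = theta_hats.mean()                            # bootstrap estimate of the bias
print(mle, mle - b_hat)                              # MLE and its bias-corrected version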

Remark:
Note that an approximate 100 (1 − α ) % confidence interval for θ can be constructed using bootstrap methods, as

follows. If θ̄ is the bootstrap estimate of θ and σ̂ ² its sample variance based on θ̂1 , θ̂2 , ⋯ , θ̂N , then for N large we

take the interval

{ θ̂ − Φ^{−1} ( 1 − α/2 ) σ̂ ,   θ̂ + Φ^{−1} ( 1 − α/2 ) σ̂ }
Note that we use the original estimate of θ (not the bootstrap estimate), and the bootstrap procedure has been used

to provide us with an estimate of variability for θˆ .

Note that the exact same details apply to the more general statistical simulation procedure

( )
SSP θ , Fˆ , X1 , " , X n , N , in which the only difference is what estimate of F is being sampled from.

The General Statistical Simulation Procedure (SSP) and Bootstrap Methods ~ 3 of 3


Estimation & Confidence Interval

Simultaneous Estimation of Several Parameters


Suppose we have vector of parameters θ = (θ1 , θ 2 , , θ k ) ; k − parameters . Our problem is to estimate the

parameters simultaneously. Let fθ ( x1 , x2 , , xn ) = L ( x;θ ) = joint distribution or LE

We have the following regularity conditions:

i) Θ is a non-degenerate open interval in R k .


ii) For all most all x ' s ( all θ ∈ Ω )
δ fθ ( x1 , x2 , , xn ) δ L ( x;θ )
( i = 1, 2, , k) = exists
δθi δθi
iii) ( δ / δθ_i ) ∫_A L ( x ; θ ) dx = ∫_A ( δ / δθ_i ) L ( x ; θ ) dx   for i = 1, 2, ⋯ , k

where, A = domain of positive probability density

iv) ( δ / δθ_j ) ∫_A t_i L ( x ; θ ) dx = ∫_A t_i ( δ / δθ_j ) L ( x ; θ ) dx ,   where t_i is the estimator of θ_i

⎡ δ ln L ( x, θ ) δ ln L ( x θ ) ⎤
( )
v) The elements of the matrix ∆θ = δ ij (θ ) , where δ ij (θ ) = Eθ ⎢
δθi δθ j
⎥ , exist and are such
⎣⎢ ⎦⎥
that ∆θ is positive definite.

Theorem
In any regular estimation case, the variances and covariances σ_ij (θ ) of unbiased estimators Ti ( i = 1, 2, ..., k ) for θi ( i = 1, 2, ..., k ) respectively satisfy the inequality,


u′ ∑θ u ≥ u′∆θ−1u
where, ∑θ = (σ ij (θ ) ) and u is an arbitrary vector of real parameters.

Proof
δ δ
Let the same symbol λi (θ ) be used to denote log fθ ( X1 , , X n ) as well as log fθ ( x1 , , xn ) . In this
δθi δθi
situation, condition (iii) becomes

∫ λi (θ ) fθ ( x1 , , xn ) dx = 0, (i )
A

and condition (iv) leads to,

∫_A t_i λ_j (θ ) fθ ( x1 , ⋯ , xn ) dx = 1 if i = j , and 0 otherwise      ( ii )

since Ti is unbiased for θi .

Estimation & Confidence Interval ~ 1 of 17


k k
Let u1 , , uk be real numbers. Since Ti is unbiased for θi , ∑ uiTi is unbiased for ∑ uiθi ; i.e.,
i =1 i =1
k k

∫∑
i =1
ui ti fθ ( x1 , , xn ) dx = ∑ uiθi
i =1
( for all θ ∈ Θ )
A

This gives, on being differentiated with respect to θ j ,

∫∑
i =1
ui ti λ j (θ ) fθ ( x1 , , xn ) dx = u j because of ( ii )
A

k
Because of ( i ) again, ∫ ∑ ui [ti − θi ] λ j (θ ) fθ ( x1 , , xn ) dx = u j
A i =1

Taking another set of real numbers, c1 , c2 , , ck , we have


k k k k k k

∫ ∑∑ ui [ti − θi ] c j λ j (θ ) fθ ( x1 , , xn ) dx = ∑ c j u j , i.e., ∑∑ ui c j covθ (Ti , λ j (θ ) ) = ∑ c j u j


A i =1 j =1 j =1 i =1 j =1 j =1

k k
Noting that the left hand side is the covariance between ∑ uiTi and ∑ ci λi (θ ) , we have, since
i =1 i =1

2
⎡ ⎛ k k ⎞⎤ ⎛ k ⎞ ⎛ k ⎞
⎢ covθ ∑ ∑
⎜⎜ uiTi , ci λi (θ ) ⎟⎟ ⎥ ≤ Varθ ∑
⎜⎜ uiTi ⎟⎟ Varθ ∑
⎜⎜ ci λi (θ ) ⎟⎟
⎣⎢ ⎝ i =1 i =1 ⎠ ⎦⎥ ⎝ i =1 ⎠ ⎝ i =1 ⎠
2
⎛ k ⎞
⎛ k ⎞
⎜⎜ ci ui ⎟⎟ ∑
⎝ i =1 ⎠
⇒ ∑
varθ ⎜⎜ uiTi ⎟⎟ ≥
⎛ k ⎞
⎝ i =1 ⎠ var
θ ⎜⎜ ci λi (θ ) ⎟⎟∑
⎝ i =1 ⎠

Let us now maximize the right-hand side with respect to the c ' s . Noting that the right hand side remains unchanged
if the c ' s are a multiplied by a common number and that the maximizing c ' s must be such that the correlation

between ∑ uiTi and ∑ ci λi (θ ) is a maximum (i.e., unity)


i i

k k
∑ ui [ti − θi ] = ∑ ci λi (θ )
i =1 i =1
k k k
⇒ uj ≡ ∑ ui covθ (Ti , λ j (θ ) ) = ∑ ci covθ ( λi (θ ) , λ j (θ ) ) ≡ ∑ ciδ ij (θ )
i =1 i =1 i =1

Hence the maximizing c ' s are such that (in matrix notation)

∆θ c = u
⇒ c = ∆θ−1u ( because of (V ) ) ,
⎛ k ⎞
and c′u = u ′∆θ−1u , ∑
varθ ⎜⎜ ci λi (θ ) ⎟⎟ = c′∆θ c = u ′∆θ−1u
⎝ i =1 ⎠
Hence
⎛ k ⎞

varθ ⎜⎜ uiTi ⎟⎟ ≥ u ′∆θ−1u
⎝ i =1 ⎠
u ′Σθ u ≥ u ′∆θ−1u

Estimation & Confidence Interval ~ 2 of 17


Problem: Consider the case of a random sample from a normal population whose mean (θ1 ) and variance (θ 2 ) are both

unknown.

1 ⎛ ( x −θ1 ) ⎞
2
− ⎜ ⎟
1 2 ⎜ θ2 ⎟
Here, f ( x) = e ⎝ ⎠ ; −∞ < x < ∞
2πθ 2
Now the likelihood function is as follows,
n
⎛ 1 ⎞ − 2θ ∑ ( xi −θ1 )
1 2

L ( x, θ1 ,θ 2 ) = ⎜ ⎟ e 2
⎜ 2πθ ⎟
⎝ 2 ⎠


n 1
⇒ ln L = − ln 2πθ 2 − ( xi − θ1 )2 (i )
2 2θ 2
Now we differentiate it with respect to θ1 and get,

δ ln L ( x )
∑ ( xi − θ1 )( −1)
2
=−
δθ1 2θ 2

∑ ( xi − θ1 )
1
=
θ2
⎡ δ ln L ( x ) ⎤
2

⎥ = 2 E ⎣⎡ ∑ ( xi − θ1 ) ⎦⎤ = 2 ∑ ⎡⎣ E ( xi − θ1 ) ⎤⎦
1 2 1 2
and E⎢
⎣ δθ 1 ⎦ θ2 θ2
1 n
= 2 nθ 2 =
θ2 θ2
Again we differentiate eq ( i ) with respect to θ 2 and we get,

δ ln L ( x ) ⎡ 2⎤
∑ ( xi − θ1 ) ∑ ( xi − θ1 )
n 1 1 2 1 1
=− + = ⎢−n + ⎥
δθ 2 2 θ 2 2θ 22 2θ 2 ⎣ θ 2 ⎦
⎡ δ ln L ( x ) ⎤
2
n
E⎢ ⎥ = 2
⎣ δθ 2 ⎦ 2θ 2
E [ ( δ ln L ( x ) / δθ_1 ) ( δ ln L ( x ) / δθ_2 ) ] = 0
Hence the lower bounds to the variances of unbiased estimators of θ1 and θ 2 are θ 2 / n and 2θ 2² / n respectively.

The traditional unbiased estimators for θ1 and θ 2 are X̄ and S ² = ∑ ( X_i − X̄ )² / ( n − 1 ) . Since Varθ [ X̄ ] = θ 2 / n while Varθ [ S ² ] = 2θ 2² / ( n − 1 ) , the lower bound in the first case is attained but not that in the second.

Vector of Parameters
Let us assume that a random sample X 1 , X 2 , , X n of size n form the density f ( x ; θ1, θ 2 , , θ k ) is available,

where the parameter θ = (θ1 , θ 2 , , θ k ) and parameter space Θ are k − dimensional. We want to simultaneously

estimate τ1 (θ ) , τ 2 (θ ) , , τ r (θ ) , where τ j (θ ) , j = 1, , r is some function of θ = (θ1 , , θ k ) . Often k = r ,

but this need not be the case. An important special case is the estimation of θ = (θ1 , θ 2 , , θ k ) itself; then r = k ,

and τ1 (θ ) = θ1 , , τ k (θ ) = θ k . Another important special case is the estimation of τ (θ ) ; then r = 1 . A point

(
estimator of τ1 (θ ) , , τ r (θ ) ) is a vector of statistics, say (T1 , , Tr ) , where T j = t j ( X1 , , X n ) and T j is an

estimator of τ j (θ ) .

Estimation & Confidence Interval ~ 3 of 17


Unbiased
An estimator (T1 , , Tr ) , where T j = t j ( X1 , , X n ) ; j = 1, , r , is defined to be an unbiased estimator of

(τ1 (θ ) , , τ r (θ ) ) if and only if εθ ⎡⎣T j ⎤⎦ = τ j (θ ) for j = 1, , r and for all θ ∈ Θ .

For single estimator, we consider the variance of estimator as a member of its closeness to real valued function of
population parameter. Here, we seek generalization of the notion of variance to r dimensions. Several such
generalization have been proposed; we consider here only four of them
i) Vector of variances.
ii) Linear combination of variances.
iii) Ellipsoid of concentration.
iv) Wilks’ generalized variance.

1. Vector of Variances
Let the vector ( varθ [T1 ] , , var [Tr ]) be a measure of the closeness of the estimator (T1 , , Tr ) to
(τ1 (θ ) , , τ r (θ ) ) . Its main advantage is that it is very easy and simple. And the disadvantage of such a definition
is that our measure is vector-valued and consequently sometimes difficult to work.

2. Linear Combination of Variances


One way to overcome the disadvantage faced in method (1) is to use linear combinations of variances, that is, to measure the closeness of the estimator ( T1 , ⋯ , Tr ) to ( τ1 (θ ) , ⋯ , τ r (θ ) ) by ∑_{j=1}^{r} a_j Varθ [ T_j ] for suitably chosen a_j ≥ 0 .

Both of these generalizations (1) and ( 2 ) of the variance embody only the variances of the T_j , j = 1, ⋯ , r . But the T_j are likely to be correlated, so one should also incorporate the covariances of the T_j 's when measuring closeness.

3. Ellipsoid of Concentration
Let (T1 , , Tr ) be an unbiased estimator of (τ1 (θ ) , , τ r (θ ) ) . Let σ ij (θ ) be the ij − th element of the inverse

of the covariance matrix of (T1 , , Tr ) , where the ij − th element of the covariance matrix is

σ ij (θ ) = covθ ⎡⎣Ti , T j ⎤⎦ . The ellipsoid of concentration of (T1 , , Tr ) is defined as the interior and boundary of the

ellipsoid
r r
∑∑ σ ij (θ ) ⎡⎣ti − τ i (θ )⎤⎦ ⎡⎣t j − τ j (θ )⎤⎦ = r + 2
i =1 j =1

The ellipsoid of concentration measures how concentrated the distribution of (T1 , , Tr ) is about

(τ1 (θ ) , , τ r (θ ) ) . The distribution an estimator (T1 , , Tr ) whose ellipsoid of concentration is contained within

the ellipsoid of concentration of another estimator (T1′, , Tr′ ) is more highly concentrated about

(τ1 (θ ) , , τ r (θ ) ) than is the distribution of (T1′, , Tr′ ) .

4. Wilks’ Generalized Variance


Let (T1 , , Tr ) be an unbiased estimator of (τ1 (θ ) , , τ r (θ ) ) . Wilk’s generalized variance of (T1 , , Tr ) is

defined to be determinant of the covariance matrix of (T1 , , Tr ) .


Estimation & Confidence Interval ~ 4 of 17
Risk Function

Rd (θ ) = Expected loss = E ⎡⎣ w (θ − d ( x ) ) ⎤⎦
= Smaller should be desired
= Smaller the risk better the estimator

Minimax Estimator
If a random variable X as a density function f (θ ; x ) and d ( x ) is some estimate of θ then the risk function is

Rd (θ ) = E ⎡⎣ w (θ , d ( x ) ) ⎤⎦

A minimax estimator d ( x ) is any estimator which minimize the supremum Sup Rd (θ ) .


θ

Properties of Minimax Estimator


i) If T * = t * ( X1 , X2 , ⋯ , Xn ) is a Bayes estimator having constant risk, i.e., R_{t*} (θ ) = constant , then T * is a minimax estimator.

ii) If t_n′ dominates a minimax estimator t_n* , then t_n′ is also minimax.

iii) If an estimator has constant risk and is admissible it is minimax.

Properties of Admissible Estimator


i) If the loss function L is strictly convex, then every admissible estimator must be nonrandomized.
ii) If L is strictly convex, T is an admissible estimator, and T ′ is another estimator with the same risk, i.e. R (θ , T ) = R (θ , T ′ ) for all θ , then T = T ′ with probability 1 .

iii) Any unique Bayes estimator is admissible (here uniqueness means that any two Bayes estimators agree except on a set N with Pθ ( N ) = 0 for all θ ∈ Θ ).

Problem: If x1 , ⋯ , xn are n independent normal random variables with distribution N (θ , 1 ) and the loss function is the squared error, find the minimax estimator of the mean θ .

Solution
Consider a sequence of normal prior distributions for θ with mean 0 and variance σ ² . The posterior density is P (θ | x1 , ⋯ , xn ) = p (θ , x1 , ⋯ , xn ) / p ( x1 , ⋯ , xn ) ; take n = 1 , x = x1 , so that

p (θ | x ) =
( )
N ( x ; θ , 1) N θ ; 0, σ 2

∫ N ( x ; θ , 1) N (θ ; 0, σ ) dθ
2

−∞

= √( ( 1 + σ ² ) / ( 2πσ ² ) ) exp[ − ( ( 1 + σ ² ) / 2σ ² ) ( θ − σ ² x / ( 1 + σ ² ) )² ]

E (θ | x ) = d ( x ) = σ ² x / ( 1 + σ ² )
Estimation & Confidence Interval ~ 5 of 17
V (θ | x ) = σ ² / ( 1 + σ ² ) ,  so  Sup_σ V (θ | x ) = Sup_σ [ 1 / ( 1 + 1/σ ² ) ] , which tends to 1 as σ → ∞ ,
and  lim_{σ→∞} d ( x ) = lim_{σ→∞} σ ² x / ( 1 + σ ² ) = lim_{σ→∞} x / ( 1 + 1/σ ² ) = x .
Hence x is a minimax estimator.

Problem: Find the minimax estimator of θ in sampling from the Bernoulli distribution using a squared error loss function.

Solution
A Bayes estimator is given by
= ∫_0^1 θ ⋅ θ^{∑xi} ( 1 − θ )^{n−∑xi} ( 1 / B ( a, b ) ) θ^{a−1} ( 1 − θ )^{b−1} dθ  /  ∫_0^1 θ^{∑xi} ( 1 − θ )^{n−∑xi} ( 1 / B ( a, b ) ) θ^{a−1} ( 1 − θ )^{b−1} dθ
= ∫_0^1 θ^{∑xi + a} ( 1 − θ )^{n−∑xi + b − 1} dθ  /  ∫_0^1 θ^{∑xi + a − 1} ( 1 − θ )^{n−∑xi + b − 1} dθ
= B ( ∑xi + a + 1 , n − ∑xi + b ) / B ( ∑xi + a , n − ∑xi + b )
= ( ∑xi + a ) / ( n + a + b ) ,   using B ( p + 1, q ) / B ( p, q ) = p / ( p + q ) .
So, the Bayes estimator with respect to a beta prior distribution with parameters a and b is given by

t * ( x1 , x2 , , xn ) =
∑ xi + a = ∑ xi +
a
(i )
n+a+b n+a+b n+a+b
⎡ ⎤
∑ xi + B
1 a
⇒ t *AB ( x1 , x2 , , xn ) = A ⎢ A = n + a + b and B = n + a + b ⎥
⎣ ⎦

Risk of the estimator ,

ℜ_{t*_{AB}} (θ ) = E [ ( A ∑ xi + B − θ )² ]
= E [ { A ( ∑ xi − nθ ) + ( B − θ + nAθ ) }² ]
= A² E [ ( ∑ xi − nθ )² ] + ( B − θ + nAθ )² + 2 A ( B − θ + nAθ ) { ∑ E ( xi ) − nθ }
= A² n θ ( 1 − θ ) + ( B + θ ( nA − 1 ) )²

Estimation & Confidence Interval ~ 6 of 17


= A² n θ − A² n θ ² + B² + θ ² ( nA − 1 )² + 2 ( nA − 1 ) B θ
= θ ² [ ( nA − 1 )² − nA² ] + θ [ nA² + 2 ( nA − 1 ) B ] + B²
This will be constant in θ if ( nA − 1 )² − nA² = 0 and nA² + 2 ( nA − 1 ) B = 0 .

From ( nA − 1 )² − nA² = 0 we get A² ( n² − n ) − 2nA + 1 = 0 , so
A = [ 2n ± √( 4n² − 4 ( n² − n ) ) ] / [ 2 ( n² − n ) ] = ( n ± √n ) / [ ( n + √n ) ( n − √n ) ] = 1 / ( n ∓ √n ) = 1 / [ √n ( √n ∓ 1 ) ]

Again, nA² + 2 ( nA − 1 ) B = 0 gives B = nA² / [ 2 ( 1 − nA ) ] .
For A = 1 / [ √n ( √n + 1 ) ] we have nA² = 1 / ( √n + 1 )² and 1 − nA = 1 / ( √n + 1 ) , hence
B = 1 / [ 2 ( √n + 1 ) ]
Now, A = 1 / ( n + a + b ) , i.e. 1 / [ √n ( √n + 1 ) ] = 1 / ( n + a + b )
⇒ n + a + b = √n ( √n + 1 ) = n + √n
⇒ a + b = √n
Again, B = a / ( n + a + b ) , i.e. 1 / [ 2 ( √n + 1 ) ] = a / ( n + a + b )
⇒ n + a + b = 2a ( √n + 1 )
⇒ n + √n = 2a ( √n + 1 )
∴ a = ( n + √n ) / [ 2 ( √n + 1 ) ] = √n ( √n + 1 ) / [ 2 ( √n + 1 ) ] = √n / 2

So our estimator is
t* ( x1 , x2 , ⋯ , xn ) = ( ∑ xi + a ) / ( n + a + b ) = ( ∑ xi + √n / 2 ) / ( n + √n )

So, this is the Bayes estimator with constant risk. Hence it is the Minimax estimator.
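A quick numerical check of the constant-risk property (our own computation, using scipy's binomial pmf; the value of n is arbitrary). Working out the algebra above, the constant value of the risk is 1 / [ 4 ( √n + 1 )² ] .

# Exact risk E[(t* - theta)^2] of t*(x) = (sum x_i + sqrt(n)/2) / (n + sqrt(n)) for several theta.
import numpy as np
from scipy.stats import binom

n = 10
k = np.arange(n + 1)
t_star = (k + np.sqrt(n) / 2) / (n + np.sqrt(n))

for theta in (0.1, 0.3, 0.5, 0.7, 0.9):
    risk = np.sum(binom.pmf(k, n, theta) * (t_star - theta) ** 2)
    print(theta, risk)        # the same value 1/(4 (sqrt(n)+1)^2) for every theta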
Estimation & Confidence Interval ~ 7 of 17
Bayesian Confidence Interval

Example: Assuming each item coming off a production line either is or is not defective. So, we can call each item a
Bernoulli trial. Assume again the trials are independent with P ⎣⎡defective⎦⎤ = θ for each trial.

If we select n times from the production line and then,

⎧1 if the item i is defective


Let , X i = ⎨
⎩0 if the item i is not defective
The X1, X 2 , , X n is a random sample of a random variable x with parameter θ . We know that the conjugate prior

of θ is a beta density with parameters a and b .

For example, for our production line suppose our prior information suggest
E ⎡⎣θ ⎤⎦ = 0.01 Var ⎡⎣θ ⎤⎦ = 0.0001

The larger we take Var (θ ) , the less sure we are of our prior information. Thus we determine a and b :

a / ( a + b ) = 0.01   ⇒   a = 0.0101 b
a b / [ ( a + b )² ( a + b + 1 ) ] = 0.0001   ⇒   0.0101 b² = 0.0001 ( 0.0101b + b )² ( 0.0101b + b + 1 )
Solving these gives a = 0.98 and b = 97.02 (approximately).
Now, if we observe ∑ X i = ∑ xi from the sample we observe that posterior distribution for θ is again a beta

distribution with parameters a + ∑ xi & b + n − ∑ xi

(the posterior kernel being proportional to θ^{ a + ∑xi − 1 } ( 1 − θ )^{ b + n − ∑xi − 1 } ).
Thus the Bayes estimator of θ is the mean of this posterior distribution, i.e.,
θ * = E [ θ | ∑ xi ] = ( a + ∑ xi ) / ( a + ∑ xi + b + n − ∑ xi ) = ( a + ∑ xi ) / ( a + b + n )
For our production line, with a = 0.98 and b = 97.02 , the Bayes estimator of θ is ( ∑ xi + 0.98 ) / ( n + 97.02 + 0.98 ) .
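A short Python sketch of this update; the prior parameters come from the calculation above, while the sample size and the number of observed defectives are hypothetical values chosen only to illustrate the call (scipy assumed available).

# Beta-binomial posterior for the production-line example.
from scipy.stats import beta

a, b = 0.98, 97.02                 # prior with mean 0.01 and variance about 0.0001
n, defectives = 200, 3             # hypothetical data, for illustration only

post = beta(a + defectives, b + n - defectives)
print(post.mean())                                   # Bayes estimator (sum x_i + 0.98) / (n + 98)
print(post.ppf(0.025), post.ppf(0.975))              # a 95% Bayesian interval for theta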

Bayesian Interval
Given a random sample of a random variable, a confidence interval can be evaluated, and in a sense we are
100 (1 − α ) % sure that the observed confidence interval covers the true unknown parameter value. Very similar

manipulation can be accomplished with the Bayesian approach. Suppose we are given a random sample of a
random variable x whose distribution depends on unknown parameter θ . The parameter θ has a prior density

fθ (θ ) . Once the sample values x1, x2 , , xn are known, we can compute the posterior distribution fθ | x (θ | x ) which

summarizes all the current information about θ then if c1 < c2 are two constants

p ⎣⎡c1 ≤ θ ≤ c2 | x ⎦⎤ = 1 − α

We are 100 (1 − α ) % sure that ( c1, c2 ) includes θ given the sample values. We will call such an interval ( c1, c2 ) a

100 (1 − α ) % Bayesian interval for θ .

Estimation & Confidence Interval ~ 8 of 17


Approximation confidence interval in large samples:
We know that under certain regularity conditions MLE ' s are asymptotically normal with mean θ and asymptotic
variance
1 1
σ n2 (θ ) = =−
⎡ δ ln f ( x,θ ) ⎤
2 ⎡ δ 2 ln f ( x,θ ) ⎤
nEθ ⎢ ⎥ n Eθ ⎢ ⎥
⎣⎢ δθ ⎦⎥ ⎢⎣ δθ 2 ⎥⎦

tn − θ
When such asymptotically normal estimation exists then may be taken as a pivotal quantity and a
σ n (θ )

100 (1 − α ) % C.I of θ may be taken as approximately

⎡ ⎤
⎢⎣Tn + zα 2σ n (θ ) , Tn + z1−α σ n (θ ) ⎥
2 ⎦

( tn − θ )
The above method provides a large sample confidence interval so long as can be inverted.
σ n (θ )

Example: Let X1, ( )


, X n be a random sample drawn from N 0, σ 2 . Here, θ = σ 2 . Find central C.I for σ 2 with an

approximate confidence coefficient 1 − α .

Solution:
The probability density function is  f ( x ; 0, σ ² ) = ( 1 / √( 2πσ ² ) ) exp[ − x² / ( 2σ ² ) ]

The likelihood function is given by  L = ( 2πσ ² )^{−n/2} exp[ − ∑ xi² / ( 2σ ² ) ] = ( 2πθ )^{−n/2} exp[ − ∑ xi² / ( 2θ ) ] ,  where θ = σ ² .

Now taking logs on both sides,
ln L = − ( n/2 ) ln 2π − ( n/2 ) ln θ − ∑ xi² / ( 2θ )
Setting δ ln L / δθ = 0 ,
− n / ( 2θ ) + ∑ xi² / ( 2θ ² ) = 0   ⇒   θ̂ = ∑ xi² / n
Again,

δ 2 ln L n
= 2 −2
xi ∑ 2

δθ 2
2θ 2θ 3
⎡ δ 2 ln L ⎤ n nθ n
E⎢ ⎥ = 2 −2 3 =−
⎣⎢ δθ ⎦⎥ 2θ 2θ 2θ 2
2

⎡ δ 2 ln L ⎤ n
∴ −E⎢ ⎥= 2
⎢⎣ δθ ⎥⎦ 2θ
2

We have,
1 2θ 2
σ n2 = =
⎡ δ ln L ⎤
2 n
−E ⎢ 2 ⎥
⎣⎢ δθ ⎦⎥

Estimation & Confidence Interval ~ 9 of 17


∴ A 100 (1 − α ) % confidence interval for σ ² is given by
[ ∑ xi² / n + z_{α/2} √( 2σ ⁴ / n ) ,   ∑ xi² / n + z_{1−α/2} √( 2σ ⁴ / n ) ]
= [ ∑ xi² / n + z_{α/2} √(2/n) θ ,   ∑ xi² / n + z_{1−α/2} √(2/n) θ ]        [ ∵ θ = σ ² ]
= [ ( ∑ xi² / n ) { 1 + z_{α/2} √(2/n) } ,   ( ∑ xi² / n ) { 1 + z_{1−α/2} √(2/n) } ]      [ ∵ θ̂ = ∑ xi² / n ]
Solving the corresponding pivotal inequalities for θ instead gives
∴ [ ( ∑ xi² / n ) / { 1 + z_{1−α/2} √(2/n) } ,   ( ∑ xi² / n ) / { 1 + z_{α/2} √(2/n) } ]
This C.I. is not invariant under transformation of the parameter: taking square roots of its endpoints does not give the corresponding C.I. for σ .
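A small Python sketch of the last form of the interval above; the sample, the seed and α are our own illustrative choices, and with z_{α/2} = − z_{1−α/2} the two denominators become 1 ± z_{1−α/2} √(2/n) .

# Approximate large-sample confidence interval for sigma^2, using theta_hat = sum(x_i^2)/n.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
x = rng.normal(scale=3.0, size=100)                  # N(0, sigma^2) sample with sigma^2 = 9
n, alpha = len(x), 0.05

theta_hat = np.sum(x ** 2) / n
z = norm.ppf(1 - alpha / 2)

lower = theta_hat / (1 + z * np.sqrt(2 / n))
upper = theta_hat / (1 - z * np.sqrt(2 / n))
print(theta_hat, (lower, upper))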

Now we will consider construction of Large Sample Confidence Intervals which are invariant under transformation of
parameter.

Suppose that p.d . f f ( x;θ ) is such that

⎡ δ ln f ( x;θ ) ⎤ ⎡ δ ln f ( x;θ ) ⎤ ⎡ δ 2 ln f ( x;θ ) ⎤


E⎢ ⎥=0 & V⎢ ⎥ = − E ⎢ ⎥ = k 2 ( say ) < ∞
⎣⎢ δθ ⎦⎥ ⎣⎢ δθ ⎦⎥ ⎣⎢ δθ 2
⎦⎥
n
Let X 1 , , X n be a random sample of size n drawn from f ( x;θ ) and L = ∏ f ( xi ; θ ) . Clearly then, each of the
i =1

∂ ln ( X i ; θ )
random variables ( i = 1, , n ) has mean zero and variance K 2 . Therefore, by the central limit
∂θ
theorem, their sample mean
δ ln L
1 δ ln L ⎛ k2 ⎞ δθ
~ N ⎜ 0, ⎟⎟ i.e., ~ N ( 0,1) (i )
n δθ ⎜ n
⎝ ⎠ ⎪⎧ ⎛ δ 2 ln L ⎞ ⎪⎫
⎨− E ⎜⎜ 2 ⎟⎟⎬
⎩⎪ ⎝ δθ ⎠ ⎭⎪
Using this property one can get a large sample C.I for θ . Note that the maximum likelihood estimate of θ has not
been used here.

Let, Φ (θ ) be the strictly increasing function of θ . Now,

δ ln L δ ln L δ Φ
=
δθ δ Φ δθ
⎡ δ 2 ln L ⎤ ⎡ δ 2 ln L ⎤ ⎛ δ Φ ⎞ 2 ⎛ δ ln L ⎞ δ 2 Φ
E⎢ ⎥ = E ⎢ 2 ⎥⎜ ⎟ +⎜ ⎟ 2
⎣⎢ δθ ⎦⎥ ⎣⎢ δ Φ ⎦⎥ ⎝ δθ ⎠ ⎝ δ Φ ⎠ δθ
2

⎡ δ 2 ln L ⎤ ⎛ δ Φ ⎞ 2 ⎡ δ 2 ln L ⎤ ⎡ δ ln L ⎤
Hence, E⎢ 2 ⎥ ⎜
= ⎟ E⎢ 2 ⎥ ⎢ Since, = 0⎥
⎢⎣ δθ ⎥⎦ ⎝ δθ ⎠ ⎢⎣ δ Φ ⎥⎦ ⎣ δΦ ⎦
Therefore, if (θ1 , θ 2 ) is a C.I for θ then {Φ (θ1 ) , Φ (θ 2 )} is C.I for Φ (θ ) .

Estimation & Confidence Interval ~ 10 of 17


Example: Let X1 , (
, X n be a random sample from a N 0, σ 2 population. Here )
∂ ln L / ∂σ = − n / σ + ∑ xi² / σ ³   and   E ( ∂ ² ln L / ∂σ ² ) = − 2n / σ ²
Hence  ( ∑ Xi² / σ ² − n ) / √( 2n ) ~ N ( 0, 1 ) approximately.
A central 100 (1 − α ) % confidence interval for σ is, therefore,

{ ( ∑ Xi² / ( n + z_{α/2} √( 2n ) ) )^{1/2} ,   ( ∑ Xi² / ( n − z_{α/2} √( 2n ) ) )^{1/2} }        (1)

If the variance σ 2 is treated as parameter, then the method yields 100 (1 − α ) % confidence interval for σ 2 as


⎪⎪
∑ X i2 ∑ X i2 ⎫
n , n ⎪⎪
⎨ ⎬
⎪1 + zα 2 1 − zα 2 ⎪
⎩⎪ 2 n 2 n ⎭⎪

It may be noted that the large sample confidence intervals based on maximum likelihood estimators will be shorter
on the average than the large sample confidence intervals based on any other estimator.

Confidence Belt
Let T be a statistic whose distribution depends on θ , preferably a sufficient statistic for θ . For each θ , let us

determine the values t1 (θ ) and t2 (θ ) such that

Pθ ⎡⎣T < t1 (θ ) ⎤⎦ = α1 and Pθ ⎡⎣T > t2 (θ ) ⎤⎦ = α 2

Where, α1 + α 2 = α . Supposing Θ is a

non-degenerate real interval, by varying

θ we shall get two curves from t1 (θ )

and t2 (θ ) . The first curve, C1 , has the

equation t = t1 (θ ) and the second, C ,

the equation t = t2 (θ ) . Let the two

curves be as in the following figure, so


that any line drawn perpendicular to the
t − axis intersects both the curves.

Let us denote the ordinate of the point of


intersection of this line with C1 by

θ 2 ( t ) and that of the point of intersection of the line with C2 by θ1 ( t ) , so that θ1 ( t ) < θ 2 ( t ) .

Consider now the two random variables θ1 ( t ) and θ 2 ( t ) , which are so defined that for

T = t , θ1 (T ) = θ1 ( t ) and θ 2 (T ) = θ 2 ( t ) . From the way θ1 (T ) and θ 2 ( t ) have been obtained, it is obvious that

θ1 (T ) ≤ θ ≤ θ 2 (T ) iff t1 (θ ) ≤ T ≤ t2 (θ ) .

Estimation & Confidence Interval ~ 11 of 17


As such,

Pθ ⎡⎣θ1 (T ) ≤ θ ≤ θ 2 (T ) ⎤⎦ = Pθ ⎡⎣t1 (θ ) ≤ T ≤ t2 (θ ) ⎤⎦ = 1 − α ∀ θ ∈Θ

Hence given a set of observations X , if t denotes the corresponding value of T , then θ1 ( t ) and θ 2 ( t ) are a pair

of confidence limits to θ with confidence coefficient 1 − α .

The region in the (T , θ ) - plane which is bounded by the two curves C1 and C2 is called a confidence belt for θ

corresponding to the confidence coefficient 1 − α .

Example: Suppose ( X i , Yi ) , i = 1, 2, , 20 are a random sample drawn from bivariate normal with

( )
BIV µ x , µ y , σ x2 , σ y2 , ρ where the all parameters are not known. We want to set confidence limits for ρ .

Solution

We know the sample correlation coefficient,
r = ∑ ( Xi − X̄ )( Yi − Ȳ ) / √[ ∑ ( Xi − X̄ )² ∑ ( Yi − Ȳ )² ]

The distribution of r depends on ρ only. Let α = 0.05 . From the tables of the correlation coefficient by F.N David

et al., we may obtain, for each ρ , the values r1 ( ρ ) and r2 ( ρ ) of r such that

Pθ ⎡⎣ r < r1 ( ρ ) ⎤⎦ = Pθ ⎡⎣ r > r2 ( ρ ) ⎤⎦ = 0.025.

These are shown in the following table for the values of ρ from −0.9 to 0.9 at intervals of 0.1 .

Table: Values of r1 ( ρ ) and r2 ( ρ ) for n = 20

ρ r1 ( ρ ) r2 ( ρ ) ρ r1 ( ρ ) r2 ( ρ )
−0.9 −0.97065 −0.77222 0 −0.44486 0.44486
−0.8 −0.92223 −0.56661 0.1 −0.35862 0.52565
−0.7 −0.92289 −0.38984 0.2 −0.26394 0.59586
−0.6 −0.83500 −0.22886 0.3 −0.15880 0.71366
−0.5 −0.78095 −0.08607 0.4 −0.04226 0.72585
−0.4 −0.72585 0.04226 0.5 0.08607 0.78095
−0.3 −0.71366 0.15830 0.6 0.22886 0.83500
−0.2 −0.59586 0.26394 0.7 0.38984 0.92289
−0.1 −0.52565 0.35862 0.8 0.56661 0.92223
0.9 0.77222 0.97065

Now, given the observed value of r for a particular random sample of size 20 from the bivariate normal distribution,
we can obtain the confidence limits to ρ with confidence coefficient 1 − 2 × 0.025 = 0.95 . Suppose, e.g., the

observed value of r is 0.55 . Treating 0.55 as a value of r2 ( ρ ) , we find, by inverse interpolation, the

corresponding value of ρ to be 0.135 . Similarly, treating 0.55 as a value of r1 ( ρ ) , the corresponding value of ρ

is found to be 0.791 . Hence for this value of r , the 95% confidence limits to ρ are 0.135 and 0.791.
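The inverse interpolation can be done numerically; the following sketch (ours) uses simple linear interpolation between the tabulated values and reproduces the limits quoted above (numpy assumed available).

# Inverse interpolation in the r1(rho), r2(rho) table for n = 20 and observed r = 0.55.
import numpy as np

r2_vals = {0.1: 0.52565, 0.2: 0.59586}   # r2(rho) brackets the observed r for the lower limit
r1_vals = {0.7: 0.38984, 0.8: 0.56661}   # r1(rho) brackets the observed r for the upper limit

r_obs = 0.55
lo = np.interp(r_obs, [r2_vals[0.1], r2_vals[0.2]], [0.1, 0.2])   # solve r2(rho) = 0.55
hi = np.interp(r_obs, [r1_vals[0.7], r1_vals[0.8]], [0.7, 0.8])   # solve r1(rho) = 0.55
print(round(lo, 3), round(hi, 3))        # approximately 0.135 and 0.791, as in the text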

Estimation & Confidence Interval ~ 12 of 17


Shortest Confidence Intervals
Suppose we have two statistics T1 & T2 of

P ⎡⎣T < t1 (θ ) ⎤⎦ = α1 and P ⎡⎣T > t2 (θ ) ⎤⎦ = α 2

in such a way that α1 = α2 = α/2 . However, it is clear that α1 and α2 may be chosen in infinitely many ways, each satisfying the conditions αi ≥ 0 and α1 + α2 = α . Let us consider a particular function
ψ ( T , θ ) = √n ( X̄ − θ ) / σ
Even for a fixed α we can therefore obtain many confidence intervals. So we need some criterion for making a choice among this infinite set of confidence intervals. An obvious method of selecting one out of the possible confidence intervals is based on the width of the interval.

Let us suppose that T1 & T2 are two values such that

P ⎣⎡T1 ≤ τ (θ ) ≤ T2 ⎦⎤ = 1 − α (i )
Then the confidence interval given by T1 & T2 will be said to be better than that of the interval given by T1′ & T2′

which satisfy if

T2 − T1 ≤ T2′ − T1′ ∀ θ ∈Θ ( ii )

If equation ( ii ) holds for every other pair of statistics T1′ & T2′ satisfying Pθ ⎣⎡T1 ≤ γ (θ ) ≤ T2 ⎦⎤ = 1 − α for all θ ∈ Θ

then the confidence interval given by T1 & T2 will be called uniformly shortest confidence interval for τ (θ ) based

on the statistic T .

( )
Example: Consider X ~ N θ , σ 2 where σ 2 is known. Find the shortest confidence interval for θ .

Solution

⎡ n ( X −θ ) ⎤
We have, P ⎢τ1−α1 ≤ ≤ τα2 ⎥ = 1 − α
⎢⎣ σ ⎥⎦
⎡ σ σ ⎤
⇒ P ⎢ X − τα2 ≤ θ ≤ τ1−α1 ⎥ = 1−α
⎣ n n⎦

The length of the corresponding confidence interval,


σ
L= ⎡τ α + τ1−α ⎤
n⎣ 2 1 ⎦

So, we have to minimize L i.e., minimize τ α 2 + τ 1−α1 subject to the condition α1 ≥ 0 and α1 + α 2 = α .

Due to symmetry of the distribution of n ( X − θ ) σ about zero, the difference will be minimum when

τ1−α1 = −τ α 2 i.e. α1 = α 2 = α 2

Hence the interval is in fact the shortest confidence interval based on the distribution of X .
Estimation & Confidence Interval ~ 13 of 17
In some situations the length of the confidence interval may involve some function of the sample observations, e.g., when,
under the normal set-up the confidence interval for µ is obtained from the t − distribution for the statistic

n ( X − µ ) S or when, under the same set-up, the confidence interval for σ 2 is obtained from the

∑( Xi − X )
2
χ 2 − distribution for σ 2 . Here in order to make choice among all possible confidence intervals with
same confidence coefficient, we may make use of the average or expected length of the confidence interval.
For the statistics T1 and T2 expected length is,

Eθ (T2 − T1 )
The interval for which these expected length is minimum may be called the interval with shortest expected length or
shortest average length.

Example: Let X 1 , X 2 , , X n is a random sample draw from N µ , σ 2 ( ) here both µ and σ 2 are unknown. We have

to estimate shortest confidence interval for µ .

Solution
⎡ n (X − µ) ⎤
We have, P ⎢t(1−α1 ), n −1 ≤ ≤ tα 2 , n −1 ⎥ = 1 − α
⎢⎣ S ⎥⎦
⎡ S S ⎤
⇒ P ⎢ X − tα 2 ,n −1 ≤ µ ≤ X − t1−α1 ,n −1 ⎥ = 1−α
⎣ n n⎦
The expected length of the confidence interval is,
Eθ ( S )
( tα ,n−1 − t1−α ,n−1 )
2 1
n
= kσ ⎡⎣tα 2 ,n −1 − t1−α1 ,n −1 ⎤⎦

where k is constant that depends on n alone. So we have to minimize ⎡⎣tα 2 , n −1 − t1−α1 , n −1 ⎤⎦ subject to the condition

α1 ≥ 0, α 2 ≥ 0 and α1 + α 2 = α .

Due to the symmetry of the t − distribution around zero, the difference tα 2 , n −1 − t1−α1 , n −1 will be minimum if

t1−α1 ,n −1 = −tα 2 , n −1 , i.e. when α1 = α 2 = α .


2

( )
Example: Let X ~ N µ , σ 2 where µ is known. Find the confidence interval for σ 2 .

Solution
Pθ ⎡ ∑ ( xi − µ ) θ < χ12−α1 ,n ⎤ = α1 ⎫
2
The inequalities
⎣ ⎦ ⎪
⎬ here θ = σ 2

∑ ( xi − µ ) ⎤
θ > χα 2 ,n = α 2 ⎪
2 2
and Pθ
⎣ ⎦ ⎭

∑ i
⎡ ( X − µ )2
∑( Xi − µ ) ⎤
2
lead to the result Pθ ⎢ ≤θ ≤ ⎥ = 1−α
⎢ χα22 ,n χ12−α1 ,n ⎥
⎣ ⎦
The corresponding confidence interval has the length,

⎡ 1 1 ⎤
∑( Xi − µ )
2
⎢ 2 − 2 ⎥
⎢⎣ χ1−α1 ,n χα 2 ,n ⎥⎦

⎡ 1 1 ⎤
which has the expected value nθ ⎢ 2 − 2 ⎥
⎣⎢ χ1−α1 , n χα 2 , n ⎦⎥
Estimation & Confidence Interval ~ 14 of 17
⎡ 1 1 ⎤
The minimization of this expected length amounts to minimization of ⎢ − ⎥ subject to the condition,
⎢⎣ χ1−α1 ,n χα 2 ,n ⎥⎦
2 2

χ 22

∫ f ( χ ) d ( χ ) = 1−α
2 2

χ12

where, χ12 = χ12−α1 , n , χ 22 = χα22 , n and f is the p.d.f of the χ 2 − distribution with n degrees of freedom.

Using Lagrange’s method of undetermined multipliers, which involves the partial differentiation of

⎡ χ 22 ⎤
∫ ( ) ( )
1 1 ⎢ f χ 2 d χ 2 − (1 − α ) ⎥
− + λ
χ12 χ 22 ⎢ 2 ⎥
⎣ χ1 ⎦

with respect to χ1 and χ 2 , we get the minimizing equation as


2 2

1
χ14
( )
+ λ f χ12 = 0 and
1
χ 24
( )
+ λ f χ 22 = 0

( )
⇒ 1 + χ14 λ f χ12 = 0 (1) ⇒ ( )
1 + χ 24 λ f χ 22 = 0 ( 2)

4
( )
Now from equation (1) and ( 2 ) we can write, χ1 f χ1 = χ 2 f χ 2
2 4
( ) is satisfied, besides the equation,
2

χ 22

∫ f (χ ) d ( χ ) = 1−α
2 2

χ12

The actual determination of the values χ12 and χ 22 will, of course, by pretty difficult. In practice, one takes χ12 and

α
χ 22 such that α1 = α 2 = . But this may make the average length too big.
2

For example, if n = 10, α = 0.05, α1 = α 2 = 0.025 then χ12 = 3.247 and χ 22 = 20.483 . So, average length of the

interval is,

10 θ [ 1 / 3.247 − 1 / 20.483 ] = 2.59 θ

On the other hand, if we take α1 = 0.05 , α 2 = 0 then χ12 = 3.940 and χ 22 = ∞ then the average length of the
interval is ,

10 θ [ 1 / 3.940 − 0 ] = 2.54 θ

⎡ v ⎤ n
Thus this second procedure, where the confidence interval will be of the form ⎢ 0, ⎥ , where v =
χ12 ⎦⎥
∑ ( xi − µ )2 ,
⎣⎢ i =1

would seem to be preferable to this procedure. Thus, this interval is, in fact, not only shorter on the average, but
shorter in every case.

Case of Discrete Random Variable


The case of discrete random variables requires to be separately dealt with, for if we want to apply one of the previous

procedures, we immediately face a difficulty. In this case we cannot hope to get for each α ( 0 < α < 1) a confidence

interval that will have confidence coefficient exactly equal to 1 − α .

Estimation & Confidence Interval ~ 15 of 17


One way of avoiding this problem is to require only that the confidence coefficient be at least 1 − α . Then the

statistics T1 and T2 will provide confidence limits to a parametric function γ (θ ) if

Pθ ⎡⎣T ≤ γ (θ ) ≤ T2 ⎤⎦ ≥ 1 − α for all θ ∈ Θ

The actual determination of the confidence intervals may be carried out by drawing confidence belts.

Example: Let X 1 , X 2 , , X10 be a random sample from a (point binomial) distribution with p.m. f .
1− x
⎪⎧θ x (1 − θ ) if x = 0, 1
fθ ( x ) = ⎨ where 0 ≤ θ ≤ 1.
⎪⎩0 otherwise

For obtaining confidence limits to θ with a confidence coefficient at least equal to 0.90 , we may first determine, for

a suitable set of values of θ , the values t1 (θ ) and t2 (θ ) of the sufficient statistic T = ∑ Xi such that
i

Pθ ⎡⎣T < t1 (θ ) ⎤⎦ ≤ 0.05 and Pθ ⎡⎣T > t2 (θ ) ⎤⎦ ≤ 0.05

the inequalities for Pθ being made as near to equalities as possible.

For values of θ from 0.1 to 0.9 (taken at intervals of 0.1 ), these numbers t1 (θ ) and t2 (θ ) are as shown in the

table bellow:

θ        t1 (θ )    t2 (θ )
0.1      0          3
0.2      0          4
0.3      1          5
0.4      2          7
0.5      2          8
0.6      3          8
0.7      5          9
0.8      6          10
0.9      7          10

If we draw t1 (θ ) and t2 (θ ) for different values of θ we can get a confidence belt for θ with
Pθ [ t1 (θ ) ≤ T ≤ t2 (θ ) ] ≥ 0.90 .
The confidence belt can be improved and the confidence coefficient made closer to 0.90 if n is made large and if at the same time we tabulate t1 (θ ) and t2 (θ ) at finer intervals of θ .
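The table can be reproduced with a few lines of Python (a sketch of ours; scipy's binomial distribution is assumed to be available), taking t1 (θ ) as the largest value with P ( T < t1 ) ≤ 0.05 and t2 (θ ) as the smallest value with P ( T > t2 ) ≤ 0.05 .

# Computing t1(theta) and t2(theta) of the table above for T ~ Binomial(10, theta).
import numpy as np
from scipy.stats import binom

n = 10
for theta in np.arange(0.1, 1.0, 0.1):
    cdf = binom.cdf(np.arange(-1, n + 1), n, theta)   # cdf[t] = P(T <= t - 1) = P(T < t), cdf[0] = 0
    t1 = max(t for t in range(n + 1) if cdf[t] <= 0.05)            # largest t1 with P(T < t1) <= 0.05
    t2 = min(t for t in range(n + 1) if 1 - cdf[t + 1] <= 0.05)    # smallest t2 with P(T > t2) <= 0.05
    print(round(theta, 1), t1, t2)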

Theory of Confidence Set


In this context, we are interested in a set of the parameter space Θ , determined in the light of the observations X ,
that may be supposed to cover the true value(s) of the parameter(s) and that is why a concept of confidence set
rather than confidence intervals.

Let S be a set of parameter space Θ , then we shall write ' S c θ ' to mean that this set covers or includes θ , so

that S c θ ⇔ θ ∈ S .

Definition
A family of sets S ( X ) , for varying x ∈ ℑ , of the parameter space Θ is said to be a family of confidence sets at the

level 1 − α (or with the confidence coefficient 1 − α ) if

Pθ ⎣⎡ S ( X1 , X 2 , , X n ) c θ ⎦⎤ = 1 − α for all θ ε Θ

Estimation & Confidence Interval ~ 16 of 17


Definition
A family of sets S0 ( Χ ) , for varying x ∈ ℑ , of the parameter space Θ is said to constitute a family of uniformly more

accurate (or most selective or smallest) confidence sets if

Pθ ⎡⎣ S0 ( X1 , X 2 , , X n ) c θ ⎤⎦ = 1 − α for all θ ∈ Θ (i )
and Pθ ′ [ S0 ( X1 , X2 , ⋯ , Xn ) c θ ] ≤ Pθ ′ [ S ( X1 , X2 , ⋯ , Xn ) c θ ] for all θ , θ ′ ∈ Θ (θ ≠ θ ′ )    ( ii )
whatever other family of sets S ( X ) satisfies ( i ) . The implication of ( ii ) is that S0 has a smaller probability of including a wrong value of the parameter θ than any other family of sets at the same

level. In this sense S0 ( x ) is the smallest confidence set of level α corresponding to the set of observation x . In

most cases a family of UMA sets cannot be obtained. Hence we introduce the concept of unbiasedness.

Definition
A family of sets S ( Χ ) for different values of x ∈ ℑ of the Θ is said to constitute a family of unbiasedness

confidence sets of level 1 − α if

Pθ ′ [ S ( X1 , X2 , ⋯ , Xn ) c θ ] ≤ 1 − α for all θ , θ ′ ∈ Θ , θ ≠ θ ′

Hence S ( Χ ) is a family of unbiased sets iff the probability for S ( X 1 , X 2 , , X n ) to cover θ when some

alternative value θ ′ is true does not exceed the same probability for the case when θ itself is true. Surely this is a
desirable feature of a family of confidence sets.

Uniformly Most Accurate Unbiased Set (UMAU)


A family of sets S0 ( Χ ) , for varying x ∈ ℑ of the parameter space Θ is said to constitute a family of uniformly most

accurate unbiased (UMAU ) confidence sets of level 1 − α if

Pθ ⎣⎡ S0 ( X 1 , X 2 , , X n ) c θ ⎦⎤ = 1 − α for all θ ∈ Θ,
Pθ ′ ⎡⎣ S0 ( X 1 , X 2 , , X n ) c θ ⎤⎦ ≤ 1 − α for all θ , θ ′∈Θ (θ ≠ θ ′ )
and Pθ ′ ⎡⎣ S0 ( X1 , X 2 , , X n ) c θ ⎤⎦ ≤ Pθ ′ ⎡⎣ S ( X 1 , X 2 , , X n ) c θ ⎤⎦ for all θ , θ ′∈Θ (θ ≠ θ ′ )

Estimation & Confidence Interval ~ 17 of 17


Hypothesis-I

Most Powerful Test


The critical region w is the most powerful critical region of size α for testing H 0 : θ = θ 0 against H1 : θ = θ1 if

P ( x ∈ w H 0 ) = L0 dx = α
∫ ... ... ... (1)
w

and P ( x ∈ w H1 ) ≥ P ( x ∈ w1 H1 ) ... ... ... ( 2)


for every critical region w1 satisfying (1) . The test based on the most powerful critical region is called most powerful

test of level α .

Uniformly Most Powerful (UMP) Test


The region w is called uniformly most powerful (UMP) critical region of size α for testing H 0 : θ = θ 0 against

H1 : θ ≠ θ 0 i.e. against H1 : θ = θ1 ≠ θ 0 if

P ( x ∈ w H 0 ) = L0 dx = α
∫ ... ... ... (1)
w

and P ( x ∈ w H1 ) ≥ P ( x ∈ w1 H1 ) ... ... ... ( 2)


for all θ ≠ θ 0 whatever the region w1 satisfying (1) . The test based on the uniformly most powerful critical region is

called uniformly most powerful test of level α .

Unbiased Test and Unbiased Critical Region


Let us consider the testing of H 0 : θ = θ 0 against H1 : θ = θ1 . The critical region w and consequently the test based

on it is said to be unbiased if the power of the test exceeds the size of the critical region i.e.
Power of the test ≥ Size of the C.R
⇒ 1− β ≥ α
⇒ Pθ1 ( w ) ≥ Pθ0 ( w )
⇒ P [ x : x ∈ w | H1 ] ≥ P [ x : x ∈ w | H 0 ]
In other words, the critical region w is said to be unbiased if

Pθ1 ( w ) ≥ Pθ0 ( w ) ; ∀ θ ( ≠ θ 0 ) ∈ Ω .

Uniformly Most Powerful Unbiased (UMPU) Test


Let φ be an unbiased test (or w a critical region) of size α for testing H 0 : θ = θ 0 against H1 : θ = θ1 ≠ θ 0 ; θ1 ∈ Ω ,

i.e.

i) E {φ ( x ) | θ 0 } = P ( x ∈ w | θ0 ) = α

ii) E {φ ( x ) | θ1} ≥ E {φ ( x ) | θ 0 } ; ∀ θ1 ∈ Ω

Suppose that for every other test φ * satisfying the conditions ( i ) and ( ii ) we have
E { φ ( x ) | θ1 } ≥ E { φ * ( x ) | θ1 } ;   ∀ θ1 ∈ Ω

then φ is a uniformly most powerful unbiased (UMPU) test of size α .

Hypothesis-I ~ 1 of 11
UMPU Type A1 Test

Let φ be an unbiased test (or w a critical region) of size α for testing H 0 : θ = θ 0 against H1 : θ = θ1 ≠ θ 0 ;

θ1 ∈ Ω , i.e.

i) E {φ ( x ) θ0 } = P ( x ∈ w θ0 ) = α

ii) E {φ ( x ) θ1} ≥ E {φ ( x ) θ 0 } ; ∀ θ1 ∈ Ω

δ
iii) E {φ ( x ) θ1} =0
δθ1 θ1 =θ0

Then φ is called UMPU type A1 test. For a UMPU test it is not required that power curve should have a regular

minimum at θ 0 but this is often the name UMPU test is used to imply type A1 test.

Show that 1 − β ≥ α .

Let w be a BCR of size α for testing a simple H 0 against a simple H1 . Then by definition we have,

P ( x ∈ w H 0 ) = ∫ L ( x H 0 ) dx = α
w

By Neyman-Pearson lemma, we have,

L ( x H0 )
≤K if x ∈ w ... ... ... (i )
L ( x H1 )
L ( x H0 )
and ≥K if x ∈ ( S − w ) ... .... ... ( ii )
L ( x H1 )

From ( i ) we have, K .L ( x H1 ) ≥ L ( x H 0 )

⇒ K ∫ L ( x H1 ) dx ≥∫ L ( x H 0 ) dx
w w
⇒ K (1 − β ) ≥ α " " " ( iii )
Again from ( ii ) we have, K .L ( x H1 ) ≤ L ( x H 0 )

⇒ K ∫ L ( x H1 ) dx ≤ ∫ L ( x H 0 ) dx
S −w S −w
⇒ K β ≤ (1 − α ) " " " ( iv )
From ( iii ) and ( iv ) we have, K (1 − α )(1 − β ) ≥ K αβ
⇒ 1− β ≥ α ( Proved )

Example: Let x1 , x2 , ..., xn be a random sample drawn from N ( µ , 1 ) . For testing H 0 : µ = µ0 against
H1 : µ = µ1 ≠ µ0 , show that a UMPU test exists for α1 = α 2 .

Solution
Since x1 , x2 , ..., xn are drawn from N ( µ ,1) , we have,

n
⎛ 1 ⎞ − 2 ∑ ( xi − µ0 )
1 2

L ( x H0 ) = ⎜ ⎟ e
⎝ 2π ⎠
n
⎛ 1 ⎞ − 2 ∑ ( xi − µ1 )
1 2

and L ( x H1 ) = ⎜ ⎟ e
⎝ 2π ⎠

Hypothesis-I ~ 2 of 11
According to Neyman-Pearson lemma, we have the BCR is given by

L ( x H0 )
≤K
L ( x H1 )


⎡ n
{ 2 ⎤
exp ⎢ − ( x − µ0 ) − ( x − µ1 ) ⎥ ≤ K
⎣ 2
2


}

⎡ n 2
{ ⎤
exp ⎢ − x − 2 x µ0 + µ02 − x 2 + 2 x µ1 − µ12 ⎥ ≤ K
⎣ 2 ⎦
}

⎡ n
⎣ 2
{ ⎤
exp ⎢ − 2 x ( µ1 − µ0 ) + µ02 − µ12 ⎥ ≤ K

( )}

n
{
− 2 x ( µ1 − µ0 ) + µ02 − µ12 ≤ ln K
2
( )}
⇒ ( 2
2 x ( µ1 − µ0 ) + µ02 − µ12 ≥ − ln K
n
)
⇒ ( 2
2 x ( µ1 − µ0 ) ≥ µ12 − µ02 − ln K
n
)
⇒ x (µ − µ ) ≥
(µ 2
1 − µ02 ) − 1 ln K " " " (i )
1 0
2 n
If µ1 > µ0 , then

µ1 + µ0 1
x≥ + ln K
2 n ( µ0 − µ1 )
⇒ x ≥ λ1 ( say ) " " " ( ii )

We know that

P ( x ≥ λ1 H 0 ) = α

⇒ ∫ f ( x ) dx = α under H 0 : µ = µ0
λ1

n −
n
( x − µ0 ) 2 ⎛ σ2 ⎞

2π ∫ e 2 dx =α since x ~ N ⎜ µ ,

⎝ n
⎟⎟

λ1
∞ 2
n −z 1

2π ∫
1− µ
e 2
n
dz = α
0
1
n
∞ 2
1 −z

2π ∫ e 2 dz =α

We have,
λ1 − µ0
zα =
1
n
1
⇒ λ1 − µ0 = zα
n
1
⇒ λ1 = µ0 + zα
n
Hence from equation we have, the BCR is
1
x ≥ µ0 + zα
n

Hypothesis-I ~ 3 of 11
Again, if µ1 < µ0 , then from equation ( i ) we have that the BCR is:

µ12 − µ02 1
x ( µ0 − µ1 ) ≥ − ln k
2 n
µ0 + µ1 1
⇒ x≥− − ln k
2 n ( µ0 − µ1 )
µ0 + µ1 1
⇒ x≤ + ln k
2 n ( µ0 − µ1 )
⇒ x ≤ λ2 ( say ) " " " ( iii )
Again, we know that,

P ( x < λ2 H 0 ) = α
λ2
⇒ ∫ f ( x ) dx = α under H 0 : µ = µ0
−∞
λ2 n
n − ( x − µ0 )2

2π ∫ e 2 dx =α
−∞
λ2 − µ0
1
z2
x − µ0
n
n − 1

2π ∫ e 2 dz = α
1
=z ⇒ dx =
n
dz
−∞
n
∞ z2
1 −

2π ∫
λ2 − µ0
e 2 dz = 1 − α

1
n
∞ z2
1 −

2π ∫ e 2 dz = 1 − α
z1−α

λ2 − µ0
∴ z1−α =
1
n
1
⇒ λ2 = µ0 + z1−α
n
By symmetry of normal distribution, we have,
z1−α = − zα
1
∴ λ2 = µ0 − zα
n

From the equation ( iii ) we have the BCR is

1
x ≤ µ0 − zα
n

So that we have to w , the critical region as

1 1
w : x ≤ xα 2 , x ≥ xα1 where, xα1 = µ0 + zα1 and xα 2 = µ0 − zα 2
n n
where z is N ( 0,1) and α1 % to the right and α 2 % to the left side value.

∴ P ⎡⎣ z ≤ zα1 ⎤⎦ = α1 and

P ⎡⎣ z ≥ zα 2 ⎤⎦ = α 2 where, α1 + α 2 = α

Hypothesis-I ~ 4 of 11
For µ1 > µ0 , the power function is

∞ ∞

∫ f ( x ) dx = ∫ f ( z ) dz = 1− F (m)
xα 2 xα 2 − µ1
=m
1
n

= F ( −m )
⎛ ⎞
⎜ µ1 − xα 2 ⎟
= F⎜ ⎟
⎜ 1 n ⎟
⎝ ⎠
⎡ 1 ⎤
= F ⎡ n ( µ1 − µ0 ) + zα 2 ⎤ ⎢∴ xα 2 = µ0 − zα 2 ⎥
⎣ ⎦
⎣ n⎦

For µ1 < µ0 , the power function is


xα1 − µ2
1
xα1 n

∫ f ( x ) dx = ∫ f ( z ) dz
−∞ −∞

= F ⎡ − n ( µ1 − µ0 ) + zα1 ⎤
⎣ ⎦

So power,

P = F ⎡ n ( µ1 − µ0 ) + zα 2 ⎤ + F ⎡ − n ( µ1 − µ0 ) + zα1 ⎤
⎣ ⎦ ⎣ ⎦
⇒ P = F ⎡ n ∆ + zα 2 ⎤ + F ⎡ − n ∆ + zα1 ⎤
⎣ ⎦ ⎣ ⎦ [ ∆ = µ1 − µ0 ]
n ∆+ zα 2

z2 − ( n ∆− zα1 ) −
z2
1 1
⇒ P= ∫ 2π
e 2 dz + ∫ 2π
e 2 dz
−∞ −∞

( ) ( )
2 2
n ∆+ zα 2 n ∆− zα1
1 − 1 1 − ⎛ 1 ⎞
∴ P= e 2 + e 2
⎜− ⎟=0
2π n 2π ⎝ n⎠
⎧ ( n ∆+ zα 2 )
2
( n ∆− zα1 )
2

n ⎪ − − ⎪
⇒ ⎨e 2 −e 2
⎬=0
2π ⎪ ⎪
⎩ ⎭
( )
2
n ∆− zα1
1
( )
2
− n ∆+ zα 2 −
⇒ e 2 =e 2

⇒ n ∆ + zα 2 = n ∆ − zα1
⇒ − zα 2 = z α1
∴ α1 = α 2

Thus we see the power curve is minimum at µ1 = µ0 if and only if α1 = α 2 . Otherwise the minimum occurs at some

µ1 ≠ µ0 , implying that the probability of rejecting H 0 is actually smaller when H 0 is false than when it is true.
Evidently two curves (b ) and (c) representing one sided UMP tests are biased. Power curve (a) represents a

most powerful test among all unbiased tests, but not a most powerful among all tests.

Locally Uniformly Most Powerful Unbiased (LUMPU) Test


An unbiased test which is most powerful in the neighborhood of θ 0 is called locally uniformly most powerful unbiased

test. This test is also called uniformly most powerful unbiased test of type A . The critical region associated with this
test is called unbiased critical region of type A .
Hypothesis-I ~ 5 of 11
The region w is said to be a type A critical region of size α for testing H 0 : θ = θ 0 against H1 : θ = θ1 ≠ θ 0 , if

i) P ( x ∈ w | H0 ) = α
ii ) P ( x ∈ w | H1 ) ≥ α

(
iii ) P ( x ∈ w | H1 ) ≥ P x ∈ w* | H1 )
δ
iv) ⎡ P ( x ∈ w | H1 ) ⎤⎦ =0
δθ1 ⎣ θ1 =θ0

δ2 δ2 ⎡
v) ⎡
δθ12 ⎣
P ( x ∈ w | H 1 ) ⎤ * ⎤
⎦θ1 =θ0 δθ 2 ⎣ P x ∈ w | H1 ⎦θ =θ

1 0
( )
1

where w* is any other region satisfying conditions ( i ) to ( iv ) .

We must choose a critical region for which the power is largest in the neighborhood of H 0 : θ = θ 0 . This condition is

made by ( v ) , conditions ( i ) , ( ii ) and ( iii ) controls the first type of error and unbiasedness and condition ( iv )
makes the region locally unbiased. This test is recommended only when H 0 and H1 are close to each other. Also

condition ( v ) states that the rate of increase of the power curve for w is larger than that for w* in the

neighborhood of θ 0 .

Construction of Type A regions


Let us consider the problem of constructing a UMP unbiased region for H 0 : θ = θ 0 against H1 : θ ≠ θ 0 when no

UMP region exists. This statement states the following theorem:

Theorem
If w be an MP region for testing H 0 : θ = θ 0 against H1 : θ = θ1 , then it is necessarily unbiased. Similarly, if w be

UMP for testing H 0 : θ = θ 0 against H1 : θ ∈ Ω it is necessarily unbiased.

Proof
If w be an MP region of size α for testing H 0 against H1 then for a non-negative constant k ,

∫ L0 ( x ) dx = ∫ L0 ( x ) dx = α
w
{ x| L1 ( x ) > kLo( x ) }
where L0 ( x ) be the likelihood function under H 0 , and

∫ L1 ( x ) dx = ∫ L1 ( x ) dx = α
w
{x|L ( x )> kL ( ) }
1 o x

So that,

∫ L1 ( x ) dx = ∫ L1 ( x ) dx > k ∫ L0 ( x ) dx = kα ... ... ... (i )


w
{ x L1 ( x ) > kLo( x ) } { x L1 ( x ) > kLo( x ) }
while, 1 − ∫ L1 ( x ) dx = ∫ L1 ( x ) dx ≤ k ∫ L0 ( x ) dx = k (1 − α ) ... ... ... ( ii )
w
{x L ( x )≤kL ( )}
1 o x {x L ( x )≤kL ( )}
1 o x

If k ≥ 1 , then from ( i ) we have,

∫ L1 ( x ) dx > α
w

Hypothesis-I ~ 6 of 11
If k < 1 , then from ( ii ) we have,

1 − ∫ L1 ( x ) dx < 1 − α
w

which implies,

∫ L1 ( x ) dx > α i.e. 1− β > α


w

Hence w is unbiased.

In case w is a UMP region of size α , then too the above approach will hold good if for θ1 we read θ such that

θ ∈ Ω . So we have,
Pθ ( w ) > α for all θ ∈ Ω
So, here also w is unbiased.

( )
Example: Consider the case of random sample from N θ , σ 2 , where θ is unknown ( −∞ < θ < ∞ ) and σ 2 is known.

Find the type A region for testing H 0 : θ = θ 0 against H1 : θ ≠ θ1 .

Solution


∑ ( xi − θ ) ⎤
n 2
⎛ 1 ⎞
L ( x) = ⎜ ⎟ exp ⎢ − ⎥
⎝ σ 2π ⎠ ⎢ 2σ 2 ⎥
⎣ ⎦
∑ ( xi − θ )
2
n
ln L ( x ) = − ln 2πσ 2 −
2
( )
2σ 2
δ ln L ( x ) 2
Hence = 2 ∑ ( xi − θ )( −1)
δθ 0 σ
n ( x − θ0 )
⇒ φ=
σ2
δ 2 ln L ( x ) n
∴ =φ' =−
δθ 02 σ2
n
= a + bφ ( say ) where a = − , b=0
σ2

As such the type-A region for testing H 0 : θ = θ 0 against H1 : θ = θ1 ≠ θ 0 is w , given by

w = { x φ < c1 ∪ φ > c2 }
= { x x < d1 ∪ x > d 2 } ( say )
where c1 and c2 or d1 and d 2 are constants such that

∫ Lθ ( x ) dx = α
0
and ∫ φ Lθ ( x ) dx = 0
0
w w

Now, these conditions are equivalent to the conditions


d1 ∞ d1 ∞

∫ gθ0 ( x ) dx + ∫ gθ0 ( x ) dx = α and ∫ φ gθ ( x ) dx + ∫ φ gθ ( x ) dx = 0


0 0
−∞ d2 −∞ d2

where gθ be the marginal p d.f. of x or equivalent to

Hypothesis-I ~ 7 of 11
d2

∫ gθ ( x ) dx = 1 − α
0
" " " ( *)
d1
d2
⎡ ∞ ⎤
and ∫ φ gθ0 ( x ) dx = 0 ⎢since
⎢⎣
∫ φ gθ ( x ) dx = 0⎥⎥
0
" " " (**)
d1 −∞ ⎦

Now from (*) we can write,

n ( d 2 −θ0 )
σ y2
1 −

2π ∫ e 2 dy = 1 − α
n ( d1 −θ0 )
σ

and from (**) we have,

n ( d 2 −θ 0 )

n ( x − θ0 )
σ y2
− ⎡ ny⎤
∫ e 2 dy = 0 ⎢since φ =
⎣⎢ σ 2
= ⎥
σ ⎥⎦
n ( d1 −θ0 )
σ
n ( d 2 −θ0 )
⎡ − y2 ⎤ σ
⇒ ⎢ −e 2 ⎥ =0 " " " (***)
⎢ ⎥ n ( d1 −θ 0 )
⎣ ⎦
σ
2 2
1 ⎧⎪ n ( d 2 −θ0 ) ⎫⎪ 1 ⎧⎪ n ( d1 −θ0 ) ⎫⎪
− ⎨ ⎬ − ⎨ ⎬
2 ⎩⎪ σ ⎭⎪ 2 ⎩⎪ σ ⎭⎪
⇒ −e +e =0
2 2
1 ⎧⎪ n ( d 2 −θ0 ) ⎫⎪ 1 ⎧⎪ n ( d1 −θ0 ) ⎫⎪
− ⎨ ⎬ − ⎨ ⎬
2 ⎩⎪ σ ⎭⎪ 2 ⎩⎪ σ ⎭⎪
⇒ −e = −e
2 2
1 ⎧⎪ n ( d 2 − θ0 ) ⎫⎪ 1 ⎧⎪ n ( d1 − θ 0 ) ⎫⎪
⇒ − ⎨ ⎬ =− ⎨ ⎬
2 ⎪⎩ σ ⎭⎪ 2 ⎩⎪ σ ⎭⎪
Solving (***) we have,

n ( d1 − θ0 ) n ( d2 − θ0 )
=−
σ σ
n ( d 2 −θ0 )
σ y2
1 −
and since
2π ∫ e 2 dy = 1 − α
n ( d 2 −θ 0 )

σ

n ( d2 − θ0 )
We have, = τα
σ 2
σ
⇒ d 2 = θ0 + τ α .
2 n

n ( d1 − θ0 )
Hence also, − = τα
σ 2
σ
⇒ d1 = θ0 − τ α .
2 n
As such, the type-A region of size α is

⎧ σ σ ⎫ ⎧⎪ n x − θ0 ⎫⎪
w = ⎨ x | x < θ0 − τ α . ∪ x > θ0 + τ α . ⎬ = ⎨x | > τα ⎬
⎩ 2 n 2 n⎭ ⎩⎪ σ 2⎪

Hypothesis-I ~ 8 of 11
Similar Region (Testing Composite Hypothesis)
Let X be a random variable distributed as f ( x ; θ1 , " , θ k ) . A hypothesis of the form

H 0 : θ1 = θ10 , " , θ r = θ r 0 (r < k ) " " " (1)


Here (k − r ) unspecified parameters and it is a composite hypothesis with (k − r ) d . f . We want to determine a
critical region ω of size α such that

∫ L ( x | H 0 ) dx = α " " " ( 2)


ω

and ∫ L ( x | H1 ) dx is maximum " " " ( 3)


ω

where H1 is some simple hypothesis about the parameters.

Since the parameters θ r +1 , " , θ k are unspecified by H 0 , α given in ( 2) is in general a function of these

parameters and hence con not be uniquely determined. If α does not depend on the unspecified parameters the

region ω for which equation ( 2 ) is true is called a region similar to the sample space with respect to the parameters

θ r +1 , " , θ k or we can say that the region is similar region.

A test based on a similar region of size α is called a similar size α test.

Concept of Similar Region


In case of a composite hypothesis the selection of a suitable tests involves three important stages
i) finding all similar region
ii) finding these similar region S which are of size of α
iii) finding a similar region of size α that is best from the point of view of power, then we will get UMP critical region.

Construction of Similar Region


When statistic sufficient for each of the unspecified parameters exists or when a jointly sufficient statistic exists for
the unknown parameters, then regions similar to the sample space can be constructed.

Let ω be any critical region of size α . Now we define a indicator function or variable Iω of the critical region ω

⎧1 ; if observation lies in the C.R. i.e. x ∈ ω


Iω = ⎨
⎩0 ; if observation lies outside the C.R. i.e. x ∈ ω

The set of all points X for which Iω = 1 is the region of rejection.

∫ L ( x | H ) = ∫ Iω L ( x | H1 ) dx = E ( Iω | H )
ω S
= Expected value of Iω when it is true
⎧α ; if H = H 0
and E ( Iω | H ) = ⎨
⎩1 − β ; if H = H1

If the parameter θ admits a sufficient estimator the likelihood function factorizes into

L ( x | θ ) = g ( t , θ ) h ( x, t )

where g ( t , θ ) is the frequency function of the sufficient statistic t , and h ( x, t ) is the functions of sample values

only for a given t .


Hypothesis-I ~ 9 of 11
Now, E ( Iω ) = ∫ Iω L ( x | θ ) dx = ∫ Iω g ( t , θ ) h ( x, t ) dx
S S

= E ⎡⎣ E ( Iω | t ) ⎤⎦ " " " ( 4)


The equation ( 4 ) is very important for us, since t is sufficient for θ . E ( Iω | t ) does not depend on θ and this has

the same expectation as Iω i.e.

E ( Iω | t ) = E ( Iω )

If t is sufficient for θ , both H 0 and H1 are true, equation ( 4 ) implies that there is a region based on t similar to

the sample with size and power exactly equal to the original critical region ω .

Neyman Structure
A test with critical region ω is said to be of Neyman structure with respect to t if E ( Iω | t ) is the same almost

everywhere for θ i.e. a test satisfying E ( Iω | t ) = α is said to have Neyman structure with respect to t .

Example: Let x1 , " , xn be a random sample drawn from N µ , σ 2 ( ) where both µ and σ 2 are unknown. Test

H 0 : µ = µ0 against H1 : µ = µ1 .

Solution
The hypothesis H 0 has one d . f . , the parameter σ 2 being unspecified. We have

n
⎛ 1 ⎞ ⎡ 1 2⎤
L ( x | H0 ) = ⎜ ⎟ exp ⎢ − 2
⎝ 2πσ ⎠ ⎣ 2σ
∑ ( xi − µ0 ) ⎥

n
∑ ( xi − µ0 ) is sufficient for σ 2 and also this is complete sufficient statistic. Consider a
2
Under H 0 the statistic V =
i =1

simple H 0 and H1 as

H 0 : µ = µ0 , σ 2 = σ 02
H1 : µ = µ1 , σ 2 = σ12
According to Neyman-Pearson lemma, we have,

L ( x | H0 ) ⎡ 1 2⎤
∑ ( xi − µ0 ) ∑ ( xi − µ1 )
2 1
= exp ⎢ − 2 + ⎥ ≤ Constant
L ( x | H1 ) ⎣⎢ 2σ 0 2σ12 ⎦⎥
With this we can find out the MP critical region of size α for testing simple H 0 against simple H1 is

( ) (
L x | µ1 , σ12 > k ( v ) L x | µ0 , σ 02 )
where k ( v ) is such that the conditional size of ω0 given V = v is α which implies that

( µ1 − µ0 )( x − µ0 ) > k1 ( v ) " " " (1)


where k1 ( v ) is related to k ( v ) .

Case I:
If µ1 > µ0 , there condition (1) is equivalent to

( x − µ0 ) > k2 ( v )
n ( x − µ0 )
⇒ > k3 ( v ) ( say )
v
Hypothesis-I ~ 10 of 11
So as such, we can write,

⎪⎧ n ( x − µ0 ) ⎪⎫
ω0 = ⎨ x | > k3 ( v ) ⎬
⎪⎩ v ⎪⎭

where k3 ( v ) is to be determined such that Pµ0 [ω0 | v ] = α

n ( X − µ0 ) n ( X − µ0 )
Here, and v are independent. So that the conditional distribution given V = v is the
v v

n ( x − µ0 )
same as the distribution of .
V

⎡ n ( X − µ0 ) ⎤
So k3 ( v ) will be independent of V . Hence we can write P⎢ > k3 ⎥ = α
⎢⎣ V ⎥⎦

We know that,

n ( X − µ0 ) n ( X − µ0 )
=
n ( X − µ0 ) + ∑ ( xi − X )
v 2

n ( X − µ0 ) t n ( X − µ0 )
= = ~ tn −1 where, t=
⎛ t2 ⎞ t2 + n −1 S
∑ ( xi − X )
2
⎜⎜ 1 + ⎟⎟
⎝ n −1 ⎠

n ( X − µ0 )
Since < k3 iff t > k4 ( say ) , we may also write,
v
⎧⎪ n ( x − µ0 ) ⎫⎪
ω0 = ⎨ x | > k3 ( v ) ⎬ = { x | t > k4 } ⎡Where Pθ [t > k4 ] = α ⎤
⎣ ⎦
⎪⎩ ⎪⎭
0
v

where k4 is the upper α point of A distribution of tn −1 and we can write finally

⎧⎪ n ( x − µ0 ) ⎫⎪
ω0 = ⎨ x | > tα , n −1 ⎬
⎩⎪ S ⎭⎪

Since this is independent of σ 0 and σ 1 , it is the MP similar region of size α for testing H 0 against H1 .
2 2

Case II:
If µ1 < µ0 , in this case we have as before

( µ1 − µ0 )( x − µ0 ) > k1 ( v )
⇒ ( x − µ0 ) < k2′ ( v )
So preceding as before, MP similar region of size α for testing H 0 against H1 is

⎧⎪ n ( x − µ0 ) ⎫⎪
ω0′ = ⎨ x | < tα , n −1 ( v ) ⎬
⎩⎪ S ⎭⎪
Since ω0 is independent of µ1 i.e. it is the same for all µ1 > µ0 in fact it is the UMP similar region of size α for

testing H 0 against the more composite H1 : µ > µ1 .

Similarly, ω0′ is the UMP similar region of size α for testing H 0 : µ = µ1 against H1 : µ < µ1 .

Hypothesis-I ~ 11 of 11
Likelihood Ratio Test

Introduction
Neyman and Pearson (1928) developed a simpler method of testing hypothesis called the method of Likelihood
Ratio. Just like a method of maximum likelihood which yields an estimate of a parameter, the method of maximum
likelihood ratio test yields a statistic rather more easily.

Definition
Let θ ∈ Ω be a vector of parameters and let X = (x_1, …, x_n) be a random vector with p.d.f. f_θ, θ ∈ Ω. Consider the problem of testing the null hypothesis H_0: X ~ f_θ, θ ∈ Ω_0 against the alternative hypothesis H_1: X ~ f_θ, θ ∈ Ω_1 = Ω − Ω_0. The likelihood ratio for testing H_0 against H_1 is defined as the ratio

λ = λ(X) = λ(x_1, …, x_n) = sup_{θ∈Ω_0} f_θ(x_1, …, x_n) / sup_{θ∈Ω} f_θ(x_1, …, x_n) = L(Ω̂_0) / L(Ω̂)

and the test is of the form: reject H_0 iff λ(X) < C, where C is a constant determined from the size α (the level of significance, 0 < α < 1), i.e. sup_{θ∈Ω_0} P_θ( x : λ(x) < C ) = α.

Remarks
The numerator of the likelihood ratio λ is the best explanation of X that H_0 can provide, and the denominator is the best possible explanation of X overall. H_0 is rejected if there is a much better explanation of X than the best one provided by H_0. It is clear that 0 ≤ λ ≤ 1.
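To make the definition concrete, the following minimal Python sketch (the data are hypothetical) computes λ for testing H_0: µ = µ_0 in a normal model with known σ, where both suprema are available in closed form.

```python
import numpy as np

def lr_statistic_normal_mean(x, mu0, sigma):
    """lambda = L(Omega0_hat) / L(Omega_hat) for H0: mu = mu0, sigma known."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()                    # MLE of mu over the full space Omega

    def loglik(mu):                    # log-likelihood at a given mu
        return (-0.5 * n * np.log(2 * np.pi * sigma**2)
                - np.sum((x - mu)**2) / (2 * sigma**2))

    # numerator: sup over Omega0 = {mu0}; denominator: sup over Omega, attained at xbar
    return np.exp(loglik(mu0) - loglik(xbar))

rng = np.random.default_rng(1)
data = rng.normal(loc=0.3, scale=1.0, size=30)
lam = lr_statistic_normal_mean(data, mu0=0.0, sigma=1.0)
print(lam, -2 * np.log(lam))           # -2 ln(lambda) is approx. chi-square(1) under H0
```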

Properties of LRT
The LRT has some desirable properties, especially large-sample properties. The LRT is generally UMP when a UMP test exists. We state below two asymptotic properties of the LRT.

i) Under certain conditions, −2 ln λ has an asymptotic chi-square distribution.


ii) Under certain assumptions, LRT is consistent.

Properties of LRT Statistic ( λ )

i) The likelihood ratio λ is a function of x only and hence λ is a statistic which does not depend on θ .
ii) Since λ is the ratio of the maximum of the likelihood function under H_0 to its unrestricted maximum, 0 ≤ λ ≤ 1.
iii) The critical region is 0 < λ < λ_0, where λ_0 satisfies ∫_0^{λ_0} h(λ) dλ = α, the level of significance, h(λ) being the density of λ under H_0.
iv) λ is always a function of the sufficient statistic.
v) If the null hypothesis H_0 is composite, the distribution of λ may not be unique.
vi) Under certain conditions, −2 ln λ asymptotically follows χ²_(1) (more generally, χ² with degrees of freedom equal to the number of restrictions imposed by H_0).



LRT for Testing the Equality of Means of Two Normal Populations

Let us consider two independent random variables X_1 and X_2 following the normal distributions N(µ_1, σ_1²) and N(µ_2, σ_2²) respectively. We want to test the hypothesis

H_0: µ_1 = µ_2 = µ (say);  0 < σ_1² < ∞, 0 < σ_2² < ∞

against  H_1: µ_1 ≠ µ_2;  σ_1² > 0, σ_2² > 0

Case I: Population variances are unequal.

Ω = { (µ_1, µ_2, σ_1², σ_2²) : −∞ < µ_i < ∞, σ_i² > 0; i = 1, 2 }
Ω_0 = { (µ, σ_1², σ_2²) : −∞ < µ < ∞, σ_i² > 0; i = 1, 2 }

Let x_{1i} (i = 1, …, m) and x_{2j} (j = 1, …, n) be two independent random samples of sizes m and n from the populations N(µ_1, σ_1²) and N(µ_2, σ_2²) respectively. Then the likelihood function is given by

L = (2πσ_1²)^{−m/2} exp[ −(1/(2σ_1²)) Σ_{i=1}^{m}(x_{1i} − µ_1)² ] · (2πσ_2²)^{−n/2} exp[ −(1/(2σ_2²)) Σ_{j=1}^{n}(x_{2j} − µ_2)² ]   … (1)

The maximum likelihood estimates of µ_1, µ_2, σ_1², σ_2² are given by

µ̂_1 = (1/m) Σ_{i=1}^{m} x_{1i} = x̄_1 and µ̂_2 = (1/n) Σ_{j=1}^{n} x_{2j} = x̄_2
σ̂_1² = (1/m) Σ(x_{1i} − x̄_1)² = s_1² (say) and σ̂_2² = (1/n) Σ(x_{2j} − x̄_2)² = s_2² (say)

Now  L(Ω̂) = ( 1/(2π s_1²) )^{m/2} ( 1/(2π s_2²) )^{n/2} e^{−(m+n)/2}
Under H_0, the likelihood function is given by

L(Ω_0) = (2πσ_1²)^{−m/2} exp[ −(1/(2σ_1²)) Σ(x_{1i} − µ)² ] · (2πσ_2²)^{−n/2} exp[ −(1/(2σ_2²)) Σ(x_{2j} − µ)² ]

To obtain the maximum value of L(Ω_0) with respect to µ, σ_1², σ_2², it will be seen that the estimate of µ is obtained as a root of a cubic equation,

m² (x̄_1 − µ̂) / Σ_{i=1}^{m}(x_{1i} − µ̂)² + n² (x̄_2 − µ̂) / Σ_{j=1}^{n}(x_{2j} − µ̂)² = 0

and µ̂ is thus a complicated function of the sample observations. It is not possible to obtain the critical region 0 < λ < λ_0 exactly for a given α, since the exact distribution of λ depends on the unknown ratio of the population variances. As an approximate test, −2 ln λ may be treated as a χ² variate with 1 d.f.

Case II: Population variances are equal, i.e. σ_1² = σ_2² = σ² (say).

Ω = { (µ_1, µ_2, σ²) : −∞ < µ_i < ∞, σ² > 0; i = 1, 2 }
Ω_0 = { (µ, σ²) : −∞ < µ < ∞, σ² > 0 }



The maximum likelihood estimates are then given by

µ̂_1 = x̄_1 and µ̂_2 = x̄_2
σ̂² = (1/(m+n)) [ Σ_{i=1}^{m}(x_{1i} − µ̂_1)² + Σ_{j=1}^{n}(x_{2j} − µ̂_2)² ]
    = (1/(m+n)) [ Σ(x_{1i} − x̄_1)² + Σ(x_{2j} − x̄_2)² ] = (1/(m+n)) [ m s_1² + n s_2² ]

Now  L(Ω̂) = { (m+n) / ( 2π [ m s_1² + n s_2² ] ) }^{(m+n)/2} e^{−(m+n)/2}   [substituting the values of µ̂_1, µ̂_2, σ̂² in (1)]
Under H_0 the likelihood function is

L(Ω_0) = (2πσ²)^{−(m+n)/2} exp[ −(1/(2σ²)) { Σ_{i=1}^{m}(x_{1i} − µ)² + Σ_{j=1}^{n}(x_{2j} − µ)² } ]

⇒ ln L(Ω_0) = C − ((m+n)/2) ln σ² − (1/(2σ²)) { Σ(x_{1i} − µ)² + Σ(x_{2j} − µ)² }

where C is a constant independent of µ and σ². The likelihood equation for estimating µ is

∂ ln L / ∂µ = 0
⇒ (1/σ²) { Σ(x_{1i} − µ) + Σ(x_{2j} − µ) } = 0
⇒ (m x̄_1 + n x̄_2) − (m + n) µ = 0
⇒ µ̂ = (m x̄_1 + n x̄_2) / (m + n)

Also, ∂ ln L / ∂σ² = 0
⇒ −(m+n)/(2σ²) + (1/(2σ⁴)) { Σ(x_{1i} − µ)² + Σ(x_{2j} − µ)² } = 0
⇒ σ̂² = (1/(m+n)) { Σ(x_{1i} − µ̂)² + Σ(x_{2j} − µ̂)² }
But

Σ_{i=1}^{m}(x_{1i} − µ̂)² = Σ(x_{1i} − x̄_1 + x̄_1 − µ̂)²
                        = Σ(x_{1i} − x̄_1)² + m(x̄_1 − µ̂)²
                        = m s_1² + m ( x̄_1 − (m x̄_1 + n x̄_2)/(m+n) )² = m s_1² + m n² (x̄_1 − x̄_2)² / (m+n)²

Similarly

Σ_{j=1}^{n}(x_{2j} − µ̂)² = n s_2² + n m² (x̄_2 − x̄_1)² / (m+n)²

∴ σ̂² = (1/(m+n)) { m s_1² + m n²(x̄_1 − x̄_2)²/(m+n)² + n s_2² + n m²(x̄_2 − x̄_1)²/(m+n)² }
      = (1/(m+n)) { m s_1² + n s_2² + m n (x̄_1 − x̄_2)²/(m+n) }
∴ L(Ω̂_0) = { (m+n) / ( 2π [ m s_1² + n s_2² + mn(x̄_1 − x̄_2)²/(m+n) ] ) }^{(m+n)/2} e^{−(m+n)/2}

∴ λ = L(Ω̂_0) / L(Ω̂) = { (m s_1² + n s_2²) / ( m s_1² + n s_2² + mn(x̄_1 − x̄_2)²/(m+n) ) }^{(m+n)/2}
                      = { 1 + mn(x̄_1 − x̄_2)² / ( (m+n)(m s_1² + n s_2²) ) }^{−(m+n)/2}
We know that, under H_0: µ_1 = µ_2, the test statistic

t = (x̄_1 − x̄_2) / ( S √(1/m + 1/n) ),  where S² = (m s_1² + n s_2²)/(m + n − 2),

follows Student's t distribution with m + n − 2 d.f. Moreover, λ = { 1 + t²/(m+n−2) }^{−(m+n)/2}, so λ is a monotone decreasing function of t² and small values of λ correspond to large values of |t|.

Thus for testing the null hypothesis

H_0: µ_1 = µ_2 = µ;  σ_1² = σ_2² = σ² > 0
against  H_1: µ_1 ≠ µ_2;  σ_1² = σ_2² = σ² > 0

we have the two-tailed t-test defined as follows:

If |t| = |x̄_1 − x̄_2| / ( S √(1/m + 1/n) ) > t_{m+n−2}(α/2), reject H_0; otherwise H_0 may be accepted.
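As an illustration, the following minimal Python sketch (data and α hypothetical; SciPy assumed) carries out the two-sample pooled t-test to which the likelihood ratio reduces. Note that s_1² and s_2² above are the MLE variances with divisors m and n, so the pooled estimator uses m s_1² + n s_2².

```python
import numpy as np
from scipy import stats

def two_sample_pooled_t(x, y, alpha=0.05):
    """Two-tailed LRT-equivalent t-test of H0: mu1 = mu2 (equal variances)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    m, n = x.size, y.size
    s1_sq, s2_sq = x.var(), y.var()        # MLE variances (divisors m and n)
    pooled = (m * s1_sq + n * s2_sq) / (m + n - 2)
    t_stat = (x.mean() - y.mean()) / np.sqrt(pooled * (1/m + 1/n))
    t_crit = stats.t.ppf(1 - alpha/2, df=m + n - 2)
    return t_stat, t_crit, abs(t_stat) > t_crit   # True => reject H0

rng = np.random.default_rng(2)
xs = rng.normal(0.0, 1.0, size=12)
ys = rng.normal(0.8, 1.0, size=15)
print(two_sample_pooled_t(xs, ys))
```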

Likelihood Ratio Test for Testing the Equality of Variances of Two Populations

Consider two normal populations N(µ_1, σ_1²) and N(µ_2, σ_2²), where the means µ_1 and µ_2 and the variances σ_1² and σ_2² are unspecified. We want to test the hypothesis

H_0: σ_1² = σ_2² = σ² (unspecified), with µ_1 and µ_2 unspecified
against  H_1: σ_1² ≠ σ_2²;  µ_1 and µ_2 unspecified

If x_{1i} (i = 1, …, m) and x_{2j} (j = 1, …, n) are independent random samples of sizes m and n from N(µ_1, σ_1²) and N(µ_2, σ_2²) respectively, then

L = (2πσ_1²)^{−m/2} exp[ −(1/(2σ_1²)) Σ_{i=1}^{m}(x_{1i} − µ_1)² ] · (2πσ_2²)^{−n/2} exp[ −(1/(2σ_2²)) Σ_{j=1}^{n}(x_{2j} − µ_2)² ]   … (1)

In this case,

Ω = { (µ_1, µ_2, σ_1², σ_2²) : −∞ < µ_i < ∞, σ_i² > 0; i = 1, 2 }
Ω_0 = { (µ_1, µ_2, σ²) : −∞ < µ_i < ∞, σ² > 0; i = 1, 2 }



The maximum likelihood estimates of µ_1, µ_2, σ_1² and σ_2² are given by

µ̂_1 = (1/m) Σ x_{1i} = x̄_1 and µ̂_2 = (1/n) Σ x_{2j} = x̄_2
σ̂_1² = (1/m) Σ(x_{1i} − x̄_1)² = s_1² (say) and σ̂_2² = (1/n) Σ(x_{2j} − x̄_2)² = s_2² (say)

Substituting these values in (1) we get

L(Ω̂) = ( 1/(2π s_1²) )^{m/2} ( 1/(2π s_2²) )^{n/2} e^{−(m+n)/2}
Under H_0, the likelihood function is given by

L(Ω_0) = (2πσ²)^{−(m+n)/2} exp[ −(1/(2σ²)) { Σ_{i=1}^{m}(x_{1i} − µ_1)² + Σ_{j=1}^{n}(x_{2j} − µ_2)² } ]   … (2)

The MLEs of µ_1, µ_2 and σ² are now given by

µ̂_1 = (1/m) Σ x_{1i} = x̄_1 and µ̂_2 = (1/n) Σ x_{2j} = x̄_2
σ̂² = (1/(m+n)) [ Σ(x_{1i} − µ̂_1)² + Σ(x_{2j} − µ̂_2)² ]
    = (1/(m+n)) [ Σ(x_{1i} − x̄_1)² + Σ(x_{2j} − x̄_2)² ] = (1/(m+n)) [ m s_1² + n s_2² ]

Substituting these values in (2) we get

L(Ω̂_0) = { (m+n) / ( 2π [ m s_1² + n s_2² ] ) }^{(m+n)/2} e^{−(m+n)/2}

∴ λ = L(Ω̂_0) / L(Ω̂) = (s_1²)^{m/2} (s_2²)^{n/2} (m+n)^{(m+n)/2} / ( m s_1² + n s_2² )^{(m+n)/2}

     = [ (m+n)^{(m+n)/2} / ( m^{m/2} n^{n/2} ) ] · (m s_1²)^{m/2} (n s_2²)^{n/2} / ( m s_1² + n s_2² )^{(m+n)/2}   … (3)

We know that under H_0 the statistic

F = [ Σ(x_{1i} − x̄_1)² / (m − 1) ] / [ Σ(x_{2j} − x̄_2)² / (n − 1) ] = [ m s_1² / (m − 1) ] / [ n s_2² / (n − 1) ]

follows the F-distribution with (m − 1, n − 1) d.f., i.e.

F = m (n − 1) s_1² / ( n (m − 1) s_2² )
⇒ ((m − 1)/(n − 1)) F = m s_1² / ( n s_2² )
Substituting in (3) and simplifying, we get

λ = [ (m+n)^{(m+n)/2} / ( m^{m/2} n^{n/2} ) ] · { ((m−1)F/(n−1))^{m/2} / ( 1 + (m−1)F/(n−1) )^{(m+n)/2} }

Thus λ is a function of F alone (increasing up to its maximum and decreasing thereafter), and hence the test can be carried out with F as the test statistic. The critical region 0 < λ < λ_0 corresponds to a pair of intervals F ≤ F_1 and F ≥ F_2, where F_1 and F_2 are determined so that, under H_0,

P(F ≥ F_2) = α/2 and P(F ≥ F_1) = 1 − α/2

Since, under H_0, F follows the F-distribution with (m − 1, n − 1) d.f., we have

F_2 = F_{m−1, n−1}(α/2) and F_1 = F_{m−1, n−1}(1 − α/2)

where F_{m,n}(α) is the upper α point of the F-distribution with (m, n) d.f.

Consequently, for testing H_0: σ_1² = σ_2² against the alternative H_1: σ_1² ≠ σ_2², we have a two-tailed F-test, the critical region being given by

F > F_{m−1, n−1}(α/2) or F < F_{m−1, n−1}(1 − α/2)
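The following minimal Python sketch (hypothetical data; SciPy assumed) implements this two-tailed F-test, using the unbiased sample variances so that F has the stated F_{m−1, n−1} distribution under H_0.

```python
import numpy as np
from scipy import stats

def two_sided_f_test(x, y, alpha=0.05):
    """Two-tailed F-test of H0: sigma1^2 = sigma2^2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    m, n = x.size, y.size
    f_stat = x.var(ddof=1) / y.var(ddof=1)            # ratio of unbiased variances
    f_upper = stats.f.ppf(1 - alpha/2, m - 1, n - 1)  # F_{m-1,n-1}(alpha/2)
    f_lower = stats.f.ppf(alpha/2, m - 1, n - 1)      # F_{m-1,n-1}(1 - alpha/2)
    reject = (f_stat > f_upper) or (f_stat < f_lower)
    return f_stat, (f_lower, f_upper), reject

rng = np.random.default_rng(3)
xs = rng.normal(0.0, 1.0, size=15)
ys = rng.normal(0.0, 2.0, size=20)
print(two_sided_f_test(xs, ys))
```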

Example: Let x_1, …, x_n be a random sample from f(x; θ) = θ e^{−θx} I_{(0,∞)}(x), where Θ = {θ : θ > 0}. Test H_0: θ ≤ θ_0 against H_1: θ > θ_0.

Solution

sup_{θ∈Θ} L(θ; x_1, …, x_n) = sup_{θ>0} [ θ^n e^{−θ Σx_i} ]
                            = ( n/Σx_i )^n e^{−n}    [since θ̂ = 1/x̄ = n/Σx_i]

and  sup_{θ∈Θ_0} L(θ; x_1, …, x_n) = sup_{0<θ≤θ_0} [ θ^n e^{−θ Σx_i} ]
     = ( n/Σx_i )^n e^{−n}        if n/Σx_i ≤ θ_0
     = θ_0^n e^{−θ_0 Σx_i}        if n/Σx_i > θ_0

Hence

λ = 1                                               if n/Σx_i ≤ θ_0
λ = θ_0^n e^{−θ_0 Σx_i} / [ (n/Σx_i)^n e^{−n} ]     if n/Σx_i > θ_0



If 0 < λ_0 < 1, then a generalized likelihood ratio test is given by:

Reject H_0 if λ ≤ λ_0

or, reject H_0 if n/Σx_i > θ_0 and (θ_0 Σx_i / n)^n exp[ −θ_0 Σx_i + n ] ≤ λ_0

or, reject H_0 if θ_0 x̄ < 1 and (θ_0 x̄)^n exp[ −n(θ_0 x̄ − 1) ] ≤ λ_0

Let y = θ_0 x̄ and note that y^n exp[−n(y − 1)] has its maximum at y = 1. Hence y < 1 and y^n exp[−n(y − 1)] ≤ λ_0 iff y ≤ k, where k is a constant satisfying 0 < k < 1.

The generalized likelihood ratio test therefore reduces to the following:

Reject H_0 iff θ_0 x̄ < k, where 0 < k < 1.

That is, reject H_0 if x̄ is less than some multiple of 1/θ_0.

If a generalized likelihood ratio test having size α is desired, k is obtained as the solution of the equation

α = P_{θ_0}( θ_0 X̄ < k ) = P_{θ_0}( θ_0 ΣX_i < nk ) = ∫_0^{nk} (1/Γ(n)) u^{n−1} e^{−u} du

Note that P_θ( θ_0 X̄ < k ) ≤ P_{θ_0}( θ_0 X̄ < k ) for θ ≤ θ_0, so the test has size α over the whole of H_0.
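A minimal Python sketch (SciPy assumed; n and α are hypothetical) of how k could be obtained numerically from this equation, using the Gamma(n, 1) distribution of θ_0 ΣX_i under θ = θ_0:

```python
from scipy import stats

def glr_cutoff_k(n, alpha=0.05):
    """Solve alpha = P(Gamma(n, 1) < n * k) for k."""
    # theta0 * sum(X_i) ~ Gamma(n, 1) when theta = theta0
    return stats.gamma.ppf(alpha, a=n) / n

n, alpha = 20, 0.05
k = glr_cutoff_k(n, alpha)
print(k)   # reject H0: theta <= theta0 when theta0 * xbar < k
```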

Uses of LRT
λ can be used to determine the rejection region, since λ is typically a monotone function of a standard test statistic. It is used for:

1) Test for the mean of a normal population:
   H_0: µ = µ_0 against H_1: µ = µ_1,  x_1, …, x_n ~ N(µ, σ²),  for σ² known and for σ² unknown.

2) Test for the equality of means of two normal populations:
   H_0: µ_1 = µ_2 against H_1: µ_1 ≠ µ_2,  for σ² equal and for σ² unequal.

3) Test for the equality of means of several normal populations:
   H_0: µ_1 = µ_2 = … = µ_k against H_1: the µ_i's are not all equal.

4) Test for the variance of a normal population:
   H_0: σ² = σ_0² against H_1: σ² ≠ σ_0²,  where σ_0² is specified.

5) Test for the equality of variances of two normal populations:
   H_0: σ_1² = σ_2² against H_1: σ_1² ≠ σ_2².



Situation of Using LRT
The Neyman–Pearson lemma, based on the ratio of two probability density functions, provides a best test for testing a simple hypothesis against a simple alternative. In general, the best test depends on the form of the probability distribution and on the alternative hypothesis. When no such best test is available, a general method of test construction, the LRT, is recommended for simple or composite hypotheses.

Consistent Test and LRT


A test of a hypothesis H_0 against a class of alternatives H_1 is said to be consistent if, when any member of H_1 holds, the probability of rejecting H_0 tends to 1 as the sample size tends to infinity. If c* is the critical region and X the sample point, we may write

lim_{n→∞} P[ X ∈ c* | H_1 ] = 1,  n being the sample size.

The LRT is a consistent test. Under very generally satisfied conditions, the MLE θ̂ of a parameter vector θ is consistent. If we are dealing with a situation in which all the MLEs are consistent, we see from the definition of the LRT statistic that, as the sample size increases,

λ → L(x | θ_r^0, θ_s) / L(x | θ_r, θ_s)   … (1)

where θ_r, θ_s are the true values of the parameters and θ_r^0 is the hypothetical value of θ_r being tested. Thus, when H_0 holds,

λ → 1, in probability

and the critical region λ ≤ c_α will therefore have its boundary c_α approaching 1. When H_0 does not hold, the limiting value of λ in (1) will be some constant k satisfying

0 ≤ k < 1

and thus we have

P[ λ ≤ c_α ] → 1

Therefore, the LRT is consistent.
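A small simulation sketch in Python (SciPy assumed; all numbers hypothetical) illustrating consistency: for a fixed alternative µ ≠ µ_0 in the known-σ normal model, the rejection probability of the LRT (reject when −2 ln λ exceeds the χ²_(1) critical value) approaches 1 as n grows.

```python
import numpy as np
from scipy import stats

mu0, mu_true, sigma, alpha = 0.0, 0.3, 1.0, 0.05
crit = stats.chi2.ppf(1 - alpha, df=1)        # chi-square(1) critical value
rng = np.random.default_rng(5)

for n in (10, 50, 200, 1000):
    xbar = rng.normal(mu_true, sigma / np.sqrt(n), size=5000)  # sampling dist. of the mean
    stat = n * (xbar - mu0)**2 / sigma**2     # -2 ln(lambda) for H0: mu = mu0
    print(n, np.mean(stat > crit))            # empirical rejection probability -> 1
```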

Show that, under certain regularity conditions, −2 ln λ ~ χ² with 1 d.f., where λ is the likelihood ratio.

Let x_1, …, x_n be a sample from N(µ, σ²). Let the hypotheses be

H_0: µ = µ_0
H_1: µ ≠ µ_0,  where σ² is known.
Then

λ = [ (1/(σ√(2π)))^n exp( −(1/(2σ²)) Σ(x_i − µ_0)² ) ] / [ (1/(σ√(2π)))^n exp( −(1/(2σ²)) Σ(x_i − x̄)² ) ]   [where the MLE of µ is µ̂ = x̄]
  = exp[ −(1/(2σ²)) { Σ(x_i − µ_0)² − Σ(x_i − x̄)² } ]
  = exp[ −(1/(2σ²)) { n x̄² − 2nµ_0 x̄ + nµ_0² } ]
  = exp[ −(n/(2σ²)) (x̄ − µ_0)² ]

⇒ ln λ = −(n/(2σ²)) (x̄ − µ_0)²
⇒ −2 ln λ = n (x̄ − µ_0)² / σ²

Under H_0, √n(x̄ − µ_0)/σ ~ N(0, 1), so n(x̄ − µ_0)²/σ² ~ χ²_(1) (here exactly; in general the result holds when n is large).

Thus, −2 ln λ ~ χ²_(1), where λ is the LR.
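A quick Monte Carlo check of this result, as a minimal Python sketch (SciPy assumed; the sample size, µ_0 and σ are hypothetical): under H_0, the simulated values of −2 ln λ should match the χ²_(1) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, mu0, sigma, reps = 25, 0.0, 1.0, 20000

xbar = rng.normal(mu0, sigma / np.sqrt(n), size=reps)  # sampling dist. of the mean under H0
neg2loglam = n * (xbar - mu0)**2 / sigma**2            # -2 ln(lambda) from the derivation

# compare the simulated upper 5% point with the chi-square(1) critical value
print(np.quantile(neg2loglam, 0.95), stats.chi2.ppf(0.95, df=1))
```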

BCR or LRT is a function of sufficient statistic


Suppose we have

H_0: θ = θ_0
H_1: θ = θ_1

By the LRT and the Neyman–Pearson lemma, the BCR is given by

L(θ_0) / L(θ_1) ≤ k   … (1)   where k is a positive number.

Let t be a sufficient statistic for θ; then by the factorization criterion we get

L(θ_0) = h(t | θ_0) k(x) and L(θ_1) = h(t | θ_1) k(x)

where k(x) is a function of x alone and independent of θ. Thus from (1) we get

L(θ_0) / L(θ_1) = h(t | θ_0) k(x) / ( h(t | θ_1) k(x) ) ≤ k
⇒ h(t | θ_0) / h(t | θ_1) ≤ k

Hence, the LRT or BCR is a function of the sufficient statistic.



Monotone Likelihood Ratio (MLR)



A joint p.d.f. L(x | θ) is said to have a monotone likelihood ratio (MLR) in the statistic T = t(x) if, for any two values of the parameter θ_1 < θ_2, the ratio L(x | θ_2)/L(x | θ_1) depends on x only through the function t(x), and this ratio is a non-decreasing function of t(x).

Example
Let X ~ b(m, θ) and let x_1, …, x_n be a random sample. Then we have

L(x | θ) = { ∏_{i=1}^{n} (m choose x_i) } θ^{Σx_i} (1 − θ)^{mn − Σx_i}

If θ_2 > θ_1, then

L(x | θ_2) / L(x | θ_1) = θ_2^{Σx_i} (1 − θ_2)^{mn − Σx_i} / [ θ_1^{Σx_i} (1 − θ_1)^{mn − Σx_i} ]
  = (θ_2/θ_1)^{Σx_i} { (1 − θ_2)/(1 − θ_1) }^{mn − Σx_i}
  = { θ_2(1 − θ_1) / (θ_1(1 − θ_2)) }^{Σx_i} { (1 − θ_2)/(1 − θ_1) }^{mn}

which is a non-decreasing function of Σx_i, where Σx_i = T = t(x). Hence L(x | θ) has MLR in Σx_i.
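A small numerical illustration of this monotonicity, as a hedged Python sketch (the values of m, n, θ_1 and θ_2 are arbitrary): the likelihood ratio is evaluated as a function of t = Σx_i and should be non-decreasing.

```python
import numpy as np

m, n = 5, 10                 # each X_i ~ Binomial(m, theta), sample of size n
theta1, theta2 = 0.3, 0.6    # theta1 < theta2

t = np.arange(0, m * n + 1)  # possible values of T = sum(x_i)
# the ratio of joint pmfs depends on x only through t
ratio = (theta2 / theta1)**t * ((1 - theta2) / (1 - theta1))**(m * n - t)

print(np.all(np.diff(ratio) >= 0))   # True: the ratio is non-decreasing in t
```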

Uses
Distributions having MLR provide a UMP test for testing a simple H_0 against a one-sided H_1.

Example
Let X ~ exp(θ); then f(x) = (1/θ) e^{−x/θ}, θ > 0, x > 0.

We have  L(x | θ) = (1/θ^n) exp( −Σx_i / θ )

L(x | θ_2) / L(x | θ_1) = [ (1/θ_2^n) exp(−Σx_i/θ_2) ] / [ (1/θ_1^n) exp(−Σx_i/θ_1) ]
  = (θ_1/θ_2)^n exp( Σx_i (θ_2 − θ_1)/(θ_1 θ_2) )

For θ_2 > θ_1, L(x | θ_2)/L(x | θ_1) is a non-decreasing function of Σx_i, so L(x | θ) has MLR in Σx_i.
Example
Let X ~ N(θ, 1); then

f(x) = (1/√(2π)) exp( −(1/2)(x − θ)² )

and  L(x | θ) = (1/√(2π))^n exp( −(1/2) Σ(x_i − θ)² )

∴ L(x | θ_2) / L(x | θ_1) = exp( −(1/2) Σ(x_i − θ_2)² + (1/2) Σ(x_i − θ_1)² )
  = exp( −(1/2)Σx_i² + θ_2 Σx_i − (n/2)θ_2² + (1/2)Σx_i² − θ_1 Σx_i + (n/2)θ_1² )
  = exp( Σx_i (θ_2 − θ_1) − (n/2)(θ_2² − θ_1²) )

which is a non-decreasing function of Σx_i. So L(x | θ) has MLR in Σx_i.

Example
Let x_1, …, x_n ~ U(0, θ), θ > 0; then

f(x) = 1/θ,  0 ≤ x ≤ θ

The joint p.d.f. of x_1, …, x_n is

L(x | θ) = (1/θ^n) I(max x_i ≤ θ)

Let θ_2 > θ_1 and consider the ratio

L(x | θ_2) / L(x | θ_1) = [ (1/θ_2)^n I(max x_i ≤ θ_2) ] / [ (1/θ_1)^n I(max x_i ≤ θ_1) ]
  = (θ_1/θ_2)^n · I(max x_i ≤ θ_2) / I(max x_i ≤ θ_1)

Let R(x) = I(max x_i ≤ θ_2) / I(max x_i ≤ θ_1) = 1 if max x_i ∈ [0, θ_1];  = ∞ if max x_i ∈ (θ_1, θ_2]

Define R(x) = ∞ if max x_i > θ_2. It follows that L(x | θ_2)/L(x | θ_1) is a non-decreasing function of max_{1≤i≤n} x_i, and so L(x | θ) has an MLR in max_{1≤i≤n} x_i.



Theorem: The one-parameter exponential family
L ( x | θ ) = exp {Q (θ ) T ( x ) + S ( x ) + D (θ )}

where Q (θ ) is non-decreasing, has an MLR in T ( x ) .

Proof
For θ_2 > θ_1, Q(θ_2) ≥ Q(θ_1), and thus

L(x | θ_2) / L(x | θ_1) = exp{ T(x) [Q(θ_2) − Q(θ_1)] + [D(θ_2) − D(θ_1)] }

which is a non-decreasing function of T(x). Hence the one-parameter exponential family has an MLR in T(x).

Example
Let X ~ c(1, θ) (the Cauchy distribution). Then we have

L(x | θ_2) / L(x | θ_1) = [ 1 + (x − θ_1)² ] / [ 1 + (x − θ_2)² ] → 1 as x → ±∞

so the ratio is not monotone in x, and hence c(1, θ) does not have an MLR.

Theorem: If a joint p.d . f . L ( x | θ ) has MLR in the statistic T = t ( x ) then there exists a UMP test for testing H 0 : θ = θ 0

against H1 : θ > θ 0 .

Proof
We know that, for testing a simple H_0: θ = θ_0 against a simple H_1: θ = θ_1 (> θ_0), there exists a BCR ω_0 such that

L(x | H_1) / L(x | H_0) ≥ a constant   … (1)

Since this likelihood ratio is a non-decreasing function of t(x), for θ_1 > θ_0 the BCR determined by (1) is also given by

t(x) ≥ k_1 inside ω_0   … (2)

Let the size and the power function of this test be α and P(θ) respectively.

∴ P(θ_0) = α

The BCR for testing H_0: θ = θ_0 against another alternative H_1: θ = θ_2 (> θ_0) is given by

L(x | θ_2) / L(x | θ_0) ≥ a constant inside the C.R.
⇒ t(x) ≥ k_2   … (3)

If we take k_2 = k_1 in (3), the critical region obtained is identical with ω_0 defined in (2) and is still most powerful for testing θ = θ_0 against θ = θ_2 (> θ_0), with size P(θ_0) = α. As the test is most powerful, P(θ_2) > P(θ_0). Thus the power function P(θ) is strictly increasing wherever P(θ) < 1.

Therefore, for testing θ = θ_0, the critical region defined by equation (2) can be used with size less than or equal to α. The power of the test for any alternative θ_1 > θ_0 is maximum, and this is so for all alternatives greater than θ_0. Hence, the critical region given by (2) is UMP for testing θ = θ_0 against θ > θ_0.



Example
Let X ~ N(µ, 1); then we have

L(x | µ) = (1/√(2π))^n exp( −(1/2) Σ(x_i − µ)² )

For µ_1 > µ_0,

L(x | µ_1) / L(x | µ_0) = exp( n x̄ (µ_1 − µ_0) − (n/2)(µ_1² − µ_0²) )

This is an increasing function of x̄, so there exists a UMP test for testing H_0: µ = µ_0 against H_1: µ > µ_0.

Theorem: Suppose that x_1, …, x_n have a joint p.d.f. of the form

f(x; θ) = C(θ) h(x) exp[ q(θ) t(x) ]

where q(θ) is an increasing function of θ. Then there exists a UMP test for one-sided alternatives.

Example
Consider a random sample of size n from a Poisson population with parameter µ. Then we have

L(x | µ) = e^{−nµ} µ^{Σx_i} / ∏x_i! ;  x_i = 0, 1, …
        = e^{−nµ} (∏x_i!)^{−1} exp[ (ln µ) Σx_i ]

where

q(µ) = ln µ,  t(x) = Σx_i,  C(µ) = e^{−nµ},  h(x) = (∏x_i!)^{−1}

So there exists a UMP test of size α for testing H_0: µ = µ_0 against H_1: µ > µ_0, which rejects H_0 when T = Σx_i ≥ k, where k is chosen so that P[T ≥ k | H_0] = α.
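Because T = Σx_i is discrete (T ~ Poisson(nµ_0) under H_0), attaining exact size α generally requires randomizing at the boundary value T = k, as formalized in the next section. A minimal Python sketch (SciPy assumed; n, µ_0 and α are hypothetical) of how k and the randomization probability γ could be found:

```python
from scipy import stats

def poisson_ump_cutoff(n, mu0, alpha=0.05):
    """Find k and gamma with P(T > k) + gamma * P(T = k) = alpha,
    where T = sum(x_i) ~ Poisson(n * mu0) under H0."""
    lam = n * mu0
    k = int(stats.poisson.ppf(1 - alpha, lam))   # smallest k with P(T > k) <= alpha
    p_gt = stats.poisson.sf(k, lam)              # P(T > k)
    p_eq = stats.poisson.pmf(k, lam)             # P(T = k)
    gamma = (alpha - p_gt) / p_eq
    return k, gamma

print(poisson_ump_cutoff(n=10, mu0=1.0, alpha=0.05))
# Reject H0 if T > k; reject with probability gamma if T = k.
```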

Randomized Test
A test γ of a hypothesis H is defined to be a randomized test if γ is defined by the function

ψ_γ(x_1, …, x_n) = P[ H is rejected | (x_1, …, x_n) is observed ]

The function ψ_γ(·, …, ·) is called the critical function of the test γ.

For example, let x_1, …, x_n be a random sample from f(x; θ) = φ_{θ, 25}(x). Consider H: θ ≤ 17. If the test γ of H is: toss a coin and reject H iff a head appears, then γ is a randomized test.

Non-Randomized Test
Let a test γ of a statistical hypothesis H be defined as follows:

Reject H if and only if (x_1, …, x_n) ∈ cr, where cr is a subset of the sample space χ; then γ is called a non-randomized test and cr is called the critical region of the test γ.



For example, let x_1, …, x_n be a random sample from f(x; θ) = φ_{θ, 25}(x), and let χ be Euclidean n-space. Consider H: θ < 17 and the test γ: reject H if and only if x̄ > 17 + 5/√n; then γ is non-randomized and

cr = { (x_1, …, x_n) : x̄ > 17 + 5/√n }.

Theorem: Let x_1, …, x_n be a random sample of size n from a p.d.f. f(x; θ) which depends continuously on a single parameter θ belonging to a parameter space Ω, i.e. θ ∈ Ω. Let the likelihood function L(x | θ) have MLR in T(x) = t(x_1, …, x_n). Then for testing H_0: θ = θ_0 against H_1: θ > θ_0 there exists a UMP test φ(x_1, …, x_n) of size α given by

φ(x_1, …, x_n) = 1 if T(x_1, …, x_n) > k;  γ if T(x_1, …, x_n) = k;  0 if T(x_1, …, x_n) < k

where γ and k are non-negative constants satisfying E_{θ_0}[φ(x_1, …, x_n)] = α.

Proof
Since L(x | θ) has MLR in T(x_1, …, x_n), for any θ_1 > θ_0 and a constant k,

L(x | θ_1) / L(x | θ_0)  > k,  = k,  < k   … (1)

is equivalent to

T(x_1, …, x_n)  > c,  = c,  < c   for some constant c.

Hence by the Neyman–Pearson lemma there exists a test

φ(x_1, …, x_n) = 1 if T(x_1, …, x_n) > k;  γ if T(x_1, …, x_n) = k;  0 if T(x_1, …, x_n) < k

which is most powerful of size α for testing H_0: θ = θ_0 against any simple alternative θ_1 > θ_0.

Furthermore, for any pair (θ′, θ″) with θ′ ≤ θ″, the same test φ(x_1, …, x_n) is most powerful of size α for testing the simple H_0: θ = θ′ against the simple H_1: θ = θ″. Since the test does not depend on the particular alternative, it is most powerful simultaneously for every θ_1 > θ_0. Therefore, there exists a UMP test for testing θ = θ_0 against θ > θ_0.

Example

Let x_1, …, x_n be a random sample of size n drawn from N(0, σ²). Find the UMP test of size α for H_0: σ² = σ_0² against H_1: σ² > σ_0², where σ_0² is specified.

Solution
We have  L(x | σ²) = ( 1/(2πσ²) )^{n/2} exp( −(1/(2σ²)) Σ x_i² )



Now,  L(x | σ_2²) / L(x | σ_1²) = (σ_1²/σ_2²)^{n/2} exp( (1/2) Σx_i² [ 1/σ_1² − 1/σ_2² ] )

which is a non-decreasing function of Σx_i² for σ_2² > σ_1². So L(x | σ²) has MLR in Σx_i², and the UMP test is

φ(x_1, …, x_n) = 1 if Σx_i² > c;  γ if Σx_i² = c;  0 if Σx_i² < c

where γ and c are constants.

Now we have to find the value of c. We have

P[ Σx_i² ≥ c | H_0 ] = α
⇒ P[ Σx_i²/σ_0² ≥ c/σ_0² | H_0 ] = α
⇒ P[ χ²_n ≥ c/σ_0² ] = α

Thus c/σ_0² may be read from the χ² table (with n d.f.) and c determined. If Σx_i² ≥ c, then H_0 is rejected at significance level α; otherwise H_0 is accepted.

If we take n = 10, α = 0.05, σ_0² = 2, then from the χ² table we have

c/σ_0² = 18.307
⇒ c = 18.307 × 2 = 36.614

If Σx_i² ≥ 36.614 then we reject H_0: σ² = σ_0² = 2; otherwise we accept H_0.
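The critical value used above can be reproduced with a short Python sketch (SciPy assumed); the numbers n = 10, α = 0.05 and σ_0² = 2 are just the ones from the example.

```python
from scipy import stats

n, alpha, sigma0_sq = 10, 0.05, 2.0
chi2_crit = stats.chi2.ppf(1 - alpha, df=n)   # upper 5% point of chi-square(10) = 18.307
c = chi2_crit * sigma0_sq                     # reject H0 when sum(x_i**2) >= c
print(chi2_crit, c)                           # approx. 18.307 and 36.614
```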

Theorem: Let f(x | θ) be a continuous density function of a random variable x. Suppose the likelihood function L(x; θ) of n independent observations is differentiable with respect to θ under the sign of integration, and the derivative L′(x; θ) of L(x; θ) with respect to θ is everywhere continuous in θ and does not vanish identically in the sample space. Then, for testing a simple H_0: θ = θ_0 against the family of alternatives containing both negative and positive values of (θ − θ_0), there does not exist a UMP test.

Proof
Let H_1: θ = θ_1 be a simple alternative. The likelihood functions under H_0 and H_1 are L(x | H_0) and L(x | H_1) respectively. Expanding L(x | H_1) about θ = θ_0 by a Taylor series,

L(x | θ_1) = L(x | θ_0) + (θ_1 − θ_0) L′(x | θ′)   … (1)

where θ′ is some value of θ in the interval (θ_0, θ_1).



Let us assume that there exists a UMP test for testing H_0: θ = θ_0 against H_1: θ ≠ θ_0. According to the Neyman–Pearson lemma, the BCR for testing H_0: θ = θ_0 against H_1: θ = θ_1 is

L(x | θ_1) / L(x | θ_0) ≥ k(θ_1) within the C.R.   … (2)

Here k depends on α and the sample size; with α and n fixed, we regard k as a function of θ_1 only. Now from (1) we have

L(x | θ_1) / L(x | θ_0) = 1 + (θ_1 − θ_0) L′(x | θ′) / L(x | θ_0)
⇒ 1 + (θ_1 − θ_0) L′(x | θ′) / L(x | θ_0) ≥ k(θ_1)   [by (2)]   … (3)

When θ_1 = θ_0 we can write k(θ_0) = 1. Therefore we can expand k(θ_1) about θ_0, again by Taylor's series,

k(θ_1) = 1 + (θ_1 − θ_0) k′(θ″)   … (4)   where θ_0 < θ″ < θ_1

Using (3) and (4) we have

1 + (θ_1 − θ_0) L′(x | θ′)/L(x | θ_0) ≥ 1 + (θ_1 − θ_0) k′(θ″)
⇒ (θ_1 − θ_0) [ L′(x | θ′)/L(x | θ_0) − k′(θ″) ] ≥ 0   … (5)
If x denotes a point on the boundary of the BCR defined by equation (2), then

L(x | θ_1) / L(x | θ_0) = k(θ_1)

so that, differentiating with respect to θ_1,

L′(x | θ_1) / L(x | θ_0) = k′(θ_1), and similarly L′(x | θ″) / L(x | θ_0) = k′(θ″)

Substituting the value of k′(θ″) in (5) we have

(θ_1 − θ_0) [ L′(x | θ′)/L(x | θ_0) − L′(x | θ″)/L(x | θ_0) ] ≥ 0   … (6)

For the critical region to be UMP, (6) must hold identically for all values of θ_1 and for all x within the BCR. Since (θ_1 − θ_0) can assume both positive and negative values, and (6) must hold for all of them, the expression

[ L′(x | θ′)/L(x | θ_0) − L′(x | θ″)/L(x | θ_0) ]

must vanish within the BCR.

The outside of the BCR in equation (2) is defined by

L(x | H_1) / L(x | H_0) < k(θ_1)   … (7)



By the same argument that led from (2) to (6), we see from (7) that (6) with the inequality sign reversed holds for both positive and negative values of (θ_1 − θ_0) outside the BCR, and hence the expression

[ L′(x | θ′)/L(x | θ_0) − L′(x | θ″)/L(x | θ_0) ]

is zero outside the BCR also. Thus

L′(x | θ′)/L(x | θ_0) − L′(x | θ″)/L(x | θ_0) = 0 throughout the sample space, that is,

L′(x | θ′)/L(x | θ_0) = L′(x | θ″)/L(x | θ_0)

Therefore, since L′(x; θ) is continuous in θ, letting θ′, θ″ → θ_0 we get that

L′(x | θ_0)/L(x | θ_0) = ∂ ln L(x; θ)/∂θ |_{θ=θ_0}

is a constant over the sample space, and this is the essential condition for the existence of a UMP test against the two-sided alternatives. We have

∫_S L(x | θ) dx = 1

Since differentiation under the sign of integration is valid, this leads to

∫_S [ ∂ ln L(x; θ)/∂θ ]_{θ=θ_0} L(x | θ_0) dx = 0

Combined with the earlier result that ∂ ln L(x; θ)/∂θ |_{θ=θ_0} is constant over S, the constant must be zero, i.e.

L′(x | θ_0)/L(x | θ_0) = 0
⇒ L′(x | θ_0) = 0

identically in S (the sample space), which contradicts the assumption that L′(x; θ) does not vanish identically. Hence the theorem.

Example
Let us consider f(x | θ) = e^{−(x−θ)}, x ≥ θ, for testing H_0: θ = θ_0 against a two-sided alternative.

Here  L(x | θ) = e^{−Σ(x_i − θ)}
⇒ ln L(x | θ) = −Σ(x_i − θ)
⇒ ∂ ln L(x | θ)/∂θ = n

Here n is a constant, so there exists a UMP test.



For this density (writing the parameter as µ), consider testing H_0: µ = µ_0 against H_1: µ = µ_1. Since the lower limit of the range of integration depends on the parameter µ, the smallest observation x_(1) in the sample is sufficient for µ, and under H_1 the probability that x_(1) < µ_1 is zero. Thus

L(x | H_0) / L(x | H_1) = ∞ if x_(1) < µ_1;  = e^{n(µ_0 − µ_1)} otherwise

so the BCR is determined by

e^{n(µ_0 − µ_1)} ≤ k   … (1)

where k is so chosen as to make the size equal to α. The left-hand side of (1) is a constant and is therefore independent of the observations. Hence (1) will be satisfied by every critical region of size α with x_(1) ≥ µ_1; every such critical region is of equal power and is therefore a BCR.

If we permit µ_1 to take values greater or less than µ_0,

L(x | H_0) / L(x | H_1) = ∞                       if µ_0 ≤ x_(1) < µ_1
                        = e^{n(µ_0 − µ_1)} < 1    if x_(1) ≥ µ_1 > µ_0
                        = e^{n(µ_0 − µ_1)} > 1    if x_(1) ≥ µ_0 > µ_1
                        = 0                       if µ_1 ≤ x_(1) < µ_0

The BCR is therefore given by

(x_(1) − µ_0) < 0 or (x_(1) − µ_0) > c_1

When H_0 holds, the probability that (x_(1) − µ_0) < 0 is zero, and the value of c_1 is chosen to satisfy the condition

P{ (x_(1) − µ_0) > c_1 | H_0 } = α

This critical region is a BCR for all alternatives µ_1 ≠ µ_0 and is therefore UMP with respect to these alternatives.

Example

Examine whether a UMP test exists for testing H_0: θ = θ_0, σ = σ_0 in N(θ, σ²).

Solution
If H_1: θ = θ_1, σ = σ_1 is any simple alternative, then

L(x | H_1) / L(x | H_0) = (σ_0/σ_1)^n exp[ −(1/2) { Σ(x_i − θ_1)²/σ_1² − Σ(x_i − θ_0)²/σ_0² } ] ≥ k



This may be written as

S² (1/σ_1² − 1/σ_0²) + (x̄ − θ_1)²/σ_1² − (x̄ − θ_0)²/σ_0² ≤ (2/n) ln[ (1/k)(σ_0/σ_1)^n ],  where S² = (1/n) Σ(x_i − x̄)²,

⇒ S² (1/σ_1² − 1/σ_0²) + x̄² (1/σ_1² − 1/σ_0²) + 2x̄ (θ_0/σ_0² − θ_1/σ_1²) ≤ constant

⇒ (1/σ_1² − 1/σ_0²) [ S² + (x̄ − δ)² ] ≤ constant,  where δ = (θ_1σ_0² − θ_0σ_1²)/(σ_0² − σ_1²)

⇒ (σ_0² − σ_1²) Σ(x_i − δ)² ≤ constant

This means that if σ_0 > σ_1, the BCR is bounded by a hypersphere centred at (δ, …, δ), where δ itself depends on H_1. When σ_1 > σ_0, the BCR lies outside this sphere. In both cases the BCR changes with the alternative.

Therefore, there does not exist any UMP test for this set of alternatives.

Example
Examine for what values of λ there exists a UMP test for H_0: µ = µ_0, λ = λ_0 in the distribution

f(x; µ, λ) = (1/λ) e^{−(x−µ)/λ};  µ ≤ x < ∞

Solution
If H_1: µ = µ_1, λ = λ_1 is any simple hypothesis and x̄ = (1/n) Σx_i, then

L(x | H_1) / L(x | H_0) = (λ_0/λ_1)^n exp[ n { (x̄ − µ_0)/λ_0 − (x̄ − µ_1)/λ_1 } ]
                        = (λ_0/λ_1)^n exp[ n x̄ (1/λ_0 − 1/λ_1) + n (µ_1/λ_1 − µ_0/λ_0) ]

The BCR for the above H_0 and H_1 is defined by the relation

x̄ (1/λ_0 − 1/λ_1) ≥ constant

Thus, UMP tests exist separately for λ_1 > λ_0 and for λ_1 < λ_0, irrespective of the value of µ_1.

