Statistical Inference

STATISTICAL INFERENCE
CLASS NOTE
SANGKAR ROY
JAHANGIRNAGAR UNIVERSITY,BANGLADESH.
EMAIL:sankar1604@gmail.com
Estimation
Sufficient Statistic: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(\cdot\,;\theta)$, where $\theta$ may be a vector. A statistic $T = t(X_1, X_2, \ldots, X_n)$ is defined to be a sufficient statistic if and only if the conditional distribution of $X_1, X_2, \ldots, X_n$ given $T = t$ does not depend on $\theta$ for any value $t$ of $T$.
Note: This definition of a sufficient statistic is not very workable. First, it does not tell us which statistic is likely to be sufficient; second, it requires us to derive a conditional distribution, which may not be easy, especially for continuous random variables. For this reason we usually rely on the Factorization Criterion to find sufficient statistics.
Another Definition: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(\cdot\,;\theta)$. A statistic $T = t(X_1, X_2, \ldots, X_n)$ is defined to be a sufficient statistic if and only if the conditional distribution of $S$ given $T$ does not depend on $\theta$ for any statistic $S = s(X_1, X_2, \ldots, X_n)$.
Note: This definition is particularly useful in showing that a particular statistic is not sufficient. For instance, to prove that a statistic $T' = t'(X_1, X_2, \ldots, X_n)$ is not sufficient, one needs only to find another statistic $T = t(X_1, X_2, \ldots, X_n)$ for which the conditional distribution of $T$ given $T'$ depends on $\theta$.
Jointly Sufficient Statistics: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(\cdot\,;\theta)$. The statistics $T_1, \ldots, T_r$ are defined to be jointly sufficient if and only if the conditional distribution of $X_1, X_2, \ldots, X_n$ given $T_1 = t_1, \ldots, T_r = t_r$ does not depend on $\theta$.
Concept of Sufficient Statistic: A sufficient statistic is a particular kind of statistic. It is a statistic that condenses the sample space in such a way that no "information about $\theta$" is lost. The only information about the parameter $\theta$ in the density $f(\cdot\,;\theta)$ from which we sampled is contained in the sample $X_1, X_2, \ldots, X_n$; so, when we say that a statistic loses no information, we mean that it contains all the information about $\theta$ that is contained in the sample. We emphasize that the type of information of which we are speaking is the information about $\theta$ contained in the sample given that we know the form of the density; that is, we know the function $f(\cdot\,;\cdot)$ in $f(\cdot\,;\theta)$, and the parameter $\theta$ is the only unknown. We are not speaking of information in the sample that might be useful in checking the validity of our assumption that the density does indeed have the form $f(\cdot\,;\cdot)$.
Example: Let X1 , X 2 , X 3 be a sample of size 3 from the Bernoulli distribution. Consider the two statistics
S = s ( X 1 , X 2 , X 3 ) = X 1 + X 2 + X 3 and T = t ( X 1 , X 2 , X 3 ) = X1 X 2 + X 3 . We have to show that s ( ⋅, ⋅, ⋅) is
sufficient and t ( ⋅, ⋅, ⋅) is not.
(x1, x2, x3)   Value of S   Value of T   f(x1, x2, x3 | S)   f(x1, x2, x3 | T)
(0, 0, 0)      0            0            1                   (1 - p)/(1 + p)
(0, 0, 1)      1            1            1/3                 (1 - p)/(1 + 2p)
(0, 1, 0)      1            0            1/3                 p/(1 + p)
(1, 0, 0)      1            0            1/3                 p/(1 + p)
(0, 1, 1)      2            1            1/3                 p/(1 + 2p)
(1, 0, 1)      2            1            1/3                 p/(1 + 2p)
(1, 1, 0)      2            1            1/3                 p/(1 + 2p)
(1, 1, 1)      3            2            1                   1
Now, for example, we have
$$f_{X_1,X_2,X_3 \mid S=1}(0,1,0 \mid 1) = P[X_1=0;\, X_2=1;\, X_3=0 \mid S=1] = \frac{P[X_1=0;\, X_2=1;\, X_3=0;\, S=1]}{P[S=1]} = \frac{(1-p)\,p\,(1-p)}{\binom{3}{1}\, p\,(1-p)^2} = \frac{1}{3}$$
and
$$f_{X_1,X_2,X_3 \mid T=0}(0,1,0 \mid 0) = \frac{P[X_1=0;\, X_2=1;\, X_3=0;\, T=0]}{P[T=0]} = \frac{(1-p)^2\, p}{(1-p)^3 + 2p(1-p)^2} = \frac{p}{1-p+2p} = \frac{p}{1+p}.$$
The conditional distribution of the sample given the values of S is independent of p ; so S is a sufficient
statistic. However, the conditional distribution of the sample given the values of T depends on p ; so T is
not sufficient.
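The table above can be re-derived by brute force. The following is a minimal sketch, assuming exact rational arithmetic and hypothetical helper names (`conditional_pmf`, `s`, `t`) that are not part of the original notes; it enumerates the eight sample points and computes the conditional probability of each point given the value of a statistic.

```python
from fractions import Fraction
from itertools import product

def conditional_pmf(stat, p):
    """Return {sample: P(sample | stat(sample))} for three Bernoulli(p) draws."""
    joint = {}
    for x in product((0, 1), repeat=3):
        k = sum(x)
        joint[x] = p**k * (1 - p) ** (3 - k)
    # Marginal probability of each value of the statistic.
    totals = {}
    for x, pr in joint.items():
        totals[stat(x)] = totals.get(stat(x), 0) + pr
    return {x: pr / totals[stat(x)] for x, pr in joint.items()}

def s(x):
    return x[0] + x[1] + x[2]      # S = X1 + X2 + X3

def t(x):
    return x[0] * x[1] + x[2]      # T = X1*X2 + X3

for p in (Fraction(1, 4), Fraction(1, 2)):
    given_s = conditional_pmf(s, p)[(0, 1, 0)]
    given_t = conditional_pmf(t, p)[(0, 1, 0)]
    print(p, given_s, given_t)
```

The conditional probability given S stays at 1/3 for both values of p, while the one given T changes with p, in line with the conclusion that S is sufficient and T is not.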
Factorization Theorem (Single Sufficient Statistic): Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the density $f(\cdot\,;\theta)$, where the parameter $\theta$ may be a vector. A statistic $T = t(X_1, X_2, \ldots, X_n)$ is sufficient if and only if the joint density of $X_1, X_2, \ldots, X_n$, which is $\prod_{i=1}^{n} f(x_i;\theta)$, factors as
$$f_{X_1,\ldots,X_n}(x_1,\ldots,x_n;\theta) = g\{t(x_1,x_2,\ldots,x_n);\theta\} \cdot h(x_1,x_2,\ldots,x_n) = g\{t;\theta\} \cdot h(x_1,x_2,\ldots,x_n)$$
where the function $h(x_1,x_2,\ldots,x_n)$ is nonnegative and does not involve the parameter $\theta$, and the function $g\{t(x_1,x_2,\ldots,x_n);\theta\}$ is nonnegative and depends on $x_1,\ldots,x_n$ only through the function $t(\cdot,\ldots,\cdot)$.
Factorization Theorem (Jointly Sufficient Statistics): Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the density $f(\cdot\,;\theta)$, where the parameter $\theta$ may be a vector. The statistics $T_1 = t_1(X_1, X_2, \ldots, X_n), \ldots, T_r = t_r(X_1, X_2, \ldots, X_n)$ are jointly sufficient if and only if the joint density of $X_1, X_2, \ldots, X_n$, which is $\prod_{i=1}^{n} f(x_i;\theta)$, can be factored as
$$f_{X_1,\ldots,X_n}(x_1,\ldots,x_n;\theta) = g\{t_1(x_1,\ldots,x_n),\ldots,t_r(x_1,\ldots,x_n);\theta\} \cdot h(x_1,\ldots,x_n) = g\{t_1,\ldots,t_r;\theta\} \cdot h(x_1,\ldots,x_n)$$
where the function $h(x_1,\ldots,x_n)$ is nonnegative and does not involve the parameter $\theta$, and the function $g\{t_1,\ldots,t_r;\theta\}$ is nonnegative and depends on $x_1,\ldots,x_n$ only through the functions $t_1(\cdot,\ldots,\cdot),\ldots,t_r(\cdot,\ldots,\cdot)$.
N.B.: For more on this topic, see Mood, Graybill, Boes; Introduction to the Theory of Statistics, pp. 300-311.
Efficient Estimator: Let $x_1, x_2, \ldots, x_n$ be a sample drawn from the population with density $f(x;\theta)$ and let $t$ be an unbiased consistent estimator of $\theta$. If the variance of $t$ is less than the variance of every other such estimator, then $t$ is said to be the most efficient estimator of $\theta$, or simply an efficient estimator of $\theta$. The efficiency of an estimator can be written as
$$c = \frac{\mathrm{Var}(\text{most efficient estimator})}{\mathrm{Var}(\text{given estimator})}$$
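Regular Distribution: The joint p.d.f. of the $X$'s, where $X \sim f(x;\theta)$, $\theta \in \Omega$, is said to be regular with respect to its first $\theta$-derivative if differentiation under the integral sign is permitted, i.e.,
$$\int_{-\infty}^{\infty} f(x;\theta)\,dx = 1$$
$$\Rightarrow \int_{-\infty}^{\infty} \frac{\partial f(x;\theta)}{\partial\theta}\,dx = 0$$
$$\Rightarrow \int_{-\infty}^{\infty} \frac{\partial f(x;\theta)}{\partial\theta} \cdot \frac{1}{f(x;\theta)}\, f(x;\theta)\,dx = 0$$
$$\Rightarrow \int_{-\infty}^{\infty} \frac{\partial \ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx = 0$$
A distribution for which this holds is called a regular distribution.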
Regular Estimator and Regularity Condition: Let $X_1, \ldots, X_n$ be random variables having the joint p.d.f. $f_\theta(x_1,\ldots,x_n)$, $\theta \in \Theta$. If the statistic $t(X_1,\ldots,X_n)$ is such that $E_\theta\{t(X_1,\ldots,X_n)\} = \psi(\theta)$ $\forall\,\theta$, and if the following regularity conditions hold, then the statistic $t(X_1,\ldots,X_n)$ is known as a regular estimator of $\psi(\theta)$, $\theta \in \Theta$.
i) $\theta$ lies in a non-degenerate open interval $\Theta$ of the real line; $\Theta$ may be infinite;
ii) $\dfrac{\partial f_\theta(x)}{\partial\theta}$ exists $\forall\,\theta \in \Theta$;
iii) $\int f_\theta(x)\,dx$ can be differentiated with respect to $\theta$ under the integral sign;
iv) $\int t(x) f_\theta(x)\,dx$ can be differentiated with respect to $\theta$ under the integral sign;
v) $E_\theta\left[\dfrac{\partial \ln f_\theta(X)}{\partial\theta}\right]^2$ exists and is positive $\forall\,\theta \in \Theta$.
Best Regular Unbiased Estimator (BRUE): In any regular estimation case, the efficiency of an unbiased regular estimator $t_n(X_1,\ldots,X_n)$ is
$$e_\theta(t_n) = \frac{1 \Big/ \left\{ n\,E_\theta\left(\dfrac{\partial \ln f(X \mid \theta)}{\partial\theta}\right)^2 \right\}}{\mathrm{Var}_\theta(t_n)}$$
If $e_\theta(t_n) \equiv 1$, then $t_n$ is called efficient and a Best Regular Unbiased Estimator (BRUE).
Note: In any regular estimation case, $0 \le e_\theta(t_n) \le 1$. We have $e_\theta(t_n) \equiv 1$ iff $\mathrm{Var}_\theta(t_n)$ achieves the lower bound for all $\theta$. In any regular estimation case, the asymptotic efficiency of an unbiased regular estimator $t_n(X_1,\ldots,X_n)$ is $\lim_{n\to\infty} e_\theta(t_n)$.
N.B.: From the chapter Consistency and Efficient Estimator of the Third Year Note, read the examples of efficient and sufficient estimators, Fisher's Information, the Rao-Cramer Inequality and related topics.
Generalized Rao-Cramer Inequality: See the chapter Asymptotically Most Efficient Estimator of the Third Year Note. (Ref. Kendall, Stuart; The Advanced Theory of Statistics, p. 12)
Bhattacharyya Inequality: See the chapter Asymptotically Most Efficient Estimator of the Third Year Note. (Ref. Kendall, Stuart; The Advanced Theory of Statistics, pp. 12-15)
Chapman, Robbins and Kiefer Inequality: This inequality gives a lower bound for the variance of an estimate but, unlike the Rao-Cramer Inequality, does not require regularity conditions.
Statement: Suppose that $X = (x_1, x_2, \ldots, x_n)$ are random variables with joint density or frequency function $f(x;\theta)$, where $\theta$ is a one-dimensional parameter belonging to the parameter space $\Omega$. Let $T$ be an unbiased estimate of $\tau(\theta)$ with $E_\theta(T^2) < \infty$ for all $\theta \in \Omega$. If $\theta \ne \psi$, assume that $f_\theta$ and $f_\psi$ are different, and assume that there exists a $\psi \in \Omega$ such that $\theta \ne \psi$ and $S(\theta) = \{x : f_\theta(x) > 0\} \supset S(\psi) = \{x : f_\psi(x) > 0\}$. Then
$$\mathrm{Var}_\theta(T(X)) \ge \sup_{\{\psi\,:\,S(\psi) \subset S(\theta),\ \psi \ne \theta\}} \frac{[\tau(\psi) - \tau(\theta)]^2}{\mathrm{Var}_\theta\left\{\dfrac{f_\psi(X)}{f_\theta(X)}\right\}} \qquad \forall\,\theta \in \Omega$$
Proof: Since $T$ is unbiased for $\tau(\theta)$, $E_\psi(T(X)) = \tau(\psi)$ $\forall\,\psi \in \Omega$. Hence, for $\psi \ne \theta$ with $S(\psi) \subset S(\theta)$,
$$\int_{S(\theta)} T(x) \left[\frac{f_\psi(x) - f_\theta(x)}{f_\theta(x)}\right] f_\theta(x)\,dx = \tau(\psi) - \tau(\theta)$$
Since $E_\theta\left\{\dfrac{f_\psi(X)}{f_\theta(X)} - 1\right\} = 0$, this gives
$$\mathrm{Cov}_\theta\left\{T(X),\ \frac{f_\psi(X)}{f_\theta(X)} - 1\right\} = E_\theta\left[\{T(X) - \tau(\theta)\}\left\{\frac{f_\psi(X)}{f_\theta(X)} - 1\right\}\right] = \tau(\psi) - \tau(\theta)$$
Since $\rho^2 \le 1$,
$$\mathrm{Cov}_\theta^2\left\{T(X),\ \frac{f_\psi(X)}{f_\theta(X)} - 1\right\} \le V\{T(X)\} \cdot V\left\{\frac{f_\psi(X)}{f_\theta(X)} - 1\right\} = V\{T(X)\} \cdot V\left\{\frac{f_\psi(X)}{f_\theta(X)}\right\}$$
$$\Rightarrow [\tau(\psi) - \tau(\theta)]^2 \le V\{T(X)\} \cdot V\left\{\frac{f_\psi(X)}{f_\theta(X)}\right\}$$
$$\Rightarrow V\{T(X)\} \ge \frac{[\tau(\psi) - \tau(\theta)]^2}{V\left\{\dfrac{f_\psi(X)}{f_\theta(X)}\right\}}$$
Since this holds for every admissible $\psi$, it also holds for the supremum over $\psi$. (Proved)
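Example: Let $X$ be $U[0, \theta]$. Then
$$f_\theta(x) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$$
Formally we get $E_\theta\left[\dfrac{\partial \ln f_\theta(X)}{\partial\theta}\right]^2 = \dfrac{1}{\theta^2}$, so that the lower bound of the Rao-Cramer inequality would be $\dfrac{\theta^2}{n}$. However, the support of $f_\theta$ depends on $\theta$, so the regularity conditions do not hold and this bound is not valid.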

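Let $\tau(\theta) = \theta$. If $\psi < \theta$, then $S(\psi) \subset S(\theta)$. Also,
$$E_\theta\left[\frac{f_\psi(X)}{f_\theta(X)}\right]^2 = \int_0^\psi \left(\frac{\theta}{\psi}\right)^2 \frac{1}{\theta}\,dx = \frac{\theta}{\psi^2}\,[x]_0^\psi = \frac{\theta}{\psi}$$
and, since $E_\theta\left[\dfrac{f_\psi(X)}{f_\theta(X)}\right] = 1$, we have $V\left\{\dfrac{f_\psi(X)}{f_\theta(X)}\right\} = \dfrac{\theta}{\psi} - 1$. Thus
$$\mathrm{Var}_\theta\{T(X)\} \ge \sup_{\{\psi\,:\,\psi < \theta\}} \frac{[\tau(\psi) - \tau(\theta)]^2}{V\left\{\dfrac{f_\psi(X)}{f_\theta(X)}\right\}} = \sup_{\{\psi\,:\,\psi < \theta\}} \frac{[\psi - \theta]^2}{\dfrac{\theta}{\psi} - 1} = \sup_{\{\psi\,:\,\psi < \theta\}} \{\psi(\theta - \psi)\}$$
Now let $K(\psi) = \psi(\theta - \psi)$. Its derivative is $\theta - 2\psi$, so $K(\psi)$ increases as long as $\psi < \dfrac{\theta}{2}$ and decreases if $\psi > \dfrac{\theta}{2}$. Hence $K(\psi)$ attains its maximum value at $\psi = \dfrac{\theta}{2}$.
$$\therefore\ \mathrm{Var}_\theta\{T(X)\} \ge \sup_{\{\psi\,:\,\psi < \theta\}} \{\psi(\theta - \psi)\} = \frac{\theta}{2}\left(\theta - \frac{\theta}{2}\right)$$
So, $\mathrm{Var}_\theta\{T(X)\} \ge \dfrac{\theta^2}{4}$.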
This is the lower bound for any unbiased estimate T ( X ) of θ .
Now, $X$ is a complete sufficient statistic and $2X$ is unbiased for $\theta$, so that $T(X) = 2X$ is the UMVUE. Also
$$\mathrm{Var}_\theta\{2X\} = 4\,\mathrm{Var}(X) = \frac{\theta^2}{3} > \frac{\theta^2}{4}$$
Thus the lower bound $\dfrac{\theta^2}{4}$ of the Chapman, Robbins and Kiefer (CRK) inequality is not achieved by any unbiased estimate of $\theta$.
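The supremum in this example can also be checked numerically. The sketch below is only an illustration under stated assumptions: the function name `crk_bound_uniform` and the grid size are hypothetical choices, and a grid search approximates the supremum rather than proving it.

```python
def crk_bound_uniform(theta, grid=100_000):
    """Approximate sup over 0 < psi < theta of (psi - theta)^2 / (theta/psi - 1)."""
    best = 0.0
    for i in range(1, grid):
        psi = theta * i / grid          # psi runs over a grid strictly inside (0, theta)
        value = (psi - theta) ** 2 / (theta / psi - 1.0)
        best = max(best, value)
    return best

theta = 3.0
bound = crk_bound_uniform(theta)
var_umvue = theta ** 2 / 3              # variance of 2X when X ~ U[0, theta]
print(bound, theta ** 2 / 4, var_umvue)
```

The grid maximum agrees with $\theta^2/4$ and stays below the variance $\theta^2/3$ of $2X$, which is the gap the example points out.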
Example: Let $X$ have p.m.f.
$$P_N\{X = k\} = \begin{cases} \dfrac{1}{N} & \text{if } k = 1, 2, \ldots, N \\ 0 & \text{otherwise} \end{cases}$$
Let $\Omega = \{N : N \ge M,\ M > 1 \text{ given}\}$ and take $\tau(N) = N$. The p.m.f. does not satisfy the regularity conditions, but the CRK inequality is still applicable, since for $N \ne N' \in \Omega$
$$S(N) = \{1, 2, \ldots, N\} \supset S(N') = \{1, 2, \ldots, N'\} \quad \text{if } N' < N$$
Also, $P_N$ and $P_{N'}$ are different for $N \ne N'$. Thus
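$$\mathrm{Var}_N(T) \ge \sup_{N' < N} \frac{(N - N')^2}{\mathrm{Var}_N\left\{\dfrac{P_{N'}}{P_N}\right\}}$$
Now,
$$\frac{P_{N'}}{P_N}(x) = \frac{P_{N'}(x)}{P_N(x)} = \begin{cases} \dfrac{N}{N'} & x = 1, 2, \ldots, N',\ N' < N \\ 0 & \text{otherwise} \end{cases}$$
$$E_N\left\{\frac{P_{N'}(x)}{P_N(x)}\right\} = \frac{1}{N}\sum_{1}^{N'} \frac{N}{N'} = 1 \quad \text{and} \quad E_N\left\{\frac{P_{N'}(x)}{P_N(x)}\right\}^2 = \frac{1}{N}\sum_{1}^{N'} \left(\frac{N}{N'}\right)^2 = \frac{N}{N'}$$
$$\therefore\ \mathrm{Var}_N\left\{\frac{P_{N'}(x)}{P_N(x)}\right\} = \frac{N}{N'} - 1 > 0 \quad \text{for } N > N'$$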
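It follows that
$$\mathrm{Var}_N(T) \ge \sup_{N' < N} \frac{(N - N')^2}{\dfrac{N}{N'} - 1} = \sup_{N' < N} [N'(N - N')]$$
Now,
$$\frac{k(N - k)}{(k - 1)(N - k + 1)} > 1 \quad \text{iff} \quad k < \frac{N + 1}{2}$$
Therefore, $N'(N - N')$ increases as long as $N' < \dfrac{N+1}{2}$ and decreases if $N' > \dfrac{N+1}{2}$. Since $N'$ must be an integer with $N' \ge M$, the maximum is achieved at $N' = \left\lfloor \dfrac{N+1}{2} \right\rfloor$ if $M \le \dfrac{N+1}{2}$, and at $N' = M$ if $M > \dfrac{N+1}{2}$, where $\lfloor x \rfloor$ denotes the largest integer not exceeding $x$.
$$\therefore\ \mathrm{Var}_N(T) \ge \left\lfloor \frac{N+1}{2} \right\rfloor \left(N - \left\lfloor \frac{N+1}{2} \right\rfloor\right) \quad \text{if } M \le \frac{N+1}{2}$$
$$\mathrm{Var}_N(T) \ge M(N - M) \quad \text{if } M > \frac{N+1}{2}$$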
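Because the admissible alternatives $N'$ form a finite set, the bound in this example can be verified by direct enumeration. The following is a minimal sketch; the function names are hypothetical, and the closed form uses the integer part of $(N+1)/2$, which is an assumption about how the bracket in the bound is meant to be read.

```python
def crk_bound_discrete(N, M):
    """Largest value of N'(N - N') over the admissible N' in {M, ..., N-1}."""
    return max(n_alt * (N - n_alt) for n_alt in range(M, N))

def closed_form(N, M):
    """Bound from the text, reading the bracket as the integer part of (N+1)/2."""
    k = (N + 1) // 2
    if M <= (N + 1) / 2:
        return k * (N - k)
    return M * (N - M)

for N, M in [(10, 2), (10, 8), (9, 2)]:
    print(N, M, crk_bound_discrete(N, M), closed_form(N, M))
```

Both functions require $M < N$, so that at least one alternative $N' < N$ exists in the parameter space.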
N.B.: Reference Rohatgi V K; An Introduction to Probability Theory and Mathematical Statistics, p-365.
And Rohatgi, Saleh; an Introduction to Probability and Statistics, p-397.
Uniformly Minimum Variance Unbiased Estimator (UMVUE): Let $X_1, X_2, \ldots, X_n$ be a random sample from $f(x, \theta)$. An estimator $T^* = t^*(X_1, \ldots, X_n)$ of $\tau(\theta)$ is defined to be a uniformly minimum variance unbiased estimator of $\tau(\theta)$ if and only if
a) $T^*$ is unbiased for $\tau(\theta)$;
b) for any other unbiased estimator $T$ of $\tau(\theta)$, $V(T^*) \le V(T)$ for all $\theta \in \Omega$.
Concept of Rao-Blackwell Theorem: A very powerful method for finding a minimum variance estimator, irrespective of whether the MVB is attained or not, is provided by the Rao-Blackwell Theorem. This theorem says that if we look for an MVE of $\tau(\theta)$, we need only inspect estimators which are functions of a sufficient statistic. If an unbiased estimator is not a function of a sufficient statistic, we can construct an estimator with smaller variance by taking its conditional expectation given a sufficient statistic. However, this raises the question of which sufficient statistic to use to compute the conditional expectation. For example, suppose that $S$ is an unbiased estimator of $\tau(\theta)$ with finite variance, and let $T_1$ and $T_2$ both be sufficient statistics for $\theta$, with $T_2 = h(T_1)$ for some function $h$. Let us define
$$S_1^* = E(S \mid T_1) \quad \text{and} \quad S_2^* = E(S \mid T_2)$$
By the Rao-Blackwell theorem the variances of $S_1^*$ and $S_2^*$ cannot exceed $V(S)$. However, it is not obvious which estimator will have the smaller variance.

Statement of Rao-Blackwell Theorem: Suppose that $X$ has a joint distribution depending on some unknown parameter $\theta$ and that $T = T(x)$ is a sufficient statistic for $\theta$. Let $S = S(x)$ be a statistic such that $E(S) = \tau(\theta)$. If $S^* = E(S \mid T)$, then
a) $S^*$ is an unbiased estimator of $\tau(\theta)$;
b) $V(S^*) \le V(S)$. Moreover, $V(S^*) < V(S)$ unless $P(S = S^*) = 1$.
N.B.: For more on this, see the chapter Asymptotically Most Efficient Estimator of the Third Year Note.
Minimal Sufficient Statistics: Suppose $X_1, X_2, \ldots, X_n \sim N(\theta, \sigma^2)$ where both $\theta$ and $\sigma^2$ are unknown; then $\bar{x}$ and $s^2$ are jointly sufficient statistics. Another set of sufficient statistics is the set of order statistics $Y_1 < Y_2 < \cdots < Y_n$ of $X_1, X_2, \ldots, X_n$. Both sets of sufficient statistics condense the data.
A set of sufficient statistics is minimal if no other set of sufficient statistics condenses the data more. A set of jointly sufficient statistics is defined to be minimal sufficient iff it is a function of every other set of sufficient statistics. That is, among a number of sufficient statistics we should choose one, $t_0$ (say), which condenses the data more than any other sufficient statistic. Then $t_0$ is a minimal sufficient statistic.
A statistic $T(x_1, x_2, \ldots, x_n)$ is a minimal sufficient statistic if $T(\cdot)$ is a sufficient statistic and a function of every other sufficient statistic, that is,
$$T(x_1, x_2, \ldots, x_n) = \psi\{t(x_1, x_2, \ldots, x_n)\} \quad \forall \text{ sufficient statistics } t$$
A sufficient statistic always exists, but a minimal sufficient statistic may not always exist.
Way of Finding Minimal Sufficient Statistic: Let $f(x;\theta)$ be the p.d.f. of $X$ and suppose that there exists a function $T(x)$ such that, for any two points $x$ and $y$, the ratio $\dfrac{f(x;\theta)}{f(y;\theta)}$ is independent of $\theta$ iff $T(x) = T(y)$. Then $T(x)$ is a minimal sufficient statistic for $\theta$.
In terms of the factorization theorem: if $t(\cdot)$ is a sufficient statistic and $t(x) = t(y)$, then the likelihood ratio is
$$\frac{L(x;\theta)}{L(y;\theta)} = \frac{g(t;\theta)\,h(x)}{g(t;\theta)\,h(y)} = \frac{h(x)}{h(y)}$$
which is independent of $\theta$. If, in addition, the likelihood ratio is independent of $\theta$ only when $t(x) = t(y)$, then $t(x)$ is a minimal sufficient statistic for $\theta$.
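Example: Suppose $x_1, x_2, \ldots, x_n$ are independent random variables each with $N(\theta, \sigma^2)$, where $\theta$ is unknown and $\sigma^2$ is known. Find a minimal sufficient statistic for $\theta$.
Solution: Here we have,
$$L(x \mid \theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum (x_i - \theta)^2} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum x_i^2}\, e^{-\frac{1}{2\sigma^2}\left(-2\theta\sum x_i + n\theta^2\right)}$$
By the Neyman Factorization theorem we can say that $\sum x_i$ (equivalently $\bar{x}$) is sufficient for $\theta$. Similarly
$$L(y \mid \theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum y_i^2}\, e^{-\frac{1}{2\sigma^2}\left(-2\theta\sum y_i + n\theta^2\right)}$$
$$\frac{L(x \mid \theta)}{L(y \mid \theta)} = e^{-\frac{1}{2\sigma^2}\left(\sum x_i^2 - \sum y_i^2\right)} \cdot e^{\frac{\theta}{\sigma^2}\left(\sum x_i - \sum y_i\right)}$$
which will be independent of $\theta$ iff
$$\sum x_i = \sum y_i \ \Rightarrow\ t(x) = t(y)$$
Hence $t(x_1, x_2, \ldots, x_n) = \sum x_i$ is a minimal sufficient statistic for $\theta$.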
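The likelihood-ratio criterion used in this example can be illustrated numerically. The sketch below assumes $\sigma = 1$ by default and uses hypothetical helper names (`log_lik`, `log_ratio`) and made-up data; it evaluates the log likelihood ratio at two values of $\theta$ for a pair of samples with equal sums and a pair with unequal sums.

```python
import math

def log_lik(xs, theta, sigma=1.0):
    """Log joint density of an iid N(theta, sigma^2) sample."""
    n = len(xs)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum((x - theta) ** 2 for x in xs) / (2 * sigma ** 2))

def log_ratio(xs, ys, theta, sigma=1.0):
    """log of L(x | theta) / L(y | theta)."""
    return log_lik(xs, theta, sigma) - log_lik(ys, theta, sigma)

xs = [1.0, 2.0, 3.0]
same_sum = [0.0, 2.0, 4.0]     # sum equals sum(xs)
other_sum = [1.0, 2.0, 4.0]    # sum differs from sum(xs)

for theta in (0.0, 5.0):
    print(theta, log_ratio(xs, same_sum, theta), log_ratio(xs, other_sum, theta))
```

With equal sums the log ratio is the same at both values of $\theta$; with unequal sums it changes with $\theta$, which is the behaviour the criterion relies on.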
Example: Suppose we have $n = 2$ independent observations from the Cauchy distribution with p.d.f.
$$f_X(x) = \frac{1}{\pi} \cdot \frac{1}{1 + (x - \theta)^2}\ ; \quad -\infty < x < \infty$$
Show that no nontrivial sufficient statistic exists.
Solution: Since $n = 2$, we consider two points $x = (x_1, x_2)$ and $y = (y_1, y_2)$. Hence we get,
$$L(x \mid \theta) = \frac{1}{\pi^2} \cdot \frac{1}{1 + (x_1 - \theta)^2} \cdot \frac{1}{1 + (x_2 - \theta)^2}$$
Similarly,
$$L(y \mid \theta) = \frac{1}{\pi^2} \cdot \frac{1}{1 + (y_1 - \theta)^2} \cdot \frac{1}{1 + (y_2 - \theta)^2}$$
Thus we have,
$$\frac{L(x \mid \theta)}{L(y \mid \theta)} = \frac{\{1 + (y_1 - \theta)^2\}\{1 + (y_2 - \theta)^2\}}{\{1 + (x_1 - \theta)^2\}\{1 + (x_2 - \theta)^2\}}$$
Since $\dfrac{L(x \mid \theta)}{L(y \mid \theta)}$ depends on $\theta$ unless $(y_1, y_2)$ is a rearrangement of $(x_1, x_2)$, the data cannot be condensed beyond the order statistics; hence no nontrivial sufficient statistic exists.
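Example: Let $X$ be a single observation from the point probability function
$$f(x \mid \theta) = \begin{cases} \dfrac{\theta}{2} & \text{if } x = -3 \\[4pt] \dfrac{\theta}{3} & \text{if } x = 0 \\[4pt] \dfrac{1 - 2\theta}{3} & \text{if } x = 6, 13, 52 \\[4pt] \theta^2 + \dfrac{\theta}{6} & \text{if } x = 60 \\[4pt] \theta - \theta^2 & \text{if } x = 68 \end{cases} \qquad \text{where } 0 < \theta < \frac{1}{2}$$
Find a minimal sufficient statistic for $\theta$.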
Solution: We know at least one sufficient statistic always exists, namely the identity statistic $t(x_1, x_2, \ldots, x_n) = (x_1, x_2, \ldots, x_n)$. Hence $X$ itself is a sufficient statistic for $\theta$. To find a minimal sufficient statistic, note that the ratio $\dfrac{L(x \mid \theta)}{L(y \mid \theta)}$ is independent of $\theta$ exactly when $x$ and $y$ have probabilities that are proportional as functions of $\theta$.
Thus we can partition the sample space into the sets $\{-3, 0\}$, $\{6, 13, 52\}$, $\{60\}$, $\{68\}$ and a minimal sufficient statistic is
$$t(x) = \begin{cases} c_1 & \text{if } x = -3 \text{ or } 0 \\ c_2 & \text{if } x = 6 \text{ or } 13 \text{ or } 52 \\ c_3 & \text{if } x = 60 \\ c_4 & \text{if } x = 68 \end{cases}$$
where $c_1$, $c_2$, $c_3$ and $c_4$ are distinct constants.
The probability distribution of $t(x)$ is
$$P\{t(x) = w\} = \begin{cases} \dfrac{5\theta}{6} & \text{if } w = c_1 \\[4pt] 1 - 2\theta & \text{if } w = c_2 \\[4pt] \theta^2 + \dfrac{\theta}{6} & \text{if } w = c_3 \\[4pt] \theta - \theta^2 & \text{if } w = c_4 \end{cases}$$
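The partition of the sample space in this example can be recovered mechanically by grouping points whose probability ratio does not change with $\theta$. The following is a rough sketch, not a proof: the helper names `pmf` and `ratio_classes` and the three trial values of $\theta$ are assumptions made for illustration, and agreement at finitely many values of $\theta$ only suggests that a ratio is constant.

```python
from fractions import Fraction

def pmf(x, theta):
    """Probability function of the example, valid for 0 < theta < 1/2."""
    table = {
        -3: theta / 2,
        0: theta / 3,
        6: (1 - 2 * theta) / 3,
        13: (1 - 2 * theta) / 3,
        52: (1 - 2 * theta) / 3,
        60: theta**2 + theta / 6,
        68: theta - theta**2,
    }
    return table[x]

def ratio_classes(points, thetas):
    """Group points whose ratio pmf(x)/pmf(y) is identical at every theta tried."""
    classes = []
    for x in points:
        for cls in classes:
            ratios = {pmf(x, th) / pmf(cls[0], th) for th in thetas}
            if len(ratios) == 1:        # ratio did not change with theta
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes

thetas = [Fraction(1, 10), Fraction(1, 5), Fraction(2, 5)]   # arbitrary trial values
print(ratio_classes([-3, 0, 6, 13, 52, 60, 68], thetas))
```

Exact fractions are used so that equality of ratios is not blurred by floating-point rounding.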
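Example: Let $X$ be a single observation from the probability function
$$P(x \mid \theta) = \begin{cases} \theta^2 & \text{if } x = -1, 3 \\[4pt] \dfrac{1}{2} - \dfrac{\theta^2}{2} & \text{if } x = 0 \\[4pt] -\dfrac{\theta^2}{2} + \dfrac{\theta}{2} & \text{if } x = 2, 4 \\[4pt] \dfrac{1}{2} - \dfrac{\theta^2}{2} - \theta & \text{if } x = 1 \end{cases}$$
where $\theta$ is an unknown number between zero and $\sqrt{2} - 1$. Find a minimal sufficient statistic for $\theta$.
Solution: Here we partition the sample space into the sets $\{-1, 3\}$, $\{0\}$, $\{2, 4\}$, $\{1\}$. Hence the minimal sufficient statistic is
$$t(x) = \begin{cases} c_1 & \text{if } x = -1 \text{ or } 3 \\ c_2 & \text{if } x = 0 \\ c_3 & \text{if } x = 2 \text{ or } 4 \\ c_4 & \text{if } x = 1 \end{cases}$$
where $c_1$, $c_2$, $c_3$, $c_4$ are distinct constants.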
Best Asymptotically Normal Estimator (BAN Estimator): A sequence of estimators $T_1', T_2', \ldots, T_n'$ of $\tau(\theta)$ is defined to be best asymptotically normal (BAN) if and only if the following four conditions are satisfied:
a) The distribution of $\sqrt{n}\,[T_n' - \tau(\theta)]$ approaches $N(0, \sigma'^2(\theta))$ as $n \to \infty$.
b) For every $\varepsilon > 0$, $\lim_{n\to\infty} P_\theta[\,|T_n' - \tau(\theta)| > \varepsilon\,] = 0$ for each $\theta \in \Omega$.
c) Let $\{T_n\}$ be any other sequence of simple consistent estimators for which the distribution of $\sqrt{n}\,[T_n - \tau(\theta)]$ approaches $N(0, \sigma^2(\theta))$.
d) $\sigma^2(\theta)$ is not less than $\sigma'^2(\theta)$ for all $\theta$ in any open interval.
BAN is sometimes replaced by consistent asymptotically normal efficient (CANE).
Example: Let $x_1, x_2, \ldots, x_n$ be a random sample from $N(\mu, \sigma^2)$. Then $T_n' = \dfrac{\sum x_i}{n} = \bar{x}_n$ is a BAN estimator of $\mu$, since the limiting distribution of $\sqrt{n}\,[\bar{x}_n - \mu]$ is $N(0, \sigma^2)$ and no other estimator can have smaller limiting variance in any interval of $\mu$ values.
Best Consistent Unbiased Asymptotically Normal (BCUAN): A CAN estimator $T_n$ of $g(\theta)$ is said to be the Best Consistent Unbiased Asymptotically Normal Estimator if it is unbiased and the variance of the limiting distribution of $\sqrt{n}\,[T_n - g(\theta)]$ has the least possible value.
Completeness: The family of density or probability functions $f(x \mid \theta)$, $\theta \in \Omega$, is called complete if, for every function $u(x)$, the identity $E_\theta\{u(X)\} = 0$ for all $\theta \in \Omega$ implies $P_\theta\{u(X) = 0\} = 1$ for all $\theta \in \Omega$. This is sometimes expressed by saying that there are no unbiased estimators of zero. In particular it means that two different functions of $X$ cannot have the same expected value. For example,
$$E\{T(X)\} = \theta \quad \text{and} \quad E\{K(X)\} = \theta$$
$$\therefore\ E\{T(X) - K(X)\} = 0$$
$$\Rightarrow\ T(X) - K(X) = 0$$
That is, any unbiased estimator is unique. In this sense, we are primarily interested in knowing that the family of density functions of a sufficient statistic is complete, since in that case an unbiased function of the sufficient statistic will be unique and it must be a uniformly minimum variance unbiased estimator by the Rao-Blackwell theorem.
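Example: Suppose $f(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$; $-\infty < x < \infty$. Check whether it is complete or not.
Solution: Let us consider the function $\varphi(x) = x$. Now,
$$E\{\varphi(x)\} = E(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x\, e^{-\frac{x^2}{2}}\,dx$$
$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} x\, e^{-\frac{x^2}{2}}\,dx + \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} x\, e^{-\frac{x^2}{2}}\,dx$$
$$= -\frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} x\, e^{-\frac{x^2}{2}}\,dx + \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} x\, e^{-\frac{x^2}{2}}\,dx$$
$$\therefore\ E\{\varphi(x)\} = 0$$
But $\varphi(x)$ is non-zero. Hence $f(x)$ is not complete.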
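Example: Let $f(x) = \dfrac{1}{\beta\sqrt{2\pi}}\, e^{-\frac{x^2}{2\beta^2}}$; $-\infty < x < \infty$; $\beta > 0$. Check whether it is complete or not.
Solution: Let us consider the function $\varphi(x) = x$. Now,
$$E\{\varphi(x)\} = E(x) = \frac{1}{\beta\sqrt{2\pi}} \int_{-\infty}^{\infty} x\, e^{-\frac{x^2}{2\beta^2}}\,dx$$
$$= \frac{1}{\beta\sqrt{2\pi}} \int_{-\infty}^{0} x\, e^{-\frac{x^2}{2\beta^2}}\,dx + \frac{1}{\beta\sqrt{2\pi}} \int_{0}^{\infty} x\, e^{-\frac{x^2}{2\beta^2}}\,dx$$
$$= -\frac{1}{\beta\sqrt{2\pi}} \int_{0}^{\infty} x\, e^{-\frac{x^2}{2\beta^2}}\,dx + \frac{1}{\beta\sqrt{2\pi}} \int_{0}^{\infty} x\, e^{-\frac{x^2}{2\beta^2}}\,dx$$
$$\therefore\ E\{\varphi(x)\} = 0 \quad \text{for all } \beta > 0$$
But $\varphi(x)$ is non-zero. Hence $f(x)$ is not complete.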
Example: Let $x_1, x_2, \ldots, x_n$ be independent random variables each with $P(\lambda)$; $\lambda > 0$. Find a minimal sufficient statistic for $\lambda$ and check if it is complete.
Solution: The joint distribution of $x_1, x_2, \ldots, x_n$ is
$$L(x;\lambda) = \frac{e^{-n\lambda}\,\lambda^{\sum x_i}}{\prod_{i=1}^{n} x_i!} = \left(\prod_{i=1}^{n} \frac{1}{x_i!}\right) e^{-n\lambda}\,\lambda^{\sum x_i}$$
By the Neyman Factorization theorem, we can say that $\sum x_i$ is a sufficient statistic. Now,
$$\frac{L(x \mid \lambda)}{L(y \mid \lambda)} = \frac{e^{-n\lambda}\,\lambda^{\sum x_i} \Big/ \prod_{i=1}^{n} x_i!}{e^{-n\lambda}\,\lambda^{\sum y_i} \Big/ \prod_{i=1}^{n} y_i!}$$
which is independent of $\lambda$ iff $\sum x_i = \sum y_i$, so that $\sum x_i$ is a minimal sufficient statistic. The distribution of $\sum x_i$ is also Poisson and has p.m.f.
$$\frac{e^{-n\lambda}(n\lambda)^x}{x!}\ ; \quad x = 0, 1, 2, \ldots$$
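Hence showing that the family of probability functions of the minimal sufficient statistic $\sum x_i$ is complete is equivalent to showing that the Poisson family is complete.
Let $u(\cdot)$ be any function such that, for all $\lambda > 0$,
$$E_\lambda[u(x)] = 0$$
$$\Rightarrow \sum_{k=0}^{\infty} u(k) \cdot \frac{e^{-\lambda}\lambda^k}{k!} = 0$$
$$\Rightarrow \sum_{k=0}^{\infty} u(k) \cdot \frac{\lambda^k}{k!} = 0 \qquad [\text{since } e^{-\lambda} > 0]$$
The left-hand side is a power series in $\lambda$ that vanishes for every $\lambda > 0$, so each coefficient $\dfrac{u(k)}{k!}$ must be zero; since $k! \ne 0$, $u(k) = 0$ for every $k$. Hence the Poisson family of distributions is complete and $\sum x_i$ is a complete minimal sufficient statistic for $\lambda$.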

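Example: Let $x_1, x_2, \ldots, x_n$ be independent random variables each with $U(0, \theta)$; $\theta > 0$. Find a sufficient statistic for $\theta$ and show that it is complete.
Solution: Since $x_1, x_2, \ldots, x_n$ are independent random variables each with $U(0, \theta)$; $\theta > 0$, we have
$$f(x) = \frac{1}{\theta}\ ; \quad 0 < x < \theta$$
$$\therefore\ L(x) = \frac{1}{\theta^n}\ ; \quad 0 < x_{(n)} < \theta$$
Hence, $x_{(n)}$ is a sufficient statistic for $\theta$. We know that
$$f_{n:n}(x) = n\{F(x)\}^{n-1} \cdot f(x)$$
So, the p.d.f. of $x_{(n)}$ is
$$\frac{n x^{n-1}}{\theta^n}\ ; \quad 0 < x < \theta \qquad \left[\because\ F(x) = \int_0^x \frac{1}{\theta}\,dx = \frac{x}{\theta}\right]$$
Let $u(\cdot)$ be any function. Then
$$E_\theta[u(x_{(n)})] = 0$$
$$\Rightarrow \int_0^\theta u(x) \cdot \frac{n x^{n-1}}{\theta^n}\,dx = 0$$
$$\Rightarrow \int_0^\theta u(x)\, x^{n-1}\,dx = 0$$
$$\Rightarrow u(\theta)\,\theta^{n-1} = 0 \qquad [\text{differentiating both sides with respect to } \theta]$$
$$\therefore\ u(\theta) = 0 \quad \forall\,\theta \qquad [\text{since } \theta^{n-1} \ne 0]$$
Therefore $x_{(n)}$ is a complete sufficient statistic for $\theta$.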
Ancillary Statistic: A statistic $U(X_1, X_2, \ldots, X_n)$ is called first order ancillary if $E_\theta\{U(X_1, X_2, \ldots, X_n)\}$ is a constant independent of $\theta$. $U(X_1, X_2, \ldots, X_n)$ is called an ancillary statistic for $\theta$ if the distribution function of $U(X_1, X_2, \ldots, X_n)$ does not depend on $\theta$.
Thus, unlike a sufficient statistic, an ancillary statistic does not contain any information about the parameter $\theta$. In such cases, intuition suggests that (since the sufficient statistic $T(X_1, X_2, \ldots, X_n)$ contains all the information about $\theta$) the ancillary statistic should be independent of $T(X_1, X_2, \ldots, X_n)$.
Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, 1)$. Then the statistic
$$U(X) = (n-1)S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2$$
is ancillary since $(n-1)S^2 \sim \chi^2_{(n-1)}$, which is free of $\mu$. Some other ancillary statistics are $X_1 - \bar{X}$, $X_{(n)} - X_{(1)}$ and $\sum_{i=1}^{n} |X_i - \bar{X}|$.
Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(0, \sigma^2)$. Then the statistic $U(X) = \bar{X}$ follows $N(0, n^{-1}\sigma^2)$ and is not ancillary with respect to the parameter $\sigma^2$.
Example: Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of a random sample from the p.d.f. $f(x - \theta)$, where $\theta \in \Re$. Then the statistic $U(X) = (X_{(2)} - X_{(1)}, \ldots, X_{(n)} - X_{(1)})$ is ancillary for $\theta$.
Example: Let $X_1, X_2, \ldots, X_n$ be iid random variables with distribution
$$f(x;\mu,\theta) = \frac{1}{2\theta}\ ; \quad \mu - \theta \le x \le \mu + \theta$$
Then the statistic $R = X_{(n)} - X_{(1)}$ is not an ancillary statistic for $\theta$, because the distribution of $R$ is
$$f_R(r) = \frac{n(n-1)\,r^{n-2}}{(2\theta)^{n-1}}\left(1 - \frac{r}{2\theta}\right)\ ; \quad 0 \le r \le 2\theta$$
which depends on $\theta$.
Basu's Theorem: Let $T(X_1, X_2, \ldots, X_n)$ be a complete sufficient statistic and $U(X_1, X_2, \ldots, X_n)$ an ancillary statistic. Then $T$ and $U$ are independent random variables.
Proof: Fix $u$ as an arbitrary value of $U$. Let $g(t) = P\{U = u \mid T = t\}$, which does not depend on $\theta$ because $T$ is sufficient. Then
$$E_\theta\{g(T)\} = \sum_t P\{U = u \mid T = t\}\,P\{T = t\} = \sum_t \frac{P\{U = u,\ T = t\}}{P\{T = t\}}\,P\{T = t\} = \sum_t P\{U = u,\ T = t\} = P\{U = u\}$$
where $P\{U = u\}$ does not depend on $\theta$ because $U$ is ancillary. So, $E_\theta[g(T) - P\{U = u\}] = 0$ for all $\theta$. By completeness of $T$, $g(t) - P\{U = u\} = 0$ for all $t$, that is
$$P\{U = u \mid T = t\} = P\{U = u\} \quad \text{for all } t$$
Hence $U$ and $T$ are independent random variables.
Example: Let $X_1$ and $X_2$ be independent random variables, each $N(\mu, \sigma^2)$ with $\sigma^2$ known and $\mu$ unknown. Let $T(X_1, X_2) = X_1 + X_2$ and $U(X_1, X_2) = X_1 - X_2$. Then $U(X_1, X_2)$ is $N(0, 2\sigma^2)$ and its distribution does not depend on $\mu$. Hence it is an ancillary statistic for $\mu$. Since $T(X_1, X_2)$ is a complete sufficient statistic, it follows from Basu's theorem that $X_1 - X_2$ and $X_1 + X_2$ are independent random variables.
Example: Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the uniform distribution on $[0, \theta]$, and let $Y_1 < Y_2 < \cdots < Y_n$ denote the corresponding order statistics. Show that $\dfrac{Y_1}{Y_n}$ and $Y_n$ are independent random variables.
Solution: Since $Y_n$ is a complete sufficient statistic for $\theta$, it suffices (by Basu's Theorem) to show that the distribution of $\dfrac{Y_1}{Y_n}$ does not depend on $\theta$ (i.e. that $\dfrac{Y_1}{Y_n}$ is an ancillary statistic). For $0 < t \le 1$,
$$F_{Y_1/Y_n}(t) = P\left(\frac{Y_1}{Y_n} \le t\right) = P(Y_1 \le tY_n) = \int_0^\theta P(Y_1 \le ty \mid Y_n = y)\, f_{Y_n}(y)\,dy$$
Given $Y_n = y$, the remaining $n - 1$ observations are iid $U(0, y)$, so $P(Y_1 \le ty \mid Y_n = y) = 1 - (1 - t)^{n-1}$. Therefore
$$F_{Y_1/Y_n}(t) = \int_0^\theta \left\{1 - (1 - t)^{n-1}\right\} \frac{n y^{n-1}}{\theta^n}\,dy = 1 - (1 - t)^{n-1}$$
which is independent of $\theta$. So $\dfrac{Y_1}{Y_n}$ and $Y_n$ are independent random variables.
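A small simulation can make the conclusion of this example tangible. The sketch below is only illustrative: the function names, sample size, number of replications and seeds are arbitrary assumptions, and a near-zero sample correlation is consistent with independence but does not prove it.

```python
import random

def simulate(theta, n=5, reps=20000, seed=0):
    """Draw `reps` samples of size n from U(0, theta); return Y1/Yn and Yn per sample."""
    rng = random.Random(seed)
    ratios, maxima = [], []
    for _ in range(reps):
        xs = [rng.uniform(0.0, theta) for _ in range(n)]
        y1, yn = min(xs), max(xs)
        ratios.append(y1 / yn)
        maxima.append(yn)
    return ratios, maxima

def mean(values):
    return sum(values) / len(values)

def correlation(a, b):
    """Sample correlation coefficient of two equally long lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

r1, m1 = simulate(theta=1.0, seed=1)
r2, m2 = simulate(theta=10.0, seed=2)
print(mean(r1), mean(r2))        # averages of Y1/Yn under two different theta
print(correlation(r1, m1))       # sample correlation between Y1/Yn and Yn
```

The average of the ratio is essentially the same for both values of $\theta$, and the sample correlation with $Y_n$ is close to zero.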
Theorem: Suppose that $X = (X_1, X_2, \ldots, X_n)$ have a joint density or joint frequency function that is a $k$-parameter exponential family
$$f(x;\theta) = \exp\left[\sum_{i=1}^{k} C_i(\theta)\,T_i(x) - d(\theta) + S(x)\right]$$
Then the statistic $\{T_1(x), T_2(x), \ldots, T_k(x)\}$ is sufficient for $\theta$, and it is also complete provided the set of values taken by $(C_1(\theta), \ldots, C_k(\theta))$ contains an open set in $k$-dimensional space.
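Example: Show that if $X_1, X_2, \ldots, X_n$ are independent random variables each $N(\mu, \sigma^2)$, where $-\infty < \mu < \infty$ and $\sigma^2 > 0$ are both unknown, then the joint density of $X_1, X_2, \ldots, X_n$ is a member of the two-parameter exponential family.
Solution: The joint density of $X_1, X_2, \ldots, X_n$ is
$$L(x;\mu,\sigma^2) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum (x_i - \mu)^2}$$
$$= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\left\{\sum x_i^2 - 2\mu\sum x_i + n\mu^2\right\}}$$
$$= \exp\left[n\ln\frac{1}{\sigma} - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\left\{\sum x_i^2 - 2\mu\sum x_i + n\mu^2\right\}\right]$$
$$= \exp\left[-n\ln\left(\sigma\sqrt{2\pi}\right) - \frac{1}{2\sigma^2}\sum x_i^2 + \frac{\mu}{\sigma^2}\sum x_i - \frac{n\mu^2}{2\sigma^2}\right]$$
$$= \exp\left[\frac{\mu}{\sigma^2}\sum x_i + \left(-\frac{1}{2\sigma^2}\right)\sum x_i^2 - \left\{\frac{n\mu^2}{2\sigma^2} + n\ln\left(\sigma\sqrt{2\pi}\right)\right\}\right] \qquad (1)$$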
So we can say that $\left(\sum x_i, \sum x_i^2\right)$ is jointly sufficient for $\mu$ and $\sigma^2$. The joint density of $X_1, X_2, \ldots, X_n$ is said to be a member of the exponential family, or a member of the Koopman-Darmois class, or is said to have Koopman-Darmois form, if
$$L(x;\theta) = \exp\left[\sum_{i=1}^{k} C_i(\theta)\,T_i(x) - d(\theta) + S(x)\right] \qquad (2)$$
Now by (1) and (2) we have,
$$T_1(x) = \sum x_i, \qquad T_2(x) = \sum x_i^2$$
$$C_1(\mu,\sigma^2) = \frac{\mu}{\sigma^2}, \qquad C_2(\mu,\sigma^2) = -\frac{1}{2\sigma^2}$$
$$d(\mu,\sigma^2) = \frac{n\mu^2}{2\sigma^2} + n\ln\left(\sigma\sqrt{2\pi}\right), \qquad S(x) = 0$$
So, the joint density of $X_1, X_2, \ldots, X_n$ is a member of the two-parameter exponential family.
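The identification of $T_1, T_2, C_1, C_2$ and $d$ in this example can be sanity-checked numerically by comparing the ordinary normal log density with the exponential-family form. This is a minimal sketch with hypothetical function names and made-up data, not part of the original derivation.

```python
import math

def log_joint_normal(xs, mu, sigma2):
    """Log joint density of an iid N(mu, sigma2) sample, written in the usual way."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

def log_joint_expfam(xs, mu, sigma2):
    """Same quantity as C1*T1 + C2*T2 - d, with S(x) = 0."""
    n = len(xs)
    t1 = sum(xs)                      # T1(x) = sum of x_i
    t2 = sum(x * x for x in xs)       # T2(x) = sum of x_i^2
    c1 = mu / sigma2                  # C1(mu, sigma^2)
    c2 = -1.0 / (2 * sigma2)          # C2(mu, sigma^2)
    d = n * mu ** 2 / (2 * sigma2) + n * math.log(math.sqrt(2 * math.pi * sigma2))
    return c1 * t1 + c2 * t2 - d

sample = [0.3, -1.2, 2.5, 0.9]        # illustrative data only
print(log_joint_normal(sample, 0.7, 1.8), log_joint_expfam(sample, 0.7, 1.8))
```

The two values agree up to floating-point rounding for any choice of $\mu$ and $\sigma^2 > 0$.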
Lehmann-Scheffe Theorem: This theorem gives a simple criterion for the existence of a uniformly minimum variance unbiased estimator when a complete and sufficient statistic exists.
Statement: Let $X_1, \ldots, X_n$ be a random sample from a density $f(\cdot\,,\theta)$. If $S = s(X_1, \ldots, X_n)$ is a complete sufficient statistic and if $T^* = t^*(S)$, a function of $S$, is an unbiased estimator of $\tau(\theta)$, then $T^*$ is a UMVUE of $\tau(\theta)$.
Proof: Let $T'$ be any unbiased estimator of $\tau(\theta)$ which is a function of $S$; that is, $T' = t'(S)$. Then $E_\theta[T^* - T'] = 0$ for all $\theta \in \Omega$ and $T^* - T'$ is a function of $S$; so by completeness of $S$, $P_\theta[t^*(S) = t'(S)] \equiv 1$ for all $\theta \in \Omega$. Hence there is only one unbiased estimator of $\tau(\theta)$ that is a function of $S$. Now let $T$ be any unbiased estimator of $\tau(\theta)$. $T^*$ must be equal to $E[T \mid S]$, since $E[T \mid S]$ is an unbiased estimator of $\tau(\theta)$ depending on $S$. By the Rao-Blackwell theorem, $V_\theta[T^*] \le V_\theta[T]$ for all $\theta \in \Omega$; so $T^*$ is a UMVUE.
Explanation: This theorem states that if a complete sufficient statistic $S$ exists and if there is an unbiased estimator for $\tau(\theta)$, then there is a UMVUE for $\tau(\theta)$. This theorem also simplifies the search for unbiased estimators: if a complete and sufficient statistic $S$ exists and there exists no function $h$ such that $E[h(S)] = \tau(\theta)$, then no unbiased estimator of $\tau(\theta)$ exists. The Rao-Blackwell theorem and the Lehmann-Scheffe theorem suggest two approaches to finding a UMVUE when a complete and sufficient statistic exists.
Note:
a) Find a function $h$ such that $E[h(S)] = \tau(\theta)$; then $h(S)$ is the unique UMVUE of $\tau(\theta)$. The function $h$ can be determined by solving the equation $E[h(S)] = \tau(\theta)$.
b) Given an unbiased estimator $T$ of $\tau(\theta)$, define an estimator by the Rao-Blackwell theorem as $T^* = E[T \mid S]$; then this $T^*$ is the unique UMVUE of $\tau(\theta)$.
Example: Let $X_1, \ldots, X_n$ be iid Bernoulli random variables with parameter $\theta$. By the factorization theorem $S = X_1 + X_2 + \cdots + X_n$ is sufficient for $\theta$, and since the distribution belongs to the one-parameter exponential family, $S$ is complete. We want to find the UMVUE of $\theta^2$.
Let $n = 2$. If a UMVUE exists, it is a function of the form $h(S)$, where the function $h$ satisfies
$$\theta^2 = \sum_{k=0}^{2} h(k)\binom{2}{k}\theta^k(1 - \theta)^{2-k}$$
$$\Rightarrow\ \theta^2 = h(0)(1 - \theta)^2 + 2h(1)\,\theta(1 - \theta) + h(2)\,\theta^2 \qquad (1)$$
For equation (1), L.H.S. = R.H.S. for all $\theta$ iff $h(0) = h(1) = 0$ and $h(2) = 1$.
Thus $h(S) = \dfrac{S(S - 1)}{2}$ is the UMVUE of $\theta^2$ if $n = 2$. For $n > 2$, start from the unbiased estimator $T = I(X_1 + X_2 = 2)$ and condition on $S$: the UMVUE is $T^* = E[T \mid S] = \dfrac{S(S - 1)}{n(n - 1)}$.
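The unbiasedness of the estimator $S(S-1)/\{n(n-1)\}$ for $\theta^2$ can be confirmed exactly by summing over the binomial distribution of $S$. The following is a small sketch using exact fractions; the function name `expected_h` is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction
from math import comb

def expected_h(n, theta):
    """E[S(S-1)/(n(n-1))] for S ~ Binomial(n, theta), computed exactly."""
    return sum(
        Fraction(s * (s - 1), n * (n - 1))
        * comb(n, s) * theta**s * (1 - theta) ** (n - s)
        for s in range(n + 1)
    )

theta = Fraction(1, 3)
for n in (2, 3, 5):
    print(n, expected_h(n, theta), theta**2)
```

For $n = 2$ the estimator reduces to $S(S-1)/2$, the form obtained by solving equation (1) directly.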
Note: The UMVUE of $\tau(\theta)$ can be found if a complete and sufficient statistic exists. However, in many cases we cannot find a complete and sufficient statistic, and then we cannot apply the Lehmann-Scheffe theorem to find UMVUE estimators. In that case the Cramer-Rao inequality provides a lower bound for the variance of an unbiased estimator of $\tau(\theta)$. If the variance of some unbiased estimator achieves this lower bound, then the estimator will be a UMVUE.
Goodness of the Estimator:

Modal Unbiased and Median Unbiased Estimators:
Modal Unbiased Estimate: Let $X_i$ $(i = 1, \ldots, n)$ be iid random variables with common p.d.f. $f(x;\theta)$ and let $t(X_1, \ldots, X_n)$ be a statistic such that the mode of the density function of $t$ is $\theta$. Then $t(X_1, \ldots, X_n)$ is said to be a modal unbiased estimate of $\theta$.
Median Unbiased Estimate: Let $X_i$ $(i = 1, \ldots, n)$ be iid random variables with common p.d.f. $f(x;\theta)$ and let $t(X_1, \ldots, X_n)$ be a statistic such that the median of the density function of $t$ is $\theta$. Then $t(X_1, \ldots, X_n)$ is said to be a median unbiased estimate of $\theta$.

Example: Suppose $X_i$ ($i = 1, \dots, 2n+1$) are iid random variables with common p.d.f.
$$f(x;\theta) = \frac{1}{\theta}e^{-x/\theta}; \quad x > 0,\ \theta > 0$$
Find a median unbiased estimate of $\theta$.
Solution: We get
$$F(x) = \int_0^x \frac{1}{\theta}e^{-u/\theta}\,du = \left[-e^{-u/\theta}\right]_0^x = 1 - e^{-x/\theta}$$
Let $Y_1 = \min_{1 \le i \le 2n+1} X_i$. For a sample of size $N$ the density of the minimum is $N\,[1-F(x)]^{N-1} f(x)$, so with $N = 2n+1$
$$f_{Y_1}(x) = (2n+1)\left[1 - 1 + e^{-x/\theta}\right]^{2n}\frac{1}{\theta}e^{-x/\theta} = \frac{2n+1}{\theta}\,e^{-(2n+1)x/\theta}; \quad 0 \le x < \infty$$
The median $m$ of $Y_1$ satisfies
$$\int_0^m \frac{2n+1}{\theta}\,e^{-(2n+1)x/\theta}\,dx = \frac{1}{2}$$
$$\Rightarrow\quad \left[-e^{-(2n+1)x/\theta}\right]_0^m = \frac{1}{2}$$
$$\Rightarrow\quad 1 - e^{-(2n+1)m/\theta} = \frac{1}{2}$$
$$\Rightarrow\quad m = \frac{\theta}{2n+1}\ln 2$$
Hence the median of $\dfrac{2n+1}{\ln 2}\,Y_1$ is $\theta$.
Thus $\dfrac{2n+1}{\ln 2}\,y_1$ is a median unbiased estimate of $\theta$.
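A quick simulation can illustrate that the rescaled minimum $(2n+1)Y_1/\ln 2$ has median close to $\theta$. This is a sketch under stated assumptions: the helper names, the seed and the replication count are arbitrary choices made for illustration and are not part of the notes.

```python
import math
import random
import statistics

def median_unbiased_estimate(sample):
    """(sample size) * min(sample) / ln 2 for an exponential sample with mean theta."""
    return len(sample) * min(sample) / math.log(2)

def simulated_median(theta, n, reps, seed=0):
    """Median of the estimate over `reps` simulated samples of size 2n+1."""
    rng = random.Random(seed)
    size = 2 * n + 1
    estimates = [
        # expovariate takes the rate, so mean theta corresponds to rate 1/theta
        median_unbiased_estimate([rng.expovariate(1.0 / theta) for _ in range(size)])
        for _ in range(reps)
    ]
    return statistics.median(estimates)

if __name__ == "__main__":
    print(simulated_median(theta=2.0, n=3, reps=20000))
```

The printed value should be near 2.0; note that the mean of the same estimates is $\theta/\ln 2$, so the estimate is median unbiased but not mean unbiased.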
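Example: Suppose $X \sim N(\theta, \sigma^2)$, so that $\bar{X} \sim N\!\left(\theta, \dfrac{\sigma^2}{n}\right)$. Then $\bar{X}$ is a modal unbiased estimate of $\theta$.
Solution: We have
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^2}; \quad -\infty < x < \infty$$
$$\ln L(\theta) = n\ln\!\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2}\sum\left(\frac{x_i-\theta}{\sigma}\right)^2$$
$$\Rightarrow\quad \frac{\partial \ln L(\theta)}{\partial\theta} = 0 - \frac{1}{2}\cdot 2\sum\left(\frac{x_i-\theta}{\sigma}\right)\left(-\frac{1}{\sigma}\right) = 0$$
$$\Rightarrow\quad \theta = \bar{X}$$
It can be shown that the second derivative is negative, so $\bar{X}$ is the modal value. Equivalently, the $N(\theta, \sigma^2/n)$ density of $\bar{X}$ attains its maximum at $\theta$. Again, $\bar{X}$ is also an unbiased estimate of $\theta$. Thus we can say that $\bar{X}$ is a modal unbiased estimate of $\theta$.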
Theorem: If $m$ is a median of a discrete density $p(x)$ and $g(r) = E|X - r| = \sum_x |x - r|\,p(x)$, then $g(r)$ is minimized for $r = m$, provided that the sum exists for at least one $r \in R$ (the real line).
Proof: Let $r_0 \in R$ with $r_0 > m$ (the case $r_0 < m$ is handled in the same way). If it can be shown that $E|X - r_0| - E|X - m| \ge 0$, it can be concluded that $g(r)$ is minimum for $r = m$. Expanding, we get
$$E|X - r_0| - E|X - m| = \sum_{x \le m}(r_0 - m)\,p(x) + \sum_{m < x \le r_0}(r_0 + m - 2x)\,p(x) + \sum_{x > r_0}(m - r_0)\,p(x)$$
From the definition of the median, $P(X \le m) \ge \frac{1}{2}$ and $P(X > m) \le \frac{1}{2}$, so
$$\sum_{x \le m}(r_0 - m)\,p(x) \ge \frac{1}{2}(r_0 - m)$$
Also,
$$\sum_{x > r_0}(m - r_0)\,p(x) = \sum_{x > m}(m - r_0)\,p(x) - \sum_{m < x \le r_0}(m - r_0)\,p(x)$$
Combining the three pieces,
$$E|X - r_0| - E|X - m| \ge \frac{1}{2}(r_0 - m) + \sum_{m < x \le r_0} 2(r_0 - x)\,p(x) - (r_0 - m)\,P(X > m) \ge 0$$
$$\Rightarrow\quad E|X - r_0| - E|X - m| \ge 0$$
Hence $g(r)$ is minimized when $r_0 = m$.
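The theorem can be illustrated numerically on a small discrete distribution. The sketch below evaluates $g(r) = \sum_x |x - r|\,p(x)$ on a grid and reports the minimizer; the probability table `pmf` and the grid are made-up values chosen only for illustration.

```python
def mean_abs_dev(pmf, r):
    """g(r) = sum over x of |x - r| * p(x) for a discrete pmf given as {x: p}."""
    return sum(p * abs(x - r) for x, p in pmf.items())

def grid_minimizer(pmf, candidates):
    """Return the candidate r with the smallest g(r)."""
    return min(candidates, key=lambda r: mean_abs_dev(pmf, r))

# Hypothetical pmf: P(X <= 1) = 0.7 and P(X >= 1) = 0.8, so the median is 1.
pmf = {0: 0.2, 1: 0.5, 3: 0.2, 10: 0.1}
candidates = [i / 10 for i in range(0, 101)]

if __name__ == "__main__":
    print(grid_minimizer(pmf, candidates))
```

For this table the grid minimizer coincides with the median, even though the mean of the distribution (2.1) is quite different.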
Example: Let $X_i$ ($i = 1, \dots, n$) be a random sample from a continuous uniform distribution with p.d.f.
$$f(x;\theta) = \frac{1}{\theta}; \quad 0 < x \le \theta$$
We have to show that $Y = \max_i X_i$ is a modal unbiased estimate of $\theta$.
Consider $Y = \max_i X_i$. The density function of $Y$ is
$$\phi(y;\theta) = \frac{n\,y^{n-1}}{\theta^n}; \quad 0 < y \le \theta$$
For $n > 1$ this density is increasing in $y$, so the mode of this distribution is $\theta$. Hence, $y$ is a modal unbiased estimate of $\theta$.
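Show that a modal unbiased estimate is not always unique.

Let $X_i$ ($i = 1, \dots, n$) be independent random variables each having the exponential density
$$f(x;\theta) = \frac{1}{\theta}e^{-x/\theta}; \quad x > 0,\ \theta > 0$$
For this distribution, the statistics
$$Y_1 = \frac{X_{(n)}}{\ln n} \quad\text{where } X_{(n)} = \max_i X_i$$
$$Y_2 = \frac{S_n}{n-1} \quad\text{where } S_n = \sum_{i=1}^{n} X_i$$
are both modal unbiased for $\theta$.

We get
$$F(x) = \int_0^x \frac{1}{\theta}e^{-u/\theta}\,du = \left[-e^{-u/\theta}\right]_0^x = 1 - e^{-x/\theta}$$
The p.d.f. of $X_{(n)}$ is
$$g\left(x_{(n)};\theta\right) = \frac{n}{\theta}\left(1 - e^{-x_{(n)}/\theta}\right)^{n-1} e^{-x_{(n)}/\theta}; \quad x_{(n)} > 0,\ \theta > 0$$
Now,
$$Y_1 = \frac{X_{(n)}}{\ln n} \quad\Rightarrow\quad dX_{(n)} = (\ln n)\,dY_1$$
Hence the p.d.f. of $Y_1$ is
$$g(y_1;\theta) = \frac{n\ln n}{\theta}\left(1 - e^{-\frac{y_1}{\theta}\ln n}\right)^{n-1} e^{-\frac{y_1}{\theta}\ln n}; \quad y_1 > 0,\ \theta > 0$$
$$= \frac{n\ln n}{\theta}\left(1 - \left\{n^{-1/\theta}\right\}^{y_1}\right)^{n-1}\left\{n^{-1/\theta}\right\}^{y_1} \qquad \left[\text{putting } n^{-1/\theta} = a\right]$$
$$= \frac{n\ln n}{\theta}\left(1 - a^{y_1}\right)^{n-1} a^{y_1} \qquad (i)$$
$$\Rightarrow\quad \frac{\partial g(y_1;\theta)}{\partial y_1} = \frac{n\ln n}{\theta}(n-1)\left(1 - a^{y_1}\right)^{n-2}\left(-a^{y_1}\ln a\right)a^{y_1} + \frac{n\ln n}{\theta}\left(1 - a^{y_1}\right)^{n-1}a^{y_1}\ln a = 0$$
$$\Rightarrow\quad \frac{(n-1)\,a^{y_1}}{1 - a^{y_1}} = 1$$
$$\Rightarrow\quad n\,a^{y_1} = 1 \qquad \left[\because\ n^{-1/\theta} = a\right]$$
$$\Rightarrow\quad y_1 = \theta$$
It can be shown that the second derivative of (i) at $y_1 = \theta$ is negative, and hence $Y_1$ is a modal unbiased estimate of $\theta$.
Now, the p.d.f. of $S_n$ is
$$h(s_n;\theta) = \frac{s_n^{\,n-1}}{\theta^n\,\Gamma(n)}\,e^{-s_n/\theta}; \quad s_n > 0,\ \theta > 0$$
Now,
$$Y_2 = \frac{S_n}{n-1} \quad\Rightarrow\quad dS_n = (n-1)\,dY_2$$
Hence the p.d.f. of $Y_2$ is
$$h(y_2;\theta) = \frac{\left[(n-1)y_2\right]^{n-1}}{\theta^n\,\Gamma(n)}\,e^{-\frac{(n-1)y_2}{\theta}}\,(n-1) = \frac{(n-1)^n\,y_2^{\,n-1}}{\theta^n\,\Gamma(n)}\,e^{-\frac{(n-1)y_2}{\theta}}; \quad y_2 > 0,\ \theta > 0$$
$$\Rightarrow\quad \frac{\partial h(y_2;\theta)}{\partial y_2} = \frac{(n-1)^n}{\theta^n\,\Gamma(n)}(n-1)\,y_2^{\,n-2}\,e^{-\frac{(n-1)y_2}{\theta}} - \frac{(n-1)^n\,y_2^{\,n-1}}{\theta^n\,\Gamma(n)}\,e^{-\frac{(n-1)y_2}{\theta}}\left\{\frac{n-1}{\theta}\right\} = 0$$
$$\Rightarrow\quad y_2^{-1} - \frac{1}{\theta} = 0$$
$$\Rightarrow\quad y_2 = \theta$$
Again, the second derivative of $h(y_2;\theta)$ at $y_2 = \theta$ is negative. Hence $Y_2$ is a modal unbiased estimate of $\theta$.
Thus a modal unbiased estimate is not always unique.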
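As a numerical cross-check, the densities of $Y_1 = X_{(n)}/\ln n$ and $Y_2 = S_n/(n-1)$ for the exponential model can be evaluated on a grid to locate their modes. This is a minimal sketch: the function names, the grid range and the chosen $\theta$ and $n$ are illustrative assumptions.

```python
import math

def density_y1(y, theta, n):
    """Density of Y1 = max(X_i)/ln(n) for an exponential sample with mean theta."""
    a = math.exp(-y * math.log(n) / theta)
    return n * math.log(n) / theta * (1 - a) ** (n - 1) * a

def density_y2(y, theta, n):
    """Density of Y2 = sum(X_i)/(n-1); sum(X_i) is Gamma(n, scale theta)."""
    c = (n - 1) ** n / (theta ** n * math.gamma(n))
    return c * y ** (n - 1) * math.exp(-(n - 1) * y / theta)

def grid_mode(density, theta, n, upper=10.0, steps=10000):
    """Grid point in (0, upper] at which the density is largest."""
    grid = [upper * k / steps for k in range(1, steps + 1)]
    return max(grid, key=lambda y: density(y, theta, n))

if __name__ == "__main__":
    print(grid_mode(density_y1, 2.0, 5), grid_mode(density_y2, 2.0, 5))
```

Both grid modes should land at (or within one grid step of) $\theta = 2$, matching the two derivations.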
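Theorem: Let $X$ be a random variable with density function $f(x;\theta)$, let $Y = g(X)$, and let $\phi(y)$ be the density function of $Y$ such that
$$(i)\quad \frac{\partial\phi(y)}{\partial y} = 0 \quad\text{at } y = \theta$$
$$(ii)\quad \frac{\partial^2\phi(y)}{\partial y^2} < 0 \quad\text{at } y = \theta$$
(iii) $g$ is a 1-1 transformation from $x$ to $y$ and from $y$ to $x$.
Then the solution of the following differential equation is a modal unbiased estimate of $\theta$:
$$(a)\quad f(x;\theta)\frac{\partial^2 x}{\partial y^2} + \frac{\partial f(x;\theta)}{\partial x}\left(\frac{\partial x}{\partial y}\right)^2 = 0 \quad\text{at } y = \theta$$
$$(b)\quad \frac{\partial^2 f(x;\theta)}{\partial x^2}\left(\frac{\partial x}{\partial y}\right)^3 + 3\,\frac{\partial f(x;\theta)}{\partial x}\cdot\frac{\partial x}{\partial y}\cdot\frac{\partial^2 x}{\partial y^2} + f(x;\theta)\frac{\partial^3 x}{\partial y^3} < 0 \quad\text{at } y = \theta$$
Proof:
$$\phi(y) = \frac{\partial}{\partial y}F_Y(y) = \frac{\partial}{\partial y}F_X\!\left[g^{-1}(y)\right] \qquad \left[\text{since } y = g(x) \Rightarrow x = g^{-1}(y)\right]$$
$$= \frac{\partial}{\partial x}F_X(x)\,\frac{\partial x}{\partial y}$$
$$\therefore\quad \phi(y) = f(x;\theta)\,\frac{\partial x}{\partial y}$$
$$\Rightarrow\quad \frac{\partial\phi(y)}{\partial y} = \frac{\partial f(x;\theta)}{\partial x}\left(\frac{\partial x}{\partial y}\right)^2 + f(x;\theta)\frac{\partial^2 x}{\partial y^2} = 0 \quad\text{at } y = \theta \text{ by our hypothesis}$$
$$\Rightarrow\quad \frac{\partial^2\phi(y)}{\partial y^2} = \frac{\partial^2 f(x;\theta)}{\partial x^2}\left(\frac{\partial x}{\partial y}\right)^3 + 3\,\frac{\partial f(x;\theta)}{\partial x}\cdot\frac{\partial x}{\partial y}\cdot\frac{\partial^2 x}{\partial y^2} + f(x;\theta)\frac{\partial^3 x}{\partial y^3} < 0$$
at $y = \theta$ according to our hypothesis.
Hence the solution of the differential equation is a modal unbiased estimate of $\theta$.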
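Example: Suppose $X \sim N(\theta, \sigma^2)$. Show that $X$ is a modal unbiased estimate of $\theta$.
Solution: We have,
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^2}; \quad -\infty < x < \infty$$
$$\therefore\quad \frac{\partial f(x;\theta)}{\partial x} = -\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^2}\,\frac{2(x-\theta)}{2\sigma^2}$$
Now, equation (a) of the theorem becomes
$$\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^2}\frac{\partial^2 x}{\partial y^2} - \frac{2(x-\theta)}{2\sigma^2}\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^2}\left(\frac{\partial x}{\partial y}\right)^2 = 0$$
$$\Rightarrow\quad \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^2}\left[\frac{\partial^2 x}{\partial y^2} - \frac{(x-\theta)}{\sigma^2}\left(\frac{\partial x}{\partial y}\right)^2\right] = 0$$
$$\therefore\quad \frac{\partial^2 x}{\partial y^2} - \frac{(x-\theta)}{\sigma^2}\left(\frac{\partial x}{\partial y}\right)^2 = 0 \quad\text{at } y = \theta$$
We have $Y = g(x)$ with $Y = x$, so that $\dfrac{\partial Y}{\partial x} = 1$ and $\dfrac{\partial^2 Y}{\partial x^2} = 0$; equivalently $\dfrac{\partial x}{\partial y} = 1$ and $\dfrac{\partial^2 x}{\partial y^2} = 0$. Therefore
$$0 - \frac{(x-\theta)}{\sigma^2}(1)^2 = 0$$
$$\Rightarrow\quad x = \theta \quad\text{at } y = \theta$$
So $x = y$ is the solution of this equation.

Hence $X$ is a modal unbiased estimate of $\theta$.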
Jackknife Estimator and Correction for Bias
Jackknife Estimator: The jackknife estimator was introduced by Quenouille in 1949 and named by Tukey in 1958. The purpose of the jackknife technique is to decrease the bias of an estimator and to provide an approximate confidence interval for the parameter of interest.
If a parameter has a UMVUE associated with it, then clearly there is no chance of improving such an estimator's bias. However, MLEs are often biased, and hence improvement may be possible in the sense of an estimator with lower bias. Jackknifing is an important technique for accomplishing such bias reduction.
Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ from a population with real-valued parameter $\theta$. Let $\hat\theta$ be an estimator of $\theta$. Divide the random sample into $N$ groups of equal size $m = \dfrac{n}{N}$ observations each ($N$ is one of the factors of $n$). Delete one group at a time and estimate $\theta$ based on the remaining $(N-1)m$ observations, using the same estimation procedure previously used with a sample of size $n$. Denote the estimator of $\theta$ obtained with the $i$th group deleted by $\hat\theta_i$ ($i = 1, 2, \dots, N$), called a jackknife statistic.
For $i = 1, 2, \dots, N$ let us consider a new statistic
$$J_i = N\hat\theta - (N-1)\hat\theta_i$$
and consider
$$J(\hat\theta) = \frac{1}{N}\sum_{i=1}^{N}\left[N\hat\theta - (N-1)\hat\theta_i\right] = N\hat\theta - (N-1)\bar\theta, \quad\text{where } \bar\theta = \frac{1}{N}\sum_{i=1}^{N}\hat\theta_i$$
$J(\hat\theta)$ is called the jackknife estimator of $\theta$.
Generally we take $m = 1$; then the commonly used jackknife estimate is $J(\hat\theta) = n\hat\theta - (n-1)\bar\theta$, since $m = 1 = \dfrac{n}{N} \Rightarrow n = N$.
Note:
$$J(\hat\theta) = N\hat\theta + \hat\theta - \hat\theta - N\bar\theta + \bar\theta$$
$$= \hat\theta + \hat\theta(N-1) - \bar\theta(N-1)$$
$$= \hat\theta + (N-1)\left(\hat\theta - \bar\theta\right)$$
which shows that the estimator $J(\hat\theta)$ is an adjustment of $\hat\theta$, with the amount of adjustment depending on the difference between $\hat\theta$ and $\bar\theta$.
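The delete-one case ($m = 1$, $N = n$) of the construction above can be sketched in a few lines. The function name `jackknife` and the sample values are illustrative assumptions; the formula implemented is $J(\hat\theta) = n\hat\theta - (n-1)\bar\theta$, with $\bar\theta$ the average of the $n$ leave-one-out estimates.

```python
from statistics import fmean

def jackknife(estimator, sample):
    """Delete-one jackknife: n * theta_hat - (n - 1) * mean(leave-one-out estimates)."""
    n = len(sample)
    full = estimator(sample)
    # Re-estimate with the i-th observation removed, for every i.
    leave_one_out = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    return n * full - (n - 1) * fmean(leave_one_out)

if __name__ == "__main__":
    data = [1.0, 2.0, 4.0, 7.0]
    # For the sample mean the jackknife returns the sample mean itself.
    print(jackknife(fmean, data), fmean(data))
```

Passing the estimator as a function keeps the sketch generic: any statistic that maps a list of observations to a number can be plugged in.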
Correction for Bias: If we have a biased estimator, a simple additive or multiplicative adjustment may turn it into an unbiased one. Sometimes, however, the expected value is a rather complicated function of the parameter, and then it is very difficult to find a simple factor that makes a biased estimator unbiased.
Let $t_n$ denote the biased estimator of $\theta$ based on $n$ observations.

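$$E(t_n) = E(\hat\theta) = \theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r} \qquad (A) \qquad \left[\text{where } a_r \text{ is a function of } \theta \text{ but constant with respect to } n\right]$$
$$\Rightarrow\quad E(t_n) - \theta = \sum_{r=1}^{\infty}\frac{a_r}{n^r}$$
Let $t_{n-1,i}$ ($i = 1, \dots, n$) denote the estimate of $\theta$ based on $(n-1)$ observations, where the $i$th observation is omitted. Then we have $t_{n-1,1}, t_{n-1,2}, \dots, t_{n-1,n}$, and $\bar t_{n-1}$ is the average of these $n$ estimates based on $(n-1)$ observations each.
Let us define another new statistic as
$$t_n' = n\,t_n - (n-1)\,\bar t_{n-1} \qquad \left[J(\hat\theta) = n\hat\theta - (n-1)\bar\theta;\ \text{here } t_n = \hat\theta \text{ and } \bar t_{n-1} = \bar\theta\right]$$
$$= t_n + (n-1)\left(t_n - \bar t_{n-1}\right)$$
$$\Rightarrow\quad E(t_n') = \theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r} + (n-1)\left(\theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r} - E\left\{\frac{1}{n}\sum_{i=1}^{n}t_{n-1,i}\right\}\right)$$
$$= \theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r} + (n-1)\left(\theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r} - \theta - \sum_{r=1}^{\infty}\frac{a_r}{(n-1)^r}\right)$$
$$= \theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r} + (n-1)\sum_{r=1}^{\infty}\frac{a_r}{n^r} - (n-1)\sum_{r=1}^{\infty}\frac{a_r}{(n-1)^r}$$
$$= \theta + \sum_{r=1}^{\infty}\frac{a_r}{n^r} + n\sum_{r=1}^{\infty}\frac{a_r}{n^r} - \sum_{r=1}^{\infty}\frac{a_r}{n^r} - \sum_{r=1}^{\infty}\frac{a_r}{(n-1)^{r-1}}$$
$$= \theta + a_1 + \sum_{r=2}^{\infty}\frac{a_r}{n^{r-1}} - a_1 - \sum_{r=2}^{\infty}\frac{a_r}{(n-1)^{r-1}}$$
$$= \theta + \sum_{r=2}^{\infty}\frac{a_r}{n^{r-1}} - \sum_{r=2}^{\infty}\frac{a_r}{(n-1)^{r-1}}$$
$$= \theta + a_2\left(\frac{1}{n} - \frac{1}{n-1}\right) + a_3\left(\frac{1}{n^2} - \frac{1}{(n-1)^2}\right) + a_4\left(\frac{1}{n^3} - \frac{1}{(n-1)^3}\right) + \dots$$
$$\Rightarrow\quad E(t_n') = \theta - \frac{a_2}{n^2} - O\!\left(\frac{1}{n^3}\right)$$
That is, $t_n'$ is biased only to order $\dfrac{1}{n^2}$, whereas $t_n$ has bias of order $\dfrac{1}{n}$; i.e. $t_n'$ reduces the bias. Similarly, we can take another statistic
$$t_n'' = \frac{n^2\,t_n' - (n-1)^2\,\bar t_{n-1}'}{n^2 - (n-1)^2}$$
where $\bar t_{n-1}'$ is the average of the statistics $t'$ computed from the $n$ subsamples of size $(n-1)$. The terms of order $\dfrac{1}{n^2}$ cancel, so that
$$E(t_n'') = \theta + O\!\left(\frac{1}{n^3}\right)$$
That is, the bias is of order $\dfrac{1}{n^3}$. At every step the remaining bias is of smaller order, so by this method the bias can be reduced to any required order.
N.B.: Explain the jackknife method and discuss how it reduces the bias.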
Example: Let $x_1, x_2, \dots, x_n$ be a random sample of size $n$ with the probability density function
$$f(x;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}; \quad -\infty < x, \mu < \infty,\ \sigma^2 > 0$$
Find the jackknife estimator of $\mu$.
Solution: The likelihood function is
$$L = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^2}$$
$$\Rightarrow\quad \log L = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$$
$$\Rightarrow\quad \frac{\partial\log L}{\partial\mu} = -\frac{1}{2\sigma^2}\sum_{i=1}^{n}2(x_i-\mu)(-1) = 0$$
$$\Rightarrow\quad \sum_{i=1}^{n}(x_i-\mu) = 0 \quad\Rightarrow\quad \hat\mu = \frac{1}{n}\sum_{i=1}^{n}x_i$$
So, the maximum likelihood estimate of the parameter $\mu$ is
$$\hat\mu = \bar x = \frac{1}{n}\sum_{i=1}^{n}x_i \qquad (+)$$
$$\Rightarrow\quad E(\hat\mu) = \frac{1}{n}\sum_{i=1}^{n}E(x_i) = \mu \qquad (++)$$
Since $\hat\mu$ is already unbiased, no jackknife correction is needed here.

Nevertheless, we can find the jackknife estimator by taking
$$\hat\theta_i = \frac{1}{n-1}\sum_{j\ne i}x_j = \frac{1}{n-1}\left(\sum_{j=1}^{n}x_j - x_i\right) = \frac{1}{n-1}\left(n\bar x - x_i\right)$$
So,
$$\bar\theta = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_i = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{1}{n-1}\left(n\bar x - x_i\right)\right] = \frac{1}{n(n-1)}\left(n^2\bar x - n\bar x\right)$$
$$\Rightarrow\quad \bar\theta = \bar x \qquad (+++)$$
So, the jackknife estimator is given by
$$J(\hat\theta) = n\hat\theta - (n-1)\bar\theta$$
$$\Rightarrow\quad J(\hat\theta) = n\bar x - (n-1)\bar x \qquad \left[\text{from } (+) \text{ and } (+++)\right]$$
$$\Rightarrow\quad J(\hat\theta) = \bar x \qquad (A)$$
$$\therefore\quad E\left[J(\hat\theta)\right] = E(\bar x) = \mu$$
So, we can say that $J(\hat\theta) = \bar x$ is an unbiased and uniformly minimum variance unbiased estimator of $\mu$.
Example: Let $x_1, x_2, \dots, x_n$ be a random sample of size $n$ with the probability density function
$$f(x;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}; \quad -\infty < x, \mu < \infty,\ \sigma^2 > 0$$
Find the jackknife estimator of $\sigma^2$, where $\mu$ and $\sigma^2$ are both unknown.

Solution: The likelihood function is
$$L = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^2}$$
$$\Rightarrow\quad \log L = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$$
$$\Rightarrow\quad \frac{\partial\log L}{\partial\mu} = -\frac{1}{2\sigma^2}\sum_{i=1}^{n}2(x_i-\mu)(-1) = 0 \quad\Rightarrow\quad \sum_{i=1}^{n}(x_i-\mu) = 0$$
$$\Rightarrow\quad \hat\mu = \frac{1}{n}\sum_{i=1}^{n}x_i \qquad (1)$$
Again, we have that
$$\frac{\partial\log L}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2 = 0$$
$$\Rightarrow\quad \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2 = \frac{n}{2\sigma^2} \quad\Rightarrow\quad \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 = n$$
$$\Rightarrow\quad S^2 = \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2 \qquad (2)$$
The likelihood equations for simultaneous estimation of $\mu$ and $\sigma^2$ are
$$\frac{\partial\log L}{\partial\mu} = 0 \quad\text{and}\quad \frac{\partial\log L}{\partial\sigma^2} = 0$$
From equations (1) and (2) we have
$$\hat\mu = \bar x, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat\mu)^2$$
$$\Rightarrow\quad \hat\theta = S^2 = \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2 \qquad (+)$$
$$= \frac{1}{n}\sum_{i=1}^{n}\left[(x_i-\mu) - (\bar x-\mu)\right]^2 = \frac{1}{n}\left[\sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar x-\mu)^2\right] = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2 - (\bar x-\mu)^2$$
$$\Rightarrow\quad E(\hat\theta) = E(\hat\sigma^2) = \frac{1}{n}\sum_{i=1}^{n}E(x_i-\mu)^2 - E(\bar x-\mu)^2 = \frac{1}{n}\sum_{i=1}^{n}\mathrm{Var}(x_i) - \mathrm{Var}(\bar x)$$
$$\Rightarrow\quad E(\hat\theta) = E(\hat\sigma^2) = \sigma^2 - \frac{\sigma^2}{n} \qquad (++)$$
So, we can say that $\hat\theta = \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2$ is a biased estimator of $\sigma^2$. Here the bias can be removed by a simple adjustment:
$$E(\hat\theta) = E(\hat\sigma^2) = \sigma^2 - \frac{\sigma^2}{n} \quad\Rightarrow\quad E\left(\frac{n}{n-1}\hat\theta\right) = E\left(\frac{n}{n-1}\hat\sigma^2\right) = \sigma^2$$
So $\dfrac{n}{n-1}\hat\theta = \dfrac{n}{n-1}\hat\sigma^2$ is an unbiased estimator of $\theta = \sigma^2$, and a jackknife correction is not strictly needed here. The jackknife nevertheless reproduces the same adjustment, as shown next. Take
$$\hat\theta_i = \hat\sigma_i^2 = \frac{\sum_{j\ne i}x_j^2}{n-1} - \left(\frac{\sum_{j\ne i}x_j}{n-1}\right)^2 = \frac{\sum_{j=1}^{n}x_j^2 - x_i^2}{n-1} - \left(\frac{n\bar x - x_i}{n-1}\right)^2$$
So,
$$\bar\theta = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_i = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{\sum_{j=1}^{n}x_j^2 - x_i^2}{n-1} - \left(\frac{n\bar x - x_i}{n-1}\right)^2\right]$$
$$= \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n}x_i^2 - \sum_{i=1}^{n}x_i^2\right] - \frac{1}{n(n-1)^2}\sum_{i=1}^{n}\left(n\bar x - x_i\right)^2$$
$$= \frac{1}{n(n-1)}\left[(n-1)\sum_{i=1}^{n}x_i^2\right] - \frac{1}{n(n-1)^2}\sum_{i=1}^{n}\left(n^2\bar x^2 - 2n x_i\bar x + x_i^2\right)$$
$$= \frac{\sum_{i=1}^{n}x_i^2}{n} - \frac{1}{n(n-1)^2}\left[n^3\bar x^2 - 2n\bar x\sum_{i=1}^{n}x_i + \sum_{i=1}^{n}x_i^2\right]$$
$$= \frac{\sum_{i=1}^{n}x_i^2}{n} - \frac{1}{n(n-1)^2}\left[n^2\bar x^2(n-2) + \sum_{i=1}^{n}x_i^2\right]$$
$$= \frac{\sum_{i=1}^{n}x_i^2}{n} - \frac{n(n-2)\bar x^2}{(n-1)^2} - \frac{\sum_{i=1}^{n}x_i^2}{n(n-1)^2}$$
$$= \frac{\sum_{i=1}^{n}x_i^2}{n}\times\frac{n^2-2n+1-1}{(n-1)^2} - \frac{n(n-2)\bar x^2}{(n-1)^2}$$
$$= \frac{n(n-2)}{(n-1)^2}\left(\frac{\sum_{i=1}^{n}x_i^2}{n} - \left(\frac{\sum_{i=1}^{n}x_i}{n}\right)^2\right)$$
$$\Rightarrow\quad \bar\theta = \frac{n(n-2)}{(n-1)^2}\hat\theta = \frac{n(n-2)}{(n-1)^2}\hat\sigma^2 \qquad (++++)$$
The jackknife estimator is
$$J(\hat\theta) = n\hat\theta - (n-1)\bar\theta$$
$$\Rightarrow\quad J(\hat\theta) = n\hat\sigma^2 - (n-1)\frac{n(n-2)}{(n-1)^2}\hat\sigma^2 \qquad \left[\text{from } (+) \text{ and } (++++)\right]$$
$$\Rightarrow\quad J(\hat\theta) = n\hat\sigma^2\left[1 - \frac{n-2}{n-1}\right]$$
$$\Rightarrow\quad J(\hat\theta) = \frac{n}{n-1}\hat\sigma^2 \qquad (A)$$
$$\Rightarrow\quad E\left[J(\hat\theta)\right] = \frac{n}{n-1}E(\hat\sigma^2) = \frac{n}{n-1}E\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2\right) = \frac{n}{n-1}\left(\sigma^2 - \frac{\sigma^2}{n}\right) = \frac{n}{n-1}\left(\frac{n-1}{n}\sigma^2\right) = \sigma^2$$
So, we can say that $J(\hat\theta) = \dfrac{n}{n-1}\hat\theta = \dfrac{n}{n-1}\hat\sigma^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x)^2$ is an unbiased and uniformly minimum variance unbiased estimator of $\theta = \sigma^2$.
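The algebra in this example says that jackknifing the divisor-$n$ variance estimate yields the divisor-$(n-1)$ sample variance. A minimal numerical check, assuming the delete-one jackknife $n\hat\theta - (n-1)\bar\theta$ and using made-up data, could look like this (`jackknife` is a hypothetical helper, while `pvariance` and `variance` are the standard-library divisor-$n$ and divisor-$(n-1)$ variances):

```python
from statistics import fmean, pvariance, variance

def jackknife(estimator, sample):
    """Delete-one jackknife: n * theta_hat - (n - 1) * mean(leave-one-out estimates)."""
    n = len(sample)
    leave_one_out = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    return n * estimator(sample) - (n - 1) * fmean(leave_one_out)

if __name__ == "__main__":
    data = [2.0, 3.5, 5.0, 7.5, 11.0]
    # pvariance is the MLE (divisor n); variance is the unbiased form (divisor n - 1).
    print(jackknife(pvariance, data), variance(data))
```

The two printed numbers should agree up to rounding error for any data set with at least three observations.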
Example: Let $x_1, x_2, \dots, x_n$ be a random sample of size $n$ with the probability function
$$f(x;p) = p^x(1-p)^{1-x}; \quad x = 0, 1$$
Find the jackknife estimator of $\mathrm{Var}(x) = pq$.

Solution: The likelihood function is
$$L(x;p) = p^{\sum_{i=1}^{n}x_i}(1-p)^{n-\sum_{i=1}^{n}x_i}$$
$$\Rightarrow\quad \ln L(x;p) = \sum_{i=1}^{n}x_i\ln p + \left(n - \sum_{i=1}^{n}x_i\right)\ln(1-p)$$
$$\Rightarrow\quad \frac{\partial\ln L(x;p)}{\partial p} = \frac{\sum_{i=1}^{n}x_i}{p} - \frac{n - \sum_{i=1}^{n}x_i}{1-p} = 0$$
$$\Rightarrow\quad \frac{\sum_{i=1}^{n}x_i - p\sum_{i=1}^{n}x_i - np + p\sum_{i=1}^{n}x_i}{p(1-p)} = 0$$
$$\Rightarrow\quad \hat p = \frac{\sum_{i=1}^{n}x_i}{n} = \frac{y}{n} \qquad \left[\text{let } \sum_{i=1}^{n}x_i = y \sim B(n,p)\right]$$
So, the maximum likelihood estimator of $p$ is $\hat p = \dfrac{\sum_{i=1}^{n}x_i}{n} = \dfrac{y}{n}$.
By the invariance property of maximum likelihood estimators, if $\hat\theta$ is the maximum likelihood estimator of $\theta$, then $g(\hat\theta)$ is the maximum likelihood estimator of $g(\theta)$.
So, the maximum likelihood estimator of $\theta = pq$ is given by
$$\hat\theta = \hat p\hat q = \frac{y}{n}\left(1 - \frac{y}{n}\right) \qquad (+)$$
$$\Rightarrow\quad E(\hat\theta) = E(\hat p\hat q) = E\left(\frac{y}{n}\right) - E\left(\frac{y}{n}\right)^2 = \frac{1}{n}E(y) - \frac{1}{n^2}\left[\mathrm{Var}(y) + \left[E(y)\right]^2\right]$$
$$= \frac{1}{n}np - \frac{1}{n^2}\left(npq + n^2p^2\right) \qquad \left(\text{since } y \sim B(n,p)\right)$$
$$= p - \frac{pq}{n} - p^2$$
$$\Rightarrow\quad E(\hat\theta) = E(\hat p\hat q) = pq - \frac{pq}{n} \qquad (++)$$
So, we can say that $\hat\theta = \hat p\hat q$ is a biased estimator of $\theta = pq$. Here the bias can be removed by a simple adjustment:
$$E(\hat\theta) = E(\hat p\hat q) = \left(\frac{n-1}{n}\right)pq \quad\Rightarrow\quad E\left(\frac{n}{n-1}\hat\theta\right) = E\left(\frac{n}{n-1}\hat p\hat q\right) = pq \qquad (+++)$$
So $\dfrac{n}{n-1}\hat\theta = \dfrac{n}{n-1}\hat p\hat q$ is an unbiased estimator of $\theta = pq$, and a jackknife correction is not strictly needed here. The jackknife nevertheless reproduces the same adjustment. Take
$$\hat\theta_i = \begin{cases}\dfrac{y-1}{n-1}\left(1 - \dfrac{y-1}{n-1}\right); & \text{if } x_i = 1 \text{ (that is, if the } i\text{th trial is a success)}\\[2ex] \dfrac{y}{n-1}\left(1 - \dfrac{y}{n-1}\right); & \text{if } x_i = 0 \text{ (that is, if the } i\text{th trial is a failure)}\end{cases}$$
Now, since there are $y$ successes and $n - y$ failures to be removed, we have
$$\bar\theta = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_i = \frac{1}{n}\left[y\,\frac{y-1}{n-1}\left(1 - \frac{y-1}{n-1}\right) + (n-y)\,\frac{y}{n-1}\left(1 - \frac{y}{n-1}\right)\right]$$
$$= \frac{1}{n}\left[\frac{y(y-1)(n-y)}{(n-1)^2} + \frac{y(n-y)(n-y-1)}{(n-1)^2}\right] = \frac{y(n-y)(n-2)}{n(n-1)^2}$$
$$= \frac{y(n-y)}{n\cdot n}\cdot\frac{n(n-2)}{(n-1)^2} = \frac{y}{n}\left(1 - \frac{y}{n}\right)\frac{n(n-2)}{(n-1)^2}$$
$$\Rightarrow\quad \bar\theta = \frac{n(n-2)}{(n-1)^2}\hat p\hat q \qquad (++++)$$
The jackknife estimator is
$$J(\hat\theta) = n\hat p\hat q - (n-1)\frac{n(n-2)}{(n-1)^2}\hat p\hat q \qquad \left[\text{from } (+) \text{ and } (++++)\right]$$
$$= n\hat p\hat q\left[1 - \frac{n-2}{n-1}\right] = \frac{n}{n-1}\hat p\hat q \qquad (A)$$
$$\Rightarrow\quad E\left[J(\hat\theta)\right] = \frac{n}{n-1}E(\hat p\hat q) = \frac{n}{n-1}\left(pq - \frac{pq}{n}\right) \qquad \left[\text{from } (++)\right]$$
$$= \frac{n}{n-1}\cdot\frac{n-1}{n}\,pq$$
$$\Rightarrow\quad E\left[J(\hat\theta)\right] = pq \qquad \left[\text{from } (A)\right]$$
So, we can say that $J(\hat\theta) = \dfrac{n}{n-1}\hat p\hat q$ is an unbiased and uniformly minimum variance unbiased estimator of $\theta = pq$.
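The bias of $\hat p\hat q$ and the unbiasedness of its jackknifed version can be verified exactly by summing over the binomial distribution of $y$. In this sketch the function names are hypothetical; the jackknife is built from the two leave-one-out cases (a success removed, a failure removed) rather than from raw data.

```python
from math import comb

def mle_pq(y, n):
    """MLE of pq from y successes in n Bernoulli trials."""
    p_hat = y / n
    return p_hat * (1 - p_hat)

def jackknife_pq(y, n):
    """Delete-one jackknife of mle_pq, averaging over y successes and n - y failures."""
    drop_success = mle_pq(y - 1, n - 1)   # weight y (weight is 0 when y == 0)
    drop_failure = mle_pq(y, n - 1)       # weight n - y (weight is 0 when y == n)
    avg = (y * drop_success + (n - y) * drop_failure) / n
    return n * mle_pq(y, n) - (n - 1) * avg

def expectation(stat, n, p):
    """Exact E[stat(Y, n)] for Y ~ Binomial(n, p)."""
    return sum(stat(y, n) * comb(n, y) * p**y * (1 - p)**(n - y)
               for y in range(n + 1))

if __name__ == "__main__":
    n, p = 6, 0.3
    print(expectation(mle_pq, n, p), expectation(jackknife_pq, n, p), p * (1 - p))
```

The first printed value should equal $\frac{n-1}{n}pq$ and the second should equal $pq$, up to rounding.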
Example: Let $x$ have the binomial probability function
$$f(x;n,p) = \binom{n}{x}p^x(1-p)^{n-x}; \quad x = 0, 1, \dots, n$$
Find the jackknife estimator of $p^2$.
Solution: For the binomial distribution, we know that
$$E(x) = np \quad\text{and}\quad \mathrm{Var}(x) = npq$$
So, from the above we have
$$E\left(\frac{x}{n}\right) = p$$
and
$$\mathrm{Var}(x) = E(x^2) - \left[E(x)\right]^2 = npq$$
$$\Rightarrow\quad E(x^2) = npq + n^2p^2 \qquad (**)$$
$$\Rightarrow\quad E\left(\frac{x^2}{n^2}\right) = \frac{pq}{n} + p^2$$
$$\Rightarrow\quad E(\hat\theta) = E\left[\left(\frac{x}{n}\right)^2\right] = p^2 + \frac{p(1-p)}{n} \qquad \left[\text{where } \hat\theta = \left(\frac{x}{n}\right)^2\right] \qquad (+)$$
So, we can say that $\hat\theta = \left(\dfrac{x}{n}\right)^2$ is a biased estimator of $\theta = p^2$. The bias $p(1-p)/n$ depends on $p$ in a way that cannot be removed by a simple multiplicative adjustment, so the jackknife estimator is needed here.
We can find the jackknife estimator by taking
$$\hat\theta_i = \begin{cases}\left(\dfrac{x-1}{n-1}\right)^2; & \text{if } x_i = 1 \text{ (that is, if the } i\text{th trial is a success)}\\[2ex] \left(\dfrac{x}{n-1}\right)^2; & \text{if } x_i = 0 \text{ (that is, if the } i\text{th trial is a failure)}\end{cases}$$
Now, since there are $x$ successes and $n - x$ failures to be removed, we have
$$\bar\theta = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_i = \frac{1}{n}\left[x\left(\frac{x-1}{n-1}\right)^2 + (n-x)\left(\frac{x}{n-1}\right)^2\right]$$
$$= \frac{1}{n}\left[\frac{x(x-1)^2}{(n-1)^2} + \frac{(n-x)x^2}{(n-1)^2}\right] = \frac{1}{n(n-1)^2}\left[x^3 - 2x^2 + x + nx^2 - x^3\right]$$
$$\Rightarrow\quad \bar\theta = \frac{1}{n(n-1)^2}\left[nx^2 - 2x^2 + x\right] \qquad (++)$$
The jackknife estimator is
$$J(\hat\theta) = n\left(\frac{x}{n}\right)^2 - (n-1)\frac{1}{n(n-1)^2}\left[nx^2 - 2x^2 + x\right] \qquad \left[\text{from } (+) \text{ and } (++)\right]$$
$$= \frac{(n-1)x^2 - \left[nx^2 - 2x^2 + x\right]}{n(n-1)} = \frac{x(x-1)}{n(n-1)} \qquad (A)$$
$$\Rightarrow\quad E\left[J(\hat\theta)\right] = \frac{E(x^2) - E(x)}{n(n-1)} = \frac{npq + n^2p^2 - np}{n(n-1)} \qquad \left[\text{from } (**)\right]$$
$$= \frac{np - np^2 + n^2p^2 - np}{n(n-1)} = \frac{np^2(n-1)}{n(n-1)}$$
$$\therefore\quad E\left[J(\hat\theta)\right] = E\left(\frac{x^2 - x}{n(n-1)}\right) = p^2 \qquad \left[\text{from } (A)\right]$$
So, we can say that $J(\hat\theta) = \dfrac{x(x-1)}{n(n-1)}$ is an unbiased and uniformly minimum variance unbiased estimator of $\theta = p^2$.
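The closed form $x(x-1)/(n(n-1))$ for the jackknifed estimate of $p^2$ can be confirmed with exact rational arithmetic. This is only a sketch: `jackknife_p_squared` is a hypothetical helper that rebuilds the jackknife from the two leave-one-out cases (success removed, failure removed) and compares it with the closed form for every possible count $x$.

```python
from fractions import Fraction

def jackknife_p_squared(x, n):
    """Delete-one jackknife of (x/n)^2 for x successes in n trials, as an exact fraction."""
    full = Fraction(x, n) ** 2
    drop_success = Fraction(x - 1, n - 1) ** 2   # used x times
    drop_failure = Fraction(x, n - 1) ** 2       # used n - x times
    avg = (x * drop_success + (n - x) * drop_failure) / n
    return n * full - (n - 1) * avg

if __name__ == "__main__":
    n = 7
    for x in range(n + 1):
        print(x, jackknife_p_squared(x, n), Fraction(x * (x - 1), n * (n - 1)))
```

Using `Fraction` avoids floating-point noise, so the two columns can be compared for exact equality.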
Example: Let $x_1, x_2, \dots, x_n$ be a random sample of size $n$ with the probability density function
$$f(x;\theta) = e^{-(x-\theta)}; \quad x > \theta$$
Find the jackknife estimator of $\theta$.
Solution: The likelihood function is
$$L(x;\theta) = \exp\left[-\sum_{i=1}^{n}(x_i-\theta)\right] \qquad (1)$$
We have to choose $\theta$ so that $L$ is maximum in equation (1). Now, $L$ is maximum if $\sum(x_i-\theta)$ is minimum; that is, $L$ is maximum if $\theta$ is as large as the sample allows.

Let $x_{(1)}, x_{(2)}, \dots, x_{(n)}$ be the ordered sample of $n$ independent observations from the given population, so that
$$\theta \le x_{(1)} \le x_{(2)} \le \dots \le x_{(n)} < \infty$$
Since the maximum value of $\theta$ consistent with the sample is $x_{(1)}$, the smallest observation, the maximum likelihood estimator of $\theta$ is
$$\hat\theta = x_{(1)} = \text{the smallest observation} \qquad (+)$$

Now, we know that the density function of the $r$th order statistic is given by
$$f\left(x_{(r)}\right) = \frac{n!}{(r-1)!\,(n-r)!}\left[F(x)\right]^{r-1}\left[1-F(x)\right]^{n-r}f(x) \qquad (2)$$
So, from equation (2), we have
$$f\left(x_{(1)}\right) = \frac{n!}{(1-1)!\,(n-1)!}\left[F(x)\right]^{1-1}\left[1-F(x)\right]^{n-1}f(x) = n\left[1-F(x)\right]^{n-1}f(x) \qquad (3)$$
Now, we have
$$f(x;\theta) = e^{-(x-\theta)}; \quad x > \theta$$
$$\Rightarrow\quad F(x) = \int_\theta^x e^{-(u-\theta)}\,du = -\left[e^{-(u-\theta)}\right]_\theta^x = -\left[e^{-(x-\theta)} - 1\right]$$
$$\Rightarrow\quad F(x) = 1 - e^{-(x-\theta)} \qquad (4)$$
So, from equations (3) and (4), we have
$$f\left(x_{(1)}\right) = n\left[1 - 1 + e^{-(x-\theta)}\right]^{n-1}f(x) = n\left[e^{-(x-\theta)}\right]^{n-1}e^{-(x-\theta)} = n\,e^{-n(x-\theta)}$$
$$\therefore\quad E\left(x_{(1)}\right) = n\int_\theta^\infty x\,e^{-n(x-\theta)}\,dx \qquad (***)$$
$$= n\,e^{n\theta}\left[\left.x\,\frac{e^{-nx}}{-n}\right|_\theta^\infty - \int_\theta^\infty\frac{e^{-nx}}{-n}\,dx\right]$$
$$= n\,e^{n\theta}\left[0 - \frac{\theta e^{-n\theta}}{-n} + \frac{1}{n}\left.\frac{e^{-nx}}{-n}\right|_\theta^\infty\right]$$
$$= n\,e^{n\theta}\left[\frac{\theta e^{-n\theta}}{n} - \frac{1}{n^2}\left(0 - e^{-n\theta}\right)\right] = n\,e^{n\theta}\left[\frac{\theta e^{-n\theta}}{n} + \frac{e^{-n\theta}}{n^2}\right]$$
$$\therefore\quad E(\hat\theta) = E\left(x_{(1)}\right) = \theta + \frac{1}{n} \qquad (++)$$
So, we can say that $\hat\theta = x_{(1)}$ is a biased estimator of $\theta$. Although the bias $1/n$ could be subtracted directly in this case, the jackknife estimator is derived here to illustrate the method.

We can find the jackknife estimator by taking
$$\hat\theta_i = \begin{cases}x_{(1)}; & \text{if } x_i \ne x_{(1)}\\ x_{(2)}; & \text{if } x_i = x_{(1)}\end{cases}$$
Now, since deleting any of the $n-1$ observations other than the smallest leaves the minimum unchanged, while deleting the smallest makes $x_{(2)}$ the minimum, we have
$$\bar\theta = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_i = \frac{1}{n}\left[(n-1)x_{(1)} + x_{(2)}\right] \qquad (+++)$$
The jackknife estimator is
$$J(\hat\theta) = n\,x_{(1)} - (n-1)\frac{1}{n}\left[(n-1)x_{(1)} + x_{(2)}\right] \qquad \left[\text{from } (+) \text{ and } (+++)\right]$$
$$= \left[n - \frac{(n-1)^2}{n}\right]x_{(1)} - \frac{n-1}{n}x_{(2)} = \left(\frac{2n-1}{n}\right)x_{(1)} - \frac{n-1}{n}x_{(2)}$$
$$= \left(\frac{n+n-1}{n}\right)x_{(1)} - \frac{n-1}{n}x_{(2)} = \frac{n}{n}x_{(1)} + \left(\frac{n-1}{n}\right)x_{(1)} - \frac{n-1}{n}x_{(2)}$$
$$= x_{(1)} + \left(\frac{n-1}{n}\right)\left(x_{(1)} - x_{(2)}\right) \qquad (A)$$
$$\therefore\quad E\left[J(\hat\theta)\right] = E\left(x_{(1)}\right) + \left(\frac{n-1}{n}\right)\left[E\left(x_{(1)}\right) - E\left(x_{(2)}\right)\right] \qquad (B)$$
Now, to find the above expected value, we first have to find the expected value of the second order statistic as follows.
From equation (2), we have
$$f\left(x_{(2)}\right) = \frac{n!}{(2-1)!\,(n-2)!}\left[F(x)\right]^{2-1}\left[1-F(x)\right]^{n-2}f(x) = n(n-1)\left[F(x)\right]\left[1-F(x)\right]^{n-2}f(x) \qquad (5)$$
Now, from equations (5) and (4), we have
$$f\left(x_{(2)}\right) = n(n-1)\left[1 - e^{-(x-\theta)}\right]\left[1 - 1 + e^{-(x-\theta)}\right]^{n-2}e^{-(x-\theta)}$$
$$= n(n-1)\left[1 - e^{-(x-\theta)}\right]\left[e^{-(x-\theta)}\right]^{n-1}$$
$$= n(n-1)\,e^{-(n-1)(x-\theta)} - (n-1)\,n\,e^{-n(x-\theta)}$$
$$\Rightarrow\quad E\left(x_{(2)}\right) = n(n-1)\int_\theta^\infty x\,e^{-(n-1)(x-\theta)}\,dx - (n-1)\,n\int_\theta^\infty x\,e^{-n(x-\theta)}\,dx$$
$$= n(n-1)\int_\theta^\infty x\,e^{-(n-1)(x-\theta)}\,dx - (n-1)E\left(x_{(1)}\right) \qquad \left[\text{from } (***)\right]$$
$$= n(n-1)\,e^{(n-1)\theta}\int_\theta^\infty x\,e^{-(n-1)x}\,dx - (n-1)\left(\theta + \frac{1}{n}\right)$$
$$= n(n-1)\,e^{(n-1)\theta}\left[\left.x\,\frac{e^{-(n-1)x}}{-(n-1)}\right|_\theta^\infty - \int_\theta^\infty\frac{e^{-(n-1)x}}{-(n-1)}\,dx\right] - (n-1)\left(\theta + \frac{1}{n}\right)$$
$$= n(n-1)\,e^{(n-1)\theta}\left[\frac{\theta e^{-(n-1)\theta}}{n-1} + \frac{e^{-(n-1)\theta}}{(n-1)^2}\right] - (n-1)\left(\theta + \frac{1}{n}\right)$$
$$= n\left[\theta + \frac{1}{n-1}\right] - (n-1)\left(\theta + \frac{1}{n}\right)$$
$$= n\theta + \frac{n}{n-1} - n\theta + \theta - \frac{n-1}{n} = \theta + \left(\frac{n}{n-1} - \frac{n-1}{n}\right) = \theta + \left[\frac{n^2 - (n-1)^2}{n(n-1)}\right]$$
$$= \theta + \frac{2n-1}{n(n-1)} \qquad (C)$$
Now, from equations $(++)$, $(B)$ and $(C)$, we have
$$E\left[J(\hat\theta)\right] = E\left(x_{(1)}\right) + \left(\frac{n-1}{n}\right)\left[E\left(x_{(1)}\right) - E\left(x_{(2)}\right)\right]$$
$$= \left(\theta + \frac{1}{n}\right) + \left(\frac{n-1}{n}\right)\left[\left(\theta + \frac{1}{n}\right) - \left(\theta + \frac{2n-1}{n(n-1)}\right)\right]$$
$$= \theta + \frac{1}{n} + \frac{n-1}{n}\left(\frac{n-1-2n+1}{n(n-1)}\right) = \theta + \frac{1}{n} + \frac{n-1}{n}\left(\frac{-n}{n(n-1)}\right) = \theta$$
$$\therefore\quad E\left[J(\hat\theta)\right] = E\left[x_{(1)} + \left(\frac{n-1}{n}\right)\left(x_{(1)} - x_{(2)}\right)\right] = \theta$$
So, we can say that $J(\hat\theta) = x_{(1)} + \left(\dfrac{n-1}{n}\right)\left(x_{(1)} - x_{(2)}\right)$ is an unbiased estimator of $\theta$.
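A short Monte Carlo run can illustrate the two expectations in this example: the sample minimum overshoots $\theta$ by about $1/n$, while the jackknifed version $x_{(1)} + \frac{n-1}{n}\left(x_{(1)} - x_{(2)}\right)$ is centred on $\theta$. The helper names, seed and replication count below are arbitrary illustrative choices.

```python
import random
from statistics import fmean

def jackknife_min(sample):
    """Jackknifed minimum: x(1) + (n-1)/n * (x(1) - x(2))."""
    xs = sorted(sample)
    n = len(xs)
    return xs[0] + (n - 1) / n * (xs[0] - xs[1])

def simulate(theta, n, reps, seed=1):
    """Average of the raw minimum and of the jackknifed minimum over `reps` samples."""
    rng = random.Random(seed)
    raw, jk = [], []
    for _ in range(reps):
        # x = theta + standard exponential has density exp(-(x - theta)) for x > theta
        sample = [theta + rng.expovariate(1.0) for _ in range(n)]
        raw.append(min(sample))
        jk.append(jackknife_min(sample))
    return fmean(raw), fmean(jk)

if __name__ == "__main__":
    print(simulate(theta=3.0, n=5, reps=20000))
```

With $\theta = 3$ and $n = 5$, the first average should be close to 3.2 and the second close to 3.0.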

Pitman Estimator for Location Parameter
Location invariant
An estimator $T = t(X_1, X_2, \dots, X_n)$ is defined to be location invariant if and only if
$$t(x_1 + c, x_2 + c, \dots, x_n + c) = t(x_1, x_2, \dots, x_n) + c$$
for all values $x_1, x_2, \dots, x_n$ and all $c$.

Example: Show that $\bar X_n = \dfrac{\sum_{i=1}^{n}X_i}{n}$ is location invariant.
Solution:
Let $t(x_1, x_2, \dots, x_n) = \bar x_n = \dfrac{\sum_{i=1}^{n}x_i}{n}$.
Then we have
$$t(x_1 + c, x_2 + c, \dots, x_n + c) = \frac{\sum_{i=1}^{n}(x_i + c)}{n} = \frac{\sum_{i=1}^{n}x_i}{n} + c$$
$$\Rightarrow\quad t(x_1 + c, x_2 + c, \dots, x_n + c) = t(x_1, x_2, \dots, x_n) + c$$
So, we can say that $\bar X_n$ is location invariant.
Example: Show that $\dfrac{Y_1 + Y_n}{2}$ is location invariant, where $Y_1$ is the smallest order statistic and $Y_n$ is the largest order statistic.
Solution:
Let
$$t(x_1, x_2, \dots, x_n) = \frac{y_1 + y_n}{2} = \frac{\min(x_1, x_2, \dots, x_n) + \max(x_1, x_2, \dots, x_n)}{2}$$
Then we have
$$t(x_1 + c, x_2 + c, \dots, x_n + c) = \frac{\min(x_1 + c, \dots, x_n + c) + \max(x_1 + c, \dots, x_n + c)}{2}$$
$$= \frac{\min(x_1, \dots, x_n) + c + \max(x_1, \dots, x_n) + c}{2} = \frac{\min(x_1, \dots, x_n) + \max(x_1, \dots, x_n)}{2} + c$$
$$\Rightarrow\quad t(x_1 + c, x_2 + c, \dots, x_n + c) = t(x_1, x_2, \dots, x_n) + c$$
So, we can say that $\dfrac{Y_1 + Y_n}{2}$ is location invariant.

Example: Show that $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar X_n\right)^2$ is not location invariant.
Solution:
Let $t(x_1, x_2, \dots, x_n) = s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar x_n\right)^2$.
Then we have
$$t(x_1 + c, x_2 + c, \dots, x_n + c) = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i + c - \frac{1}{n}\sum_{j=1}^{n}(x_j + c)\right)^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar x_n\right)^2$$
$$\Rightarrow\quad t(x_1 + c, x_2 + c, \dots, x_n + c) = t(x_1, x_2, \dots, x_n)$$
which differs from $t(x_1, x_2, \dots, x_n) + c$ whenever $c \ne 0$.
So, we can say that $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar X_n\right)^2$ is not location invariant.
Example: Show that $Y_n - Y_1$ is not location invariant.
Solution:
Let $t(x_1, x_2, \dots, x_n) = Y_n - Y_1 = \max(x_1, x_2, \dots, x_n) - \min(x_1, x_2, \dots, x_n)$.
Then we have
$$t(x_1 + c, \dots, x_n + c) = \max(x_1 + c, \dots, x_n + c) - \min(x_1 + c, \dots, x_n + c)$$
$$= \max(x_1, \dots, x_n) + c - \min(x_1, \dots, x_n) - c = \max(x_1, \dots, x_n) - \min(x_1, \dots, x_n)$$
$$\Rightarrow\quad t(x_1 + c, x_2 + c, \dots, x_n + c) = t(x_1, x_2, \dots, x_n)$$
which differs from $t(x_1, x_2, \dots, x_n) + c$ whenever $c \ne 0$.
So, we can say that $Y_n - Y_1$ is not location invariant.
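The four examples above can be checked mechanically by shifting a data set and comparing $t(x + c)$ with $t(x) + c$. The sketch below does this for a few shifts; the helper names, the data and the shift values are illustrative assumptions, and a numerical check on finitely many shifts is of course not a proof.

```python
from statistics import fmean, variance

def midrange(xs):
    """(smallest + largest) / 2."""
    return (min(xs) + max(xs)) / 2

def sample_range(xs):
    """largest - smallest."""
    return max(xs) - min(xs)

def is_location_invariant(stat, xs, shifts=(-3.0, 0.5, 10.0), tol=1e-9):
    """True if stat(x + c) equals stat(x) + c for every shift c tried."""
    base = stat(xs)
    return all(abs(stat([x + c for x in xs]) - (base + c)) < tol for c in shifts)

if __name__ == "__main__":
    data = [1.2, 0.4, 3.3, 2.1]
    for name, stat in [("mean", fmean), ("midrange", midrange),
                       ("s^2", variance), ("range", sample_range)]:
        print(name, is_location_invariant(stat, data))
```

The mean and the midrange pass the check, while $s^2$ and the range fail it because they are unchanged by the shift.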
Location parameter
Let $\{f(\cdot\,;\theta);\ \theta \in \Omega\}$ be a family of densities indexed by a parameter $\theta$. The parameter $\theta$ is defined to be a location parameter if and only if the density $f(x;\theta)$ can be written as a function of $(x - \theta)$; that is, $f(x;\theta) = h(x - \theta)$ for some function $h(\cdot)$. Equivalently, $\theta$ is a location parameter for the density $f_X(x;\theta)$ of a random variable $X$ if and only if the distribution of $(X - \theta)$ does not depend on $\theta$.
We note that if $\theta$ is a location parameter for the family of densities $\{f(\cdot\,;\theta);\ \theta \in \Omega\}$, then the function $h(\cdot)$ of the definition is a density function given by $h(\cdot) = f(\cdot\,;0)$.
Example: If $f(x;\theta) = \phi_{\theta,1}(x)$, then show that $\theta$ is a location parameter.
Solution:
Here, we have
$$f(x;\theta) = \phi_{\theta,1}(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(x-\theta)^2} = \phi_{0,1}(x - \theta) = h(x - \theta)$$
Or, we can say that if $X$ is distributed normally with mean $\theta$ and variance 1, then $(X - \theta)$ has a standard normal distribution. Hence, the distribution of $(X - \theta)$ is independent of $\theta$.
So, we can say that $\theta$ is a location parameter.

Example: If $f(x;\theta) = I_{\left(\theta - \frac{1}{2},\ \theta + \frac{1}{2}\right)}(x)$, then show that $\theta$ is a location parameter.
Solution:
Here, we have
$$f(x;\theta) = I_{\left(\theta - \frac{1}{2},\ \theta + \frac{1}{2}\right)}(x) = I_{\left(-\frac{1}{2},\ \frac{1}{2}\right)}(x - \theta) = h(x - \theta)$$
where the height of the density is $\dfrac{1}{\left(\theta + \frac{1}{2}\right) - \left(\theta - \frac{1}{2}\right)} = 1$.
Hence, the distribution of $(X - \theta)$ is independent of $\theta$. So, we can say that $\theta$ is a location parameter.
Example: If $f(x;\theta) = \dfrac{1}{\pi\left[1 + (x-\theta)^2\right]}$, then show that $\theta$ is a location parameter.
Solution:
Here, we have
$$f(x;\theta) = \frac{1}{\pi\left[1 + (x-\theta)^2\right]} = h(x - \theta)$$
Hence, the distribution of $(X - \theta)$ is independent of $\theta$. So, we can say that $\theta$ is a location parameter.
Example: If $f(x;\theta) = \phi_{\theta,9}(x)$, then show that $\theta$ is a location parameter.
Solution:
Here, we have
$$f(x;\theta) = \phi_{\theta,9}(x) = \frac{1}{3\sqrt{2\pi}}\,e^{-\frac{1}{2\times 9}(x-\theta)^2} = \phi_{0,9}(x - \theta) = h(x - \theta)$$
Or, we can say that if $X$ is distributed normally with mean $\theta$ and variance 9, then $(X - \theta)$ has a normal distribution with mean 0 and variance 9. Hence, the distribution of $(X - \theta)$ is independent of $\theta$.
So, we can say that $\theta$ is a location parameter.
Pitman estimator for location

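Let $X_1, X_2, \dots, X_n$ denote a random sample from the density $f(\cdot\,;\theta)$, where $\theta$ is a location parameter. Then the estimator
$$t(X_1, X_2, \dots, X_n) = \frac{\int\theta\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}$$
is the estimator of $\theta$ which has uniformly smallest mean-squared error within the class of location-invariant estimators.
The estimator given in the above equation is defined to be the Pitman estimator for location.
Example: Let $X_1, X_2, \dots, X_n$ be a random sample from a normal distribution with mean $\theta$ and variance unity, where $\theta$ is a location parameter. Find the Pitman estimator of $\theta$.
Solution:
We know that the Pitman estimator for $\theta$ is given by
$$t(X_1, X_2, \dots, X_n) = \frac{\int\theta\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int\prod_{i=1}^{n}f(X_i;\theta)\,d\theta} = \frac{\int\theta\left(\frac{1}{\sqrt{2\pi}}\right)^n\exp\left[-\frac{1}{2}\sum_{i=1}^{n}(X_i-\theta)^2\right]d\theta}{\int\left(\frac{1}{\sqrt{2\pi}}\right)^n\exp\left[-\frac{1}{2}\sum_{i=1}^{n}(X_i-\theta)^2\right]d\theta}$$
$$= \frac{\int\theta\exp\left[-\frac{1}{2}\left(\sum_{i=1}^{n}X_i^2 - 2\theta\sum_{i=1}^{n}X_i + n\theta^2\right)\right]d\theta}{\int\exp\left[-\frac{1}{2}\left(\sum_{i=1}^{n}X_i^2 - 2\theta\sum_{i=1}^{n}X_i + n\theta^2\right)\right]d\theta}$$
$$= \frac{\int\theta\exp\left[-\frac{1}{2}\left(-2n\bar X_n\theta + n\theta^2\right)\right]d\theta}{\int\exp\left[-\frac{1}{2}\left(-2n\bar X_n\theta + n\theta^2\right)\right]d\theta}$$
$$= \frac{\int\theta\exp\left[-\frac{n}{2}\left(\theta^2 - 2\bar X_n\theta + \bar X_n^2\right)\right]d\theta}{\int\exp\left[-\frac{n}{2}\left(\theta^2 - 2\bar X_n\theta + \bar X_n^2\right)\right]d\theta} = \frac{\int\theta\exp\left[-\frac{1}{2}\left(\frac{\theta - \bar X_n}{1/\sqrt n}\right)^2\right]d\theta}{\int\exp\left[-\frac{1}{2}\left(\frac{\theta - \bar X_n}{1/\sqrt n}\right)^2\right]d\theta}$$
$$= \frac{\int\theta\,\frac{1}{\frac{1}{\sqrt n}\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\theta - \bar X_n}{1/\sqrt n}\right)^2\right]d\theta}{\int\frac{1}{\frac{1}{\sqrt n}\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\theta - \bar X_n}{1/\sqrt n}\right)^2\right]d\theta}$$
The denominator is the integral of a $N\!\left(\bar X_n, \frac{1}{n}\right)$ density in $\theta$ and equals 1; the numerator is the mean of that density. Hence
$$\Rightarrow\quad t(X_1, X_2, \dots, X_n) = E(\theta) = \bar X_n$$
So, we can say that $\bar X_n$ is a Pitman estimator.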
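Example: Let $X_1, X_2, \dots, X_n$ be a random sample from a uniform distribution over the interval $\left(\theta - \frac{1}{2},\ \theta + \frac{1}{2}\right)$, where $\theta$ is a location parameter. Find the Pitman estimator for $\theta$.
Solution:
$$t(X_1, X_2, \dots, X_n) = \frac{\int\theta\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int\prod_{i=1}^{n}f(X_i;\theta)\,d\theta} = \frac{\int\theta\prod_{i=1}^{n}I_{\left(\theta - \frac{1}{2},\ \theta + \frac{1}{2}\right)}(X_i)\,d\theta}{\int\prod_{i=1}^{n}I_{\left(\theta - \frac{1}{2},\ \theta + \frac{1}{2}\right)}(X_i)\,d\theta} = \frac{\int\theta\prod_{i=1}^{n}I_{\left(X_i - \frac{1}{2},\ X_i + \frac{1}{2}\right)}(\theta)\,d\theta}{\int\prod_{i=1}^{n}I_{\left(X_i - \frac{1}{2},\ X_i + \frac{1}{2}\right)}(\theta)\,d\theta}$$
The product of indicators is 1 exactly when
$$\theta - \frac{1}{2} \le Y_1 \le Y_2 \le \dots \le Y_n \le \theta + \frac{1}{2}$$
that is, when $\theta - \frac{1}{2} \le Y_1 \Rightarrow \theta \le Y_1 + \frac{1}{2}$ and $Y_n \le \theta + \frac{1}{2} \Rightarrow \theta \ge Y_n - \frac{1}{2}$. Therefore
$$t(X_1, X_2, \dots, X_n) = \frac{\int_{Y_n - \frac{1}{2}}^{Y_1 + \frac{1}{2}}\theta\,d\theta}{\int_{Y_n - \frac{1}{2}}^{Y_1 + \frac{1}{2}}d\theta} = \frac{\left[\frac{\theta^2}{2}\right]_{Y_n - \frac{1}{2}}^{Y_1 + \frac{1}{2}}}{\left[\theta\right]_{Y_n - \frac{1}{2}}^{Y_1 + \frac{1}{2}}} = \frac{\frac{1}{2}\left[\left(Y_1 + \frac{1}{2}\right)^2 - \left(Y_n - \frac{1}{2}\right)^2\right]}{\left(Y_1 + \frac{1}{2}\right) - \left(Y_n - \frac{1}{2}\right)}$$
$$\Rightarrow\quad t(X_1, X_2, \dots, X_n) = \frac{1}{2}\left(Y_1 + Y_n\right)$$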
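The Pitman ratio $\int\theta\prod f(X_i;\theta)\,d\theta \big/ \int\prod f(X_i;\theta)\,d\theta$ can also be approximated numerically, which gives a check on the two closed forms (the sample mean for the unit-variance normal, the midrange for the unit-width uniform). The sketch below uses a plain trapezoidal rule; the function names, the integration limits and the data are illustrative assumptions.

```python
import math

def pitman_estimate(sample, h, lo, hi, steps=20000):
    """Trapezoidal approximation of the Pitman ratio, with f(x; theta) = h(x - theta)."""
    width = (hi - lo) / steps
    num = den = 0.0
    for k in range(steps + 1):
        theta = lo + k * width
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoid end-point weights
        like = math.prod(h(x - theta) for x in sample)
        num += weight * theta * like
        den += weight * like
    return num / den

def std_normal(z):
    """Density of N(0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def unit_uniform(z):
    """Density of the uniform distribution on (-1/2, 1/2)."""
    return 1.0 if -0.5 < z < 0.5 else 0.0

if __name__ == "__main__":
    data = [0.3, 1.1, 0.8, 0.5]
    print(pitman_estimate(data, std_normal, -10.0, 10.0), sum(data) / len(data))
    print(pitman_estimate(data, unit_uniform, -10.0, 10.0), (min(data) + max(data)) / 2)
```

The uniform case is only accurate to about one grid step because the likelihood is discontinuous in $\theta$; the normal case agrees with the sample mean far more closely.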
Theorem
A Pitman estimator for location is a function of sufficient statistics.
Proof:
We know that if $S_1 = s_1(X_1, X_2, \dots, X_n), \dots, S_k = s_k(X_1, X_2, \dots, X_n)$ is a set of sufficient statistics, then by the factorization criterion
$$\prod_{i=1}^{n}f(x_i;\theta) = g(s_1, s_2, \dots, s_k;\theta)\,h(x_1, x_2, \dots, x_n)$$
So, the Pitman estimator can be written as
$$t(X_1, X_2, \dots, X_n) = \frac{\int\theta\prod_{i=1}^{n}f(X_i;\theta)\,d\theta}{\int\prod_{i=1}^{n}f(X_i;\theta)\,d\theta} = \frac{\int\theta\,g(S_1, S_2, \dots, S_k;\theta)\,h(X_1, X_2, \dots, X_n)\,d\theta}{\int g(S_1, S_2, \dots, S_k;\theta)\,h(X_1, X_2, \dots, X_n)\,d\theta}$$
$$\Rightarrow\quad t(X_1, X_2, \dots, X_n) = \frac{\int\theta\,g(S_1, S_2, \dots, S_k;\theta)\,d\theta}{\int g(S_1, S_2, \dots, S_k;\theta)\,d\theta}$$
since $h$ does not involve $\theta$ and cancels. The above is a function of the sufficient statistics. So, we can say that a Pitman estimator is a function of the sufficient statistics.
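Example: Let $X_1,\ldots,X_n$ be a random sample from a normal distribution with mean $\theta$ and variance 9, where $\theta$ is a location parameter. Find the Pitman estimator of $\theta$ when $n = 15$ and $\sum_{i=1}^{15} x_i = 225$.
Solution:
\[
t(X_1,\ldots,X_n)
= \frac{\int \theta \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
= \frac{\int \theta \left(\frac{1}{3\sqrt{2\pi}}\right)^{n} \exp\left[-\frac12\sum_{i=1}^{n}\left(\frac{X_i-\theta}{3}\right)^2\right] d\theta}{\int \left(\frac{1}{3\sqrt{2\pi}}\right)^{n} \exp\left[-\frac12\sum_{i=1}^{n}\left(\frac{X_i-\theta}{3}\right)^2\right] d\theta}
\]
\[
= \frac{\int \theta \exp\left[-\frac{1}{2\times 9}\left(\sum_{i=1}^{n}X_i^2 - 2\theta\sum_{i=1}^{n}X_i + n\theta^2\right)\right] d\theta}{\int \exp\left[-\frac{1}{2\times 9}\left(\sum_{i=1}^{n}X_i^2 - 2\theta\sum_{i=1}^{n}X_i + n\theta^2\right)\right] d\theta}
= \frac{\int \theta \exp\left[-\frac{n}{2\times 9}\left(\theta^2 - 2\bar X_n\theta\right)\right] d\theta}{\int \exp\left[-\frac{n}{2\times 9}\left(\theta^2 - 2\bar X_n\theta\right)\right] d\theta}
\]
Multiplying numerator and denominator by $\exp\left[-\frac{n}{2\times 9}\bar X_n^2\right]$ completes the square, and multiplying both by $\frac{1}{(3/\sqrt n)\sqrt{2\pi}}$ normalizes the kernel:
\[
= \frac{\int \theta\, \frac{1}{(3/\sqrt n)\sqrt{2\pi}} \exp\left[-\frac12\left(\frac{\theta-\bar X_n}{3/\sqrt n}\right)^2\right] d\theta}{\int \frac{1}{(3/\sqrt n)\sqrt{2\pi}} \exp\left[-\frac12\left(\frac{\theta-\bar X_n}{3/\sqrt n}\right)^2\right] d\theta}
\]
The denominator is the integral of a $N(\bar X_n, 9/n)$ density and equals 1; the numerator is the mean of that density. Hence
\[
t(X_1,\ldots,X_n) = E(\theta) = \bar X_n = \frac1n\sum_{i=1}^{n} X_i .
\]
So $\bar X_n$ is the Pitman estimator. When $\sum_{i=1}^{15} x_i = 225$, the Pitman estimate of $\theta$ is
\[
\bar x_n = \frac{1}{15}\sum_{i=1}^{15} x_i = \frac{225}{15} = 15 .
\]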
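The claim that the Pitman estimator for the normal location family is the sample mean can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names, the data, and the integration interval are hypothetical, and the ratio of integrals is approximated with the trapezoidal rule.

```python
import math

def pitman_location(xs, log_density, lo, hi, steps=20000):
    """Trapezoidal approximation of  int t*L(t) dt / int L(t) dt  on [lo, hi],
    where L(t) = prod_i f(x_i - t) and log_density(z) = log f(z)."""
    h = (hi - lo) / steps
    thetas = [lo + i * h for i in range(steps + 1)]
    logs = [sum(log_density(x - t) for x in xs) for t in thetas]
    peak = max(logs)  # rescale by the peak so the weights stay in a safe range
    num = den = 0.0
    for i, (t, ll) in enumerate(zip(thetas, logs)):
        w = math.exp(ll - peak) * (0.5 if i in (0, steps) else 1.0)
        num += t * w
        den += w
    return num / den

def normal_log_density(z, sigma=3.0):
    """Log of the N(0, sigma^2) density; sigma = 3 matches variance 9."""
    return -0.5 * (z / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

xs = [12.0, 13.5, 14.0, 15.0, 16.5, 17.0, 18.0]  # hypothetical data
estimate = pitman_location(xs, normal_log_density, lo=-20.0, hi=50.0)
sample_mean = sum(xs) / len(xs)
print(estimate, sample_mean)
```

The two printed numbers should agree to many decimal places, provided the interval [lo, hi] covers the region where the likelihood is non-negligible.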
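Exercise: Let $X_1,\ldots,X_n$ be a random sample from the density
\[
f(x;\theta) = e^{-(x-\theta)}\,I_{(\theta,\infty)}(x), \qquad -\infty<\theta<\infty .
\]
Find the Pitman estimator for the location parameter $\theta$.
Solution:
The product $\prod_{i=1}^{n} I_{(\theta,\infty)}(X_i)$ equals 1 exactly when $\theta < Y_1$, where $Y_1$ is the smallest order statistic. Therefore
\[
t(X_1,\ldots,X_n)
= \frac{\int \theta \prod_{i=1}^{n} \exp\left[-(X_i-\theta)\right] I_{(\theta,\infty)}(X_i)\,d\theta}{\int \prod_{i=1}^{n} \exp\left[-(X_i-\theta)\right] I_{(\theta,\infty)}(X_i)\,d\theta}
= \frac{\int_{-\infty}^{Y_1} \theta \exp\left[-\sum_{i=1}^{n}X_i + n\theta\right] d\theta}{\int_{-\infty}^{Y_1} \exp\left[-\sum_{i=1}^{n}X_i + n\theta\right] d\theta}
= \frac{\int_{-\infty}^{Y_1} \theta e^{n\theta}\,d\theta}{\int_{-\infty}^{Y_1} e^{n\theta}\,d\theta}
\]
Integrating the numerator by parts,
\[
= \frac{\left[\theta\,\frac{e^{n\theta}}{n}\right]_{-\infty}^{Y_1} - \int_{-\infty}^{Y_1}\frac{e^{n\theta}}{n}\,d\theta}{\left[\frac{e^{n\theta}}{n}\right]_{-\infty}^{Y_1}}
= \frac{\frac{Y_1 e^{nY_1}}{n} - \frac{1}{n^2}e^{nY_1}}{\frac{1}{n}e^{nY_1}}
\]
\[
\therefore\; t(X_1,\ldots,X_n) = Y_1 - \frac1n .
\]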
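The closed form $Y_1 - 1/n$ can be compared with a direct numerical evaluation of the ratio of integrals. This is only an illustrative sketch: the function name and the data are hypothetical, and the infinite lower limit is truncated where the integrand is negligible.

```python
import math

def pitman_shifted_exponential(xs, steps=100000):
    """Midpoint-rule approximation of the Pitman location estimate for
    f(x; theta) = exp(-(x - theta)) on x > theta."""
    n, y1 = len(xs), min(xs)
    width = 60.0 / n          # the weight exp(-n*u) is below e^-60 past this point
    h = width / steps
    num = den = 0.0
    for i in range(steps):
        u = (i + 0.5) * h     # u = y1 - theta, so theta runs over (y1 - width, y1)
        w = math.exp(-n * u)  # likelihood up to a factor that is free of theta
        num += (y1 - u) * w
        den += w
    return num / den

xs = [2.4, 3.1, 2.9, 5.0, 2.2]  # hypothetical data
numeric = pitman_shifted_exponential(xs)
closed_form = min(xs) - 1.0 / len(xs)
print(numeric, closed_form)
```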

Exercise: Let $X_1,\ldots,X_n$ be a random sample from the density
\[
f(x;\theta) = \theta^{x}(1-\theta)^{1-x} \qquad \text{for } x = 0, 1 \text{ and } 0<\theta<1 .
\]
Solution:
\[
t(X_1,\ldots,X_n)
= \frac{\int \theta \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
= \frac{\int_0^1 \theta\,\theta^{\sum X_i}(1-\theta)^{n-\sum X_i}\,d\theta}{\int_0^1 \theta^{\sum X_i}(1-\theta)^{n-\sum X_i}\,d\theta}
\]
\[
= \frac{\int_0^1 \theta^{\sum X_i+2-1}(1-\theta)^{n-\sum X_i+1-1}\,d\theta}{\int_0^1 \theta^{\sum X_i+1-1}(1-\theta)^{n-\sum X_i+1-1}\,d\theta}
= \frac{\beta\left(\sum X_i+2,\; n-\sum X_i+1\right)}{\beta\left(\sum X_i+1,\; n-\sum X_i+1\right)}
\]
\[
= \frac{\Gamma\left(\sum X_i+2\right)\Gamma\left(n-\sum X_i+1\right)}{\Gamma(n+3)} \times \frac{\Gamma(n+2)}{\Gamma\left(\sum X_i+1\right)\Gamma\left(n-\sum X_i+1\right)}
\]
\[
\therefore\; t(X_1,\ldots,X_n) = \frac{\sum_{i=1}^{n} X_i + 1}{n+2},
\]
where every sum runs over $i = 1,\ldots,n$.

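Exercise: Let $X_1,\ldots,X_n$ be a random sample from the density
\[
f(x;\theta) = \theta e^{-\theta x} \qquad \text{for } x>0 \text{ and } \theta>0 .
\]

Solution:
\[
t(X_1,\ldots,X_n)
= \frac{\int \theta \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
= \frac{\int_0^\infty \theta \prod_{i=1}^{n} \theta e^{-\theta X_i}\,d\theta}{\int_0^\infty \prod_{i=1}^{n} \theta e^{-\theta X_i}\,d\theta}
= \frac{\int_0^\infty \theta^{n+1} e^{-\theta\sum X_i}\,d\theta}{\int_0^\infty \theta^{n} e^{-\theta\sum X_i}\,d\theta}
\]
\[
= \frac{\int_0^\infty e^{-\theta\sum X_i}\,\theta^{n+2-1}\,d\theta}{\int_0^\infty e^{-\theta\sum X_i}\,\theta^{n+1-1}\,d\theta}
= \frac{\Gamma(n+2)\big/\left(\sum X_i\right)^{n+2}}{\Gamma(n+1)\big/\left(\sum X_i\right)^{n+1}}
\]
\[
\therefore\; t(X_1,\ldots,X_n) = \frac{n+1}{\sum_{i=1}^{n} X_i}.
\]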

Pitman Estimator for Scale Parameter
Scale invariant
An estimator $T = t(X_1,\ldots,X_n)$ is defined to be scale invariant if and only if
\[
t(cx_1, cx_2, \ldots, cx_n) = c\,t(x_1, x_2, \ldots, x_n)
\]
for all values $x_1,\ldots,x_n$ and all $c>0$.
Example: Show that $\bar X_n = \frac1n\sum_{i=1}^{n} X_i$ is scale invariant.
Solution:
Let $t(x_1,\ldots,x_n) = \bar x_n = \frac1n\sum_{i=1}^{n} x_i$. Then we have that
\[
t(cx_1,\ldots,cx_n) = \frac1n\sum_{i=1}^{n} c\,x_i = c\,\frac1n\sum_{i=1}^{n} x_i
\;\Rightarrow\; t(cx_1,\ldots,cx_n) = c\,t(x_1,\ldots,x_n).
\]
So, we can say that $\bar X_n$ is scale invariant.
Example: Show that $\frac{Y_1+Y_n}{2}$ is scale invariant, where $Y_1$ is the smallest order statistic and $Y_n$ is the largest order statistic.
Solution:
Let
\[
t(x_1,\ldots,x_n) = \frac{y_1+y_n}{2} = \frac{\min(x_1,\ldots,x_n)+\max(x_1,\ldots,x_n)}{2}.
\]
Since $c>0$, multiplying by $c$ preserves the ordering of the observations, so
\[
t(cx_1,\ldots,cx_n) = \frac{\min(cx_1,\ldots,cx_n)+\max(cx_1,\ldots,cx_n)}{2}
= c\,\frac{\min(x_1,\ldots,x_n)+\max(x_1,\ldots,x_n)}{2}
\;\Rightarrow\; t(cx_1,\ldots,cx_n) = c\,t(x_1,\ldots,x_n).
\]
So, we can say that $\frac{Y_1+Y_n}{2}$ is scale invariant.
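Example: Show that $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\bar X_n\right)^2$ is not scale invariant, but that $s = \sqrt{s^2}$ is.
Solution:
Let $t(x_1,\ldots,x_n) = s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar x_n\right)^2$. Then we have that
\[
t(cx_1,\ldots,cx_n) = \frac{1}{n-1}\sum_{i=1}^{n}\left(cx_i - \frac1n\sum_{j=1}^{n} cx_j\right)^2
= c^2\,\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar x_n\right)^2
\;\Rightarrow\; t(cx_1,\ldots,cx_n) = c^2\,t(x_1,\ldots,x_n),
\]
which differs from $c\,t(x_1,\ldots,x_n)$ whenever $c \ne 1$ and $t(x_1,\ldots,x_n) > 0$. So $s^2$ is not scale invariant. Taking the positive square root gives $\sqrt{t(cx_1,\ldots,cx_n)} = c\sqrt{t(x_1,\ldots,x_n)}$ for all $c>0$, so the sample standard deviation $s$ is scale invariant.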
Example: Show that $Y_n - Y_1$ is scale invariant.

Solution:
Let $t(x_1,\ldots,x_n) = y_n - y_1 = \max(x_1,\ldots,x_n) - \min(x_1,\ldots,x_n)$.
Then, for $c>0$, we have that
\[
t(cx_1,\ldots,cx_n) = \max(cx_1,\ldots,cx_n) - \min(cx_1,\ldots,cx_n)
= c\left[\max(x_1,\ldots,x_n) - \min(x_1,\ldots,x_n)\right]
\;\Rightarrow\; t(cx_1,\ldots,cx_n) = c\,t(x_1,\ldots,x_n).
\]
So, we can say that $Y_n - Y_1$ is scale invariant.
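The defining identity $t(cx_1,\ldots,cx_n) = c\,t(x_1,\ldots,x_n)$ is easy to probe numerically. The sketch below uses a hypothetical helper `is_scale_invariant` and made-up data; passing the check on one sample is only evidence, whereas failing it is a genuine counterexample. The sample variance fails because it scales by $c^2$ rather than $c$.

```python
import statistics

def is_scale_invariant(stat, xs, factors=(0.5, 2.0, 10.0), rel_tol=1e-9):
    """Check t(c*x) == c*t(x) on one sample for a few positive factors c."""
    base = stat(xs)
    for c in factors:
        scaled = stat([c * x for x in xs])
        if abs(scaled - c * base) > rel_tol * max(1.0, abs(c * base)):
            return False
    return True

def midrange(xs):
    return (min(xs) + max(xs)) / 2.0

def sample_range(xs):
    return max(xs) - min(xs)

xs = [2.0, 3.5, 5.0, 7.5, 11.0]  # hypothetical sample
for name, stat in [("mean", statistics.fmean), ("midrange", midrange),
                   ("range", sample_range), ("std dev", statistics.stdev),
                   ("variance", statistics.variance)]:
    print(name, is_scale_invariant(stat, xs))
```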
Scale Parameter
Let $\{f(\cdot\,;\theta);\ \theta>0\}$ be a family of densities indexed by a real parameter $\theta$. The parameter $\theta$ is defined to be a scale parameter if and only if the density $f(x;\theta)$ can be written as
\[
f(x;\theta) = \frac{1}{\theta}\,h\!\left(\frac{x}{\theta}\right)
\]
for some function $h(\cdot)$. Equivalently, $\theta$ is a scale parameter for the density $f_X(x;\theta)$ of a random variable $X$ if and only if the distribution of $X/\theta$ does not depend on $\theta$.
We note that if $\theta$ is a scale parameter for the family of densities $\{f(\cdot\,;\theta);\ \theta>0\}$, then the function $h(\cdot)$ of the definition is a density function given by $h(x) = f(x;1)$.
Example: If $f(x;\theta) = \phi_{0,\sigma^2}(x)$ with $\theta = \sigma$, then show that $\theta$ is a scale parameter.
Solution:
Here, we have that
\[
f(x;\theta) = \phi_{0,\sigma^2}(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac12\left(\frac{x}{\sigma}\right)^2}
= \frac{1}{\sigma}\,\phi_{0,1}\!\left(\frac{x}{\sigma}\right) = \frac{1}{\sigma}\,h\!\left(\frac{x}{\sigma}\right).
\]
Or, we can say that if $X$ is distributed normally with mean 0 and variance $\sigma^2$, then $X/\sigma$ has a standard normal distribution. Hence, the distribution of $X/\theta$ does not depend on $\theta$.
So, we can say that $\theta = \sigma$ is a scale parameter.

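Example: If $f(x;\theta) = \frac{1}{\theta}\,I_{(\theta,\,2\theta)}(x)$, then show that $\theta$ is a scale parameter.
Solution:
Here, we have that
\[
f(x;\theta) = \frac{1}{\theta}\,I_{(\theta,\,2\theta)}(x) = \frac{1}{\theta}\,I_{(1,\,2)}\!\left(\frac{x}{\theta}\right) = \frac{1}{\theta}\,h\!\left(\frac{x}{\theta}\right), \qquad h(z) = I_{(1,\,2)}(z).
\]
Hence, the distribution of $X/\theta$ does not depend on $\theta$. So, we can say that $\theta$ is a scale parameter.
Example: If $f(x;\theta) = \frac{1}{\theta}\,e^{-x/\theta}\,I_{(0,\infty)}(x)$, then show that $\theta$ is a scale parameter.
Solution:
Here, we have that
\[
f(x;\theta) = \frac{1}{\theta}\,e^{-x/\theta}\,I_{(0,\infty)}(x) = \frac{1}{\theta}\,h\!\left(\frac{x}{\theta}\right), \qquad h(z) = e^{-z}\,I_{(0,\infty)}(z).
\]
Hence, the distribution of $X/\theta$ does not depend on $\theta$. So, we can say that $\theta$ is a scale parameter.
Example: If $f(x;\theta) = \frac{1}{\theta}\,I_{(0,\,\theta)}(x)$, then show that $\theta$ is a scale parameter.
Solution:
Here, we have that
\[
f(x;\theta) = \frac{1}{\theta}\,I_{(0,\,\theta)}(x) = \frac{1}{\theta}\,I_{(0,\,1)}\!\left(\frac{x}{\theta}\right) = \frac{1}{\theta}\,h\!\left(\frac{x}{\theta}\right), \qquad h(z) = I_{(0,\,1)}(z).
\]
Hence, the distribution of $X/\theta$ does not depend on $\theta$. So, we can say that $\theta$ is a scale parameter.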
Pitman estimator for Scale

Let $X_1,\ldots,X_n$ denote a random sample from the density $f(\cdot\,;\theta)$, where $\theta>0$ is a scale parameter. Assume that $f(x;\theta) = 0$ for $x\le 0$; that is, the random variables $X_i$ assume only positive values. Then the estimator
\[
t(X_1,\ldots,X_n) = \frac{\int \left(\frac{1}{\theta^2}\right)\prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \left(\frac{1}{\theta^3}\right)\prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
\]
is the estimator of $\theta$ which has uniformly smallest risk within the class of scale-invariant estimators for the loss function
\[
l(t;\theta) = \frac{(t-\theta)^2}{\theta^2}.
\]
The estimator given in the above equation is defined to be the Pitman estimator for scale.

Example: Let $X_1,\ldots,X_n$ be a random sample from the density
\[
f(x;\theta) = \frac{1}{\theta}\,I_{(0,\,\theta)}(x).
\]
Find the Pitman estimator of the scale parameter $\theta$.
Solution:
The product $\prod_{i=1}^{n} I_{(0,\theta)}(X_i)$ equals 1 exactly when $\theta > Y_n$, the largest order statistic. Hence
\[
t(X_1,\ldots,X_n)
= \frac{\int \left(\frac{1}{\theta^2}\right)\prod_{i=1}^{n}\frac{1}{\theta}I_{(0,\theta)}(X_i)\,d\theta}{\int \left(\frac{1}{\theta^3}\right)\prod_{i=1}^{n}\frac{1}{\theta}I_{(0,\theta)}(X_i)\,d\theta}
= \frac{\int_{Y_n}^{\infty}\frac{1}{\theta^2}\left(\frac{1}{\theta}\right)^{n} d\theta}{\int_{Y_n}^{\infty}\frac{1}{\theta^3}\left(\frac{1}{\theta}\right)^{n} d\theta}
= \frac{\int_{Y_n}^{\infty}\theta^{-(n+2)}\,d\theta}{\int_{Y_n}^{\infty}\theta^{-(n+3)}\,d\theta}
\]
\[
= \frac{\left[\frac{\theta^{-(n+2)+1}}{-(n+2)+1}\right]_{Y_n}^{\infty}}{\left[\frac{\theta^{-(n+3)+1}}{-(n+3)+1}\right]_{Y_n}^{\infty}}
= \frac{\frac{1}{-(n+1)}\left(0 - Y_n^{-n-1}\right)}{\frac{1}{-(n+2)}\left(0 - Y_n^{-n-2}\right)}
\]
\[
\therefore\; t(X_1,\ldots,X_n) = \frac{n+2}{n+1}\,Y_n .
\]
Note: We know that $Y_n$ is a complete sufficient statistic and $E(Y_n) = \frac{n}{n+1}\theta$. So, by the Lehmann-Scheffé theorem, $\frac{n+1}{n}Y_n$ is the UMVUE of $\theta$.
Example: Let $X_1,\ldots,X_n$ be a random sample from the density
\[
f(x;\theta) = \frac{1}{\theta}\,e^{-x/\theta}\,I_{(0,\infty)}(x).
\]
Find the Pitman estimator of the scale parameter $\theta$.
Solution:
We know that the Pitman estimator of $\theta$ for scale is given by
\[
t(X_1,\ldots,X_n)
= \frac{\int \left(\frac{1}{\theta^2}\right)\prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \left(\frac{1}{\theta^3}\right)\prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
= \frac{\int \left(\frac{1}{\theta^2}\right)\prod_{i=1}^{n}\frac{1}{\theta}e^{-X_i/\theta}I_{(0,\infty)}(X_i)\,d\theta}{\int \left(\frac{1}{\theta^3}\right)\prod_{i=1}^{n}\frac{1}{\theta}e^{-X_i/\theta}I_{(0,\infty)}(X_i)\,d\theta}
= \frac{\int_0^\infty \theta^{-(n+2)}\exp\left(-\frac{\sum X_i}{\theta}\right)d\theta}{\int_0^\infty \theta^{-(n+3)}\exp\left(-\frac{\sum X_i}{\theta}\right)d\theta}.
\]
Let $Z = \frac{\sum X_i}{\theta}$, so that $\theta = \frac{\sum X_i}{Z}$ and $d\theta = -\frac{\sum X_i}{Z^2}\,dZ$. The minus sign is absorbed by reversing the limits of integration, which gives
\[
t(X_1,\ldots,X_n)
= \frac{\int_0^\infty \left(\frac{\sum X_i}{Z}\right)^{-(n+2)} e^{-Z}\left(\frac{\sum X_i}{Z^2}\right)dZ}{\int_0^\infty \left(\frac{\sum X_i}{Z}\right)^{-(n+3)} e^{-Z}\left(\frac{\sum X_i}{Z^2}\right)dZ}
= \left(\sum_{i=1}^{n} X_i\right)\frac{\int_0^\infty Z^{n}e^{-Z}\,dZ}{\int_0^\infty Z^{n+1}e^{-Z}\,dZ}
\]
\[
= \left(\sum_{i=1}^{n} X_i\right)\frac{\int_0^\infty e^{-Z}Z^{n+1-1}\,dZ}{\int_0^\infty e^{-Z}Z^{n+2-1}\,dZ}
= \left(\sum_{i=1}^{n} X_i\right)\frac{\Gamma(n+1)}{\Gamma(n+2)}
\]
\[
\therefore\; t(X_1,\ldots,X_n) = \frac{\sum_{i=1}^{n} X_i}{n+1}.
\]
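Both scale examples can be checked numerically. Substituting $u = 1/\theta$ turns the Pitman scale estimator into $\int L\,du \big/ \int u\,L\,du$, where $L$ is the likelihood written as a function of $u$. The sketch below relies on that substitution; the function name, the data, and the integration limits are hypothetical choices, and the midpoint rule is only one of many ways to approximate the integrals.

```python
import math

def pitman_scale(likelihood_u, upper, steps=200000):
    """Midpoint-rule approximation of int_0^upper L du / int_0^upper u*L du,
    where u = 1/theta and likelihood_u(u) is the likelihood at theta = 1/u."""
    h = upper / steps
    num = den = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        w = likelihood_u(u)
        num += w
        den += u * w
    return num / den

xs = [0.8, 1.1, 2.3, 2.9, 4.9]  # hypothetical positive data
n, y_max, total = len(xs), max(xs), sum(xs)

# Uniform(0, theta): L = theta^-n for theta > y_max, i.e. u^n for u < 1/y_max.
uniform_est = pitman_scale(lambda u: u ** n, upper=1.0 / y_max)
# Exponential with mean theta: L = theta^-n * exp(-total/theta) = u^n * exp(-total*u).
expo_est = pitman_scale(lambda u: u ** n * math.exp(-total * u), upper=10.0)

print(uniform_est, (n + 2) / (n + 1) * y_max)
print(expo_est, total / (n + 1))
```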
Note: It can be shown that the UMVUE of $\theta$ is $\frac{1}{n}\sum_{i=1}^{n} X_i$. Note also that $\frac{1}{n}\sum_{i=1}^{n} X_i$ is a scale-invariant estimator. Since $\frac{1}{n+1}\sum_{i=1}^{n} X_i$ is the scale-invariant estimator having uniformly smallest risk for the loss function $l(t;\theta) = \frac{(t-\theta)^2}{\theta^2}$, the risk of $\frac{1}{n+1}\sum_{i=1}^{n} X_i$ is uniformly smaller than the risk of $\frac{1}{n}\sum_{i=1}^{n} X_i$. Also, since here the risk equals $\frac{1}{\theta^2}$ times the MSE, the MSE of $\frac{1}{n+1}\sum_{i=1}^{n} X_i$ is uniformly smaller than the MSE of $\frac{1}{n}\sum_{i=1}^{n} X_i$.
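The risk comparison in the note can be made explicit. For $S = \sum X_i$ with exponential observations of mean $\theta$, the standard moments $E[S] = n\theta$ and $E[S^2] = n(n+1)\theta^2$ give the risk of $cS$ under the loss $(t-\theta)^2/\theta^2$ as $c^2 n(n+1) - 2cn + 1$, which does not depend on $\theta$. The following sketch (the function name is hypothetical) evaluates this exactly with rational arithmetic.

```python
from fractions import Fraction

def scaled_risk(c, n):
    """Risk E[(c*S - theta)^2] / theta^2 for S = sum of n exponential(mean theta)
    observations, using E[S] = n*theta and E[S^2] = n*(n+1)*theta^2."""
    c = Fraction(c)
    return c * c * n * (n + 1) - 2 * c * n + 1

for n in (2, 5, 20):
    pitman = scaled_risk(Fraction(1, n + 1), n)  # estimator sum / (n + 1)
    umvue = scaled_risk(Fraction(1, n), n)       # estimator sum / n
    print(n, pitman, umvue)
```

For every $n$ the first risk works out to $1/(n+1)$ and the second to $1/n$, in agreement with the note.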

Bayes and Minimax Estimation
Decision Function
A decision function δ ( x ) is a statistic that takes values in D , that is, δ is a Borel measurable function that maps
R n into D .
Elements of Decision Function

The elements of decision are
• Choices available or alternatives or options.
• States of nature.
• Payoffs
Prior Distribution
Let f (θ ) be the probability distribution of the parameter θ which is also summarizes the objective information
about θ prior to obtaining sample observation. We will choose f (θ ) with sampler variance, so that f (θ ) is the
prior distribution of θ .
Posterior Distribution
Consider a random variable $X$ whose distribution is denoted by $f(x\mid\theta)$. This distribution depends on $\theta$, where $\theta$ is an unknown parameter.
Let $x_1, x_2, \ldots, x_n$ be a random sample; then the joint distribution can be written as
\[
f(x_1, x_2, \ldots, x_n\mid\theta) = f(x_1\mid\theta)\cdots f(x_n\mid\theta).
\]
The posterior distribution of $\theta$ is defined as the conditional distribution of $\theta$ given the sample values. So,
\[
f(\theta\mid x_1,\ldots,x_n) = \frac{f(x_1,\ldots,x_n,\theta)}{f(x_1,\ldots,x_n)} = \frac{f(x_1,\ldots,x_n\mid\theta)\,f(\theta)}{f(x_1,\ldots,x_n)},
\]
where $f(x_1,\ldots,x_n,\theta) = f(\theta)\,f(x_1,\ldots,x_n\mid\theta)$ is the joint distribution of the sample and $\theta$.
Thus $f(\theta\mid x_1,\ldots,x_n)$ is known as the posterior distribution of $\theta$.
Example: The time to failure of a transistor is known to be exponentially distributed with parameter $\theta$, having the density function
\[
f(x\mid\theta) = \theta e^{-\theta x}; \qquad x>0 .
\]
Assume that the prior distribution of $\Theta$ is given by
\[
g_\Theta(\theta) = k e^{-k\theta}; \qquad \theta>0 .
\]
That is, $\Theta$ is also exponentially distributed over the interval $(0,\infty)$. Find the posterior distribution of $\Theta$.

Solution:
We know that the posterior distribution of $\Theta$ is given by
\[
f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)
= \frac{f_{X_1,\ldots,X_n,\Theta}(x_1,\ldots,x_n,\theta)}{f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}
= \frac{\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}
\]
\[
= \frac{\theta^{n}e^{-\theta\sum x_i}\,k e^{-k\theta}}{\int_0^\infty \theta^{n}e^{-\theta\sum x_i}\,k e^{-k\theta}\,d\theta}
= \frac{\theta^{n}e^{-\theta\left(\sum x_i+k\right)}}{\int_0^\infty e^{-\left(\sum x_i+k\right)\theta}\,\theta^{n+1-1}\,d\theta}
= \frac{\theta^{n}e^{-\theta\left(\sum x_i+k\right)}}{\Gamma(n+1)\big/\left(\sum x_i+k\right)^{n+1}}
\]
\[
= \frac{\left(\sum x_i+k\right)^{n+1}}{\Gamma(n+1)}\,e^{-\left(\sum x_i+k\right)\theta}\,\theta^{n+1-1}
\]
\[
\Rightarrow\; f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n) = \text{Gamma}\left(n+1,\ \sum_{i=1}^{n}x_i+k\right); \qquad \theta\ge 0 .
\]
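The update from prior to posterior in this example amounts to simple bookkeeping on the gamma parameters. A minimal sketch, with a hypothetical function name and made-up failure times, reading the second gamma parameter as a rate:

```python
def exponential_posterior(xs, k):
    """Gamma(shape, rate) posterior for exponential data with rate theta
    and an exponential prior with rate k."""
    shape = len(xs) + 1
    rate = sum(xs) + k
    return shape, rate

failure_times = [1.0, 2.0, 3.0]  # hypothetical observations
shape, rate = exponential_posterior(failure_times, k=2.0)
posterior_mean = shape / rate    # mean of a Gamma(shape, rate) distribution
print(shape, rate, posterior_mean)
```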
Example: Let $X_1,\ldots,X_n$ denote a random sample from the normal distribution with density
\[
f(x\mid\theta) = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac12(x-\theta)^2\right]; \qquad -\infty<x<\infty .
\]
Assume that the prior distribution of $\Theta$ is given by
\[
g_\Theta(\theta) = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac12\theta^2\right]; \qquad -\infty<\theta<\infty .
\]
That is, $\Theta$ is standard normal. Find the posterior distribution of $\Theta$.
Solution:
\[
f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)
= \frac{\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}
= \frac{\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\left[-\frac12\sum_{i=1}^{n}(x_i-\theta)^2\right]\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12\theta^2\right]}{\int_{-\infty}^{\infty}\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\left[-\frac12\sum_{i=1}^{n}(x_i-\theta)^2\right]\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12\theta^2\right]d\theta}
\]
\[
= \frac{\exp\left[-\frac12\left(\sum x_i^2 - 2\theta\sum x_i + n\theta^2 + \theta^2\right)\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac12\left(\sum x_i^2 - 2\theta\sum x_i + n\theta^2 + \theta^2\right)\right]d\theta}
= \frac{\exp\left[-\frac12\left((n+1)\theta^2 - 2\theta n\bar x\right)\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac12\left((n+1)\theta^2 - 2\theta n\bar x\right)\right]d\theta}
\]
\[
= \frac{\exp\left[-\frac{n+1}{2}\left(\theta^2 - 2\theta\frac{n}{n+1}\bar x + \left(\frac{n}{n+1}\bar x\right)^2\right)\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac{n+1}{2}\left(\theta^2 - 2\theta\frac{n}{n+1}\bar x + \left(\frac{n}{n+1}\bar x\right)^2\right)\right]d\theta}
= \frac{\frac{1}{\sqrt{2\pi}\sqrt{\frac{1}{n+1}}}\exp\left[-\frac12\left(\frac{\theta-\frac{n}{n+1}\bar x}{\sqrt{\frac{1}{n+1}}}\right)^2\right]}{\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sqrt{\frac{1}{n+1}}}\exp\left[-\frac12\left(\frac{\theta-\frac{n}{n+1}\bar x}{\sqrt{\frac{1}{n+1}}}\right)^2\right]d\theta}
\]
The denominator is the integral of a normal density and equals 1. Hence
\[
f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n) = N\left(\theta;\ \frac{n}{n+1}\bar x,\ \frac{1}{n+1}\right); \qquad -\infty<\theta<\infty .
\]
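Example:
Let $X_1,\ldots,X_n$ denote a random sample from the Poisson distribution with density
\[
f(x\mid\theta) = \frac{e^{-\theta}\theta^{x}}{x!}; \qquad x = 0, 1, 2, \ldots
\]
Assume that the prior distribution of $\Theta$ is given by
\[
g_\Theta(\theta) = \frac{\left(\frac{1}{\beta}\right)^{\alpha}}{\Gamma(\alpha)}\,e^{-\frac{1}{\beta}\theta}\,\theta^{\alpha-1}; \qquad \theta>0 .
\]
That is, $\Theta$ has a gamma distribution with parameters $\alpha$ and $\beta$. Find the posterior distribution of $\Theta$.

Solution:
\[
f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)
= \frac{\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}
= \frac{\frac{e^{-n\theta}\theta^{\sum x_i}}{\prod_{i=1}^{n}(x_i!)}\times\frac{\left(\frac{1}{\beta}\right)^{\alpha}}{\Gamma(\alpha)}e^{-\frac{1}{\beta}\theta}\theta^{\alpha-1}}{\int_0^\infty \frac{e^{-n\theta}\theta^{\sum x_i}}{\prod_{i=1}^{n}(x_i!)}\times\frac{\left(\frac{1}{\beta}\right)^{\alpha}}{\Gamma(\alpha)}e^{-\frac{1}{\beta}\theta}\theta^{\alpha-1}\,d\theta}
\]
\[
= \frac{e^{-\left(n+\frac{1}{\beta}\right)\theta}\,\theta^{\sum x_i+\alpha-1}}{\int_0^\infty e^{-\left(n+\frac{1}{\beta}\right)\theta}\,\theta^{\sum x_i+\alpha-1}\,d\theta}
= \frac{e^{-\left(n+\frac{1}{\beta}\right)\theta}\,\theta^{\sum x_i+\alpha-1}}{\Gamma\left(\sum x_i+\alpha\right)\Big/\left(n+\frac{1}{\beta}\right)^{\sum x_i+\alpha}}
= \frac{\left(n+\frac{1}{\beta}\right)^{\sum x_i+\alpha}}{\Gamma\left(\sum x_i+\alpha\right)}\,e^{-\left(n+\frac{1}{\beta}\right)\theta}\,\theta^{\sum x_i+\alpha-1}; \qquad \theta>0
\]
\[
\Rightarrow\; f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n) = \text{Gamma}\left(\sum_{i=1}^{n}x_i+\alpha,\ n+\frac{1}{\beta}\right).
\]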
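The Poisson example reduces to updating two gamma parameters. The sketch below is illustrative only: the function name and the counts are hypothetical, and the second posterior parameter is read as a rate, as in the derivation.

```python
def poisson_gamma_posterior(xs, alpha, beta):
    """Gamma(shape, rate) posterior for Poisson counts with a gamma prior
    whose density is proportional to exp(-theta/beta) * theta**(alpha - 1)."""
    shape = sum(xs) + alpha
    rate = len(xs) + 1.0 / beta
    return shape, rate

counts = [2, 3, 1, 4]  # hypothetical Poisson observations
shape, rate = poisson_gamma_posterior(counts, alpha=2.0, beta=0.5)
posterior_mean = shape / rate  # mean of a Gamma(shape, rate) distribution
print(shape, rate, posterior_mean)
```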
Posterior Bayes estimator
Let $X_1,\ldots,X_n$ be a random sample from a density $f(x\mid\theta)$, where $\theta$ is the value of a random variable $\Theta$ with known density $g_\Theta(\cdot)$. The posterior Bayes estimator of $\tau(\theta)$ with respect to the prior density $g_\Theta(\cdot)$ is defined to be
\[
E\left[\tau(\theta)\mid X_1, X_2, \ldots, X_n\right].
\]
Here,
\[
E\left[\tau(\theta)\mid X_1=x_1,\ldots,X_n=x_n\right]
= \int \tau(\theta)\,f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta
= \frac{\int \tau(\theta)\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}.
\]
One might note the similarity between the posterior Bayes estimator of $\tau(\theta) = \theta$ and the Pitman estimator of a location parameter.

Example:
Let $X_1,\ldots,X_n$ denote a random sample from the Bernoulli density
\[
f(x\mid\theta) = \theta^{x}(1-\theta)^{1-x}\,I_{\{0,1\}}(x) \qquad \text{for } 0\le\theta\le 1,
\]
and assume that the prior distribution of $\Theta$ is given by
\[
g_\Theta(\theta) = I_{(0,1)}(\theta).
\]
That is, $\Theta$ is uniformly distributed over the interval $(0,1)$. Find the posterior distribution of $\Theta$ and find the Bayes estimators of $\theta$ and $\theta(1-\theta)$.
Solution:
\[
f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)
= \frac{\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}
= \frac{\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,I_{(0,1)}(\theta)}{\int \theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,I_{(0,1)}(\theta)\,d\theta}
\]
\[
= \frac{\theta^{\sum x_i+1-1}(1-\theta)^{n-\sum x_i+1-1}}{\beta\left(\sum x_i+1,\ n-\sum x_i+1\right)}
\]
\[
\Rightarrow\; f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n) = \text{Beta}\left(\theta;\ \sum_{i=1}^{n}x_i+1,\ n-\sum_{i=1}^{n}x_i+1\right); \qquad 0\le\theta\le 1,
\]
a beta distribution of the first kind.
Again, the posterior Bayes estimator of $\theta$ with respect to the prior distribution $g_\Theta(\theta) = I_{(0,1)}(\theta)$ is given by
\[
E\left[\tau(\theta)\mid X_1=x_1,\ldots,X_n=x_n\right]
= \int \theta\,f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta
= \frac{\int \theta\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}
\]
\[
= \frac{\int_0^1 \theta^{\sum x_i+1}(1-\theta)^{n-\sum x_i}\,d\theta}{\int_0^1 \theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,d\theta}
= \frac{\beta\left(\sum x_i+2,\ n-\sum x_i+1\right)}{\beta\left(\sum x_i+1,\ n-\sum x_i+1\right)}
= \frac{\Gamma\left(\sum x_i+2\right)\Gamma(n+2)}{\Gamma(n+3)\,\Gamma\left(\sum x_i+1\right)}
\]
\[
\therefore\; E\left[\tau(\theta)\mid X_1=x_1,\ldots,X_n=x_n\right] = \frac{\sum_{i=1}^{n}x_i+1}{n+2}.
\]
Hence, the posterior Bayes estimator of $\theta$ with respect to the uniform prior distribution is $\frac{\sum_{i=1}^{n}x_i+1}{n+2}$.
Contrast this with the maximum likelihood estimator of $\theta$, which is $\frac{1}{n}\sum_{i=1}^{n}x_i$. We know that $\frac{1}{n}\sum_{i=1}^{n}X_i$ is unbiased and is the UMVUE, whereas the posterior Bayes estimator is not unbiased.
Again, the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the prior distribution $g_\Theta(\theta) = I_{(0,1)}(\theta)$ is given by
\[
E\left[\tau(\theta)\mid X_1=x_1,\ldots,X_n=x_n\right]
= \int \theta(1-\theta)\,f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta
= \frac{\int \theta(1-\theta)\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}
\]
\[
= \frac{\int_0^1 \theta^{\sum x_i+1}(1-\theta)^{n-\sum x_i+1}\,d\theta}{\int_0^1 \theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,d\theta}
= \frac{\beta\left(\sum x_i+2,\ n-\sum x_i+2\right)}{\beta\left(\sum x_i+1,\ n-\sum x_i+1\right)}
= \frac{\Gamma\left(\sum x_i+2\right)\Gamma\left(n-\sum x_i+2\right)\Gamma(n+2)}{\Gamma(n+4)\,\Gamma\left(\sum x_i+1\right)\Gamma\left(n-\sum x_i+1\right)}
\]
\[
\Rightarrow\; E\left[\tau(\theta)\mid X_1=x_1,\ldots,X_n=x_n\right] = \frac{\left(\sum_{i=1}^{n}x_i+1\right)\left(n-\sum_{i=1}^{n}x_i+1\right)}{(n+3)(n+2)}.
\]
Hence, the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the uniform prior distribution is $\frac{\left(\sum x_i+1\right)\left(n-\sum x_i+1\right)}{(n+3)(n+2)}$. We noted in the above example that the posterior Bayes estimator of $\theta$ that we obtained was not unbiased.
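The two Bayes estimates in the Bernoulli example follow directly from the moments of the Beta posterior, so they can be computed exactly. A small sketch, with a hypothetical function name and made-up data, using rational arithmetic:

```python
from fractions import Fraction

def bernoulli_uniform_bayes(xs):
    """Posterior Beta(a, b) parameters and the posterior means of theta and
    theta*(1 - theta) for Bernoulli data under a uniform(0, 1) prior."""
    n, s = len(xs), sum(xs)
    a, b = s + 1, n - s + 1
    theta_hat = Fraction(a, a + b)                       # (s + 1) / (n + 2)
    product_hat = Fraction(a * b, (a + b) * (a + b + 1)) # (s+1)(n-s+1) / ((n+2)(n+3))
    return (a, b), theta_hat, product_hat

xs = [1, 0, 1, 1, 0]  # hypothetical Bernoulli observations
params, theta_hat, product_hat = bernoulli_uniform_bayes(xs)
print(params, theta_hat, product_hat)
```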
The following remark states that in general a posterior Bayes estimator is not unbiased.
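Remark: Let $T_G^* = t_G^*(X_1,\ldots,X_n)$ denote the posterior Bayes estimator of $\tau(\theta)$ with respect to a prior distribution $G(\cdot)$. If both $T_G^*$ and $\tau(\Theta)$ have finite variance, then either
\[
\operatorname{Var}\left[T_G^*\mid\theta\right] = 0
\]
or $T_G^*$ is not an unbiased estimator of $\tau(\theta)$. That is, either $T_G^*$ estimates $\tau(\theta)$ correctly with probability 1, or $T_G^*$ is not an unbiased estimator.
Proof:
Let us suppose that $T_G^*$ is an unbiased estimator of $\tau(\theta)$. That is,
\[
E\left(T_G^*\mid\theta\right) = \tau(\theta).
\]
By definition, we have that
\[
T_G^* = t_G^*(X_1,\ldots,X_n) = E\left[\tau(\Theta)\mid X_1,\ldots,X_n\right].
\]
Now, we have that
\[
\operatorname{Var}\left(T_G^*\right) = E\left[\operatorname{Var}\left(T_G^*\mid\Theta\right)\right] + \operatorname{Var}\left[E\left(T_G^*\mid\Theta\right)\right]
= E\left[\operatorname{Var}\left(T_G^*\mid\Theta\right)\right] + \operatorname{Var}\left[\tau(\Theta)\right] \qquad (1)
\]
and
\[
\operatorname{Var}\left[\tau(\Theta)\right] = E\left[\operatorname{Var}\left[\tau(\Theta)\mid X_1,\ldots,X_n\right]\right] + \operatorname{Var}\left[E\left[\tau(\Theta)\mid X_1,\ldots,X_n\right]\right]
= E\left[\operatorname{Var}\left[\tau(\Theta)\mid X_1,\ldots,X_n\right]\right] + \operatorname{Var}\left[T_G^*\right]
\]
\[
\Rightarrow\; \operatorname{Var}\left[T_G^*\right] = \operatorname{Var}\left[\tau(\Theta)\right] - E\left[\operatorname{Var}\left[\tau(\Theta)\mid X_1,\ldots,X_n\right]\right] \qquad (2)
\]
Now, from equations (1) and (2) we have that
\[
E\left[\operatorname{Var}\left(T_G^*\mid\Theta\right)\right] + \operatorname{Var}\left[\tau(\Theta)\right] = \operatorname{Var}\left[\tau(\Theta)\right] - E\left[\operatorname{Var}\left[\tau(\Theta)\mid X_1,\ldots,X_n\right]\right]
\]
\[
\Rightarrow\; E\left[\operatorname{Var}\left(T_G^*\mid\Theta\right)\right] + E\left[\operatorname{Var}\left[\tau(\Theta)\mid X_1,\ldots,X_n\right]\right] = 0 .
\]
Since both $E\left[\operatorname{Var}\left(T_G^*\mid\Theta\right)\right]$ and $E\left[\operatorname{Var}\left[\tau(\Theta)\mid X_1,\ldots,X_n\right]\right]$ are non-negative and their sum is zero, both are zero.
In particular, $E\left[\operatorname{Var}\left(T_G^*\mid\Theta\right)\right] = 0$, and since $\operatorname{Var}\left(T_G^*\mid\Theta\right)$ is non-negative and has zero expectation, $\operatorname{Var}\left[T_G^*\mid\theta\right] = 0$ with probability 1.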
Loss Function
Consider estimating $g(\theta)$, and let $t = t(x_1,\ldots,x_n)$ denote an estimate of $g(\theta)$. The loss function, denoted by $l(t;\theta)$, is defined to be a real-valued function satisfying
1) $l(t;\theta) \ge 0$ for all possible estimates $t$ and all $\theta$ in $\Theta$.
2) $l(t;\theta) = 0$ for $t = g(\theta)$.
$l(t;\theta)$ equals the loss incurred if one estimates $g(\theta)$ to be $t$ when $\theta$ is the true parameter value.
The word 'loss' is used in place of 'error', and the loss function is used as the measure of the 'error'.

Several Possible Loss Functions
1) $l_1(t;\theta) = \left[t-g(\theta)\right]^2$. It is called the squared error loss function.
2) $l_2(t;\theta) = \left|t-g(\theta)\right|$. It is called the absolute error loss function.
3) $l_3(t;\theta) = \begin{cases} A & \text{if } \left|t-g(\theta)\right| > \varepsilon \\ 0 & \text{if } \left|t-g(\theta)\right| \le \varepsilon \end{cases}$, where $A>0$.
4) $l_4(t;\theta) = \rho(\theta)\left|t-g(\theta)\right|^{r}$ for $\rho(\theta)\ge 0$ and $r>0$.
Note that both $l_1$ and $l_2$ increase as the error $t-g(\theta)$ increases in magnitude. $l_3$ says that we lose nothing if the estimate $t$ is within $\varepsilon$ units of $g(\theta)$, and otherwise we lose the amount $A$. $l_4$ is a general loss function that includes both $l_1$ and $l_2$ as special cases (take $\rho(\theta) = 1$ with $r = 2$ or $r = 1$).
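The four loss functions translate directly into code. In the sketch below, the function names and argument names are hypothetical; `g` stands for the true value $g(\theta)$ and `rho` for the value of $\rho(\theta)$ at the true parameter.

```python
def squared_error(t, g):
    """l1: squared error loss."""
    return (t - g) ** 2

def absolute_error(t, g):
    """l2: absolute error loss."""
    return abs(t - g)

def threshold_loss(t, g, eps, amount):
    """l3: lose `amount` when the estimate is more than eps away, else nothing."""
    return amount if abs(t - g) > eps else 0.0

def weighted_power_loss(t, g, rho, r):
    """l4: rho * |t - g|^r, which contains l1 and l2 as special cases."""
    return rho * abs(t - g) ** r

t, g = 3.0, 1.0  # hypothetical estimate and true value
print(squared_error(t, g), absolute_error(t, g),
      threshold_loss(t, g, eps=0.5, amount=10.0),
      weighted_power_loss(t, g, rho=1.0, r=2))
```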
Risk Function
For a given loss function $l(\cdot\,;\cdot)$, the risk function, denoted by $R_t(\theta)$, of an estimator $T = t(X_1,\ldots,X_n)$ is defined to be
\[
R_t(\theta) = E_\theta\left[l(T;\theta)\right].
\]
The risk function is the average loss. The expectation in the above equation can be taken in two ways. For example, if the density $f(x;\theta)$ from which we sampled is a probability density function, then
\[
R(\theta, t) = E_\theta\left[l(T;\theta)\right] = E_\theta\left[l\left(t(X_1,\ldots,X_n);\theta\right)\right]
= \int\cdots\int l\left(t(x_1,\ldots,x_n);\theta\right)\prod_{i=1}^{n} f(x_i;\theta)\,dx_i .
\]
Or, we can consider the random variable $T$ with density $f_T(t)$; then
\[
R(\theta, t) = E_\theta\left[l(T;\theta)\right] = \int l(t;\theta)\,f_T(t)\,dt,
\]
where $f_T(t)$ is the density of the estimator $T$. In either case, the expectation averages out the values of $x_1,\ldots,x_n$. In the Bayesian setting $\theta$ is regarded as random, so the risk $R(\theta, t)$ is itself a random variable.
Possible Risk Functions
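1) Corresponding to the loss function $l_1(t;\theta) = \left[t-g(\theta)\right]^2$, the risk function is given by
$R_t(\theta) = E_\theta\left[l_1(T;\theta)\right] = E_\theta\left[T-g(\theta)\right]^2$. It is called the mean squared error.
2) Corresponding to the loss function $l_2(t;\theta) = \left|t-g(\theta)\right|$, the risk function is given by
$R_t(\theta) = E_\theta\left[l_2(T;\theta)\right] = E_\theta\left|T-g(\theta)\right|$. It is called the mean absolute error.
3) Corresponding to the loss function
\[
l_3(t;\theta) = \begin{cases} A & \text{if } \left|t-g(\theta)\right| > \varepsilon \\ 0 & \text{if } \left|t-g(\theta)\right| \le \varepsilon \end{cases}, \qquad A>0,
\]
the risk function is given by $R_t(\theta) = E_\theta\left[l_3(T;\theta)\right] = A\,P_\theta\left[\left|T-g(\theta)\right| > \varepsilon\right]$.
4) Corresponding to the loss function $l_4(t;\theta) = \rho(\theta)\left|t-g(\theta)\right|^{r}$ for $\rho(\theta)\ge 0$ and $r>0$, the risk function is given by $R_t(\theta) = E_\theta\left[l_4(T;\theta)\right] = \rho(\theta)\,E_\theta\left[\left|T-g(\theta)\right|^{r}\right]$.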
When Is a Loss Function Said to Be Convex or Strictly Convex?
A real-valued function $L(t)$ defined over an open interval $I = (a, b)$ with $-\infty \le a < b \le \infty$ is said to be convex if, for any $a < t < t^* < b$ and any $0<\gamma<1$,
\[
L\left[\gamma t + (1-\gamma)t^*\right] \le \gamma L(t) + (1-\gamma)L\left(t^*\right) \qquad (1)
\]
The function is said to be strictly convex if strict inequality holds in (1) for all indicated values of $t$, $t^*$ and $\gamma$.
Convexity is a very strong condition which implies, for example, that $L$ is continuous in $(a, b)$ and has a left and right derivative at every point of $(a, b)$.
Determination of Convexity
Determining whether or not a loss function is convex is often easy with the help of the following two criteria.
a) If $L$ is defined and differentiable on $(a, b)$, then a necessary and sufficient condition for $L$ to be convex is that
\[
L'(t) \le L'\left(t^*\right) \quad \text{for all } a < t < t^* < b \qquad (2)
\]
The function is strictly convex if and only if the inequality in (2) is strict for all $t < t^*$.
b) If $L$ is twice differentiable, then the necessary and sufficient condition (2) is equivalent to
\[
L''(t) \ge 0 \quad \text{for all } a < t < b,
\]
with strict inequality sufficient for strict convexity.
Bayes Estimator With Respect to Loss Function

The Bayes estimator of the parameter $\theta$ is defined to be the function $d$ of the sample observations $x_1,\ldots,x_n$ that minimizes the expected risk, where the expected risk is defined as
\[
B(d) = E\left[R(d,\theta)\right] = \int R(d,\theta)\,f(\theta)\,d\theta
= \int\left[\int\cdots\int l\{d(x_1,\ldots,x_n);\theta\}\,f(x_1,\ldots,x_n\mid\theta)\,dx_1\cdots dx_n\right]f(\theta)\,d\theta \qquad (*)
\]
Now, interchanging the order of integration, we can write (*) as
\[
B(d) = \int\cdots\int\left[\int l\{d(x_1,\ldots,x_n);\theta\}\,f(x_1,\ldots,x_n\mid\theta)\,f(\theta)\,d\theta\right]dx_1\cdots dx_n \qquad (**)
\]
The function $B(d)$ will be minimized if we can find the function $d$ that minimizes the quantity within the square brackets of equation (**) for every set of $x$ values. That is, the Bayes estimator of $\theta$ is the function $d$ of $x_1,\ldots,x_n$ that minimizes
\[
\int l\{d(x);\theta\}\,f(x_1,\ldots,x_n\mid\theta)\,f(\theta)\,d\theta
= \int l\{d(x);\theta\}\,f(x_1,\ldots,x_n,\theta)\,d\theta
= f(x_1,\ldots,x_n)\int l\{d(x);\theta\}\,f(\theta\mid x_1,\ldots,x_n)\,d\theta,
\]
since $f(x_1,\ldots,x_n,\theta) = f(x_1,\ldots,x_n\mid\theta)\,f(\theta)$ and $f(\theta\mid x_1,\ldots,x_n) = \frac{f(x_1,\ldots,x_n\mid\theta)\,f(\theta)}{f(x_1,\ldots,x_n)}$.
Thus the Bayes estimator of $\theta$ is the value $\hat\theta = d(x)$ that minimizes
\[
Y = \int l\{d(x);\theta\}\,f(\theta\mid x_1,\ldots,x_n)\,d\theta .
\]
If the loss function is the squared error, i.e. $\left[d(x)-\theta\right]^2$, then
\[
Y = \int\left[d(x)-\theta\right]^2 f(\theta\mid x_1,\ldots,x_n)\,d\theta
= \left[d(x)\right]^2\int f(\theta\mid x_1,\ldots,x_n)\,d\theta - 2\,d(x)\int\theta\,f(\theta\mid x_1,\ldots,x_n)\,d\theta + \int\theta^2 f(\theta\mid x_1,\ldots,x_n)\,d\theta .
\]
Minimizing $Y$ with respect to $d(x)$ requires
\[
\frac{\partial Y}{\partial\left[d(x)\right]} = 0
\;\Rightarrow\; 2\,d(x)\int f(\theta\mid x_1,\ldots,x_n)\,d\theta - 2\int\theta\,f(\theta\mid x_1,\ldots,x_n)\,d\theta = 0
\]
\[
\Rightarrow\; d(x) = \frac{\int\theta\,f(\theta\mid x_1,\ldots,x_n)\,d\theta}{\int f(\theta\mid x_1,\ldots,x_n)\,d\theta} = E(\theta\mid x_1,\ldots,x_n),
\]
the mean of the posterior distribution.
Hence, the posterior mean $d(x)$ is the Bayes estimate of $\theta$ if the loss function is squared error.
Advantages of Bayesian Approach

The Bayesian approach has the following advantages over the classical approach.
a) We make inferences about the unknown parameters given the data, whereas in the classical approach we look at the long-run behavior, e.g. in 95% of experiments $p$ will lie between $p'$ and $p''$.
b) The posterior distribution tells the whole story, and if a point estimate or an interval estimate is desired, it can be obtained immediately from the posterior distribution.
c) The Bayesian approach provides solutions for problems which do not have solutions from the classical point of view.
Note:
a) A decision rule $\delta$ is said to be uniformly better than a decision rule $\delta'$ if $R(\delta,\theta) \le R(\delta',\theta)$ for all $\theta\in\Theta$, with strict inequality holding for some $\theta$.
b) A decision rule $\delta^*$ is said to be uniformly best in a class of decision rules $D$ if $\delta^*$ is uniformly better than any other decision rule $\delta\in D$.
c) A decision rule $\delta$ is said to be admissible in a class $D$ if there exists no other decision rule in $D$ which is uniformly better than $\delta$.
Example: Let $X_1,\ldots,X_n$ be independent $N(\mu,\sigma^2)$ variables, where $\mu$ is unknown but $\sigma^2$ is known. Let the prior distribution of $\mu$ be $N(\theta,\sigma_0^2)$. Find the Bayes estimate of $\mu$.
Solution:
The joint conditional distribution of the sample given $\mu$ is
\[
f(x_1,\ldots,x_n\mid\mu) = \left(2\pi\sigma^2\right)^{-n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right]
= \left(2\pi\sigma^2\right)^{-n/2}\exp\left[-\frac{n}{2\sigma^2}(\bar x-\mu)^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\bar x)^2\right]
\]
\[
\therefore\; f(x_1,\ldots,x_n\mid\mu) \propto \exp\left[-\frac{n}{2\sigma^2}(\bar x-\mu)^2\right].
\]
The posterior distribution of $\mu$ given $x$ is
\[
g(\mu\mid x_1,\ldots,x_n) = \frac{f(\mu\mid\theta)\,f(x_1,\ldots,x_n\mid\mu)}{f(x_1,\ldots,x_n)} \propto f(\mu\mid\theta)\,f(x_1,\ldots,x_n\mid\mu)
\]
\[
\propto \exp\left[-\frac{n}{2\sigma^2}(\bar x-\mu)^2 - \frac{1}{2\sigma_0^2}(\mu-\theta)^2\right]
\propto \exp\left[-\frac12\left(\frac{n\sigma_0^2+\sigma^2}{\sigma_0^2\sigma^2}\right)\left(\mu - \frac{n\bar x\sigma_0^2+\theta\sigma^2}{n\sigma_0^2+\sigma^2}\right)^2\right]
\]
\[
\therefore\; \mu\mid x \sim N\left[\frac{n\bar x\sigma_0^2+\theta\sigma^2}{n\sigma_0^2+\sigma^2},\ \frac{\sigma_0^2\sigma^2}{n\sigma_0^2+\sigma^2}\right].
\]
If the loss function is squared error, the Bayes estimator of $\mu$ is the posterior mean $\frac{n\bar x\sigma_0^2+\theta\sigma^2}{n\sigma_0^2+\sigma^2}$.
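The posterior mean is a precision-weighted average of the sample mean and the prior mean: the data precision is $n/\sigma^2$ and the prior precision is $1/\sigma_0^2$, which gives the mean $(n\bar x\sigma_0^2 + \theta\sigma^2)/(n\sigma_0^2+\sigma^2)$. The sketch below computes it for made-up inputs; the function and argument names are hypothetical.

```python
def normal_posterior(xbar, n, sigma2, prior_mean, prior_var):
    """Posterior mean and variance of mu for N(mu, sigma2) data with known
    sigma2 and a N(prior_mean, prior_var) prior on mu."""
    denom = n * prior_var + sigma2
    mean = (n * xbar * prior_var + prior_mean * sigma2) / denom
    var = prior_var * sigma2 / denom
    return mean, var

# Hypothetical inputs: 4 observations with mean 10, sigma^2 = 4, prior N(2, 1).
post_mean, post_var = normal_posterior(xbar=10.0, n=4, sigma2=4.0,
                                       prior_mean=2.0, prior_var=1.0)
print(post_mean, post_var)
```

With these inputs the data and the prior carry equal precision, so the posterior mean lies halfway between the sample mean and the prior mean.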
Theorem
Let $X_1,\ldots,X_n$ be a random sample from the density $f(x\mid\theta)$ and let $g_\Theta(\theta)$ be the density of $\Theta$. Further, let $l(t;\theta)$ be the loss function for estimating $\tau(\theta)$. The Bayes estimator of $\tau(\theta)$ is that estimator $t^*(\cdot,\ldots,\cdot)$ which minimizes
\[
\int_\Theta l\left(t(x_1,\ldots,x_n);\theta\right) f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta
\]
as a function of $t(\cdot,\ldots,\cdot)$.
Proof
For a general loss function $l(t;\theta)$, we seek that estimator, say $t^*(\cdot,\ldots,\cdot)$, which minimizes the expression
\[
\int_\Theta R_t(\theta)\,g_\Theta(\theta)\,d\theta = \int_\Theta E_\theta\left[l\left(t(X_1,\ldots,X_n);\theta\right)\right]g_\Theta(\theta)\,d\theta
\]
\[
= \int_\Theta\left[\int_{R^n} l\left(t(x_1,\ldots,x_n);\theta\right) f_{X_1,\ldots,X_n\mid\Theta=\theta}(x_1,\ldots,x_n\mid\theta)\prod_{i=1}^{n}dx_i\right]g_\Theta(\theta)\,d\theta
\]
\[
= \int_{R^n}\left[\int_\Theta l\left(t(x_1,\ldots,x_n);\theta\right)\frac{f_{X_1,\ldots,X_n\mid\Theta=\theta}(x_1,\ldots,x_n\mid\theta)\,g_\Theta(\theta)}{f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}\,d\theta\right]f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\prod_{i=1}^{n}dx_i
\]
\[
= \int_{R^n}\left[\int_\Theta l\left(t(x_1,\ldots,x_n);\theta\right) f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta\right]f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\prod_{i=1}^{n}dx_i .
\]
Since the integrand is non-negative, the double integral is minimized if the expression within the square brackets, which is sometimes called the posterior risk, is minimized for each $x_1,\ldots,x_n$.
So, in general, the Bayes estimator of $\tau(\theta)$ with respect to the loss function $l(\cdot\,;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk, which is the expected loss with respect to the posterior distribution of $\Theta$ given the observations $x_1,\ldots,x_n$.
That is, the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
\[
\int_\Theta l\left(t(x_1,\ldots,x_n);\theta\right) f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta .
\]
Hence, the theorem is proved.
Theorem
Let $l(t;\theta)$ be the squared-error loss function for estimating $\tau(\theta)$. That is,
\[
l(t;\theta) = \left[t(x_1,\ldots,x_n) - \tau(\theta)\right]^2 .
\]
Then the Bayes estimator of $\tau(\theta)$ is given by
\[
E\left[\tau(\theta)\mid X_1=x_1,\ldots,X_n=x_n\right] = \frac{\int\tau(\theta)\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}{\int\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}.
\]
Proof
We know that the Bayes estimator of $\tau(\theta)$ with respect to the loss function $l(\cdot\,;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes
\[
\int_\Theta l\left(t(x_1,\ldots,x_n);\theta\right) f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta .
\]
Here, the loss function is the squared-error loss function. So, the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes
\[
\int_\Theta\left[t(x_1,\ldots,x_n)-\tau(\theta)\right]^2 f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta
= \int_\Theta\left[\tau(\theta)-t(x_1,\ldots,x_n)\right]^2 f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta .
\]
But the expression above is the conditional expectation of
\[
\left[\tau(\Theta)-t(x_1,\ldots,x_n)\right]^2
\]
with respect to the posterior distribution of $\Theta$ given $X_1=x_1,\ldots,X_n=x_n$, which is minimized as a function of $t(x_1,\ldots,x_n)$ for $t^*(x_1,\ldots,x_n)$ equal to the conditional expectation of $\tau(\Theta)$ with respect to the posterior distribution of $\Theta$ given $X_1=x_1,\ldots,X_n=x_n$.
[Recall that $E(Z-a)^2$ is minimized as a function of $a$ for $a^* = E(Z)$.]
Hence, the Bayes estimator of $\tau(\theta)$ with respect to the squared-error loss function is given by
\[
E\left[\tau(\theta)\mid X_1=x_1,\ldots,X_n=x_n\right] = \frac{\int\tau(\theta)\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}{\int\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right]g_\Theta(\theta)\,d\theta}.
\]
Hence, the theorem is proved.

Theorem
Let l(t ; θ) be the absolute-error loss function for estimating τ(θ). That is,
$$l(t;\theta) = \big|\,t(x_1,\dots,x_n)-\tau(\theta)\,\big| .$$
Then the Bayes estimator of τ(θ) is given by the median of the posterior distribution of τ(Θ) given X1 = x1, …, Xn = xn (for τ(θ) = θ, the median of the posterior distribution of Θ).
Proof
We know that the Bayes estimator of τ(θ) with respect to the loss function l(· ; ·) and prior density g(·) is the estimator that minimizes
$$\int_{\Theta} l\big(t(x_1,\dots,x_n);\theta\big)\, f_{\Theta \mid X_1=x_1,\dots,X_n=x_n}(\theta \mid x_1,\dots,x_n)\, d\theta .$$
Here the loss function is the absolute-error loss function, so the Bayes estimator of τ(θ) is the estimator that minimizes
$$\int_{\Theta} \big|t(x_1,\dots,x_n)-\tau(\theta)\big|\, f_{\Theta \mid X_1=x_1,\dots,X_n=x_n}(\theta \mid x_1,\dots,x_n)\, d\theta
= \int_{\Theta} \big|\tau(\theta)-t(x_1,\dots,x_n)\big|\, f_{\Theta \mid X_1=x_1,\dots,X_n=x_n}(\theta \mid x_1,\dots,x_n)\, d\theta .$$
This expression is the conditional expectation of $|t(x_1,\dots,x_n)-\tau(\Theta)|$ with respect to the posterior distribution of Θ given X1 = x1, …, Xn = xn. As a function of t(x1, …, xn) it is minimized by t*(x1, …, xn) equal to the conditional median with respect to that posterior distribution.
[Recall that E|Z − a| is minimized as a function of a at a* = median of Z.]
Hence, the Bayes estimator of τ(θ) with respect to the absolute-error loss function is given by the median of the posterior distribution given X1 = x1, …, Xn = xn. (Proved)
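The two "recall" facts used in these proofs can be checked numerically. The following is a minimal sketch, assuming a small made-up discrete distribution for Z (the values, probabilities and grid are illustrative only, not part of the notes): it searches a grid of candidates a and confirms that squared error is minimized at the mean and absolute error at the median.

```python
# Illustrative discrete distribution for Z (assumed, not from the notes).
values = [0.0, 1.0, 2.0, 10.0]
probs = [0.1, 0.3, 0.4, 0.2]

def expected_squared_loss(a):
    """E(Z - a)^2 under the discrete distribution above."""
    return sum(p * (z - a) ** 2 for z, p in zip(values, probs))

def expected_absolute_loss(a):
    """E|Z - a| under the discrete distribution above."""
    return sum(p * abs(z - a) for z, p in zip(values, probs))

# Candidate values a = 0.00, 0.01, ..., 10.00.
grid = [k / 100 for k in range(1001)]

best_squared = min(grid, key=expected_squared_loss)
best_absolute = min(grid, key=expected_absolute_loss)

mean_z = sum(p * z for z, p in zip(values, probs))  # 3.1
median_z = 2.0  # P(Z <= 1) = 0.4 < 0.5 <= P(Z <= 2) = 0.8

print(best_squared, mean_z)
print(best_absolute, median_z)
```

The median here is unique because the cumulative probability jumps across 0.5 at the value 2; with a flat stretch at 0.5 every point of that stretch would minimize the absolute loss.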
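Example: Let X1, …, Xn be a random sample from the normal density with mean θ and variance 1,
$$f(x \mid \theta) = \frac{1}{\sqrt{2\pi}} \exp\!\left[-\tfrac{1}{2}(x-\theta)^2\right], \qquad -\infty < x < \infty ,$$
and let the prior density of Θ be
$$g_\Theta(\theta) = \frac{1}{\sqrt{2\pi}} \exp\!\left[-\tfrac{1}{2}(\theta-\mu_0)^2\right], \qquad -\infty < \theta < \infty .$$
That is, Θ is normal with mean µ0 and variance 1. Write µ0 = x0 when convenient. Find the Bayes estimator of τ(θ) = θ with respect to the squared-error loss function.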

Solution
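We know that the Bayes estimator of τ(θ) with respect to the squared-error loss function is given by
$$E\big[\tau(\Theta) \mid X_1=x_1,\dots,X_n=x_n\big]
= \frac{\int \tau(\theta)\,\prod_{i=1}^{n} f(x_i \mid \theta)\, g_\Theta(\theta)\, d\theta}{\int \prod_{i=1}^{n} f(x_i \mid \theta)\, g_\Theta(\theta)\, d\theta}.$$
The posterior density is
$$f_{\Theta \mid X_1=x_1,\dots,X_n=x_n}(\theta \mid x_1,\dots,x_n)
= \frac{\prod_{i=1}^{n} f(x_i \mid \theta)\, g_\Theta(\theta)}{\int \prod_{i=1}^{n} f(x_i \mid \theta)\, g_\Theta(\theta)\, d\theta}
= \frac{\left(\frac{1}{\sqrt{2\pi}}\right)^{n} \exp\!\left[-\frac{1}{2}\sum_{i=1}^{n}(x_i-\theta)^2\right] \frac{1}{\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}(\theta-\mu_0)^2\right]}
{\int_{-\infty}^{\infty} \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \exp\!\left[-\frac{1}{2}\sum_{i=1}^{n}(x_i-\theta)^2\right] \frac{1}{\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}(\theta-\mu_0)^2\right] d\theta}$$
$$\Rightarrow\; f_{\Theta \mid X_1=x_1,\dots,X_n=x_n}(\theta \mid x_1,\dots,x_n)
= \frac{1}{\sqrt{2\pi}\,\sqrt{\frac{1}{n+1}}}\,
\exp\!\left[-\frac{1}{2}\left(\frac{\theta-\frac{\sum_{i=0}^{n} x_i}{n+1}}{\sqrt{\frac{1}{n+1}}}\right)^{2}\right],$$
where x0 = µ0. That is, the posterior distribution of Θ is normal with mean $\sum_{i=0}^{n} x_i/(n+1)$ and variance $1/(n+1)$.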
So, the Bayes estimator of θ with respect to the squared-error loss function is given by
$$E\big[\Theta \mid X_1=x_1,\dots,X_n=x_n\big]
= \int \theta\, f_{\Theta \mid X_1=x_1,\dots,X_n=x_n}(\theta \mid x_1,\dots,x_n)\, d\theta
= \int_{-\infty}^{\infty} \theta\, \frac{1}{\sqrt{2\pi}\,\sqrt{\frac{1}{n+1}}}\,
\exp\!\left[-\frac{1}{2}\left(\frac{\theta-\frac{\sum_{i=0}^{n} x_i}{n+1}}{\sqrt{\frac{1}{n+1}}}\right)^{2}\right] d\theta ,$$
where x0 = µ0. Now let
$$z = \frac{\theta-\frac{\sum_{i=0}^{n} x_i}{n+1}}{\sqrt{\frac{1}{n+1}}}
\;\Rightarrow\; \theta = \frac{\sum_{i=0}^{n} x_i}{n+1} + \frac{z}{\sqrt{n+1}}
\;\Rightarrow\; d\theta = \frac{dz}{\sqrt{n+1}} .$$
Then
$$E\big[\Theta \mid X_1=x_1,\dots,X_n=x_n\big]
= \int_{-\infty}^{\infty} \left(\frac{\sum_{i=0}^{n} x_i}{n+1} + \frac{z}{\sqrt{n+1}}\right) \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$$
$$= \frac{\sum_{i=0}^{n} x_i}{n+1} + \frac{1}{\sqrt{n+1}} \int_{-\infty}^{\infty} z\, \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$$
$$= \frac{\sum_{i=0}^{n} x_i}{n+1} + \frac{1}{\sqrt{n+1}} \left[\int_{-\infty}^{0} z\, \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz + \int_{0}^{\infty} z\, \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz\right]$$
$$= \frac{\sum_{i=0}^{n} x_i}{n+1} + \frac{1}{\sqrt{n+1}} \left[-\int_{0}^{\infty} z\, \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz + \int_{0}^{\infty} z\, \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz\right]$$
$$\therefore\; E\big[\Theta \mid X_1=x_1,\dots,X_n=x_n\big] = \frac{\sum_{i=0}^{n} x_i}{n+1} .$$
So the Bayes estimator of θ with respect to the squared-error loss is
$$\frac{\sum_{i=0}^{n} x_i}{n+1} = \frac{x_0 + \sum_{i=1}^{n} x_i}{n+1} = \frac{\mu_0 + \sum_{i=1}^{n} x_i}{n+1} .$$
Since the posterior distribution of Θ is normal, its mean and median are the same. Hence
$$\frac{\mu_0 + \sum_{i=1}^{n} x_i}{n+1}$$
is also the Bayes estimator with respect to the absolute-error loss function.
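As a quick sanity check of this closed form, the posterior mean can be approximated directly from the ratio-of-integrals formula. This is only a sketch: the data `xs`, the prior mean `mu0`, and the integration grid are made-up illustrative choices, and the integral is replaced by a simple Riemann sum.

```python
import math

def posterior_mean_grid(xs, mu0, lo=-10.0, hi=10.0, steps=20001):
    """Approximate E[theta | x] for N(theta, 1) data and a N(mu0, 1) prior
    by a Riemann sum of theta * likelihood * prior over a grid."""
    h = (hi - lo) / (steps - 1)
    num = 0.0
    den = 0.0
    for k in range(steps):
        theta = lo + k * h
        # log of likelihood * prior, dropping constants that cancel in the ratio
        log_w = -0.5 * sum((x - theta) ** 2 for x in xs) - 0.5 * (theta - mu0) ** 2
        w = math.exp(log_w)
        num += theta * w
        den += w
    return num / den

xs = [1.2, 0.4, 2.1, 1.7]   # hypothetical sample
mu0 = 0.5                   # hypothetical prior mean

numeric = posterior_mean_grid(xs, mu0)
closed_form = (mu0 + sum(xs)) / (len(xs) + 1)
print(numeric, closed_form)
```

The normalizing constants are dropped because they cancel between numerator and denominator, which is exactly why the posterior mean can be written as a ratio of two integrals.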
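Example: Let X1, …, Xn be a random sample from the density
$$f(x \mid \theta) = \frac{1}{\theta}\, I_{(0,\theta)}(x) .$$
Assume that the prior distribution of Θ is given by $g_\Theta(\theta) = I_{(0,1)}(\theta)$. That is, Θ is standard uniform. Find the Bayes estimator of τ(θ) = θ with respect to the weighted squared-error loss function
$$l(t;\theta) = \frac{(t-\theta)^2}{\theta^2} .$$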
Solution:
We know that the Bayes estimator of τ(θ) with respect to a general loss function such as
$$l(t;\theta) = \frac{(t-\theta)^2}{\theta^2}$$
is the estimator that minimizes
$$\int_{\Theta} l\big(t(x_1,\dots,x_n);\theta\big)\, f_{\Theta \mid X_1=x_1,\dots,X_n=x_n}(\theta \mid x_1,\dots,x_n)\, d\theta .$$
The posterior density is
$$f_{\Theta \mid X_1=x_1,\dots,X_n=x_n}(\theta \mid x_1,\dots,x_n)
= \frac{\prod_{i=1}^{n} f(x_i \mid \theta)\, g_\Theta(\theta)}{\int \prod_{i=1}^{n} f(x_i \mid \theta)\, g_\Theta(\theta)\, d\theta}
= \frac{\left(\frac{1}{\theta}\right)^{n} \prod_{i=1}^{n} I_{(0,\theta)}(x_i)\, I_{(0,1)}(\theta)}{\int \left(\frac{1}{\theta}\right)^{n} \prod_{i=1}^{n} I_{(0,\theta)}(x_i)\, I_{(0,1)}(\theta)\, d\theta}
= \frac{\left(\frac{1}{\theta}\right)^{n} I_{(y_n,1)}(\theta)}{\int_{y_n}^{1} \left(\frac{1}{\theta}\right)^{n} d\theta},$$
where $y_n = \max(x_1,\dots,x_n)$. Evaluating the denominator,
$$\int_{y_n}^{1} \theta^{-n}\, d\theta = \left[\frac{\theta^{-n+1}}{-n+1}\right]_{y_n}^{1}
= \frac{1}{-(n-1)}\left[1 - y_n^{-(n-1)}\right]
= \frac{1}{n-1}\left[\frac{1}{y_n^{\,n-1}} - 1\right],$$
so that
$$f_{\Theta \mid X_1=x_1,\dots,X_n=x_n}(\theta \mid x_1,\dots,x_n)
= \frac{\left(\frac{1}{\theta}\right)^{n} I_{(y_n,1)}(\theta)}{\frac{1}{n-1}\left[\frac{1}{y_n^{\,n-1}} - 1\right]} .$$
Now the Bayes estimator of τ(θ) with respect to the loss function
$$l(t;\theta) = \frac{(t-\theta)^2}{\theta^2}$$
is the estimator that minimizes
$$\int_{\Theta} \frac{(t-\theta)^2}{\theta^2}\, f_{\Theta \mid X_1=x_1,\dots,X_n=x_n}(\theta \mid x_1,\dots,x_n)\, d\theta
= \int_{y_n}^{1} \frac{\big(t(y_n)-\theta\big)^2}{\theta^2}\, \frac{\left(\frac{1}{\theta}\right)^{n}}{\frac{1}{n-1}\left[\frac{1}{y_n^{\,n-1}} - 1\right]}\, d\theta ,$$
where $y_n = \max(x_1,\dots,x_n)$. Since the denominator does not involve t, this is the estimator that minimizes
$$\int_{y_n}^{1} \frac{\big(t(y_n)-\theta\big)^2}{\theta^2} \left(\frac{1}{\theta}\right)^{n} d\theta
= \int_{y_n}^{1} \frac{\big(t(y_n)-\theta\big)^2}{\theta^{\,n+2}}\, d\theta
= \big[t(y_n)\big]^2 \int_{y_n}^{1} \frac{1}{\theta^{\,n+2}}\, d\theta
- 2\,t(y_n) \int_{y_n}^{1} \frac{1}{\theta^{\,n+1}}\, d\theta
+ \int_{y_n}^{1} \frac{1}{\theta^{\,n}}\, d\theta . \qquad (A)$$
Expression (A) is a quadratic in t(·). It attains its minimum at
$$t^*(y_n) = \frac{\int_{y_n}^{1} \theta^{-(n+1)}\, d\theta}{\int_{y_n}^{1} \theta^{-(n+2)}\, d\theta}
= \frac{\frac{1}{-n}\left[1 - y_n^{-n}\right]}{\frac{1}{-(n+1)}\left[1 - y_n^{-(n+1)}\right]}
= \frac{n+1}{n} \times \frac{y_n^{\,n} - 1}{y_n^{\,n}} \times \frac{y_n^{\,n+1}}{y_n^{\,n+1} - 1}$$
$$\therefore\; t^*(y_n) = \frac{n+1}{n} \times \frac{y_n^{\,n} - 1}{y_n^{\,n+1} - 1} \times y_n .$$
So, the Bayes estimator of τ(θ) with respect to the weighted squared-error loss function
$$l(t;\theta) = \frac{(t-\theta)^2}{\theta^2}$$
is given by
$$t^*(y_n) = \frac{n+1}{n} \times \frac{y_n^{\,n} - 1}{y_n^{\,n+1} - 1} \times y_n .$$
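The closed form for t*(y_n) can be cross-checked by minimizing expression (A) numerically. The sketch below assumes illustrative values y_n = 0.6 and n = 5 (not from the notes), approximates the integral with a midpoint rule, and searches a grid of candidate values of t.

```python
def t_star(y_n, n):
    """Closed form from the text: ((n+1)/n) * (y^n - 1)/(y^(n+1) - 1) * y."""
    return (n + 1) / n * (y_n ** n - 1) / (y_n ** (n + 1) - 1) * y_n

def expected_loss(t, y_n, n, steps=2000):
    """Midpoint-rule approximation of the integral of (t - theta)^2 / theta^(n+2)
    over (y_n, 1); this is expression (A) before expanding the square."""
    h = (1.0 - y_n) / steps
    total = 0.0
    for k in range(steps):
        theta = y_n + (k + 0.5) * h
        total += (t - theta) ** 2 / theta ** (n + 2)
    return total * h

y_n, n = 0.6, 5  # hypothetical sample maximum and sample size

# Candidate estimates t = y_n, y_n + 0.001, ..., 1.0
candidates = [y_n + k * 0.001 for k in range(401)]
best_t = min(candidates, key=lambda t: expected_loss(t, y_n, n))

print(best_t, t_star(y_n, n))
```

Note that the minimizer lies above y_n: because the loss is divided by θ², small values of θ are weighted more heavily, but the estimate still exceeds the sample maximum.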

Admissible Estimator
For two estimators T1 = t1(X1, X2, …, Xn) and T2 = t2(X1, X2, …, Xn), the estimator T1 is defined to be a better estimator than T2 if and only if
$$R_{t_1}(\theta) \le R_{t_2}(\theta) \text{ for all } \theta \in \Theta \quad\text{and}\quad R_{t_1}(\theta) < R_{t_2}(\theta) \text{ for at least one } \theta \in \Theta .$$
An estimator T = t(X1, X2, …, Xn) is defined to be admissible if and only if there is no better estimator.
Example: Using the squared-error loss function l(t, θ) = (t − θ)², estimators for the location parameter of a normal distribution given a sample of size n are the sample mean $t_1(x) = \bar{x}$, the sample median $t_2(x) = m$, and the weighted mean $t_3(x) = \sum w_i x_i$ with $\sum w_i = 1$. Their respective risk functions are
$$R_1 = \frac{\sigma^2}{n}, \qquad R_2 = 1.57\,\frac{\sigma^2}{n}, \qquad R_3 = \sigma^2\left\{\frac{1}{n} + \sum (w_i - \bar{w})^2\right\}.$$
Here R1 < R2 and R1 ≤ R3 (with equality only when every w_i = 1/n), so neither the median nor a weighted mean is better than $\bar{x}$. The sample mean $\bar{x}$ is an admissible estimator of the location parameter µ of the normal distribution.
Inadmissibility of an Estimator
An estimator t is said to be inadmissible if there exists another estimator t′ which dominates it, that is, such that
$$R(\theta, t') \le R(\theta, t) \text{ for all } \theta \in \Theta \quad\text{and}\quad R(\theta, t') < R(\theta, t) \text{ for some } \theta \in \Theta .$$
Finding Inadmissible Estimator

To establish the inadmissibility of an estimator t, we may use the following lemma.
Let the range of the estimand τ(θ) be [a, b], let the loss function satisfy L(θ, t) ≥ 0, and suppose that for any fixed θ, L(θ, t) is increasing as t moves away from τ(θ) in either direction. Then any estimator taking on values outside the closed interval [a, b] with positive probability is inadmissible.
Properties of Admissible Estimator

The properties of admissible estimators are as follows.
a) If the loss function L is strictly convex, then every admissible estimator must be non-randomized.
b) If L is strictly convex and t is an admissible estimator of τ (θ ) and if t ′ is another estimator with the same
risk function i.e. R (θ , t ) = R (θ , t ′ ) for all θ then t = t ′ with probability 1 .
c) Any unique Bayes estimator is admissible.
Minimax Estimator
An estimator T* = t*(X1, …, Xn) is defined to be a minimax estimator if and only if
$$\sup_{\theta} R(\theta, t^*) \le \sup_{\theta} R(\theta, t)$$
for every estimator t.

Properties of Minimax Estimator
The properties of the minimax estimator are given below.
a) One appealing feature of the minimax estimator is that it does not depend on the particular parameterization.
b) If g(·) is a prior distribution of θ such that $\int R(\theta, t_g)\, g(\theta)\, d\theta = \sup_{\theta} R(\theta, t_g)$, then
i. t_g is minimax;
ii. if t_g is the unique Bayes solution with respect to g(·), it is the unique minimax procedure.
c) If a Bayes estimator t_g has constant risk, then it is minimax.
d) If t′ dominates a minimax estimator t, then t′ is also minimax.
e) If an estimator has constant risk and is admissible, it is minimax.
f) The best equivariant estimator is frequently minimax.
Example: Suppose that Θ = {θ1, θ2}, where θ1 corresponds to oil and θ2 to no oil. Let A = {a1, a2, a3}, where ai corresponds to choice i, i = 1, 2, 3. Suppose that the following table gives the losses for the decision problem.

              Drill a1   Sell a2   Partial a3
Oil θ1            0         10          5
No oil θ2        12          1          6

If there is oil and we drill, the loss is zero, while if there is no oil and we drill, the loss is 12, and so on.
An experiment is conducted to obtain information about θ, resulting in the random variable X with possible values coded as 0 and 1, with probabilities given by

              x = 0    x = 1
Oil θ1         0.3      0.7
No oil θ2      0.6      0.4

When there is oil, 0 occurs with probability 0.3 and 1 occurs with probability 0.7:
∴ P(x = 0 | θ1) = 0.3 and P(x = 1 | θ1) = 0.7
Now the possible decision rules δi(x) are

i        1    2    3    4    5    6    7    8    9
x = 0    a1   a1   a1   a2   a2   a2   a3   a3   a3
x = 1    a1   a2   a3   a1   a2   a3   a1   a2   a3

Here, δ1 = take action a1 regardless of the value of X;
δ2 = take action a1 if X = 0, take action a2 if X = 1;
and so on.

Then the risk of δ at θ is
R(θ, δ) = E[l(θ, δ(x))]
        = l(θ, a1) P(δ(x) = a1) + l(θ, a2) P(δ(x) = a2) + l(θ, a3) P(δ(x) = a3)
Now, R(θ1, δ2) = 0 × 0.3 + 10 × 0.7 = 7
     R(θ2, δ2) = 12 × 0.6 + 1 × 0.4 = 7.6
Thus we get

i                              1     2     3     4     5     6     7     8     9
R(θ1, δi)                      0     7     3.5   3     10    6.5   1.5   8.5   5
R(θ2, δi)                      12    7.6   9.6   5.4   1     3     8.4   4.0   6
Max[R(θ1, δi), R(θ2, δi)]      12    7.6   9.6   5.4   10    6.5   8.4   8.5   6

min over i of Max[R(θ1, δi), R(θ2, δi)] = 5.4
Thus the minimax solution is δ4, that is, δ4(x) = a2 if x = 0 and δ4(x) = a1 if x = 1.
Again, R(θ1, δ4) = 3 < R(θ1, δ2) = 7 and R(θ2, δ4) = 5.4 < R(θ2, δ2) = 7.6, so δ2 is inadmissible.
Suppose that in our oil drilling example an expert thinks the chance of finding oil is 0.2. Then we treat the parameter as a random variable θ with possible values θ1, θ2 and frequency function
π(θ1) = 0.2, π(θ2) = 0.8
so that the Bayes risk is
R(δ) = E[R(θ, δ)] = 0.2 R(θ1, δ) + 0.8 R(θ2, δ)
∴ R(δ1) = 0.2 × 0 + 0.8 × 12 = 9.6
  R(δ2) = 0.2 × 7 + 0.8 × 7.6 = 7.48
  R(δ3) = 0.2 × 3.5 + 0.8 × 9.6 = 8.38
and so on.
So we compute the following table

i        1     2      3      4      5     6     7      8     9
R(δi)    9.6   7.48   8.38   4.92   2.8   3.7   7.02   4.9   5.8

In the Bayesian framework δ is preferable to δ′ if and only if it has smaller Bayes risk. If there is a rule δ* which attains the minimum Bayes risk, i.e. such that R(δ*) = min over δ of R(δ), then it is called a Bayes rule. In this example min R(δ) = 2.8, attained only by δ5, so δ5 is the unique Bayes rule for our prior distribution.
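The risk table, the minimax rule and the Bayes rule of the oil example can be recomputed mechanically. The following is a small sketch using the loss table, the distribution of X and the prior from the example; the dictionary keys and function names are illustrative labels chosen here.

```python
from itertools import product

# Loss table l(theta, a) from the example.
loss = {
    "oil":    {"a1": 0,  "a2": 10, "a3": 5},
    "no_oil": {"a1": 12, "a2": 1,  "a3": 6},
}
# P(X = x | theta) from the example.
p_x = {
    "oil":    {0: 0.3, 1: 0.7},
    "no_oil": {0: 0.6, 1: 0.4},
}
prior = {"oil": 0.2, "no_oil": 0.8}

actions = ["a1", "a2", "a3"]
# A rule is (action if x = 0, action if x = 1); this ordering matches delta_1..delta_9.
rules = list(product(actions, repeat=2))

def risk(theta, rule):
    """R(theta, delta) = sum over x of l(theta, delta(x)) * P(X = x | theta)."""
    return sum(p_x[theta][x] * loss[theta][rule[x]] for x in (0, 1))

def max_risk(rule):
    return max(risk(theta, rule) for theta in loss)

def bayes_risk(rule):
    return sum(prior[theta] * risk(theta, rule) for theta in loss)

minimax_rule = min(rules, key=max_risk)
bayes_rule = min(rules, key=bayes_risk)

for i, rule in enumerate(rules, start=1):
    print(i, rule, risk("oil", rule), risk("no_oil", rule), bayes_risk(rule))
print("minimax:", minimax_rule, "Bayes:", bayes_rule)
```

Changing `prior` shows how the Bayes rule depends on the expert's belief, while the minimax rule is unaffected by it.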

Example: Let X ~ b(1, p), p ∈ Θ = {1/4, 1/2} and A = {a1, a2}. Let the loss function be defined as follows.

              a1    a2
p1 = 1/4       1     4
p2 = 1/2       3     2

The set of decision rules includes four functions δ1, δ2, δ3, δ4, defined by
δ1(0) = δ1(1) = a1
δ2(0) = a1, δ2(1) = a2
δ3(0) = a2, δ3(1) = a1
δ4(0) = δ4(1) = a2
The risk function takes the following values:

i    R(p1, δi)   R(p2, δi)   Max over p1, p2 of R(p, δi)
1       1           3              3
2      7/4         5/2            5/2
3     13/4         5/2           13/4
4       4           2              4

min over i of Max over p1, p2 of R(p, δi) = 5/2
Thus the minimax solution is δ2, that is, δ2(x) = a1 if x = 0 and δ2(x) = a2 if x = 1.

The General Statistical Simulation Procedure (SSP) and Bootstrap Methods
Introduction to SSP and Bootstrap Method

If we know the distribution function F ( ⋅) of a random variable X and wish to evaluate some function of it, say
θ ( F ) , we can proceed in two ways: (1) evaluate θ ( F ) exactly; and (2) simulate to estimate θ ( F ) .
For example, suppose that X is a random variable which is N(0, 1), so that F(x) = Φ(x), and that we wish to know the 8th moment of X, so that
$$\theta(F) = \int_{-\infty}^{\infty} x^{8}\, d\Phi(x) = \int_{-\infty}^{\infty} x^{8}\, \varphi(x)\, dx ,$$
where φ = Φ′ is the standard normal density. Then one way we may proceed is to (1) evaluate the integral exactly.
If the integral involved in the preceding method is such that no simple way to evaluate it exists, a second way we may proceed is to (2) simulate to estimate θ(F). Generating random variables X1, X2, …, Xn that are N(0, 1) and independent, we can estimate θ(F) using the principles of estimation. Thus, we might estimate θ(F) by
$$\frac{X_1^{8} + X_2^{8} + \cdots + X_n^{8}}{n},$$
which is a consistent estimator.
The preceding discussion assumed that we knew the distribution function F ( ⋅) . However, often we have not a
known distribution function F ( ⋅) for which we wish to know θ ( F ) , but rather a random sample X 1 , X 2 , " , X n
drawn from distribution function F ( ⋅) with F ( ⋅) unknown, and wish to estimate θ ( F ) .
Now suppose we are taking approach (2) and wish to specify the variance of our estimator. With F ( ⋅) known, we
can proceed as follows to solve this problem.
Step 1 : Generate X1, X2, …, Xn and estimate θ(F); call the estimate $\hat\theta_1$.
Step 2 : Generate X_{n+1}, X_{n+2}, …, X_{2n} and estimate θ(F) from these new random variables, which are to be independent of all random variables previously generated; call the estimate $\hat\theta_2$.
Step 3 : Generate X_{2n+1}, X_{2n+2}, …, X_{3n} and estimate θ(F) from these new random variables, which are to be independent of all random variables previously generated; call the estimate $\hat\theta_3$.
⋮
Step N : Generate X_{(N−1)n+1}, X_{(N−1)n+2}, …, X_{Nn} and estimate θ(F) from these new random variables, which are to be independent of all random variables previously generated; call the estimate $\hat\theta_N$.
Then $\hat\theta_1, \hat\theta_2, \hat\theta_3, \dots, \hat\theta_N$ are N independent and identically distributed random variables, each estimating θ(F). Their variance may be estimated by
$$\hat\sigma^2 = \frac{\sum_{i=1}^{N} (\hat\theta_i - \bar\theta)^2}{N-1}; \qquad \bar\theta = \frac{\hat\theta_1 + \hat\theta_2 + \cdots + \hat\theta_N}{N},$$
and the variance of the estimator $\bar\theta$ is estimated by $\hat\sigma^2 / N$. We call this procedure the Statistical Simulation Procedure SSP(θ, F, n, N).
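The steps of SSP(θ, F, n, N) can be sketched as follows for the 8th-moment example, whose exact value for N(0, 1) is 105. The function names, the seed and the choices n = 2000, N = 100 are assumptions made for illustration only.

```python
import random

def ssp(estimator, sampler, n, N, rng):
    """Run N independent replications of size n and return
    (theta_bar, sigma2_hat, estimated variance of theta_bar)."""
    estimates = []
    for _ in range(N):
        sample = [sampler(rng) for _ in range(n)]   # fresh, independent draws
        estimates.append(estimator(sample))
    theta_bar = sum(estimates) / N
    sigma2_hat = sum((e - theta_bar) ** 2 for e in estimates) / (N - 1)
    return theta_bar, sigma2_hat, sigma2_hat / N

def eighth_moment(sample):
    """Sample analogue of E(X^8)."""
    return sum(x ** 8 for x in sample) / len(sample)

def standard_normal(rng):
    return rng.gauss(0.0, 1.0)

rng = random.Random(0)  # fixed seed so the run is reproducible
theta_bar, sigma2_hat, var_theta_bar = ssp(eighth_moment, standard_normal,
                                           n=2000, N=100, rng=rng)
print(theta_bar, var_theta_bar ** 0.5)
```

Because X^8 is heavy tailed, individual replications vary a lot; the standard error printed alongside the estimate indicates how far from 105 the average can reasonably be.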
Now, if F(·) is unknown, the SSP(θ, F, n, N) cannot be used. However, one will then have a random sample X1, X2, …, Xn taken from the unknown distribution function F(·). Using the random sample, the distribution function may be estimated, say by some estimator $\hat F$, and then Step 1, Step 2, …, Step N followed. We call this the General Statistical Simulation Procedure SSP(θ, $\hat F$, X1, …, Xn, N).
There are many ways to choose the estimate $\hat F$ of F in the SSP(θ, $\hat F$, X1, …, Xn, N). One of the very simplest is to take $\hat F$ to be the empiric distribution function based on the sample X1, X2, …, Xn. If that is done, then the procedure is called the bootstrap procedure.
Bootstrap Sampling
Bootstrap sampling is a method of selecting a sample of size n with replacement from the n data points of a data set X1, X2, …, Xn. This is equivalent to writing the value of each data point on a ping pong ball and placing the balls in a box, then selecting a ball at random, recording its value and replacing the ball. Doing this n times maintains the original sample size of n. With the bootstrap method the basic sample is treated as the population. Thus the bootstrap estimation procedure consists of the following steps.
Step 1 : Using the original data set, calculate some statistic of interest to estimate the characteristic of the population of interest. Call this B0.
Step 2 : Take a bootstrap sample of size n from the original data set, which produces a new data set X1*, X2*, …, Xn*. Calculate the same statistic from this new data set and call this B1.
Step 3 : Repeat Step 2 N times to produce B1, B2, …, BN.
Step 4 : Sort B1, B2, …, BN from smallest to largest.
Step 5 : Estimate the bias of the original estimator by $\bar B - B_0$, where $\bar B$ is the mean of B1, B2, …, BN.
Uses of Bootstrapping Method:

• This method is used for computation of sampling distribution of any statistic.
• This is very good for confidence interval and bias estimation but not for point estimation.
• This method is able to estimate measures of variability.
• It is able to calculate power.
• It can be employed in nonparametric and in parametric inference.
• It is sometimes used when the sample is very small, although resampling cannot add information beyond what the original sample contains.
Example: Estimating the Standard Error of $\bar X$.

Let θ = E(X) and σ² = Var(X). Then from a random sample X1, X2, …, Xn with the same distribution function as X, we find $\bar X$, which has mean θ and variance $V(\bar X) = \sigma^2 / n$.

The bootstrap method of estimating $V(\bar X)$ is the SSP(θ, $\hat F$, X1, …, Xn, N) with $\hat F$ taken to be the empiric distribution function, and proceeds as follows:
Step 1 : Take a sample of size n (with replacement) from {X1, X2, …, Xn}, say {X11, X12, …, X1n}, and calculate its sample mean $\bar X_1$.
Step 2 : Repeat Step 1 independently N − 1 additional times, finding $\bar X_1, \bar X_2, \dots, \bar X_N$. The bootstrap estimate of $V(\bar X)$ is
$$\hat\sigma^2 = \frac{\sum_{i=1}^{N} (\bar X_i - \bar X_{\cdot})^2}{N-1}; \qquad \bar X_{\cdot} = \frac{\bar X_1 + \bar X_2 + \cdots + \bar X_N}{N}.$$
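The two steps above can be sketched in a few lines. The data set, the seed and N = 2000 are made-up illustrative choices; the comparison value is the plug-in variance of the mean under the empiric distribution, which the bootstrap estimate should approach for large N.

```python
import random

def bootstrap_variance_of_mean(data, N, rng):
    """Bootstrap estimate of Var(X-bar): resample n points with replacement
    N times and take the sample variance of the N resample means."""
    n = len(data)
    means = []
    for _ in range(N):
        resample = [rng.choice(data) for _ in range(n)]
        means.append(sum(resample) / n)
    grand_mean = sum(means) / N
    return sum((m - grand_mean) ** 2 for m in means) / (N - 1)

data = [2.3, 4.1, 3.8, 5.0, 1.9, 3.3, 4.4, 2.8, 3.6, 4.9]  # hypothetical sample
rng = random.Random(1)

boot_var = bootstrap_variance_of_mean(data, N=2000, rng=rng)

# Variance of the mean under the empiric distribution: (1/n) * (1/n) * sum (x - xbar)^2
n = len(data)
xbar = sum(data) / n
plug_in_var = sum((x - xbar) ** 2 for x in data) / n / n

print(boot_var, plug_in_var)
```

The standard error of the mean is the square root of either quantity.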
Example: Estimating Bias
Suppose that, based on a random sample X1, X2, …, Xn, some quantity θ of interest is estimated by $\hat\theta$. The estimator $\hat\theta$ has some bias $b = E(\hat\theta) - \theta$. To estimate the bias, consider use of SSP(θ, $\hat F$, X1, …, Xn, N) with $\hat F$ taken to be the empiric distribution function (so we have a bootstrap estimate). Based on N bootstrap samples of size n each, one finds the estimators $\hat\theta_1, \hat\theta_2, \dots, \hat\theta_N$ with
$$\bar\theta = \frac{\hat\theta_1 + \hat\theta_2 + \cdots + \hat\theta_N}{N}$$
and estimates the bias of $\hat\theta$ by $\hat b = \bar\theta - \hat\theta$, the bootstrap analogue of $E(\hat\theta) - \theta$.
Example:
Let X1, X2, …, Xn be a random sample of size n from a Poisson distribution with unknown mean λ. If the parameter of interest is $\theta = P(X \le 1) = e^{-\lambda}(1+\lambda)$, the MLE is $e^{-\bar X}(1+\bar X)$, which is biased. To reduce the bias, let us investigate the bootstrap method.
Let $X_{ij}$ (i = 1, …, N; j = 1, …, n) be the N bootstrap samples, that is, samples taken at random with replacement from {X1, X2, …, Xn}, let $\bar X_i$ be the mean of the i-th bootstrap sample, and for i = 1, …, N let
$$\hat\theta_i = e^{-\bar X_i}(1+\bar X_i) - \frac{\text{Number of } X_{i1}, X_{i2}, \dots, X_{in} \text{ that are } \le 1}{n}.$$
Then the bootstrap estimate of the bias is simply $\hat b = \bar\theta = (\hat\theta_1 + \cdots + \hat\theta_N)/N$.
Then, one might use $e^{-\bar X}(1+\bar X) - \hat b$ to estimate θ.
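A minimal sketch of this bias correction follows. The count data, the seed and N = 1000 are hypothetical; the per-resample quantity is the one defined in the example (MLE on the resample minus the resample's proportion of values ≤ 1).

```python
import math
import random

def mle_p_le_1(sample):
    """MLE of P(X <= 1) for Poisson data: exp(-xbar) * (1 + xbar)."""
    xbar = sum(sample) / len(sample)
    return math.exp(-xbar) * (1.0 + xbar)

def bootstrap_bias(data, N, rng):
    """Average over N resamples of
    (MLE on the resample) - (proportion of the resample that is <= 1)."""
    n = len(data)
    total = 0.0
    for _ in range(N):
        resample = [rng.choice(data) for _ in range(n)]
        proportion = sum(1 for x in resample if x <= 1) / n
        total += mle_p_le_1(resample) - proportion
    return total / N

data = [0, 2, 1, 3, 0, 1, 4, 2, 1, 0, 2, 1]  # hypothetical Poisson counts
rng = random.Random(2)

b_hat = bootstrap_bias(data, N=1000, rng=rng)
theta_mle = mle_p_le_1(data)
theta_corrected = theta_mle - b_hat

print(theta_mle, b_hat, theta_corrected)
```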
Remark:
Note that an approximate 100(1 − α)% confidence interval for θ can be constructed using bootstrap methods, as follows. If $\bar\theta$ is the bootstrap estimate of θ and $\hat\sigma^2$ its sample variance based on $\hat\theta_1, \hat\theta_2, \dots, \hat\theta_N$, for N large we take the interval
$$\left\{\hat\theta - \Phi^{-1}(1-\alpha/2)\,\hat\sigma,\;\; \hat\theta + \Phi^{-1}(1-\alpha/2)\,\hat\sigma\right\}.$$
Note that we use the original estimate $\hat\theta$ of θ (not the bootstrap estimate), and the bootstrap procedure has been used to provide us with an estimate of variability for $\hat\theta$.
Note that exactly the same details apply to the more general statistical simulation procedure SSP(θ, $\hat F$, X1, …, Xn, N), in which the only difference is what estimate of F is being sampled from.
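The interval in the remark can be sketched as below, here for the sample median as the statistic of interest. The data, the seed, N and α are illustrative assumptions; `NormalDist().inv_cdf` from the Python standard library plays the role of Φ⁻¹.

```python
import random
from statistics import NormalDist, median, stdev

def bootstrap_normal_interval(data, statistic, N, alpha, rng):
    """Interval theta_hat -/+ Phi^{-1}(1 - alpha/2) * sigma_hat, where sigma_hat is
    the sample standard deviation of the statistic over N bootstrap resamples."""
    n = len(data)
    theta_hat = statistic(data)                       # original estimate
    boot = [statistic([rng.choice(data) for _ in range(n)]) for _ in range(N)]
    sigma_hat = stdev(boot)                           # divides by N - 1
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return theta_hat - z * sigma_hat, theta_hat + z * sigma_hat

data = [2.3, 4.1, 3.8, 5.0, 1.9, 3.3, 4.4, 2.8, 3.6, 4.9, 3.1]  # hypothetical sample
rng = random.Random(3)

lower, upper = bootstrap_normal_interval(data, median, N=1000, alpha=0.05, rng=rng)
print(lower, upper)
```

The interval is centred at the original estimate, in line with the remark; the bootstrap resamples are used only for the spread.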

Estimation & Confidence Interval
Simultaneous Estimation of Several Parameters

Suppose we have a vector of k parameters θ = (θ1, θ2, …, θk). Our problem is to estimate the parameters simultaneously. Let $f_\theta(x_1, x_2, \dots, x_n) = L(x;\theta)$ denote the joint density, i.e. the likelihood.
We have the following regularity conditions:
i) Θ is a non-degenerate open interval in $R^k$.
ii) For almost all x and all θ ∈ Θ,
$$\frac{\partial f_\theta(x_1, x_2, \dots, x_n)}{\partial\theta_i} = \frac{\partial L(x;\theta)}{\partial\theta_i} \quad (i = 1, 2, \dots, k) \text{ exists.}$$
iii) For i = 1, 2, …, k,
$$\frac{\partial}{\partial\theta_i} \int_A L(x;\theta)\, dx = \int_A \frac{\partial}{\partial\theta_i} L(x;\theta)\, dx ,$$
where A is the domain of positive probability density.
iv) For i, j = 1, 2, …, k,
$$\frac{\partial}{\partial\theta_i} \int_A t_j\, L(x;\theta)\, dx = \int_A t_j\, \frac{\partial}{\partial\theta_i} L(x;\theta)\, dx ,$$
where t_j is the estimator of θ_j.
v) The elements of the matrix $\Delta_\theta = (\delta_{ij}(\theta))$, where
$$\delta_{ij}(\theta) = E_\theta\!\left[\frac{\partial \ln L(x;\theta)}{\partial\theta_i}\, \frac{\partial \ln L(x;\theta)}{\partial\theta_j}\right],$$
exist and are such that $\Delta_\theta$ is positive definite.
Theorem
In any regular estimation case, the variances and covariances $\sigma_{ij}(\theta)$ of unbiased estimators T_i (i = 1, 2, …, k) for θ_i (i = 1, 2, …, k) respectively satisfy the inequality
$$u' \Sigma_\theta u \ge u' \Delta_\theta^{-1} u ,$$
where $\Sigma_\theta = (\sigma_{ij}(\theta))$ and u is an arbitrary vector of real numbers.
Proof
Let the same symbol $\lambda_i(\theta)$ be used to denote $\frac{\partial}{\partial\theta_i} \log f_\theta(X_1, \dots, X_n)$ as well as $\frac{\partial}{\partial\theta_i} \log f_\theta(x_1, \dots, x_n)$. In this notation, condition (iii) becomes
$$\int_A \lambda_i(\theta)\, f_\theta(x_1, \dots, x_n)\, dx = 0, \qquad (i)$$
and condition (iv) leads to
$$\int_A t_i\, \lambda_j(\theta)\, f_\theta(x_1, \dots, x_n)\, dx = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \qquad (ii)$$
since T_i is unbiased for θ_i.
Let u1, …, uk be real numbers. Since T_i is unbiased for θ_i, $\sum_{i=1}^{k} u_i T_i$ is unbiased for $\sum_{i=1}^{k} u_i \theta_i$; i.e.,
$$\int_A \sum_{i=1}^{k} u_i t_i\, f_\theta(x_1, \dots, x_n)\, dx = \sum_{i=1}^{k} u_i \theta_i \quad (\text{for all } \theta \in \Theta).$$
This gives, on being differentiated with respect to θ_j,
$$\int_A \sum_{i=1}^{k} u_i t_i\, \lambda_j(\theta)\, f_\theta(x_1, \dots, x_n)\, dx = u_j \quad \text{because of (ii)}.$$
Because of (i) again,
$$\int_A \sum_{i=1}^{k} u_i [t_i - \theta_i]\, \lambda_j(\theta)\, f_\theta(x_1, \dots, x_n)\, dx = u_j .$$
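Taking another set of real numbers c1, c2, …, ck, we have
$$\int_A \sum_{i=1}^{k} \sum_{j=1}^{k} u_i [t_i - \theta_i]\, c_j \lambda_j(\theta)\, f_\theta(x_1, \dots, x_n)\, dx = \sum_{j=1}^{k} c_j u_j , \quad\text{i.e.,}\quad \sum_{i=1}^{k} \sum_{j=1}^{k} u_i c_j\, \mathrm{cov}_\theta\big(T_i, \lambda_j(\theta)\big) = \sum_{j=1}^{k} c_j u_j .$$
Noting that the left-hand side is the covariance between $\sum_{i=1}^{k} u_i T_i$ and $\sum_{i=1}^{k} c_i \lambda_i(\theta)$, and since
$$\left[\mathrm{cov}_\theta\!\left(\sum_{i=1}^{k} u_i T_i,\; \sum_{i=1}^{k} c_i \lambda_i(\theta)\right)\right]^2 \le \mathrm{Var}_\theta\!\left(\sum_{i=1}^{k} u_i T_i\right) \mathrm{Var}_\theta\!\left(\sum_{i=1}^{k} c_i \lambda_i(\theta)\right),$$
we have
$$\mathrm{var}_\theta\!\left(\sum_{i=1}^{k} u_i T_i\right) \ge \frac{\left(\sum_{i=1}^{k} c_i u_i\right)^2}{\mathrm{var}_\theta\!\left(\sum_{i=1}^{k} c_i \lambda_i(\theta)\right)} .$$
Let us now maximize the right-hand side with respect to the c's. Note that the right-hand side remains unchanged if the c's are multiplied by a common number, and that the maximizing c's must be such that the correlation between $\sum_i u_i T_i$ and $\sum_i c_i \lambda_i(\theta)$ is a maximum (i.e., unity):
$$\sum_{i=1}^{k} u_i [t_i - \theta_i] = \sum_{i=1}^{k} c_i \lambda_i(\theta)$$
$$\Rightarrow\; u_j \equiv \sum_{i=1}^{k} u_i\, \mathrm{cov}_\theta\big(T_i, \lambda_j(\theta)\big) = \sum_{i=1}^{k} c_i\, \mathrm{cov}_\theta\big(\lambda_i(\theta), \lambda_j(\theta)\big) \equiv \sum_{i=1}^{k} c_i \delta_{ij}(\theta) .$$
Hence the maximizing c's are such that (in matrix notation)
$$\Delta_\theta c = u \;\Rightarrow\; c = \Delta_\theta^{-1} u \quad (\text{because of (v)}),$$
and
$$c'u = u' \Delta_\theta^{-1} u , \qquad \mathrm{var}_\theta\!\left(\sum_{i=1}^{k} c_i \lambda_i(\theta)\right) = c' \Delta_\theta c = u' \Delta_\theta^{-1} u .$$
Hence
$$\mathrm{var}_\theta\!\left(\sum_{i=1}^{k} u_i T_i\right) \ge \frac{(u' \Delta_\theta^{-1} u)^2}{u' \Delta_\theta^{-1} u} = u' \Delta_\theta^{-1} u ,$$
that is,
$$u' \Sigma_\theta u \ge u' \Delta_\theta^{-1} u .$$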

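Problem: Consider the case of a random sample from a normal population whose mean (θ1) and variance (θ2) are both unknown.
Here,
$$f(x) = \frac{1}{\sqrt{2\pi\theta_2}}\, \exp\!\left[-\frac{1}{2}\, \frac{(x-\theta_1)^2}{\theta_2}\right], \qquad -\infty < x < \infty .$$
Now the likelihood function is as follows:
$$L(x;\theta_1,\theta_2) = \left(\frac{1}{\sqrt{2\pi\theta_2}}\right)^{n} \exp\!\left[-\frac{1}{2\theta_2} \sum (x_i-\theta_1)^2\right]$$
$$\Rightarrow\; \ln L = -\frac{n}{2} \ln 2\pi\theta_2 - \frac{1}{2\theta_2} \sum (x_i-\theta_1)^2 \qquad (i)$$
Now we differentiate it with respect to θ1 and get
$$\frac{\partial \ln L}{\partial\theta_1} = -\frac{2}{2\theta_2} \sum (x_i-\theta_1)(-1) = \frac{1}{\theta_2} \sum (x_i-\theta_1)$$
and
$$E\!\left[\frac{\partial \ln L}{\partial\theta_1}\right]^2 = \frac{1}{\theta_2^2}\, E\!\left[\sum (x_i-\theta_1)\right]^2 = \frac{1}{\theta_2^2} \sum E(x_i-\theta_1)^2 = \frac{1}{\theta_2^2}\, n\theta_2 = \frac{n}{\theta_2} .$$
Again we differentiate eq (i) with respect to θ2 and get
$$\frac{\partial \ln L}{\partial\theta_2} = -\frac{n}{2}\, \frac{1}{\theta_2} + \frac{1}{2\theta_2^2} \sum (x_i-\theta_1)^2 = \frac{1}{2\theta_2}\left[-n + \frac{1}{\theta_2} \sum (x_i-\theta_1)^2\right],$$
$$E\!\left[\frac{\partial \ln L}{\partial\theta_2}\right]^2 = \frac{n}{2\theta_2^2}, \qquad E\!\left[\frac{\partial \ln L}{\partial\theta_1} \cdot \frac{\partial \ln L}{\partial\theta_2}\right] = 0 .$$
Hence the lower bounds for the variances of unbiased estimators of θ1 and θ2 are the diagonal elements of $\Delta_\theta^{-1}$,
$$\delta^{11}(\theta) = \frac{\theta_2}{n} \quad\text{and}\quad \delta^{22}(\theta) = \frac{2\theta_2^2}{n}$$
respectively.
The traditional unbiased estimators for θ1 and θ2 are $\bar X$ and $S^2 = \frac{\sum (X_i - \bar X)^2}{n-1}$. Since $\mathrm{Var}_\theta[\bar X] = \frac{\theta_2}{n}$ while $\mathrm{Var}_\theta[S^2] = \frac{2\theta_2^2}{n-1}$, the lower bound in the first case is attained but not that in the second.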
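As a compact restatement, the two bounds can be read off the information matrix of the theorem. The only step added here is the choice of u as the two coordinate vectors, which is an illustrative way of isolating each variance.

```latex
\Delta_\theta =
\begin{pmatrix}
  \dfrac{n}{\theta_2} & 0 \\[2mm]
  0 & \dfrac{n}{2\theta_2^{2}}
\end{pmatrix},
\qquad
\Delta_\theta^{-1} =
\begin{pmatrix}
  \dfrac{\theta_2}{n} & 0 \\[2mm]
  0 & \dfrac{2\theta_2^{2}}{n}
\end{pmatrix}.

% u = (1, 0)' isolates the variance of T_1; u = (0, 1)' isolates that of T_2.
u = (1,0)' :\quad \mathrm{Var}_\theta(T_1) \;\ge\; \frac{\theta_2}{n},
\qquad
u = (0,1)' :\quad \mathrm{Var}_\theta(T_2) \;\ge\; \frac{2\theta_2^{2}}{n}.
```

Because the off-diagonal entry of the matrix is zero, each bound here coincides with the single-parameter bound.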
Vector of Parameters
Let us assume that a random sample X1, X2, …, Xn of size n from the density f(x ; θ1, θ2, …, θk) is available, where the parameter θ = (θ1, θ2, …, θk) and the parameter space Θ are k-dimensional. We want to simultaneously estimate τ1(θ), τ2(θ), …, τr(θ), where τj(θ), j = 1, …, r, is some function of θ = (θ1, …, θk). Often k = r, but this need not be the case. An important special case is the estimation of θ = (θ1, θ2, …, θk) itself; then r = k and τ1(θ) = θ1, …, τk(θ) = θk. Another important special case is the estimation of a single function τ(θ); then r = 1. A point estimator of (τ1(θ), …, τr(θ)) is a vector of statistics, say (T1, …, Tr), where Tj = tj(X1, …, Xn) and Tj is an estimator of τj(θ).

Unbiased
An estimator (T1, …, Tr), where Tj = tj(X1, …, Xn), j = 1, …, r, is defined to be an unbiased estimator of (τ1(θ), …, τr(θ)) if and only if $E_\theta[T_j] = \tau_j(\theta)$ for j = 1, …, r and for all θ ∈ Θ.
For a single estimator, we take the variance of the estimator as a measure of its closeness to the real-valued function of the population parameter being estimated. Here, we seek a generalization of the notion of variance to r dimensions. Several such generalizations have been proposed; we consider here only four of them:
i) Vector of variances.
ii) Linear combination of variances.
iii) Ellipsoid of concentration.
iv) Wilks’ generalized variance.
1. Vector of Variances
Let the vector $(\mathrm{var}_\theta[T_1], \dots, \mathrm{var}_\theta[T_r])$ be a measure of the closeness of the estimator (T1, …, Tr) to (τ1(θ), …, τr(θ)). Its main advantage is that it is very simple. Its disadvantage is that the measure is vector-valued and consequently sometimes difficult to work with.
2. Linear Combination of Variances

One way to overcome the disadvantage of method (1) is to use a linear combination of variances, that is, to measure the closeness of the estimator (T1, …, Tr) to (τ1(θ), …, τr(θ)) with $\sum_{j=1}^{r} a_j\, \mathrm{var}_\theta[T_j]$ for suitably chosen $a_j \ge 0$.
Both generalizations (1) and (2) of variance embody only the variances of the T_j, j = 1, …, r. But the T_j are likely to be correlated, so one should also incorporate the covariances of the T_j's when measuring closeness.
3. Ellipsoid of Concentration
Let (T1, …, Tr) be an unbiased estimator of (τ1(θ), …, τr(θ)). Let $\sigma^{ij}(\theta)$ be the ij-th element of the inverse of the covariance matrix of (T1, …, Tr), where the ij-th element of the covariance matrix is $\sigma_{ij}(\theta) = \mathrm{cov}_\theta[T_i, T_j]$. The ellipsoid of concentration of (T1, …, Tr) is defined as the interior and boundary of the ellipsoid
$$\sum_{i=1}^{r} \sum_{j=1}^{r} \sigma^{ij}(\theta)\, [t_i - \tau_i(\theta)]\, [t_j - \tau_j(\theta)] = r + 2 .$$
The ellipsoid of concentration measures how concentrated the distribution of (T1, …, Tr) is about (τ1(θ), …, τr(θ)). The distribution of an estimator (T1, …, Tr) whose ellipsoid of concentration is contained within the ellipsoid of concentration of another estimator (T1′, …, Tr′) is more highly concentrated about (τ1(θ), …, τr(θ)) than is the distribution of (T1′, …, Tr′).
4. Wilks’ Generalized Variance

Let (T1, …, Tr) be an unbiased estimator of (τ1(θ), …, τr(θ)). Wilks' generalized variance of (T1, …, Tr) is defined to be the determinant of the covariance matrix of (T1, …, Tr).

Risk Function
R_d(θ) = Expected loss = E[w(θ, d(x))]
A smaller risk is desirable: the smaller the risk, the better the estimator.
Minimax Estimator
If a random variable X has a density function f(θ ; x) and d(x) is some estimate of θ, then the risk function is
R_d(θ) = E[w(θ, d(x))]
A minimax estimator d(x) is any estimator which minimizes the supremum $\sup_\theta R_d(\theta)$.
Properties of Minimax Estimator

i) If T* = t*(X1, X2, …, Xn) is a Bayes estimator having constant risk, i.e., $R_{t^*}(\theta)$ = constant, then T* is a minimax estimator.
ii) If t_n′ dominates a minimax estimator t_n, then t_n′ is also minimax.
iii) If an estimator has constant risk and is admissible, it is minimax.
Properties of Admissible Estimator

i) If the loss function L is strictly convex, then every admissible estimator must be non-randomized.
ii) If L is strictly convex, t is an admissible estimator, and t′ is another estimator with the same risk, i.e. R(θ, t) = R(θ, t′) for all θ, then t = t′ with probability 1.
iii) Any unique Bayes estimator is admissible (here uniqueness means that any two Bayes estimators differ only on a set N with $P_\theta(N) = 0$ for all θ ∈ Θ).
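Problem: Let x1, …, xn be n independent normal random variables with distribution N(θ, 1), and let the loss function be squared error. Find the minimax estimator of the mean θ.
Solution
Consider a sequence of normal prior distributions for θ with mean 0 and variance σ². The posterior density of θ is
$$p(\theta \mid x_1, \dots, x_n) = \frac{p(\theta, x_1, \dots, x_n)}{p(x_1, \dots, x_n)} .$$
For n = 1, x = x1:
$$p(\theta \mid x) = \frac{N(x;\theta,1)\, N(\theta;0,\sigma^2)}{\int_{-\infty}^{\infty} N(x;\theta,1)\, N(\theta;0,\sigma^2)\, d\theta}
= \sqrt{\frac{1+\sigma^2}{2\pi\sigma^2}}\; \exp\!\left[-\frac{1}{2}\, \frac{1+\sigma^2}{\sigma^2} \left(\theta - \frac{\sigma^2}{1+\sigma^2}\, x\right)^{2}\right].$$
So the Bayes estimator under squared-error loss is
$$d(x) = E(\theta \mid x) = \frac{x\sigma^2}{1+\sigma^2},$$
with posterior variance
$$V(\theta \mid x) = \frac{\sigma^2}{1+\sigma^2}, \quad\text{so that}\quad \sup_{\sigma} V(\theta \mid x) = \sup_{\sigma} \frac{\sigma^2}{1+\sigma^2} = \sup_{\sigma} \left(\frac{1}{1+\frac{1}{\sigma^2}}\right) = 1 ,$$
and
$$\lim_{\sigma\to\infty} d(x) = \lim_{\sigma\to\infty} \frac{\sigma^2 x}{1+\sigma^2} = \lim_{\sigma\to\infty} \left[\frac{1}{\frac{1}{\sigma^2}+1}\right] x = x .$$
The estimator x has constant risk E(x − θ)² = 1, which equals the limit of the Bayes risks σ²/(1 + σ²) as σ → ∞. Hence x is a minimax estimator.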
Problem: Find the minimax estimator of θ in sampling from the Bernoulli distribution using a squared error loss function.
Solution
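A Bayes estimator with respect to a beta prior with parameters a and b is given by
$$\frac{\int_0^1 \theta\, \theta^{\sum x_i} (1-\theta)^{n-\sum x_i}\, \frac{1}{B(a,b)}\, \theta^{a-1} (1-\theta)^{b-1}\, d\theta}{\int_0^1 \theta^{\sum x_i} (1-\theta)^{n-\sum x_i}\, \frac{1}{B(a,b)}\, \theta^{a-1} (1-\theta)^{b-1}\, d\theta}
= \frac{\int_0^1 \theta^{\sum x_i + a} (1-\theta)^{n-\sum x_i + b - 1}\, d\theta}{\int_0^1 \theta^{\sum x_i + a - 1} (1-\theta)^{n-\sum x_i + b - 1}\, d\theta}$$
$$= \frac{B\big(\sum x_i + a + 1,\; n - \sum x_i + b\big)}{B\big(\sum x_i + a,\; n - \sum x_i + b\big)}
= \frac{\Gamma(\sum x_i + a + 1)\, \Gamma(n - \sum x_i + b)}{\Gamma(n + a + b + 1)} \times \frac{\Gamma(n + a + b)}{\Gamma(\sum x_i + a)\, \Gamma(n - \sum x_i + b)}
= \frac{\sum x_i + a}{n + a + b} .$$
So, the Bayes estimator with respect to a beta prior distribution with parameters a and b is given by
$$t^*(x_1, x_2, \dots, x_n) = \frac{\sum x_i + a}{n+a+b} = \frac{\sum x_i}{n+a+b} + \frac{a}{n+a+b} \qquad (i)$$
$$\Rightarrow\; t^*_{AB}(x_1, x_2, \dots, x_n) = A \sum x_i + B, \qquad \left[A = \frac{1}{n+a+b} \text{ and } B = \frac{a}{n+a+b}\right].$$
Risk of the estimator:
$$\Re_{t^*_{AB}}(\theta) = E\Big[\big(A \textstyle\sum x_i + B - \theta\big)^2\Big]
= E\Big[\big\{A(\textstyle\sum x_i - n\theta) + B - \theta + nA\theta\big\}^2\Big]$$
$$= A^2 E\big[(\textstyle\sum x_i - n\theta)^2\big] + (B - \theta + nA\theta)^2 + 2(B - \theta + nA\theta)\, A\, \big\{\textstyle\sum E(x_i) - n\theta\big\}$$
$$= A^2 E\big[(\textstyle\sum x_i - n\theta)^2\big] + (B - \theta + nA\theta)^2
= A^2 n^2 E\big[(\bar x_n - \theta)^2\big] + (B - \theta + nA\theta)^2$$
$$= A^2 n^2\, \frac{\theta(1-\theta)}{n} + (B - \theta + nA\theta)^2
= A^2 n\theta(1-\theta) + \big(B + \theta(nA - 1)\big)^2$$
$$= A^2 n\theta - A^2 n\theta^2 + B^2 + \theta^2 (nA-1)^2 + 2(nA-1)B\theta$$
$$= \theta^2 \big[(nA-1)^2 - nA^2\big] + \theta \big[nA^2 + 2(nA-1)B\big] + B^2$$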
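This risk will be constant (free of θ) if $(nA-1)^2 - nA^2 = 0$ and $nA^2 + 2(nA-1)B = 0$.
From the first condition,
$$A^2 (n^2 - n) - 2nA + 1 = 0$$
$$\Rightarrow\; A = \frac{2n \pm \sqrt{4n^2 - 4 \cdot 1 \cdot (n^2 - n)}}{2(n^2 - n)} = \frac{2n \pm 2\sqrt{n}}{2\big(n^2 - (\sqrt{n})^2\big)}
= \frac{n \pm \sqrt{n}}{(n + \sqrt{n})(n - \sqrt{n})} = \frac{1}{n \mp \sqrt{n}} = \frac{1}{\sqrt{n}\,(\sqrt{n} \mp 1)} .$$
Again, from $nA^2 + 2(nA-1)B = 0$,
$$B = \frac{nA^2}{2(1 - nA)} = \frac{\left(\frac{1}{\sqrt{n}+1}\right)^{2}}{2\left(1 - \frac{\sqrt{n}}{\sqrt{n}+1}\right)} = \frac{1}{2(\sqrt{n}+1)} \qquad \text{for } A = \frac{1}{\sqrt{n}\,(\sqrt{n}+1)} .$$
Now, $A = \frac{1}{n+a+b}$, so
$$\frac{1}{\sqrt{n}\,(\sqrt{n}+1)} = \frac{1}{n+a+b}
\;\Rightarrow\; n + a + b = \sqrt{n}\,(\sqrt{n}+1)
\;\Rightarrow\; a + b = n + \sqrt{n} - n
\;\Rightarrow\; a + b = \sqrt{n} .$$
Again $B = \frac{a}{n+a+b}$, so
$$\frac{1}{2(\sqrt{n}+1)} = \frac{a}{n+a+b}
\;\Rightarrow\; n + a + b = 2a(\sqrt{n}+1)
\;\Rightarrow\; n + \sqrt{n} = 2a(\sqrt{n}+1)$$
$$\therefore\; a = \frac{n + \sqrt{n}}{2(\sqrt{n}+1)} = \frac{\sqrt{n}\,(\sqrt{n}+1)}{2(\sqrt{n}+1)} = \frac{\sqrt{n}}{2}, \qquad b = \sqrt{n} - a = \frac{\sqrt{n}}{2} .$$
So our estimator is
$$\frac{\sum x_i + a}{n+a+b} = \frac{\sum x_i + \frac{\sqrt{n}}{2}}{n + \sqrt{n}} .$$
So, this is the Bayes estimator with constant risk. Hence it is the minimax estimator.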
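The constant-risk property can be verified exactly by summing over the binomial distribution of ∑x_i. The sketch below uses n = 16 as an illustrative sample size; for this estimator the risk should equal B² = 1/(4(√n + 1)²) at every θ.

```python
import math

def minimax_estimate(s, n):
    """Estimator (s + sqrt(n)/2) / (n + sqrt(n)), where s is the number of successes."""
    root = math.sqrt(n)
    return (s + root / 2.0) / (n + root)

def risk(theta, n):
    """Exact squared-error risk: sum over s of Binomial(n, theta) pmf * (estimate - theta)^2."""
    total = 0.0
    for s in range(n + 1):
        pmf = math.comb(n, s) * theta ** s * (1.0 - theta) ** (n - s)
        total += pmf * (minimax_estimate(s, n) - theta) ** 2
    return total

n = 16  # illustrative sample size
constant = 1.0 / (4.0 * (math.sqrt(n) + 1.0) ** 2)   # B^2, equal to 0.01 for n = 16
risks = [risk(theta, n) for theta in (0.1, 0.3, 0.5, 0.7, 0.9)]

print(constant, risks)
```

For comparison, the sample mean has risk θ(1 − θ)/n, which is smaller than this constant near θ = 0 or 1 and larger near θ = 1/2.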
Bayesian Confidence Interval
Example: Assuming each item coming off a production line either is or is not defective. So, we can call each item a
Bernoulli trial. Assume again the trials are independent with P ⎣⎡defective⎦⎤ = θ for each trial.
If we select n times from the production line and then,
⎧1 if the item i is defective

Let , X i = ⎨
⎩0 if the item i is not defective
The X1, X 2 , , X n is a random sample of a random variable x with parameter θ . We know that the conjugate prior
of θ is a beta density with parameters a and b .
For example, for our production line suppose our prior information suggest
E ⎡⎣θ ⎤⎦ = 0.01 Var ⎡⎣θ ⎤⎦ = 0.0001
The larger we take Var (θ ) , the less sure we are of our prior of our information. Thus we determine a & b .
a
= 0.01
a+b
⇒ a = 0.0101b
ab
= 0.0001
( a + b ) ( a + b + 1)
2
⇒ {
ab = 0.0001 ( 0.0101b + b ) ( 0.0101b + b + 1)
2
}
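Now, if we observe $\sum X_i = \sum x_i$ from the sample, the posterior distribution of θ is again a beta
distribution, with parameters $a + \sum x_i$ and $b + n - \sum x_i$:
\[ f\left(\theta \mid \textstyle\sum x_i\right) = \frac{\Gamma(a+b+n)}{\Gamma\left(a+\sum x_i\right)\Gamma\left(b+n-\sum x_i\right)}\;\theta^{\,a+\sum x_i-1}\,(1-\theta)^{\,b+n-\sum x_i-1} \]
Thus the Bayes estimator of θ is the mean of this posterior distribution, i.e.,
\[ \theta^* = E\left[\theta \mid \textstyle\sum x_i\right] = \frac{a+\sum x_i}{\left(a+\sum x_i\right) + \left(b+n-\sum x_i\right)} = \frac{a+\sum x_i}{a+b+n} \]
so that, for the production line example,
\[ \text{Bayes estimator of } \theta = \frac{\sum x_i + 0.98}{n + 97.02 + 0.98} \]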
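The arithmetic of this example can be reproduced with a short sketch. The helper names `beta_from_moments` and `posterior_mean`, and the observed counts used at the end (2 defectives in 100 items), are hypothetical and serve only as an illustration.

```python
def beta_from_moments(mean, var):
    """Solve a/(a+b) = mean and ab/((a+b)^2 (a+b+1)) = var for (a, b)."""
    total = mean * (1 - mean) / var - 1      # this is a + b
    return mean * total, (1 - mean) * total

def posterior_mean(a, b, n, defectives):
    """Mean of the Beta(a + sum x_i, b + n - sum x_i) posterior."""
    return (a + defectives) / (a + b + n)

a, b = beta_from_moments(0.01, 0.0001)
print(a, b)                                  # prior parameters

# Hypothetical data: 2 defectives observed among n = 100 items.
print(posterior_mean(a, b, 100, 2))
```

The first function uses the identity a + b = E[θ](1 − E[θ]) / Var[θ] − 1, which follows from the two moment equations of the beta density.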
Bayesian Interval
Given a random sample of a random variable, a confidence interval can be computed, and in a sense we are
100(1 − α)% sure that the observed confidence interval covers the true unknown parameter value. A very similar
construction is available in the Bayesian approach. Suppose we are given a random sample of a
random variable X whose distribution depends on an unknown parameter θ. The parameter θ has a prior density
$f_\theta(\theta)$. Once the sample values $x_1, x_2, \ldots, x_n$ are known, we can compute the posterior density $f_{\theta\mid x}(\theta \mid x)$, which
summarizes all the current information about θ. If $c_1 < c_2$ are two constants such that
\[ P\left[c_1 \le \theta \le c_2 \mid x\right] = 1 - \alpha \]
then we are 100(1 − α)% sure that $(c_1, c_2)$ includes θ given the sample values. We call such an interval $(c_1, c_2)$ a
100(1 − α)% Bayesian interval for θ.

Approximate confidence intervals in large samples:
We know that, under certain regularity conditions, MLEs are asymptotically normal with mean θ and asymptotic
variance
\[ \sigma_n^2(\theta) = \frac{1}{n\,E_\theta\left[\left(\dfrac{\partial \ln f(x,\theta)}{\partial\theta}\right)^2\right]} = -\frac{1}{n\,E_\theta\left[\dfrac{\partial^2 \ln f(x,\theta)}{\partial\theta^2}\right]} \]
When such an asymptotically normal estimator $T_n$ exists, $\dfrac{T_n-\theta}{\sigma_n(\theta)}$ may be taken as a pivotal quantity, and a
100(1 − α)% C.I of θ may be taken as approximately
\[ \left[\,T_n + z_{\alpha/2}\,\sigma_n(\theta),\ \ T_n + z_{1-\alpha/2}\,\sigma_n(\theta)\,\right] \]
where $z_p$ is the point of N(0, 1) with probability p to its left.
The above method provides a large sample confidence interval so long as $\dfrac{T_n-\theta}{\sigma_n(\theta)}$ can be inverted.
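Example: Let $X_1, \ldots, X_n$ be a random sample drawn from $N(0, \sigma^2)$. Here, $\theta = \sigma^2$. Find a central C.I for $\sigma^2$ with an
approximate confidence coefficient 1 − α.
Solution:
The probability density function is
\[ f(x; 0, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2} \]
The likelihood function is given by
\[ L = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^n e^{-\frac{1}{2}\frac{\sum x_i^2}{\sigma^2}} = \left(\frac{1}{\sqrt{2\pi\theta}}\right)^n e^{-\frac{1}{2}\frac{\sum x_i^2}{\theta}} \]
Taking logarithms on both sides,
\[ \ln L = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\theta - \frac{1}{2}\frac{\sum x_i^2}{\theta} \]
Now,
\[ \frac{\partial \ln L}{\partial\theta} = 0 \ \Rightarrow\ -\frac{n}{2}\frac{1}{\theta} + \frac{1}{2}\frac{\sum x_i^2}{\theta^2} = 0 \ \Rightarrow\ \hat\theta = \frac{\sum x_i^2}{n} \]
Again,
\[ \frac{\partial^2 \ln L}{\partial\theta^2} = \frac{n}{2\theta^2} - \frac{2\sum x_i^2}{2\theta^3} \]
\[ E\left[\frac{\partial^2 \ln L}{\partial\theta^2}\right] = \frac{n}{2\theta^2} - \frac{2n\theta}{2\theta^3} = -\frac{n}{2\theta^2} \]
\[ \therefore\ -E\left[\frac{\partial^2 \ln L}{\partial\theta^2}\right] = \frac{n}{2\theta^2} \]
We have,
\[ \sigma_n^2 = \frac{1}{-E\left[\dfrac{\partial^2 \ln L}{\partial\theta^2}\right]} = \frac{2\theta^2}{n} \]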
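∴ An approximate 100(1 − α)% confidence interval for $\sigma^2$ is given by
\[ \left[\frac{\sum x_i^2}{n} + z_{\alpha/2}\sqrt{\frac{2\sigma^4}{n}},\ \ \frac{\sum x_i^2}{n} + z_{1-\alpha/2}\sqrt{\frac{2\sigma^4}{n}}\right] \]
\[ \Rightarrow\ \left[\frac{\sum x_i^2}{n} + z_{\alpha/2}\sqrt{\frac{2}{n}}\,\theta,\ \ \frac{\sum x_i^2}{n} + z_{1-\alpha/2}\sqrt{\frac{2}{n}}\,\theta\right] \qquad \left[\because \theta = \sigma^2\right] \]
Replacing θ by its estimate,
\[ \Rightarrow\ \left[\frac{\sum x_i^2}{n} + z_{\alpha/2}\sqrt{\frac{2}{n}}\,\frac{\sum x_i^2}{n},\ \ \frac{\sum x_i^2}{n} + z_{1-\alpha/2}\sqrt{\frac{2}{n}}\,\frac{\sum x_i^2}{n}\right] \qquad \left[\because \hat\theta = \frac{\sum x_i^2}{n}\right] \]
\[ \Rightarrow\ \left[\frac{\sum x_i^2}{n}\left\{1 + z_{\alpha/2}\sqrt{\frac{2}{n}}\right\},\ \ \frac{\sum x_i^2}{n}\left\{1 + z_{1-\alpha/2}\sqrt{\frac{2}{n}}\right\}\right] \]
Alternatively, solving the inequalities $z_{\alpha/2} \le \dfrac{\hat\theta - \theta}{\theta\sqrt{2/n}} \le z_{1-\alpha/2}$ for θ directly, without replacing θ by its estimate, gives
\[ \therefore\ \left[\frac{\sum x_i^2 / n}{1 + z_{1-\alpha/2}\sqrt{\dfrac{2}{n}}},\ \ \frac{\sum x_i^2 / n}{1 + z_{\alpha/2}\sqrt{\dfrac{2}{n}}}\right] \]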
A C.I obtained in this way is not invariant under transformation of the parameter. Thus, taking square roots of its limits does not
give the C.I of σ that the same method yields when σ is treated as the parameter.
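The lack of invariance can be seen numerically. The sketch below makes several assumptions that are not in the notes: the helper name `pivot_interval`, the values n = 50 and ∑x_i² = 200, and the use of the asymptotic variance σ²/(2n) for the MLE of σ (which follows from the Fisher information 2n/σ² for σ in this model). It inverts the pivot (estimate − parameter) / (parameter × relative standard error) once for σ² and once for σ.

```python
import math
from statistics import NormalDist

def pivot_interval(estimate, rel_se, alpha=0.05):
    """Invert (estimate - theta) / (theta * rel_se) ~ N(0, 1) for theta."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate / (1 + z * rel_se), estimate / (1 - z * rel_se)

n = 50                         # hypothetical sample size
sum_sq = 200.0                 # hypothetical value of sum(x_i^2)
theta_hat = sum_sq / n         # MLE of sigma^2
sigma_hat = math.sqrt(theta_hat)

ci_var = pivot_interval(theta_hat, math.sqrt(2 / n))      # parameter sigma^2
ci_sd = pivot_interval(sigma_hat, 1 / math.sqrt(2 * n))   # parameter sigma

sqrt_of_ci_var = tuple(math.sqrt(v) for v in ci_var)
print(ci_sd, sqrt_of_ci_var)   # the two pairs of limits do not coincide
```

The square roots of the limits for σ² differ from the limits obtained for σ directly, which is the non-invariance referred to above.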
We now consider the construction of large sample confidence intervals which are invariant under transformation of the
parameter.
Suppose that the p.d.f. $f(x;\theta)$ is such that
\[ E\left[\frac{\partial \ln f(x;\theta)}{\partial\theta}\right] = 0 \quad \& \quad V\left[\frac{\partial \ln f(x;\theta)}{\partial\theta}\right] = -E\left[\frac{\partial^2 \ln f(x;\theta)}{\partial\theta^2}\right] = k^2\ (\text{say}) < \infty \]
Let $X_1, \ldots, X_n$ be a random sample of size n drawn from $f(x;\theta)$ and $L = \prod_{i=1}^{n} f(x_i;\theta)$. Clearly then, each of the
random variables $\dfrac{\partial \ln f(X_i;\theta)}{\partial\theta}$ $(i = 1, \ldots, n)$ has mean zero and variance $k^2$. Therefore, by the central limit
theorem, their sample mean satisfies, for large n,
\[ \frac{1}{n}\frac{\partial \ln L}{\partial\theta} \sim N\left(0, \frac{k^2}{n}\right) \quad \text{i.e.,} \quad \frac{\dfrac{\partial \ln L}{\partial\theta}}{\sqrt{-E\left(\dfrac{\partial^2 \ln L}{\partial\theta^2}\right)}} \sim N(0,1) \qquad (i) \]
Using this property one can get a large sample C.I for θ. Note that the maximum likelihood estimate of θ has not
been used here.
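Let Φ(θ) be a strictly increasing function of θ. Now,
\[ \frac{\partial \ln L}{\partial\theta} = \frac{\partial \ln L}{\partial\Phi}\,\frac{\partial\Phi}{\partial\theta} \]
\[ E\left[\frac{\partial^2 \ln L}{\partial\theta^2}\right] = E\left[\frac{\partial^2 \ln L}{\partial\Phi^2}\right]\left(\frac{\partial\Phi}{\partial\theta}\right)^2 + E\left[\frac{\partial \ln L}{\partial\Phi}\right]\frac{\partial^2\Phi}{\partial\theta^2} \]
Hence,
\[ E\left[\frac{\partial^2 \ln L}{\partial\theta^2}\right] = \left(\frac{\partial\Phi}{\partial\theta}\right)^2 E\left[\frac{\partial^2 \ln L}{\partial\Phi^2}\right] \qquad \left[\text{since } E\left[\frac{\partial \ln L}{\partial\Phi}\right] = 0\right] \]
so the statistic in (i) takes the same value whether it is written in terms of θ or of Φ. Therefore, if $(\theta_1, \theta_2)$ is a C.I for θ then $\{\Phi(\theta_1), \Phi(\theta_2)\}$ is a C.I for Φ(θ).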

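Example: Let $X_1, \ldots, X_n$ be a random sample from a $N(0, \sigma^2)$ population. Here
\[ \frac{\partial \ln L}{\partial\sigma} = -\frac{n}{\sigma} + \frac{\sum x_i^2}{\sigma^3} \quad \text{and} \quad E\left(\frac{\partial^2 \ln L}{\partial\sigma^2}\right) = -\frac{2n}{\sigma^2} \]
Hence
\[ \frac{\dfrac{\sum X_i^2}{\sigma^2} - n}{\sqrt{2n}} \sim N(0, 1) \]
A central 100(1 − α)% confidence interval for σ is, therefore,
\[ \left\{\left(\frac{\sum X_i^2}{n + z_{\alpha/2}\sqrt{2n}}\right)^{1/2},\ \ \left(\frac{\sum X_i^2}{n - z_{\alpha/2}\sqrt{2n}}\right)^{1/2}\right\} \qquad (1) \]
where $z_{\alpha/2}$ is the upper α/2 point of N(0, 1).
If the variance $\sigma^2$ is treated as the parameter, then the method yields the 100(1 − α)% confidence interval for $\sigma^2$ as
\[ \left\{\frac{\sum X_i^2 / n}{1 + z_{\alpha/2}\sqrt{\dfrac{2}{n}}},\ \ \frac{\sum X_i^2 / n}{1 - z_{\alpha/2}\sqrt{\dfrac{2}{n}}}\right\} \]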
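The invariance of these score-based intervals can be checked with a small sketch. The function names and the values n = 50 and ∑X_i² = 200 are hypothetical; the check squares the limits for σ and compares them with the limits for σ².

```python
import math
from statistics import NormalDist

def score_ci_sigma(sum_sq, n, alpha=0.05):
    """Limits for sigma from (sum X_i^2 / sigma^2 - n) / sqrt(2n) ~ N(0, 1)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)      # upper alpha/2 point
    return (math.sqrt(sum_sq / (n + z * math.sqrt(2 * n))),
            math.sqrt(sum_sq / (n - z * math.sqrt(2 * n))))

def score_ci_sigma2(sum_sq, n, alpha=0.05):
    """Limits for sigma^2 when the variance is treated as the parameter."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    mean_sq = sum_sq / n
    return (mean_sq / (1 + z * math.sqrt(2 / n)),
            mean_sq / (1 - z * math.sqrt(2 / n)))

lo_sd, hi_sd = score_ci_sigma(200.0, 50)         # hypothetical data summary
lo_var, hi_var = score_ci_sigma2(200.0, 50)
print(lo_sd**2, lo_var, hi_sd**2, hi_var)        # squared sigma limits match
```

The agreement holds because n·z√(2/n) = z√(2n), so the two expressions describe the same set of σ values.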
It may be noted that the large sample confidence intervals based on maximum likelihood estimators will be shorter
on average than the large sample confidence intervals based on any other estimator.
Confidence Belt
Let T be a statistic whose distribution depends on θ , preferably a sufficient statistic for θ . For each θ , let us
determine the values t1 (θ ) and t2 (θ ) such that
Pθ ⎡⎣T < t1 (θ ) ⎤⎦ = α1 and Pθ ⎡⎣T > t2 (θ ) ⎤⎦ = α 2
where $\alpha_1 + \alpha_2 = \alpha$. Supposing Θ is a non-degenerate real interval, by varying θ we shall get two curves from $t_1(\theta)$
and $t_2(\theta)$. The first curve, $C_1$, has the equation $t = t_1(\theta)$ and the second, $C_2$, the equation $t = t_2(\theta)$. Let the two
curves be as in the figure, so that any line drawn perpendicular to the t-axis intersects both the curves.
[Figure: confidence belt bounded by the curves C1 and C2, with t on the horizontal axis and θ on the vertical axis]
Let us denote the ordinate of the point of intersection of this line with $C_1$ by
$\theta_2(t)$ and that of the point of intersection of the line with $C_2$ by $\theta_1(t)$, so that $\theta_1(t) < \theta_2(t)$.
Consider now the two random variables $\theta_1(T)$ and $\theta_2(T)$, which are so defined that for
$T = t$, $\theta_1(T) = \theta_1(t)$ and $\theta_2(T) = \theta_2(t)$. From the way $\theta_1(T)$ and $\theta_2(T)$ have been obtained, it is obvious that
\[ \theta_1(T) \le \theta \le \theta_2(T) \quad \text{iff} \quad t_1(\theta) \le T \le t_2(\theta). \]

As such,
Pθ ⎡⎣θ1 (T ) ≤ θ ≤ θ 2 (T ) ⎤⎦ = Pθ ⎡⎣t1 (θ ) ≤ T ≤ t2 (θ ) ⎤⎦ = 1 − α ∀ θ ∈Θ
Hence given a set of observations X , if t denotes the corresponding value of T , then θ1 ( t ) and θ 2 ( t ) are a pair
of confidence limits to θ with confidence coefficient 1 − α .
The region in the (T , θ ) - plane which is bounded by the two curves C1 and C2 is called a confidence belt for θ
corresponding to the confidence coefficient 1 − α .
Example: Suppose $(X_i, Y_i)$, $i = 1, 2, \ldots, 20$, is a random sample drawn from a bivariate normal distribution
$BIV(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho)$ where all the parameters are unknown. We want to set confidence limits for ρ.
Solution
We know that the sample correlation coefficient is
\[ r = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum (X_i - \bar X)^2 \sum (Y_i - \bar Y)^2}} \]
The distribution of r depends on ρ only. Let α = 0.05. From the tables of the correlation coefficient by F.N David
et al., we may obtain, for each ρ, the values $r_1(\rho)$ and $r_2(\rho)$ of r such that
\[ P_\rho\left[r < r_1(\rho)\right] = P_\rho\left[r > r_2(\rho)\right] = 0.025. \]
These are shown in the following table for the values of ρ from −0.9 to 0.9 at intervals of 0.1 .
Table: Values of r1(ρ) and r2(ρ) for n = 20
    ρ       r1(ρ)       r2(ρ)
  −0.9    −0.97065    −0.77222
  −0.8    −0.92223    −0.56661
  −0.7    −0.92289    −0.38984
  −0.6    −0.83500    −0.22886
  −0.5    −0.78095    −0.08607
  −0.4    −0.72585     0.04226
  −0.3    −0.71366     0.15830
  −0.2    −0.59586     0.26394
  −0.1    −0.52565     0.35862
   0      −0.44486     0.44486
   0.1    −0.35862     0.52565
   0.2    −0.26394     0.59586
   0.3    −0.15880     0.71366
   0.4    −0.04226     0.72585
   0.5     0.08607     0.78095
   0.6     0.22886     0.83500
   0.7     0.38984     0.92289
   0.8     0.56661     0.92223
   0.9     0.77222     0.97065
Now, given the observed value of r for a particular random sample of size 20 from the bivariate normal distribution,
we can obtain the confidence limits to ρ with confidence coefficient 1 − 2 × 0.025 = 0.95. Suppose, e.g., the
observed value of r is 0.55. Treating 0.55 as a value of $r_2(\rho)$, we find, by inverse interpolation, the
corresponding value of ρ to be 0.135. Similarly, treating 0.55 as a value of $r_1(\rho)$, the corresponding value of ρ
is found to be 0.791. Hence for this value of r, the 95% confidence limits to ρ are 0.135 and 0.791.
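The inverse interpolation can be sketched as follows, assuming linear interpolation between the two tabulated points that bracket the observed r (the notes do not state which interpolation rule was used). The function name `inverse_interpolate` is hypothetical; the four table entries are taken from the table above.

```python
def inverse_interpolate(points, target):
    """Given (rho, r) pairs with r increasing in rho, find rho with r(rho) = target
    by linear interpolation between the two bracketing points."""
    for (rho0, r0), (rho1, r1) in zip(points, points[1:]):
        if r0 <= target <= r1:
            return rho0 + (rho1 - rho0) * (target - r0) / (r1 - r0)
    raise ValueError("target lies outside the tabulated range")

# Bracketing entries of the table for the observed r = 0.55.
r2_points = [(0.1, 0.52565), (0.2, 0.59586)]   # (rho, r2(rho))
r1_points = [(0.7, 0.38984), (0.8, 0.56661)]   # (rho, r1(rho))

lower = inverse_interpolate(r2_points, 0.55)   # treat 0.55 as a value of r2
upper = inverse_interpolate(r1_points, 0.55)   # treat 0.55 as a value of r1
print(round(lower, 3), round(upper, 3))
```

The lower limit comes from the r2 curve and the upper limit from the r1 curve, matching the roles of C2 and C1 in the confidence belt.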

Shortest Confidence Intervals
Suppose the confidence limits are obtained from a statistic T through values $t_1(\theta)$ and $t_2(\theta)$ satisfying
\[ P\left[T < t_1(\theta)\right] = \alpha_1 \quad \text{and} \quad P\left[T > t_2(\theta)\right] = \alpha_2 \]
for instance in such a way that $\alpha_1 = \alpha_2 = \alpha/2$. However, it is clear that $\alpha_1$ and $\alpha_2$ may be chosen in infinitely many ways, each
satisfying the conditions $\alpha_i \ge 0$ and $\alpha_1 + \alpha_2 = \alpha$. Let us consider a particular function
\[ \psi(T, \theta) = \frac{\sqrt{n}\left(\bar X - \theta\right)}{\sigma} \]
Even when α is fixed, we can get infinitely many confidence intervals from such a function.
So we need some criterion for choosing among this infinite set of confidence intervals. An obvious method of
selecting one of the possible confidence intervals is based on the width of the interval.
Let us suppose that $T_1$ and $T_2$ are two statistics such that
\[ P_\theta\left[T_1 \le \tau(\theta) \le T_2\right] = 1 - \alpha \qquad (i) \]
Then the confidence interval given by $T_1$ and $T_2$ is said to be better than the interval given by $T_1'$ and $T_2'$,
which also satisfy (i), if
\[ T_2 - T_1 \le T_2' - T_1' \quad \forall\ \theta \in \Theta \qquad (ii) \]
If (ii) holds for every other pair of statistics $T_1'$ and $T_2'$ satisfying $P_\theta\left[T_1' \le \tau(\theta) \le T_2'\right] = 1 - \alpha$ for all θ ∈ Θ,
then the confidence interval given by $T_1$ and $T_2$ is called the uniformly shortest confidence interval for τ(θ) based
on the statistic T.
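Example: Consider $X \sim N(\theta, \sigma^2)$ where $\sigma^2$ is known. Find the shortest confidence interval for θ.
Solution
We have,
\[ P\left[\tau_{1-\alpha_1} \le \frac{\sqrt{n}\left(\bar X - \theta\right)}{\sigma} \le \tau_{\alpha_2}\right] = 1 - \alpha \]
\[ \Rightarrow\ P\left[\bar X - \tau_{\alpha_2}\frac{\sigma}{\sqrt{n}} \le \theta \le \bar X - \tau_{1-\alpha_1}\frac{\sigma}{\sqrt{n}}\right] = 1 - \alpha \]
where $\tau_p$ denotes the upper p point of N(0, 1). The length of the corresponding confidence interval is
\[ L = \frac{\sigma}{\sqrt{n}}\left[\tau_{\alpha_2} - \tau_{1-\alpha_1}\right] \]
So, we have to minimize L, i.e., minimize $\tau_{\alpha_2} - \tau_{1-\alpha_1}$, subject to the conditions $\alpha_1 \ge 0$, $\alpha_2 \ge 0$ and $\alpha_1 + \alpha_2 = \alpha$.
Due to the symmetry of the distribution of $\sqrt{n}\left(\bar X - \theta\right)/\sigma$ about zero, the difference will be minimum when
\[ \tau_{1-\alpha_1} = -\tau_{\alpha_2} \quad \text{i.e.} \quad \alpha_1 = \alpha_2 = \alpha/2 \]
Hence the interval $\left[\bar X - \tau_{\alpha/2}\,\sigma/\sqrt{n},\ \bar X + \tau_{\alpha/2}\,\sigma/\sqrt{n}\right]$ is in fact the shortest confidence interval based on the distribution of $\bar X$.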
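The minimization can be illustrated by a grid search over the split of α between the two tails. This is only a sketch: the function name `interval_length`, the grid of 99 values of α1 and the defaults σ = 1, n = 1 are assumptions made for the illustration.

```python
from statistics import NormalDist

def interval_length(alpha1, alpha, sigma=1.0, n=1):
    """Length of the interval that puts alpha1 in one tail and alpha - alpha1 in the other."""
    alpha2 = alpha - alpha1
    nd = NormalDist()
    upper_point = nd.inv_cdf(1 - alpha2)   # upper alpha2 point of N(0, 1)
    lower_point = nd.inv_cdf(alpha1)       # upper (1 - alpha1) point, a negative number
    return sigma / n**0.5 * (upper_point - lower_point)

alpha = 0.05
grid = [alpha * k / 100 for k in range(1, 100)]        # candidate values of alpha1
best = min(grid, key=lambda a1: interval_length(a1, alpha))
print(best, interval_length(best, alpha))
```

On this grid the shortest interval is obtained at the equal-tails split, in agreement with the symmetry argument.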
In some situations the length of the confidence interval may involve some function of the sample observations, e.g., when,
under the normal set-up, the confidence interval for µ is obtained from the t-distribution of the statistic
$\sqrt{n}\left(\bar X - \mu\right)/S$, or when, under the same set-up, the confidence interval for $\sigma^2$ is obtained from the
$\chi^2$-distribution of $\sum\left(X_i - \bar X\right)^2/\sigma^2$. Here, in order to choose among all possible confidence intervals with the
same confidence coefficient, we may make use of the average or expected length of the confidence interval.
For the statistics $T_1$ and $T_2$ the expected length is
\[ E_\theta\left(T_2 - T_1\right) \]
The interval for which this expected length is minimum may be called the interval with shortest expected length or
shortest average length.
Example: Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from $N(\mu, \sigma^2)$, where both µ and $\sigma^2$ are unknown. We have
to find the shortest confidence interval for µ.
Solution
We have,
\[ P\left[t_{1-\alpha_1,\,n-1} \le \frac{\sqrt{n}\left(\bar X - \mu\right)}{S} \le t_{\alpha_2,\,n-1}\right] = 1 - \alpha \]
\[ \Rightarrow\ P\left[\bar X - t_{\alpha_2,\,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar X - t_{1-\alpha_1,\,n-1}\frac{S}{\sqrt{n}}\right] = 1 - \alpha \]
The expected length of the confidence interval is
\[ \frac{E_\theta(S)}{\sqrt{n}}\left(t_{\alpha_2,\,n-1} - t_{1-\alpha_1,\,n-1}\right) = k\sigma\left[t_{\alpha_2,\,n-1} - t_{1-\alpha_1,\,n-1}\right] \]
where k is a constant that depends on n alone. So we have to minimize $\left[t_{\alpha_2,\,n-1} - t_{1-\alpha_1,\,n-1}\right]$ subject to the conditions
$\alpha_1 \ge 0$, $\alpha_2 \ge 0$ and $\alpha_1 + \alpha_2 = \alpha$.
Due to the symmetry of the t-distribution around zero, the difference $t_{\alpha_2,\,n-1} - t_{1-\alpha_1,\,n-1}$ will be minimum if
$t_{1-\alpha_1,\,n-1} = -t_{\alpha_2,\,n-1}$, i.e. when $\alpha_1 = \alpha_2 = \alpha/2$.
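Example: Let $X \sim N(\mu, \sigma^2)$ where µ is known. Find the confidence interval for $\sigma^2$.
Solution
The inequalities
\[ P_\theta\left[\frac{\sum (x_i - \mu)^2}{\theta} < \chi^2_{1-\alpha_1,\,n}\right] = \alpha_1 \quad \text{and} \quad P_\theta\left[\frac{\sum (x_i - \mu)^2}{\theta} > \chi^2_{\alpha_2,\,n}\right] = \alpha_2, \qquad \text{here } \theta = \sigma^2, \]
lead to the result
\[ P_\theta\left[\frac{\sum (X_i - \mu)^2}{\chi^2_{\alpha_2,\,n}} \le \theta \le \frac{\sum (X_i - \mu)^2}{\chi^2_{1-\alpha_1,\,n}}\right] = 1 - \alpha \]
The corresponding confidence interval has the length
\[ \sum (X_i - \mu)^2\left[\frac{1}{\chi^2_{1-\alpha_1,\,n}} - \frac{1}{\chi^2_{\alpha_2,\,n}}\right] \]
which has the expected value
\[ n\theta\left[\frac{1}{\chi^2_{1-\alpha_1,\,n}} - \frac{1}{\chi^2_{\alpha_2,\,n}}\right] \]
The minimization of this expected length amounts to minimization of $\left[\dfrac{1}{\chi^2_{1-\alpha_1,\,n}} - \dfrac{1}{\chi^2_{\alpha_2,\,n}}\right]$ subject to the condition
\[ \int_{\chi_1^2}^{\chi_2^2} f(\chi^2)\,d(\chi^2) = 1 - \alpha \]
where $\chi_1^2 = \chi^2_{1-\alpha_1,\,n}$, $\chi_2^2 = \chi^2_{\alpha_2,\,n}$ and f is the p.d.f of the $\chi^2$-distribution with n degrees of freedom.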
Using Lagrange's method of undetermined multipliers, which involves the partial differentiation of
\[ \frac{1}{\chi_1^2} - \frac{1}{\chi_2^2} + \lambda\left[\int_{\chi_1^2}^{\chi_2^2} f(\chi^2)\,d(\chi^2) - (1 - \alpha)\right] \]
with respect to $\chi_1^2$ and $\chi_2^2$, we get the minimizing equations as
\[ \frac{1}{\chi_1^4} + \lambda f(\chi_1^2) = 0 \quad \text{and} \quad \frac{1}{\chi_2^4} + \lambda f(\chi_2^2) = 0 \]
\[ \Rightarrow\ 1 + \chi_1^4\,\lambda f(\chi_1^2) = 0 \quad (1) \qquad \Rightarrow\ 1 + \chi_2^4\,\lambda f(\chi_2^2) = 0 \quad (2) \]
Now from equations (1) and (2) we can write that
\[ \chi_1^4 f(\chi_1^2) = \chi_2^4 f(\chi_2^2) \]
is satisfied, besides the equation
\[ \int_{\chi_1^2}^{\chi_2^2} f(\chi^2)\,d(\chi^2) = 1 - \alpha \]
The actual determination of the values $\chi_1^2$ and $\chi_2^2$ will, of course, be pretty difficult. In practice, one takes $\chi_1^2$ and
$\chi_2^2$ such that $\alpha_1 = \alpha_2 = \dfrac{\alpha}{2}$. But this may make the average length too big.
For example, if n = 10, α = 0.05, $\alpha_1 = \alpha_2 = 0.025$, then $\chi_1^2 = 3.247$ and $\chi_2^2 = 20.483$. So, the average length of the
interval is
\[ 10\,\theta\left[\frac{1}{3.247} - \frac{1}{20.483}\right] = 2.592\,\theta \]
On the other hand, if we take $\alpha_1 = 0.05$, $\alpha_2 = 0$, then $\chi_1^2 = 3.940$ and $\chi_2^2 = \infty$, and the average length of the
interval is
\[ 10\,\theta\left[\frac{1}{3.940} - 0\right] = 2.538\,\theta \]
Thus the second procedure, where the confidence interval is of the form $\left[0, \dfrac{v}{\chi_1^2}\right]$ with $v = \sum_{i=1}^{n}(x_i - \mu)^2$,
would seem to be preferable to the first procedure. This interval is, in fact, not only shorter on the average, but
shorter in every case.
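The two average lengths can be recomputed from the tabulated χ² points quoted in the example. In this sketch the variable names are hypothetical and the three quantiles are simply the table values given above, not values computed by the code.

```python
n = 10

# Tabulated points of the chi-square distribution with 10 d.f., as quoted in the example.
chi2_lower_025 = 3.247    # 2.5% of the distribution lies below this point
chi2_upper_025 = 20.483   # 2.5% of the distribution lies above this point
chi2_lower_05 = 3.940     # 5% of the distribution lies below this point

# Average length, in units of theta, for alpha1 = alpha2 = 0.025.
equal_tails = n * (1 / chi2_lower_025 - 1 / chi2_upper_025)

# Average length, in units of theta, for alpha1 = 0.05, alpha2 = 0 (interval [0, v / chi1^2]).
one_sided = n * (1 / chi2_lower_05 - 0)

print(round(equal_tails, 3), round(one_sided, 3))
```

Both lengths are multiples of the same quantity v = ∑(x_i − µ)², so the ordering of the two coefficients holds for every sample and not only on average.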
Case of Discrete Random Variable

The case of discrete random variables requires to be separately dealt with, for if we want to apply one of the previous
procedures, we immediately face a difficulty. In this case we cannot hope to get for each α ( 0 < α < 1) a confidence
interval that will have confidence coefficient exactly equal to 1 − α .

One way of avoiding this problem is to require only that the confidence coefficient be at least 1 − α. Then the
statistics $T_1$ and $T_2$ will provide confidence limits to a parametric function γ(θ) if
\[ P_\theta\left[T_1 \le \gamma(\theta) \le T_2\right] \ge 1 - \alpha \quad \text{for all } \theta \in \Theta \]
The actual determination of the confidence intervals may be carried out by drawing confidence belts.
Example: Let $X_1, X_2, \ldots, X_{10}$ be a random sample from a (point binomial) distribution with p.m.f.
\[ f_\theta(x) = \begin{cases} \theta^x (1-\theta)^{1-x} & \text{if } x = 0, 1 \\ 0 & \text{otherwise} \end{cases} \qquad \text{where } 0 \le \theta \le 1. \]
For obtaining confidence limits to θ with a confidence coefficient at least equal to 0.90, we may first determine, for
a suitable set of values of θ, the values $t_1(\theta)$ and $t_2(\theta)$ of the sufficient statistic $T = \sum_i X_i$ such that
\[ P_\theta\left[T < t_1(\theta)\right] \le 0.05 \quad \text{and} \quad P_\theta\left[T > t_2(\theta)\right] \le 0.05 \]
the inequalities for $P_\theta$ being made as near to equalities as possible.
For values of θ from 0.1 to 0.9 (taken at intervals of 0.1), these numbers $t_1(\theta)$ and $t_2(\theta)$ are as shown in the
table below:
   θ     t1(θ)   t2(θ)
  0.1      0       3
  0.2      0       4
  0.3      1       5
  0.4      2       7
  0.5      2       8
  0.6      3       8
  0.7      5       9
  0.8      6      10
  0.9      7      10
If we plot $t_1(\theta)$ and $t_2(\theta)$ for different values of θ we get a confidence belt for θ, for which
\[ P_\theta\left[t_1(\theta) \le T \le t_2(\theta)\right] \ge 0.90 \]
The confidence belt can be improved, and the confidence coefficient made closer to 0.90, if n is made
large and if at the same time we tabulate $t_1(\theta)$ and $t_2(\theta)$ at finer intervals of θ.
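The tabulated values can be reproduced from the binomial distribution of T. The sketch below is one way of doing this; the function name `belt_limits` and its search strategy are assumptions, chosen so that t1(θ) is the largest value with P[T < t1] ≤ 0.05 and t2(θ) the smallest value with P[T > t2] ≤ 0.05.

```python
import math

def belt_limits(n, theta, tail=0.05):
    """Return (t1, t2): largest t1 with P[T < t1] <= tail and
    smallest t2 with P[T > t2] <= tail, for T ~ Binomial(n, theta)."""
    pmf = [math.comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(n + 1)]

    t1, below = 0, 0.0
    for t in range(n + 1):
        below += pmf[t]            # now equals P[T <= t] = P[T < t + 1]
        if below <= tail:
            t1 = t + 1
        else:
            break

    t2, above = n, 0.0             # above equals P[T > t] at each step
    for t in range(n, -1, -1):
        if above <= tail:
            t2 = t
        else:
            break
        above += pmf[t]
    return t1, t2

table = [belt_limits(10, k / 10) for k in range(1, 10)]
print(table)
```

Running the sketch for θ = 0.1, ..., 0.9 with n = 10 gives the pairs listed in the table.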
Theory of Confidence Set

In this context, we are interested in a subset of the parameter space Θ, determined in the light of the observations X,
that may be supposed to cover the true value(s) of the parameter(s); that is why we use the concept of a confidence set
rather than that of a confidence interval.
Let S be a subset of the parameter space Θ. We shall write 'S c θ' to mean that this set covers or includes θ, so
that S c θ ⇔ θ ∈ S.
Definition
A family of subsets S(x), for varying x ∈ ℑ, of the parameter space Θ is said to be a family of confidence sets at the
level 1 − α (or with the confidence coefficient 1 − α) if
\[ P_\theta\left[S(X_1, X_2, \ldots, X_n)\ c\ \theta\right] = 1 - \alpha \quad \text{for all } \theta \in \Theta \]
Definition
A family of subsets $S_0(x)$, for varying x ∈ ℑ, of the parameter space Θ is said to constitute a family of uniformly most
accurate (or most selective or smallest) confidence sets if
\[ P_\theta\left[S_0(X_1, X_2, \ldots, X_n)\ c\ \theta\right] = 1 - \alpha \quad \text{for all } \theta \in \Theta \qquad (i) \]
\[ \text{and} \quad P_{\theta'}\left[S_0(X_1, X_2, \ldots, X_n)\ c\ \theta\right] \le P_{\theta'}\left[S(X_1, X_2, \ldots, X_n)\ c\ \theta\right] \quad \text{for all } \theta, \theta' \in \Theta\ (\theta \ne \theta') \qquad (ii) \]
whatever the other family of sets S(x) satisfying (i). The implication of (ii) is that $S_0$ has a smaller
probability of including a wrong value or set of values of the parameter θ than any other family of sets at the same
level. In this sense $S_0(x)$ is the smallest confidence set of level 1 − α corresponding to the set of observations x. In
most cases a family of UMA sets cannot be obtained. Hence we introduce the concept of unbiasedness.
Definition
A family of subsets S(x), for different values of x ∈ ℑ, of Θ is said to constitute a family of unbiased
confidence sets of level 1 − α if
\[ P_{\theta'}\left[S(X_1, X_2, \ldots, X_n)\ c\ \theta\right] \le 1 - \alpha \quad \text{for all } \theta, \theta' \in \Theta,\ \theta \ne \theta' \]
Hence S(x) is a family of unbiased sets iff the probability for $S(X_1, X_2, \ldots, X_n)$ to cover θ when some
alternative value θ′ is true does not exceed the same probability for the case when θ itself is true. Surely this is a
desirable feature of a family of confidence sets.
Uniformly Most Accurate Unbiased Set (UMAU)

A family of sets S0 ( Χ ) , for varying x ∈ ℑ of the parameter space Θ is said to constitute a family of uniformly most
accurate unbiased (UMAU ) confidence sets of level 1 − α if
Pθ ⎣⎡ S0 ( X 1 , X 2 , , X n ) c θ ⎦⎤ = 1 − α for all θ ∈ Θ,
Pθ ′ ⎡⎣ S0 ( X 1 , X 2 , , X n ) c θ ⎤⎦ ≤ 1 − α for all θ , θ ′∈Θ (θ ≠ θ ′ )
and Pθ ′ ⎡⎣ S0 ( X1 , X 2 , , X n ) c θ ⎤⎦ ≤ Pθ ′ ⎡⎣ S ( X 1 , X 2 , , X n ) c θ ⎤⎦ for all θ , θ ′∈Θ (θ ≠ θ ′ )

Hypothesis-I
Most Powerful Test

The critical region w is the most powerful critical region of size α for testing $H_0 : \theta = \theta_0$ against $H_1 : \theta = \theta_1$ if
\[ P(x \in w \mid H_0) = \int_w L_0\,dx = \alpha \qquad \ldots\ (1) \]
\[ \text{and} \quad P(x \in w \mid H_1) \ge P(x \in w_1 \mid H_1) \qquad \ldots\ (2) \]
for every other critical region $w_1$ satisfying (1). The test based on the most powerful critical region is called the most powerful
test of level α.
Uniformly Most Powerful (UMP) Test

The region w is called the uniformly most powerful (UMP) critical region of size α for testing $H_0 : \theta = \theta_0$ against
$H_1 : \theta \ne \theta_0$, i.e. against $H_1 : \theta = \theta_1 \ne \theta_0$, if
\[ P(x \in w \mid H_0) = \int_w L_0\,dx = \alpha \qquad \ldots\ (1) \]
\[ \text{and} \quad P(x \in w \mid H_1) \ge P(x \in w_1 \mid H_1) \qquad \ldots\ (2) \]
for all $\theta_1 \ne \theta_0$, whatever the region $w_1$ satisfying (1). The test based on the uniformly most powerful critical region is
called the uniformly most powerful test of level α.
Unbiased Test and Unbiased Critical Region

Let us consider the testing of $H_0 : \theta = \theta_0$ against $H_1 : \theta = \theta_1$. The critical region w, and consequently the test based
on it, is said to be unbiased if the power of the test is at least as large as the size of the critical region, i.e.
Power of the test ≥ Size of the C.R
⇒ 1 − β ≥ α
⇒ $P_{\theta_1}(w) \ge P_{\theta_0}(w)$
⇒ $P[x : x \in w \mid H_1] \ge P[x : x \in w \mid H_0]$
In other words, the critical region w is said to be unbiased if
$P_\theta(w) \ge P_{\theta_0}(w)$ ; ∀ θ (≠ θ0) ∈ Ω.
Uniformly Most Powerful Unbiased (UMPU) Test

Let φ be an unbiased test (or w a critical region) of size α for testing $H_0 : \theta = \theta_0$ against $H_1 : \theta = \theta_1 \ne \theta_0$; $\theta_1 \in \Omega$,
i.e.
i) $E\{\varphi(x) \mid \theta_0\} = P(x \in w \mid \theta_0) = \alpha$
ii) $E\{\varphi(x) \mid \theta_1\} \ge E\{\varphi(x) \mid \theta_0\}$ ; ∀ θ1 ∈ Ω
Suppose that for every other test $\varphi^*$ satisfying the conditions (i) and (ii) we have
\[ E\{\varphi(x) \mid \theta_1\} \ge E\{\varphi^*(x) \mid \theta_1\} \ ;\quad \forall\ \theta_1 \in \Omega \]
then φ is a uniformly most powerful unbiased (UMPU) test of size α.
UMPU Type A1 Test
Let φ be an unbiased test (or w a critical region) of size α for testing $H_0 : \theta = \theta_0$ against $H_1 : \theta = \theta_1 \ne \theta_0$;
$\theta_1 \in \Omega$, i.e.
i) $E\{\varphi(x) \mid \theta_0\} = P(x \in w \mid \theta_0) = \alpha$
ii) $E\{\varphi(x) \mid \theta_1\} \ge E\{\varphi(x) \mid \theta_0\}$ ; ∀ θ1 ∈ Ω
iii) $\left.\dfrac{\partial}{\partial\theta_1} E\{\varphi(x) \mid \theta_1\}\right|_{\theta_1 = \theta_0} = 0$
Then φ is called a UMPU type A1 test. For a UMPU test it is not required that the power curve should have a regular
minimum at θ0, but this is often the case, and the name UMPU test is often used to imply a type A1 test.
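Show that 1 − β ≥ α.
Let w be a BCR of size α for testing a simple $H_0$ against a simple $H_1$. Then by definition we have,
\[ P(x \in w \mid H_0) = \int_w L(x \mid H_0)\,dx = \alpha \]
By the Neyman-Pearson lemma, we have, for some constant K > 0,
\[ \frac{L(x \mid H_0)}{L(x \mid H_1)} \le K \quad \text{if } x \in w \qquad \ldots\ (i) \]
\[ \text{and} \quad \frac{L(x \mid H_0)}{L(x \mid H_1)} \ge K \quad \text{if } x \in (S - w) \qquad \ldots\ (ii) \]
From (i) we have, $K\,L(x \mid H_1) \ge L(x \mid H_0)$
\[ \Rightarrow\ K\int_w L(x \mid H_1)\,dx \ge \int_w L(x \mid H_0)\,dx \]
\[ \Rightarrow\ K(1 - \beta) \ge \alpha \qquad \ldots\ (iii) \]
Again from (ii) we have, $K\,L(x \mid H_1) \le L(x \mid H_0)$
\[ \Rightarrow\ K\int_{S-w} L(x \mid H_1)\,dx \le \int_{S-w} L(x \mid H_0)\,dx \]
\[ \Rightarrow\ K\beta \le (1 - \alpha) \qquad \ldots\ (iv) \]
Multiplying (iii) and (iv) we have, $K(1-\alpha)(1-\beta) \ge K\alpha\beta$
⇒ $(1-\alpha)(1-\beta) \ge \alpha\beta$
⇒ 1 − β ≥ α (Proved)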
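Example: Let $x_1, x_2, \ldots, x_n$ be a random sample drawn from N(µ, 1). For testing $H_0 : \mu = \mu_0$ against
$H_1 : \mu = \mu_1 \ne \mu_0$, show that for $\alpha_1 = \alpha_2$ a UMPU test exists.
Solution
Since $x_1, x_2, \ldots, x_n$ are drawn from N(µ, 1), we have,
\[ L(x \mid H_0) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum (x_i - \mu_0)^2} \]
\[ \text{and} \quad L(x \mid H_1) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{1}{2}\sum (x_i - \mu_1)^2} \]
According to the Neyman-Pearson lemma, the BCR is given by
\[ \frac{L(x \mid H_0)}{L(x \mid H_1)} \le K \]
Since $\sum (x_i - \mu)^2 = \sum (x_i - \bar x)^2 + n(\bar x - \mu)^2$, this gives
\[ \Rightarrow\ \exp\left[-\frac{n}{2}\left\{(\bar x - \mu_0)^2 - (\bar x - \mu_1)^2\right\}\right] \le K \]
\[ \Rightarrow\ \exp\left[-\frac{n}{2}\left\{\bar x^2 - 2\bar x\mu_0 + \mu_0^2 - \bar x^2 + 2\bar x\mu_1 - \mu_1^2\right\}\right] \le K \]
\[ \Rightarrow\ \exp\left[-\frac{n}{2}\left\{2\bar x(\mu_1 - \mu_0) + \left(\mu_0^2 - \mu_1^2\right)\right\}\right] \le K \]
\[ \Rightarrow\ -\frac{n}{2}\left\{2\bar x(\mu_1 - \mu_0) + \left(\mu_0^2 - \mu_1^2\right)\right\} \le \ln K \]
\[ \Rightarrow\ 2\bar x(\mu_1 - \mu_0) + \left(\mu_0^2 - \mu_1^2\right) \ge -\frac{2}{n}\ln K \]
\[ \Rightarrow\ 2\bar x(\mu_1 - \mu_0) \ge \left(\mu_1^2 - \mu_0^2\right) - \frac{2}{n}\ln K \]
\[ \Rightarrow\ \bar x(\mu_1 - \mu_0) \ge \frac{\mu_1^2 - \mu_0^2}{2} - \frac{1}{n}\ln K \qquad \ldots\ (i) \]
If $\mu_1 > \mu_0$ then
\[ \bar x \ge \frac{\mu_1 + \mu_0}{2} + \frac{1}{n(\mu_0 - \mu_1)}\ln K \]
\[ \Rightarrow\ \bar x \ge \lambda_1\ (\text{say}) \qquad \ldots\ (ii) \]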
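We know that
\[ P(\bar x \ge \lambda_1 \mid H_0) = \alpha \]
\[ \Rightarrow\ \int_{\lambda_1}^{\infty} f(\bar x)\,d\bar x = \alpha \quad \text{under } H_0 : \mu = \mu_0 \]
\[ \Rightarrow\ \sqrt{\frac{n}{2\pi}}\int_{\lambda_1}^{\infty} e^{-\frac{n}{2}(\bar x - \mu_0)^2}\,d\bar x = \alpha \qquad \left[\text{since } \bar x \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ with } \sigma^2 = 1\right] \]
\[ \Rightarrow\ \frac{1}{\sqrt{2\pi}}\int_{\frac{\lambda_1 - \mu_0}{1/\sqrt{n}}}^{\infty} e^{-\frac{z^2}{2}}\,dz = \alpha \qquad \left[\text{putting } z = \frac{\bar x - \mu_0}{1/\sqrt{n}}\right] \]
\[ \Rightarrow\ \frac{1}{\sqrt{2\pi}}\int_{z_\alpha}^{\infty} e^{-\frac{z^2}{2}}\,dz = \alpha \]
We have,
\[ z_\alpha = \frac{\lambda_1 - \mu_0}{1/\sqrt{n}} \ \Rightarrow\ \lambda_1 - \mu_0 = z_\alpha\frac{1}{\sqrt{n}} \ \Rightarrow\ \lambda_1 = \mu_0 + z_\alpha\frac{1}{\sqrt{n}} \]
Hence from equation (ii) we have, the BCR is
\[ \bar x \ge \mu_0 + z_\alpha\frac{1}{\sqrt{n}} \]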
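Again, if $\mu_1 < \mu_0$, then dividing equation (i) by the negative quantity $(\mu_1 - \mu_0)$ reverses the inequality, and the BCR is
\[ \bar x \le \frac{\mu_1 + \mu_0}{2} - \frac{1}{n(\mu_1 - \mu_0)}\ln K \]
\[ \Rightarrow\ \bar x \le \frac{\mu_0 + \mu_1}{2} + \frac{1}{n(\mu_0 - \mu_1)}\ln K \]
\[ \Rightarrow\ \bar x \le \lambda_2\ (\text{say}) \qquad \ldots\ (iii) \]
Again, we know that,
\[ P(\bar x \le \lambda_2 \mid H_0) = \alpha \]
\[ \Rightarrow\ \int_{-\infty}^{\lambda_2} f(\bar x)\,d\bar x = \alpha \quad \text{under } H_0 : \mu = \mu_0 \]
\[ \Rightarrow\ \sqrt{\frac{n}{2\pi}}\int_{-\infty}^{\lambda_2} e^{-\frac{n}{2}(\bar x - \mu_0)^2}\,d\bar x = \alpha \]
\[ \Rightarrow\ \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\frac{\lambda_2 - \mu_0}{1/\sqrt{n}}} e^{-\frac{z^2}{2}}\,dz = \alpha \qquad \left[\frac{\bar x - \mu_0}{1/\sqrt{n}} = z \ \Rightarrow\ d\bar x = \frac{1}{\sqrt{n}}\,dz\right] \]
\[ \Rightarrow\ \frac{1}{\sqrt{2\pi}}\int_{\frac{\lambda_2 - \mu_0}{1/\sqrt{n}}}^{\infty} e^{-\frac{z^2}{2}}\,dz = 1 - \alpha \]
\[ \Rightarrow\ \frac{1}{\sqrt{2\pi}}\int_{z_{1-\alpha}}^{\infty} e^{-\frac{z^2}{2}}\,dz = 1 - \alpha \]
\[ \therefore\ z_{1-\alpha} = \frac{\lambda_2 - \mu_0}{1/\sqrt{n}} \ \Rightarrow\ \lambda_2 = \mu_0 + z_{1-\alpha}\frac{1}{\sqrt{n}} \]
By symmetry of the normal distribution, we have,
\[ z_{1-\alpha} = -z_\alpha \qquad \therefore\ \lambda_2 = \mu_0 - z_\alpha\frac{1}{\sqrt{n}} \]
From equation (iii) we have, the BCR is
\[ \bar x \le \mu_0 - z_\alpha\frac{1}{\sqrt{n}} \]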
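So for the two-sided alternative we take the critical region w as
\[ w : \bar x \le \bar x_{\alpha_2} \ \text{or} \ \bar x \ge \bar x_{\alpha_1}, \quad \text{where} \quad \bar x_{\alpha_1} = \mu_0 + z_{\alpha_1}\frac{1}{\sqrt{n}} \quad \text{and} \quad \bar x_{\alpha_2} = \mu_0 - z_{\alpha_2}\frac{1}{\sqrt{n}} \]
where z is N(0, 1), and $z_{\alpha_1}$ and $-z_{\alpha_2}$ are the values cutting off $\alpha_1$ to the right and $\alpha_2$ to the left respectively:
\[ P\left[z \ge z_{\alpha_1}\right] = \alpha_1 \quad \text{and} \quad P\left[z \le -z_{\alpha_2}\right] = \alpha_2, \quad \text{where } \alpha_1 + \alpha_2 = \alpha \]
Let F denote the distribution function of N(0, 1). When the true mean is $\mu_1$, the right-hand part of w contributes to the power
\[ \int_{\bar x_{\alpha_1}}^{\infty} f(\bar x)\,d\bar x = \int_{m}^{\infty} f(z)\,dz = 1 - F(m) = F(-m), \qquad m = \frac{\bar x_{\alpha_1} - \mu_1}{1/\sqrt{n}} \]
\[ = F\left(\frac{\mu_1 - \bar x_{\alpha_1}}{1/\sqrt{n}}\right) = F\left[\sqrt{n}(\mu_1 - \mu_0) - z_{\alpha_1}\right] \qquad \left[\because \bar x_{\alpha_1} = \mu_0 + z_{\alpha_1}\frac{1}{\sqrt{n}}\right] \]
and the left-hand part contributes
\[ \int_{-\infty}^{\bar x_{\alpha_2}} f(\bar x)\,d\bar x = \int_{-\infty}^{\frac{\bar x_{\alpha_2} - \mu_1}{1/\sqrt{n}}} f(z)\,dz = F\left[-\sqrt{n}(\mu_1 - \mu_0) - z_{\alpha_2}\right] \]
So the power is
\[ P = F\left[\sqrt{n}(\mu_1 - \mu_0) - z_{\alpha_1}\right] + F\left[-\sqrt{n}(\mu_1 - \mu_0) - z_{\alpha_2}\right] \]
\[ \Rightarrow\ P = F\left[\sqrt{n}\,\Delta - z_{\alpha_1}\right] + F\left[-\sqrt{n}\,\Delta - z_{\alpha_2}\right] \qquad \left[\Delta = \mu_1 - \mu_0\right] \]
\[ \Rightarrow\ P = \int_{-\infty}^{\sqrt{n}\Delta - z_{\alpha_1}} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{z^2}{2}}\,dz + \int_{-\infty}^{-\left(\sqrt{n}\Delta + z_{\alpha_2}\right)} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{z^2}{2}}\,dz \]
For the test to be unbiased the power must have its minimum at Δ = 0, so we require
\[ \left.\frac{\partial P}{\partial\Delta}\right|_{\Delta = 0} = \sqrt{\frac{n}{2\pi}}\left\{e^{-\frac{1}{2}\left(\sqrt{n}\Delta - z_{\alpha_1}\right)^2} - e^{-\frac{1}{2}\left(\sqrt{n}\Delta + z_{\alpha_2}\right)^2}\right\}_{\Delta = 0} = 0 \]
\[ \Rightarrow\ e^{-\frac{1}{2}z_{\alpha_1}^2} = e^{-\frac{1}{2}z_{\alpha_2}^2} \]
\[ \Rightarrow\ z_{\alpha_1} = z_{\alpha_2} \qquad \left[\text{the root } z_{\alpha_1} = -z_{\alpha_2} \text{ would give } \alpha_1 + \alpha_2 = 1\right] \]
\[ \therefore\ \alpha_1 = \alpha_2 \]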
Thus we see that the power curve has its minimum at $\mu_1 = \mu_0$ if and only if $\alpha_1 = \alpha_2$. Otherwise the minimum occurs at some
$\mu_1 \ne \mu_0$, implying that the probability of rejecting $H_0$ is actually smaller when $H_0$ is false than when it is true.
Evidently the two curves (b) and (c), representing one-sided UMP tests, are biased. Power curve (a) represents a
most powerful test among all unbiased tests, but not a most powerful test among all tests.
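The behaviour of the power curve can be illustrated numerically. The sketch below assumes the convention that z_p is the upper p point of N(0, 1) and that the critical region puts α1 in the right tail and α2 in the left tail; the function name `power` and the values n = 25, Δ = ±0.02 are hypothetical choices for the illustration.

```python
import math
from statistics import NormalDist

ND = NormalDist()

def power(delta, n, alpha1, alpha2):
    """Power of the two-tailed test at mu1 = mu0 + delta, with alpha1 in the
    right tail and alpha2 in the left tail of the critical region."""
    z1 = ND.inv_cdf(1 - alpha1)          # upper alpha1 point
    z2 = ND.inv_cdf(1 - alpha2)          # upper alpha2 point
    shift = math.sqrt(n) * delta
    return ND.cdf(shift - z1) + ND.cdf(-shift - z2)

n = 25
print(power(0.0, n, 0.025, 0.025))       # size of the test
print(power(0.02, n, 0.025, 0.025))      # equal tails: above the size
print(power(-0.02, n, 0.025, 0.025))     # equal tails: above the size
print(power(-0.02, n, 0.04, 0.01))       # unequal tails: below the size
```

With equal tails the power rises on both sides of Δ = 0, while with unequal tails it falls below α on one side, which is the biasedness described above.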
Locally Uniformly Most Powerful Unbiased (LUMPU) Test

An unbiased test which is most powerful in the neighborhood of θ0 is called a locally uniformly most powerful unbiased
test. This test is also called a uniformly most powerful unbiased test of type A. The critical region associated with this
test is called an unbiased critical region of type A.
The region w is said to be a type A critical region of size α for testing $H_0 : \theta = \theta_0$ against $H_1 : \theta = \theta_1 \ne \theta_0$, if
i) $P(x \in w \mid H_0) = \alpha$
ii) $P(x \in w \mid H_1) \ge \alpha$
iii) $P(x \in w \mid H_1) \ge P(x \in w^* \mid H_1)$
iv) $\dfrac{\partial}{\partial\theta_1}\left[P(x \in w \mid H_1)\right]_{\theta_1 = \theta_0} = 0$
v) $\dfrac{\partial^2}{\partial\theta_1^2}\left[P(x \in w \mid H_1)\right]_{\theta_1 = \theta_0} \ge \dfrac{\partial^2}{\partial\theta_1^2}\left[P(x \in w^* \mid H_1)\right]_{\theta_1 = \theta_0}$
where $w^*$ is any other region satisfying conditions (i) to (iv).
We must choose a critical region for which the power is largest in the neighborhood of $H_0 : \theta = \theta_0$. This requirement is
met by (v); conditions (i), (ii) and (iii) control the first type of error and unbiasedness, and condition (iv)
makes the region locally unbiased. This test is recommended only when $H_0$ and $H_1$ are close to each other. Also,
condition (v) states that the power curve related to w rises at least as fast as that of $w^*$ in the
neighborhood of θ0.
Construction of Type A regions

Let us consider the problem of constructing a UMP unbiased region for $H_0 : \theta = \theta_0$ against $H_1 : \theta \ne \theta_0$ when no
UMP region exists. We first state the following theorem:
Theorem
If w be an MP region for testing $H_0 : \theta = \theta_0$ against $H_1 : \theta = \theta_1$, then it is necessarily unbiased. Similarly, if w be
UMP for testing $H_0 : \theta = \theta_0$ against $H_1 : \theta \in \Omega$, it is necessarily unbiased.
Proof
If w be an MP region of size α for testing $H_0$ against $H_1$, then for a non-negative constant k,
\[ \int_w L_0(x)\,dx = \int_{\{x \mid L_1(x) > kL_0(x)\}} L_0(x)\,dx = \alpha \]
where $L_0(x)$ is the likelihood function under $H_0$, and
\[ \int_w L_1(x)\,dx = \int_{\{x \mid L_1(x) > kL_0(x)\}} L_1(x)\,dx = 1 - \beta \]
where $L_1(x)$ is the likelihood function under $H_1$. So that,
\[ \int_w L_1(x)\,dx = \int_{\{x \mid L_1(x) > kL_0(x)\}} L_1(x)\,dx > k\int_{\{x \mid L_1(x) > kL_0(x)\}} L_0(x)\,dx = k\alpha \qquad \ldots\ (i) \]
\[ \text{while,} \quad 1 - \int_w L_1(x)\,dx = \int_{\{x \mid L_1(x) \le kL_0(x)\}} L_1(x)\,dx \le k\int_{\{x \mid L_1(x) \le kL_0(x)\}} L_0(x)\,dx = k(1 - \alpha) \qquad \ldots\ (ii) \]
If k ≥ 1, then from (i) we have,
\[ \int_w L_1(x)\,dx > \alpha \]
If k < 1, then from (ii) we have,
\[ 1 - \int_w L_1(x)\,dx < 1 - \alpha \]
which implies,
\[ \int_w L_1(x)\,dx > \alpha \quad \text{i.e.} \quad 1 - \beta > \alpha \]
Hence w is unbiased.
In case w is a UMP region of size α, the above approach still holds good if for θ1 we read θ such that
θ ∈ Ω. So we have,
\[ P_\theta(w) > \alpha \quad \text{for all } \theta \in \Omega \]
So, here also w is unbiased.
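Example: Consider the case of a random sample from $N(\theta, \sigma^2)$, where θ is unknown (−∞ < θ < ∞) and $\sigma^2$ is known.
Find the type A region for testing $H_0 : \theta = \theta_0$ against $H_1 : \theta \ne \theta_0$.
Solution
\[ L(x) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left[-\frac{\sum (x_i - \theta)^2}{2\sigma^2}\right] \]
\[ \ln L(x) = -\frac{n}{2}\ln\left(2\pi\sigma^2\right) - \frac{\sum (x_i - \theta)^2}{2\sigma^2} \]
Hence
\[ \frac{\partial \ln L(x)}{\partial\theta_0} = -\frac{2}{2\sigma^2}\sum (x_i - \theta_0)(-1) \]
\[ \Rightarrow\ \varphi = \frac{n\left(\bar x - \theta_0\right)}{\sigma^2} \]
\[ \therefore\ \frac{\partial^2 \ln L(x)}{\partial\theta_0^2} = \varphi' = -\frac{n}{\sigma^2} = a + b\varphi\ (\text{say}) \quad \text{where } a = -\frac{n}{\sigma^2},\ b = 0 \]
As such the type-A region for testing $H_0 : \theta = \theta_0$ against $H_1 : \theta = \theta_1 \ne \theta_0$ is w, given by
\[ w = \{x \mid \varphi < c_1 \cup \varphi > c_2\} = \{x \mid \bar x < d_1 \cup \bar x > d_2\}\ (\text{say}) \]
where $c_1$ and $c_2$, or $d_1$ and $d_2$, are constants such that
\[ \int_w L_{\theta_0}(x)\,dx = \alpha \quad \text{and} \quad \int_w \varphi\,L_{\theta_0}(x)\,dx = 0 \]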
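Now, these conditions are equivalent to the conditions
\[ \int_{-\infty}^{d_1} g_{\theta_0}(\bar x)\,d\bar x + \int_{d_2}^{\infty} g_{\theta_0}(\bar x)\,d\bar x = \alpha \quad \text{and} \quad \int_{-\infty}^{d_1} \varphi\,g_{\theta_0}(\bar x)\,d\bar x + \int_{d_2}^{\infty} \varphi\,g_{\theta_0}(\bar x)\,d\bar x = 0 \]
where $g_\theta$ is the marginal p.d.f. of $\bar x$, or equivalent to
\[ \int_{d_1}^{d_2} g_{\theta_0}(\bar x)\,d\bar x = 1 - \alpha \qquad \ldots\ (*) \]
\[ \text{and} \quad \int_{d_1}^{d_2} \varphi\,g_{\theta_0}(\bar x)\,d\bar x = 0 \quad \left[\text{since } \int_{-\infty}^{\infty} \varphi\,g_{\theta_0}(\bar x)\,d\bar x = 0\right] \qquad \ldots\ (**) \]
Now from (*) we can write,
\[ \frac{1}{\sqrt{2\pi}}\int_{\frac{\sqrt{n}(d_1 - \theta_0)}{\sigma}}^{\frac{\sqrt{n}(d_2 - \theta_0)}{\sigma}} e^{-\frac{y^2}{2}}\,dy = 1 - \alpha \]
and from (**) we have,
\[ \int_{\frac{\sqrt{n}(d_1 - \theta_0)}{\sigma}}^{\frac{\sqrt{n}(d_2 - \theta_0)}{\sigma}} y\,e^{-\frac{y^2}{2}}\,dy = 0 \qquad \left[\text{since } \varphi = \frac{n\left(\bar x - \theta_0\right)}{\sigma^2} = \frac{\sqrt{n}\,y}{\sigma}\right] \]
\[ \Rightarrow\ \left[-e^{-\frac{y^2}{2}}\right]_{\frac{\sqrt{n}(d_1 - \theta_0)}{\sigma}}^{\frac{\sqrt{n}(d_2 - \theta_0)}{\sigma}} = 0 \qquad \ldots\ (***) \]
\[ \Rightarrow\ -e^{-\frac{1}{2}\left\{\frac{\sqrt{n}(d_2 - \theta_0)}{\sigma}\right\}^2} + e^{-\frac{1}{2}\left\{\frac{\sqrt{n}(d_1 - \theta_0)}{\sigma}\right\}^2} = 0 \]
\[ \Rightarrow\ -e^{-\frac{1}{2}\left\{\frac{\sqrt{n}(d_2 - \theta_0)}{\sigma}\right\}^2} = -e^{-\frac{1}{2}\left\{\frac{\sqrt{n}(d_1 - \theta_0)}{\sigma}\right\}^2} \]
\[ \Rightarrow\ -\frac{1}{2}\left\{\frac{\sqrt{n}(d_2 - \theta_0)}{\sigma}\right\}^2 = -\frac{1}{2}\left\{\frac{\sqrt{n}(d_1 - \theta_0)}{\sigma}\right\}^2 \]
Solving (***), with $d_1 < d_2$, we have,
\[ \frac{\sqrt{n}(d_1 - \theta_0)}{\sigma} = -\frac{\sqrt{n}(d_2 - \theta_0)}{\sigma} \]
and since
\[ \frac{1}{\sqrt{2\pi}}\int_{-\frac{\sqrt{n}(d_2 - \theta_0)}{\sigma}}^{\frac{\sqrt{n}(d_2 - \theta_0)}{\sigma}} e^{-\frac{y^2}{2}}\,dy = 1 - \alpha \]
we have,
\[ \frac{\sqrt{n}(d_2 - \theta_0)}{\sigma} = \tau_{\alpha/2} \ \Rightarrow\ d_2 = \theta_0 + \tau_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} \]
Hence also,
\[ -\frac{\sqrt{n}(d_1 - \theta_0)}{\sigma} = \tau_{\alpha/2} \ \Rightarrow\ d_1 = \theta_0 - \tau_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} \]
As such, the type-A region of size α is
\[ w = \left\{x \ \middle|\ \bar x < \theta_0 - \tau_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} \ \cup\ \bar x > \theta_0 + \tau_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}\right\} = \left\{x \ \middle|\ \frac{\sqrt{n}\left|\bar x - \theta_0\right|}{\sigma} > \tau_{\alpha/2}\right\} \]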
Similar Region (Testing Composite Hypothesis)
Let X be a random variable distributed as $f(x;\ \theta_1, \ldots, \theta_k)$. Consider a hypothesis of the form
\[ H_0 : \theta_1 = \theta_{10}, \ldots, \theta_r = \theta_{r0} \quad (r < k) \qquad \ldots\ (1) \]
Here (k − r) parameters are left unspecified, so (1) is a composite hypothesis with (k − r) d.f. We want to determine a
critical region ω of size α such that
\[ \int_\omega L(x \mid H_0)\,dx = \alpha \qquad \ldots\ (2) \]
\[ \text{and} \quad \int_\omega L(x \mid H_1)\,dx \ \text{is maximum} \qquad \ldots\ (3) \]
where $H_1$ is some simple hypothesis about the parameters.
Since the parameters $\theta_{r+1}, \ldots, \theta_k$ are unspecified by $H_0$, the α given in (2) is in general a function of these
parameters and hence cannot be uniquely determined. If α does not depend on the unspecified parameters, the
region ω for which equation (2) is true is called a region similar to the sample space with respect to the parameters
$\theta_{r+1}, \ldots, \theta_k$, or simply a similar region.
A test based on a similar region of size α is called a similar size α test.
Concept of Similar Region

In the case of a composite hypothesis the selection of a suitable test involves three important stages:
i) finding all similar regions;
ii) finding those similar regions which are of size α;
iii) finding a similar region of size α that is best from the point of view of power; this gives the UMP critical region.
Construction of Similar Region

When a statistic sufficient for each of the unspecified parameters exists, or when a jointly sufficient statistic exists for
the unknown parameters, regions similar to the sample space can be constructed.
Let ω be any critical region of size α. We define an indicator function (or variable) $I_\omega$ of the critical region ω by
\[ I_\omega = \begin{cases} 1 & \text{if the observation lies in the C.R., i.e. } x \in \omega \\ 0 & \text{if the observation lies outside the C.R., i.e. } x \notin \omega \end{cases} \]
The set of all points x for which $I_\omega = 1$ is the region of rejection. For a hypothesis H,
\[ \int_\omega L(x \mid H)\,dx = \int_S I_\omega\,L(x \mid H)\,dx = E(I_\omega \mid H) \]
= expected value of $I_\omega$ when H is true,
\[ \text{and} \quad E(I_\omega \mid H) = \begin{cases} \alpha & \text{if } H = H_0 \\ 1 - \beta & \text{if } H = H_1 \end{cases} \]
If the parameter θ admits a sufficient statistic t, the likelihood function factorizes into
\[ L(x \mid \theta) = g(t, \theta)\,h(x, t) \]
where g(t, θ) is the frequency function of the sufficient statistic t, and h(x, t) is a function of the sample values
only, for a given t.
\[ \text{Now,} \quad E(I_\omega) = \int_S I_\omega\,L(x \mid \theta)\,dx = \int_S I_\omega\,g(t, \theta)\,h(x, t)\,dx = E\left[E(I_\omega \mid t)\right] \qquad \ldots\ (4) \]
Equation (4) is very important for us: since t is sufficient for θ, $E(I_\omega \mid t)$ does not depend on θ, and it has
the same expectation as $I_\omega$, i.e.
\[ E\left[E(I_\omega \mid t)\right] = E(I_\omega) \]
If t is sufficient for θ, then whether $H_0$ or $H_1$ is true, equation (4) implies that there is a test based on t, similar to
the sample space, with size and power exactly equal to those of the original critical region ω.
Neyman Structure
A test with critical region ω is said to be of Neyman structure with respect to t if E ( Iω | t ) is the same almost
everywhere for θ i.e. a test satisfying E ( Iω | t ) = α is said to have Neyman structure with respect to t .
Example: Let x1 , … , xn be a random sample drawn from N ( µ , σ² ) where both µ and σ² are unknown. Test H 0 : µ = µ0 against H1 : µ = µ1 .
Solution
The hypothesis H 0 has one d.f., the parameter σ² being unspecified. We have
$$ L(x \mid H_0) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{n} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu_0)^2 \right] $$
Under H 0 the statistic $V = \sum_{i=1}^{n} (x_i - \mu_0)^2$ is sufficient for σ² , and it is also a complete sufficient statistic. Consider the simple H 0 and H1
$$ H_0 : \mu = \mu_0 ,\ \sigma^2 = \sigma_0^2 \qquad H_1 : \mu = \mu_1 ,\ \sigma^2 = \sigma_1^2 $$
According to the Neyman-Pearson lemma, we have
$$ \frac{L(x \mid H_0)}{L(x \mid H_1)} = \left( \frac{\sigma_1}{\sigma_0} \right)^{n} \exp\left[ -\frac{1}{2\sigma_0^2} \sum (x_i - \mu_0)^2 + \frac{1}{2\sigma_1^2} \sum (x_i - \mu_1)^2 \right] \le \text{constant} $$
With this, the MP critical region ω0 of size α for testing simple H 0 against simple H1 is
$$ L(x \mid \mu_1, \sigma_1^2) > k(v)\, L(x \mid \mu_0, \sigma_0^2) $$
where k ( v ) is such that the conditional size of ω0 given V = v is α . For fixed v this reduces to
$$ (\mu_1 - \mu_0)(\bar{x} - \mu_0) > k_1(v) \qquad (1) $$
where k1 ( v ) is related to k ( v ) .
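Case I:
If µ1 > µ0 , condition (1) is equivalent to
$$ (\bar{x} - \mu_0) > k_2(v) \;\Rightarrow\; \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{\sqrt{v}} > k_3(v) \ (\text{say}) $$
So we can write
$$ \omega_0 = \left\{ x \;\middle|\; \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{\sqrt{v}} > k_3(v) \right\} $$
where k3 ( v ) is to be determined such that $P_{\mu_0}[\omega_0 \mid v] = \alpha$ .
Under H 0 , $\sqrt{n}(\bar{X} - \mu_0)/\sqrt{V}$ and V are independent, so the conditional distribution of $\sqrt{n}(\bar{X} - \mu_0)/\sqrt{V}$ given V = v is the same as its unconditional distribution. Hence k3 does not depend on v , and we can write
$$ P\left[ \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{\sqrt{V}} > k_3 \right] = \alpha $$
We know that
$$ \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{\sqrt{v}} = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{\sqrt{n(\bar{X} - \mu_0)^2 + \sum (x_i - \bar{X})^2}} = \frac{\sqrt{n}\,(\bar{X} - \mu_0) \big/ \sqrt{\sum (x_i - \bar{X})^2}}{\sqrt{1 + \dfrac{t^2}{n-1}}} = \frac{t}{\sqrt{t^2 + n - 1}} $$
where $t = \sqrt{n}\,(\bar{X} - \mu_0)/S \sim t_{n-1}$ and $S^2 = \sum (x_i - \bar{X})^2/(n-1)$ .
Since $t/\sqrt{t^2 + n - 1}$ is an increasing function of t , we have $\sqrt{n}(\bar{x} - \mu_0)/\sqrt{v} > k_3$ iff t > k4 ( say ) , and we may also write
$$ \omega_0 = \left\{ x \;\middle|\; \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{\sqrt{v}} > k_3 \right\} = \{ x \mid t > k_4 \} , \qquad \text{where } P_{\mu_0}[t > k_4] = \alpha $$
Here k4 is the upper α point of the t distribution with n − 1 d.f., and we can write finally
$$ \omega_0 = \left\{ x \;\middle|\; \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{S} > t_{\alpha,\, n-1} \right\} $$
Since this is independent of σ0² and σ1² , it is the MP similar region of size α for testing H 0 against H1 .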
Case II:
If µ1 < µ0 , in this case we have as before
$$ (\mu_1 - \mu_0)(\bar{x} - \mu_0) > k_1(v) \;\Rightarrow\; (\bar{x} - \mu_0) < k_2'(v) $$
So, proceeding as before, the MP similar region of size α for testing H 0 against H1 is
$$ \omega_0' = \left\{ x \;\middle|\; \frac{\sqrt{n}\,(\bar{x} - \mu_0)}{S} < -t_{\alpha,\, n-1} \right\} $$
Since ω0 does not depend on µ1 , i.e. it is the same for all µ1 > µ0 , it is in fact the UMP similar region of size α for testing H 0 : µ = µ0 against the composite H1 : µ > µ0 .
Similarly, ω0′ is the UMP similar region of size α for testing H 0 : µ = µ0 against H1 : µ < µ0 .
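The algebraic step used in Case I, namely that √n ( x̄ − µ0 ) / √v equals t / √( t² + n − 1 ) , can be checked numerically. The following is only a sketch: the data values and the function name `t_and_ratio` are made up for illustration and are not part of the notes.

```python
import math

def t_and_ratio(xs, mu0):
    """Return (t, r): the one-sample t statistic and sqrt(n)(xbar - mu0)/sqrt(v)."""
    n = len(xs)
    xbar = sum(xs) / n
    v = sum((x - mu0) ** 2 for x in xs)        # V = sum (x_i - mu0)^2
    ss = sum((x - xbar) ** 2 for x in xs)      # sum (x_i - xbar)^2
    s = math.sqrt(ss / (n - 1))                # S, with divisor n - 1
    t = math.sqrt(n) * (xbar - mu0) / s
    r = math.sqrt(n) * (xbar - mu0) / math.sqrt(v)
    return t, r

# Illustrative (hypothetical) sample and null value.
xs = [5.1, 4.8, 6.2, 5.9, 5.4, 6.0]
t, r = t_and_ratio(xs, mu0=5.0)
n = len(xs)
# The two sides of the identity should agree up to rounding error.
identity_gap = abs(r - t / math.sqrt(t * t + n - 1))
print(t, r, identity_gap)
```

Because the map t → t / √( t² + n − 1 ) is increasing, rejecting for large values of either quantity gives the same region.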
Likelihood Ratio Test
Introduction
Neyman and Pearson (1928) developed a simple general method of testing hypotheses called the method of likelihood ratio. Just as the method of maximum likelihood yields an estimate of a parameter, the likelihood ratio method yields a test statistic rather easily.
Definition
Let θ ∈ Ω be a vector of parameters and let X = ( x1 , … , xn ) be a random vector with p.d.f. fθ , θ ∈ Ω .
Consider the problem of testing the null hypothesis H 0 : X ~ fθ , θ ∈ Ω0 against the alternative hypothesis H1 : X ~ fθ , θ ∈ Ω1 = Ω − Ω0 . The likelihood ratio for testing H 0 against H1 is defined as
$$ \lambda = \lambda(X) = \lambda(x_1, \dots, x_n) = \frac{\sup_{\theta \in \Omega_0} f_\theta(x_1, \dots, x_n)}{\sup_{\theta \in \Omega} f_\theta(x_1, \dots, x_n)} = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})} $$
and the test is of the form: reject H 0 iff λ ( X ) < C , where C is some constant determined from the size α (the level of significance, 0 < α < 1 ), i.e.
$$ \sup_{\theta \in \Omega_0} P_\theta \{ x : \lambda(x) < C \} = \alpha $$
Remarks
The numerator of the likelihood ratio λ is the best explanation of X that H 0 can provide, and the denominator is the best possible explanation of X . H 0 is rejected if there is a much better explanation of X than the best one provided by H 0 . It is clear that 0 ≤ λ ≤ 1 .
Properties of LRT
LRT has some desirable properties, specially large sample properties. LRT is generally UMP if an UMP test exists.
We state below, the two asymptotic properties of LRT.
i) Under certain conditions, −2 ln λ has an asymptotic chi-square distribution.

ii) Under certain assumptions, LRT is consistent.
Properties of LRT Statistic ( λ )
i) The likelihood ratio λ is a function of x only and hence λ is a statistic which does not depend on θ .
ii) Since λ is the ratio of conditional maximum of likelihood function to its unconditional maximum, thus
0 ≤ λ ≤ 1.
iii) The critical region is 0 < λ < λ0 , where λ0 satisfies $\int_0^{\lambda_0} h(\lambda)\, d\lambda = \alpha$ , the level of significance, h ( λ ) being the p.d.f. of λ under H 0 .
iv) λ is always a function of sufficient statistic.
v) If the null hypothesis H 0 is composite, the distribution of λ may not be always unique.
vi) Under certain conditions −2 ln λ follows $\chi^2_{(1)}$ asymptotically.

LRT for Testing the Equality of Means of Two Normal Populations
Let us consider two independent random variables X 1 and X 2 following the normal distributions N ( µ1 , σ1² ) and N ( µ2 , σ2² ) respectively. We want to test the hypothesis
$$ H_0 : \mu_1 = \mu_2 = \mu \ (\text{say}) ;\quad 0 < \sigma_1^2 < \infty ,\ 0 < \sigma_2^2 < \infty $$
against
$$ H_1 : \mu_1 \ne \mu_2 ;\quad \sigma_1^2 > 0 ,\ \sigma_2^2 > 0 $$
Case I: Population variances are unequal.
$$ \Omega = \{ (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2) ;\ -\infty < \mu_i < \infty ,\ \sigma_i^2 > 0 ;\ i = 1, 2 \} $$
and
$$ \Omega_0 = \{ (\mu, \sigma_1^2, \sigma_2^2) ;\ -\infty < \mu < \infty ,\ \sigma_i^2 > 0 ;\ i = 1, 2 \} $$
Let x1i ( i = 1, … , m ) and x2j ( j = 1, … , n ) be two independent random samples of sizes m and n from the populations N ( µ1 , σ1² ) and N ( µ2 , σ2² ) respectively. Then the likelihood function is given by
$$ L = \left( \frac{1}{2\pi\sigma_1^2} \right)^{m/2} \exp\left[ -\frac{1}{2\sigma_1^2} \sum_{i=1}^{m} (x_{1i} - \mu_1)^2 \right] \left( \frac{1}{2\pi\sigma_2^2} \right)^{n/2} \exp\left[ -\frac{1}{2\sigma_2^2} \sum_{j=1}^{n} (x_{2j} - \mu_2)^2 \right] \qquad (1) $$
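The maximum likelihood estimates for µ1 , µ2 , σ1² , σ2² are given by
$$ \hat{\mu}_1 = \frac{1}{m} \sum_{i=1}^{m} x_{1i} = \bar{x}_1 \quad \text{and} \quad \hat{\mu}_2 = \frac{1}{n} \sum_{j=1}^{n} x_{2j} = \bar{x}_2 $$
$$ \hat{\sigma}_1^2 = \frac{1}{m} \sum_{i=1}^{m} (x_{1i} - \bar{x}_1)^2 = s_1^2 \ (\text{say}) \quad \text{and} \quad \hat{\sigma}_2^2 = \frac{1}{n} \sum_{j=1}^{n} (x_{2j} - \bar{x}_2)^2 = s_2^2 \ (\text{say}) $$
Now
$$ L(\hat{\Omega}) = \left( \frac{1}{2\pi s_1^2} \right)^{m/2} \left( \frac{1}{2\pi s_2^2} \right)^{n/2} e^{-(m+n)/2} $$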
Under H 0 , the likelihood function is given by
$$ L(\Omega_0) = \left( \frac{1}{2\pi\sigma_1^2} \right)^{m/2} \exp\left[ -\frac{1}{2\sigma_1^2} \sum_{i=1}^{m} (x_{1i} - \mu)^2 \right] \left( \frac{1}{2\pi\sigma_2^2} \right)^{n/2} \exp\left[ -\frac{1}{2\sigma_2^2} \sum_{j=1}^{n} (x_{2j} - \mu)^2 \right] $$
To obtain the maximum value of L ( Ω0 ) for variation in µ , σ1² , σ2² , it will be seen that the estimate µ̂ of µ is obtained as the root of the cubic equation
$$ \frac{m^2 (\bar{x}_1 - \hat{\mu})}{\sum_{i=1}^{m} (x_{1i} - \hat{\mu})^2} + \frac{n^2 (\bar{x}_2 - \hat{\mu})}{\sum_{j=1}^{n} (x_{2j} - \hat{\mu})^2} = 0 $$
and thus is a complicated function of the sample observations. It is therefore not possible to obtain the critical region 0 < λ < λ0 for given α exactly, since the population variances are ordinarily unknown. As an approximate (large-sample) test, −2 ln λ can be treated as a χ² variate with 1 d.f.
Case II: Population variances are equal, i.e., σ1² = σ2² = σ² ( say )
$$ \Omega = \{ (\mu_1, \mu_2, \sigma^2) ;\ -\infty < \mu_i < \infty ,\ \sigma^2 > 0 ;\ i = 1, 2 \} $$
and
$$ \Omega_0 = \{ (\mu, \sigma^2) ;\ -\infty < \mu < \infty ,\ \sigma^2 > 0 \} $$
The maximum likelihood estimates are then given by
$$ \hat{\mu}_1 = \bar{x}_1 \quad \text{and} \quad \hat{\mu}_2 = \bar{x}_2 $$
and
$$ \hat{\sigma}^2 = \frac{1}{m+n} \left[ \sum_{i=1}^{m} (x_{1i} - \hat{\mu}_1)^2 + \sum_{j=1}^{n} (x_{2j} - \hat{\mu}_2)^2 \right] = \frac{1}{m+n} \left[ \sum_{i=1}^{m} (x_{1i} - \bar{x}_1)^2 + \sum_{j=1}^{n} (x_{2j} - \bar{x}_2)^2 \right] = \frac{1}{m+n} \left[ m s_1^2 + n s_2^2 \right] $$
Now, substituting the values of µ̂1 , µ̂2 , σ̂² in (1),
$$ L(\hat{\Omega}) = \left\{ \frac{m+n}{2\pi \left[ m s_1^2 + n s_2^2 \right]} \right\}^{(m+n)/2} e^{-(m+n)/2} $$
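Under H 0 the likelihood function is
$$ L(\Omega_0) = \left( \frac{1}{2\pi\sigma^2} \right)^{(m+n)/2} \exp\left[ -\frac{1}{2\sigma^2} \left\{ \sum_{i=1}^{m} (x_{1i} - \mu)^2 + \sum_{j=1}^{n} (x_{2j} - \mu)^2 \right\} \right] $$
$$ \Rightarrow\ \ln L(\Omega_0) = C - \frac{m+n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \left\{ \sum_{i=1}^{m} (x_{1i} - \mu)^2 + \sum_{j=1}^{n} (x_{2j} - \mu)^2 \right\} $$
where C is a constant independent of µ and σ² . The likelihood equation for estimating µ is
$$ \frac{\partial \ln L}{\partial \mu} = 0 \;\Rightarrow\; \frac{1}{\sigma^2} \left\{ \sum_{i=1}^{m} (x_{1i} - \mu) + \sum_{j=1}^{n} (x_{2j} - \mu) \right\} = 0 $$
$$ \Rightarrow\ (m \bar{x}_1 + n \bar{x}_2) - (m+n)\mu = 0 \;\Rightarrow\; \hat{\mu} = \frac{m \bar{x}_1 + n \bar{x}_2}{m+n} $$
Also,
$$ \frac{\partial \ln L}{\partial \sigma^2} = 0 \;\Rightarrow\; -\frac{m+n}{2\sigma^2} + \frac{1}{2\sigma^4} \left\{ \sum_{i=1}^{m} (x_{1i} - \mu)^2 + \sum_{j=1}^{n} (x_{2j} - \mu)^2 \right\} = 0 $$
$$ \Rightarrow\ \hat{\sigma}^2 = \frac{1}{m+n} \left\{ \sum_{i=1}^{m} (x_{1i} - \hat{\mu})^2 + \sum_{j=1}^{n} (x_{2j} - \hat{\mu})^2 \right\} $$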
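But
$$ \sum_{i=1}^{m} (x_{1i} - \hat{\mu})^2 = \sum_{i=1}^{m} (x_{1i} - \bar{x}_1 + \bar{x}_1 - \hat{\mu})^2 = \sum_{i=1}^{m} (x_{1i} - \bar{x}_1)^2 + m (\bar{x}_1 - \hat{\mu})^2 $$
$$ = m s_1^2 + m \left( \bar{x}_1 - \frac{m \bar{x}_1 + n \bar{x}_2}{m+n} \right)^2 = m s_1^2 + \frac{m n^2 (\bar{x}_1 - \bar{x}_2)^2}{(m+n)^2} $$
Similarly
$$ \sum_{j=1}^{n} (x_{2j} - \hat{\mu})^2 = n s_2^2 + \frac{n m^2 (\bar{x}_2 - \bar{x}_1)^2}{(m+n)^2} $$
$$ \therefore\ \hat{\sigma}^2 = \frac{1}{m+n} \left\{ m s_1^2 + \frac{m n^2 (\bar{x}_1 - \bar{x}_2)^2}{(m+n)^2} + n s_2^2 + \frac{n m^2 (\bar{x}_2 - \bar{x}_1)^2}{(m+n)^2} \right\} = \frac{1}{m+n} \left\{ m s_1^2 + n s_2^2 + \frac{m n (\bar{x}_1 - \bar{x}_2)^2}{m+n} \right\} $$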
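$$ \therefore\ L(\hat{\Omega}_0) = \left\{ \frac{m+n}{2\pi \left[ m s_1^2 + n s_2^2 + \dfrac{mn}{m+n} (\bar{x}_1 - \bar{x}_2)^2 \right]} \right\}^{(m+n)/2} e^{-(m+n)/2} $$
$$ \therefore\ \lambda = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})} = \left\{ \frac{m s_1^2 + n s_2^2}{m s_1^2 + n s_2^2 + \dfrac{mn (\bar{x}_1 - \bar{x}_2)^2}{m+n}} \right\}^{(m+n)/2} = \left\{ 1 + \frac{mn (\bar{x}_1 - \bar{x}_2)^2}{(m+n)\left( m s_1^2 + n s_2^2 \right)} \right\}^{-(m+n)/2} $$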
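We know that, under H 0 : µ1 = µ2 , the test statistic
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{S \sqrt{\dfrac{1}{m} + \dfrac{1}{n}}} \qquad \text{where } S^2 = \frac{1}{m+n-2} \left( m s_1^2 + n s_2^2 \right) $$
follows Student's t distribution with m + n − 2 d.f. In terms of t ,
$$ \lambda = \left\{ 1 + \frac{t^2}{m+n-2} \right\}^{-(m+n)/2} $$
which is a decreasing function of t² , so the critical region 0 < λ < λ0 is equivalent to | t | > constant.
Thus for testing the null hypothesis
$$ H_0 : \mu_1 = \mu_2 = \mu ;\quad \sigma_1^2 = \sigma_2^2 = \sigma^2 > 0 $$
against
$$ H_1 : \mu_1 \ne \mu_2 ;\quad \sigma_1^2 = \sigma_2^2 = \sigma^2 > 0 $$
we have the two-tailed t -test defined as follows:
If $|t| = \dfrac{|\bar{x}_1 - \bar{x}_2|}{S \sqrt{\frac{1}{m} + \frac{1}{n}}} > t_{m+n-2}(\alpha/2)$ , reject H 0 ; otherwise H 0 may be accepted.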
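A short calculation from the expressions above gives λ = { 1 + t² / ( m + n − 2 ) }^{−( m + n )/2} , so that small λ corresponds to large | t | . The sketch below checks this numerically; the two samples and the function name `lrt_equal_means` are hypothetical, chosen only for illustration.

```python
import math

def lrt_equal_means(x1, x2):
    """Return (lambda, t) for the equal-variance two-sample problem."""
    m, n = len(x1), len(x2)
    xb1, xb2 = sum(x1) / m, sum(x2) / n
    ms1 = sum((x - xb1) ** 2 for x in x1)    # m * s1^2
    ns2 = sum((x - xb2) ** 2 for x in x2)    # n * s2^2
    d2 = (xb1 - xb2) ** 2
    # Likelihood ratio in the closed form derived in the notes.
    lam = (1 + m * n * d2 / ((m + n) * (ms1 + ns2))) ** (-(m + n) / 2)
    # Pooled two-sample t statistic with S^2 = (m s1^2 + n s2^2)/(m + n - 2).
    S = math.sqrt((ms1 + ns2) / (m + n - 2))
    t = (xb1 - xb2) / (S * math.sqrt(1 / m + 1 / n))
    return lam, t

# Illustrative (hypothetical) samples.
x1 = [10.2, 9.8, 11.1, 10.5, 9.9]
x2 = [11.4, 12.0, 10.9, 11.8, 12.3, 11.1]
lam, t = lrt_equal_means(x1, x2)
m, n = len(x1), len(x2)
lam_from_t = (1 + t * t / (m + n - 2)) ** (-(m + n) / 2)
print(lam, lam_from_t, t)
```

The two values of λ should agree up to rounding, which is why the LRT here reduces to the ordinary two-tailed t -test.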
Likelihood Ratio Test for Testing the Equality of Variances of Two Population
Consider two normal populations N ( µ1 , σ1² ) and N ( µ2 , σ2² ) , where the means µ1 and µ2 and the variances σ1² and σ2² are unspecified. We want to test the hypothesis
$$ H_0 : \sigma_1^2 = \sigma_2^2 = \sigma^2 \ (\text{unspecified}), \text{ with } \mu_1 \text{ and } \mu_2 \ (\text{unspecified}) $$
against
$$ H_1 : \sigma_1^2 \ne \sigma_2^2 ;\ \mu_1 \text{ and } \mu_2 \ (\text{unspecified}) $$
If x1i ( i = 1, … , m ) and x2j ( j = 1, … , n ) are independent random samples of sizes m and n from N ( µ1 , σ1² ) and N ( µ2 , σ2² ) respectively, then
$$ L = \left( \frac{1}{2\pi\sigma_1^2} \right)^{m/2} \exp\left[ -\frac{1}{2\sigma_1^2} \sum_{i=1}^{m} (x_{1i} - \mu_1)^2 \right] \left( \frac{1}{2\pi\sigma_2^2} \right)^{n/2} \exp\left[ -\frac{1}{2\sigma_2^2} \sum_{j=1}^{n} (x_{2j} - \mu_2)^2 \right] \qquad (1) $$
In this case,
$$ \Omega = \{ (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2) ;\ -\infty < \mu_i < \infty ,\ \sigma_i^2 > 0 ;\ i = 1, 2 \} $$
and
$$ \Omega_0 = \{ (\mu_1, \mu_2, \sigma^2) ;\ -\infty < \mu_i < \infty ,\ \sigma^2 > 0 ;\ i = 1, 2 \} $$

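The maximum likelihood estimates for µ1 , µ2 , σ1² and σ2² are given by
$$ \hat{\mu}_1 = \frac{1}{m} \sum_{i=1}^{m} x_{1i} = \bar{x}_1 \quad \text{and} \quad \hat{\mu}_2 = \frac{1}{n} \sum_{j=1}^{n} x_{2j} = \bar{x}_2 $$
$$ \hat{\sigma}_1^2 = \frac{1}{m} \sum_{i=1}^{m} (x_{1i} - \bar{x}_1)^2 = s_1^2 \ (\text{say}) \quad \text{and} \quad \hat{\sigma}_2^2 = \frac{1}{n} \sum_{j=1}^{n} (x_{2j} - \bar{x}_2)^2 = s_2^2 \ (\text{say}) $$
Substituting these values in (1) we get,
$$ L(\hat{\Omega}) = \left( \frac{1}{2\pi s_1^2} \right)^{m/2} \left( \frac{1}{2\pi s_2^2} \right)^{n/2} e^{-(m+n)/2} $$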
Under H 0 , the likelihood function is given by
$$ L(\Omega_0) = \left( \frac{1}{2\pi\sigma^2} \right)^{(m+n)/2} \exp\left[ -\frac{1}{2\sigma^2} \left\{ \sum_{i=1}^{m} (x_{1i} - \mu_1)^2 + \sum_{j=1}^{n} (x_{2j} - \mu_2)^2 \right\} \right] \qquad (2) $$
The MLE’s for µ1 , µ2 and σ² are now given by
$$ \hat{\mu}_1 = \frac{1}{m} \sum_{i=1}^{m} x_{1i} = \bar{x}_1 \quad \text{and} \quad \hat{\mu}_2 = \frac{1}{n} \sum_{j=1}^{n} x_{2j} = \bar{x}_2 $$
and
$$ \hat{\sigma}^2 = \frac{1}{m+n} \left[ \sum_{i=1}^{m} (x_{1i} - \hat{\mu}_1)^2 + \sum_{j=1}^{n} (x_{2j} - \hat{\mu}_2)^2 \right] = \frac{1}{m+n} \left[ m s_1^2 + n s_2^2 \right] $$
Substituting these values of µ̂1 , µ̂2 , σ̂² in ( 2 ) we get,
$$ L(\hat{\Omega}_0) = \left\{ \frac{m+n}{2\pi \left[ m s_1^2 + n s_2^2 \right]} \right\}^{(m+n)/2} e^{-(m+n)/2} $$
$$ \therefore\ \lambda = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})} = (m+n)^{(m+n)/2} \left\{ \frac{(s_1^2)^{m/2} (s_2^2)^{n/2}}{(m s_1^2 + n s_2^2)^{(m+n)/2}} \right\} $$
$$ = \frac{(m+n)^{(m+n)/2}}{m^{m/2}\, n^{n/2}} \left\{ \frac{(m s_1^2)^{m/2} (n s_2^2)^{n/2}}{(m s_1^2 + n s_2^2)^{(m+n)/2}} \right\} \qquad (3) $$
We know that under H 0 the statistic
$$ F = \frac{\sum (x_{1i} - \bar{x}_1)^2 / (m-1)}{\sum (x_{2j} - \bar{x}_2)^2 / (n-1)} = \frac{m s_1^2 / (m-1)}{n s_2^2 / (n-1)} $$
follows the F -distribution with ( m − 1 ) , ( n − 1 ) d.f. This also implies
$$ F = \frac{m (n-1) s_1^2}{n (m-1) s_2^2} \;\Rightarrow\; \frac{m-1}{n-1} F = \frac{m s_1^2}{n s_2^2} $$
Substituting in ( 3 ) and simplifying, we get
$$ \lambda = \frac{(m+n)^{(m+n)/2}}{m^{m/2}\, n^{n/2}} \left\{ \frac{\left( \dfrac{m-1}{n-1} F \right)^{m/2}}{\left( 1 + \dfrac{m-1}{n-1} F \right)^{(m+n)/2}} \right\} $$
Thus λ depends on the sample only through F (it is small when F is very small or very large), and hence the test can be carried out with F as the test statistic. The critical region 0 < λ < λ0 can be given by a pair of intervals F ≤ F1 and F ≥ F2 , where F1 and F2 are determined so that under H 0
$$ P(F \ge F_2) = \frac{\alpha}{2} \quad \text{and} \quad P(F \ge F_1) = 1 - \frac{\alpha}{2} $$
Since, under H 0 , F follows the F -distribution with m − 1 , n − 1 d.f., we have
$$ F_2 = F_{m-1,\, n-1}(\alpha/2) \quad \text{and} \quad F_1 = F_{m-1,\, n-1}(1 - \alpha/2) $$
where $F_{m,n}(\alpha)$ is the upper α point of the F -distribution with ( m, n ) d.f.
Consequently, for testing H 0 : σ1² = σ2² against the alternative hypothesis H1 : σ1² ≠ σ2² , we have a two-tailed F -test, the critical region being given by
$$ F > F_{m-1,\, n-1}(\alpha/2) \quad \text{or} \quad F < F_{m-1,\, n-1}(1 - \alpha/2) $$
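As a check on the algebra, λ computed directly from s1² and s2² should coincide with λ written as a function of F . The sketch below also illustrates that λ is not monotone in F : it equals 1 at F = m ( n − 1 ) / { n ( m − 1 ) } (i.e. when s1² = s2² ) and falls off on both sides, which is why the region is two-tailed. The data and function names are assumptions made for illustration.

```python
def lambda_direct(x1, x2):
    """lambda from equation (3) and the F statistic, computed from raw data."""
    m, n = len(x1), len(x2)
    xb1, xb2 = sum(x1) / m, sum(x2) / n
    s1 = sum((x - xb1) ** 2 for x in x1) / m    # MLE of variance, divisor m
    s2 = sum((x - xb2) ** 2 for x in x2) / n    # MLE of variance, divisor n
    lam = ((m + n) ** ((m + n) / 2) * s1 ** (m / 2) * s2 ** (n / 2)
           / (m * s1 + n * s2) ** ((m + n) / 2))
    F = (m * s1 / (m - 1)) / (n * s2 / (n - 1))
    return lam, F

def lambda_from_F(F, m, n):
    """lambda expressed through F alone, as in the last display above."""
    r = (m - 1) * F / (n - 1)
    c = (m + n) ** ((m + n) / 2) / (m ** (m / 2) * n ** (n / 2))
    return c * r ** (m / 2) / (1 + r) ** ((m + n) / 2)

# Illustrative (hypothetical) samples.
x1 = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7]
x2 = [5.0, 5.2, 4.9, 5.1, 5.3]
m, n = len(x1), len(x2)
lam, F = lambda_direct(x1, x2)
F_peak = m * (n - 1) / (n * (m - 1))           # value of F at which s1^2 = s2^2
print(lam, lambda_from_F(F, m, n), lambda_from_F(F_peak, m, n))
```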
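Example: Let x1 , … , xn be a random sample from $f(x ; \theta) = \theta e^{-\theta x} I_{(0, \infty)}(x)$ where Θ = { θ : θ > 0 } . Test H 0 : θ ≤ θ0 against H1 : θ > θ0 .
Solution
$$ \sup_{\theta \in \Theta} L(\theta ; x_1, \dots, x_n) = \sup_{\theta > 0} \left[ \theta^n e^{-\theta \sum x_i} \right] = \left( \frac{n}{\sum x_i} \right)^{n} e^{-n} \qquad \left[ \text{since } \hat{\theta} = \frac{n}{\sum x_i} = \frac{1}{\bar{x}} \right] $$
and
$$ \sup_{\theta \in \Theta_0} L(\theta ; x_1, \dots, x_n) = \sup_{0 < \theta \le \theta_0} \left[ \theta^n e^{-\theta \sum x_i} \right] = \begin{cases} \left( \dfrac{n}{\sum x_i} \right)^{n} e^{-n} , & \text{if } \dfrac{n}{\sum x_i} \le \theta_0 \\[2ex] \theta_0^n\, e^{-\theta_0 \sum x_i} , & \text{if } \dfrac{n}{\sum x_i} > \theta_0 \end{cases} $$
Hence
$$ \lambda = \begin{cases} 1 , & \text{if } \dfrac{n}{\sum x_i} \le \theta_0 \\[2ex] \dfrac{\theta_0^n\, e^{-\theta_0 \sum x_i}}{\left( \dfrac{n}{\sum x_i} \right)^{n} e^{-n}} , & \text{if } \dfrac{n}{\sum x_i} > \theta_0 \end{cases} $$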

If 0 < λ0 < 1 , then a generalized likelihood ratio test is given by:
Reject H 0 if λ ≤ λ0
Or, Reject H 0 if $\dfrac{n}{\sum x_i} > \theta_0$ and $\left( \dfrac{\theta_0 \sum x_i}{n} \right)^{n} \exp\left[ -\theta_0 \sum x_i + n \right] \le \lambda_0$
Or, Reject H 0 if $\theta_0 \bar{x} < 1$ and $(\theta_0 \bar{x})^n \exp\left[ -n(\theta_0 \bar{x} - 1) \right] \le \lambda_0$
Let y = θ0 x̄ . The function $y^n \exp[-n(y - 1)]$ has a maximum at y = 1 and is increasing for y < 1 . Hence, y < 1 and $y^n \exp[-n(y - 1)] \le \lambda_0$ iff y ≤ k , where k is a constant satisfying 0 < k < 1 .
A generalized likelihood ratio test therefore reduces to the following.
Reject H 0 iff θ0 x̄ < k , where 0 < k < 1 .
That is, reject H 0 if x̄ is less than k / θ0 , a fixed fraction of 1 / θ0 .
If a generalized likelihood ratio test having size α is desired, k is obtained as the solution to the equation
$$ \alpha = P_{\theta_0}(\theta_0 \bar{X} < k) = P_{\theta_0}\left( \theta_0 \sum X_i < nk \right) = \int_0^{nk} \frac{1}{\Gamma(n)}\, u^{n-1} e^{-u}\, du $$
Note that $P_\theta(\theta_0 \bar{X} < k) \le P_{\theta_0}(\theta_0 \bar{X} < k)$ for θ ≤ θ0 .
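The size equation α = ∫₀^{nk} u^{n−1} e^{−u} / Γ( n ) du has no closed-form solution for k in general, but it is easy to solve numerically. The sketch below uses the fact that for integer n the gamma c.d.f. is 1 − e^{−x} Σ_{j<n} x^j / j! , and finds k by bisection on ( 0, 1 ) . The function names are hypothetical, and the bisection assumes α is smaller than P( U < n ) for U ~ Gamma( n, 1 ) , which holds for the usual small levels of significance.

```python
import math

def erlang_cdf(x, n):
    """P(U < x) for U ~ Gamma(shape n, scale 1), n a positive integer."""
    if x <= 0:
        return 0.0
    term, total = 1.0, 1.0
    for j in range(1, n):
        term *= x / j              # x^j / j!
        total += term
    return 1.0 - math.exp(-x) * total

def critical_k(n, alpha, tol=1e-12):
    """Solve erlang_cdf(n*k, n) = alpha for k in (0, 1) by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if erlang_cdf(n * mid, n) < alpha:
            lo = mid               # size still too small: k must be larger
        else:
            hi = mid
    return (lo + hi) / 2

k10 = critical_k(n=10, alpha=0.05)
k1 = critical_k(n=1, alpha=0.05)   # for n = 1 the exact answer is -ln(1 - alpha)
print(k10, k1)
```

With k in hand, the test rejects H 0 when θ0 x̄ < k .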
Uses of LRT
λ can be used to determine the rejection region, since λ is positive and, in the problems below, a function of a standard test statistic. The LRT is used for:
1) Test for the mean of a normal population
   H 0 : µ = µ0 against H1 : µ = µ1 , where x1 , … , xn ~ N ( µ , σ² ) , both for σ² known and for σ² unknown.
2) Test for the equality of means of two normal populations
   H 0 : µ1 = µ2 against H1 : µ1 ≠ µ2 , both for equal and for unequal variances.
3) Test for the equality of means of several normal populations
   H 0 : µ1 = µ2 = … = µk against H1 : the µi 's are not all equal
4) Test for the variance of a normal population
   H 0 : σ² = σ0² against H1 : σ² ≠ σ0² , where σ0² is specified
5) Test for the equality of variances of two normal populations
   H 0 : σ1² = σ2² against H1 : σ1² ≠ σ2²

Situation of Using LRT
The Neyman-Pearson lemma, based on the ratio of two probability density functions, provides a best test for testing a simple hypothesis against a simple alternative. In general the best test depends on the form of the probability distribution and on the alternative hypothesis. For such situations a general method of test construction, known as the LRT, is recommended for simple or composite hypotheses.
Consistent Test and LRT

A test of a hypothesis H 0 against a class of alternatives H1 is said to be consistent if, when any member of H1 holds, the probability of rejecting H 0 tends to 1 as the sample size tends to infinity. If c* is the CR and X the sample point, we may write
$$ \lim_{n \to \infty} P\left[ X \in c^* \mid H_1 \right] = 1 , \quad n \text{ being the sample size.} $$
The LRT is a consistent test. Under very generally satisfied conditions, the MLE θ̂ of a parameter vector θ is consistent. If we are dealing with a situation in which all the MLE’s are consistent, we see from the definition of the LRT statistic that, as the sample size increases,
$$ \lambda \to \frac{L(x \mid \theta_{r0}, \theta_s)}{L(x \mid \theta_r, \theta_s)} \qquad (1) $$
where θr , θs are the true values of the parameters and θr0 is the hypothetical value of θr being tested. Thus, when H 0 holds,
λ → 1 in probability
and the critical region
λ ≤ cα
will therefore have its boundary cα approaching 1 . When H 0 does not hold, the limiting value of λ in (1) will be some constant k satisfying
0 ≤ k < 1
and thus we have
P [ λ ≤ cα ] → 1
Therefore, the LRT is consistent.
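Show that under certain regularity conditions −2 ln λ ~ χ² with 1 d.f., where λ is the LR.
Let x1 , … , xn be a sample from N ( µ , σ² ) . Let the hypotheses be
H 0 : µ = µ0
H1 : µ ≠ µ0 where σ² is known.
Then, since the MLE of µ is µ̂ = x̄ ,
$$ \lambda = \frac{\left( \dfrac{1}{\sigma\sqrt{2\pi}} \right)^{n} \exp\left[ -\dfrac{1}{2\sigma^2} \sum (x_i - \mu_0)^2 \right]}{\left( \dfrac{1}{\sigma\sqrt{2\pi}} \right)^{n} \exp\left[ -\dfrac{1}{2\sigma^2} \sum (x_i - \bar{x})^2 \right]} = \exp\left[ -\frac{1}{2\sigma^2} \left\{ \sum (x_i - \mu_0)^2 - \sum (x_i - \bar{x})^2 \right\} \right] $$
$$ = \exp\left[ -\frac{1}{2\sigma^2} \left\{ n\bar{x}^2 - 2n\mu_0 \bar{x} + n\mu_0^2 \right\} \right] = \exp\left[ -\frac{n}{2\sigma^2} (\bar{x} - \mu_0)^2 \right] $$
$$ \Rightarrow\ \ln \lambda = -\frac{n}{2\sigma^2} (\bar{x} - \mu_0)^2 \;\Rightarrow\; -2 \ln \lambda = \frac{n (\bar{x} - \mu_0)^2}{\sigma^2} = \frac{(\bar{x} - \mu_0)^2}{\sigma^2 / n} $$
Under H 0 , $(\bar{x} - \mu_0)/(\sigma/\sqrt{n})$ is N ( 0, 1 ) , so its square $\dfrac{(\bar{x} - \mu_0)^2}{\sigma^2/n} \sim \chi^2_{(1)}$ (exactly in this normal case, and approximately for large n in general).
Thus, −2 ln λ ~ $\chi^2_{(1)}$ where λ is the LR.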
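The result can be illustrated by simulation: under H 0 , the statistic −2 ln λ = n ( x̄ − µ0 )² / σ² should exceed 3.841 (the upper 5% point of χ² with 1 d.f.) in about 5% of samples. This is only a sketch; the parameter values, the seed and the function name are arbitrary choices, not taken from the notes.

```python
import random

def minus_two_log_lambda(xs, mu0, sigma):
    """-2 ln(lambda) for H0: mu = mu0 in N(mu, sigma^2) with sigma known."""
    n = len(xs)
    xbar = sum(xs) / n
    return n * (xbar - mu0) ** 2 / sigma ** 2

random.seed(0)                        # fixed seed so the run is reproducible
mu0, sigma, n, reps = 2.0, 3.0, 25, 20000
crit = 3.841                          # upper 5% point of chi-square, 1 d.f.

rejections = 0
for _ in range(reps):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]   # data generated under H0
    if minus_two_log_lambda(sample, mu0, sigma) > crit:
        rejections += 1

rate = rejections / reps              # should be close to 0.05
print(rate)
```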
BCR or LRT is a function of sufficient statistic

Suppose we have
H 0 : θ = θ0
H1 : θ = θ1
By the LRT and the Neyman-Pearson lemma, the BCR is given by
$$ \frac{L(\theta_0)}{L(\theta_1)} \le k \qquad (1) \qquad \text{where } k \text{ is a positive number} $$
Let t be a sufficient statistic for θ ; then by the factorization criterion, we get
$$ L(\theta_0) = h(t \mid \theta_0)\, k(x) \quad \text{and} \quad L(\theta_1) = h(t \mid \theta_1)\, k(x) $$
where k ( x ) is a function of x alone and independent of θ . Thus from (1) we get
$$ \frac{L(\theta_0)}{L(\theta_1)} = \frac{h(t \mid \theta_0)\, k(x)}{h(t \mid \theta_1)\, k(x)} \le k \;\Rightarrow\; \frac{h(t \mid \theta_0)}{h(t \mid \theta_1)} \le k $$
Hence, the LRT or BCR is a function of the sufficient statistic.

Monotone Likelihood Ratio (MLR)

A joint p.d.f. L ( x | θ ) is said to have a monotone likelihood ratio (MLR) in the statistic T = t ( x ) if, for any two values of the parameter θ1 < θ2 , the ratio $\dfrac{L(x \mid \theta_2)}{L(x \mid \theta_1)}$ depends on x only through the function t ( x ) , and this ratio is a non-decreasing function of t ( x ) .
Example
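Let X ~ b ( m, θ ) ; then for a random sample x1 , … , xn we have
$$ L(x \mid \theta) = \left\{ \prod_{i=1}^{n} \binom{m}{x_i} \right\} \theta^{\sum x_i} (1 - \theta)^{mn - \sum x_i} $$
If θ2 > θ1 , then
$$ \frac{L(x \mid \theta_2)}{L(x \mid \theta_1)} = \frac{\theta_2^{\sum x_i} (1 - \theta_2)^{mn - \sum x_i}}{\theta_1^{\sum x_i} (1 - \theta_1)^{mn - \sum x_i}} = \left( \frac{\theta_2}{\theta_1} \right)^{\sum x_i} \left\{ \frac{1 - \theta_2}{1 - \theta_1} \right\}^{mn - \sum x_i} = \left( \frac{\theta_2 (1 - \theta_1)}{\theta_1 (1 - \theta_2)} \right)^{\sum x_i} \left\{ \frac{1 - \theta_2}{1 - \theta_1} \right\}^{mn} $$
is a non-decreasing function of ∑ xi , where ∑ xi = T = t ( x ) . Hence L ( x | θ ) has MLR in ∑ xi .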
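The monotonicity can be verified numerically by evaluating the ratio at every possible value of ∑ xi . In this sketch the values m = 5 , n = 4 , θ1 = 0.3 , θ2 = 0.6 and the function name are assumptions chosen for illustration.

```python
def binom_lr(total, m, n, theta1, theta2):
    """L(x|theta2)/L(x|theta1) as a function of total = sum x_i.

    The binomial coefficients cancel in the ratio, so only the total matters.
    """
    return ((theta2 / theta1) ** total
            * ((1 - theta2) / (1 - theta1)) ** (m * n - total))

m, n = 5, 4                               # b(m, theta), sample of size n
ratios = [binom_lr(t, m, n, 0.3, 0.6) for t in range(m * n + 1)]
# For theta2 > theta1 each step multiplies the ratio by
# theta2 (1 - theta1) / (theta1 (1 - theta2)) > 1.
is_nondecreasing = all(a <= b for a, b in zip(ratios, ratios[1:]))
print(is_nondecreasing)
```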
Uses
Distributions having an MLR provide UMP tests for testing a simple H 0 against a one-sided H1 .
Example
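Let X ~ exp ( θ ) ; then $f(x) = \dfrac{1}{\theta} e^{-x/\theta}$ ; θ > 0 , x > 0 .
We have
$$ L(x \mid \theta) = \frac{1}{\theta^n} e^{-\sum x_i / \theta} $$
$$ \frac{L(x \mid \theta_2)}{L(x \mid \theta_1)} = \frac{\dfrac{1}{\theta_2^n} \exp\left( -\dfrac{\sum x_i}{\theta_2} \right)}{\dfrac{1}{\theta_1^n} \exp\left( -\dfrac{\sum x_i}{\theta_1} \right)} = \left( \frac{\theta_1}{\theta_2} \right)^{n} \exp\left( \sum x_i \left\{ \frac{\theta_2 - \theta_1}{\theta_1 \theta_2} \right\} \right) $$
For θ2 > θ1 , $\dfrac{L(x \mid \theta_2)}{L(x \mid \theta_1)}$ is a non-decreasing function of ∑ xi , so that L ( x | θ ) has MLR in ∑ xi .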
Example
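Let X ~ N ( θ , 1 ) ; then we have
$$ f(x) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} (x - \theta)^2 \right) $$
We have
$$ L(x \mid \theta) = \left( \frac{1}{\sqrt{2\pi}} \right)^{n} \exp\left( -\frac{1}{2} \sum (x_i - \theta)^2 \right) $$
$$ \therefore\ \frac{L(x \mid \theta_2)}{L(x \mid \theta_1)} = \frac{\left( \dfrac{1}{\sqrt{2\pi}} \right)^{n} \exp\left( -\dfrac{1}{2} \sum (x_i - \theta_2)^2 \right)}{\left( \dfrac{1}{\sqrt{2\pi}} \right)^{n} \exp\left( -\dfrac{1}{2} \sum (x_i - \theta_1)^2 \right)} $$
$$ = \exp\left( -\frac{1}{2} \sum x_i^2 + \theta_2 \sum x_i - \frac{n}{2} \theta_2^2 + \frac{1}{2} \sum x_i^2 - \theta_1 \sum x_i + \frac{n}{2} \theta_1^2 \right) = \exp\left( (\theta_2 - \theta_1) \sum x_i - \frac{n}{2} \left( \theta_2^2 - \theta_1^2 \right) \right) $$
which, for θ2 > θ1 , is a non-decreasing function of ∑ xi . So L ( x | θ ) has MLR in ∑ xi .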
Example
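Let x1 , … , xn ~ U ( 0, θ ) , θ > 0 ; then we have
$$ f(x) = \frac{1}{\theta} , \quad 0 \le x \le \theta $$
The joint p.d.f. of x1 , … , xn is
$$ L(x \mid \theta) = \frac{1}{\theta^n} ; \quad 0 \le \max x_i \le \theta $$
Let θ2 > θ1 and consider the ratio
$$ \frac{L(x \mid \theta_2)}{L(x \mid \theta_1)} = \frac{\left( \dfrac{1}{\theta_2} \right)^{n} I(\max x_i \le \theta_2)}{\left( \dfrac{1}{\theta_1} \right)^{n} I(\max x_i \le \theta_1)} = \left( \frac{\theta_1}{\theta_2} \right)^{n} \frac{I(\max x_i \le \theta_2)}{I(\max x_i \le \theta_1)} $$
Let
$$ R(x) = \frac{I(\max x_i \le \theta_2)}{I(\max x_i \le \theta_1)} = \begin{cases} 1 ; & \max x_i \in [0, \theta_1] \\ \infty ; & \max x_i \in (\theta_1, \theta_2] \end{cases} $$
Define R ( x ) = ∞ if max xi > θ2 . It follows that $\dfrac{L(x \mid \theta_2)}{L(x \mid \theta_1)}$ is a non-decreasing function of $\max_{1 \le i \le n} x_i$ , and L ( x | θ ) has an MLR in $\max_{1 \le i \le n} x_i$ .
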

Theorem: The one-parameter exponential family
$$ L(x \mid \theta) = \exp\{ Q(\theta)\, T(x) + S(x) + D(\theta) \} $$
where Q ( θ ) is non-decreasing, has an MLR in T ( x ) .
Proof
For θ2 > θ1 , Q ( θ2 ) ≥ Q ( θ1 ) and thus
$$ \frac{L(x \mid \theta_2)}{L(x \mid \theta_1)} = \exp\left\{ T(x) \left[ Q(\theta_2) - Q(\theta_1) \right] + \left[ D(\theta_2) - D(\theta_1) \right] \right\} $$
which is a non-decreasing function of T ( x ) . Hence the exponential family has an MLR in T ( x ) .
Example
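Let X ~ c ( 1, θ ) (Cauchy with location θ ) ; then for a single observation we have
$$ \frac{L(x \mid \theta_2)}{L(x \mid \theta_1)} = \frac{1 + (x - \theta_1)^2}{1 + (x - \theta_2)^2} \to 1 \quad \text{as } x \to \pm\infty $$
The ratio is not constant in x , yet it tends to the same limit 1 at both ends, so it cannot be monotone in x .
So c ( 1, θ ) does not have an MLR.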
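A few numerical values make the failure of monotonicity concrete. The sketch below evaluates the Cauchy likelihood ratio for a single observation at several points; θ1 = 0 , θ2 = 1 and the evaluation points are arbitrary illustrative choices.

```python
def cauchy_lr(x, theta1, theta2):
    """L(x|theta2)/L(x|theta1) for one observation from Cauchy(1, theta)."""
    return (1 + (x - theta1) ** 2) / (1 + (x - theta2) ** 2)

theta1, theta2 = 0.0, 1.0
points = (-100.0, 0.0, 1.0, 2.0, 100.0)
values = [cauchy_lr(x, theta1, theta2) for x in points]
# The ratio starts near 1, dips below 1, rises above 1, then returns towards 1,
# so it is neither non-decreasing nor non-increasing in x.
for x, v in zip(points, values):
    print(x, round(v, 4))
```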
Theorem: If a joint p.d.f. L ( x | θ ) has MLR in the statistic T = t ( x ) then there exists a UMP test for testing H 0 : θ = θ0 against H1 : θ > θ0 .
Proof
We know that, for testing a simple H 0 : θ = θ0 against a simple H1 : θ = θ1 ( > θ0 ) , there exists a BCR ω0 such that
$$ \frac{L(x \mid H_1)}{L(x \mid H_0)} \ge \text{a constant inside } \omega_0 \qquad (1) $$
Since the likelihood ratio is a non-decreasing function of t ( x ) for θ1 > θ0 , the BCR determined by (1) is also given by
$$ t(x) \ge k_1 \quad \text{inside } \omega_0 \qquad (2) $$
Let the size and power function of this test be α and P ( θ ) respectively.
∴ P ( θ0 ) = α
The BCR for testing H 0 : θ = θ0 against another simple alternative H1 : θ = θ2 ( > θ0 ) is given by
$$ \frac{L(x \mid \theta_2)}{L(x \mid \theta_0)} \ge \text{a constant inside the C.R.} \;\Rightarrow\; t(x) \ge k_2 \qquad (3) $$
If we take k2 = k1 in ( 3 ) , the CR obtained is identical with ω0 defined in ( 2 ) and is still most powerful, of size α , for testing θ = θ0 against θ = θ2 ( > θ0 ) . More generally, for any θ′ < θ″ the same region is most powerful for testing θ = θ′ against θ = θ″ at size α′ = P ( θ′ ) ; since the power of a most powerful test exceeds its size, P ( θ″ ) > P ( θ′ ) .
Thus the power function P ( θ ) is strictly increasing for P ( θ ) < 1 .
Therefore, the critical region defined by equation ( 2 ) has size less than or equal to α for every θ ≤ θ0 . The power of the test for any alternative θ1 > θ0 is maximum, and this is so for all alternatives greater than θ0 .
Hence, the critical region given by ( 2 ) is UMP for testing θ = θ0 against θ > θ0 .

Example
Let X ~ N ( µ , 1 ) ; then we have
$$ L(x \mid \mu) = \left( \frac{1}{\sqrt{2\pi}} \right)^{n} \exp\left( -\frac{1}{2} \sum (x_i - \mu)^2 \right) $$
For µ1 > µ0 ,
$$ \frac{L(x \mid \mu_1)}{L(x \mid \mu_0)} = \exp\left( n\bar{x} (\mu_1 - \mu_0) - \frac{n}{2} \left( \mu_1^2 - \mu_0^2 \right) \right) $$
This is an increasing function of x̄ . So there exists a UMP test for testing H 0 : µ = µ0 against H1 : µ > µ0 .
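Theorem: Suppose that x1 , … , xn has joint p.d.f. of the form
$$ f(x ; \theta) = C(\theta)\, h(x) \exp\left[ q(\theta)\, l(x) \right] $$
where q ( θ ) is an increasing function of θ . Then there exists a UMP test, based on l ( x ) , for testing H 0 : θ = θ0 against the one-sided H1 : θ > θ0 .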
Example
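Consider a random sample of size n from a Poisson population with parameter µ ; then we have
$$ L(x \mid \mu) = \frac{e^{-n\mu} \mu^{\sum x_i}}{\prod x_i !} ; \quad x_i = 0, 1, \dots $$
$$ = e^{-n\mu} \left( \prod x_i ! \right)^{-1} \exp\left[ \ln \mu \sum x_i \right] $$
where
$$ q(\mu) = \ln \mu , \quad t(x) = \sum x_i , \quad C(\mu) = e^{-n\mu} , \quad h(x) = \left( \prod x_i ! \right)^{-1} $$
So there will exist a UMP test of size α for testing H 0 : µ = µ0 against H1 : µ > µ0 , with critical region T = ∑ xi ≥ k , where
$$ P\left[ T = \sum x_i \ge k \mid H_0 \right] = \alpha $$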
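Because T = ∑ xi is discrete (Poisson with mean n µ0 under H 0 ), the equation P [ T ≥ k | H 0 ] = α usually has no exact integer solution; a non-randomized test then takes the smallest k whose size does not exceed α . The sketch below computes that k . The values n = 10 , µ0 = 0.5 , α = 0.05 and the function names are assumptions for illustration.

```python
import math

def poisson_upper_tail(k, mean):
    """P(T >= k) for T ~ Poisson(mean), k a non-negative integer."""
    term, cdf = math.exp(-mean), 0.0
    for j in range(k):
        cdf += term                   # add P(T = j)
        term *= mean / (j + 1)        # move on to P(T = j + 1)
    return 1.0 - cdf

def critical_value(n, mu0, alpha):
    """Smallest integer k with P(sum X_i >= k | mu0) <= alpha."""
    k = 0
    while poisson_upper_tail(k, n * mu0) > alpha:
        k += 1
    return k

n, mu0, alpha = 10, 0.5, 0.05
k = critical_value(n, mu0, alpha)
size = poisson_upper_tail(k, n * mu0)            # attained size, at most alpha
size_prev = poisson_upper_tail(k - 1, n * mu0)   # size if k were lowered by one
print(k, size, size_prev)
```

The gap between the attained size and α is what a randomized test on the boundary T = k − 1 would make up.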
Randomized Test
A test γ of a hypothesis H is defined to be a randomized test if γ is defined by the function
$$ \psi_\gamma(x_1, \dots, x_n) = P\left[ H \text{ is rejected} \mid (x_1, \dots, x_n) \text{ is observed} \right] $$
The function ψγ ( ⋅, … , ⋅ ) is called the critical function of the test γ .
For example, let x1 , … , xn be a random sample from $f(x ; \theta) = \phi_{\theta, 25}(x)$ , the normal density with mean θ and variance 25 . Consider H : θ ≤ 17 . If the test γ of H is: toss a coin and reject H iff a head appears, then γ is a randomized test.
Non-Randomized Test
Let a test γ of a statistical hypothesis H be defined as follows:
Reject H if and only if ( x1 , … , xn ) ∈ cr , where cr is a subset of the sample space χ ; then γ is called a non-randomized test and cr is called the critical region of the test γ .

For example, let x1 , … , xn be a random sample from $f(x ; \theta) = \phi_{\theta, 25}(x)$ , where χ is the Euclidean n -space. Consider H : θ ≤ 17 and the test γ : Reject H if and only if $\bar{x} > 17 + \dfrac{5}{\sqrt{n}}$ ; then γ is non-randomized and
$$ c_r = \left\{ (x_1, \dots, x_n) : \bar{x} > 17 + \frac{5}{\sqrt{n}} \right\} $$
Theorem: Let x1 , … , xn be a random sample of size n from a p.d.f. f ( x ; θ ) which depends continuously on a single parameter θ belonging to a parametric space Ω , i.e. θ ∈ Ω . Let the likelihood function L ( x | θ ) have MLR in T ( x ) = t ( x1 , … , xn ) . Then for testing H 0 : θ = θ0 against H1 : θ > θ0 there exists a UMP test φ ( x1 , … , xn ) of size α given by
$$ \varphi(x_1, \dots, x_n) = \begin{cases} 1 ; & \text{if } T(x_1, \dots, x_n) > k \\ \gamma ; & \text{if } T(x_1, \dots, x_n) = k \\ 0 ; & \text{if } T(x_1, \dots, x_n) < k \end{cases} $$
where γ and k are non-negative constants satisfying $E_{\theta_0}\left[ \varphi(x_1, \dots, x_n) \right] = \alpha$ .
Proof
Since L ( x | θ ) has MLR in T ( x1 , … , xn ) , for any θ1 > θ0 and a constant k ,
$$ \frac{L(x \mid \theta_1)}{L(x \mid \theta_0)} \ \begin{cases} > k \\ = k \\ < k \end{cases} \qquad (1) $$
is equivalent to
$$ T(x_1, \dots, x_n) \ \begin{cases} > c \\ = c \\ < c \end{cases} \quad \text{for some constant } c . $$
Hence by the Neyman-Pearson lemma there exists a test
$$ \varphi(x_1, \dots, x_n) = \begin{cases} 1 ; & \text{if } T(x_1, \dots, x_n) > c \\ \gamma ; & \text{if } T(x_1, \dots, x_n) = c \\ 0 ; & \text{if } T(x_1, \dots, x_n) < c \end{cases} $$
which is most powerful of size α for testing H 0 : θ = θ0 against any simple alternative θ = θ1 , provided θ1 > θ0 .
Furthermore, for any pair ( θ′ , θ″ ) with θ′ < θ″ , the test φ ( x1 , … , xn ) of the same form is most powerful for testing a simple H 0 : θ = θ′ against a simple H1 : θ = θ″ at its own size. Since the form of the test does not depend on the particular alternative θ1 , it is most powerful against every θ1 > θ0 . Therefore, there exists a UMP test for testing θ = θ0 against θ > θ0 .
Example
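Let x1 , … , xn be a random sample of size n drawn from N ( 0, σ² ) . Find the UMP test of size α for H 0 : σ² = σ0² against H1 : σ² > σ0² , where σ0² is specified.
Solution
We have
$$ L(x \mid \sigma^2) = \left[ \frac{1}{\sqrt{2\pi}\,\sigma} \right]^{n} \exp\left( -\frac{1}{2} \sum \frac{x_i^2}{\sigma^2} \right) $$
Now,
$$ \frac{L(x \mid \sigma_2^2)}{L(x \mid \sigma_1^2)} = \frac{\left( \dfrac{1}{\sigma_2^2} \right)^{n/2} \exp\left( -\dfrac{1}{2} \sum \dfrac{x_i^2}{\sigma_2^2} \right)}{\left( \dfrac{1}{\sigma_1^2} \right)^{n/2} \exp\left( -\dfrac{1}{2} \sum \dfrac{x_i^2}{\sigma_1^2} \right)} = \left( \frac{\sigma_1^2}{\sigma_2^2} \right)^{n/2} \exp\left( \frac{1}{2} \sum x_i^2 \left[ \frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2} \right] \right) $$
which is a non-decreasing function of ∑ xi² for σ2² > σ1² . So L ( x | σ² ) has an MLR in ∑ xi² , and the UMP test is
$$ \varphi(x_1, \dots, x_n) = \begin{cases} 1 ; & \text{if } \sum x_i^2 > c \\ \gamma ; & \text{if } \sum x_i^2 = c \\ 0 ; & \text{if } \sum x_i^2 < c \end{cases} $$
where γ and c are constants.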
Now we have to find the value of c . We have
$$ P\left[ \sum x_i^2 \ge c \mid H_0 \right] = \alpha \;\Rightarrow\; P\left[ \frac{\sum x_i^2}{\sigma_0^2} \ge \frac{c}{\sigma_0^2} \;\middle|\; H_0 \right] = \alpha \;\Rightarrow\; P\left[ \chi^2_{(n)} \ge \frac{c}{\sigma_0^2} \;\middle|\; H_0 \right] = \alpha $$
Thus c / σ0² may be read from the χ² table with n d.f. and c determined. If ∑ xi² ≥ c , then H 0 is rejected at the significance level α ; otherwise H 0 is accepted.
If we take n = 10 , α = 0.05 , σ0² = 2 , then from the χ² table we have
$$ \frac{c}{\sigma_0^2} = 18.307 \;\Rightarrow\; c = 18.307 \times 2 = 36.614 $$
If ∑ xi² ≥ 36.614 then we reject H 0 : σ² = 2 ; otherwise we accept H 0 .
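The tabulated value can be checked without a table. For even degrees of freedom 2h the upper tail of χ² has the closed form P( χ² > x ) = e^{−x/2} Σ_{j<h} ( x/2 )^j / j! , which the sketch below evaluates at x = 18.307 with 10 d.f. The function name is hypothetical; the closed form applies to even d.f. only.

```python
import math

def chi2_upper_tail_even(x, df):
    """P(chi-square with df d.f. > x), valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = df // 2
    term, total = 1.0, 1.0
    for j in range(1, half):
        term *= (x / 2) / j           # (x/2)^j / j!
        total += term
    return math.exp(-x / 2) * total

n, sigma0_sq = 10, 2.0
tail = chi2_upper_tail_even(18.307, n)   # should be very close to 0.05
c = 18.307 * sigma0_sq                   # critical value for sum of x_i^2
print(tail, c)
```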
Theorem: Let f ( x | θ ) be a continuous density function of a random variable x . If the likelihood function L ( x ; θ ) of n independent observations is differentiable with respect to θ under the sign of integration, and the derivative L′ ( x ; θ ) of L ( x ; θ ) with regard to θ is everywhere continuous in θ and does not vanish identically in the sample space, then for testing a simple H 0 : θ = θ0 against the family of alternatives θ ≠ θ0 there does not exist a UMP test for both negative and positive values of ( θ − θ0 ) .
Proof
Let H1 : θ = θ1 be a simple alternative. The likelihood functions under H 0 and H1 are L ( x | H 0 ) and L ( x | H1 ) respectively. Then, expanding L ( x | H1 ) about θ = θ0 by Taylor series,
$$ L(x \mid \theta_1) = L(x \mid \theta_0) + (\theta_1 - \theta_0)\, L'(x \mid \theta') \qquad (1) $$
where θ′ is some value of θ between θ0 and θ1 .

Let us assume that there exists a UMP test for testing H 0 : θ = θ0 against H1 : θ ≠ θ0 . According to the Neyman-Pearson lemma, the BCR for testing H 0 : θ = θ0 against H1 : θ = θ1 is
$$ \frac{L(x \mid \theta_1)}{L(x \mid \theta_0)} \ge k(\theta_1) \quad \text{within the C.R.} \qquad (2) $$
Here k depends on α and the sample size. With these fixed, we regard k as depending on θ1 only. Now from (1) we have
$$ \frac{L(x \mid \theta_1)}{L(x \mid \theta_0)} = 1 + (\theta_1 - \theta_0) \frac{L'(x \mid \theta')}{L(x \mid \theta_0)} $$
$$ \Rightarrow\ 1 + (\theta_1 - \theta_0) \frac{L'(x \mid \theta')}{L(x \mid \theta_0)} \ge k(\theta_1) \quad [\text{by } (2)] \qquad (3) $$
When θ1 = θ0 , we can write k ( θ0 ) = 1 . Therefore, we can expand k ( θ1 ) about θ0 , again by Taylor’s series:
$$ k(\theta_1) = 1 + (\theta_1 - \theta_0)\, k'(\theta'') \qquad (4) \qquad \text{where } \theta'' \text{ lies between } \theta_0 \text{ and } \theta_1 $$
Using ( 3 ) and ( 4 ) we have
$$ 1 + (\theta_1 - \theta_0) \frac{L'(x \mid \theta')}{L(x \mid \theta_0)} \ge 1 + (\theta_1 - \theta_0)\, k'(\theta'') \;\Rightarrow\; (\theta_1 - \theta_0) \left[ \frac{L'(x \mid \theta')}{L(x \mid \theta_0)} - k'(\theta'') \right] \ge 0 \qquad (5) $$
If x denotes a point on the boundary of the BCR defined by equation ( 2 ) , then
$$ \frac{L(x \mid \theta_1)}{L(x \mid \theta_0)} = k(\theta_1) $$
so that
$$ \frac{L'(x \mid \theta_1)}{L(x \mid \theta_0)} = k'(\theta_1) \quad [\text{by differentiating w.r. to } \theta_1] $$
Similarly
$$ \frac{L'(x \mid \theta'')}{L(x \mid \theta_0)} = k'(\theta'') $$
Substituting the value of k′ ( θ″ ) in ( 5 ) we have
$$ (\theta_1 - \theta_0) \left[ \frac{L'(x \mid \theta')}{L(x \mid \theta_0)} - \frac{L'(x \mid \theta'')}{L(x \mid \theta_0)} \right] \ge 0 \qquad (6) $$
For the C.R. to be UMP, ( 6 ) must hold good for all θ1 . Therefore, ( 6 ) must be true identically for all values of θ1 and for all x within the BCR.
Since ( θ1 − θ0 ) can assume both positive and negative values, and ( 6 ) must hold good for all these values, the expression
$$ \left[ \frac{L'(x \mid \theta')}{L(x \mid \theta_0)} - \frac{L'(x \mid \theta'')}{L(x \mid \theta_0)} \right] $$
must vanish within the BCR.
The outside of the BCR in equation ( 2 ) is defined by
$$ \frac{L(x \mid H_1)}{L(x \mid H_0)} < k(\theta_1) \qquad (7) $$
With the help of the same arguments which lead from ( 2 ) to ( 6 ) , we see from ( 7 ) that ( 6 ) with the inequality sign reversed is true for both positive and negative values of ( θ1 − θ0 ) outside the BCR, and hence the same expression is zero outside the BCR also.
Thus
$$ \frac{L'(x \mid \theta')}{L(x \mid \theta_0)} - \frac{L'(x \mid \theta'')}{L(x \mid \theta_0)} = 0 $$
throughout the sample space, that is
$$ \frac{L'(x \mid \theta')}{L(x \mid \theta_0)} = \frac{L'(x \mid \theta'')}{L(x \mid \theta_0)} $$
Therefore, since L′ ( x | θ ) is continuous in θ , letting θ1 → θ0 we get that
$$ \frac{L'(x \mid \theta_0)}{L(x \mid \theta_0)} = \left. \frac{\partial \ln L(x ; \theta)}{\partial \theta} \right|_{\theta = \theta_0} $$
is a constant, and this is the essential condition for the existence of a UMP test for the two-sided alternatives. We have
$$ \int_S L(x \mid \theta)\, dx = 1 $$
Since differentiation under the sign of integration is valid, this leads to
$$ \int_S \left. \frac{\partial \ln L(x ; \theta)}{\partial \theta} \right|_{\theta = \theta_0} L(x \mid \theta_0)\, dx = 0 $$
Since the first factor of the integrand is a constant, that constant must be zero, i.e.
$$ \frac{L'(x \mid \theta_0)}{L(x \mid \theta_0)} = 0 \;\Rightarrow\; L'(x \mid \theta_0) = 0 $$
identically in S (the sample space), and this is a contradiction. Hence the theorem.
Example
Let us consider $f(x \mid \theta) = e^{-(x - \theta)}$ , x ≥ θ , for testing H 0 : θ = θ0 against a two-sided alternative.
Here
$$ L(x \mid \theta) = e^{-\sum (x_i - \theta)} \;\Rightarrow\; \ln L(x \mid \theta) = -\sum (x_i - \theta) \;\Rightarrow\; \frac{\partial \ln L(x \mid \theta)}{\partial \theta} = n $$
Here n is a constant, so there exists a UMP test.

In what follows the parameter is written as µ , with H 0 : µ = µ0 and H1 : µ = µ1 , and x1 denotes the smallest observation. Since the lower point of the range depends on the parameter µ , the smallest observation x1 in the sample is sufficient for µ . Under H1 the probability that x1 < µ1 is zero. Thus, for µ1 > µ0 ,
$$ \frac{L(x \mid H_0)}{L(x \mid H_1)} = \begin{cases} \infty ; & x_1 < \mu_1 \\ e^{n(\mu_0 - \mu_1)} ; & \text{otherwise} \end{cases} $$
That is,
$$ e^{n(\mu_0 - \mu_1)} \le k \qquad (1) $$
determines the BCR, where k is so chosen as to make its size equal to α . The left hand side of (1) is a constant and is therefore independent of the observations.
Hence (1) will be satisfied by every C.R. of size α with x1 ≥ µ1 . Thus every such C.R. is of equal power and is therefore a BCR.
If we permit µ1 to take values greater or less than µ0 ,
$$ \frac{L(x \mid H_0)}{L(x \mid H_1)} = \begin{cases} \infty ; & \mu_0 \le x_1 < \mu_1 \\ e^{n(\mu_0 - \mu_1)} < 1 ; & x_1 \ge \mu_1 > \mu_0 \\ e^{n(\mu_0 - \mu_1)} > 1 ; & x_1 \ge \mu_0 > \mu_1 \\ 0 ; & \mu_1 \le x_1 < \mu_0 \end{cases} $$
The BCR is therefore given by
( x1 − µ0 ) < 0 , ( x1 − µ0 ) > c1
When H 0 holds, the probability that ( x1 − µ0 ) < 0 is zero, and the value of c1 is so chosen as to satisfy the condition
$$ P\{ (x_1 - \mu_0) > c_1 \mid H_0 \} = \alpha $$
This C.R. is a BCR for all alternatives µ1 ≠ µ0 and is therefore UMP with respect to these alternatives.
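The constant c1 can be written down explicitly. Under H 0 each xi − µ0 is a standard exponential variable, so the smallest observation satisfies P { ( x1 − µ0 ) > c } = e^{−nc} , and the size condition gives c1 = −( ln α ) / n . The sketch below computes c1 and checks the size by simulation; the values of n , α , µ0 , the seed and the function name are illustrative assumptions.

```python
import math
import random

def c1_for_size(n, alpha):
    """Solve exp(-n * c1) = alpha for c1."""
    return -math.log(alpha) / n

random.seed(1)                         # fixed seed so the run is reproducible
n, alpha, mu0 = 8, 0.05, 3.0
c1 = c1_for_size(n, alpha)

reps = 20000
hits = 0
for _ in range(reps):
    # Sample of size n under H0: x_i = mu0 + standard exponential.
    smallest = min(mu0 + random.expovariate(1.0) for _ in range(n))
    if smallest - mu0 > c1:
        hits += 1

rate = hits / reps                     # should be close to alpha
print(c1, rate)
```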
Example
Examine whether a UMP test exists for testing H 0 : θ = θ 0 , σ = σ 0 in N θ , σ 2 . ( )
Solution
If H1 : θ = θ1 , σ = σ 1 is any simple alternative then
L ( x | H1 ) ⎡
1 ⎧⎪ ∑ ( xi − θ1 ) − ∑ ( xi − θ0 ) ⎫⎪⎤
n 2 2
⎛σ ⎞
= ⎜ 0 ⎟ exp ⎢ − ⎨ ⎬⎥ ≥ k
L ( x | H0 ) ⎝ σ1 ⎠ ⎢ 2⎪ σ 12 σ 02 ⎥
⎣ ⎩ ⎭⎪⎦

This may be written as
⎞ ( x − θ1 ) ( x − θ0 ) 2 ⎡ 1 ⎛ σ1 ⎞ ⎤
n
⎛ 1
2 2
1
S2 ⎜ 2 − 2 ⎟⎟ + − ≤ ln ⎢ ⎜ ⎟ ⎥
⎜σ n ⎢ k ⎝ σ0 ⎠ ⎥
⎝ 1 σ0 ⎠ σ12 σ 02 ⎣ ⎦
⎛ 1 1 ⎞ ⎛ 1 1 ⎞ ⎛θ θ ⎞
⇒ S 2 ⎜ 2 − 2 ⎟ + x 2 ⎜ 2 − 2 ⎟ + 2 x ⎜ 02 − 12 ⎟ ≤ constant
⎜σ ⎟ ⎜ ⎟ ⎜ ⎟
⎝ 1 σ0 ⎠ ⎝ σ1 σ 0 ⎠ ⎝ σ 0 σ1 ⎠
( )
⎛ 1 1 ⎞ 2 ⎡ θ0σ12 − θ1σ 02 ⎤
⎜⎜ 2 − 2 ⎟⎟ S + { x − δ } ≤ constant ⎢δ =
2
⇒ ⎥
⎝ σ1 σ 0 ⎠ ⎣⎢ σ 02σ12 ⎦⎥
⇒ (σ 2
0 − σ12 )∑(x −δ ) i
2
≤ constant
This means that if σ0 > σ1, the BCR is bounded by a hypersphere centered at (δ, …, δ), where δ itself depends on H1. When σ1 > σ0, the BCR lies outside this sphere. In both cases the BCR changes with the alternative.
Therefore, there does not exist any UMP test for any set of alternatives.
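The dependence of the sphere on the alternative can be checked numerically. The sketch below is only an illustration under the assumptions of this example: it takes the centre as δ = (θ0σ1² − θ1σ0²)/(σ1² − σ0²), obtained by completing the square, and verifies that the exponent of the likelihood ratio equals a multiple of Σ(xi − δ)² plus a constant. The helper names `sphere_centre`, `quadratic_form` and `sphere_form` are hypothetical.

```python
import random

def sphere_centre(theta0, sigma0, theta1, sigma1):
    """Centre delta of the boundary sphere; requires sigma0 != sigma1."""
    v0, v1 = sigma0 ** 2, sigma1 ** 2
    return (theta0 * v1 - theta1 * v0) / (v1 - v0)

def quadratic_form(xs, theta0, sigma0, theta1, sigma1):
    """sum (x_i - theta1)^2 / sigma1^2 - sum (x_i - theta0)^2 / sigma0^2."""
    q1 = sum((x - theta1) ** 2 for x in xs) / sigma1 ** 2
    q0 = sum((x - theta0) ** 2 for x in xs) / sigma0 ** 2
    return q1 - q0

def sphere_form(xs, theta0, sigma0, theta1, sigma1):
    """The same quantity rewritten as a * sum (x_i - delta)^2 + constant."""
    v0, v1 = sigma0 ** 2, sigma1 ** 2
    a = 1 / v1 - 1 / v0
    delta = sphere_centre(theta0, sigma0, theta1, sigma1)
    const = len(xs) * (theta1 ** 2 / v1 - theta0 ** 2 / v0 - a * delta ** 2)
    return a * sum((x - delta) ** 2 for x in xs) + const

if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(8)]
    # H0: theta0 = 0, sigma0 = 1; the centre moves with each alternative.
    for theta1, sigma1 in [(1.0, 2.0), (1.0, 0.5), (-2.0, 3.0)]:
        delta = sphere_centre(0.0, 1.0, theta1, sigma1)
        gap = (quadratic_form(sample, 0.0, 1.0, theta1, sigma1)
               - sphere_form(sample, 0.0, 1.0, theta1, sigma1))
        print(theta1, sigma1, delta, gap)
```

Each alternative gives a different centre δ, which is the reason no single critical region is best against all of them.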
Example
Examine for what values of λ there exists a UMP test for H0: µ = µ0, λ = λ0 in the distribution
\[ f(x; \mu, \lambda) = \frac{1}{\lambda} e^{-\frac{1}{\lambda}(x - \mu)}; \qquad \mu \le x < \infty \]
Solution
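If H1: µ = µ1, λ = λ1 is any simple hypothesis and if \bar{x} = \frac{1}{n}\sum x_i, then
\[ \frac{L(x \mid H_1)}{L(x \mid H_0)} = \left(\frac{\lambda_0}{\lambda_1}\right)^{n} \exp\left[n\left\{\frac{\bar{x} - \mu_0}{\lambda_0} - \frac{\bar{x} - \mu_1}{\lambda_1}\right\}\right] \]
\[ = \left(\frac{\lambda_0}{\lambda_1}\right)^{n} \exp\left[n\bar{x}\left\{\frac{1}{\lambda_0} - \frac{1}{\lambda_1}\right\} + n\left(\frac{\mu_1}{\lambda_1} - \frac{\mu_0}{\lambda_0}\right)\right] \]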
The BCR for the above H0 and H1 is defined by the relation
\[ \bar{x}\left\{\frac{1}{\lambda_0} - \frac{1}{\lambda_1}\right\} \ge \text{constant} \]
Thus, UMP tests exist separately for λ1 > λ0 and λ1 < λ0 irrespective of the value of µ1 .
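As a rough check of this conclusion, the sketch below compares the closed-form log likelihood ratio with a direct computation from the density, and reports which tail of x̄ the BCR uses. It assumes every observation is at least max(µ0, µ1), so that both likelihoods are positive; the function names are illustrative and not part of the notes.

```python
import math

def log_likelihood(xs, mu, lam):
    """Log-likelihood of f(x; mu, lam) = (1/lam) exp(-(x - mu)/lam), x >= mu."""
    if min(xs) < mu:
        return -math.inf  # the sample is impossible under this hypothesis
    return -len(xs) * math.log(lam) - sum(x - mu for x in xs) / lam

def log_ratio_closed_form(xs, mu0, lam0, mu1, lam1):
    """ln[L(x|H1)/L(x|H0)] written in terms of the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return (n * math.log(lam0 / lam1)
            + n * xbar * (1 / lam0 - 1 / lam1)
            + n * (mu1 / lam1 - mu0 / lam0))

def rejection_direction(lam0, lam1):
    """Which tail of xbar makes the likelihood ratio large."""
    coeff = 1 / lam0 - 1 / lam1
    if coeff > 0:
        return "large xbar"   # lam1 > lam0
    if coeff < 0:
        return "small xbar"   # lam1 < lam0
    return "xbar uninformative"

if __name__ == "__main__":
    xs = [2.5, 3.0, 4.2]
    direct = log_likelihood(xs, 1.5, 3.0) - log_likelihood(xs, 1.0, 2.0)
    print(direct, log_ratio_closed_form(xs, 1.0, 2.0, 1.5, 3.0))
    print(rejection_direction(2.0, 3.0), rejection_direction(2.0, 1.0))
```

The direction of rejection depends only on the sign of 1/λ0 − 1/λ1, which is why the two cases λ1 > λ0 and λ1 < λ0 are treated separately.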
