ESTIMATION OF PARAMETERS

Rajender Parsad

I.A.S.R.I., Library Avenue, New Delhi-110 012, India

1. Introduction

Statistics is a science which deals with the collection, presentation, analysis and interpretation of data. The procedures involved can be classified into two broad categories, viz.:

1. Descriptive Statistics: It deals with the methods of collecting, presenting and describing a set of data so as to yield meaningful information.

2. Statistical Inference: It comprises those methods which are concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data. It is also known as the art of evaluating information to draw reliable inferences about the true value of the phenomenon under study.

The main purposes of statistical inference are:

1. to estimate the parameters (the quantities that represent a particular characteristic of a population) on the basis of sample observations through a statistic (a function of sample values which does not involve any parameter) - Theory of Estimation;

2. to compare these parameters among themselves on the basis of observations and their estimates - Testing of Hypothesis.

To distinguish clearly between theory of estimation and testing of hypothesis, let us

consider the following examples:

Example 1.1: A storage bin contains 1,00,000 seeds. These seeds after germination will

produce either red or white coloured flowers. We want to know the percentage of seeds

that will produce red coloured flowers. This is the problem of estimation.

Example 1.2: It is known that the chir pine trees can yield an average of 4 kg of resin per

blaze per season. On some trees the healed up channels were treated with a chemical.

We want to know whether the chemical treatment of healed up channels enhances the

yields of resin. This is the problem of testing of hypothesis.

In many cases like the above, it may not be possible to determine or test the value of a population parameter by analysing the entire set of population values. The process of determining the value of the parameter may destroy the population units, or it may simply be too expensive in money and/or time to analyse all the units. Therefore, for making statistical inferences, the experimenter has to have one or more samples of observations from one or more variables. These observations are required to satisfy certain

assumptions, viz., the observations should belong to a population having some specified probability distribution and they should be independent. For example, in case of Example 1.1, it will not be natural to compute the percentage of seeds after germination of all the seeds, as we may not be willing to use all the seeds at one time, or there may be a lack of resources for maintaining the whole bulk at a time. In Example 1.2, it will not be possible to apply the chemical treatment to the healed up channels of all the chir pine trees, and hence we have to test our hypothesis on the basis of a sample of chir pine trees. More specifically, statistical inference is the process of selecting and using a sample statistic (a function of sample observations) to draw inferences about population parameter(s) (a function of population values). Before proceeding further, it will not be out of place to describe the meaning of parameter and statistic.

Parameter: Any value describing a characteristic of the population is called a parameter. For example, consider the following set of data representing the number of errors made by a secretary on 10 different pages of a document: 1, 0, 1, 2, 3, 1, 1, 4, 0 and 2. Let us assume that the document contains exactly 10 pages, so that the data constitute a small finite population. A quick study of this population leads to a number of conclusions. For instance, we could make the statement that the largest number of typing errors on any single page was 4, or we might say that the arithmetic mean of the 10 numbers is 1.5. The 4 and 1.5 are descriptive properties of the population and are called parameters. Customarily, the parameters are represented by Greek letters. Therefore, the population mean of the typing errors is μ = 1.5. It may be noted that a parameter is a constant value describing the population.

Statistic: Any numerical value describing a characteristic of a sample is called a statistic. For example, let us suppose that the data representing the number of typing errors constitute a sample obtained by counting the number of errors on 10 pages randomly selected from a large manuscript. Clearly, the population is now a much larger set of data about which we only have partial information provided by the sample. The numbers 4 and 1.5 are now descriptive measures of the sample and are called statistics. A statistic is usually represented by ordinary letters of the English alphabet. If the statistic happens to be the sample mean, we denote it by x̄. For our random sample of typing errors we have x̄ = 1.5. Since many random samples are possible from the same population, we would expect the statistic to vary from sample to sample.

Now coming back to the problem of statistical inference: let x1, x2, ..., xn be a random sample from a population which is distributed in a form which is completely known except that it contains some unknown parameters, so that the probability density function (pdf) or probability mass function (pmf) of the population is given by f(X, θ). In this situation the distribution is not known completely until we know the values of the unknown parameters. For simplicity let us take the case of a single unknown parameter. The unknown parameter has some admissible values, which lie on the real line in the case of a single parameter, in a plane for two parameters, in three-dimensional space in the case of three parameters, and so on. The set of all possible values of the parameter(s) is called the parametric space and is denoted by Θ. If Θ is the parameter space, then the set {f(X, θ); θ ∈ Θ} is called the family of pdfs of X if X is continuous, and the family of pmfs of X if X is discrete. To be clearer, let us consider the following examples.


Example 1.3: Let X ~ B(n, p) and p be unknown. Then Θ = {p : 0 < p < 1}, and {B(n, p) : 0 < p < 1} is the family of pmfs of X.

Example 1.4: Let X ~ N(μ, σ²). If both μ and σ² are unknown, then Θ = {(μ, σ²) : −∞ < μ < ∞, σ² > 0}, and if μ = μ0, say, and σ² is unknown, then Θ = {(μ0, σ²) : σ² > 0}.

On the basis of a random sample x1, x2, ..., xn from a population, our aim is to estimate the unknown parameter θ. Henceforth, we shall discuss only the theory of estimation.

The estimation of unknown population parameter(s) through sample values can be done

in two ways:

1. Point Estimation

2. Interval Estimation

In the first case we are required to determine a single number which can be taken as the value of θ, whereas in the second we are required to determine an interval (a, b) in which the unknown parameter θ is expected to lie. For example, if the population is normal, then a possible point estimate of the population mean is the sample mean, and a possible interval estimate of the mean is (x̄ − 3s, x̄ + 3s), where x̄ = (Σ xi)/n is the sample mean and s² = Σ (xi − x̄)²/(n − 1) is the sample variance.

An estimator is a rule or formula telling us how a function of the sample observations gives the estimate of the population parameter. On the other hand, an estimate means the numerical value of the estimator for a given sample. Thus, an estimator is a random variable calculated from the sample data that supplies either interval estimates or point estimates for population parameters. The distinction between an estimator and an estimate is the same as that between a function f regarded as defined for a range of a variable X and the particular value which the function assumes, say f(a), for a specified value X = a. For instance, if the sample mean x̄ is used to estimate a population mean (μ), and the sample mean is 15, the estimator used is the sample mean whereas the estimate is 15. Thus the statistic which is used to estimate a parameter is an estimator, whereas the numerical value of the estimator is called an estimate.

2. Point Estimation

A point estimator is a random variable calculated from the sample data that

supplies the point estimates for population parameters.


Let x1, x2, ..., xn be a random sample from a population with pdf or pmf f(x, θ), θ ∈ Θ. In general there can be several alternative procedures that can be adopted to obtain the point estimate of the population parameter. For instance, we may compute the arithmetic mean or the median or the geometric mean to estimate the population mean. As we never know the true value of the parameter, it does not make sense to ask whether our estimate is exactly correct. Therefore, if there is more than one estimator, the question arises which among them is better. This means that we must stipulate some criteria which can be applied to decide whether one estimator is better than another: although an estimator is not expected to estimate the population parameter without error, we do not expect it to be very far off. Following are the criteria which should be satisfied by a good estimator:

1. Unbiasedness

2. Consistency

3. Efficiency

4. Sufficiency

Unbiasedness: An estimator (T) is said to be unbiased if the expected value of the estimator is equal to the population parameter (θ) being estimated. In other words, if the same estimator is used repeatedly for all possible samples and we average these values, we would expect the average to be the same as the true value of the parameter. For instance, if the sample mean (x̄) is an unbiased estimator for the population mean (μ), then the expected value of the sample mean must equal the population mean; in symbols, E(x̄) = μ. Similarly, the variance of the sample observations s² = Σ (xi − x̄)²/(n − 1) is an unbiased estimate of the population variance (σ²) because E(s²) = σ².

Steps to check whether a given estimator is unbiased or not

1. Draw all possible samples of a given size from the population

2. Calculate the value of given estimator for all these samples separately.

3. Take the average of all these values obtained in Step 2.

If this average is equal to the population parameter, then the estimator is unbiased and if

this average is more than the population parameter, then the estimator is said to be

positively biased and when this average is less than the population parameter, it is said to

be negatively biased.

Consistency: An estimator (T) is said to be a consistent estimator of the parameter (θ) if, as the sample size n is increased, T converges to θ in probability. This is also an intuitively appealing characteristic for an estimator to possess, for it says that as the sample size increases (which should mean, in most reasonable circumstances, that more information becomes available), the estimate becomes better in the sense indicated.


Steps to check whether a given estimator is consistent or not

1. Show that T is an unbiased estimator of θ.
2. Obtain the variance of T, i.e., the variance of all the values of T obtained from all possible samples of a particular size.
3. Increase the sample size and repeat the above two steps.
4. If the variance of T decreases as the sample size (n) increases and approaches zero as n becomes infinitely large, then T is said to be a consistent estimator of θ.
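The steps above can be sketched numerically. The following is an illustrative Monte Carlo check in Python (the function name and the N(10, 1) population are my own choices, not from the text): the empirical variance of the sample mean shrinks roughly like σ²/n as n grows.

```python
import random
import statistics

def empirical_var_of_mean(n, reps=2000, mu=10.0, sigma=1.0, seed=42):
    """Estimate Var(x̄) by drawing `reps` samples of size n from N(mu, sigma²)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    return statistics.pvariance(means)

# Step 3: repeat steps 1-2 for increasing sample sizes.
vars_by_n = {n: empirical_var_of_mean(n) for n in (5, 50, 500)}
# Step 4: Var(x̄) = sigma²/n shrinks toward zero, so x̄ is consistent for mu.
assert vars_by_n[5] > vars_by_n[50] > vars_by_n[500]
```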

Efficiency: It is sometimes possible to find more than one estimator which is unbiased and consistent for a population parameter. For instance, in the case of a normal population N(μ, σ²), the sample mean (x̄) and the sample median (xmd) are both unbiased and consistent estimators for the population mean (μ). However, it can easily be seen that

Var(xmd) = πσ²/(2n) > Var(x̄) = σ²/n, since π/2 > 1.

The variance of a random variable measures the variability of the random variable about its expected value. Hence, it intuitively appeals that an unbiased estimator with smaller variance is preferable to an unbiased estimator with larger variance. Therefore, in the above example, the sample mean is preferable to the sample median. Thus there is a necessity of some further criterion which will enable us to choose the best estimator. Such a criterion, which is based on the concept of variance, is known as efficiency.
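The efficiency comparison between the sample mean and the sample median can be checked by simulation. The sketch below (the function name, sample size and replication count are illustrative) estimates both sampling variances for N(0, 1) data; their ratio should sit near the asymptotic value π/2 ≈ 1.571.

```python
import random
import statistics

def sampling_variances(n=101, reps=3000, seed=7):
    """Variance of the sample mean and sample median over repeated N(0,1) samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)

var_mean, var_median = sampling_variances()
assert var_median > var_mean          # the mean is the more efficient estimator
assert 1.2 < var_median / var_mean < 2.0   # close to pi/2 ≈ 1.571 for large n
```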

Minimum Variance Unbiased Estimator (MVUE): An estimator (T) is said to be an MVUE for a population parameter θ if T is unbiased and has the smallest variance among all the unbiased estimators of θ. It is also called the most efficient estimator of θ. The ratio of the variance of an MVUE to the variance of a given unbiased estimator is termed the efficiency of the given unbiased estimator. Results such as the Cramer-Rao inequality, the Rao-Blackwell theorem and the Lehmann-Scheffe theorem are used for finding minimum variance unbiased estimators.

Best Linear Unbiased Estimator: An estimator (T) is said to be the Best Linear Unbiased Estimator (BLUE) of θ if

1. T is unbiased for θ,
2. T is a linear function of the sample observations, and
3. T has the minimum variance among all unbiased estimators of θ which are linear functions of the sample observations.

Example 2.1: It is claimed that a particular chemical, when applied to some ornamental plants, will increase the height of the plants rapidly in a period of one week. The increases in heights of 5 plants to which this chemical was applied are given below:


Plant:                    1    2    3    4    5
Increase in Height (cm):  5    7    8    9    6

Assuming that the distribution of increases in heights is normal, consider all possible samples of size 3 drawn with replacement and show that the sample mean (x̄) and sample median (xmd) are unbiased estimators for the population mean.

Solution:

Step 1: Obtain the population mean.

    μ = (5 + 7 + 8 + 9 + 6)/5 = 35/5 = 7 cm.

Step 2: Draw all possible samples of size three with replacement and obtain their sample means and sample medians. The resulting sampling distributions are:

Sample Mean    Frequency
15/3               1
16/3               3
17/3               6
18/3              10
19/3              15
20/3              18
21/3              19
22/3              18
23/3              15
24/3              10
25/3               6
26/3               3
27/3               1

Sample Median    Frequency
5                   13
6                   31
7                   37
8                   31
9                   13

Step 3: Obtain the mean of the sample means.

    Mean of the sample means = 7 cm; variance of the sample means = 0.667 cm².

Step 4: Obtain the mean of the sample medians.

    Mean of the sample medians = 7 cm; variance of the sample medians = 1.328 cm². (The population median is also 7 cm.)

Therefore, we can see that both the sample mean and sample median are unbiased

estimators for population mean in case of normal population.
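Steps 2-4 can be verified by brute force, since there are only 5³ = 125 equally likely samples. The following is a small Python check written for this note, not part of the original solution:

```python
from itertools import product
from statistics import fmean, median, pvariance

population = [5, 7, 8, 9, 6]                  # increases in height (cm)
mu = fmean(population)                        # population mean = 7

# Step 2: all 5³ = 125 equally likely samples of size 3, drawn with replacement.
samples = list(product(population, repeat=3))
sample_means = [fmean(s) for s in samples]
sample_medians = [median(s) for s in samples]

# Steps 3-4: averaging each statistic over every possible sample gives 7,
# the population mean, so both estimators are unbiased.
assert abs(fmean(sample_means) - mu) < 1e-9
assert abs(fmean(sample_medians) - mu) < 1e-9
assert round(pvariance(sample_means), 3) == 0.667    # = sigma²/3 = 2/3
assert round(pvariance(sample_medians), 3) == 1.328  # larger: less efficient
```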

Sufficiency: An estimator (T) is said to be sufficient for the parameter , if it contains

all the information in the sample regarding the parameter. This criterion has a practical

importance in the sense that after the data is collected either from a sample survey or a

designed experiment, the job of a statistician is to draw some statistically valid


conclusions about the population under investigation. The raw data by themselves, besides being costly to store, are not suitable for this purpose. Therefore, the statistician would like to condense the data by computing some statistics from them and to base his analysis on these statistics, provided that there is no loss of information in doing so. In many problems of statistical inference a function of the observations contains as much information about the unknown parameter as do all the observed values. To make it clearer, let us consider the following example:

Example 2.2: Suppose you wish to play a coin tossing game against an adversary who supplies the coin. If the coin is fair and you win a dollar if you predict the outcome of a toss correctly and lose a dollar otherwise, then your net expected gain is zero. Since your adversary supplies the coin, you may want to check if the coin is fair before you start playing the game, i.e., to test H0: p = 0.5 against H1: p ≠ 0.5. You toss the coin n times; should you record the outcome of each trial, or is it enough to know the total number of heads in n tosses to test H0? Intuitively it seems clear that the number of heads in n trials contains all the information about the unknown parameter p, and precisely this is the information which we have used so far in the problems of inference concerning p. Writing xi = 1 if the ith toss results in a head and zero otherwise, and setting T = T(x1, ..., xn) = Σ xi, there is a substantial reduction in data collection and storage if we record the value T = t rather than the observation vector (x1, ..., xn), because t can take only n + 1 values whereas the vector can take 2ⁿ values. Therefore, whatever decision we make about H0 should depend on the value of t.

It can easily be seen that the trivial statistic T(x1, ..., xn) = (x1, ..., xn) is always sufficient but does not provide any reduction of the data. Hence, it is not preferable, as our aim is to condense the data while simultaneously retaining all the information about the parameter contained in the sample. A sufficient statistic which reduces the data most is called a minimal sufficient statistic.

One way to check whether a given statistic is sufficient or not is to verify that the conditional distribution of x1, ..., xn given T (the proposed sufficient statistic) is independent of the population parameter.
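For the coin example, this check can be carried out directly: given the total t, every toss sequence with that total has conditional probability 1/C(n, t), whatever p is. A small enumeration sketch (the function name is mine):

```python
from itertools import product
from math import comb, isclose

def conditional_probs(n, t, p):
    """P(sequence | T = t) for each 0/1 toss sequence of length n summing to t,
    when tosses are independent Bernoulli(p)."""
    seqs = [s for s in product((0, 1), repeat=n) if sum(s) == t]
    joint = [p ** sum(s) * (1 - p) ** (n - sum(s)) for s in seqs]
    total = sum(joint)        # = C(n, t) p^t (1-p)^(n-t) = P(T = t)
    return [q / total for q in joint]

# The conditional law is uniform, 1/C(n, t), whatever p is -> T is sufficient.
for p in (0.3, 0.5, 0.9):
    assert all(isclose(q, 1 / comb(5, 2)) for q in conditional_probs(5, 2, p))
```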

Until now, we have discussed several properties of good estimators, viz. unbiasedness, consistency, efficiency and sufficiency, that seem desirable in the context of point estimation. Thus, we would like to check whether a proposed estimator satisfies all or some of these criteria. However, if we are faced with a point estimation problem, the question arises where we can start to look for the estimator. Therefore, it is convenient to have one (or several) intuitively reasonable methods of generating possibly good estimators for our problem. The principal methods of obtaining point estimators are:


1. Method of moments
2. Method of minimum chi-square
3. Method of least squares
4. Method of maximum likelihood

The application of the above mentioned methods in particular cases leads to estimators which may differ and hence possess different attributes of goodness. The most important method of point estimation is the method of maximum likelihood, which provides estimators with desirable properties.

Method of Maximum Likelihood: To introduce the method of maximum likelihood, consider a very simple estimation problem. Suppose that an urn contains a number of black and a number of white balls, and suppose that it is known that the ratio of the numbers is 3:1, but that it is not known whether the black or the white balls are more numerous; i.e., the probability of drawing a black ball is either 1/4 or 3/4. Suppose n balls are drawn with replacement from the urn. The distribution of X, the number of black balls, is binomial with probability mass function

f(X, p) = C(n, X) p^X q^(n−X), for X = 0, 1, ..., n,

where q = 1 − p and p is the probability of drawing a black ball; here p = 1/4 or 3/4. We shall draw a sample of three balls, i.e., n = 3, with replacement and attempt to estimate the unknown parameter p of the distribution. The estimation problem is particularly simple in this case because we have only to choose between the two numbers 1/4 and 3/4. The possible outcomes of the sample and their probabilities are given below:


Outcome X:    0       1       2       3
f(X; 3/4):    1/64    9/64    27/64   27/64
f(X; 1/4):    27/64   27/64   9/64    1/64

In the present example, if we found that X = 0, the estimate 1/4 for p would be preferred over 3/4 because the probability 27/64 is greater than 1/64, i.e., because a sample with X = 0 is more likely (in the sense of having larger probability) to arise from a population with p = 1/4 than from one with p = 3/4. In general, we estimate p by 1/4 when X = 0 or 1 and by 3/4 when X = 2 or 3. The estimator may thus be defined as

p̂ = p̂(X) = 1/4, for X = 0, 1
          = 3/4, for X = 2, 3.

The estimator thus selects, for every possible value of X, the value of p, say p̂, such that f(X; p̂) > f(X; p′), where p′ is any other value of p, 0 < p′ < 1.
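The table and the resulting estimator can be reproduced with a few lines of Python (a sketch; the helper names are mine):

```python
from math import comb

def pmf(x, p, n=3):
    """Binomial pmf f(x; p) = C(n, x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def mle(x, candidates=(0.25, 0.75)):
    """The candidate value of p that maximises the likelihood of observing x."""
    return max(candidates, key=lambda p: pmf(x, p))

assert pmf(0, 0.75) == 1 / 64 and pmf(0, 0.25) == 27 / 64   # first table column
# p̂ = 1/4 for X = 0, 1 and p̂ = 3/4 for X = 2, 3, as in the text.
assert [mle(x) for x in range(4)] == [0.25, 0.25, 0.75, 0.75]
```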

Let us consider another experimental situation. A lion has turned man-eater. The lion has three possible states of activity each night: very active (denoted by θ = 1), moderately active (θ = 2) and lethargic (θ = 3). This lion eats i people with probability P(i | θ), θ ∈ Θ = {1, 2, 3}. The numerical values are given in the table below:

Lion's Appetite Distribution

i:         0     1     2     3     4
P(i | 1):  .00   .05   .05   .80   .10
P(i | 2):  .05   .05   .80   .10   .00
P(i | 3):  .90   .08   .02   .00   .00

Suppose we are told that X = x0 people were eaten last night and are asked to estimate the lion's activity state θ (1, 2 or 3). One seemingly reasonable method is to estimate θ as that value which provides the largest probability of observing what we did observe. It can easily be seen that θ̂(0) = 3, θ̂(1) = 3, θ̂(2) = 2, θ̂(3) = 1 and θ̂(4) = 1. Thus the maximum likelihood estimator (θ̂) of a population parameter θ is that value of θ which maximizes the likelihood function, i.e., the joint pdf/pmf of the sample observations taken as a function of θ.

MLE for the population mean: The MLE of the population mean μ, based on a random sample of size n, is the sample mean x̄, and if the variance of the population units Xi's is σ², then the variance of x̄ is σ²/n. Therefore, it can easily be seen that x̄ is an unbiased, consistent, sufficient and efficient estimator of μ.

MLE for a proportion: The MLE of the proportion p in a binomial experiment is given by p̂ = x/n, where x represents the number of successes in n trials. The variance of p̂ is p(1−p)/n and, since E(p̂) = p, the sample proportion p̂ = x/n is an unbiased, consistent and sufficient estimator of p.

MLE for the population variance: In the case of large samples from any population, or small samples from a normal population, the MLE of the population variance σ² when the population mean is unknown is given by

S² = Σ (xi − x̄)²/n,

where x1, ..., xn are the sample observations and x̄ is the sample mean. Here

E(S²) = (1 − 1/n) σ²,  and  Var(S²) = (1 − 1/n) (2σ⁴/n).

It can easily be seen that as n → ∞, S² is a consistent estimator of the population variance; it can also be proved that it is an asymptotically unbiased and asymptotically efficient estimator of the population variance. However, an exact unbiased estimator of the population variance is s² = Σ (xi − x̄)²/(n − 1). Therefore, it can be inferred that MLEs are not in general unbiased: in the above case, if we multiply S² by n/(n−1) we get s², an unbiased estimator of σ².
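The relation E(S²) = (1 − 1/n)σ² can be confirmed exactly by averaging S² over every possible with-replacement sample from a small population (an illustrative check; the population {2, 4, 6, 8} is my own choice):

```python
from itertools import product
from statistics import fmean, pvariance

population = [2, 4, 6, 8]
sigma2 = pvariance(population)        # population variance = 5
n = 3

def S2(sample):                        # MLE: divisor n (biased)
    m = fmean(sample)
    return sum((x - m) ** 2 for x in sample) / n

samples = list(product(population, repeat=n))   # every with-replacement sample
mean_S2 = fmean(S2(s) for s in samples)
assert abs(mean_S2 - (1 - 1 / n) * sigma2) < 1e-9   # E(S²) = (1 - 1/n) sigma²
# Multiplying by n/(n-1) removes the bias exactly, giving s²:
assert abs(mean_S2 * n / (n - 1) - sigma2) < 1e-9
```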

Point estimators, however, are not good estimators of the population parameters in the sense that even an MVUE is unlikely to estimate the population parameter exactly. It is true that our accuracy increases with large samples, but still there is no reason to expect a point estimate from a given sample to be exactly equal to the population parameter it is supposed to estimate. Point estimators also fail to throw light on how close we can expect such an estimate to be to the population parameter; that is, we cannot associate a probability statement with point estimators. Therefore, it is desirable to determine an interval within which we should expect to find the value of the parameter, with some probability statement associated with it. This is done through interval estimation.

3. Interval Estimation

An interval estimator is a formula that tells us how to use sample data to calculate an

interval that estimates a population parameter.

Let x1, x2, ..., xn be a sample from a population with pdf or pmf f(x, θ), θ ∈ Θ. Our aim is to find two estimators T1 = T1(x1, ..., xn) and T2 = T2(x1, ..., xn) such that

P{T1 ≤ θ ≤ T2} = 1 − α.

Then the interval (T1, T2) is called a 100(1−α)% confidence interval (CI) estimator, with 100(1−α)% as the confidence coefficient. The confidence coefficient is the probability that an interval estimator encloses the population parameter if the estimator is used repeatedly a large number of times. T1 and T2 are the lower and upper bounds of the CI, where for a particular application we substitute the appropriate numerical values for the confidence and the lower and upper bounds. The above statement reflects our confidence in the process rather than in the particular interval formed. We know that 100(1−α)% of the resulting intervals will contain the population parameter, but there is usually no way to determine whether a particular interval is one of those which contain the population parameter or one that does not. However, unlike point estimators, confidence intervals have some measure of reliability, the confidence coefficient, associated with them, and for that reason they are preferred to point estimators.

Thus, to obtain a 100(1−α)% confidence interval: if α = 0.05, we have a 95% confidence interval, and when α = 0.01 we obtain a wider 99% confidence interval. The wider the confidence interval is, the more confident we can be that the given interval contains the unknown parameter. Of course, it is better to be 95% confident that the average life of a machine is between 12 and 15 years than to be 99% confident that it is between 8 and 18 years. Ideally, we prefer a short interval with a high degree of confidence. Sometimes, restrictions on the size of our sample prevent us from achieving short intervals without sacrificing some of our degree of confidence.


Consider a sample selected from a normal population or, failing this, let n be sufficiently large. Let the population mean be μ and the population variance σ².

Confidence Interval for μ, σ known

If x̄ is the mean of a random sample of size n from a population with known variance σ², a 100(1−α)% confidence interval for μ is

x̄ − Zα/2 σ/√n < μ < x̄ + Zα/2 σ/√n.

The 100(1−α)% CI provides an estimate of the accuracy of our point estimate: if x̄ is used as an estimate of μ, we can be 100(1−α)% confident that the error will not exceed Zα/2 σ/√n.

Frequently, we wish to know how large a sample is necessary to ensure that the error in estimating the population mean will not exceed a specified amount e. By the above result, we must choose n so that Zα/2 σ/√n = e. If x̄ is used as an estimate of μ, we can be 100(1−α)% confident that the error will not exceed e above or e below (i.e., the width of the interval will not exceed W = 2e) when the sample size is

n = (Zα/2 σ/e)²  or, equivalently,  n = 4 Z²α/2 σ²/W².

When solving for the sample size n, all fractional values are rounded up to the next whole number.
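As a quick sketch of this sample-size rule (the function name and the numbers are illustrative, not from the text):

```python
import math

def sample_size(sigma, e, z=1.96):
    """Smallest n with z*sigma/sqrt(n) <= e, i.e. (z*sigma/e)² rounded up."""
    return math.ceil((z * sigma / e) ** 2)

# Hypothetical case: sigma = 15, tolerated error e = 2, 95% confidence (z = 1.96).
assert sample_size(15, 2) == 217          # (1.96*15/2)² = 216.09 rounds up to 217
```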

When the value of σ is unknown and the sample size is large, σ can be replaced by the sample standard deviation S, where S² = Σ (xi − x̄)²/n, and the above formulae can be used.

Example 3.1: Unoccupied seats on flights cause the airlines to lose revenue. Suppose a

large airline wants to estimate its average number of unoccupied seats per flight over the

past year. To accomplish this, the records of 225 flights are randomly selected and the

number of unoccupied seats is noted for each of the sample flights. The sample mean

and standard deviation are


x̄ = 11.6 seats

S = 4.1 seats

Estimate μ, the mean number of unoccupied seats per flight during the past year, using a

90% confidence interval.

Solution: For a 90% confidence interval, α = 0.10. The general form of the large-sample 90% confidence interval for a population mean is

x̄ ± Zα/2 S/√n = 11.6 ± 1.645 × 4.1/√225 = 11.6 ± 0.45 = (11.15, 12.05).

That is, the airline can be 90% confident that the mean number of unoccupied seats per flight was between 11.15 and 12.05 during the sampled year.

In this example, we are 90% confident that the sample mean x̄ differs from the true mean by no more than 0.45. If, in the above example, we want to know the sample size so that our estimate is not off by more than 0.05 seats, we set 0.05 = 1.645 × 4.1/√n, which implies

n = (1.645 × 4.1/0.05)² = 18195.31 ≈ 18196.

However, if we can tolerate an error of up to 1 seat, n = (1.645 × 4.1/1)² = 45.49 ≈ 46 is enough.
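The arithmetic of Example 3.1 can be checked in a few lines (a verification sketch, not part of the original solution):

```python
import math

xbar, s, n, z = 11.6, 4.1, 225, 1.645          # values from Example 3.1
half_width = z * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
assert round(half_width, 2) == 0.45
assert (round(lower, 2), round(upper, 2)) == (11.15, 12.05)

# Sample size so the error does not exceed 0.05 seats (round up):
n_needed = math.ceil((z * s / 0.05) ** 2)      # 18195.31 -> 18196
assert n_needed == 18196
```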

Exercise 1: The mean and standard deviation of the quality grade-point averages of a random sample of 36 college seniors are calculated to be 2.6 and 0.3, respectively. Obtain 95% and 99% confidence intervals for the mean grade-point average of the entire senior class. (Z0.025 = 1.96 and Z0.005 = 2.575.)

Small-sample confidence interval for μ, σ unknown

If x̄ and s are the mean and standard deviation of a random sample of size n < 30 from an approximately normal population with unknown variance σ², a 100(1−α)% confidence interval for μ is

x̄ − tα/2 s/√n < μ < x̄ + tα/2 s/√n,

where tα/2 is the t-value with n − 1 degrees of freedom leaving an area of α/2 to the right.

Estimating the difference between two population means

Confidence interval for μ1 − μ2, σ1² and σ2² known: If x̄1 and x̄2 are the means of independent random samples of sizes n1 and n2 from populations with known variances σ1² and σ2² respectively, a 100(1−α)% confidence interval for μ1 − μ2 is given by

(x̄1 − x̄2) − Zα/2 √(σ1²/n1 + σ2²/n2) < μ1 − μ2 < (x̄1 − x̄2) + Zα/2 √(σ1²/n1 + σ2²/n2).

The above CI for estimating the difference between two means is applicable if σ1² and σ2² are known or can be estimated from large samples. If the sample sizes n1 and n2 are small (< 30) and σ1² and σ2² are unknown, the above interval will not be reliable.

Small-sample confidence interval for μ1 − μ2; σ1² = σ2² = σ² unknown: If x̄1 and x̄2 are the means of small independent random samples of sizes n1 and n2 respectively, from approximately normal populations with unknown but equal variances, a 100(1−α)% CI for μ1 − μ2 is given by

(x̄1 − x̄2) − tα/2 sp √(1/n1 + 1/n2) < μ1 − μ2 < (x̄1 − x̄2) + tα/2 sp √(1/n1 + 1/n2),

where

sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

and tα/2 is the t-value with n1 + n2 − 2 degrees of freedom, leaving an area of α/2 to the right.
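A minimal sketch of the pooled-variance interval, assuming the t-value is looked up separately (the Python standard library has no t quantile function; the function name and the numbers in the check are mine):

```python
import math

def pooled_ci(x1, x2, s1_sq, s2_sq, n1, n2, t_half_alpha):
    """100(1-alpha)% CI for mu1 - mu2, assuming equal unknown variances.
    t_half_alpha is t(alpha/2) with n1 + n2 - 2 df, read from a t-table."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
    margin = t_half_alpha * math.sqrt(sp_sq) * math.sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

# Hypothetical numbers: two samples of size 5, both with variance 4,
# means 10 and 8; t(0.025) with 8 df is 2.306.
lo, hi = pooled_ci(10, 8, 4, 4, 5, 5, 2.306)
assert round(lo, 2) == -0.92 and round(hi, 2) == 4.92
```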

Small-sample confidence interval for μ1 − μ2; σ1² ≠ σ2² unknown: If x̄1 and s1², and x̄2 and s2², are the means and variances of small independent samples of sizes n1 and n2 respectively, from approximately normal distributions with unknown and unequal variances, an approximate 100(1−α)% confidence interval for μ1 − μ2 is given by

(x̄1 − x̄2) − tα/2 √(s1²/n1 + s2²/n2) < μ1 − μ2 < (x̄1 − x̄2) + tα/2 √(s1²/n1 + s2²/n2),

where tα/2 is the t-value with

ν = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]

degrees of freedom, leaving an area of α/2 to the right.
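The approximate degrees of freedom for this unequal-variance case (the Welch-Satterthwaite formula) can be computed as follows; a small sketch (the function name and the numbers in the check are mine; in practice the value is rounded down before consulting a t-table):

```python
def welch_df(s1_sq, n1, s2_sq, n2):
    """Approximate df for the unequal-variance CI (Welch-Satterthwaite)."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Hypothetical samples: s1² = 4 with n1 = 5, s2² = 16 with n2 = 10.
df = welch_df(4, 5, 16, 10)
assert round(df, 2) == 12.96 and df <= 5 + 10 - 2   # never exceeds pooled df
```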

Confidence interval for μD = μ1 − μ2 for paired observations

If d̄ and sd are the mean and standard deviation of the differences of n random pairs of measurements, a 100(1−α)% confidence interval for μD = μ1 − μ2 is

d̄ − tα/2 sd/√n < μD < d̄ + tα/2 sd/√n,

where tα/2 is the t-value with n − 1 degrees of freedom, leaving an area of α/2 to the right.

Example 3.2: A random sample of size 30 was taken from an apple orchard. The distribution of weights of the apples is given below:


Wt (gm):    125  150  175  200  225  250  275  300  325  350
Frequency:    1    4    3    5    4    7    4    1    1    0

Construct a 95% confidence interval for the population mean, i.e., the average weight of apples, if
i) the population variance is given to be 46.875 gm², or
ii) the population variance is unknown.

Solution:
i)
Step 1: Compute the sample mean $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = 220.833$.
Step 2: With $\alpha = 0.05$, read $Z_{\alpha/2} = Z_{0.025} = 1.96$ from the normal table.
Step 3: Obtain the interval as

$$\left(\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = (218.38,\; 223.28).$$

ii)
Step 1: Compute the sample variance $s^2 = \dfrac{1}{n-1}\sum_i f_i (x_i - \bar{x})^2 = 2503.592$.
Step 2: See the value of $t_{29}(0.025) = 2.045$.
Step 3: Obtain the confidence interval as

$$\left(\bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}},\; \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}\right) = (202.152,\; 239.512).$$
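The whole solution of Example 3.2 can be reproduced in a few lines of Python (1.96 and 2.045 are the table values used in the solution):

```python
import math

# frequency table from Example 3.2 (weights in grams)
wts   = [125, 150, 175, 200, 225, 250, 275, 300, 325, 350]
freqs = [1, 4, 3, 5, 4, 7, 4, 1, 1, 0]

n = sum(freqs)                               # 30
xbar = sum(f * x for f, x in zip(freqs, wts)) / n

# i) population variance known: z interval
sigma = math.sqrt(46.875)
z = 1.96                                     # Z_{0.025} from the normal table
ci_z = (xbar - z * sigma / math.sqrt(n), xbar + z * sigma / math.sqrt(n))

# ii) variance unknown: estimate s^2 from the grouped data, use t_{29}
s_sq = sum(f * (x - xbar) ** 2 for f, x in zip(freqs, wts)) / (n - 1)
t = 2.045                                    # t_{0.025} with 29 df
ci_t = (xbar - t * math.sqrt(s_sq) / math.sqrt(n),
        xbar + t * math.sqrt(s_sq) / math.sqrt(n))
```

Note how much wider the second interval is: when the variance must be estimated from the data, the large sample variance (2503.592) dominates, not the switch from z to t.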

Large-Sample Confidence Interval for p: If $\hat{p}$ is the proportion of successes in a random sample of size n, and $\hat{q} = 1 - \hat{p}$, an approximate $100(1-\alpha)\%$ confidence interval for the binomial parameter p is given by

$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \;<\; p \;<\; \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}.$$

The method for finding a confidence interval for the binomial parameter p is also applicable when the binomial distribution is being used to approximate the hypergeometric distribution, that is, when n is small relative to N, the population size.
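A sketch of this interval in Python; the counts (340 successes in 500 trials) and the 95% confidence level are illustrative:

```python
import math

def prop_ci(successes, n, z_half_alpha=1.96):
    """Approximate large-sample CI for a binomial proportion p."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    margin = z_half_alpha * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# illustrative data: 340 successes in 500 trials, 95% confidence
lo, hi = prop_ci(340, 500)
```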

Error in Estimating p: If $\hat{p}$ is used as an estimate of p, then we can be $100(1-\alpha)\%$ confident that the error will not exceed $Z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$.

Sample Size for Estimating p: If $\hat{p}$ is used as an estimate of p, then we can be $100(1-\alpha)\%$ confident that the error will not exceed a specified amount e above or below when the sample size is

$$n = \frac{Z_{\alpha/2}^2\,\hat{p}\hat{q}}{e^2}.$$

The above result is somewhat misleading in the sense that we must use $\hat{p}$ to determine the sample size n, but $\hat{p}$ is computed from the sample. If a crude estimate of p can be made without taking a sample, we could use this value for $\hat{p}$ and then determine n. Lacking such an estimate, we could take a preliminary sample of size $n \geq 30$ to provide an estimate of p. Then, using the above result regarding the sample size, we could determine approximately how many observations are needed to provide the desired degree of accuracy. Once again, all fractional values of n are rounded up to the next whole number.

Alternatively, an upper bound on n follows from the fact that $\hat{p}\hat{q} \leq 1/4$, with equality when $\hat{p} = 1/2$. Therefore, if we substitute $\hat{p} = 1/2$ into the formula for n, then whenever p actually differs from 1/2, n will turn out to be larger than necessary for the specified degree of confidence and, as a result, our degree of confidence will increase. Thus, if $\hat{p}$ is used as an estimate of p, we can be at least $100(1-\alpha)\%$ confident that the error will not exceed a specified amount e when the sample size is

$$n = \frac{Z_{\alpha/2}^2}{4e^2}.$$
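Both sample-size formulas can be wrapped in one small helper; the inputs e = 0.05 and the crude guess p ≈ 0.3 below are illustrative:

```python
import math

def sample_size_for_p(e, z_half_alpha=1.96, p_guess=None):
    """Sample size so the error in estimating p does not exceed e.

    With a crude estimate p_guess, use n = z^2 p q / e^2; without one,
    fall back to the conservative bound n = z^2 / (4 e^2), since pq <= 1/4.
    """
    if p_guess is None:
        n = z_half_alpha ** 2 / (4 * e ** 2)
    else:
        n = z_half_alpha ** 2 * p_guess * (1 - p_guess) / e ** 2
    return math.ceil(n)                      # round up to the next whole number

n_with_guess = sample_size_for_p(e=0.05, p_guess=0.3)   # crude estimate p ~ 0.3
n_conservative = sample_size_for_p(e=0.05)              # no estimate available
```

The conservative bound costs 385 − 323 = 62 extra observations here, the price of not having any prior idea of p.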

Large-Sample Confidence Interval for $p_1 - p_2$: If $\hat{p}_1$ and $\hat{p}_2$ are the proportions of successes in random samples of sizes $n_1$ and $n_2$, respectively, $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$, an approximate $100(1-\alpha)\%$ confidence interval for the difference of two binomial parameters, $p_1 - p_2$, is given by

$$(\hat{p}_1 - \hat{p}_2) - Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \;<\; p_1 - p_2 \;<\; (\hat{p}_1 - \hat{p}_2) + Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}},$$

where $Z_{\alpha/2}$ is the Z-value leaving an area of $\alpha/2$ to the right.
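This two-proportion interval is a one-line extension of the single-proportion sketch; the counts below (120/200 versus 240/500) are illustrative:

```python
import math

def two_prop_ci(x1, n1, x2, n2, z_half_alpha=1.96):
    """Approximate large-sample CI for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    # standard error of the difference: the two variances add
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_half_alpha * se, diff + z_half_alpha * se

# illustrative data: 120/200 successes vs 240/500, 95% confidence
lo, hi = two_prop_ci(120, 200, 240, 500)
```

Since the interval here lies entirely above zero, these (made-up) samples would suggest $p_1 > p_2$ at the 5% level.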

Confidence Interval for $\sigma^2$: If $s^2$ is the variance of a random sample of size n from a normal population, a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is given by

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \;<\; \sigma^2 \;<\; \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}},$$

where $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are $\chi^2$ values with n−1 degrees of freedom, leaving areas of $\alpha/2$ and $1-\alpha/2$, respectively, to the right. A $100(1-\alpha)\%$ confidence interval for $\sigma$ is obtained by taking the square root of each endpoint of the interval for $\sigma^2$.


Example 3.3: The following are the volumes, in deciliters, of 10 cans of peaches distributed by a certain company: 46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2 and 46.0. Find a 95% confidence interval for the variance of all such cans of peaches distributed by this company, assuming volume to be a normally distributed variable.

Solution:
Step 1: Find the sample variance $s^2 = 0.286$.
Step 2: To obtain a 95% confidence interval, we choose $\alpha = 0.05$. Then, with n−1 = 9 degrees of freedom, $\chi^2_{\alpha/2} = \chi^2_{0.025} = 19.023$ and $\chi^2_{1-\alpha/2} = \chi^2_{0.975} = 2.700$.
Step 3: Substituting in

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \;<\; \sigma^2 \;<\; \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}},$$

we obtain

$$\frac{(9)(0.286)}{19.023} \;<\; \sigma^2 \;<\; \frac{(9)(0.286)}{2.700},$$

or simply $0.135 < \sigma^2 < 0.953$.
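This example can be verified in Python using the χ² table values with 9 degrees of freedom (19.023 and 2.700):

```python
# volumes, in deciliters, of the 10 cans of peaches
vols = [46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2, 46.0]

n = len(vols)
xbar = sum(vols) / n
s_sq = sum((v - xbar) ** 2 for v in vols) / (n - 1)

# chi-square table values with n-1 = 9 df: chi2_{0.025} and chi2_{0.975}
chi2_upper_tail, chi2_lower_tail = 19.023, 2.700
ci = ((n - 1) * s_sq / chi2_upper_tail, (n - 1) * s_sq / chi2_lower_tail)
```

Working with the unrounded $s^2 = 0.2862$ gives an upper limit of about 0.954, slightly different from rounding $s^2$ to 0.286 before substituting (which gives 0.953).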

Confidence Interval for $\sigma_1^2/\sigma_2^2$: If $s_1^2$ and $s_2^2$ are the variances of independent samples of sizes $n_1$ and $n_2$, respectively, from normal populations, then a $100(1-\alpha)\%$ confidence interval for $\sigma_1^2/\sigma_2^2$ is

$$\frac{s_1^2}{s_2^2}\,\frac{1}{F_{\alpha/2}(v_1, v_2)} \;<\; \frac{\sigma_1^2}{\sigma_2^2} \;<\; \frac{s_1^2}{s_2^2}\, F_{\alpha/2}(v_2, v_1),$$

where $F_{\alpha/2}(v_1, v_2)$ is an F-value with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom leaving an area of $\alpha/2$ to the right, and $F_{\alpha/2}(v_2, v_1)$ is a similar F-value with $v_2 = n_2 - 1$ and $v_1 = n_1 - 1$ degrees of freedom.

Example 3.4: A standardized placement test in mathematics was given to 25 boys and 16 girls. The boys made an average grade of 82 with a standard deviation of 8, while the girls made an average grade of 78 with a standard deviation of 7. Find a 98% confidence interval for $\sigma_1^2/\sigma_2^2$ and $\sigma_1/\sigma_2$, where $\sigma_1^2$ and $\sigma_2^2$ are the variances of the populations of grades for all boys and girls, respectively, who at some time have taken or will take this test. Assume the populations to be normally distributed.

Solution: We have $n_1 = 25$, $n_2 = 16$, $s_1 = 8$ and $s_2 = 7$.


Step 1: For a 98% confidence interval, $\alpha = 0.02$, $F_{0.01}(24, 15) = 3.29$ and $F_{0.01}(15, 24) = 2.89$.
Step 2: Substituting these in the formula

$$\frac{s_1^2}{s_2^2}\,\frac{1}{F_{\alpha/2}(v_1, v_2)} \;<\; \frac{\sigma_1^2}{\sigma_2^2} \;<\; \frac{s_1^2}{s_2^2}\, F_{\alpha/2}(v_2, v_1),$$

we obtain the 98% confidence interval

$$\frac{64}{49}\cdot\frac{1}{3.29} \;<\; \frac{\sigma_1^2}{\sigma_2^2} \;<\; \frac{64}{49}\,(2.89),$$

which simplifies to $0.397 < \sigma_1^2/\sigma_2^2 < 3.775$.
Step 3: Taking square roots of the confidence limits, a 98% confidence interval for $\sigma_1/\sigma_2$ is $0.630 < \sigma_1/\sigma_2 < 1.943$.
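Example 3.4's intervals can likewise be checked in Python with the quoted F-table values:

```python
import math

# Example 3.4: 98% CI for sigma1^2/sigma2^2 and sigma1/sigma2
s1_sq, s2_sq = 8.0 ** 2, 7.0 ** 2        # boys' and girls' sample variances
f_v1_v2 = 3.29                            # F_{0.01}(24, 15) from an F table
f_v2_v1 = 2.89                            # F_{0.01}(15, 24)

ratio = s1_sq / s2_sq                     # 64/49
ci_var = (ratio / f_v1_v2, ratio * f_v2_v1)
ci_sd = tuple(math.sqrt(v) for v in ci_var)
```

Since the interval for the variance ratio contains 1, the data give no evidence at the 2% level that the boys' and girls' variances differ.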

References

Grewal, P.S. (1990). Methods of Statistical Analysis, Second Edition. Sterling Publishers Pvt. Ltd., New Delhi.
Meyer, P.L. (1970). Introductory Probability and Statistical Applications. Oxford & IBH Publishing Co., New Delhi.
Snedecor, G.W. and Cochran, W.G. (1967). Statistical Methods, Sixth Edition. Oxford & IBH Publishing Co., New Delhi.
Walpole, R.E. (1982). Introduction to Statistics, Third Edition. Macmillan Publishing Co., Inc., New York.
