
© Attribution Non-Commercial (BY-NC)


investigated the population and have full knowledge of its mean and standard deviation.

Population mean is denoted by µ and standard deviation of population is denoted by σ . The

measures µ and σ of populations are called parameters.

Sample Distribution: When we talk of a sample distribution, we take a sample from the population. The mean and standard deviation of the sample are denoted by $\bar{X}$ and $s$. These sample measures are called statistics. Note that several sample distributions are possible from a given population.

The expected value of $\bar{X}$ is the population mean, i.e., $E(\bar{X}) = \mu_{\bar{X}} = \mu$, and the standard deviation of $\bar{X}$ is given by

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}},$$

where $n$ is the sample size. The standard deviation of the mean is also known as the standard error of the mean.

In order to use the sample standard deviation $s$ as an estimate of $\sigma$, we have the following formula:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

and the standard error of the mean is given by

$$\sigma_{\bar{X}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n(n - 1)}}$$

When the sample size $n$ is not very small in comparison with the finite population size $N$, we use the following formula:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$

Exercise: The time between two arrivals in a queuing model is normally distributed with mean 2 minutes and standard deviation 0.25 minute. If a random sample of size 36 is drawn, what is the probability that the sample mean will be greater than 2.1 minutes?

Solution: $n = 36$; $\mu = 2$; $\sigma = 0.25$. The standard error of the mean is calculated as under:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{0.25}{\sqrt{36}} = 0.042.$$

The probability that the sample mean is greater than 2.1 is given by

$$P(\bar{X} \ge 2.1) = P\left(\frac{\bar{X} - 2}{0.042} \ge \frac{2.1 - 2}{0.042}\right) = P(Z \ge 2.38) = 1 - 0.9913 = 0.0087$$
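The calculation above can be checked numerically. The following is a minimal sketch using only Python's standard library; the function name `prob_sample_mean_exceeds` is an illustrative assumption, not part of the original notes. Because the code does not round the standard error to 0.042, it gives roughly 0.0082 rather than 0.0087.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_exceeds(mu, sigma, n, threshold):
    """P(sample mean > threshold) when the population is Normal(mu, sigma)."""
    se = sigma / sqrt(n)                 # standard error of the mean
    z = (threshold - mu) / se            # standardised threshold
    return 1.0 - NormalDist().cdf(z)     # upper-tail probability

# Queuing exercise: mu = 2 minutes, sigma = 0.25 minute, n = 36.
p = prob_sample_mean_exceeds(mu=2.0, sigma=0.25, n=36, threshold=2.1)
print(round(p, 4))
```

The small gap between the two figures comes only from rounding the standard error before dividing; the conclusion is unchanged.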

Exercise: The weight of certain type of car tire is normally distributed with mean of 25

pounds and variance of 3 pounds. A random sample of 50 tires is selected. What is the

probability that the mean of this sample lies between 24.5 and 25.5 pounds?

Exercise: An auditor takes a sample of size 36 from a population of 1000 accounts receivable.

The standard deviation of the population is unknown, but the standard deviation of the

sample is Rs 43. If the true mean value of the accounts receivable is Rs 260, what is the

probability that the sample mean will be less than or equal to Rs 250?

In most of the research studies, population parameters are unknown and have to be estimated

from a sample. As such the methods of estimating parameters assume an important role in

statistical analysis.

The estimate of a population parameter may be a single value or a range of values. If the estimate is a single value, it is referred to as a point estimate, whereas a range of values is termed an interval estimate.

(i) An estimator should, on average, be equal to the value of the parameter being estimated. (Property of Unbiasedness)

(ii) An estimator should have relatively small variance. (Property of Efficiency)

(iii) An estimator should use as much as possible the information available from

sample (Property of Sufficiency)

(iv) An estimator should approach the value of parameter as the sample size becomes

larger and larger. (Property of Consistency)

The interval estimator for the mean µ is given by an interval around $\bar{X}$, constructed for a chosen degree of confidence with the help of the standard error (SE).

For example, the 95% confidence interval for the population mean is given by the lower limit $\bar{X} - 1.96\,SE$ and the upper limit $\bar{X} + 1.96\,SE$. In other words, the probability that this interval contains µ is 0.95, or

$$P[\bar{X} - 1.96\,SE \le \mu \le \bar{X} + 1.96\,SE] = 0.95$$

In the above, 1.96 is the z-variate of the standard normal distribution for the confidence level of 95% (or the significance level of 5%).

If the sample size is small, i.e., less than 30, we use the t-variate with n − 1 degrees of freedom for the estimation.

Exercise: From a random sample of 36 civil service personnel, the mean age and sample

standard deviation were found to be 40 years and 4.5 years respectively. Construct a 95%

confidence interval for the mean age of civil servants. Also construct a 96% confidence

interval for the mean age of civil servants.

Solution: Here $n = 36$, $\bar{X} = 40$ and $s = 4.5$. The population is not finite, and the sample size may be considered large. The standard error of the mean is given by

$$SE = \sigma_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{4.5}{\sqrt{36}} = 0.75$$

The standard normal variate for 95% confidence is 1.96. Thus the 95% confidence interval for the population mean is given by the limits $\bar{X} \pm 1.96\,SE$:

$$\bar{X} \pm 1.96\,SE = 40 \pm (1.96)(0.75) = 40 \pm 1.47$$

Therefore the 95% confidence interval for the population mean is [38.53, 41.47]. In other words, $P(38.53 \le \mu \le 41.47) = 0.95$.

The standard normal variate for 96% confidence is 2.054. Thus the 96% confidence interval for the population mean is given by the limits $\bar{X} \pm 2.054\,SE$:

$$\bar{X} \pm 2.054\,SE = 40 \pm (2.054)(0.75) = 40 \pm 1.54$$

Therefore the 96% confidence interval for the population mean is [38.46, 41.54]. In other words, $P(38.46 \le \mu \le 41.54) = 0.96$.
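A large-sample interval of this kind can be sketched in a few lines of Python. The helper name `z_confidence_interval` is an assumption made for illustration; the critical value is taken from the standard normal distribution via `statistics.NormalDist`, which gives about 1.960 for 95% and about 2.054 for 96% confidence.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, confidence):
    """Large-sample confidence interval: mean ± z * sd / sqrt(n)."""
    se = sd / sqrt(n)                                   # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)    # two-sided critical value
    return mean - z * se, mean + z * se

# Civil service exercise: n = 36, mean age 40, sample standard deviation 4.5.
for level in (0.95, 0.96):
    low, high = z_confidence_interval(40, 4.5, 36, level)
    print(f"{level:.0%}: [{low:.2f}, {high:.2f}]")
```

For samples smaller than 30 the same structure applies, but the critical value would come from a t-table with n − 1 degrees of freedom instead of `inv_cdf`.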

Exercise: In a random selection of 64 of the 2400 intersections in a small city, the mean number of scooter accidents per year was 3.2 and the sample standard deviation was 0.8.

(i) Make an estimate of the standard deviation of the population from the sample standard deviation.

(ii) Work out the standard error of the mean for this finite population.

(iii) If the desired confidence level is 0.90, what will be the upper and lower limits of the confidence interval for the mean number of accidents per year?

Exercise: A random sample of 16 values from normal population showed a mean of 41.5

inches and the sum of squares of deviation from this mean is 135 square inches. Obtain 95%

and 99% confidence limit for the same.

Exercise: The foreman of ABC mining company has estimated the average quantity of iron

ore extracted to be 36.8 tons per shift and the sample standard deviation to be 2.8 tons per

shift, based upon a random selection of four shifts. Construct a 90% confidence interval

around the estimate.

The size of the sample should be determined by a researcher keeping the following points in view:

(i) Nature of Universe: If the items of the universe are homogeneous, a small sample

can serve the purpose. But if the items are heterogeneous, a large sample would be

required. Technically, this can be termed as dispersion factor.

(ii) Number of classes proposed: If many class groups are to be formed, a large

sample would be required because a small sample may not be able to give

reasonable number of items in each class-group.

(iii) Nature of Study: If items are to be intensively and continuously studied, the

sample should be small. For a general survey the size of the sample should be

large, but small sample is considered appropriate in technical survey.

(iv) Type of sampling: Sampling technique plays an important part in determining the

size of the sample. A small random sample is apt to be much superior to a larger

but badly selected sample.

(v) Standard of accuracy and acceptable confidence level: If the standard of accuracy

or the level of precision is to be kept high, we shall require relatively larger

sample. For doubling the accuracy for fixed significance level, the sample size has

to be increased fourfold.

(vi) Availability of finance: In practice, the size of the sample depends upon the amount of money available for the study. This factor should be kept in view while determining the size of the sample, since larger samples increase the cost of sampling estimates.

(vii) Other considerations: The nature of units, size of population, size of questionnaire, availability of trained investigators, the conditions under which the sample is being conducted and the time available for completion of the study are a few other considerations to which a researcher must pay attention while selecting the size of the sample.

Note that the limits of the confidence interval for the population mean are given by

$$\bar{X} \pm z \cdot SE = \bar{X} \pm z \frac{\sigma}{\sqrt{n}},$$

where $\bar{X}$ is the sample mean,

$z$ is the value of the standard variate at the given confidence level,

$n$ is the sample size, and

$\sigma$ is the standard deviation of the population.

If the researcher wishes to estimate the population mean within a desired precision $\pm e$, then we get

$$e = z \frac{\sigma}{\sqrt{n}} \quad \text{and therefore} \quad n = \frac{z^2 \sigma^2}{e^2}.$$

In case of a finite population, we get

$$e = z \cdot SE = z \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \quad \text{and therefore} \quad n = \frac{z^2 \sigma^2 N}{(N - 1)e^2 + z^2 \sigma^2}$$

Many a time the standard deviation of the population is not known and the sample has not yet been taken. A rough estimate of the population standard deviation is then given by

$$\hat{\sigma} = \frac{\text{Range of population distribution}}{6}$$

The range may have to be obtained from past records or through a pilot survey of a large number of items.

Exercise: If the acceptable error in estimating the population mean is within 3 units of the sample mean with 95% confidence, estimate the sample size when the standard deviation of the population is known and equals 4.8.

Solution: Here $e = 3$, $z = 1.96$ (for the 95% confidence level) and $\sigma = 4.8$. The sample size for 95% confidence and a precision of 3 units is given by

$$n = \frac{z^2 \sigma^2}{e^2} = \frac{(1.96)^2 (4.8)^2}{(3)^2} = 9.834 \cong 10$$

Therefore the sample size for estimating the population mean within 3 units with 95% confidence is 10.
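The two sample-size formulas for a mean (infinite and finite population) can be sketched as one small function. The name `sample_size_mean` and the convention that `N=None` means an infinite population are assumptions made for this illustration.

```python
from math import ceil

def sample_size_mean(z, sigma, e, N=None):
    """Sample size needed to estimate a mean within ±e.

    N=None is treated as an infinite population: n = z^2 sigma^2 / e^2.
    Otherwise n = z^2 sigma^2 N / ((N - 1) e^2 + z^2 sigma^2).
    """
    k = (z * sigma) ** 2
    if N is None:
        return k / e ** 2
    return k * N / ((N - 1) * e ** 2 + k)

# Worked exercise: e = 3, z = 1.96, sigma = 4.8.
n = sample_size_mean(z=1.96, sigma=4.8, e=3)
print(round(n, 3), ceil(n))   # fractional answer, then rounded up to a whole unit
```

Rounding up with `ceil` is the usual choice, since a fractional sample cannot be drawn and rounding down would miss the required precision. Passing a population size, for example `N=5000`, applies the finite-population version and always gives a value no larger than the infinite-population one.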

Exercise: A cigarette manufacturer wishes to use a random sample to estimate the average

nicotine content. The error should not be more than 1 milligram above or below the true

mean, with 99% confidence coefficient. The population standard deviation is 4 milligrams.

What sample size should the company use in order to satisfy the requirement?

Exercise: Determine the size of the sample for estimating the true weight of the 5000 cereal

container on the basis of following information:

The variance of weight is 4 ounces on the basis of past records.

Estimate should be within 0.8 ounces of the true average weight with 99% probability.

Will there be change in the size of sample if we assume infinite population in the given case?

If so, explain by how much?

If we are to find the sample size for estimating a proportion of population, our reasoning

remains similar to what we have said in the context of population mean. It is required to

specify the precision and the confidence level and then estimate the sample size as under:

Note that the standard error of a proportion is given by

$$SE = \sigma_p = \sqrt{\frac{pq}{n}} \quad \text{(in case of infinite population)}$$

$$SE = \sigma_p = \sqrt{\frac{pq}{n}} \sqrt{\frac{N - n}{N - 1}} \quad \text{(in case of finite population of size } N\text{)}$$

where $p$ is the sample proportion, $q = 1 - p$, $z$ is the standard variate for the appropriate confidence level and $n$ is the sample size.

Further, the confidence interval for the population proportion is given by

$$p \pm z \cdot SE$$

If $e$ is the precision rate (the acceptable error), then the sample size can be expressed as

$$n = \frac{z^2 pq}{e^2} \quad \text{(in case of infinite population)}$$

$$n = \frac{z^2 pqN}{e^2 (N - 1) + z^2 pq} \quad \text{(in case of finite population)}$$

Exercise: What should be the sample size if a simple random sample from a population of 4000 items is to be drawn to estimate the percentage of defectives within 2% of the true value with 95.5% probability? What should be the size of the sample if the population is assumed to be infinite in the given case? (From the pilot study, it has been observed that the proportion of defective items is about 2%.)

Solution:

In this case $N = 4000$, $z = 2.005$, $p = 0.02$ and $e = 0.02$.

$$n = \frac{z^2 pqN}{e^2 (N - 1) + z^2 pq} = \frac{(2.005)^2 (0.02)(0.98)(4000)}{(0.02)^2 (4000 - 1) + (2.005)^2 (0.02)(0.98)} = 187.78 \cong 188$$

Therefore the sample size is estimated to be 188 for the sample proportion to be within the 2% limit with 95.5% confidence.

If the population is assumed to be infinite,

$$n = \frac{z^2 pq}{e^2} = \frac{(2.005)^2 (0.02)(0.98)}{(0.02)^2} = 196.98 \cong 197$$
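The proportion case follows the same pattern as the mean. Below is a short sketch; `sample_size_proportion` is a hypothetical helper name, and `N=None` is again assumed to stand for an infinite population.

```python
def sample_size_proportion(z, p, e, N=None):
    """Sample size needed to estimate a proportion p within ±e."""
    q = 1.0 - p
    k = z * z * p * q
    if N is None:                          # infinite population
        return k / e ** 2
    return k * N / (e ** 2 * (N - 1) + k)  # finite population of size N

# Defectives exercise: N = 4000, z = 2.005, p = 0.02, e = 0.02.
finite = sample_size_proportion(2.005, 0.02, 0.02, N=4000)
infinite = sample_size_proportion(2.005, 0.02, 0.02)
print(round(finite, 2), round(infinite, 2))
```

When no pilot estimate of p is available, using p = 0.5 maximizes pq and therefore gives the most conservative (largest) sample size.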

the hotel’s guests who stay for more than 3 days. The reservation manager wants to be 95%

confident that the percentage has been estimated within 3% of the true value. What is the

most conservative sample size needed for the problem.

Exercise: Suppose the following ten values represent random observation from a normal

parent population:

2, 6, 7, 9, 5, 1, 0, 3, 5, 4.

Construct a 99 percent confidence interval for the mean of the parent population.

Exercise: A team of medical research experts feels confident that a new drug they have developed will cure about 80% of the patients. How large should the sample size be for the team to be 98% certain that the sample proportion of cures is within plus or minus 2% of the proportion of all cases that the drug will cure?

Exercise: The annual income of 900 salesmen employed by a company is known to be approximately normally distributed. If the company wants to be 95% confident that the true mean of this year's salesmen's income does not differ by more than 2% from last year's mean income of Rs 12,000, what sample size would be required, assuming the population standard deviation to be Rs 1500?

Exercise: In a random sample of 64 items taken from a large consignment, some were found

to be defective. Deduce that percentage of defective items in the consignment almost

certainly lies between 31.25 and 68.75 given that the standard error of the proportion of

defective items in the sample is 1/16.

Exercise: A cigarette manufacturer claims that his cigarettes have an average nicotine content of 18.3 mg. If a random sample of cigarettes of this type has nicotine contents of 20, 17, 21, 19, 22, 21, 20 and 16 mg, would you agree with the manufacturer's claim? Assume a suitable value for the level of significance. (Level of significance = 1 – Level of confidence)

If a population is large and normally distributed with standard deviation $\sigma$, the standard deviations of random samples of size $n$ ($n$ large) are approximately normally distributed. The standard deviation of the distribution of sample standard deviations drawn from a normal population is called the standard error of the standard deviation and is given by

$$SE = \frac{\sigma}{\sqrt{2n}}.$$

TESTING OF HYPOTHESES

Hypothesis:

It is an assumption or some supposition to be proved or rejected.

Definition: Hypothesis is a proposition or a set of propositions set forth as an explanation for

the occurrence of some specified group of phenomena either asserted merely as a provisional

conjecture to guide some investigation or accepted as highly probable in the light of

established facts.

Characteristics of a Hypothesis:

(i) Hypothesis should be clear and precise.

(ii) Hypothesis should be capable of being tested

(iii) Hypothesis should state relationship between variables, if it happens to be

relational hypothesis

(iv) Hypothesis should be limited in scope and must be specific. A researcher must remember that narrower hypotheses are generally more testable and should develop such hypotheses.

(v) Hypothesis should be stated as far as possible in most simple terms so that the

same is easily understandable by all concerned.

(vi) Hypothesis should be consistent with most known facts, i.e., it must be consistent

with a substantial body of established facts. In other words, it should be one which

judges accept as being the most likely.

(vii) Hypothesis should be amenable to testing within reasonable time.

(viii) Hypothesis must explain the facts that gave rise to the need for explanation.

concerning a population parameter. It is generally denoted by $H_0$. Any hypothesis which differs from the null hypothesis is called an alternative hypothesis. The alternative hypothesis is denoted by $H_1$.

Type I error: The error of rejecting the hypothesis when it should have been accepted is known as a type I error.

Type II error: The error of accepting the hypothesis when it should have been rejected is known as a type II error.

The probability of a type I error is usually determined in advance and is understood as the level of significance of testing the hypothesis. If the type I error is fixed at 5%, it means that there are about 5 chances in 100 that we reject $H_0$ when $H_0$ is true.

But with a fixed sample size $n$, when we try to reduce the type I error, the probability of committing a type II error increases. Both types of error cannot be reduced simultaneously.

Two-tailed and one-tailed tests: A two-tailed test rejects the null hypothesis if, say, the sample mean is significantly higher or lower than the hypothesized value of the population mean. Thus in a two-tailed test there are two rejection regions, one on each tail of the normal curve.

$$H_0: \mu = \mu_{H_0}$$

$$H_1: \mu \ne \mu_{H_0}$$

A one-tailed test is used when we are to test, say, whether the population mean is either lower than or higher than some hypothesized value.

$$H_0: \mu = \mu_{H_0}, \quad H_1: \mu > \mu_{H_0}$$

or

$$H_0: \mu = \mu_{H_0}, \quad H_1: \mu < \mu_{H_0}$$

Example: A random sample of 25 tires from a large consignment gave an average life of 38,000 km and a standard deviation of 5,000 km. Could the sample come from a population with a mean tire life of 40,000 km?

Solution:

We state the null hypothesis and the alternative hypothesis as under:

$$H_0: \mu = 40000$$

$$H_1: \mu \ne 40000$$

We make a two-tailed test for the population mean. Consider the level of significance $\alpha = 0.05$. The test criterion is

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}.$$

Here $n = 25$, the sample mean is 38,000 and the sample standard deviation is $s = 5000$. Therefore

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{38000 - 40000}{5000 / \sqrt{25}} = -2, \qquad |t| = 2$$

From the table, the t-variate value for the 5% significance level (95% confidence level) with 24 degrees of freedom is 2.064.

Since the calculated |t| is less than the table value, we accept the null hypothesis that the mean life of the tires is 40,000 km at the 5% significance level (95% confidence level).
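When only summary figures are available, as in the tire example, the test reduces to one line of arithmetic plus a comparison with the table value. The sketch below is illustrative: `one_sample_t` is a hypothetical name, and the critical value 2.064 is simply copied from the t-table figure quoted in the text, since the standard library has no t-distribution.

```python
from math import sqrt

def one_sample_t(sample_mean, mu0, s, n):
    """t statistic for H0: mu = mu0, computed from summary figures."""
    return (sample_mean - mu0) / (s / sqrt(n))

t = one_sample_t(sample_mean=38000, mu0=40000, s=5000, n=25)
t_critical = 2.064                # two-tailed 5% table value, 24 degrees of freedom
reject_h0 = abs(t) > t_critical   # two-tailed rejection region |t| > t_critical
print(t, reject_h0)
```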

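[Flow diagram: procedure for hypothesis testing. State $H_0$ as well as $H_1$; work out the appropriate value from the sample data; calculate the probability that the sample result would diverge as widely as it has from expectations if the null hypothesis were true (find the z-value or t-value for the purpose); compare this probability with the significance level (α/2 in each tail in case of a two-tailed test; α in case of a one-tailed test), i.e., find whether the calculated z or t value is in the rejection region. Yes: reject $H_0$. No: accept $H_0$.]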

Exercise: A certain stimulus administered to each of 12 patients resulted in the following changes in blood pressure:

5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6

Can it be concluded that the stimulus will, in general, be accompanied by a change in blood pressure?

Solution:

From the given data, we obtain the sample mean and sample variance as

$$\bar{X} = \frac{\sum X}{n} = \frac{31}{12} = 2.583$$

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = 9.538, \qquad s = 3.088$$

We make the null hypothesis that the stimulus will, in general, not be accompanied by a change in blood pressure. Therefore the null hypothesis and the alternative hypothesis can be formulated as under:

$$H_0: \mu = 0$$

$$H_1: \mu \ne 0$$

Assume the 5% level of significance, i.e., the 95% level of confidence. The corresponding t-value with 11 degrees of freedom is 2.201 ($t_{0.025,\,11}$), so the rejection region is $R: |t| > 2.201$.

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{2.583 - 0}{3.088 / \sqrt{12}} = 2.90$$

Since the calculated t-value is bigger than the table value, we reject the null hypothesis.

Exercise: A certain stimulus administered to each of 12 patients resulted in the following changes in blood pressure:

5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6

Can it be concluded that the stimulus will, in general, be accompanied by an increase in blood pressure?

Solution:

From the given data, we obtain the sample mean and sample variance as

$$\bar{X} = \frac{\sum X}{n} = \frac{31}{12} = 2.583$$

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = 9.538, \qquad s = 3.088$$

We make the null hypothesis that the stimulus will, in general, not be accompanied by an increase in blood pressure. Therefore the null hypothesis and the alternative hypothesis can be formulated as under:

$$H_0: \mu = 0$$

$$H_1: \mu > 0$$

Assume the 5% level of significance, i.e., the 95% level of confidence. The corresponding t-value (one-tailed test) with 11 degrees of freedom is 1.796 ($t_{0.05,\,11}$), and the rejection region is $R: t > 1.796$.

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{2.583 - 0}{3.088 / \sqrt{12}} = 2.90$$

Since the calculated t-value is bigger than the table value, we reject the null hypothesis.
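Both versions of the blood pressure exercise use the same t statistic and differ only in the rejection region. A minimal sketch from the raw data follows; `t_statistic` is an illustrative name, and the two critical values are the table figures quoted in the text. Computed without intermediate rounding, the statistic is about 2.90; hand calculations that round the mean and standard deviation first can differ slightly in the second decimal.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(data, mu0=0.0):
    """One-sample t statistic; stdev() uses the n - 1 divisor."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / sqrt(n))

changes = [5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6]
t = t_statistic(changes)

# Table values for 11 degrees of freedom at the 5% level (from the text).
two_tailed_reject = abs(t) > 2.201   # H1: mu != 0 (any change)
one_tailed_reject = t > 1.796        # H1: mu > 0 (an increase)
print(round(t, 2), two_tailed_reject, one_tailed_reject)
```

The same function applies to any small-sample exercise with raw observations, for example by passing `mu0=500` with daily sales data.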

Exercise: A cigarette manufacturer claims that his cigarettes have an average nicotine content of 18.3 mg. If a random sample of cigarettes of this type has nicotine contents of 20, 17, 21, 19, 22, 21, 20 and 16 mg, would you agree with the manufacturer's claim? Assume a suitable value for the level of significance.

Exercise: Raju Restaurant near the railway station has been having average sales of 500 tea cups per day. Because of the development of a bus stand nearby, it expects to increase its sales. During the first 12 days after the start of the bus stand, the daily sales were as under:

550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526

On the basis of this sample information, can one conclude that Raju Restaurant's sales have increased? Use the 5% level of significance.

Solution: Consider the null hypothesis that the sales average is 500 cups and that sales have not increased unless proved otherwise. We can write:

$$H_0: \mu = 500$$

$$H_1: \mu > 500$$

The sample is small and drawn from an infinite population. So we shall use a one-tailed t-test and compute the t-statistic given by

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}.$$

Further note that the population standard deviation is not known, so $s$ is computed from the sample:

X_i      (X_i − X̄)    (X_i − X̄)²
550          2             4
570         22           484
490        -58          3364
615         67          4489
505        -43          1849
580         32          1024
570         22           484
460        -88          7744
600         52          2704
580         32          1024
530        -18           324
526        -22           484
Total 6576             23978

$$\bar{X} = \frac{6576}{12} = 548$$

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{23978}{11}} = 46.69$$

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{548 - 500}{46.69 / \sqrt{12}} = 3.56$$

Degrees of freedom = n – 1 = 12 – 1 = 11. Therefore the corresponding t-value (one-tailed test) at the 5% significance level with 11 degrees of freedom is 1.796 ($t_{0.05,\,11}$), and $R: t > 1.796$.

Since the calculated t-value is greater than the table value and lies in the rejection region, we reject the null hypothesis that there is no change in sales and conclude that sales have increased.

Exercise: A sample of 400 male students is found to have a mean height of 67.47 inches. Can it be regarded as a sample from a large population with mean height 67.39 inches and standard deviation 1.30 inches? Test at the 5% level of significance.

Solution: Consider the null hypothesis that the average height is 67.39 inches. We can write

$$H_0: \mu = 67.39$$

$$H_1: \mu \ne 67.39$$

Since the sample size is large (400), the population is infinite and the standard deviation of the population is known, we shall use a two-tailed z-test with the z-statistic

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}.$$

At the 5% significance level for a two-tailed test, the z-variate is 1.96 and the rejection region is $R: |z| > 1.96$.

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{67.47 - 67.39}{1.30 / \sqrt{400}} = 1.231$$

The calculated z-value is therefore within the acceptance region. We accept the null hypothesis that the mean height of students is 67.39 inches at the 5% significance level.
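For a z-test with known σ, the decision can also be expressed through a two-sided p-value instead of a table lookup. The notes themselves compare z with 1.96; the p-value form shown here is an equivalent alternative added for illustration, and `z_test_mean` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for H0: mu = mu0 with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

# Heights exercise: n = 400, sample mean 67.47, mu0 = 67.39, sigma = 1.30.
z, p = z_test_mean(67.47, 67.39, 1.30, 400)
print(round(z, 3), p > 0.05)   # p > 0.05 means H0 is not rejected at the 5% level
```

Checking `p > 0.05` gives the same decision as checking `abs(z) < 1.96`, because 1.96 is the point at which the two-sided tail probability equals 0.05.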

Exercise: Suppose that we are interested in a population of 20 industrial units of the same size, all of which are experiencing excessive labor turnover problems. The past records show that the mean of the distribution of turnover is 320 employees, with a standard deviation of 75 employees. A sample of 5 of these industrial units is taken at random, which gives a mean annual turnover of 300 employees. Is the sample mean consistent with the population mean? Test at the 5% significance level.

deviation of 2.5. The production manager may welcome any change in the mean value towards the higher side but would like to safeguard against decreasing values of the mean. He takes a sample of 36 items that gives a mean value of 48.5. What inference should the manager draw about the production process on the basis of the sample results? Use the 5% level of significance for the purpose.

Exercise: The mean lifetime of a random sample of 50 similar torch bulbs drawn from a batch

of 500 bulbs is 72 hours. The standard deviation of the lifetime of sample is 10.4 hours. The

batch is classed as inferior if the mean lifetime is less than the 75 hours. Determine whether,

as a result of sample data, the batch is considered to be inferior at level of significance of a)

0.05 and b) 0.01

Solution: The population is finite with $N = 500$. The sample size is $n = 50$, the sample mean is $\bar{X} = 72$ hours and the sample standard deviation is $s = 10.4$ hours. The claimed lifetime of the bulbs (population mean) is a minimum of 75 hours. The objective is to test whether the given batch is of inferior quality (lifetime less than 75 hours). Therefore we make the null hypothesis that the lifetime of the bulbs is not less than 75 hours, i.e.,

$$H_0: \mu \ge 75$$

$$H_1: \mu < 75$$

We shall use a one-tailed test for a large sample from a finite population.

$$z = \frac{\bar{X} - \mu}{\dfrac{s}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}} = \frac{72 - 75}{\dfrac{10.4}{\sqrt{50}} \sqrt{\dfrac{500 - 50}{500 - 1}}} = \frac{-3}{(1.471)(0.95)} = -2.148$$

(a) Test at the 5% significance level: The table value for z is -1.645. Therefore the rejection region is $R: z < -1.645$. The calculated value of z is in the rejection region and therefore we reject the null hypothesis at the 5% level of significance.

(b) Test at the 1% significance level: The table value for z is -2.33. Therefore the rejection region is $R: z < -2.33$. The calculated value of z is not in the rejection region and therefore we accept the null hypothesis at the 1% level of significance.
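The effect of the finite population correction on the test can be sketched as follows. The function name `z_finite_population` is an assumption for illustration; the two critical values are the lower-tail table figures quoted in the solution.

```python
from math import sqrt

def z_finite_population(sample_mean, mu0, s, n, N):
    """z statistic using the standard error with the finite population correction."""
    se = (s / sqrt(n)) * sqrt((N - n) / (N - 1))
    return (sample_mean - mu0) / se

# Torch bulb exercise: N = 500, n = 50, sample mean 72, s = 10.4, H0: mu >= 75.
z = z_finite_population(72, 75, 10.4, 50, 500)
for alpha, z_crit in ((0.05, -1.645), (0.01, -2.33)):
    decision = "reject H0" if z < z_crit else "do not reject H0"
    print(alpha, round(z, 3), decision)
```

Because the correction factor is below 1, it shrinks the standard error and pushes z further from zero than the uncorrected statistic would be.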

In some decision-making situations, we may have to find whether the parameters of two populations are alike or different. For example, one may like to know whether female workers earn the same as male workers. In this situation, we test whether the mean incomes of males and females are the same or not.

In this case the parameter of interest is $\mu_1 - \mu_2$, where $\mu_1$ may be the mean income of the female population and $\mu_2$ may be the mean income of the male population. Suppose $n_1$ and $n_2$ are the sizes of the two samples, and $\sigma_1$ and $\sigma_2$ are the standard deviations of the respective populations. In the absence of the population standard deviations, we use the standard deviations of the samples for the estimation.

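The standard error for the difference of means is given by

$$SE = \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

and the test statistic is given by

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \quad \text{(in case of large samples)}$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \quad \text{(with } n_1 + n_2 - 2 \text{ degrees of freedom, in case of small samples)}$$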

When the samples are presumed to be drawn from the same population whose variance ($\sigma^2$) is known, the z-statistic and t-statistic for the difference in means are computed as under:

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \quad \text{(in case of large samples)}$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \quad \text{(with } n_1 + n_2 - 2 \text{ degrees of freedom; in case of small samples)}$$

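In case the population variance is not known, we estimate the standard deviation of the population as under:

$$\hat{\sigma} = \sqrt{\frac{n_1 (s_1^2 + D_1^2) + n_2 (s_2^2 + D_2^2)}{n_1 + n_2}}; \quad \text{where } D_1 = (\bar{X}_1 - \bar{X}_{1,2});\ D_2 = (\bar{X}_2 - \bar{X}_{1,2})$$

$$\bar{X}_{1,2} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$$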

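When small samples are presumed to be taken from the same population and the population variance is not known, we use the t-test for the difference of means. The z-statistic and t-statistic are computed as under:

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2}} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2}} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}; \quad \text{with } n_1 + n_2 - 2 \text{ degrees of freedom}$$

Alternatively,

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}};$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}; \quad \text{with } n_1 + n_2 - 2 \text{ degrees of freedom}$$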

Exercise: The mean produce of wheat from a sample of 100 fields is 200 quintals per acre with a standard deviation of 10 quintals. Another sample of 150 fields gives a mean of 220 quintals per acre with a standard deviation of 12 quintals. Can the two samples be considered to have been taken from two populations with the same mean yield? Use the 5% level of significance.

Solution: Taking the null hypothesis that the means of the two populations do not differ, consider

$$H_0: \mu_1 = \mu_2$$

$$H_1: \mu_1 \ne \mu_2$$

It is given that

$$n_1 = 100;\ n_2 = 150; \quad \bar{X}_1 = 200;\ \bar{X}_2 = 220; \quad s_1 = 10;\ s_2 = 12$$

The sample sizes are large, so we can use a two-tailed test to compare the means at the 5% level of significance. The z-value for the 5% level of significance in a two-tailed test is 1.96. Therefore the rejection region is $R: |z| > 1.96$.

Note that the standard deviations of the populations are not given, so the sample standard deviations are used. From the given data we have

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{200 - 220}{\sqrt{\dfrac{10^2}{100} + \dfrac{12^2}{150}}} = \frac{-20}{1.4} = -14.29$$

Since the calculated z-value is not in the acceptance region but in the rejection region, we reject the null hypothesis at the 5% level of significance.
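The large-sample comparison of two means can be sketched in a few lines. The helper name `two_sample_z` is an assumption for illustration; it takes the two standard deviations separately, so passing the same value twice covers the case of a common known population standard deviation.

```python
from math import sqrt

def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """z statistic for H0: mu1 = mu2 using SE = sqrt(sd1^2/n1 + sd2^2/n2)."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se

# Wheat exercise with sample standard deviations 10 and 12.
z = two_sample_z(200, 220, 10, 12, 100, 150)
print(round(z, 2), abs(z) > 1.96)

# Same means, but a common population standard deviation of 11 for both samples.
z_common = two_sample_z(200, 220, 11, 11, 100, 150)
print(round(z_common, 2), abs(z_common) > 1.96)
```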

Exercise: The mean produce of wheat from a sample of 100 fields is 200 quintals per acre with a standard deviation of 10 quintals. Another sample of 150 fields gives a mean of 220 quintals per acre with a standard deviation of 12 quintals. Can the two samples be considered to have been taken from the same population whose standard deviation is 11 quintals? Use the 5% level of significance.

Solution: Assuming that both the samples are from the same population, consider the hypotheses

$$H_0: \mu_1 = \mu_2$$

$$H_1: \mu_1 \ne \mu_2$$

where

$$n_1 = 100;\ n_2 = 150; \quad \bar{X}_1 = 200;\ \bar{X}_2 = 220$$

The standard deviation of the population is given as 11 quintals, i.e., $\sigma = 11$. Since the null hypothesis is that both the samples are from the same population, we can take $\sigma_1 = \sigma_2 = 11$.

The sample sizes are large, so we can use a two-tailed test to compare the means at the 5% level of significance. The z-value for the 5% level of significance in a two-tailed test is 1.96. Therefore the rejection region is $R: |z| > 1.96$.

Further, the z-statistic is calculated as:

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{200 - 220}{\sqrt{\dfrac{11^2}{100} + \dfrac{11^2}{150}}} = \frac{-20}{1.42} = -14.08$$

The calculated z-value falls in the rejection region and therefore we reject the null hypothesis at the 5% significance level.

workers in two cities gives the following information:

City   Average monthly earning   St. deviation of monthly earning   Size of sample
A      695                       40                                 200
B      710                       60                                 175

Test the hypothesis that there is no difference between the monthly earnings of workers in the two cities.

Exercise: Samples of sales in similar shops in two groups are taken for a new product with the following results:

Group   Mean sales   Variance   Size of sample
A       57           5.3        5
B       61           4.8        7

Is there any evidence that both the groups are in the same town without any difference in sales pattern? Use the 5% level of significance.

Solution: We presume that both the groups are from the same town and have the same sales pattern. In other words, we make the null hypothesis that both the groups are from a single population. Consider the hypotheses

$$H_0: \mu_1 = \mu_2$$

$$H_1: \mu_1 \ne \mu_2$$

It is given that

$$n_1 = 5;\ n_2 = 7; \quad \bar{X}_1 = 57;\ \bar{X}_2 = 61; \quad s_1^2 = 5.3;\ s_2^2 = 4.8$$

Since the samples are small and the population variances are not known, we use the following t-statistic:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{57 - 61}{\sqrt{\dfrac{(5 - 1)(5.3) + (7 - 1)(4.8)}{5 + 7 - 2}} \sqrt{\dfrac{1}{5} + \dfrac{1}{7}}} = -3.055$$

At the 5% level of significance and with 5 + 7 - 2 = 10 degrees of freedom, the t-value from the table is 2.228 and therefore the rejection region is given by $R: |t| > 2.228$. Note that the calculated t-value is in the rejection region, so we reject the null hypothesis at the 5% level of significance. We may conclude that the sample groups A and B are from different populations with different sales patterns.
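The pooled-variance t statistic used in this solution can be sketched as below. The function name `pooled_t` is illustrative; note that it takes sample variances (not standard deviations), matching the figures 5.3 and 4.8 given in the exercise table, and the critical value 2.228 is the table figure quoted in the text.

```python
from math import sqrt

def pooled_t(mean1, mean2, var1, var2, n1, n2):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, df

# Sales exercise: group A (mean 57, variance 5.3, n = 5), group B (mean 61, variance 4.8, n = 7).
t, df = pooled_t(57, 61, 5.3, 4.8, 5, 7)
print(round(t, 3), df, abs(t) > 2.228)   # 2.228 is the two-tailed 5% value for 10 df
```

With raw observations instead of summary figures, `statistics.variance` (which uses the n − 1 divisor) would supply `var1` and `var2`.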

Exercise: Two independent samples of size 9 and 7 respectively had the following values:

Sample 1: 18 20 36 50 49 36 34 49 41

Sample 2: 29 28 26 35 30 44 46

Is the difference between the means of sample significant at 5% level of significance?

Exercise: A group of seven-week old chickens reared on a high protein diet weigh 12, 15, 11,

16, 14, 14 and 16 ounces; a second group of five chickens, similarly treated except that they

receive a low protein diet, weigh 8, 10, 14, 10 and 13 ounces. Test at 5% level whether there

is significant evidence that additional protein has increased the weight of chickens.

Recall that the standard error of proportion is given by

pq

SE = σ p = (in case of infinite population)

n

pq N − n

SE = σ p = (in case of finite population of size N)

n N −1

Where, p is the proportion of the items in the population, q = 1-q, z is the standard variate for

appropriate confidence level and n is the sample size.

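If $\hat{p}$ is the observed proportion, then to test the null hypothesis $H_0: p = p_{H_0}$ we use the test statistic

$$z = \frac{\hat{p} - p_{H_0}}{SE}.$$

For a large population, with $p = p_{H_0}$ and $q = 1 - p$, we have

$$z = \frac{\hat{p} - p_{H_0}}{\sqrt{\dfrac{pq}{n}}}.$$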

The standard error of the difference between two proportions is

$$SE = \sigma_{p_1 - p_2} = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}},$$

where $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions of samples of sizes $n_1$ and $n_2$ respectively. This formula is more conveniently used whenever the samples are drawn from two heterogeneous populations. But when we assume that the populations are similar as regards the given attribute, we use the following formula to compute SE:

$$SE = \sigma_{p_1 - p_2} = \sqrt{\hat{p}_0 \hat{q}_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

where

$$\hat{p}_0 = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}, \qquad \hat{q}_0 = 1 - \hat{p}_0.$$

Exercise: A sample survey indicates that out of 3232 births, 1705 were boys and the rest were

girls. Do these figures confirm the hypothesis that the sex ratio is 50:50? Test at 5% level of

significance.

Solution: Define p as the proportion of boys among the births. We set up the null and alternative hypotheses as follows:

$$H_0: p = 0.5$$

$$H_1: p \neq 0.5$$

The observed value of p is $\hat{p} = \dfrac{1705}{3232} = 0.5275$.

The standard error of the proportion is

$$SE = \sigma_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.5)(0.5)}{3232}} = 0.0088$$

and the z test statistic is

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} = \frac{0.5275 - 0.5}{0.0088} = 3.125.$$

Given the null and alternative hypotheses, we apply a two-tailed test; the rejection region at the 5% significance level is $R: |z| > 1.96$. The calculated z value lies in the rejection region, so we reject the null hypothesis at the 5% significance level and conclude that the sex ratio among the births is not 50:50.
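The same one-sample proportion test can be sketched in Python. This is only an illustration under the large-sample normal approximation used in the text; the function name `proportion_z` is hypothetical. Working without intermediate rounding gives a z value of about 3.13 rather than 3.125, which does not change the conclusion.

```python
import math

def proportion_z(successes, n, p0):
    """z statistic for H0: p = p0, using the standard error under H0."""
    p_hat = successes / n
    std_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / std_error

z_births = proportion_z(1705, 3232, 0.5)

# Two-tailed p-value from the standard normal distribution
p_value = math.erfc(abs(z_births) / math.sqrt(2))

reject = abs(z_births) > 1.96  # 5% two-tailed rejection region
print(f"z = {z_births:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```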

Exercise: A certain process produces 10% defective items. A supplier of new raw material

claims that the use of his material would reduce the proportion of defectives. A random sample of 400 units made with this new material was taken, of which 34 were defective. Can the supplier's claim be accepted? Test at 1% level of significance.

Solution: Since the supplier claims that there is a decrease in defective items, we consider the following null and alternative hypotheses:

$$H_0: p = 0.10$$

$$H_1: p < 0.10$$

These hypotheses call for a one-tailed (left) test at the 1% level of significance. The rejection region at the 1% level of significance is $R: z < -2.33$.

The observed sample proportion is $\hat{p} = \dfrac{34}{400} = 0.085$, and the z statistic from the given data is

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} = \frac{0.085 - 0.1}{\sqrt{\dfrac{(0.1)(0.9)}{400}}} = -1.00$$

Since the computed z value does not fall in the rejection region, we cannot reject the null hypothesis at the 1% level of significance. The sample therefore does not provide significant evidence of a reduction in defective items, and the supplier's claim cannot be accepted at this level.
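A short Python sketch of this left-tailed test follows. It is an illustration only; the variable names are chosen here, and the critical value is taken from the standard normal distribution in Python's `statistics` module rather than from a printed table. A z value that does not fall below the critical value means the data give no significant evidence for the alternative $p < 0.10$.

```python
import math
from statistics import NormalDist

p0 = 0.10         # proportion defective under H0
n = 400           # sample size
defectives = 34   # defectives observed with the new material

p_hat = defectives / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Lower 1% point of the standard normal distribution (about -2.33)
z_crit = NormalDist().inv_cdf(0.01)

reject = z < z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```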

Exercise: The null hypothesis is that 20% of the passengers go in first class, but management

recognizes the possibility that this percentage could be more or less. A random sample of 400

passengers includes 70 passengers holding first class tickets. Can the null hypothesis be

rejected at 10% level of significance?

Exercise: A drug research experimental unit is testing two drugs newly developed to reduce

BP level. The drugs are administered to two different sets of animals. In group one, 350 of

600 animals tested respond to drug one and in group two, 260 of 500 animals tested respond

to drug two. The research unit wants to test whether there is difference between the efficiency

of the said two drugs at 5% level of significance. How will you deal with this problem?

Solution: Let $p_1$ be the proportion of animals that respond to drug one and $p_2$ the proportion that respond to drug two. Here we may consider that the samples are from different populations.

Consider the null hypothesis

$$H_0: p_1 = p_2,$$ i.e., the proportions of response for both drugs are the same,

and the alternative hypothesis

$$H_1: p_1 \neq p_2.$$

We have a two-tailed test for samples from different populations at the 5% significance level. The rejection region is $R: |z| > 1.96$.

From the given data, we have

$$\hat{p}_1 = \frac{350}{600} = 0.583, \quad n_1 = 600$$

$$\hat{p}_2 = \frac{260}{500} = 0.520, \quad n_2 = 500$$

Further, the z value for the observed data is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}} = \frac{0.583 - 0.520}{\sqrt{\dfrac{(0.583)(0.417)}{600} + \dfrac{(0.520)(0.480)}{500}}} \approx 2.09$$

As the calculated value is in the rejection region, we reject the null hypothesis at the 5% level of significance.
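As a rough check, the two-sample statistic with separate (unpooled) variances can be sketched in Python. The function name `two_proportion_z_unpooled` is hypothetical. Without rounding the proportions to three decimals the statistic comes out near 2.11, slightly above the hand-computed figure, and it still lies in the rejection region.

```python
import math

def two_proportion_z_unpooled(x1, n1, x2, n2):
    """z statistic for p1 - p2 with the standard error from separate samples."""
    p1, p2 = x1 / n1, x2 / n2
    std_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / std_error

z_drugs = two_proportion_z_unpooled(350, 600, 260, 500)
reject = abs(z_drugs) > 1.96  # two-tailed test at the 5% level
print(f"z = {z_drugs:.3f}, reject H0: {reject}")
```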

Exercise: A drug research experimental unit is testing two drugs newly developed to reduce

BP level. The drugs are administered to two different sets of animals. In group one, 350 of

600 animals tested respond to drug one and in group two, 260 of 500 animals tested respond

to drug two. The research unit wants to test whether the efficiency of the first drug is more

than the second drug at 5% level of significance. How will you deal with this problem?

Exercise: At a certain date in a large city 400 out of a random sample of 500 men were found to

be smokers. After the tax on tobacco had been heavily increased, another random sample of

600 men in the same city included 400 smokers. Was the observed decrease in the proportion

of smokers significant? Test at 5% level of significance.

Solution: We start with the null hypothesis that the proportion of smokers remains unchanged even after the heavy tax on tobacco, i.e., $H_0: p_1 = p_2$, and the alternative hypothesis that the proportion of smokers has decreased after the tax, i.e., $H_1: p_1 > p_2$. So we have a one-tailed (right) test at the 5% level of significance, with rejection region $R: z > 1.645$.

From the given data, we have

$$\hat{p}_1 = \frac{400}{500} = 0.8$$

$$\hat{p}_2 = \frac{400}{600} = 0.667$$

On the presumption that the populations are similar, the best estimator of the common proportion is given by

$$\hat{p}_0 = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{500(0.8) + 600(0.667)}{500 + 600} = 0.7273$$

$$\hat{q}_0 = 1 - 0.7273 = 0.2727$$

Further,

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_0 \hat{q}_0 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.8 - 0.667}{\sqrt{(0.7273)(0.2727)\left(\dfrac{1}{500} + \dfrac{1}{600}\right)}} \approx 4.93$$

The calculated value is in the rejection region, and therefore we reject the null hypothesis at the 5% level of significance. There is a significant decrease in the proportion of smokers after the increase in tax on tobacco.
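The pooled version of the test can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name, `two_proportion_z_pooled`; the one-tailed critical value 1.645 is the standard 5% point of the normal distribution. Using unrounded proportions, the statistic is about 4.94.

```python
import math

def two_proportion_z_pooled(x1, n1, x2, n2):
    """z statistic for p1 - p2 using a pooled proportion under H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p0 = (x1 + x2) / (n1 + n2)  # same as (n1*p1 + n2*p2) / (n1 + n2)
    std_error = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error, p0

z_smokers, p0 = two_proportion_z_pooled(400, 500, 400, 600)
reject = z_smokers > 1.645  # one-tailed (right) test at the 5% level
print(f"pooled p0 = {p0:.4f}, z = {z_smokers:.3f}, reject H0: {reject}")
```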

Exercise: There are 100 students in a university college and in the whole university, inclusive

of this college; the number of students is 2000. In a random sample study of 20 were found

smokers in the college and the proportion of smokers in the university is 0.05. Is there a significant difference between the proportions of smokers in the college and in the university? Test at 5% level.

CHI-SQUARE TEST

The chi-square distribution describes quantities that involve sums of squares. It is defined only for positive values of the random variable, and its curve is not symmetric. The distribution depends on a further parameter, the degrees of freedom (n - 1), where n is the sample size.

Chi-square, denoted $\chi^2$, is a statistical measure used in the context of sampling analysis for comparing a sample variance to a theoretical variance. As a non-parametric test, it can be used to determine whether categorical data show dependency or whether two classifications are independent. It can also be used to make comparisons between theoretical populations and actual data when categories are used. Thus, the chi-square test is applicable to a large number of problems, such as:

(i) testing goodness of fit,

(ii) testing the significance of association between two attributes, and

(iii) testing homogeneity or the significance of a population variance.

We can use the test to judge whether a random sample has been drawn from a normal population with mean $\mu$ and a specified variance $\sigma^2$. Given a sample of size n and the sample variance $s^2$, the quantity

$$\chi^2 = \frac{s^2}{\sigma^2}(n - 1)$$

has the chi-square distribution with n - 1 degrees of freedom. To test the null hypothesis that the population variance equals the specified value $\sigma^2$, we compare the calculated $\chi^2$ value against the table value at n - 1 degrees of freedom and the given level of significance. If the calculated value is higher than the table value, we reject the null hypothesis; otherwise we accept it.

Exercise: The weights of 10 students are as follows:

S.No:          1    2    3    4    5    6    7    8    9    10
Weight (kg):   38   40   45   53   47   43   55   48   52   49

Can we say that the variance of the distribution of weights of all students from which the above sample of 10 students was drawn is equal to 20 kg²? Test at 5% level of significance.

Solution:

First we find the variance of the given sample data.

S.No     X_i (weight)    X_i - X̄    (X_i - X̄)²
1        38              -9          81
2        40              -7          49
3        45              -2          4
4        53              6           36
5        47              0           0
6        43              -4          16
7        55              8           64
8        48              1           1
9        52              5           25
10       49              2           4
Total    470                         280

$$\bar{X} = \frac{470}{10} = 47$$

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{280}{9} = 31.11$$

Let the null hypothesis be that the population variance is $\sigma^2 = 20$. To test the hypothesis, we compute

$$\chi^2 = \frac{s^2}{\sigma^2}(n - 1) = \frac{31.11}{20}(10 - 1) \approx 14.0$$

The table value of $\chi^2$ at 10 - 1 = 9 degrees of freedom and 5% level of significance is 16.92. Since the calculated value is less than the table value, we accept the null hypothesis at the 5% level of significance. In other words, we can say that the sample is taken from a population with variance 20 kg².
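The variance test above can be reproduced with a few lines of Python. This is a sketch using only the standard library; the variable names are illustrative, and the critical value 16.92 is the table value quoted in the text rather than something computed here.

```python
weights = [38, 40, 45, 53, 47, 43, 55, 48, 52, 49]
hypothesized_variance = 20  # variance stated in the null hypothesis

n = len(weights)
mean = sum(weights) / n
sum_sq_dev = sum((x - mean) ** 2 for x in weights)

sample_variance = sum_sq_dev / (n - 1)
chi_sq = (n - 1) * sample_variance / hypothesized_variance

CHI_CRIT = 16.92  # table value for 9 degrees of freedom at the 5% level
reject = chi_sq > CHI_CRIT
print(f"mean = {mean}, s^2 = {sample_variance:.2f}, "
      f"chi-square = {chi_sq:.2f}, reject H0: {reject}")
```

Note that $(n - 1)s^2$ is simply the sum of squared deviations, so the statistic reduces to 280 / 20.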

Exercise: A sample of 10 is drawn randomly from a certain population. The sum of squared

deviation from the mean of given sample is 50. Test the hypothesis that the variance of the

population is 5 at 5% level of significance.

Chi-Square Test as Non-Parametric Test: This test can be used for (i) testing goodness of fit and (ii) testing independence of data.

Testing goodness of fit: The chi-square test enables us to see how well an assumed theoretical distribution fits the observed data. When a theoretical distribution is fitted to given data, we are always interested in knowing how well it fits the observations.

The fit is considered good, in other words the divergence between the observed and expected frequencies is attributable to fluctuations of sampling, if the calculated value of $\chi^2$ is less than the table value at the chosen level of significance. Otherwise, the fit is not considered good.

Test of independence: The $\chi^2$ test enables us to judge whether or not two attributes are associated. For instance, we may be interested in knowing whether a new medicine is effective in controlling fever; in such a case the $\chi^2$ test helps us decide the issue.

In such a situation, we proceed with the null hypothesis that the two attributes are independent, i.e., the new medicine is not effective in controlling fever. On this basis we calculate the expected frequencies and then work out the value of $\chi^2$. If the calculated $\chi^2$ value is less than the table value for the given degrees of freedom, we accept the null hypothesis; otherwise we reject it.

We calculate

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency and E is the expected frequency.

Degree of Freedom:

If there are n frequency classes and there is one independent constraint, then the degrees of freedom are given by n - 1.

When we have two independent constraints (bivariate case) with r rows and c columns, the degrees of freedom are given by (r - 1)(c - 1).

For instance, in the following data obtained during the outbreak of smallpox:

                  Attacked    Not attacked    Total
Vaccinated        31          469             500
Not vaccinated    185         1315            1500
Total             216         1784            2000

The degree of freedom is (2 - 1)(2 - 1) = 1

Exercise: Genetic theory states that children having one parent of blood type A and the other of blood type B will always be one of the three types A, AB, B, and that the proportions of the three types will on average be 1 : 2 : 1. A report states that out of 300 children having one A parent and one B parent, 30 percent were found to be of type A, 45 percent of type AB and the remainder of type B. Test the hypothesis by the $\chi^2$ test.

Solution: The observed frequencies of types A, AB and B are 90, 135 and 75 respectively (in the proportion 30 : 45 : 25). Theoretically, they should be in the proportion 1 : 2 : 1. Therefore the expected frequencies of types A, AB and B are 75, 150 and 75 respectively. We use the chi-square test to verify the goodness of fit of the given theoretical distribution.

Let the null hypothesis be that the given data fit the given distribution. We calculate $\chi^2$ as follows:

Type    Observed frequency (O)    Expected frequency (E)    (O - E)    (O - E)²    (O - E)² / E
A       90                        75                        15         225         3
AB      135                       150                       -15        225         1.5
B       75                        75                        0          0           0

$$\chi^2 = 3 + 1.5 + 0 = 4.5$$

Degrees of freedom = 3 - 1 = 2

The table value of $\chi^2$ for 2 degrees of freedom at 5% level of significance is 5.991.

The calculated $\chi^2$ value is less than the table value. Therefore we accept the null hypothesis that on average types A, AB and B stand in the proportion 1 : 2 : 1.
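The goodness-of-fit calculation can be sketched in Python as below. The function name `chi_square_gof` is hypothetical, and the expected frequencies are derived from the theoretical ratio 1 : 2 : 1 applied to the 300 children.

```python
def chi_square_gof(observed, expected):
    """Sum of (O - E)^2 / E over all frequency classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [90, 135, 75]   # types A, AB, B
ratio = [1, 2, 1]          # theoretical proportions

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]

chi_sq = chi_square_gof(observed, expected)
df = len(observed) - 1

CHI_CRIT = 5.991  # table value for 2 degrees of freedom at the 5% level
print(f"expected = {expected}, chi-square = {chi_sq}, df = {df}, "
      f"reject H0: {chi_sq > CHI_CRIT}")
```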

Exercise: A die is rolled 240 times and the observed frequencies are given below.

Face observed         1     2     3     4     5     6
Frequency observed    49    35    32    46    49    29

Using the $\chi^2$ test, verify whether the die is unbiased. Test at 5% level of significance.

Exercise: In a city a survey was carried out of 200 families, each with 5 children. The distribution shown below was produced.

(Boys, Girls)      (5, 0)    (4, 1)    (3, 2)    (2, 3)    (1, 4)    (0, 5)
No. of families    11        35        69        55        25        5

Test the null hypothesis that the observed frequencies are consistent with male and female births being equally probable, assuming a binomial distribution, at a level of significance of 0.05.

Solution: Assume that male and female births are equally probable, that is, p = q = 0.5. Note that the probability of having k boys among 5 children in a family is given by

$$\binom{5}{k}(0.5)^k (0.5)^{5-k}$$

(B, G)    Observed frequency (O)    Prob       Expected frequency (E) (Prob x 200, rounded)    (O - E)    (O - E)²    (O - E)² / E
(5, 0)    11                        0.03125    6                                               5          25          4.167
(4, 1)    35                        0.15625    31                                              4          16          0.516
(3, 2)    69                        0.3125     63                                              6          36          0.571
(2, 3)    55                        0.3125     63                                              -8         64          1.016
(1, 4)    25                        0.15625    31                                              -6         36          1.161
(0, 5)    5                         0.03125    6                                               -1         1           0.167

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 7.598$$

Degrees of freedom = 6 - 1 = 5

The table value of $\chi^2$ for 5 degrees of freedom at 5% level of significance is 11.1.

Since the calculated $\chi^2$ is less than the table value, we accept the null hypothesis that the observed frequencies are consistent with male and female births being equally probable.
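The binomial expected frequencies and the resulting statistic can be sketched in Python. This is an illustration with names chosen here. Unlike the table, it keeps the expected frequencies unrounded (6.25, 31.25, 62.5, ...), so the statistic comes out somewhat smaller than the tabulated figure; the conclusion at the 5% level is the same.

```python
from math import comb

families = 200
children = 5
p_boy = 0.5

boys = [5, 4, 3, 2, 1, 0]
observed = [11, 35, 69, 55, 25, 5]

# Binomial probability of k boys among 5 children, scaled to 200 families
expected = [
    families * comb(children, k) * p_boy ** k * (1 - p_boy) ** (children - k)
    for k in boys
]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

CHI_CRIT = 11.07  # table value for 5 degrees of freedom at the 5% level
print(f"expected = {expected}")
print(f"chi-square = {chi_sq:.3f}, df = {df}, reject H0: {chi_sq > CHI_CRIT}")
```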

Exercise: Two research groups classified some people in income groups on the basis of sampling studies. The results are as follows:

                     Income groups
Investigator    Poor    Middle    Rich    Total
A               160     30        10      200
B               140     120       40      300
Total           300     150       50      500

Show that the sampling technique of at least one research group is defective.

Solution: Let us make the hypothesis that the techniques adopted by both groups are similar and the data are similar.

The expected frequencies are

                     Income groups
Investigator    Poor    Middle    Rich    Total
A               120     60        20      200
B               180     90        30      300
Total           300     150       50      500

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(160 - 120)^2}{120} + \frac{(30 - 60)^2}{60} + \frac{(10 - 20)^2}{20} + \frac{(140 - 180)^2}{180} + \frac{(120 - 90)^2}{90} + \frac{(40 - 30)^2}{30} = 55.56$$

Degrees of freedom = (3 - 1)(2 - 1) = 2

The table value of $\chi^2$ for 2 degrees of freedom at 5% level of significance is 5.991. Since the calculated value is larger than the table value, we reject the null hypothesis at the 5% level of significance. The sampling technique of at least one of the two groups is defective.
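A sketch of the same contingency-table calculation in Python is given below. Each expected frequency is obtained as (row total x column total) / grand total, which is how the expected table above arises; the variable names are illustrative.

```python
observed = [
    [160, 30, 10],    # investigator A: poor, middle, rich
    [140, 120, 40],   # investigator B
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence of row and column classifications
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

CHI_CRIT = 5.991  # table value for 2 degrees of freedom at the 5% level
print(f"chi-square = {chi_sq:.2f}, df = {df}, reject H0: {chi_sq > CHI_CRIT}")
```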

Exercise: The following data were obtained during an outbreak of smallpox:

                  Attacked    Not attacked    Total
Vaccinated        31          469             500
Not vaccinated    185         1315            1500
Total             216         1784            2000

Test the effectiveness of vaccination in preventing the attack of smallpox.

Exercise: Consider the following information regarding home condition and children's condition:

                        Condition of home
Condition of child    Clean    Dirty    Total
Clean                 70       50       120
Fairly clean          80       20       100
Dirty                 35       45       80
Total                 185      115      300

State whether the two attributes, viz., condition of home and condition of child, are independent. Use the chi-square test for the purpose.

The following conditions should be satisfied before the chi-square test is applied:

(i) The observations recorded and used must be collected on a random basis.

(ii) All the items in the sample must be independent.

(iii) No group should contain very few items.

(iv) The overall number of items must also be reasonably large.

(v) The constraints must be linear, i.e., constraints that involve linear equations in the cell frequencies of a contingency table.

ANOVA

Consider the case of three varieties of wheat, each grown on four plots, where the per acre production of each variety on each plot is given below:

                   Variety of wheat
Plot of land    A    B    C
1               6    5    5
2               7    5    4
3               3    3    3
4               8    7    4

A researcher may be interested in whether there is a significant difference between the varieties of wheat and/or between the plots.

The ANOVA technique is very useful for analysis in this context.

ANOVA is an important technique in all those situations where we want to compare more than two populations, such as comparing the yield of a crop from several varieties of seeds, the mileage of several automobiles and so on. In circumstances of this kind, one generally does not want to consider all possible combinations of two populations at a time, because the number of tests required before arriving at a decision would be large.

The basic principle of ANOVA is to test for differences among the means of the populations by examining the amount of variation within each of the samples, relative to the amount of variation between the samples. In terms of variation within a given population, it is assumed that the values of X differ from the mean of this population only because of random effects, i.e., there are influences on X which are unexplainable, whereas in examining differences between populations we assume that the difference between the mean of the jth population and the grand mean is attributable to what is called a 'specific factor', technically described as the treatment effect. Thus while using ANOVA, we assume that each of the samples is drawn from a normal population and that each of these populations has the same variance. We also assume that all factors other than the one or more being tested are effectively controlled. In other words, we assume the absence of other factors that might affect our conclusions concerning the factor(s) to be studied.

In this case we make two estimates of the population variance, namely, one based on the between-samples variance and the other based on the within-samples variance. The two estimates of population variance are then compared with the F-test, wherein we work out

$$F = \frac{\text{Estimate of population variance based on between-samples variance}}{\text{Estimate of population variance based on within-samples variance}}$$

This value of F is compared to the F-limit for the given degrees of freedom. If the calculated F value is more than the F-limit value, we may say that there are significant differences between the sample means.

Under one-way ANOVA we consider just one factor; the factor matters because several possible types of samples can occur within it. We then determine whether there are differences within that factor.

(i) Obtain the mean of each sample, $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$, where k is the number of samples.

(ii) Work out the mean of the sample means by the formula

$$\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \ldots + \bar{X}_k}{k}$$

(this gives the grand mean when the samples are of equal size; otherwise take the mean of all n observations).

(iii) Calculate the sum of squares between the samples by the formula

$$SS_{between} = n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + \ldots + n_k(\bar{X}_k - \bar{\bar{X}})^2$$

(iv) Compute the mean square between the samples by the formula

$$MS_{between} = \frac{SS_{between}}{k - 1}$$

where k - 1 represents the degrees of freedom between the samples.

(v) Calculate the sum of squares within the samples by the formula

$$SS_{within} = \sum_i (X_{1i} - \bar{X}_1)^2 + \sum_i (X_{2i} - \bar{X}_2)^2 + \ldots + \sum_i (X_{ki} - \bar{X}_k)^2$$

(vi) Compute the mean square within the samples by the formula

$$MS_{within} = \frac{SS_{within}}{n - k}$$

where n is the total number of items in all the samples and k is the number of samples.

(vii) Compute the sum of squares of deviations for total variance by

$$SS_{total} = \sum_{i,j} (X_{ij} - \bar{\bar{X}})^2$$

(viii) Finally, the F-ratio is computed by the formula

$$F\text{-ratio} = \frac{MS_{between}}{MS_{within}}$$

If the calculated F-ratio is greater than the F-value for the given degrees of freedom and the significance level, then we reject the null hypothesis and conclude that the differences between the means are significant.

ANOVA TABLE (ONE-WAY ANALYSIS)

Source of variation    Sum of Squares (SS)                                                                          Degrees of Freedom    Mean Square (MS)            F-Ratio
Between the samples    $n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + \ldots + n_k(\bar{X}_k - \bar{\bar{X}})^2$    k - 1                 $SS_{between} / (k - 1)$    $MS_{between} / MS_{within}$
Within samples         $\sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2 + \ldots + \sum (X_{ki} - \bar{X}_k)^2$                  n - k                 $SS_{within} / (n - k)$
Total                  $\sum (X_{ij} - \bar{\bar{X}})^2$                                                            n - 1

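Compute $T = \sum X_{ij}$, the total of all observations, with $T_j$ the total of the jth sample, and further

Source of variation    Sum of Squares (SS)                                                                                              Degrees of Freedom    Mean Square (MS)            F-Ratio
Between the samples    $\sum \dfrac{(T_j)^2}{n_j} - \dfrac{(T)^2}{n}$                                                                   k - 1                 $SS_{between} / (k - 1)$    $MS_{between} / MS_{within}$
Within samples         $\left(\sum X_{ij}^2 - \dfrac{(T)^2}{n}\right) - \left(\sum \dfrac{(T_j)^2}{n_j} - \dfrac{(T)^2}{n}\right)$      n - k                 $SS_{within} / (n - k)$
Total                  $\sum X_{ij}^2 - \dfrac{(T)^2}{n}$                                                                               n - 1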
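As a quick illustration of these total-based formulas, the following worked sketch applies them to the wheat data given at the start of this section (varieties A, B and C on four plots). The symbols $T_A$, $T_B$, $T_C$ for the variety totals are notation introduced here for convenience.

```latex
\begin{align*}
T_A &= 6 + 7 + 3 + 8 = 24, \quad T_B = 5 + 5 + 3 + 7 = 20, \quad T_C = 5 + 4 + 3 + 4 = 16 \\
T &= 24 + 20 + 16 = 60, \quad n = 12, \quad \frac{T^2}{n} = \frac{3600}{12} = 300 \\
SS_{between} &= \frac{24^2}{4} + \frac{20^2}{4} + \frac{16^2}{4} - 300 = 308 - 300 = 8 \\
SS_{total} &= \sum X_{ij}^2 - \frac{T^2}{n} = 332 - 300 = 32 \\
SS_{within} &= SS_{total} - SS_{between} = 32 - 8 = 24
\end{align*}
```

These values agree with what the deviation-based formulas give for the same data.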

Exercise: Set up an ANOVA table for the following per acre production for three varieties of wheat, each grown on four plots, and state if the variety difference is significant:

                   Variety of wheat
Plot of land    A    B    C
1               6    5    5
2               7    5    4
3               3    3    3
4               8    7    4

Solution:

                   Variety of wheat
Plot of land    A     B     C
1               6     5     5
2               7     5     4
3               3     3     3
4               8     7     4
Total           24    20    16

$$n_1 = n_2 = n_3 = 4, \qquad n = 12$$

$$\bar{X}_1 = \frac{24}{4} = 6; \quad \bar{X}_2 = \frac{20}{4} = 5; \quad \bar{X}_3 = \frac{16}{4} = 4$$

$$\bar{\bar{X}} = \frac{6 + 5 + 4}{3} = 5$$

$$SS_{between} = n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + n_3(\bar{X}_3 - \bar{\bar{X}})^2 = 4(6 - 5)^2 + 4(5 - 5)^2 + 4(4 - 5)^2 = 8$$

$$SS_{within} = \sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2 + \sum (X_{3i} - \bar{X}_3)^2 = 14 + 8 + 2 = 24$$

Source of variation    SS    Degrees of Freedom    Mean Square (MS)    F-Ratio              5% F-limit
Between the samples    8     3 - 1 = 2             8 / 2 = 4.00        4.00 / 2.67 = 1.5    F(2, 9) = 4.26
Within samples         24    12 - 3 = 9            24 / 9 = 2.67
Total                  32    12 - 1 = 11

Since the calculated value is less than the table value, we accept the null hypothesis that the difference between outputs due to variety of seed is not significant.
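The one-way ANOVA steps can be sketched in Python using only the standard library. This is a minimal illustration of the between/within decomposition for the wheat data; the dictionary layout and variable names are choices made here, and the F-limit 4.26 is the table value quoted in the text.

```python
samples = {
    "A": [6, 7, 3, 8],
    "B": [5, 5, 3, 7],
    "C": [5, 4, 3, 4],
}

k = len(samples)                                # number of samples
n = sum(len(v) for v in samples.values())       # total number of observations
grand_mean = sum(sum(v) for v in samples.values()) / n

def mean(values):
    return sum(values) / len(values)

# Variation of sample means around the grand mean
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in samples.values())
# Variation of observations around their own sample mean
ss_within = sum((x - mean(v)) ** 2 for v in samples.values() for x in v)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_ratio = ms_between / ms_within

F_LIMIT = 4.26  # table value F(2, 9) at the 5% level
print(f"SS between = {ss_between}, SS within = {ss_within}")
print(f"F = {f_ratio:.2f}, significant: {f_ratio > F_LIMIT}")
```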

Exercise: A manager of a firm wishes to test whether the three salesmen of his firm (A, B and C) tend to make sales of the same size. During a week there have been 14 sales calls: A made 5, B made 4 and C made 5 calls. Following are the sales data of the 3 salesmen for the week.

A: 500 400 700 800 600

B: 300 700 400 600

C: 500 300 500 400 300

Perform ANOVA and draw your conclusion at 5% level of significance (F(2, 11)=3.98)

TWO-WAY ANOVA

Two-way ANOVA technique is used when the data are classified on the basis of two factors.

For example:

(i) The agricultural output may be classified on the basis of different varieties of

seeds and also on the basis of different fertilizers used.

(ii) A business firm may have its sales data classified on the basis of different

salesmen and also on the basis of sales in different regions.

(iii) In a factory, the various units of a product produced during a certain period may

be classified on the basis of different varieties of machines and also on the basis of

different grades of labor.

Two-way designs may or may not have repeated measurements of each factor. The ANOVA technique is a little different in the case of repeated measurements, where we also compute the interaction variation.

Source of variation         Sum of Squares (SS)                                  Degrees of Freedom    Mean Square (MS)                         F-Ratio
Between column treatment    $\sum \dfrac{(T_j)^2}{n_j} - \dfrac{(T)^2}{n}$       c - 1                 SS between columns / (c - 1)             MS between columns / MS residual
Between row treatment       $\sum \dfrac{(T_i)^2}{n_i} - \dfrac{(T)^2}{n}$       r - 1                 SS between rows / (r - 1)                MS between rows / MS residual
Residual error              Total SS - (SS between columns + SS between rows)    (c - 1)(r - 1)        SS residual / ((c - 1)(r - 1))
Total                       $\sum X_{ij}^2 - \dfrac{(T)^2}{n}$                   c.r - 1

Exercise: Set up an ANOVA table for the following two-way design results:

Per acre production of wheat

                            Variety of wheat
Variety of fertilizer    A     B     C
1                        6     5     5
2                        7     5     4
3                        3     3     3
4                        8     7     4
Total                    24    20    16

Also state whether the variety differences are significant at the 5% level of significance.

Solution:

                            Variety of wheat
Variety of fertilizer    A     B     C     Row total
1                        6     5     5     16
2                        7     5     4     16
3                        3     3     3     9
4                        8     7     4     19
Column total             24    20    16    60

$$n = 12; \quad T = 60$$

$$\frac{T^2}{n} = \frac{60^2}{12} = 300$$

$$SS_{total} = \sum X_{ij}^2 - \frac{(T)^2}{n} = (6^2 + 5^2 + 5^2 + 7^2 + 5^2 + 4^2 + 3^2 + 3^2 + 3^2 + 8^2 + 7^2 + 4^2) - 300 = 32$$

$$SS_{between\ column\ treatment} = \sum \frac{(T_j)^2}{n_j} - \frac{(T)^2}{n} = \frac{24^2}{4} + \frac{20^2}{4} + \frac{16^2}{4} - 300 = 8$$

$$SS_{between\ row\ treatment} = \sum \frac{(T_i)^2}{n_i} - \frac{(T)^2}{n} = \frac{16^2}{3} + \frac{16^2}{3} + \frac{9^2}{3} + \frac{19^2}{3} - 300 = 18$$

SS Residual = Total SS - (SS between columns + SS between rows) = 32 - (8 + 18) = 6

ANOVA Table:

Source of variation         Sum of Squares (SS)    Degrees of Freedom    Mean Square (MS)    F-Ratio    5% level F-limit
Between column treatment    8                      2                     4                   4          F(2, 6) = 5.14
Between row treatment       18                     3                     6                   6          F(3, 6) = 4.76
Residual error              6                      6                     1
Total                       32                     11

The F-ratio due to column treatment (4) is less than the table value (5.14) at the 5% level of significance. Therefore the difference among mean yields due to column treatment (varieties of seeds) is not significant.

But the F-ratio for row treatment (6) is larger than the table value (4.76) at the 5% level of significance. Therefore the difference among mean yields due to row treatment (varieties of fertilizer) is significant.
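The two-way calculation (one observation per cell, no repeated measurements) can be sketched in Python with the total-based formulas from the table above. The function name `two_way_anova` and the returned dictionary keys are illustrative choices, not part of the original text.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication; rows and columns are the two factors."""
    r, c = len(table), len(table[0])
    n = r * c
    total = sum(sum(row) for row in table)
    correction = total ** 2 / n                       # T^2 / n

    ss_total = sum(x * x for row in table for x in row) - correction
    ss_cols = sum(sum(col) ** 2 / r for col in zip(*table)) - correction
    ss_rows = sum(sum(row) ** 2 / c for row in table) - correction
    ss_resid = ss_total - (ss_cols + ss_rows)

    ms_resid = ss_resid / ((c - 1) * (r - 1))
    return {
        "ss_total": ss_total, "ss_cols": ss_cols,
        "ss_rows": ss_rows, "ss_resid": ss_resid,
        "f_cols": (ss_cols / (c - 1)) / ms_resid,
        "f_rows": (ss_rows / (r - 1)) / ms_resid,
    }

# Rows: fertilizers 1 to 4; columns: wheat varieties A, B, C
wheat = [[6, 5, 5], [7, 5, 4], [3, 3, 3], [8, 7, 4]]
result = two_way_anova(wheat)
print(result)
print("columns significant:", result["f_cols"] > 5.14)  # F(2, 6) at 5%
print("rows significant:", result["f_rows"] > 4.76)     # F(3, 6) at 5%
```

The same function can be applied to any rectangular table with one observation per cell, taking the matching F-limits from tables.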

Exercise: The following data give the number of units produced per day by 5 workers using 4 different machines.

      M1    M2    M3    M4
A:    45    42    48    38
B:    40    32    50    34
C:    43    36    44    40
D:    36    38    46    36
E:    41    37    47    37

Test whether production is equal with respect to machines and with respect to workers.

Answer:

Source of variation         Sum of Squares (SS)    Degrees of Freedom    Mean Square (MS)    F-Ratio    5% level F-limit
Between column treatment    335                    3                     111.67              14.97      F(3, 12) = 3.49
Between row treatment       48.5                   4                     12.125              1.626      F(4, 12) = 3.26
Residual error              89.5                   12                    7.458
Total                       473                    19

The following limitations of tests of hypotheses should be kept in mind:

(i) The tests should not be used in a mechanical fashion. It should be kept in view that testing is not decision-making itself; the tests are only useful aids for decision-making. Hence proper interpretation of statistical evidence is important to intelligent decisions.

(ii) Tests do not explain why a difference exists. They simply indicate whether the difference is due to fluctuations of sampling or to other reasons, but they do not tell us which other reason(s) cause the difference.

(iii) Results of significance tests are based on probabilities and as such cannot be expressed with full certainty. When a test shows that a difference is statistically significant, it simply suggests that the difference is probably not due to chance.

(iv) Statistical inferences based on significance tests cannot be said to be entirely correct evidence concerning the truth of a hypothesis. This is especially so in the case of small samples, where the probability of drawing erroneous inferences is generally higher. For greater reliability, the size of the samples should be sufficiently enlarged.