
INTERVAL ESTIMATION AND TESTS OF SIGNIFICANCE

A brief note on chi-square, t and F distributions

I The chi-square distribution (also written as 𝝌𝟐 distribution)


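The chi-square distribution is a special form of the Gamma distribution with $\alpha = \nu/2$ and $\lambda = 1/2$.
The probability density of the Gamma distribution with parameters $\alpha$ and $\lambda$ is
$$f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}, \qquad x > 0$$
The mean, the variance and the moment generating function of the Gamma distribution are $\dfrac{\alpha}{\lambda}$, $\dfrac{\alpha}{\lambda^{2}}$ and $\left(1 - \dfrac{t}{\lambda}\right)^{-\alpha}$.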

The probability density of the chi-square distribution with parameter $\nu$ is
$$f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2-1} e^{-x/2}, \qquad x > 0$$

The parameter ν is called ‘the number of degrees of freedom’.


Mean $= \nu$, Variance $= 2\nu$, $M_X(t) = (1 - 2t)^{-\nu/2}$

The chi-square distribution is positively skewed. The symbol $\chi^2_{\alpha,\nu}$ denotes the point to the right of which the area under the $\chi^2$ curve with $\nu$ degrees of freedom is $\alpha$:
$$P[X > \chi^2_{\alpha,\nu}] = \alpha$$

II The ‘t’ distribution (also called ‘Student’s ‘t’ distribution’)


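If Y and Z are independent random variables, where Y has a $\chi^2$ distribution with $\nu$ degrees of freedom and Z has the standard normal distribution, then the distribution of
$$T = \frac{Z}{\sqrt{Y/\nu}}$$
is given by
$$f(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\;\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad -\infty < t < \infty$$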

This is called the ‘t’ distribution with ν degrees of freedom.


The ‘t’ distribution is symmetric about 𝑡 = 0. The mean, the median and the mode are at 𝑡 = 0.
The symbol 𝑡𝛼,𝜈 is used to denote the point to the right of which the area under the t curve with ν
degrees of freedom is 𝛼.
𝑃[𝑇 > 𝑡𝛼,𝜈 ] = 𝛼
III The F distribution
If U and V are independent random variables having $\chi^2$ distributions with $\nu_1$ and $\nu_2$ degrees of freedom respectively, then
$$F = \frac{U/\nu_1}{V/\nu_2}$$
is a random variable having an F distribution with $\nu_1$ and $\nu_2$ degrees of freedom.


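The probability density of the F distribution is
$$g(f) = \frac{\Gamma\!\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\!\left(\frac{\nu_1}{2}\right)\Gamma\!\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} f^{\frac{\nu_1}{2}-1} \left(1 + \frac{\nu_1}{\nu_2}\, f\right)^{-\frac{\nu_1+\nu_2}{2}}, \qquad f > 0$$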

The F distribution is positively skewed. The symbol $f_{\alpha,\nu_1,\nu_2}$ denotes the point to the right of which the area under the F curve with $\nu_1$ and $\nu_2$ degrees of freedom is $\alpha$:
$$P[F > f_{\alpha,\nu_1,\nu_2}] = \alpha$$

---------------

Tests of significance concerning the mean, standard deviation and proportion are dealt with
in the study below, together with interval estimates of the population mean, standard
deviation and proportion. The following terms are used in interval estimation and in
tests of significance.
Sampling Distribution and Standard Error
The statistical measures of the sample are termed as ‘statistics’; the statistical measures of
the population are called ‘parameters’.
The term ‘Sampling Distribution’ is used to refer to the distribution of a sample statistic.
For example, ‘the sampling distribution of mean’ is the distribution of the mean of independent
random samples of size ‘n’ drawn from a given population.
‘Standard Error’ is the standard deviation of the sampling distribution. The standard
error, denoted S.E. of a sample statistic is used in tests of significance of difference and while
giving an interval estimate for the population parameter.

INTERVAL ESTIMATION

Point estimators give a single value as an estimate of the parameter, with no indication of how large the error in that estimate may be. Interval estimation is based on a stipulated ‘confidence level’ and therefore indicates the size of the error in the estimate. The confidence level, denoted 1 – α, is usually either 95% or 99%. If $(a, b)$ is the 100(1 – α)% confidence interval for the parameter θ, it means that

$$P[a < \theta < b] = 1 - \alpha$$

TESTS OF SIGNIFICANCE
A test of significance examines whether the observed difference between the sample statistic and the population
parameter is significant, using a test statistic at a stipulated level of significance, denoted α.
Suppose we want to test the null hypothesis

𝐻0 ∶ 𝜃 = 𝜃0

The alternative can be any one of the following and consequently the nature of the test will be as
below:

𝐻1 ∶ 𝜃 > 𝜃0 leads to right-tailed test

𝐻1 ∶ 𝜃 < 𝜃0 leads to left-tailed test

𝐻1 ∶ 𝜃 ≠ 𝜃0 leads to two-tailed test

The following is the general procedure for testing the significance of an observed difference:
(i) Set up null hypothesis 𝐻0 of no difference and a suitable alternative hypothesis 𝐻1
(ii) Decide whether one-tailed test or two-tailed test is to be used
(iii) Decide on the level of significance α
(iv) Select the appropriate test statistic (this involves the S.E. of the sample statistic under question)
(v) Demarcate critical region in the distribution of the test statistic
(vi) Check if the calculated value of the test statistic falls within or outside the critical region
(vii) Accordingly reject / accept 𝐻0 at the level of significance α

Instead of demarcating the critical region and checking whether the calculated value of the
test statistic falls within or outside it, we may also use the P-value. The P-value is
the lowest level of significance at which the null hypothesis could have been rejected. More simply,
the P-value is the tail area beyond the calculated value of the test statistic (the area in both tails
for a two-tailed test). If the P-value is less than the stipulated level of significance α, the null
hypothesis is rejected at the level of significance α.
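The P-value rule for a z statistic can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function name `z_p_value` and the value z = −2 are assumptions chosen for the example, not part of the original notes.

```python
from statistics import NormalDist


def z_p_value(z, tail):
    """Tail area beyond the calculated z (hypothetical helper for illustration)."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1.0 - std_normal.cdf(z)
    if tail == "two":
        # both tails: twice the area beyond |z|
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")


alpha = 0.05
p = z_p_value(-2.0, "left")
print(round(p, 4))   # 0.0228
print(p < alpha)     # True, so H0 would be rejected at the 0.05 level
```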
INTERVAL ESTIMATION AND TESTS CONCERNING MEAN
I Mean of a single sample
Let a random sample of size n be taken from an infinite population with mean $\mu$ and variance $\sigma^2$, and let $\bar{X}$ be the mean of the sample. The distribution of the sample mean $\bar{X}$, which is called the ‘sampling distribution of the sample mean’, has mean $\mu$ and variance $\dfrac{\sigma^2}{n}$.

The null hypothesis is


𝐻0 : μ = μ0
against the alternative which could be any one of
μ ≠ μ0 μ > μ0 μ < μ0.
Case 1: Population known to be normal with known variance $\sigma^2$. No condition on n.

$$S.E._{\bar{X}} = \frac{\sigma}{\sqrt{n}} \qquad \text{Test statistic: } z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \quad [\text{under } H_0]$$

The $100(1-\alpha)\%$ confidence interval for $\mu$ is $\bar{X} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$.

The term $z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ is the maximum value of the difference between the sample mean and the
population mean and is called the maximum error under this case.
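As a rough sketch of how the maximum error and the confidence interval of Case 1 could be computed, the following Python uses only the standard library. The function name and the numbers (sample mean 50, σ = 4, n = 16) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval sample_mean ± z_{alpha/2} * sigma / sqrt(n) for known sigma."""
    alpha = 1.0 - confidence
    # point with area alpha/2 to its right under the standard normal curve
    z_half_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    max_error = z_half_alpha * sigma / sqrt(n)
    return sample_mean - max_error, sample_mean + max_error


# hypothetical data: sample mean 50, known sigma 4, sample size 16
low, high = z_confidence_interval(sample_mean=50.0, sigma=4.0, n=16)
print(round(low, 2), round(high, 2))
```

Here the standard error is 4/√16 = 1, so the maximum error is simply $z_{0.025} \approx 1.96$.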

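Case 2: Population not necessarily normal. n ≥ 30.

If $\sigma$ is known:
$$S.E._{\bar{X}} = \frac{\sigma}{\sqrt{n}} \qquad \text{Test statistic: } z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \quad [\text{under } H_0]$$
If $\sigma$ is not known:
$$S.E._{\bar{X}} = \frac{s}{\sqrt{n}} \qquad \text{Test statistic: } z = \frac{\bar{X}-\mu_0}{s/\sqrt{n}} \sim N(0,1) \quad [\text{under } H_0]$$
$s^2$ is the sample variance given by $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n-1}$.

The $100(1-\alpha)\%$ confidence interval for $\mu$ is $\bar{X} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ or $\bar{X} \pm z_{\alpha/2}\dfrac{s}{\sqrt{n}}$.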

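Case 3: Population normal. $\sigma$ is unknown. n < 30

$$S.E._{\bar{X}} = \frac{s}{\sqrt{n}} \qquad \text{Test statistic: } t = \frac{\bar{X}-\mu_0}{s/\sqrt{n}} \sim t \text{ distribution with } n-1 \text{ degrees of freedom} \quad [\text{under } H_0]$$

The $100(1-\alpha)\%$ confidence interval for $\mu$ is $\bar{X} \pm t_{\alpha/2,\,n-1}\left(\dfrac{s}{\sqrt{n}}\right)$.

The term $t_{\alpha/2,\,n-1}\left(\dfrac{s}{\sqrt{n}}\right)$ is called the maximum error under this case.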

II Means of two samples


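Two independent random samples of sizes $n_1$ and $n_2$ are taken from two populations with
means denoted by $\mu_1$, $\mu_2$ and variances denoted by $\sigma_1^2$, $\sigma_2^2$. The sample means are $\bar{X}_1$ and $\bar{X}_2$.

$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2 \qquad \operatorname{Variance}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$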

The null hypothesis 𝐻0 : μ1 − μ2 = δ against the alternative which could be any one of

μ1 − μ2 ≠ δ μ1 − μ2 > δ μ1 − μ2 < δ.

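Case 1: Populations known to be normal with known variances $\sigma_1^2$, $\sigma_2^2$. No condition on $n_1$ and $n_2$.

$$S.E._{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} \qquad \text{Test statistic: } z = \frac{\bar{X}_1-\bar{X}_2-\delta}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \sim N(0,1) \quad [\text{under } H_0]$$

The $100(1-\alpha)\%$ confidence interval for $\mu_1-\mu_2$ is $\bar{X}_1-\bar{X}_2 \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}$.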

Case 2: Population distributions not known. Both $n_1$ and $n_2$ > 30

If $\sigma_1^2$, $\sigma_2^2$ are known, use the same procedure as Case 1. If $\sigma_1^2$, $\sigma_2^2$ are not known, substitute the
sample variances $s_1^2$ and $s_2^2$ in place of $\sigma_1^2$, $\sigma_2^2$ and proceed as in Case 1.

Case 3: Populations normal. $\sigma_1^2$, $\sigma_2^2$ are not known. Either or both of $n_1$ and $n_2$ < 30
Let the samples be $\{X_{1i} : i = 1, 2, \ldots, n_1\}$ and $\{X_{2j} : j = 1, 2, \ldots, n_2\}$.
Assume $\sigma_1 = \sigma_2 = \sigma$; $\sigma$ is estimated by pooling the squared deviations from the means of
the two samples. The pooled estimate of $\sigma$ is denoted as $S_P$.
$$S_P^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{\sum (x_{1i}-\bar{x}_1)^2 + \sum (x_{2j}-\bar{x}_2)^2}{n_1+n_2-2}$$
$$S.E._{\bar{X}_1-\bar{X}_2} = S_P\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$
$$\text{Test statistic: } t = \frac{(\bar{X}_1-\bar{X}_2)-\delta}{S_P\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \sim t \text{ distribution with } n_1+n_2-2 \text{ degrees of freedom} \quad [\text{under } H_0]$$

The $100(1-\alpha)\%$ confidence interval for $\mu_1-\mu_2$ is $(\bar{X}_1-\bar{X}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\left[S_P\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}\right]$.

TABLE SHOWING CRITICAL VALUES OF z

Level of significance α    Two-tailed test       Right-tailed test    Left-tailed test
                           z_{α/2}               z_α                  −z_α
5%                         1.96                  1.645                −1.645
2%                         2.327                 2.055                −2.055
1%                         2.575                 2.327                −2.327
Criterion for              |z| > z_{α/2}         z > z_α              z < −z_α
rejecting H0               or |t| > t_{α/2,ν}    or t > t_{α,ν}       or t < −t_{α,ν}
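The critical values of z in the table can be reproduced approximately from the inverse of the standard normal distribution function. The sketch below uses Python's standard library; the helper name `z_critical` is an assumption. The computed values agree with the tabulated ones to within about 0.001, the small differences being due to rounding in printed tables.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed=False):
    """Point with area alpha (or alpha/2 if two-tailed) to its right under N(0, 1)."""
    tail_area = alpha / 2.0 if two_tailed else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)


for alpha in (0.05, 0.02, 0.01):
    two_sided = z_critical(alpha, two_tailed=True)
    one_sided = z_critical(alpha)
    # the left-tailed critical value is the negative of the right-tailed one
    print(f"alpha={alpha}: z_alpha/2={two_sided:.3f}  z_alpha={one_sided:.3f}  -z_alpha={-one_sided:.3f}")
```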

1) Test at the 0.05 level of significance whether the mean of a random sample of size n = 16 is
significantly less than 10, if the distribution from which the sample was taken is normal; 𝑥̅ = 8.4, σ =
3.2.
Solution: 𝐻0 : 𝜇 = 10 𝐻1 : 𝜇 < 10 (left-tailed) 𝛼 = 0.05
Population is normal; $\sigma$ is known; the standard normal variable z is used as the test statistic.
$$z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} = \frac{8.4-10}{3.2/\sqrt{16}} = \frac{(-1.6)\times 4}{3.2} = -2$$
$-z_{0.05} = -1.645$, so $z < -z_{0.05}$.


Inference: There is sufficient reason to reject the null hypothesis at the 0.05 level of
significance. The sample mean is significantly less than 10.
If P-value is used to draw the inference, the steps are as follows:
The P-value corresponding to the value −2 of z is read from the normal tables as 0.0228.
0.0228 < 0.05
Hence, reject the null hypothesis at a significance level of 0.05.
2) A medical research worker finds that the mean of a random sample of size 120 of readings of
blood pressure of women in their fifties is 141.8 mm of Hg. Based on experience, he knows that σ =
10.5 mm of Hg. Do the sample data support the claim that the mean blood pressure of women in
their fifties is 140 mm of Hg at a level of significance of 5%? Also find the (a) 95% (b) 98%
confidence limits for μ, the mean blood pressure of women in their fifties.
Solution: 𝐻0 : 𝜇 = 140 𝐻1 : 𝜇 ≠ 140 (two-tailed) 𝛼 = 0.05
Size of the sample is more than 30; the standard normal variable z is used as the test statistic.
$$|z| = \frac{|\bar{X}-\mu_0|}{\sigma/\sqrt{n}} = \frac{141.8-140}{10.5/\sqrt{120}} = \frac{(1.8)\times\sqrt{120}}{10.5} = 1.878$$
$z_{\alpha/2} = z_{0.025} = 1.96$, so $|z| < z_{\alpha/2}$.
Inference: The null hypothesis cannot be rejected at 5% level of significance. The claim that
the mean blood pressure of women in their fifties is 140 mm of Hg can be accepted.
If P-value is used to draw the inference, the steps are as follows:
The P-value corresponding to the value 1.878 (≈ 1.88) of z is read from the normal tables as
0.0602.
0.0602 > 0.05
Hence, accept the null hypothesis.
The 95% confidence limits for μ are
$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 141.8 \pm (1.96)\left(\frac{10.5}{\sqrt{120}}\right) = (139.9,\ 143.7) \text{ mm of Hg}$$
Note: The claimed value of 140 mm of Hg according to $H_0$ is within the 95% confidence limits for μ,
which agrees with the decision that the claim can be accepted.
The 98% confidence limits for μ are
$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 141.8 \pm (2.327)\left(\frac{10.5}{\sqrt{120}}\right) = 141.8 \pm 2.23 = (139.6,\ 144.0) \text{ mm of Hg}$$

3) The security department of a factory wants to know whether the true average time required by the
night guard to walk his round is 30 minutes. If in a random sample of 32 rounds, the night guard
averaged 30.8 minutes with a standard deviation of 2.1 minutes, determine whether this is sufficient
evidence to reject the null hypothesis μ = 30 minutes in favour of the alternative hypothesis μ ≠ 30
minutes at (a) 0.01 (b) 0.05 levels of significance.
Solution: 𝐻0 : 𝜇 = 30 𝐻1 : 𝜇 ≠ 30 (two-tailed) 𝛼 = 0.01, 0.05
Size of the sample is more than 30; the standard normal variable z is used as the test statistic. σ is not
known; use the sample standard deviation s in place of σ.
$$|z| = \frac{|\bar{X}-\mu_0|}{s/\sqrt{n}} = \frac{30.8-30}{2.1/\sqrt{32}} = \frac{(0.8)\times\sqrt{32}}{2.1} = 2.155$$
(a) $z_{\alpha/2} = z_{0.005} = 2.575$, so $|z| < z_{\alpha/2}$

(b) $z_{\alpha/2} = z_{0.025} = 1.96$, so $|z| > z_{\alpha/2}$


Inference: The null hypothesis cannot be rejected at 1% level of significance; but it can be
rejected at 5% level of significance.
The P-value corresponding to the value 2.155 (≈ 2.16) of z is read from the normal tables as
0.0308.
0.0308 > 0.01 0.0308 < 0.05
𝐻0 is accepted at 1% level of significance; but it is rejected at 5% level of significance.

4) Five measurements of the tar content of a certain kind of cigarette yielded 14.5, 14.2, 14.4, 14.3,
14.6 mg/cigarette. Assume that the data are a random sample from a normal population.
(a) Find if there is reason enough to reject the null hypothesis μ = 14.0 in favour of the alternative μ
≠ 14.0.
(b) What would be the inference if the null hypothesis is μ = 14.3 and the alternative is μ ≠14.3?
(c) What would be the inference if the null hypothesis is μ = 14.2 and the alternative is μ > 14.2?
Use 0.05 level of significance.
Solution: Population normal; σ unknown; n < 30
Sample mean $= \bar{x} = 14.4$
$$\sum (x_i-\bar{x})^2 = 0.10 \qquad s^2 = \frac{\sum (x_i-\bar{x})^2}{n-1} = \frac{0.10}{4} = 0.025$$

(a) $H_0: \mu = 14.0$   $H_1: \mu \neq 14.0$ (two-tailed)   $\alpha = 0.05$

$$\text{Test statistic: } |t| = \frac{|\bar{X}-\mu_0|}{s/\sqrt{n}} = \frac{0.4}{\sqrt{0.025}/\sqrt{5}} = 5.657$$
$t_{\alpha/2,\nu} = t_{0.025,4} = 2.776$, so $|t| > t_{\alpha/2,\nu}$.


Inference: There is reason enough to reject the null hypothesis at 5% level of significance.
The mean tar content of cigarettes in the population is not 14.0 mg / cigarette.

(b) $H_0: \mu = 14.3$   $H_1: \mu \neq 14.3$ (two-tailed)   $\alpha = 0.05$

$$\text{Test statistic: } |t| = \frac{|\bar{X}-\mu_0|}{s/\sqrt{n}} = \frac{0.1}{\sqrt{0.025}/\sqrt{5}} = 1.414$$
$t_{\alpha/2,\nu} = t_{0.025,4} = 2.776$, so $|t| < t_{\alpha/2,\nu}$.

Inference: There is not enough reason to reject the null hypothesis at 5% level of significance.
The mean tar content of cigarettes in the population could be taken as 14.3 mg / cigarette.

(c) $H_0: \mu = 14.2$   $H_1: \mu > 14.2$ (right-tailed)   $\alpha = 0.05$

$$\text{Test statistic: } t = \frac{\bar{X}-\mu_0}{s/\sqrt{n}} = \frac{0.2}{\sqrt{0.025}/\sqrt{5}} = 2.828$$
$t_{\alpha,\nu} = t_{0.05,4} = 2.132$, so $t > t_{\alpha,\nu}$.


Inference: The null hypothesis is rejected in favour of the alternative hypothesis at 5% level
of significance. The mean tar content of cigarettes in the population could be taken as more than
14.2 mg / cigarette.
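The three t statistics of this problem can be checked with a short Python sketch. It uses only the standard library, so the critical values 2.776 and 2.132 are copied from the solution above rather than computed; the function name `one_sample_t` is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), with s the sample S.D. (divisor n - 1)."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


tar = [14.5, 14.2, 14.4, 14.3, 14.6]   # mg/cigarette, data of Problem 4
T_0025_4 = 2.776   # t_{0.025, 4}, taken from the t table
T_005_4 = 2.132    # t_{0.05, 4}, taken from the t table

t_a = one_sample_t(tar, 14.0)
t_b = one_sample_t(tar, 14.3)
t_c = one_sample_t(tar, 14.2)

print(round(t_a, 3), abs(t_a) > T_0025_4)   # (a) two-tailed: reject H0 if True
print(round(t_b, 3), abs(t_b) > T_0025_4)   # (b) two-tailed: reject H0 if True
print(round(t_c, 3), t_c > T_005_4)         # (c) right-tailed: reject H0 if True
```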

5) The length of the skulls of 10 fossil skeletons of an extinct species of bird has a mean of 5.68 cm
and a standard deviation of 0.29 cm. Assume that such measurements are normally distributed and
find a 95% confidence interval for the mean length of the skulls of this species of bird.
Solution: Population normal; σ unknown; n < 30
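Sample mean $= \bar{x} = 5.68$ cm   Sample S.D. $= s = 0.29$ cm
95% confidence limits for μ, the mean length of the skulls:
$$\bar{x} \pm t_{\alpha/2,\,n-1}\left(\frac{s}{\sqrt{n}}\right) = \bar{x} \pm t_{0.025,9}\left(\frac{s}{\sqrt{n}}\right) = 5.68 \pm (2.262)\,\frac{0.29}{\sqrt{10}} = 5.68 \pm 0.21 = (5.47 \text{ cm},\ 5.89 \text{ cm})$$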

6) In a study of television viewing habits, it is desired to estimate the average number of hours that
teenagers spend watching per week. If it is reasonable to assume that σ = 3.2 hours, how large a
sample is needed so that it will be possible to assert with 95% confidence that the sample mean is off
the true mean by less than 20 minutes?
Solution: σ = 3.2 hours α = 0.05
It is required to find the least value of the sample size n.
The difference between the sample mean $\bar{x}$ and the true mean μ should be less than 20 minutes:
$$|\bar{x}-\mu| < \tfrac{1}{3} \text{ hour}$$
Let us work with the assumption that n will turn out to be ≥ 30.
The maximum error (maximum difference between $\bar{x}$ and μ) is
$$z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = z_{0.025}\left(\frac{3.2}{\sqrt{n}}\right) = \frac{(1.96)(3.2)}{\sqrt{n}}$$
This term should be less than $\tfrac{1}{3}$ hour:
$$\frac{(1.96)(3.2)}{\sqrt{n}} < \frac{1}{3} \;\Rightarrow\; \sqrt{n} > 3(1.96)(3.2) = 18.816 \;\Rightarrow\; n > 354.04 \;\Rightarrow\; n \ge 355$$

This answer tallies with the assumption that n ≥ 30.
The conclusion is that the size of the sample should be at least 355 in order that the sample mean is off
the true mean by less than 20 minutes.
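The same sample-size calculation can be written as a small Python function. This is only a sketch: the function name is hypothetical, and the value 1.96 for $z_{0.025}$ is taken from the table of critical values.

```python
from math import ceil


def sample_size_for_mean(sigma, max_error, z_half_alpha):
    """Smallest integer n with z_{alpha/2} * sigma / sqrt(n) <= max_error.

    Rounding up with ceil also satisfies the strict inequality whenever
    (z * sigma / max_error)**2 is not a whole number, as is the case here.
    """
    return ceil((z_half_alpha * sigma / max_error) ** 2)


# sigma = 3.2 hours, maximum error 20 minutes = 1/3 hour, 95% confidence
n = sample_size_for_mean(sigma=3.2, max_error=1 / 3, z_half_alpha=1.96)
print(n)  # 355
```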

7) An experiment is performed to determine whether the average nicotine content of one kind of
cigarette exceeds that of another kind by 0.20 mg. A random sample of size 50 cigarettes of the first
kind had an average nicotine content of 2.61 mg with a standard deviation of 0.12 mg. Another
random sample of size 40 cigarettes of the second kind had an average nicotine content of 2.38 mg
with a standard deviation of 0.14 mg. Test the null hypothesis μ1 – μ2 = 0.20 against the alternative
hypothesis μ1 – μ2 ≠ 0.20 at the 0.05 level of significance. Also check the decision on the P-value
corresponding to the value of the appropriate test statistic.
Solution:
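Population distributions not known; $\sigma_1^2$, $\sigma_2^2$ unknown; $n_1$, $n_2$ both > 30
$H_0$: μ1 – μ2 = 0.20   $H_1$: μ1 – μ2 ≠ 0.20 (two-tailed)   α = 0.05
$$\text{Test statistic: } z = \frac{\bar{x}_1-\bar{x}_2-\delta}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} \sim N(0,1)$$
$$|z| = \frac{2.61-2.38-0.20}{\sqrt{\frac{(0.12)^2}{50}+\frac{(0.14)^2}{40}}} = 1.08$$
Inference: $z_{\alpha/2} = z_{0.025} = 1.96$, so $|z| < z_{\alpha/2}$.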


The null hypothesis cannot be rejected at 5% level of significance.
The P-value corresponding to z = 1.08 is 0.2802. This is more than 0.05. So, the null
hypothesis cannot be rejected at 5% level of significance.
8) With reference to Question 7, for what values of 𝑥̅1 − 𝑥̅2 would the null hypothesis have been
rejected? Also find the probability of type II error with the given criterion if
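(a) μ1 – μ2 = 0.12 (b) μ1 – μ2 = 0.16
Solution: If the null hypothesis is to be rejected,
$$\frac{|\bar{x}_1-\bar{x}_2-\delta|}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} \text{ must be} > 1.96.$$
$$\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} = \sqrt{\frac{(0.12)^2}{50}+\frac{(0.14)^2}{40}} = 0.02789$$
Hence, $|(\bar{x}_1-\bar{x}_2)-0.20| > 1.96 \times 0.02789$, that is, $|(\bar{x}_1-\bar{x}_2)-0.20| > 0.055$.

H0 is rejected if $(\bar{x}_1-\bar{x}_2)-0.20 > 0.055$ OR $(\bar{x}_1-\bar{x}_2)-0.20 < -0.055$
H0 is rejected if $(\bar{x}_1-\bar{x}_2) > 0.255$ OR $(\bar{x}_1-\bar{x}_2) < 0.145$
H0 is rejected if $(\bar{x}_1-\bar{x}_2)$ does not lie between 0.145 and 0.255

(a) It is required to find β if the true value of μ1 – μ2 is 0.12.

With δ now denoting the true value of μ1 – μ2,
$$\frac{\bar{x}_1-\bar{x}_2-\delta}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \sim N(0,1)$$
Hence, $\bar{x}_1-\bar{x}_2$ is normally distributed with mean 0.12 and standard deviation 0.02789 (the sample variances being used in place of $\sigma_1^2$, $\sigma_2^2$).

β = P [accepting H0 / H0 is false] $= P[0.145 < (\bar{x}_1-\bar{x}_2) < 0.255]$
$$= P\left[\frac{0.145-0.12}{0.02789} < \frac{(\bar{x}_1-\bar{x}_2)-0.12}{0.02789} < \frac{0.255-0.12}{0.02789}\right]$$
$$= P[0.90 < z < 4.84] = 0.1841$$

(b) It is required to find β if the true value of μ1 – μ2 is 0.16.

By the same reasoning, $\bar{x}_1-\bar{x}_2$ is normally distributed with mean 0.16 and standard deviation 0.02789.

β = P [accepting H0 / H0 is false] $= P[0.145 < (\bar{x}_1-\bar{x}_2) < 0.255]$
$$= P\left[\frac{0.145-0.16}{0.02789} < \frac{(\bar{x}_1-\bar{x}_2)-0.16}{0.02789} < \frac{0.255-0.16}{0.02789}\right]$$
$$= P[-0.54 < z < 3.41] = 0.7054$$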
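The probability of a type II error can also be evaluated numerically. The Python sketch below (standard library only; the function name `type_two_error` is an assumption) does not round the acceptance limits or the z values, so its results, roughly 0.18 and 0.70, differ slightly in the third decimal place from the table-based values above.

```python
from math import sqrt
from statistics import NormalDist


def type_two_error(true_diff, delta0, se, z_half_alpha=1.96):
    """P[accepting H0] when the true value of mu1 - mu2 is true_diff."""
    # H0 is accepted when x1_bar - x2_bar lies between these two limits
    lower = delta0 - z_half_alpha * se
    upper = delta0 + z_half_alpha * se
    # sampling distribution of x1_bar - x2_bar under the true difference
    sampling = NormalDist(mu=true_diff, sigma=se)
    return sampling.cdf(upper) - sampling.cdf(lower)


# standard error from the data of Problem 7 (sample S.D.s used for sigma)
se = sqrt(0.12**2 / 50 + 0.14**2 / 40)
beta_a = type_two_error(true_diff=0.12, delta0=0.20, se=se)
beta_b = type_two_error(true_diff=0.16, delta0=0.20, se=se)
print(round(se, 5), round(beta_a, 2), round(beta_b, 2))
```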


9) Sample surveys conducted in a large county in a certain year and again 20 years later showed that
originally the average height of 400 ten-year-old boys was 53.8 inches with a standard deviation of
2.4 inches, whereas 20 years later, the average height of 500 ten-year-old boys was 54.5 inches with
a standard deviation of 2.5 inches. At a level of significance of 0.05, test the null hypothesis μ1 – μ2
= − 0.5 against the alternative hypothesis μ1 – μ2 < − 0.5
Solution: Population distributions not known; $\sigma_1^2$, $\sigma_2^2$ unknown; samples independent; $n_1$, $n_2$ both > 30
$H_0$: μ1 – μ2 = − 0.5   $H_1$: μ1 – μ2 < − 0.5 (left-tailed)   α = 0.05

$$\text{Test statistic: } z = \frac{\bar{x}_1-\bar{x}_2-\delta}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} \sim N(0,1)$$
$$z = \frac{(53.8-54.5)-(-0.5)}{\sqrt{\frac{(2.4)^2}{400}+\frac{(2.5)^2}{500}}} = -1.22$$
Inference: $-z_{0.05} = -1.645$, so $z > -z_{0.05}$.

We cannot reject the null hypothesis at a level of significance of 0.05. The data do not provide sufficient
evidence that the average height of ten-year-old boys increased by more than 0.5 inches over the 20 years.

10) With reference to Question 9, base your decision on the P-value for the test statistic calculated.
Solution: z = −1.22; P-value corresponding to −1.22 is 0.1112.
Inference: P-value > 0.05. Accept the null hypothesis at a level of significance of 0.05.

11) To compare two kinds of bumper guards, six of each kind were mounted on a certain make of
compact car. Then each car was run into a concrete wall at 5 miles per hour, and the following are
the costs of the repairs (in dollars):
Bumper guard 1: 127 168 143 165 122 139
Bumper guard 2: 154 135 132 171 153 149
Test at a level of significance of 0.01 whether the difference between the means of these two
samples is significant.
Solution: Population distributions not known; $\sigma_1^2$, $\sigma_2^2$ unknown; samples independent; $n_1$, $n_2$ both < 30
Assume that the populations are normal and have equal variance $\sigma^2$. The pooled estimate of $\sigma^2$ is
$S_P^2$.
$H_0$: μ1 = μ2   $H_1$: μ1 ≠ μ2 (two-tailed)   α = 0.01
$$\text{Test statistic: } T = \frac{\bar{X}_1-\bar{X}_2}{S_P\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \sim t \text{ distribution with } n_1+n_2-2 \text{ degrees of freedom}$$
$$S_P^2 = \frac{\sum (x_{1i}-\bar{x}_1)^2 + \sum (x_{2j}-\bar{x}_2)^2}{n_1+n_2-2}$$

Calculations:
Sample 1: $\bar{x}_1 = 864/6 = 144$
x1i              127   168   143   165   122   139
x1i − x̄1         −17    24    −1    21   −22    −5
(x1i − x̄1)²      289   576     1   441   484    25      Σ(x1i − x̄1)² = 1816

Sample 2: $\bar{x}_2 = 894/6 = 149$
x2j              154   135   132   171   153   149
x2j − x̄2           5   −14   −17    22     4     0
(x2j − x̄2)²       25   196   289   484    16     0      Σ(x2j − x̄2)² = 1010

$$S_P^2 = \frac{1816+1010}{10} = 282.6$$
$$|t| = \frac{|\bar{x}_1-\bar{x}_2|}{S_P\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} = \frac{5}{\sqrt{282.6}\,\sqrt{\frac{1}{6}+\frac{1}{6}}} = 0.5152$$
Inference: $t_{\alpha/2,\nu} = t_{0.005,10} = 3.169$, so $|t| < t_{\alpha/2,\nu}$.


We cannot reject the null hypothesis at a level of significance of 0.01. The difference between the
means of these two samples is not significant.

12) The following are the average weekly losses of work-hours due to accidents in 10 industrial
plants before and after a certain safety program was put into operation:
45 and 36 73 and 60 46 and 44 124 and 119 33 and 35
57 and 51 83 and 77 34 and 29 26 and 24 17 and 11
Test whether the safety program is effective at a level of significance of 0.05.
Solution: Populations not known; σ12, σ22 unknown; Samples NOT independent; n1, n2 both < 30
Since the samples are not independent, the difference between the population means cannot
be tested using the usual procedure. The random variable under question is: difference between each
pair of observations.
X denotes the difference between each pair of observations.
X = (average weekly losses of work-hours due to accidents before the safety program)
– (average weekly losses of work-hours due to accidents after the safety program)
Let μ denote the mean of the population of X. The null hypothesis is that there is no
improvement after the safety program. The alternative hypothesis is that there is a decrease
in the average weekly losses of work-hours due to accidents after the safety program. It is now a test
of significance of the difference between the mean of a sample and the mean of the population.
𝐻0 : 𝜇 = 0 𝐻1 : 𝜇 > 0 (right-tailed) 𝛼 = 0.05
Population of X is assumed to be normal. Sample size n = 10 (< 30).
$$\text{Test statistic: } t = \frac{\bar{X}-\mu_0}{s/\sqrt{n}}$$

Calculations:
X      9    13    2    5   −2    6    6    5    2    6      Σx = 52
X²    81   169    4   25    4   36   36   25    4   36      Σx² = 420
$\bar{X} = 5.2$
$$s^2 = \frac{10}{9}\left[\frac{420}{10} - \left(\frac{52}{10}\right)^2\right] = 16.62$$
$$t = \frac{5.2-0}{\sqrt{16.62}/\sqrt{10}} = 4.033$$
Inference: $t_{\alpha,\nu} = t_{0.05,9} = 1.833$, so $t > t_{\alpha,\nu}$.


Reject 𝐻0 at 0.05 level. We have sufficient reason to believe that the safety program is effective at a
level of significance of 0.05.
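The paired calculation can be verified with a few lines of Python (standard library only). The variable names are illustrative, and the critical value 1.833 is copied from the solution above rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# weekly losses of work-hours before and after the program, plant by plant
before = [45, 73, 46, 124, 33, 57, 83, 34, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 24, 11]

# the paired samples are reduced to a single sample of differences
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

# one-sample t statistic for H0: mu = 0 against H1: mu > 0
t = (mean(diffs) - 0) / (stdev(diffs) / sqrt(n))

T_005_9 = 1.833  # t_{0.05, 9}, taken from the t table
print(diffs)
print(round(t, 3), t > T_005_9)  # reject H0 if True
```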

13) A study of two kinds of photocopying equipment shows that 61 failures of the first kind of
equipment took on the average 80.7 minutes to repair with a standard deviation of 19.4 minutes,
while 61 failures of the second kind of equipment took on the average 88.1 minutes to repair with a
standard deviation of 18.8 minutes. Find a 99% confidence interval for the difference between the
true average amounts of time it takes to repair failures of the two kinds of photocopying equipment.
Solution: n1 = 61 n2 = 61
𝑋̅1 = 80.7 𝑋̅2 = 88.1
𝑠1 = 19.4 𝑠2 = 18.8
Population distributions not known; σ1 and σ2 not known; n1 and n2 > 30
Use $s_1$ and $s_2$ in place of σ1 and σ2.
99% confidence interval for (μ1 – μ2):
$$(\bar{x}_1-\bar{x}_2) \pm z_{0.005}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} = (80.7-88.1) \pm (2.575)\sqrt{\frac{(19.4)^2}{61}+\frac{(18.8)^2}{61}}$$
$$= (-7.4) \pm (8.907) = (-16.307,\ 1.507)$$

14) Twelve randomly selected mature citrus trees of one variety have a mean height of 13.8 feet with
a standard deviation of 1.2 feet and fifteen randomly selected mature citrus trees of another variety
have a mean height of 12.9 feet with a standard deviation of 1.5 feet. Assuming that the random
samples were selected from normal populations with equal variances, construct a 95% confidence
interval for the difference between the true average heights of the two kinds of citrus trees.
Solution: n1 =12 n2 = 15
𝑋̅1 = 13.8 𝑋̅2 = 12.9
𝑠1 = 1.2 𝑠2 = 1.5
Populations normal; σ1 = σ2 (value not known); n1 and n2 < 30
95% confidence interval for (μ1 – μ2) $= (\bar{X}_1-\bar{X}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\left[S_P\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}\right]$
$$S_P^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2} = \frac{(11)(1.2)^2+(14)(1.5)^2}{12+15-2} = 1.8936 = (1.376)^2$$
95% confidence interval for (μ1 – μ2):
$$(13.8-12.9) \pm t_{0.025,25}\left[(1.376)\sqrt{\frac{1}{12}+\frac{1}{15}}\right] = (0.9) \pm (2.06)\left[(1.376)\sqrt{\frac{1}{12}+\frac{1}{15}}\right]$$
$$= (0.9) \pm (1.0978) = (-0.1978,\ 1.9978)$$
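A pooled-variance interval of this kind can be computed from the summary statistics alone. The following Python sketch uses only the standard library; the function name is hypothetical, and the critical value $t_{0.025,25} = 2.06$ is passed in from the t table because the standard library has no t distribution.

```python
from math import sqrt


def pooled_t_interval(n1, mean1, s1, n2, mean2, s2, t_crit):
    """Confidence interval for mu1 - mu2 assuming equal population variances."""
    # pooled estimate of the common variance
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    margin = t_crit * sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin


# data of Problem 14; t_crit = t_{0.025, 25} from the t table
low, high = pooled_t_interval(12, 13.8, 1.2, 15, 12.9, 1.5, t_crit=2.06)
print(round(low, 3), round(high, 3))
```

Since the interval contains 0, a two-tailed test of μ1 = μ2 at the 5% level would not reject the null hypothesis for these data.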

INTERVAL ESTIMATION AND TESTS CONCERNING PROPORTIONS


We deal with discrete populations such as binomial, Poisson and so on.
Size of sample is ≥ 30
I Single sample proportion
A random sample of size n (> 30) is taken from a population. The proportion of success in
the population is denoted by P. The proportion of success in the sample is p. The null hypothesis to
be tested is
𝐻0 : P = P0
against the alternative which could be any one of
P ≠ P0   P > P0   P < P0

$$S.E._{p} = \sqrt{\frac{P_0(1-P_0)}{n}} \qquad \text{Test statistic: } z = \frac{p-P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} \sim N(0,1) \quad [\text{under } H_0]$$

The $100(1-\alpha)\%$ confidence interval for P is $p \pm z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$.

II Two Sample proportions


Two independent random samples of sizes n1 and n2 (both > 30) are taken from two
independent populations with proportions of success denoted by P1 and P2. The proportions of
success in the samples are p1 and p2. The null hypothesis to be tested is
P1 = P2.
against the alternative which could be any one of
P1 ≠ P2 P1 > P2 P1 < P2
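First calculate the pooled proportion $\hat{P}$ using the formula
$$\hat{P} = \frac{n_1 p_1 + n_2 p_2}{n_1+n_2}$$
$$\text{Test statistic: } z = \frac{p_1-p_2}{\sqrt{\hat{P}(1-\hat{P})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} \sim N(0,1) \quad [\text{under } H_0]$$

The $100(1-\alpha)\%$ confidence interval for P1 − P2 is $(p_1-p_2) \pm z_{\alpha/2}\sqrt{\dfrac{p_1(1-p_1)}{n_1}+\dfrac{p_2(1-p_2)}{n_2}}$.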

15) The manufacturer of a spot remover claims that his product removes 90 percent of all spots. If, in
a random sample, only 174 of 200 spots were removed with the manufacturer’s product, test the null
hypothesis P = 0.90 against the alternative hypothesis P < 0.90 at the 5% level of significance.
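Solution: $H_0: P = 0.90$   $H_1: P < 0.90$ (left-tailed)   $\alpha = 0.05$
$p = \dfrac{174}{200} = 0.87$   n = 200
$$\text{Test statistic: } z = \frac{p-P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} = \frac{0.87-0.90}{\sqrt{\frac{(0.9)(0.1)}{200}}} = -1.41$$
Inference: $-z_{\alpha} = -z_{0.05} = -1.645$, so $z > -z_{\alpha}$.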


There is no reason to reject the null hypothesis. The manufacturer’s claim that his product removes
90 percent of all spots cannot be rejected.

16) In random samples, 74 out of 250 persons who watched a certain television program on a small
TV set and 92 out of 250 persons who watched the same program on a large TV set remembered 2
hours later what products were advertised. Test the null hypothesis that there is no difference
between the two populations at a level of significance of 0.01.
Solution: $H_0: P_1 = P_2$   $H_1: P_1 \neq P_2$ (two-tailed)   $\alpha = 0.01$
n1 = 250   n2 = 250   $p_1 = \dfrac{74}{250} = 0.296$   $p_2 = \dfrac{92}{250} = 0.368$
$$\text{Test statistic: } z = \frac{p_1-p_2}{\sqrt{\hat{P}(1-\hat{P})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$$
$$\hat{P} = \frac{n_1 p_1 + n_2 p_2}{n_1+n_2} = \frac{74+92}{250+250} = 0.332$$
$$|z| = \frac{|0.296-0.368|}{\sqrt{(0.332)(0.668)\left(\frac{1}{250}+\frac{1}{250}\right)}} = 1.71$$
Inference: $z_{\alpha/2} = z_{0.005} = 2.575$, so $|z| < z_{0.005}$.

There is no reason to reject the null hypothesis. The difference between watching the program
on a small set and on a large set is not significant at a level of significance of 0.01.
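The pooled two-proportion test statistic can be sketched in Python as below. Only the standard library is used, the function name `two_proportion_z` is an assumption, and the critical value 2.575 is copied from the table of critical values of z.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: P1 = P2 using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# data of Problem 16: 74 of 250 (small set) and 92 of 250 (large set)
z = two_proportion_z(74, 250, 92, 250)
Z_0005 = 2.575  # z_{0.005}, two-tailed test at the 0.01 level
print(round(abs(z), 2), abs(z) > Z_0005)  # reject H0 if True
```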

17) A private opinion poll is engaged by a politician to estimate what proportion of her constituents
favor a certain proposal. Determine how large a sample the poll will have to take to be at least 95%
confident that the sample proportion is off by less than 0.02.
Solution: The maximum error should be less than 0.02 at α = 0.05
$$\text{Maximum error} = z_{\alpha/2}\sqrt{\frac{P_0(1-P_0)}{n}} < 0.02 \qquad z_{\alpha/2} = z_{0.025} = 1.96$$
Since the proportion is unknown, the largest possible value of $P_0(1-P_0)$ is used, which is $\tfrac{1}{4}$.
---------------------------------------------------------------------------------------------------------------
To maximize $x(1-x)$:
Let $y = x(1-x)$. Then $\dfrac{dy}{dx} = (1-x) - x = 0 \Rightarrow 1 - 2x = 0 \Rightarrow x = \tfrac{1}{2}$, and $\dfrac{d^2y}{dx^2} = -2 < 0$.
Hence, $x(1-x)$ is maximum when $x = \tfrac{1}{2}$.
Maximum value of $x(1-x) = \tfrac{1}{2}\left(1-\tfrac{1}{2}\right) = \tfrac{1}{4}$
-----------------------------------------------------------------------------------------------------------------
$$(1.96)\sqrt{\frac{1/4}{n}} < 0.02 \;\Rightarrow\; \sqrt{n} > \frac{1.96}{0.04} \;\Rightarrow\; \sqrt{n} > 49 \;\Rightarrow\; n > 2401$$

18) A sample survey at a supermarket showed that 204 of 300 shoppers use discount coupons.
Construct a 95% confidence interval for the corresponding true proportion.
Solution: $p = \dfrac{204}{300} = 0.68$
The 95% confidence interval for P is
$$p \pm z_{0.025}\sqrt{\frac{p(1-p)}{n}} = 0.68 \pm (1.96)\sqrt{\frac{(0.68)(0.32)}{300}} = 0.68 \pm 0.053 = (0.627,\ 0.733)$$
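A large-sample confidence interval for a proportion is easy to check numerically. The sketch below is plain Python with a hypothetical function name; the default 1.96 is $z_{0.025}$ from the table of critical values.

```python
from math import sqrt


def proportion_interval(successes, n, z_half_alpha=1.96):
    """Large-sample interval p ± z_{alpha/2} * sqrt(p(1 - p)/n)."""
    p = successes / n
    max_error = z_half_alpha * sqrt(p * (1 - p) / n)
    return p - max_error, p + max_error


# data of Problem 18: 204 of 300 shoppers use discount coupons
low, high = proportion_interval(204, 300)
print(round(low, 3), round(high, 3))  # 0.627 0.733
```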

19) Among 500 marriage license applications chosen at random in a given year, there were 48 in
which the woman was at least one year older than the man and among 400 marriage license
applications chosen at random six years later, there were 68 in which the woman was at least one
year older than the man. Construct a 99% confidence interval for the difference between the
corresponding true proportions of marriage license applications in which the woman was at least one
year older than the man.
Solution: $p_1 = \dfrac{48}{500} = 0.096$   $p_2 = \dfrac{68}{400} = 0.170$   $\alpha = 0.01$
The $100(1-\alpha)\%$ confidence interval for P1 − P2 is $(p_1-p_2) \pm z_{\alpha/2}\sqrt{\dfrac{p_1(1-p_1)}{n_1}+\dfrac{p_2(1-p_2)}{n_2}}$

The 99% confidence interval for P1 − P2 is
$$(0.096-0.170) \pm (2.575)\sqrt{\frac{(0.096)(0.904)}{500}+\frac{(0.17)(0.83)}{400}}$$
$$= -0.074 \pm (2.575)(0.0229) = -0.074 \pm 0.059 = (-0.133,\ -0.015)$$

INTERVAL ESTIMATION AND TESTS CONCERNING VARIANCES


I (a) Single sample variance
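A random sample of size n is taken from a normal population with the variance denoted by
$\sigma^2$. The variance in the sample is $s^2$. The null hypothesis to be tested is

$H_0: \sigma^2 = \sigma_0^2$
against the alternative which could be any one of
$\sigma^2 \neq \sigma_0^2$   $\sigma^2 > \sigma_0^2$   $\sigma^2 < \sigma_0^2$

$$\text{Test statistic: } \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} \sim \chi^2 \text{ distribution with } n-1 \text{ degrees of freedom} \quad [\text{under } H_0]$$

Reject $H_0$ if
$H_1: \sigma^2 \neq \sigma_0^2$:   $\chi^2 \ge \chi^2_{\alpha/2,\,n-1}$ or $\chi^2 \le \chi^2_{1-\alpha/2,\,n-1}$
$H_1: \sigma^2 > \sigma_0^2$:   $\chi^2 \ge \chi^2_{\alpha,\,n-1}$
$H_1: \sigma^2 < \sigma_0^2$:   $\chi^2 \le \chi^2_{1-\alpha,\,n-1}$

The $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is
$$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right]$$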

I (b) This case is a variation of the previous case. A simpler test can be used when the sample size is
30 or more.
A random sample of size n (≥ 30) is taken from a normal population with the variance
denoted by $\sigma^2$. The variance in the sample is $s^2$. The null hypothesis and the alternative are the same
as in I (a).
$$\text{Test statistic: } z = \left(\frac{s}{\sigma_0} - 1\right)\sqrt{2(n-1)} \sim N(0,1)$$

II Two sample variances


Two independent random samples of sizes $n_1$ and $n_2$ are taken from two normal populations
with variances denoted by $\sigma_1^2$ and $\sigma_2^2$. The variances in the samples are $s_1^2$ and $s_2^2$. The null
hypothesis to be tested is
$H_0: \sigma_1^2 = \sigma_2^2$
against the alternative which could be any one of
$\sigma_1^2 \neq \sigma_2^2$   $\sigma_1^2 > \sigma_2^2$   $\sigma_1^2 < \sigma_2^2$
The test statistic is the ratio of the variances $s_1^2$ and $s_2^2$. Under $H_0$ this ratio has the F
distribution. The order of the numbers of degrees of freedom depends on which variance is taken in
the numerator in the ratio.

Reject $H_0$ if
$H_1: \sigma_1^2 \neq \sigma_2^2$:   $\dfrac{s_1^2}{s_2^2} \ge F_{\alpha/2,\,n_1-1,\,n_2-1}$ if $s_1^2 \ge s_2^2$;   $\dfrac{s_2^2}{s_1^2} \ge F_{\alpha/2,\,n_2-1,\,n_1-1}$ if $s_2^2 \ge s_1^2$
$H_1: \sigma_1^2 > \sigma_2^2$:   $\dfrac{s_1^2}{s_2^2} \ge F_{\alpha,\,n_1-1,\,n_2-1}$
$H_1: \sigma_2^2 > \sigma_1^2$:   $\dfrac{s_2^2}{s_1^2} \ge F_{\alpha,\,n_2-1,\,n_1-1}$

The $100(1-\alpha)\%$ confidence interval for $\dfrac{\sigma_1^2}{\sigma_2^2}$ is
$$\left[\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2,\,\nu_1,\,\nu_2}},\ \frac{s_1^2}{s_2^2}\cdot F_{\alpha/2,\,\nu_2,\,\nu_1}\right]$$
where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$

20) In a random sample, the weights of 24 Black Angus steers of a certain age have a standard
deviation of 238 pounds. Assume that the weights constitute a random sample from a normal
population, and test the null hypothesis 𝜎 = 250 pounds against the two-sided alternative 𝜎 ≠ 250
pounds at the level of significance of 0.01.
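Solution: $H_0: \sigma = 250$   $H_1: \sigma \neq 250$ (two-tailed)   $\alpha = 0.01$
Population normal; n = 24
$$\text{Test statistic: } \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{23\,(238)^2}{(250)^2} = 20.84$$
Inference: $\chi^2_{\alpha/2,\,n-1} = \chi^2_{0.005,23} = 44.181$   $\chi^2_{1-\alpha/2,\,n-1} = \chi^2_{0.995,23} = 9.260$

The calculated $\chi^2$ is less than $\chi^2_{\alpha/2,\,n-1}$ and greater than $\chi^2_{1-\alpha/2,\,n-1}$.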

There is no reason to reject the null hypothesis at the level of significance of 0.01.
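The chi-square statistic and the two-sided decision rule can be sketched in Python as follows. The standard library has no chi-square quantile function, so the critical values 44.181 and 9.260 are copied from the solution above; the function name is an assumption.

```python
def chi_square_statistic(n, s, sigma0):
    """(n - 1) s^2 / sigma0^2 for a sample of size n with sample S.D. s."""
    return (n - 1) * s**2 / sigma0**2


# data of Problem 20: n = 24, s = 238, H0: sigma = 250
chi2 = chi_square_statistic(n=24, s=238, sigma0=250)

CHI2_UPPER = 44.181  # chi-square_{0.005, 23}, from the chi-square table
CHI2_LOWER = 9.260   # chi-square_{0.995, 23}, from the chi-square table

# two-tailed test: reject H0 if the statistic falls in either tail
reject = chi2 >= CHI2_UPPER or chi2 <= CHI2_LOWER
print(round(chi2, 2), reject)
```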

21) In a random sample, s = 2.53 minutes for the amount of time that 30 women took to complete the
written test for their driver’s licenses. At the level of significance of 5%, test the null hypothesis that
σ = 2.85 minutes against the alternative that σ < 2.85 minutes.
Solution: $H_0: \sigma = 2.85$; $H_1: \sigma < 2.85$ (left-tailed); $\alpha = 0.05$
Population normal; n = 30
Test statistic: $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{29\,(2.53)^2}{(2.85)^2} = 22.85$
Inference: $\chi^2_{1-\alpha,\,n-1} = \chi^2_{0.95,\,29} = 17.708$; $\chi^2 > \chi^2_{1-\alpha,\,n-1}$
There is no reason to reject the null hypothesis at the level of significance of 0.05.
Aliter: Since the size of the sample is 30, the sample could be considered large. In this case, the test is as follows:
Test statistic: $z = \left(\dfrac{s}{\sigma_0} - 1\right)\sqrt{2(n-1)} = \left(\dfrac{2.53}{2.85} - 1\right)\sqrt{2(29)} = -0.8551$
Inference: $-z_\alpha = -z_{0.05} = -1.645$; $z > -z_\alpha$
The null hypothesis cannot be rejected at the level of significance of 0.05.

22) Past data indicate that the standard deviation of measurements made on sheet metal stampings by
experienced inspectors is 0.41 square inch. If a new inspector measures 50 stampings with a standard
deviation of 0.49 square inch, test the null hypothesis that σ = 0.41 square inch against the alternative
that σ > 0.41 square inch at the level of significance of 5%. Check your answer using the P-value of
the statistic.
Solution: $H_0: \sigma = 0.41$; $H_1: \sigma > 0.41$ (right-tailed); $\alpha = 0.05$
Population normal; n = 50
Since the size of the sample is 50, the sample could be considered large.
Test statistic: $z = \left(\dfrac{s}{\sigma_0} - 1\right)\sqrt{2(n-1)} = \left(\dfrac{0.49}{0.41} - 1\right)\sqrt{2(49)} = 1.93$
Inference: $z_\alpha = z_{0.05} = 1.645$; $z > z_\alpha$
The null hypothesis is rejected at the level of significance of 0.05.
The P-value corresponding to z = 1.93 is P(Z > 1.93) = 0.0268. Since 0.0268 < 0.05, reject $H_0$ at 5%.
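
The large-sample statistic and its P-value in problem 22 can be reproduced as follows. This is a minimal sketch using Python's standard library; `large_sample_sd_z` is an illustrative name, and the P-value is the right-tail area because the alternative is σ > 0.41.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_sd_z(n, s, sigma0):
    """Large-sample statistic (s / sigma0 - 1) * sqrt(2 (n - 1))."""
    return (s / sigma0 - 1) * sqrt(2 * (n - 1))


# Problem 22: n = 50, s = 0.49, sigma0 = 0.41, right-tailed test.
z = large_sample_sd_z(50, 0.49, 0.41)

# Right-tail area under the standard normal curve beyond z.
p_value = 1 - NormalDist().cdf(z)

alpha = 0.05
print(round(z, 2), round(p_value, 4), p_value < alpha)
```

Comparing `p_value` with α gives the same decision as comparing z with the tabulated 1.645.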

23) To compare two kinds of bumper guards, six of each kind were mounted on a certain make of
compact car. Then each car was run into a concrete wall at 5 miles per hour, and the following are
the costs of the repairs (in dollars):
Bumper guard 1: 127 168 143 165 122 139
Bumper guard 2: 154 135 132 171 153 149
Test at a level of significance of 0.02 whether it is reasonable to assume that the two populations
sampled have equal variances.
Solution: $H_0: \sigma_1^2 = \sigma_2^2$; $H_1: \sigma_1^2 \ne \sigma_2^2$ (two-tailed); $\alpha = 0.02$
The test statistic is determined after ascertaining which of the sample variances is larger.
Calculations:
Sample 1: $\bar{x}_1 = 864/6 = 144$
$x_{1i}$                    127   168   143   165   122   139
$x_{1i} - \bar{x}_1$        −17    24    −1    21   −22    −5
$(x_{1i} - \bar{x}_1)^2$    289   576     1   441   484    25     $\sum (x_{1i} - \bar{x}_1)^2 = 1816$
Sample variance: $s_1^2 = \dfrac{\sum (x_{1i} - \bar{x}_1)^2}{n_1 - 1} = \dfrac{1816}{5} = 363.2$

Sample 2: $\bar{x}_2 = 894/6 = 149$
$x_{2j}$                    154   135   132   171   153   149
$x_{2j} - \bar{x}_2$          5   −14   −17    22     4     0
$(x_{2j} - \bar{x}_2)^2$     25   196   289   484    16     0     $\sum (x_{2j} - \bar{x}_2)^2 = 1010$
Sample variance: $s_2^2 = \dfrac{\sum (x_{2j} - \bar{x}_2)^2}{n_2 - 1} = \dfrac{1010}{5} = 202$

Since $s_1^2 > s_2^2$, the test statistic is $F = \dfrac{s_1^2}{s_2^2} = \dfrac{363.2}{202} = 1.80$
Inference: $F_{\alpha/2,\,n_1-1,\,n_2-1} = F_{0.01,\,5,\,5} = 11.0$; $F < F_{\alpha/2,\,n_1-1,\,n_2-1}$
The null hypothesis cannot be rejected at the level of significance of 0.02.
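
The sample variances and the F ratio of problem 23 can be recomputed directly from the repair costs. The sketch below relies on `statistics.variance`, which uses the divisor n − 1; the critical value 11.0 is the tabulated figure quoted in the solution, and the variable names are illustrative.

```python
from statistics import variance  # sample variance, divisor n - 1

guard_1 = [127, 168, 143, 165, 122, 139]
guard_2 = [154, 135, 132, 171, 153, 149]

s1_sq = variance(guard_1)
s2_sq = variance(guard_2)

# For the two-tailed test the larger sample variance goes in the numerator.
f_stat = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)

f_critical = 11.0  # tabulated F(0.01; 5, 5)
reject = f_stat >= f_critical
print(s1_sq, s2_sq, round(f_stat, 2), reject)
```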

24) In the comparison of two kinds of paint, a consumer testing service finds that four 1-gallon cans
of one brand have a standard deviation of 31 square feet, while four 1-gallon cans of another brand
have a standard deviation of 26 square feet. Assume that the two populations are normal and test the
null hypothesis that 𝜎1 = 𝜎2 against the alternative that 𝜎1 > 𝜎2 at the level of significance of 5%.
Solution: $H_0: \sigma_1 = \sigma_2$; $H_1: \sigma_1 > \sigma_2$ (right-tailed); $\alpha = 0.05$
Populations normal; $n_1 = 4$; $n_2 = 4$
$s_1 = 31$; $s_2 = 26$; $s_1^2 > s_2^2$
Test statistic: $F = \dfrac{s_1^2}{s_2^2} = \dfrac{961}{676} = 1.4216$
Inference: $F_{\alpha,\,n_1-1,\,n_2-1} = F_{0.05,\,3,\,3} = 9.28$; $F < F_{\alpha,\,n_1-1,\,n_2-1}$
The null hypothesis cannot be rejected at the level of significance of 0.05.

25) The lengths of the skulls of 10 fossil skeletons of an extinct species of bird have a standard deviation of 0.29 cm. Assume that such measurements are normally distributed and find a 95% confidence interval for the true variance of the skull length of this species of bird.
Solution: Population normal; n = 10
The 100(1 − α)% confidence interval for $\sigma^2$ is
$\left[\dfrac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right]$
The 95% confidence interval for $\sigma^2$ is
$\left[\dfrac{9\,(0.29)^2}{\chi^2_{0.025,\,9}},\ \dfrac{9\,(0.29)^2}{\chi^2_{0.975,\,9}}\right] = \left[\dfrac{9(0.0841)}{19.023},\ \dfrac{9(0.0841)}{2.700}\right] = [0.0398,\ 0.2803]$
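
The interval of problem 25 can be evaluated as below. This is a sketch under the assumption that the two chi-square values are read from tables (19.023 and 2.700 for 9 degrees of freedom); the function name is illustrative.

```python
def variance_confidence_interval(n, s, chi2_upper_tail, chi2_lower_tail):
    """Interval for sigma^2.

    chi2_upper_tail is the chi-square value with area alpha/2 to its right,
    chi2_lower_tail the value with area 1 - alpha/2 to its right.
    """
    numerator = (n - 1) * s ** 2
    return numerator / chi2_upper_tail, numerator / chi2_lower_tail


# Problem 25: n = 10, s = 0.29, 95% confidence.
low, high = variance_confidence_interval(10, 0.29, 19.023, 2.700)
print(round(low, 4), round(high, 4))
```

The larger chi-square value sits in the denominator of the lower limit, so the lower limit is always the smaller number.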

26) Twelve randomly selected mature citrus trees of one variety have a standard deviation of 1.2 feet
and fifteen randomly selected mature citrus trees of another variety have a standard deviation of 1.5
feet. Assuming that the random samples were selected from normal populations, construct a 98%
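confidence interval for the ratio of the variances of the two populations sampled.
Solution: $n_1 = 12$; $n_2 = 15$
$s_1 = 1.2$; $s_2 = 1.5$
Populations normal.
The 100(1 − α)% confidence interval for $\sigma_1^2/\sigma_2^2$ is
$\left[\dfrac{s_1^2}{s_2^2}\cdot\dfrac{1}{F_{\alpha/2,\,\nu_1,\,\nu_2}},\ \dfrac{s_1^2}{s_2^2}\cdot F_{\alpha/2,\,\nu_2,\,\nu_1}\right]$
where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$
The 98% confidence interval for $\sigma_1^2/\sigma_2^2$ is
$\left[\dfrac{s_1^2}{s_2^2}\cdot\dfrac{1}{F_{0.01,\,11,\,14}},\ \dfrac{s_1^2}{s_2^2}\cdot F_{0.01,\,14,\,11}\right] = \left[\dfrac{(1.2)^2}{(1.5)^2}\cdot\dfrac{1}{3.87},\ \dfrac{(1.2)^2}{(1.5)^2}\,(4.30)\right] = [0.165,\ 2.752]$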

27) The following are the heat-producing capacities of coal from two mines (in millions of calories
per ton):
Mine A: 8500 8330 8480 7960 8030
Mine B: 7710 7890 7920 8270 7860
Assume that the data constitute independent random samples from normal populations and construct
a 90% confidence interval for the ratio of the variances of the two populations sampled.
Solution: Populations normal; $n_1 = 5$; $n_2 = 5$
The ratio of the variances does not change when the observations in both samples are divided by the same constant. Let us divide every observation by 10.

Mine A: $\bar{X} = 4130/5 = 826$
$X_i$                   850   833   848   796   803
$X_i - \bar{X}$          24     7    22   −30   −23
$(X_i - \bar{X})^2$     576    49   484   900   529
$s_1^2 = \dfrac{\sum (X_i - \bar{X})^2}{n - 1} = \dfrac{2538}{4} = 634.5$

Mine B: $\bar{X} = 3965/5 = 793$
$X_i$                   771   789   792   827   786
$X_i - \bar{X}$         −22    −4    −1    34    −7
$(X_i - \bar{X})^2$     484    16     1  1156    49
$s_2^2 = \dfrac{\sum (X_i - \bar{X})^2}{n - 1} = \dfrac{1706}{4} = 426.5$

$s_1^2 = 634.5$; $s_2^2 = 426.5$
The 100(1 − α)% confidence interval for $\sigma_1^2/\sigma_2^2$ is
$\left[\dfrac{s_1^2}{s_2^2}\cdot\dfrac{1}{F_{\alpha/2,\,\nu_1,\,\nu_2}},\ \dfrac{s_1^2}{s_2^2}\cdot F_{\alpha/2,\,\nu_2,\,\nu_1}\right]$
where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$
The 90% confidence interval for $\sigma_1^2/\sigma_2^2$ is
$\left[\dfrac{s_1^2}{s_2^2}\cdot\dfrac{1}{F_{0.05,\,4,\,4}},\ \dfrac{s_1^2}{s_2^2}\cdot F_{0.05,\,4,\,4}\right] = \left[\dfrac{634.5}{426.5}\cdot\dfrac{1}{6.39},\ \dfrac{634.5}{426.5}\,(6.39)\right] = [0.233,\ 9.506]$
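
The interval of problem 27 can be checked from the raw data without any rescaling, since a common scale factor cancels in the ratio of the variances. A short sketch, assuming the tabulated value 6.39 for F(0.05; 4, 4); the variable names are illustrative.

```python
from statistics import variance  # sample variance, divisor n - 1

mine_a = [8500, 8330, 8480, 7960, 8030]
mine_b = [7710, 7890, 7920, 8270, 7860]

# Ratio of sample variances; dividing all data by 10 would not change it.
ratio = variance(mine_a) / variance(mine_b)

f_05_4_4 = 6.39  # tabulated F(0.05; 4, 4); both samples have 4 d.f.
lower = ratio / f_05_4_4
upper = ratio * f_05_4_4
print(round(lower, 3), round(upper, 3))
```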
2

TESTS OF SIGNIFICANCE FOR CORRELATION CO-EFFICIENT


I Single sample correlation co-efficient
(a) A random sample of n pairs of observations is taken from a bivariate normal population with the correlation co-efficient denoted by ρ. The correlation co-efficient in the sample is r. The null hypothesis to be tested is
$\rho = 0$
against the alternative
$\rho \ne 0$
Test statistic: $t = \dfrac{r}{\sqrt{1-r^2}}\,\sqrt{n-2} \sim t_{n-2}$ [under $H_0$]
If n > 30, the $t_{n-2}$ distribution is approximately N(0, 1).

(b) A random sample of n pairs of observations is taken from a bivariate normal population with the correlation co-efficient denoted by ρ. The correlation co-efficient in the sample is r. The null hypothesis to be tested is
$\rho = \rho_0$
against the alternative
$\rho \ne \rho_0$
Test statistic: $\dfrac{Z - Z_0}{\sqrt{1/(n-3)}} \sim N(0, 1)$ [under $H_0$]
where $Z = \dfrac{1}{2}\log_e\left[\dfrac{1+r}{1-r}\right]$ and $Z_0 = \dfrac{1}{2}\log_e\left[\dfrac{1+\rho_0}{1-\rho_0}\right]$

II Two sample correlation co-efficients


Two independent random samples of $n_1$ and $n_2$ pairs of observations respectively are taken from two independent bivariate normal populations with the correlation co-efficients denoted by $\rho_1$ and $\rho_2$. The correlation co-efficients in the samples are $r_1$ and $r_2$. The null hypothesis to be tested is
$\rho_1 = \rho_2$
against the alternative
$\rho_1 \ne \rho_2$
Test statistic: $\dfrac{Z_1 - Z_2}{\sqrt{\dfrac{1}{n_1-3} + \dfrac{1}{n_2-3}}} \sim N(0, 1)$ [under $H_0$]
where $Z_1 = \dfrac{1}{2}\log_e\left[\dfrac{1+r_1}{1-r_1}\right]$ and $Z_2 = \dfrac{1}{2}\log_e\left[\dfrac{1+r_2}{1-r_2}\right]$
Important Note: The alternative hypotheses are two-tailed. Compare the absolute value of the calculated test statistic with $t_{n-2,\,\alpha/2}$ or $z_{\alpha/2}$, as the case may be, for drawing inference.
28) Test the significance of the values of the correlation co-efficient obtained from samples of n
pairs of observation from bivariate normal population at the level of 5%:
(i) 𝑛 = 38; 𝑟 = 0.6 (ii) 𝑛 = 11; 𝑟 = 0.5
Solution:
(i) $H_0: \rho = 0$; $H_1: \rho \ne 0$; $\alpha = 0.05$
$t = \dfrac{r}{\sqrt{1-r^2}}\,\sqrt{n-2} = \dfrac{0.6}{\sqrt{1-(0.6)^2}}\,\sqrt{38-2} = 4.5$
$t \sim t_{n-2}$
n > 30; use normal distribution values.
Inference: $z_{0.025} = 1.96$; $4.5 > z_{0.025}$
Reject the null hypothesis at the level of 5%. The correlation co-efficient obtained from the sample is suggestive of correlation between the variables in the population.
(ii) $H_0: \rho = 0$; $H_1: \rho \ne 0$; $\alpha = 0.05$
$t = \dfrac{r}{\sqrt{1-r^2}}\,\sqrt{n-2} = \dfrac{0.5}{\sqrt{1-(0.5)^2}}\,\sqrt{11-2} = 1.73$
$t \sim t_{n-2}$
Inference: $t_{9,\,0.025} = 2.262$; $1.73 < t_{9,\,0.025}$
We cannot reject the null hypothesis at the level of 5%. The correlation co-efficient obtained from the sample does not signify correlation between the variables in the population.
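
The two statistics of problem 28 follow from one small function. A sketch, with `correlation_t_statistic` as an illustrative name; the critical values 1.96 and 2.262 are taken from tables.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


t_i = correlation_t_statistic(0.6, 38)   # part (i)
t_ii = correlation_t_statistic(0.5, 11)  # part (ii)

# Two-tailed decisions against tabulated critical values.
reject_i = abs(t_i) > 1.96     # normal approximation, n > 30
reject_ii = abs(t_ii) > 2.262  # t with 9 degrees of freedom
print(round(t_i, 2), reject_i, round(t_ii, 2), reject_ii)
```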

29) Find the least value of 𝑟 in a sample of 27 pairs of observations from bivariate normal population
which would be significant of correlation in the population at the level of 5%.
Solution: The sample value is significant at the level of 5% if
$\dfrac{|r|}{\sqrt{1-r^2}}\,\sqrt{n-2} > t_{27-2,\,0.025}$
$\dfrac{|r|}{\sqrt{1-r^2}}\,\sqrt{25} > 2.060$
$\dfrac{|r|}{\sqrt{1-r^2}} > 0.412$
$\dfrac{r^2}{1-r^2} > 0.1697$
$r^2 > 0.1697 - 0.1697\,r^2$
$1.1697\,r^2 > 0.1697$
$r^2 > 0.1451$
$|r| > 0.3809$
$r < -0.3809$ or $r > 0.3809$
The value of the sample correlation co-efficient must be less than −0.3809 or more than 0.3809 in order to be significant of correlation in the population at the level of 5%.
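
Solving the inequality of problem 29 for |r| gives the closed form |r| > t / √(t² + n − 2), where t is the tabulated critical value. The sketch below evaluates it; the function name is illustrative and 2.060 is the table value for 25 degrees of freedom.

```python
from math import sqrt


def least_significant_r(t_critical, n):
    """Smallest |r| for which r * sqrt(n - 2) / sqrt(1 - r^2) reaches t_critical."""
    return t_critical / sqrt(t_critical ** 2 + n - 2)


r_min = least_significant_r(2.060, 27)

# Substituting r_min back into the test statistic recovers the critical value.
t_back = r_min * sqrt(27 - 2) / sqrt(1 - r_min ** 2)
print(round(r_min, 4), round(t_back, 3))
```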

30) The correlation co-efficient in a sample of 18 pairs of observation from bivariate normal
population was found to be 0.5. It is claimed that the correlation co-efficient in the population is 0.7.
Does the sample correlation co-efficient justify this claim at 5% level?
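Solution: $H_0: \rho = 0.7$; $H_1: \rho \ne 0.7$; $\alpha = 0.05$
$Z = \dfrac{1}{2}\log_e\left[\dfrac{1+r}{1-r}\right] = \dfrac{1}{2}\log_e\left[\dfrac{1.5}{0.5}\right] = 0.5493$
$Z_0 = \dfrac{1}{2}\log_e\left[\dfrac{1+\rho_0}{1-\rho_0}\right] = \dfrac{1}{2}\log_e\left[\dfrac{1.7}{0.3}\right] = 0.8673$
Test statistic: $\dfrac{|Z - Z_0|}{\sqrt{1/(n-3)}} = \dfrac{|0.5493 - 0.8673|}{\sqrt{1/15}} = 1.232$
Inference: $z_{0.025} = 1.96$; $1.232 < z_{0.025}$
The null hypothesis cannot be rejected at the 5% level. The sample correlation co-efficient is consistent with the claim that the correlation co-efficient in the population is 0.7.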
31) Two independent random samples of 5 and 12 pairs of observations respectively are taken from
two independent bivariate normal populations. The correlation co-efficients in the samples are 0.87
and 0.56. Test if the difference between the sample correlation co-efficients is significant at 5%
level.
Solution: Denote the correlation co-efficients in the two independent bivariate normal populations as $\rho_1$ and $\rho_2$. The null and alternative hypotheses are:
$H_0: \rho_1 = \rho_2$; $H_1: \rho_1 \ne \rho_2$; $\alpha = 0.05$
$Z_1 = \dfrac{1}{2}\log_e\left[\dfrac{1+r_1}{1-r_1}\right] = \dfrac{1}{2}\log_e\left[\dfrac{1+0.87}{1-0.87}\right] = 1.3330$
$Z_2 = \dfrac{1}{2}\log_e\left[\dfrac{1+r_2}{1-r_2}\right] = \dfrac{1}{2}\log_e\left[\dfrac{1+0.56}{1-0.56}\right] = 0.6328$
Test statistic: $\dfrac{|Z_1 - Z_2|}{\sqrt{\dfrac{1}{n_1-3} + \dfrac{1}{n_2-3}}} = \dfrac{1.3330 - 0.6328}{\sqrt{\dfrac{1}{2} + \dfrac{1}{9}}} = 0.896$
Inference: $z_{0.025} = 1.96$; $0.896 < z_{0.025}$
The null hypothesis cannot be rejected at the 5% level. The correlation co-efficients in the two samples are not significantly different from each other.
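
The Z transformation used in problems 30 and 31 is the inverse hyperbolic tangent, so both statistics can be sketched with `math.atanh`. The function names below are illustrative, not part of the notes.

```python
from math import atanh, sqrt


def fisher_z(r):
    """Z = 0.5 * log((1 + r) / (1 - r)), which equals atanh(r)."""
    return atanh(r)


def one_sample_statistic(r, rho0, n):
    """|Z - Z0| / sqrt(1 / (n - 3)) for testing rho = rho0."""
    return abs(fisher_z(r) - fisher_z(rho0)) * sqrt(n - 3)


def two_sample_statistic(r1, n1, r2, n2):
    """|Z1 - Z2| / sqrt(1/(n1 - 3) + 1/(n2 - 3)) for testing rho1 = rho2."""
    standard_error = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return abs(fisher_z(r1) - fisher_z(r2)) / standard_error


stat_30 = one_sample_statistic(0.5, 0.7, 18)        # problem 30
stat_31 = two_sample_statistic(0.87, 5, 0.56, 12)   # problem 31
print(round(stat_30, 3), round(stat_31, 3))
```

Both values are compared with the tabulated 1.96 for a two-tailed test at the 5% level.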

𝝌𝟐 TEST OF INDEPENDENCE OF TWO ATTRIBUTES


Let the two attributes be labeled A and B. The different categories under A and B are
$\{A_1, A_2, \ldots, A_i, \ldots, A_m\}$ and $\{B_1, B_2, \ldots, B_j, \ldots, B_n\}$
The observed cell frequencies are denoted as $f_{ij}$, $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$.
The following is the contingency table:
A \ B    B1     B2     . . .   Bj     . . .   Bn     Total
A1       f11    f12    . . .   f1j    . . .   f1n    (A1)
A2       f21    f22    . . .   f2j    . . .   f2n    (A2)
. . .
Ai       fi1    fi2    . . .   fij    . . .   fin    (Ai)
. . .
Am       fm1    fm2    . . .   fmj    . . .   fmn    (Am)
Total    (B1)   (B2)   . . .   (Bj)   . . .   (Bn)   N

The null hypothesis to be tested:
$H_0$: The two attributes A and B are independent.
The alternative hypothesis:
$H_1$: They are not independent.
Under $H_0$ the expected frequency of the cell (i, j), denoted as $e_{ij}$, is calculated as follows:
$e_{ij} = \dfrac{(A_i) \times (B_j)}{N}$
The statistic $\chi^2 = \sum_i \sum_j \dfrac{(f_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2_{(m-1)(n-1)}$
The critical region is $\chi^2 \ge \chi^2_{(m-1)(n-1),\,\alpha}$ at the level of significance α.

32) The following sample data pertain to the shipments received by a large firm from three different
vendors. Test at the 0.01 level of significance whether the three vendors ship products of equal
quality.
            Number rejected   Number imperfect but acceptable   Number perfect   Total
Vendor A          12                       23                         89          124
Vendor B           8                       12                         62           82
Vendor C          21                       30                        119          170
Total             41                       65                        270          376

Solution: $H_0$: The three vendors ship products of equal quality.
$H_1$: There is a significant difference in the quality of goods supplied by the vendors.
$\alpha = 0.01$
Calculation of expected frequencies:
$e_{11} = \dfrac{124 \times 41}{376} = 13.5$; $e_{21} = \dfrac{82 \times 41}{376} = 8.9$
$e_{12} = \dfrac{124 \times 65}{376} = 21.4$; $e_{22} = \dfrac{82 \times 65}{376} = 14.2$
The remaining expected frequencies are calculated in the same way.

Table of expected frequencies:
            Number rejected   Number imperfect but acceptable   Number perfect   Total
Vendor A         13.5                     21.4                       89.0         124
Vendor B          8.9                     14.2                       58.9          82
Vendor C         18.5                     29.4                      122.1         170
Total            41                       65                        270           376

Test statistic: $\chi^2 = \sum_i \sum_j \dfrac{(f_{ij} - e_{ij})^2}{e_{ij}} = \dfrac{(12 - 13.5)^2}{13.5} + \dfrac{(8 - 8.9)^2}{8.9} + \cdots + \dfrac{(119 - 122.1)^2}{122.1} \approx 1.30$
No. of degrees of freedom = ν = (3 − 1) × (3 − 1) = 4
Inference: $\chi^2_{4,\,0.01} = 13.277$; $1.30 < \chi^2_{4,\,0.01}$
The null hypothesis cannot be rejected at the 0.01 level.
There is no sufficient evidence that the three vendors differ in the quality of the products they ship.
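
The expected frequencies and the statistic for problem 32 can be computed without intermediate rounding. The sketch below uses plain Python lists; `chi_square_independence` is an illustrative name, and the critical value 13.277 is the tabulated one.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence: (A_i)(B_j) / N.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Rows: vendors A, B, C; columns: rejected, imperfect but acceptable, perfect.
vendors = [[12, 23, 89], [8, 12, 62], [21, 30, 119]]
chi2, dof = chi_square_independence(vendors)
print(round(chi2, 2), dof, chi2 >= 13.277)
```

Because the expected frequencies are not rounded here, the statistic may differ slightly in the second decimal place from a hand calculation based on rounded values.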

33) In 360 tosses of a pair of dice, 74 sevens and 26 elevens are observed. Test the hypothesis that
the dice are fair. Use a level of significance of 5%.
Solution: 𝐻0 : The dice are fair; all the six scores in each die are equally likely.
𝐻1 : The dice are biased.
𝛼 = 0.05
Under $H_0$:
Probability of a score of seven from two fair dice = P{(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)} = 6/36
Expected number of sevens in 360 tosses = 360 × 6/36 = 60
Probability of a score of eleven from two fair dice = P{(5,6), (6,5)} = 2/36
Expected number of elevens in 360 tosses = 360 × 2/36 = 20

Score      7     11
$f_i$     74     26
$e_i$     60     20
Test statistic: $\chi^2 = \sum_i \dfrac{(f_i - e_i)^2}{e_i} = \dfrac{(74 - 60)^2}{60} + \dfrac{(26 - 20)^2}{20} = 5.067$

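Inference:
The number of classes compared = 2
We have used one constraint that the dice are tossed 360 times.
Number of degrees of freedom = 2 – 1 = 1
$\chi^2_{1,\,0.05} = 3.841$; $5.067 > \chi^2_{1,\,0.05}$
Note: The two classes above do not exhaust the 360 tosses. If the remaining tosses are included as a third class (observed 360 − 74 − 26 = 260, expected 360 − 60 − 20 = 280), the statistic becomes 5.067 + (260 − 280)²/280 = 6.495 with 3 − 1 = 2 degrees of freedom. Since $\chi^2_{2,\,0.05} = 5.991$, the conclusion is the same.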

There is sufficient reason to reject the null hypothesis at the level of significance of 5%. The dice are
suspected to be biased.

34) Over 5 years, in T-20 cricket matches, the team ‘Breezy Butterflies’ played 60 matches and won
35 of them. They took first batting in 24 matches and won in 18 of these. An enthusiast commented
that batting first is lucky for the team. Use the 𝜒 2 test to examine if there is any association between
first batting and winning. Use a level of significance of 5%.
Solution: 𝐻0 : First batting has no influence on winning.
𝐻1 : There is an association between first batting and winning.
𝛼 = 0.05
The contingency table showing the observed frequencies is:
                     Batting first   Batting second   Total
No. of games won          18               17           35
No. of games lost          6               19           25
Total                     24               36           60

Table of expected frequencies:
                     Batting first   Batting second   Total
No. of games won          14               21           35
No. of games lost         10               15           25
Total                     24               36           60

Test statistic: $\chi^2 = \sum_i \sum_j \dfrac{(f_{ij} - e_{ij})^2}{e_{ij}} = \dfrac{(18 - 14)^2}{14} + \dfrac{(17 - 21)^2}{21} + \dfrac{(6 - 10)^2}{10} + \dfrac{(19 - 15)^2}{15} = 4.57$
No. of degrees of freedom = ν = (2 − 1) × (2 − 1) = 1
Inference: $\chi^2_{1,\,0.05} = 3.841$; $4.57 > \chi^2_{1,\,0.05}$

There is sufficient reason to reject the null hypothesis at the level of significance of 5%. There is an
association between first batting and winning. It is seen from the numbers that this is a positive
association. First batting has significantly helped the team in winning the game.

𝝌𝟐 TEST OF GOODNESS OF FIT


This is the test to determine whether a set of data may be taken as a random sample from a
population having a given distribution. The null hypothesis to be tested is:
𝐻0 : The population has the given distribution.
Test statistic: $\chi^2 = \sum_{i=1}^{m} \dfrac{(f_i - e_i)^2}{e_i} \sim \chi^2_{m-t-1}$
$f_i$ = the observed frequency of the i-th class
$e_i$ = the theoretical (expected) frequency of the i-th class under $H_0$
m = the number of classes
t = the number of independent parameters estimated on the basis of the sample data
Critical Region: $\chi^2 \ge \chi^2_{\alpha,\,m-t-1}$

Note: The expected frequencies $e_i$ are calculated from the theoretical distribution under the null hypothesis with the condition that $\sum_i e_i = \sum_i f_i$. This constraint accounts for the loss of one degree of freedom, in addition to the t degrees of freedom lost through the estimated parameters.
Important: The test is valid only if none of the $e_i$ is less than 5. If any $e_i$ is less than 5, that class has to be combined with a neighbouring class so that the combined expected frequency is at least 5.
35) Four coins were tossed 160 times and 0, 1, 2, 3 or 4 heads showed, respectively, 19, 54, 58, 23 and 6 times. Use the 5% level of significance to test whether it is reasonable to suppose that the coins are balanced and randomly tossed.
Solution:
$H_0$: The coins are balanced and randomly tossed so that the population is binomial with the probability of heads = ½.
α = 0.05
The following table shows the calculation of the expected frequencies and the test statistic:
i       $e_i = N \times \binom{n}{i} p^i (1-p)^{n-i}$      $f_i$     $(f_i - e_i)^2 / e_i$
0       160 × 4C0 (½)⁴ = 10                                 19       8.100
1       160 × 4C1 (½)⁴ = 40                                 54       4.900
2       160 × 4C2 (½)⁴ = 60                                 58       0.067
3       160 × 4C3 (½)⁴ = 40                                 23       7.225
4       160 × 4C4 (½)⁴ = 10                                  6       1.600
Total   160                                                160      21.892

The value p = ½ was taken from the null hypothesis and not from the sample. Hence no degree of freedom is lost on that account. The total N = 160 was taken from the sample data. This constraint costs one degree of freedom.
ν = number of degrees of freedom = no. of classes – 1 = 5 – 1 = 4.
Inference: $\chi^2_{0.05,\,4} = 9.488$; $21.892 > \chi^2_{0.05,\,4}$
The null hypothesis is rejected at the 5% level. It is not reasonable to suppose that the coins are balanced and randomly tossed.
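
The expected binomial frequencies and the goodness-of-fit statistic of problem 35 can be sketched as follows. The observed frequencies are those in the $f_i$ column of the table, each squared deviation is divided by the expected frequency $e_i$, and the function name is illustrative.

```python
from math import comb


def binomial_expected(total, n, p):
    """Expected frequencies total * C(n, k) p^k (1 - p)^(n - k), k = 0..n."""
    return [total * comb(n, k) * p ** k * (1 - p) ** (n - k)
            for k in range(n + 1)]


observed = [19, 54, 58, 23, 6]             # 0, 1, 2, 3, 4 heads
expected = binomial_expected(160, 4, 0.5)  # p fixed by the null hypothesis

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

dof = len(observed) - 0 - 1  # m - t - 1, with no estimated parameter
print(expected, round(chi2, 3), dof, chi2 >= 9.488)
```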

36) Each day, Monday through Saturday, a baker bakes three large chocolate cakes. Those not sold
on the same day are given away to a food bank. Use the data shown in the following table to test at
the 5% level of significance whether they may be looked upon as values of a binomial random
variable.
No. of cakes sold 0 1 2 3
No. of days 1 16 55 228
Solution: $H_0$: The observed values can be looked upon as values of a binomial random variable.
α = 0.05
The value of the parameter p of the binomial distribution cannot be found using general reasoning. In such a case, p is estimated from the sample.
The mean of the binomial distribution is np. Calculate the mean of the sample to find p.
Mean = $\dfrac{1}{300}\,[0 \times 1 + 1 \times 16 + 2 \times 55 + 3 \times 228] = 2.7$
$np = 3p = 2.7 \Rightarrow p = 0.9$
The following table shows the calculation of the expected frequencies and the test statistic (classes 0 and 1 are pooled):
i        $e_i = N \times \binom{n}{i} p^i (1-p)^{n-i}$      $f_i$     $(f_i - e_i)^2 / e_i$
0        300 × [3C0 (0.9)⁰ (0.1)³] = 0.3                      1
1        300 × [3C1 (0.9)¹ (0.1)²] = 8.1                     16
0 or 1   0.3 + 8.1 = 8.4                                     17      (17 − 8.4)²/8.4 = 8.805
2        300 × [3C2 (0.9)² (0.1)¹] = 72.9                    55      (55 − 72.9)²/72.9 = 4.395
3        300 × [3C3 (0.9)³ (0.1)⁰] = 218.7                  228      (228 − 218.7)²/218.7 = 0.395
Total    300.0                                              300      13.595

The expected frequency of the first class is 0.3 (< 5). This class has been combined with the next class so that the combined expected frequency, 8.4, is greater than 5.
Calculation of number of degrees of freedom:
m = number of classes = 3
t = number of independent parameters estimated using sample data = 1
ν = m − t − 1 = 1
Inference: $\chi^2_{0.05,\,1} = 3.841$; $13.595 > \chi^2_{0.05,\,1}$
The null hypothesis is rejected at the 5% level. The observed values cannot be looked upon as values of a binomial random variable.

37) It is desired to test whether the number of gamma rays emitted per second by a certain
radioactive substance is a random variable having the Poisson distribution with λ = 2.4. Use the
following data obtained for 300 1-second intervals to test this null hypothesis at the level of
significance of 0.05.
No. of gamma rays 0 1 2 3 4 5 6 7 or more
Frequency 19 48 66 74 44 35 10 4
Solution: $H_0$: The given random variable has the Poisson distribution with λ = 2.4.
α = 0.05
The following table shows the calculation of the expected frequencies and the test statistic (the last two classes are pooled):
i             $e_i = 300 \times \dfrac{e^{-2.4}(2.4)^i}{i!}$      $f_i$     $(f_i - e_i)^2 / e_i$
0             27.22                                                19      (19 − 27.22)²/27.22 = 2.48
1             65.32                                                48      (48 − 65.32)²/65.32 = 4.59
2             78.38                                                66      (66 − 78.38)²/78.38 = 1.96
3             62.70                                                74      (74 − 62.70)²/62.70 = 2.04
4             37.62                                                44      (44 − 37.62)²/37.62 = 1.08
5             18.06                                                35      (35 − 18.06)²/18.06 = 15.89
6             7.22                                                 10
7 or more     3.48                                                  4
6 or more     7.22 + 3.48 = 10.70                                  14      (14 − 10.70)²/10.70 = 1.02
Total         300                                                 300      29.06

The expected frequency of the last class is 3.48 (< 5). This class has been combined with the previous class so that the combined expected frequency, 10.70, is greater than 5.
Calculation of number of degrees of freedom:
m = number of classes = 7
t = number of independent parameters estimated using sample data = nil
ν = m − t − 1 = 6
Inference: $\chi^2_{0.05,\,6} = 12.592$; $29.06 > \chi^2_{0.05,\,6}$
The null hypothesis is rejected at the 5% level. The given random variable does not conform to the Poisson distribution with λ = 2.4.
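
The Poisson expected frequencies of problem 37, with the classes 6 and "7 or more" pooled, can be sketched in a few lines. The probability of the pooled class is obtained as one minus the sum of the others; the variable names are illustrative, and small differences from the hand calculation arise because the code does not round the expected frequencies.

```python
from math import exp, factorial

lam, total = 2.4, 300

# Observed frequencies for 0..5 rays, then the pooled class "6 or more".
observed = [19, 48, 66, 74, 44, 35, 10 + 4]

# Poisson probabilities for 0..5, and the remaining tail P(X >= 6).
probs = [exp(-lam) * lam ** k / factorial(k) for k in range(6)]
probs.append(1 - sum(probs))

expected = [total * p for p in probs]
chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

dof = len(observed) - 0 - 1  # m - t - 1; lambda is given, not estimated
print([round(e, 2) for e in expected], round(chi2, 2), dof)
```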

38) In experiments on pea-breeding, Mendel obtained the following frequencies of seeds:


Round and yellow: 315 Round and green: 108
Wrinkled and green: 32 Wrinkled and yellow: 101
Theory predicts that the frequencies should be in the proportion 9 : 3 : 3 : 1. Examine the
correspondence between theory and experiment at the level of significance of 0.05.
Solution: $H_0$: The frequencies are in the proportion 9 : 3 : 3 : 1
α = 0.05
Calculation of the expected frequencies:
Total number of seeds in the experiment = 556
Expected number of round and yellow seeds = 556 × 9/16 = 312.75
Expected number of wrinkled and yellow seeds = 556 × 3/16 = 104.25
Expected number of round and green seeds = 556 × 3/16 = 104.25
Expected number of wrinkled and green seeds = 556 × 1/16 = 34.75
$\chi^2 = \sum_i \dfrac{(f_i - e_i)^2}{e_i} = \dfrac{(315 - 312.75)^2}{312.75} + \dfrac{(101 - 104.25)^2}{104.25} + \dfrac{(108 - 104.25)^2}{104.25} + \dfrac{(32 - 34.75)^2}{34.75} = 0.4700$
Calculation of number of degrees of freedom:
m = number of classes = 4
t = number of independent parameters estimated using sample data = nil
ν = m − t − 1 = 3
Inference: $\chi^2_{0.05,\,3} = 7.815$; $0.47 < \chi^2_{0.05,\,3}$
The null hypothesis cannot be rejected at the 5% level. The result of the experiment is consistent with the theory that the frequencies are in the proportion 9 : 3 : 3 : 1.
---------------------