
Oct 12, 2019

© All Rights Reserved


Stats inference


I The chi-square distribution

The chi-square distribution with ν degrees of freedom is a special form of the Gamma distribution with α = ν/2 and λ = ½.

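The probability density of the Gamma distribution with parameters α and λ is

$$ f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}, \qquad x > 0 $$

The mean, the variance and the moment generating function of the Gamma distribution are, respectively,

$$ \frac{\alpha}{\lambda}, \qquad \frac{\alpha}{\lambda^{2}} \qquad \text{and} \qquad \left(1 - \frac{t}{\lambda}\right)^{-\alpha} $$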

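Putting α = ν/2 and λ = ½, the probability density of the chi-square distribution with ν degrees of freedom is

$$ f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}, \qquad x > 0 $$

$$ \text{Mean} = \nu, \qquad \text{Variance} = 2\nu, \qquad M_X(t) = (1 - 2t)^{-\nu/2} $$

The chi-square distribution is positively skewed. The symbol $\chi^2_{\alpha,\nu}$ denotes the point to the right of which the area under the $\chi^2$ curve with ν degrees of freedom is α:

$$ P\left[X > \chi^2_{\alpha,\nu}\right] = \alpha $$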
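The following minimal Python sketch checks numerically that the chi-square distribution is a Gamma distribution with α = ν/2 and λ = ½. It uses only the standard library, and the helper names `gamma_pdf` and `chi2_pdf` are illustrative. It evaluates both densities at a few points and shows that the Gamma moments α/λ and α/λ² reduce to ν and 2ν.

```python
import math

def gamma_pdf(x, alpha, lam):
    """Gamma density with shape alpha and rate lam (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    return lam ** alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-lam * x)

def chi2_pdf(x, nu):
    """Chi-square density with nu degrees of freedom (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))

# The chi-square density coincides with the Gamma density for alpha = nu/2, lambda = 1/2
for nu in (1, 2, 5, 10):
    for x in (0.5, 1.0, 3.0, 8.0):
        assert math.isclose(chi2_pdf(x, nu), gamma_pdf(x, nu / 2, 0.5), rel_tol=1e-12)

# Gamma mean alpha/lambda and variance alpha/lambda**2 become nu and 2*nu
nu = 7
alpha, lam = nu / 2, 0.5
print(alpha / lam, alpha / lam ** 2)  # 7.0 14.0
```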

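II The t distribution

If Y and Z are independent random variables, Y has a $\chi^2$ distribution with ν degrees of freedom, and Z has the standard normal distribution, then the distribution of

$$ T = \frac{Z}{\sqrt{Y/\nu}} $$

is given by

$$ f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\;\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad -\infty < t < \infty $$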

The ‘t’ distribution is symmetric about t = 0. The median and the mode are at t = 0, and so is the mean whenever it exists (ν > 1).

The symbol $t_{\alpha,\nu}$ denotes the point to the right of which the area under the t curve with ν degrees of freedom is α:

$$ P\left[T > t_{\alpha,\nu}\right] = \alpha $$
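The t density above can be evaluated directly. This is a minimal standard-library sketch; `t_pdf` and `normal_pdf` are illustrative names. It works with `math.lgamma` so that the Gamma-function ratio does not overflow for large ν. The checks show the symmetry about t = 0, the heavier tails compared with N(0, 1), and the approach to N(0, 1) as ν grows.

```python
import math

def t_pdf(t, nu):
    """Density of the t distribution with nu degrees of freedom."""
    # log of Gamma((nu+1)/2) / (sqrt(pi*nu) * Gamma(nu/2)), computed via lgamma
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(math.pi * nu)
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Symmetric about t = 0
assert math.isclose(t_pdf(1.3, 5), t_pdf(-1.3, 5))
# Heavier tails than N(0, 1) for small nu
print(t_pdf(3.0, 5) > normal_pdf(3.0))                    # True
# Close to N(0, 1) for very large nu
print(abs(t_pdf(1.0, 10_000) - normal_pdf(1.0)) < 1e-4)   # True
```

With ν = 1 the formula reduces to the Cauchy density 1/(π(1 + t²)), which is a convenient sanity check.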

III The F distribution

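If U and V are independent random variables having $\chi^2$ distributions with $\nu_1$ and $\nu_2$ degrees of freedom respectively, then

$$ F = \frac{U/\nu_1}{V/\nu_2} $$

has the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom.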

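The probability density of the F distribution is

$$ g(f) = \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} f^{\frac{\nu_1}{2}-1} \left(1 + \frac{\nu_1}{\nu_2} f\right)^{-\frac{\nu_1+\nu_2}{2}}, \qquad f > 0 $$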

The F distribution is positively skewed. The symbol $f_{\alpha,\nu_1,\nu_2}$ denotes the point to the right of which the area under the F curve with $\nu_1$ and $\nu_2$ degrees of freedom is α:

$$ P\left[F > f_{\alpha,\nu_1,\nu_2}\right] = \alpha $$
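The F density can be checked against the t density through a standard fact: if T has the t distribution with ν degrees of freedom, then T² has the F distribution with 1 and ν degrees of freedom. This is a minimal standard-library sketch with illustrative names. It compares g(x) for (1, ν) degrees of freedom with the density of T², which is $f_T(\sqrt{x})/\sqrt{x}$.

```python
import math

def f_pdf(f, nu1, nu2):
    """Density of the F distribution with (nu1, nu2) degrees of freedom (zero for f <= 0)."""
    if f <= 0:
        return 0.0
    log_c = (math.lgamma((nu1 + nu2) / 2) - math.lgamma(nu1 / 2) - math.lgamma(nu2 / 2)
             + (nu1 / 2) * math.log(nu1 / nu2))
    return math.exp(log_c) * f ** (nu1 / 2 - 1) * (1 + nu1 * f / nu2) ** (-(nu1 + nu2) / 2)

def t_pdf(t, nu):
    """Density of the t distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(math.pi * nu)
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)

# T ~ t(nu)  =>  T**2 ~ F(1, nu); the density of T**2 at x is t_pdf(sqrt(x)) / sqrt(x)
nu = 6
for x in (0.2, 1.0, 4.0):
    assert math.isclose(f_pdf(x, 1, nu), t_pdf(math.sqrt(x), nu) / math.sqrt(x), rel_tol=1e-12)
```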

---------------

Tests of significance concerning the mean, the standard deviation and the proportion are dealt with in the study below. Alongside them, interval estimates of the population mean, standard deviation and proportion are derived. The following terms are used in interval estimation and in tests of significance.

Sampling Distribution and Standard Error

The statistical measures of the sample are termed ‘statistics’; the statistical measures of the population are called ‘parameters’.

The term ‘Sampling Distribution’ is used to refer to the distribution of a sample statistic.

For example, ‘the sampling distribution of mean’ is the distribution of the mean of independent

random samples of size ‘n’ drawn from a given population.

‘Standard Error’ is the standard deviation of the sampling distribution. The standard error (S.E.) of a sample statistic is used in tests of significance of difference and in giving an interval estimate for the population parameter.

INTERVAL ESTIMATION

Point estimators give a single value as an estimate of the parameter, with no indication of the probability that this estimate is acceptable. In other words, they give no idea of the size of the error in such an estimate. Interval estimation is based on a stipulated ‘confidence level’ and, as such, indicates the size of the error in the estimate. This confidence level is usually either 95% or 99%; we denote it as 1 − α. If (a, b) is the 100(1 − α)% confidence interval for the parameter θ, it means that

$$ P(a < \theta < b) = 1 - \alpha $$

TESTS OF SIGNIFICANCE

The test checks whether the observed difference between the sample statistic and the population parameter is significant, using a test statistic at a stipulated level of significance, denoted α.

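Suppose we want to test the null hypothesis

$$ H_0 : \theta = \theta_0 $$

The alternative can be any one of the following, and the nature of the test depends on it:

(a) $H_1 : \theta \neq \theta_0$: two-tailed test
(b) $H_1 : \theta > \theta_0$: right-tailed test
(c) $H_1 : \theta < \theta_0$: left-tailed test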

The following is the general procedure for testing the significance of an observed difference:

(i) Set up null hypothesis 𝐻0 of no difference and a suitable alternative hypothesis 𝐻1

(ii) Decide whether one-tailed test or two-tailed test is to be used

(iii) Decide on the level of significance α

(iv) Select the appropriate test statistic (this involves the S.E. of the sample statistic under question)

(v) Demarcate critical region in the distribution of the test statistic

(vi) Check if the calculated value of the test statistic falls within or outside the critical region

(vii) Accordingly reject / accept 𝐻0 at the level of significance α

Instead of demarcating the critical region and checking whether the calculated value of the test statistic falls within or outside it, we may also use the P-value. The P-value is the lowest level of significance at which the null hypothesis could have been rejected. More concretely, it is the probability, computed under $H_0$, that the test statistic takes a value at least as extreme as the calculated one in the direction(s) specified by $H_1$. For a one-tailed test this is the tail area beyond the calculated value; for a two-tailed test it is twice that tail area. If the P-value is less than the stipulated level of significance α, the null hypothesis is rejected at the level of significance α.
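The P-value rule for a z statistic can be sketched with the standard library's `statistics.NormalDist`. This is a minimal sketch. The function names and the alternative labels `"two-sided"`, `"greater"` and `"less"` are illustrative choices, not notation from the text.

```python
from statistics import NormalDist

def z_p_value(z, alternative):
    """P-value of a statistic that is N(0, 1) under H0.

    alternative: "two-sided" (H1: theta != theta0), "greater" (H1: theta > theta0)
    or "less" (H1: theta < theta0).
    """
    phi = NormalDist().cdf
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z)))   # twice the tail area beyond |z|
    if alternative == "greater":
        return 1 - phi(z)              # right-tail area
    if alternative == "less":
        return phi(z)                  # left-tail area
    raise ValueError(f"unknown alternative: {alternative!r}")

def reject(p_value, alpha):
    """Reject H0 at level alpha when the P-value is below alpha."""
    return p_value < alpha

print(round(z_p_value(-2.0, "less"), 4))        # 0.0228
print(round(z_p_value(1.96, "two-sided"), 3))   # 0.05
```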

INTERVAL ESTIMATION AND TESTS CONCERNING MEAN

I Mean of a single sample

Let a random sample of size n be taken from an infinite population, and let μ and σ² denote the mean and variance of the population. Let $\bar{X}$ be the mean of the sample. The distribution of the sample mean $\bar{X}$, called the ‘sampling distribution of the sample mean’, has mean μ and variance $\sigma^2/n$.

The null hypothesis to be tested is

$$ H_0 : \mu = \mu_0 $$

against the alternative, which could be any one of

μ ≠ μ0    μ > μ0    μ < μ0.

Case 1: Population known to be normal with known variance σ². No condition on n.

$$ S.E._{\bar{X}} = \frac{\sigma}{\sqrt{n}}, \qquad \text{Test statistic} = z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \quad [\text{under } H_0] $$

The 100(1 − α)% confidence interval for μ is

$$ \bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} $$

The term $z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the maximum error under this case. With probability 1 − α, the difference between the sample mean and the population mean does not exceed it.

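Case 2: Population not known to be normal; large sample (n ≥ 30).

$$ S.E._{\bar{X}} = \frac{\sigma}{\sqrt{n}}, \qquad \text{Test statistic} = z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \text{ (approximately) if } \sigma \text{ is known} \quad [\text{under } H_0] $$

$$ S.E._{\bar{X}} = \frac{s}{\sqrt{n}}, \qquad \text{Test statistic} = z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim N(0,1) \text{ (approximately) if } \sigma \text{ is not known} \quad [\text{under } H_0] $$

where $s^2$ is the sample variance given by

$$ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} $$

The 100(1 − α)% confidence interval for μ is

$$ \bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \qquad \text{or} \qquad \bar{X} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}} $$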

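Case 3: Population normal; σ² not known; n < 30.

$$ S.E._{\bar{X}} = \frac{s}{\sqrt{n}}, \qquad \text{Test statistic} = t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t \text{ distribution with } n - 1 \text{ degrees of freedom} \quad [\text{under } H_0] $$

The 100(1 − α)% confidence interval for μ is

$$ \bar{X} \pm t_{\alpha/2,\,n-1}\left(\frac{s}{\sqrt{n}}\right) $$

The term $t_{\alpha/2,\,n-1}\,(s/\sqrt{n})$ is called the maximum error under this case.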

√

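II Difference between the means of two samples

Two independent random samples of sizes n1 and n2 are taken from two populations with means denoted by μ1, μ2 and variances denoted by σ1², σ2². The sample means are $\bar{X}_1$ and $\bar{X}_2$.

$$ E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2, \qquad \text{Variance}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} $$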

The null hypothesis to be tested is 𝐻0 : μ1 − μ2 = δ, against the alternative, which could be any one of

μ1 − μ2 ≠ δ μ1 − μ2 > δ μ1 − μ2 < δ.

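Case 1: Populations known to be normal with known variances σ1², σ2². No condition on n1 and n2.

$$ S.E._{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, \qquad \text{Test statistic} = z = \frac{\bar{X}_1 - \bar{X}_2 - \delta}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1) \quad [\text{under } H_0] $$

The 100(1 − α)% confidence interval for μ1 − μ2 is

$$ \bar{X}_1 - \bar{X}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$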

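Case 2: Populations not known to be normal; n1 and n2 both ≥ 30.

If σ1², σ2² are known, use the same procedure as in Case 1. If σ1², σ2² are not known, substitute the sample variances s1² and s2² for σ1², σ2² and proceed as in Case 1. The resulting z is approximately N(0, 1).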

Case 3: Populations normal; σ1², σ2² not known; either or both of n1 and n2 < 30.

Let the samples be $\{X_{1i} : i = 1, 2, \ldots, n_1\}$ and $\{X_{2j} : j = 1, 2, \ldots, n_2\}$.

Assume σ1 = σ2 = σ. σ is estimated by pooling the squared deviations from the means of the two samples, and the pooled estimate of σ is denoted $S_P$, where

$$ S_P^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{\sum (x_{1i} - \bar{x}_1)^2 + \sum (x_{2j} - \bar{x}_2)^2}{n_1 + n_2 - 2} $$

$$ S.E._{\bar{X}_1 - \bar{X}_2} = S_P\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

$$ \text{Test statistic} = t = \frac{(\bar{X}_1 - \bar{X}_2) - \delta}{S_P\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t \text{ distribution with } n_1 + n_2 - 2 \text{ degrees of freedom} \quad [\text{under } H_0] $$

The 100(1 − α)% confidence interval for μ1 − μ2 is

$$ (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\left[S_P\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right] $$

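| α | $z_{\alpha/2}$ | $z_{\alpha}$ | $-z_{\alpha}$ |
|---|---|---|---|
| 5% | 1.96 | 1.645 | −1.645 |
| 2% | 2.327 | 2.055 | −2.055 |
| 1% | 2.575 | 2.327 | −2.327 |
| Criterion for rejecting $H_0$ | │z│ > $z_{\alpha/2}$ or │t│ > $t_{\alpha/2,\nu}$ | z > $z_{\alpha}$ or t > $t_{\alpha,\nu}$ | z < $-z_{\alpha}$ or t < $-t_{\alpha,\nu}$ |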

1) Test at the 0.05 level of significance whether the mean of a random sample of size n = 16 is

significantly less than 10, if the distribution from which the sample was taken is normal; 𝑥̅ = 8.4, σ =

3.2.

Solution: 𝐻0 : 𝜇 = 10 𝐻1 : 𝜇 < 10 (left-tailed) 𝛼 = 0.05

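Population is normal; σ is known; standard normal variable z is used as the test statistic.

$$ z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{8.4 - 10}{3.2/\sqrt{16}} = \frac{(-1.6) \times 4}{3.2} = -2 $$

Since z = −2 < $-z_{0.05}$ = −1.645, the calculated value falls in the critical region.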

Inference: There is sufficient reason to reject the null hypothesis at the 0.05 level of

significance. The sample mean is significantly less than 10.

If P-value is used to draw the inference, the steps are as follows:

The P-value corresponding to the value −2 of z is read from the normal tables as 0.0228.

0.0228 < 0.05

Hence, reject the null hypothesis at a significance level of 0.05.

2) A medical research worker finds that the mean of a random sample of size 120 of readings of

blood pressure of women in their fifties is 141.8 mm of Hg. Based on experience, he knows that σ =

10.5 mm of Hg. Do the sample data support the claim that the mean blood pressure of women in

their fifties is 140 mm of Hg at a level of significance of 5%? Also find the (a) 95% (b) 98%

confidence limits for μ, the mean blood pressure of women in their fifties.

Solution: 𝐻0 : 𝜇 = 140 𝐻1 : 𝜇 ≠ 140 (two-tailed) 𝛼 = 0.05

Size of the sample is more than 30; standard normal variable z is used as the test statistic.

$$ |z| = \frac{|\bar{X} - \mu_0|}{\sigma/\sqrt{n}} = \frac{141.8 - 140}{10.5/\sqrt{120}} = \frac{(1.8)\sqrt{120}}{10.5} = 1.878 $$

$z_{\alpha/2} = z_{0.025} = 1.96$; │z│ < $z_{\alpha/2}$

Inference: The null hypothesis cannot be rejected at 5% level of significance. The claim that

the mean blood pressure of women in their fifties is 140 mm of Hg can be accepted.

If P-value is used to draw the inference, the steps are as follows:

The P-value corresponding to the value 1.878 (≈ 1.88) of z is read from the normal tables as

0.0602.

0.0602 > 0.05

Hence, accept the null hypothesis.

The 95% confidence limits for μ are

$$ \bar{X} \pm z_{0.025}\frac{\sigma}{\sqrt{n}} = 141.8 \pm (1.96)\left(\frac{10.5}{\sqrt{120}}\right) = (139.9,\ 143.7) \text{ mm of Hg} $$

Note: The claimed value of 140 mm of Hg according to 𝐻0 is within the 95% confidence limits for μ

and it can be inferred that the claim can be accepted.

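The 98% confidence limits for μ are

$$ \bar{X} \pm z_{0.01}\frac{\sigma}{\sqrt{n}} = 141.8 \pm (2.327)\left(\frac{10.5}{\sqrt{120}}\right) = 141.8 \pm 2.23 = (139.6,\ 144.0) \text{ mm of Hg} $$

As expected, the 98% interval is wider than the 95% interval; it also contains the claimed value of 140 mm of Hg.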
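Recomputing Question 2 numerically is a quick check on the hand arithmetic. This is a minimal standard-library sketch. `z_interval` is an illustrative helper, and the critical values 1.96 and 2.327 are the table values used in the text.

```python
import math
from statistics import NormalDist

xbar, mu0, sigma, n = 141.8, 140.0, 10.5, 120
se = sigma / math.sqrt(n)                        # S.E. of the sample mean
z = (xbar - mu0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-tailed P-value

def z_interval(center, se, z_crit):
    """center +/- z_crit * se"""
    return (center - z_crit * se, center + z_crit * se)

ci95 = z_interval(xbar, se, 1.96)    # z_{0.025}
ci98 = z_interval(xbar, se, 2.327)   # z_{0.01} as tabulated in the text
print(round(z, 3))                                                          # 1.878
print(tuple(round(v, 1) for v in ci95), tuple(round(v, 1) for v in ci98))   # (139.9, 143.7) (139.6, 144.0)
```

Both intervals are centred on 141.8. The 98% half-width, 2.327 × 10.5/√120 ≈ 2.23, places the limits at about 139.6 and 144.0.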

3) The security department of a factory wants to know whether the true average time required by the

night guard to walk his round is 30 minutes. If in a random sample of 32 rounds, the night guard

averaged 30.8 minutes with a standard deviation of 2.1 minutes, determine whether this is sufficient

evidence to reject the null hypothesis μ = 30 minutes in favour of the alternative hypothesis μ ≠ 30

minutes at (a) 0.01 (b) 0.05 levels of significance.

Solution: 𝐻0 : 𝜇 = 30 𝐻1 : 𝜇 ≠ 30 (two-tailed) 𝛼 = 0.01, 0.05

Size of the sample is more than 30; standard normal variable z is used as the test statistic. σ is not known; use the sample standard deviation s in place of σ.

$$ |z| = \frac{|\bar{X} - \mu_0|}{s/\sqrt{n}} = \frac{30.8 - 30}{2.1/\sqrt{32}} = \frac{(0.8)\sqrt{32}}{2.1} = 2.155 $$

$z_{0.005} = 2.575 > |z|$ and $z_{0.025} = 1.96 < |z|$.

Inference: The null hypothesis cannot be rejected at 1% level of significance; but it can be

rejected at 5% level of significance.

The P-value corresponding to the value 2.155 (≈ 2.16) of z is read from the normal tables as

0.0308.

0.0308 > 0.01 0.0308 < 0.05

𝐻0 is accepted at 1% level of significance; but it is rejected at 5% level of significance.

4) Five measurements of the tar content of a certain kind of cigarette yielded 14.5, 14.2, 14.4, 14.3,

14.6 mg/cigarette. Assume that the data are a random sample from a normal population.

(a) Find if there is reason enough to reject the null hypothesis μ = 14.0 in favour of the alternative μ

≠ 14.0.

(b) What would be the inference if the null hypothesis is μ = 14.3 and the alternative is μ ≠14.3?

(c) What would be the inference if the null hypothesis is μ = 14.2 and the alternative is μ > 14.2?

Use 0.05 level of significance.

Solution: Population normal; σ unknown; n < 30

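Sample mean = $\bar{x}$ = 14.4

$$ \sum (x_i - \bar{x})^2 = 0.10, \qquad s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{0.10}{4} = 0.025 $$

(a) $H_0: \mu = 14.0$, $H_1: \mu \neq 14.0$ (two-tailed); $t_{0.025,4} = 2.776$

$$ \text{Test statistic} = |t| = \frac{|\bar{X} - \mu_0|}{s/\sqrt{n}} = \frac{0.4}{\sqrt{0.025}/\sqrt{5}} = 5.657 > t_{0.025,4} $$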

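Inference: There is reason enough to reject the null hypothesis at 5% level of significance. The mean tar content of cigarettes in the population is not 14.0 mg / cigarette.

(b) $H_0: \mu = 14.3$, $H_1: \mu \neq 14.3$ (two-tailed)

$$ \text{Test statistic} = |t| = \frac{|\bar{X} - \mu_0|}{s/\sqrt{n}} = \frac{0.1}{\sqrt{0.025}/\sqrt{5}} = 1.414 < t_{0.025,4} = 2.776 $$

Inference: The null hypothesis cannot be rejected at 5% level of significance. The mean tar content of cigarettes in the population could be taken as 14.3 mg / cigarette.

(c) $H_0: \mu = 14.2$, $H_1: \mu > 14.2$ (right-tailed); $t_{0.05,4} = 2.132$

$$ \text{Test statistic} = t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{0.2}{\sqrt{0.025}/\sqrt{5}} = 2.828 > t_{0.05,4} $$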

Inference: The null hypothesis is rejected in favour of the alternative hypothesis at 5% level

of significance. The mean tar content of cigarettes in the population could be taken as more than

14.2 mg / cigarette.
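The three t statistics of Question 4 can be reproduced from the raw data with the standard `statistics` module, whose `stdev` uses the n − 1 divisor as in the text. This is a minimal sketch; the critical values $t_{0.025,4} = 2.776$ and $t_{0.05,4} = 2.132$ are typed in from t tables, since the standard library has no t quantile function.

```python
import math
import statistics

tar = [14.5, 14.2, 14.4, 14.3, 14.6]
n = len(tar)
xbar = statistics.mean(tar)
s = statistics.stdev(tar)          # sample S.D. with divisor n - 1
se = s / math.sqrt(n)

def t_stat(mu0):
    """One-sample t statistic for H0: mu = mu0."""
    return (xbar - mu0) / se

T_025_4, T_05_4 = 2.776, 2.132     # table values, 4 degrees of freedom
print(round(xbar, 2), round(s ** 2, 4))                             # 14.4 0.025
print(round(abs(t_stat(14.0)), 3), abs(t_stat(14.0)) > T_025_4)    # 5.657 True
print(round(abs(t_stat(14.3)), 3), abs(t_stat(14.3)) > T_025_4)    # 1.414 False
print(round(t_stat(14.2), 3), t_stat(14.2) > T_05_4)                # 2.828 True
```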

5) The length of the skulls of 10 fossil skeletons of an extinct species of bird has a mean of 5.68 cm

and a standard deviation of 0.29 cm. Assume that such measurements are normally distributed and

find a 95% confidence interval for the mean length of the skulls of this species of bird.

Solution: Population normal; σ unknown; n < 30

Sample mean = 𝑥̅ = 5.68 cm Sample S.D. = s = 0.29

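The 95% confidence limits for μ, the mean length of the skulls, are

$$ \bar{x} \pm t_{\alpha/2,\,n-1}\left(\frac{s}{\sqrt{n}}\right) = \bar{x} \pm t_{0.025,9}\left(\frac{s}{\sqrt{n}}\right) = 5.68 \pm (2.262)\frac{0.29}{\sqrt{10}} = 5.68 \pm 0.207 $$

that is, (5.47, 5.89) cm.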

6) In a study of television viewing habits, it is desired to estimate the average number of hours that

teenagers spend watching per week. If it is reasonable to assume that σ = 3.2 hours, how large a

sample is needed so that it will be possible to assert with 95% confidence that the sample mean is off

the true mean by less than 20 minutes?

Solution: σ = 3.2 hours α = 0.05

It is required to find the least value of the sample size n.

The difference between the sample mean 𝑥̅ and the true mean μ should not exceed 20 minutes.

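$$ |\bar{x} - \mu| < \frac{1}{3} \text{ hour} $$

Let us work with the assumption that n will turn out to be ≥ 30.

The maximum error (maximum difference between $\bar{x}$ and μ) is

$$ z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = z_{0.025}\left(\frac{3.2}{\sqrt{n}}\right) = \frac{(1.96)(3.2)}{\sqrt{n}} $$

This term should be less than 1/3 hour:

$$ \frac{(1.96)(3.2)}{\sqrt{n}} < \frac{1}{3} \;\Rightarrow\; \sqrt{n} > (1.96)(3.2)(3) = 18.816 \;\Rightarrow\; n > 354.04 $$

This answer tallies with the assumption that n ≥ 30.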

The conclusion is that size of the sample should be at least 355 in order that the sample mean is off

the true mean by less than 20 minutes.
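The sample-size condition $z_{\alpha/2}\,\sigma/\sqrt{n} < E$ rearranges to $n > (z_{\alpha/2}\,\sigma/E)^2$. This is a minimal sketch of that rearrangement; `n_for_mean` is an illustrative name, and the large-sample z case is assumed, as in the solution.

```python
import math

def n_for_mean(sigma, max_error, z_crit):
    """Smallest whole n with z_crit * sigma / sqrt(n) < max_error (large-sample z case)."""
    bound = (z_crit * sigma / max_error) ** 2   # n must strictly exceed this value
    return math.floor(bound) + 1, bound

n, bound = n_for_mean(sigma=3.2, max_error=1 / 3, z_crit=1.96)   # 20 minutes = 1/3 hour
print(round(bound, 2), n)   # 354.04 355
```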

7) An experiment is performed to determine whether the average nicotine content of one kind of

cigarette exceeds that of another kind by 0.20 mg. A random sample of size 50 cigarettes of the first

kind had an average nicotine content of 2.61 mg with a standard deviation of 0.12 mg. Another

random sample of size 40 cigarettes of the second kind had an average nicotine content of 2.38 mg

with a standard deviation of 0.14 mg. Test the null hypothesis μ1 – μ2 = 0.20 against the alternative

hypothesis μ1 – μ2 ≠ 0.20 at the 0.05 level of significance. Also check the decision on the P-value

corresponding to the value of the appropriate test statistic.

Solution:

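Population distributions not known; σ1², σ2² unknown; n1, n2 both > 30

H0: μ1 – μ2 = 0.20   H1: μ1 – μ2 ≠ 0.20 (two-tailed)   α = 0.05

$$ \text{Test statistic} = z = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim N(0,1) $$

$$ |z| = \frac{2.61 - 2.38 - 0.20}{\sqrt{\frac{(0.12)^2}{50} + \frac{(0.14)^2}{40}}} = 1.08 < z_{0.025} = 1.96 $$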

The null hypothesis cannot be rejected at 5% level of significance.

The P-value corresponding to z = 1.08 is 0.2802. This is more than 0.05. So, the null

hypothesis cannot be rejected at 5% level of significance.
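The large-sample two-sample z statistic can be computed with a small helper. This is a minimal standard-library sketch; `two_sample_z` is an illustrative name, and s1, s2 stand in for σ1, σ2 as in the solution. It also applies to Question 9 (left-tailed, δ = −0.5). Using the unrounded z, the two-tailed P-value for Question 7 comes out near 0.28, in line with the 0.2802 read from tables at z = 1.08.

```python
import math
from statistics import NormalDist

def two_sample_z(x1, s1, n1, x2, s2, n2, delta=0.0):
    """Large-sample z for H0: mu1 - mu2 = delta, with s1, s2 in place of sigma1, sigma2."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (x1 - x2 - delta) / se, se

# Question 7: two-tailed test
z7, se7 = two_sample_z(2.61, 0.12, 50, 2.38, 0.14, 40, delta=0.20)
p7 = 2 * (1 - NormalDist().cdf(abs(z7)))
print(round(se7, 5), round(z7, 2))   # 0.02789 1.08

# Question 9: left-tailed test
z9, _ = two_sample_z(53.8, 2.4, 400, 54.5, 2.5, 500, delta=-0.5)
p9 = NormalDist().cdf(z9)
print(round(z9, 2))                  # -1.22
```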

8) With reference to Question 7, for what values of 𝑥̅1 − 𝑥̅2 would the null hypothesis have been

rejected? Also find the probability of type II error with the given criterion if

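(a) μ1 – μ2 = 0.12 (b) μ1 – μ2 = 0.16

Solution: The null hypothesis is rejected if

$$ \frac{|\bar{x}_1 - \bar{x}_2 - \delta|}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} > 1.96, \qquad \text{where} \qquad \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{(0.12)^2}{50} + \frac{(0.14)^2}{40}} = 0.02789 $$

Since (1.96)(0.02789) = 0.0547 ≈ 0.055,

H0 is rejected if $(\bar{x}_1 - \bar{x}_2) - 0.20 > 0.055$ OR $(\bar{x}_1 - \bar{x}_2) - 0.20 < -0.055$

H0 is rejected if $(\bar{x}_1 - \bar{x}_2) > 0.255$ OR $(\bar{x}_1 - \bar{x}_2) < 0.145$

H0 is rejected if $(\bar{x}_1 - \bar{x}_2)$ does not lie between 0.145 and 0.255

(a) It is required to find β if the true value of μ1 − μ2 is 0.12. Then

$$ \frac{(\bar{x}_1 - \bar{x}_2) - 0.12}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1) $$

and the standard error is again estimated by 0.02789.

β = P [accepting H0 / H0 is false] = P [0.145 < $(\bar{x}_1 - \bar{x}_2)$ < 0.255]

$$ = P\left[\frac{0.145 - 0.12}{0.02789} < \frac{(\bar{x}_1 - \bar{x}_2) - 0.12}{0.02789} < \frac{0.255 - 0.12}{0.02789}\right] = P[0.896 < z < 4.840] \approx 1 - 0.815 = 0.185 $$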

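(b) It is required to find β if the true value of μ1 − μ2 is 0.16. Now

$$ \frac{(\bar{x}_1 - \bar{x}_2) - 0.16}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1) $$

β = P [accepting H0 / H0 is false] = P [0.145 < $(\bar{x}_1 - \bar{x}_2)$ < 0.255]

$$ = P\left[\frac{0.145 - 0.16}{0.02789} < \frac{(\bar{x}_1 - \bar{x}_2) - 0.16}{0.02789} < \frac{0.255 - 0.16}{0.02789}\right] = P[-0.538 < z < 3.406] \approx 0.9997 - 0.2953 = 0.704 $$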
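The type II error probabilities can be computed directly: β is the probability that $\bar{x}_1 - \bar{x}_2$ lands in the acceptance region when the true difference is not 0.20. This is a minimal sketch using `statistics.NormalDist(mu, sigma)`. The acceptance limits 0.145 and 0.255 are the rounded values from the solution, and the estimated standard error 0.02789 is treated as known (an assumption carried over from the hand calculation).

```python
import math
from statistics import NormalDist

# Estimated S.E. of xbar1 - xbar2 from Question 7 (s1, s2 in place of sigma1, sigma2)
se = math.sqrt(0.12 ** 2 / 50 + 0.14 ** 2 / 40)

# Acceptance region for xbar1 - xbar2, rounded as in the solution: 0.20 -/+ 0.055
LOW, HIGH = 0.145, 0.255

def beta(true_diff):
    """Type II error: P(LOW < Xbar1 - Xbar2 < HIGH) when mu1 - mu2 = true_diff."""
    dist = NormalDist(mu=true_diff, sigma=se)
    return dist.cdf(HIGH) - dist.cdf(LOW)

for d in (0.12, 0.16):
    print(d, round(beta(d), 3))
```

β is much larger at 0.16 than at 0.12 because 0.16 lies closer to the hypothesised 0.20.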

9) Sample surveys conducted in a large county in a certain year and again 20 years later showed that

originally the average height of 400 ten-year-old boys was 53.8 inches with a standard deviation of

2.4 inches, whereas 20 years later, the average height of 500 ten-year-old boys was 54.5 inches with

a standard deviation of 2.5 inches. At a level of significance of 0.05, test the null hypothesis μ1 – μ2

= − 0.5 against the alternative hypothesis μ1 – μ2 < − 0.5

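Solution: Population distributions not known; σ1², σ2² unknown; samples independent; n1, n2 both > 30

H0: μ1 – μ2 = − 0.5   H1: μ1 – μ2 < − 0.5 (left-tailed)   α = 0.05

$$ \text{Test statistic} = z = \frac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim N(0,1) $$

$$ z = \frac{(53.8 - 54.5) - (-0.5)}{\sqrt{\frac{(2.4)^2}{400} + \frac{(2.5)^2}{500}}} = -1.22 > -z_{0.05} = -1.645 $$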

We cannot reject the null hypothesis at a level of significance of 0.05. The data do not show that the average height of ten-year-old boys increased by more than 0.5 inch over the 20 years.

10) With reference to Question 9, base your decision on the P-value for the test statistic calculated.

Solution: z = −1.22; P-value corresponding to −1.22 is 0.1112.

Inference: P-value > 0.05. Accept the null hypothesis at a level of significance of 0.05.

11) To compare two kinds of bumper guards, six of each kind were mounted on a certain make of

compact car. Then each car was run into a concrete wall at 5 miles per hour, and the following are

the costs of the repairs (in dollars):

Bumper guard 1: 127 168 143 165 122 139

Bumper guard 2: 154 135 132 171 153 149

Test at a level of significance of 0.01 whether the difference between the means of these two

samples is significant.

Solution: Population distributions not known; σ1², σ2² unknown; samples independent; n1, n2 both < 30

Assume that the populations are normal and have equal variance σ². The pooled estimate of σ² is $S_P^2$.

H0: μ1 = μ2   H1: μ1 ≠ μ2 (two-tailed)   α = 0.01

$$ \text{Test statistic} = T = \frac{\bar{X}_1 - \bar{X}_2}{S_P\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t \text{ distribution with } n_1 + n_2 - 2 \text{ degrees of freedom} $$

$$ S_P^2 = \frac{\sum (x_{1i} - \bar{x}_1)^2 + \sum (x_{2j} - \bar{x}_2)^2}{n_1 + n_2 - 2} $$

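Calculations:

Sample 1: $\bar{x}_1 = 864/6 = 144$

| $x_{1i}$ | 127 | 168 | 143 | 165 | 122 | 139 |
|---|---|---|---|---|---|---|
| $x_{1i} - \bar{x}_1$ | −17 | 24 | −1 | 21 | −22 | −5 |
| $(x_{1i} - \bar{x}_1)^2$ | 289 | 576 | 1 | 441 | 484 | 25 |

$\sum (x_{1i} - \bar{x}_1)^2 = 1816$

Sample 2: $\bar{x}_2 = 894/6 = 149$

| $x_{2j}$ | 154 | 135 | 132 | 171 | 153 | 149 |
|---|---|---|---|---|---|---|
| $x_{2j} - \bar{x}_2$ | 5 | −14 | −17 | 22 | 4 | 0 |
| $(x_{2j} - \bar{x}_2)^2$ | 25 | 196 | 289 | 484 | 16 | 0 |

$\sum (x_{2j} - \bar{x}_2)^2 = 1010$

$$ S_P^2 = \frac{1816 + 1010}{10} = 282.6 $$

$$ |t| = \frac{|\bar{x}_1 - \bar{x}_2|}{S_P\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{5}{\sqrt{282.6}\,\sqrt{\frac{1}{6} + \frac{1}{6}}} = 0.515 < t_{0.005,10} = 3.169 $$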

We cannot reject the null hypothesis at a level of significance of 0.01. The difference between the

means of these two samples is not significant.
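The pooled two-sample t statistic can be recomputed from the raw repair costs. This minimal standard-library sketch also confirms that 24² = 576, so the first sum of squares is 1816. The critical value $t_{0.005,10} = 3.169$ is typed in from t tables, and `sum_sq_dev` is an illustrative helper.

```python
import math

guard1 = [127, 168, 143, 165, 122, 139]
guard2 = [154, 135, 132, 171, 153, 149]

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

n1, n2 = len(guard1), len(guard2)
ss1, ss2 = sum_sq_dev(guard1), sum_sq_dev(guard2)
sp2 = (ss1 + ss2) / (n1 + n2 - 2)                  # pooled variance S_P^2
diff = sum(guard1) / n1 - sum(guard2) / n2          # xbar1 - xbar2
t = diff / (math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2))

T_005_10 = 3.169   # t_{0.005,10} from tables
print(ss1, ss2, sp2)                   # 1816.0 1010.0 282.6
print(round(t, 3), abs(t) > T_005_10)  # -0.515 False
```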

12) The following are the average weekly losses of work-hours due to accidents in 10 industrial

plants before and after a certain safety program was put into operation:

45 and 36, 73 and 60, 46 and 44, 124 and 119, 33 and 35,

57 and 51, 83 and 77, 34 and 29, 26 and 24, 17 and 11

Test whether the safety program is effective at a level of significance of 0.05.

Solution: Population distributions not known; σ1², σ2² unknown; samples NOT independent; n1, n2 both < 30

Since the samples are not independent, the difference between the population means cannot

be tested using the usual procedure. The random variable under question is: difference between each

pair of observations.

X denotes the difference between each pair of observations.

X = (average weekly losses of work-hours due to accidents before the safety program)

– (average weekly losses of work-hours due to accidents after the safety program)

Let μ denote the mean of the population of X. The null hypothesis is that there is no improvement after the safety program (μ = 0). The alternative hypothesis is that there is a significant decrease in the average weekly losses of work-hours due to accidents after the safety program (μ > 0). It is now a test of significance of the difference between the mean of a sample and the mean of the population.

𝐻0 : 𝜇 = 0 𝐻1 : 𝜇 > 0 (right-tailed) 𝛼 = 0.05

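Population of X is assumed to be normal. Sample size n = 10 (< 30).

$$ \text{Test statistic} = t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t \text{ distribution with } n - 1 = 9 \text{ degrees of freedom} $$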

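Calculations:

| X | 9 | 13 | 2 | 5 | −2 | 6 | 6 | 5 | 2 | 6 |
|---|---|---|---|---|---|---|---|---|---|---|
| X² | 81 | 169 | 4 | 25 | 4 | 36 | 36 | 25 | 4 | 36 |

∑x = 52, ∑x² = 420, $\bar{X}$ = 5.2

$$ s^2 = \frac{10}{9}\left[\frac{420}{10} - \left(\frac{52}{10}\right)^2\right] = 16.62 $$

$$ t = \frac{5.2 - 0}{\sqrt{16.62}/\sqrt{10}} = 4.033 > t_{0.05,9} = 1.833 $$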

Reject 𝐻0 at 0.05 level. We have sufficient reason to believe that the safety program is effective at a

level of significance of 0.05.
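The paired test reduces to a one-sample t test on the differences X = before − after. This is a minimal sketch using the standard `statistics` module. The critical value $t_{0.05,9} = 1.833$ is typed in from t tables.

```python
import math
import statistics

before = [45, 73, 46, 124, 33, 57, 83, 34, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 24, 11]

d = [b - a for b, a in zip(before, after)]   # X = before - after, one value per plant
n = len(d)
dbar = statistics.mean(d)
s2 = statistics.variance(d)                   # sample variance with divisor n - 1
t = (dbar - 0) / math.sqrt(s2 / n)            # H0: mu = 0

T_05_9 = 1.833   # t_{0.05,9} from tables (right-tailed test)
print(d)                                             # [9, 13, 2, 5, -2, 6, 6, 5, 2, 6]
print(dbar, round(s2, 2), round(t, 3), t > T_05_9)   # 5.2 16.62 4.033 True
```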

13) A study of two kinds of photocopying equipment shows that 61 failures of the first kind of

equipment took on the average 80.7 minutes to repair with a standard deviation of 19.4 minutes,

while 61 failures of the second kind of equipment took on the average 88.1 minutes to repair with a

standard deviation of 18.8 minutes. Find a 99% confidence interval for the difference between the

true average amounts of time it takes to repair failures of the two kinds of photocopying equipment.

Solution: n1 = 61 n2 = 61

𝑋̅1 = 80.7 𝑋̅2 = 88.1

𝑠1 = 19.4 𝑠2 = 18.8

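Populations not known to be normal; σ1 and σ2 not known; n1 and n2 > 30

Use $s_1$ and $s_2$ in place of σ1 and σ2.

The 99% confidence interval for (μ1 – μ2) is

$$ (\bar{x}_1 - \bar{x}_2) \pm z_{0.005}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (80.7 - 88.1) \pm (2.575)\sqrt{\frac{(19.4)^2}{61} + \frac{(18.8)^2}{61}} = -7.4 \pm 8.91 $$

that is, (−16.31, 1.51) minutes.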

14) Twelve randomly selected mature citrus trees of one variety have a mean height of 13.8 feet with

a standard deviation of 1.2 feet and fifteen randomly selected mature citrus trees of another variety

have a mean height of 12.9 feet with a standard deviation of 1.5 feet. Assuming that the random

samples were selected from normal populations with equal variances, construct a 95% confidence

interval for the difference between the true average heights of the two kinds of citrus trees.

Solution: n1 =12 n2 = 15

𝑋̅1 = 13.8 𝑋̅2 = 12.9

𝑠1 = 1.2 𝑠2 = 1.5

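Populations normal; σ1 = σ2 (value not known); n1 and n2 < 30

The 95% confidence interval for (μ1 – μ2) is

$$ (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\left[S_P\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right] $$

$$ S_P^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{11(1.2)^2 + 14(1.5)^2}{12 + 15 - 2} = 1.8936 = (1.376)^2 $$

$$ (13.8 - 12.9) \pm t_{0.025,25}\left[(1.376)\sqrt{\frac{1}{12} + \frac{1}{15}}\right] = (0.9) \pm (2.06)\left[(1.376)\sqrt{\frac{1}{12} + \frac{1}{15}}\right] = 0.9 \pm 1.10 $$

that is, (−0.20, 2.00) feet.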
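The pooled-variance interval needs only the summary statistics. This is a minimal sketch; `pooled_t_interval` is an illustrative name, and the critical value $t_{0.025,25} = 2.060$ is typed in from t tables.

```python
import math

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for mu1 - mu2 assuming normal populations with equal variances."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)   # pooled variance
    half = t_crit * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return sp2, (diff - half, diff + half)

sp2, (lo, hi) = pooled_t_interval(13.8, 1.2, 12, 12.9, 1.5, 15, t_crit=2.060)   # t_{0.025,25}
print(round(sp2, 4), round(lo, 2), round(hi, 2))   # 1.8936 -0.2 2.0
```

The interval contains 0, so these data alone do not settle which variety is taller on average.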

INTERVAL ESTIMATION AND TESTS CONCERNING PROPORTION

Here we deal with discrete populations such as the binomial, Poisson and so on. The size of the sample is assumed to be ≥ 30.

I Single sample proportion

A random sample of size n (> 30) is taken from a population. The proportion of success in

the population is denoted by P. The proportion of success in the sample is p. The null hypothesis to

be tested is

𝐻0 : P = P0

against the alternative which could be any one of

P ≠ P0 P > P0 P < P0

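$$ S.E._p = \sqrt{\frac{P_0(1 - P_0)}{n}}, \qquad \text{Test statistic} = z = \frac{p - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}} \sim N(0,1) \quad [\text{under } H_0] $$

The 100(1 − α)% confidence interval for P is

$$ p \pm z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} $$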

II Difference between two sample proportions

Two independent random samples of sizes n1 and n2 (both > 30) are taken from two independent populations with proportions of success denoted by P1 and P2. The proportions of success in the samples are p1 and p2. The null hypothesis to be tested is

$$ H_0 : P_1 = P_2 $$

against the alternative, which could be any one of

P1 ≠ P2 P1 > P2 P1 < P2

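First calculate the pooled proportion $\hat{P}$ using the formula

$$ \hat{P} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} $$

$$ \text{Test statistic} = z = \frac{p_1 - p_2}{\sqrt{\hat{P}(1 - \hat{P})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0,1) \quad [\text{under } H_0] $$

The 100(1 − α)% confidence interval for P1 − P2 is

$$ (p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} $$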

15) The manufacturer of a spot remover claims that his product removes 90 percent of all spots. If, in

a random sample, only 174 of 200 spots were removed with the manufacturer’s product, test the null

hypothesis P = 0.90 against the alternative hypothesis P < 0.90 at the 5% level of significance.

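Solution: $H_0 : P = 0.90$   $H_1 : P < 0.90$ (left-tailed)   α = 0.05

$$ p = \frac{174}{200} = 0.87, \qquad n = 200 $$

$$ \text{Test statistic} = z = \frac{p - P_0}{\sqrt{\frac{P_0(1 - P_0)}{n}}} = \frac{0.87 - 0.90}{\sqrt{\frac{(0.9)(0.1)}{200}}} = -1.41 > -z_{0.05} = -1.645 $$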

There is no reason to reject the null hypothesis. The manufacturer’s claim that his product removes

90 percent of all spots cannot be rejected.

16) In random samples, 74 out of 250 persons who watched a certain television program on a small

TV set and 92 out of 250 persons who watched the same program on a large TV set remembered 2

hours later what products were advertised. Test the null hypothesis that there is no difference

between the two populations at a level of significance of 0.01.

Solution: 𝐻0 : 𝑃1 = 𝑃2 𝐻1 : 𝑃1 ≠ 𝑃2 (two-tailed) 𝛼 = 0.01

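$$ n_1 = 250, \quad n_2 = 250, \quad p_1 = \frac{74}{250} = 0.296, \quad p_2 = \frac{92}{250} = 0.368 $$

$$ \text{Test statistic} = z = \frac{p_1 - p_2}{\sqrt{\hat{P}(1 - \hat{P})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$

$$ \hat{P} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{74 + 92}{250 + 250} = 0.332 $$

$$ |z| = \frac{|0.296 - 0.368|}{\sqrt{(0.332)(0.668)\left(\frac{1}{250} + \frac{1}{250}\right)}} = 1.71 < z_{0.005} = 2.575 $$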

There is no reason to reject the null hypothesis at a level of significance of 0.01. The data do not show a significant difference between the proportions of viewers who remember the advertised products after watching the program on a small set and on a large set.
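Both proportion tests in this section (Questions 15 and 16) can be sketched in a few lines. This is a minimal sketch with illustrative function names. Note the different standard errors: the single-sample test uses the null value $P_0$, and the two-sample test uses the pooled $\hat{P}$. The critical values 1.645 and 2.575 are the table values used in the text.

```python
import math

def one_prop_z(x, n, p0):
    """z for H0: P = P0, using the null value P0 in the standard error."""
    p = x / n
    return (p - p0) / math.sqrt(p0 * (1 - p0) / n)

def two_prop_z(x1, n1, x2, n2):
    """z for H0: P1 = P2, using the pooled proportion in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, p_hat

z15 = one_prop_z(174, 200, 0.90)            # compare with -z_{0.05} = -1.645
z16, p_hat = two_prop_z(74, 250, 92, 250)   # compare |z| with z_{0.005} = 2.575
print(round(z15, 2), p_hat, round(z16, 2))  # -1.41 0.332 -1.71
```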

17) A private opinion poll is engaged by a politician to estimate what proportion of her constituents

favor a certain proposal. Determine how large a sample the poll will have to take to be at least 95%

confident that the sample proportion is off by less than 0.02.

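Solution: The maximum error should be less than 0.02 at α = 0.05.

$$ \text{Maximum error} = z_{\alpha/2}\sqrt{\frac{P_0(1 - P_0)}{n}} < 0.02, \qquad z_{\alpha/2} = z_{0.025} = 1.96 $$

Since $P_0$ is not known, we use the maximum value of $P_0(1 - P_0)$, which is 1/4.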

---------------------------------------------------------------------------------------------------------------

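To maximize x(1 − x):

Let y = x(1 − x). Then

$$ \frac{dy}{dx} = (1 - x) - x = 0 \;\Rightarrow\; 1 - 2x = 0 \;\Rightarrow\; x = \tfrac{1}{2}, \qquad \frac{d^2y}{dx^2} = -2 < 0 $$

Hence, x(1 − x) is maximum when x = ½.

$$ \text{Maximum value of } x(1 - x) = \tfrac{1}{2}\left(1 - \tfrac{1}{2}\right) = \tfrac{1}{4} $$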

-----------------------------------------------------------------------------------------------------------------

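$$ (1.96)\sqrt{\frac{1/4}{n}} < 0.02 \;\Rightarrow\; \sqrt{n} > \frac{1.96}{0.04} \;\Rightarrow\; \sqrt{n} > 49 \;\Rightarrow\; n > 2401 $$

The poll must therefore sample more than 2401 constituents.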
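The same bound can be computed exactly with `fractions.Fraction`, which avoids floating-point doubt when the bound is a whole number, as 2401 is here. This is a minimal sketch; `n_for_proportion` is an illustrative name, and the arguments are passed as decimal strings for that reason. Because the requirement is a strict inequality (n > 2401), the smallest admissible whole-number sample size is 2402. Passing a guessed p shrinks the requirement.

```python
import math
from fractions import Fraction

def n_for_proportion(max_error, z_crit, p=None):
    """Smallest whole n with z_crit * sqrt(p(1 - p)/n) < max_error.

    Arguments are decimal strings so the arithmetic is exact; p=None uses the
    worst case p(1 - p) = 1/4.
    """
    e, z = Fraction(max_error), Fraction(z_crit)
    pq = Fraction(1, 4) if p is None else Fraction(p) * (1 - Fraction(p))
    bound = z ** 2 * pq / e ** 2          # n must strictly exceed this value
    return math.floor(bound) + 1, bound

n, bound = n_for_proportion("0.02", "1.96")
print(bound, n)   # 2401 2402
```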

18) A sample survey at a supermarket showed that 204 of 300 shoppers use discount coupons.

Construct a 95% confidence interval for the corresponding true proportion.

Solution: $p = \dfrac{204}{300} = 0.68$

The 95% confidence interval for P is

$$ p \pm z_{0.025}\sqrt{\frac{p(1 - p)}{n}} = 0.68 \pm (1.96)\sqrt{\frac{(0.68)(0.32)}{300}} = 0.68 \pm 0.053 = (0.627,\ 0.733) $$

19) Among 500 marriage license applications chosen at random in a given year, there were 48 in

which the woman was at least one year older than the man and among 400 marriage license

applications chosen at random six years later, there were 68 in which the woman was at least one

year older than the man. Construct a 99% confidence interval for the difference between the

corresponding true proportions of marriage license applications in which the woman was at least one

year older than the man.

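Solution: $p_1 = \dfrac{48}{500} = 0.096$, $p_2 = \dfrac{68}{400} = 0.170$, α = 0.01

The 100(1 − α)% confidence interval for P1 − P2 is

$$ (p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} $$

$$ = (0.096 - 0.170) \pm (2.575)\sqrt{\frac{(0.096)(0.904)}{500} + \frac{(0.17)(0.83)}{400}} = -0.074 \pm 0.059 = (-0.133,\ -0.015) $$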
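The two proportion intervals in Questions 18 and 19 differ only in their standard errors. This is a minimal standard-library sketch with illustrative function names. The difference interval uses the unpooled S.E., as in the formula above, and the critical values 1.96 and 2.575 are the table values used in the text.

```python
import math

def prop_interval(x, n, z_crit):
    """p +/- z_crit * sqrt(p(1 - p)/n)"""
    p = x / n
    half = z_crit * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def prop_diff_interval(x1, n1, x2, n2, z_crit):
    """(p1 - p2) +/- z_crit * unpooled S.E."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z_crit * se, d + z_crit * se

print([round(v, 3) for v in prop_interval(204, 300, 1.96)])                 # [0.627, 0.733]
print([round(v, 3) for v in prop_diff_interval(48, 500, 68, 400, 2.575)])   # [-0.133, -0.015]
```

The difference interval lies entirely below 0, which points to a higher proportion in the later year.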

INTERVAL ESTIMATION AND TESTS CONCERNING STANDARD DEVIATION

I (a) Single sample variance

A random sample of size n is taken from a normal population with variance σ². The variance of the sample is s². The null hypothesis to be tested is

$$ H_0 : \sigma^2 = \sigma_0^2 $$

against the alternative, which could be any one of

σ² ≠ σ0²    σ² > σ0²    σ² < σ0²

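$$ \text{Test statistic } \chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} \sim \chi^2 \text{ distribution with } n - 1 \text{ degrees of freedom} \quad [\text{under } H_0] $$

Reject $H_0$ if:

$H_1 : \sigma^2 \neq \sigma_0^2$:  $\chi^2 \le \chi^2_{1-\alpha/2,\,n-1}$ or $\chi^2 \ge \chi^2_{\alpha/2,\,n-1}$

$H_1 : \sigma^2 > \sigma_0^2$:  $\chi^2 \ge \chi^2_{\alpha,\,n-1}$

$H_1 : \sigma^2 < \sigma_0^2$:  $\chi^2 \le \chi^2_{1-\alpha,\,n-1}$

The 100(1 − α)% confidence interval for σ² is

$$ \left[\frac{(n - 1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right] $$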

I (b) This case is a variation of the previous case. A simpler test can be used when the sample size is large (n ≥ 30).

A random sample of size n (≥ 30) is taken from a normal population with variance σ². The variance of the sample is s². The null hypothesis and the alternative are the same as in I (a).

$$ \text{Test statistic} = z = \left(\frac{s}{\sigma_0} - 1\right)\sqrt{2(n - 1)} \sim N(0,1) \text{ (approximately)} $$

II Comparison of the variances of two samples

Two independent random samples of sizes n1 and n2 are taken from two normal populations with variances denoted by σ1² and σ2². The variances of the samples are s1² and s2². The null hypothesis to be tested is

$$ H_0 : \sigma_1^2 = \sigma_2^2 $$

against the alternative, which could be any one of

σ1² ≠ σ2²    σ1² > σ2²    σ1² < σ2²

The test statistic is the ratio of the sample variances s1² and s2². Under $H_0$ this ratio has the F distribution. The order of the degrees of freedom depends on which variance is taken as the numerator of the ratio.

Reject $H_0$ if:

$H_1 : \sigma_1^2 \neq \sigma_2^2$:  $\dfrac{s_1^2}{s_2^2} \ge F_{\alpha/2,\,n_1-1,\,n_2-1}$ if $s_1^2 \ge s_2^2$, or $\dfrac{s_2^2}{s_1^2} \ge F_{\alpha/2,\,n_2-1,\,n_1-1}$ if $s_2^2 \ge s_1^2$

$H_1 : \sigma_1^2 > \sigma_2^2$:  $\dfrac{s_1^2}{s_2^2} \ge F_{\alpha,\,n_1-1,\,n_2-1}$

$H_1 : \sigma_2^2 > \sigma_1^2$:  $\dfrac{s_2^2}{s_1^2} \ge F_{\alpha,\,n_2-1,\,n_1-1}$

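The 100(1 − α)% confidence interval for $\sigma_1^2/\sigma_2^2$ is

$$ \left[\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2,\,\nu_1,\,\nu_2}},\ \frac{s_1^2}{s_2^2}\,F_{\alpha/2,\,\nu_2,\,\nu_1}\right] $$

where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.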

20) In a random sample, the weights of 24 Black Angus steers of a certain age have a standard

deviation of 238 pounds. Assume that the weights constitute a random sample from a normal

population, and test the null hypothesis 𝜎 = 250 pounds against the two-sided alternative 𝜎 ≠ 250

pounds at the level of significance of 0.01.

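Solution: $H_0 : \sigma = 250$   $H_1 : \sigma \neq 250$ (two-tailed)   α = 0.01

Population normal; n = 24

$$ \text{Test statistic } \chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{23(238)^2}{(250)^2} = 20.84 $$

$$ \chi^2_{1-\alpha/2,\,n-1} = \chi^2_{0.995,\,23} = 9.260 < \chi^2 < \chi^2_{\alpha/2,\,n-1} = \chi^2_{0.005,\,23} = 44.181 $$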

There is no reason to reject the null hypothesis at the level of significance of 0.01.

21) In a random sample, s = 2.53 minutes for the amount of time that 30 women took to complete the

written test for their driver’s licenses. At the level of significance of 5%, test the null hypothesis that

σ = 2.85 minutes against the alternative that σ < 2.85 minutes.

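Solution: $H_0 : \sigma = 2.85$   $H_1 : \sigma < 2.85$ (left-tailed)   α = 0.05

Population normal; n = 30

$$ \text{Test statistic } \chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{29(2.53)^2}{(2.85)^2} = 22.85 $$

$$ \chi^2_{1-\alpha,\,n-1} = \chi^2_{0.95,\,29} = 17.708, \qquad \chi^2 > \chi^2_{1-\alpha,\,n-1} $$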

There is no reason to reject the null hypothesis at the level of significance of 0.05.

Aliter: Since the size of the sample is 30, the sample could be considered large. In this case, the test is as follows:

$$ \text{Test statistic} = z = \left(\frac{s}{\sigma_0} - 1\right)\sqrt{2(n - 1)} = \left(\frac{2.53}{2.85} - 1\right)\sqrt{2(29)} = -0.8551 > -z_{0.05} = -1.645 $$

The null hypothesis cannot be rejected at the level of significance of 0.05.

22) Past data indicate that the standard deviation of measurements made on sheet metal stampings by

experienced inspectors is 0.41 square inch. If a new inspector measures 50 stampings with a standard

deviation of 0.49 square inch, test the null hypothesis that σ = 0.41 square inch against the alternative

that σ > 0.41 square inch at the level of significance of 5%. Check your answer using the P-value of

the statistic.

Solution: $H_0 : \sigma = 0.41$   $H_1 : \sigma > 0.41$ (right-tailed)   α = 0.05

Population normal; n = 50

Since the size of the sample is 50, the sample could be considered large.

$$ \text{Test statistic} = z = \left(\frac{s}{\sigma_0} - 1\right)\sqrt{2(n - 1)} = \left(\frac{0.49}{0.41} - 1\right)\sqrt{2(49)} = 1.93 > z_{0.05} = 1.645 $$

The null hypothesis is rejected at the level of significance of 0.05.

P-value corresponding to z = 1.93 is 0.0268; 0.0268 < 0.05. Reject 𝐻0 at 5%.
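The single-sample variance statistics of Questions 20 to 22 can be reproduced as follows. This is a minimal standard-library sketch with illustrative function names. The chi-square critical values (9.260 and 44.181 for 23 d.f., 17.708 for 29 d.f.) are typed in from chi-square tables, because the standard library has no chi-square quantile function.

```python
import math

def chi2_variance_stat(s, n, sigma0):
    """(n - 1) s^2 / sigma0^2; chi-square with n - 1 d.f. under H0 (normal population)."""
    return (n - 1) * s ** 2 / sigma0 ** 2

def sd_large_sample_z(s, n, sigma0):
    """(s / sigma0 - 1) * sqrt(2(n - 1)); approximately N(0, 1) for large n."""
    return (s / sigma0 - 1) * math.sqrt(2 * (n - 1))

# Question 20: two-tailed at alpha = 0.01; chi2_{0.995,23} = 9.260, chi2_{0.005,23} = 44.181
chi2_20 = chi2_variance_stat(238, 24, 250)
print(round(chi2_20, 2), 9.260 < chi2_20 < 44.181)   # 20.84 True

# Question 21: left-tailed at alpha = 0.05; chi2_{0.95,29} = 17.708
chi2_21 = chi2_variance_stat(2.53, 30, 2.85)
z_21 = sd_large_sample_z(2.53, 30, 2.85)
print(round(chi2_21, 2), round(z_21, 4))             # 22.85 -0.8551

# Question 22: right-tailed at alpha = 0.05; z_{0.05} = 1.645
z_22 = sd_large_sample_z(0.49, 50, 0.41)
print(round(z_22, 2), z_22 > 1.645)                  # 1.93 True
```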

23) To compare two kinds of bumper guards, six of each kind were mounted on a certain make of

compact car. Then each car was run into a concrete wall at 5 miles per hour, and the following are

the costs of the repairs (in dollars):

Bumper guard 1: 127 168 143 165 122 139

Bumper guard 2: 154 135 132 171 153 149

Test at a level of significance of 0.02 whether it is reasonable to assume that the two populations

sampled have equal variances.

Solution: 𝐻0 : 𝜎1 2 = 𝜎2 2 𝐻1 : 𝜎1 2 ≠ 𝜎2 2 (two-tailed) 𝛼 = 0.02

The test statistic is determined after ascertaining which of the sample variances is larger.

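Calculations:

Sample 1: $\bar{x}_1 = \dfrac{864}{6} = 144$

| $x_{1i}$ | 127 | 168 | 143 | 165 | 122 | 139 | 864 |
|---|---|---|---|---|---|---|---|
| $x_{1i} - \bar{x}_1$ | −17 | 24 | −1 | 21 | −22 | −5 | 0 |
| $(x_{1i} - \bar{x}_1)^2$ | 289 | 576 | 1 | 441 | 484 | 25 | 1816 |

Sample variance $s_1^2 = \dfrac{\sum (x_{1i} - \bar{x}_1)^2}{n_1 - 1} = \dfrac{1816}{5} = 363.2$

Sample 2: $\bar{x}_2 = \dfrac{894}{6} = 149$

| $x_{2j}$ | 154 | 135 | 132 | 171 | 153 | 149 | 894 |
|---|---|---|---|---|---|---|---|
| $x_{2j} - \bar{x}_2$ | 5 | −14 | −17 | 22 | 4 | 0 | 0 |
| $(x_{2j} - \bar{x}_2)^2$ | 25 | 196 | 289 | 484 | 16 | 0 | 1010 |

Sample variance $s_2^2 = \dfrac{\sum (x_{2j} - \bar{x}_2)^2}{n_2 - 1} = \dfrac{1010}{5} = 202$

Since $s_1^2 > s_2^2$, the test statistic is $F = \dfrac{s_1^2}{s_2^2} = \dfrac{363.2}{202} = 1.80$

Inference: $F_{\alpha/2,\,n_1-1,\,n_2-1} = F_{0.01,\,5,\,5} = 10.97$; $1.80 < F_{0.01,\,5,\,5}$

The null hypothesis cannot be rejected at the level of significance of 0.02. It is reasonable to assume that the two populations sampled have equal variances.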
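The variance calculations of problem 23 can be checked directly from the raw repair costs. This sketch uses `statistics.variance`, which divides by n − 1 as in the solution; it gives $s_1^2 = 363.2$ (note $24^2 = 576$), $s_2^2 = 202$ and F ≈ 1.80. The critical value 10.97 for 5 and 5 degrees of freedom is copied from an F table, not computed.

```python
from statistics import mean, variance

guard1 = [127, 168, 143, 165, 122, 139]
guard2 = [154, 135, 132, 171, 153, 149]

# statistics.variance uses the n - 1 divisor, as in the solution.
v1, v2 = variance(guard1), variance(guard2)

# Put the larger sample variance in the numerator; the df follow the samples.
if v1 >= v2:
    F, df_num, df_den = v1 / v2, len(guard1) - 1, len(guard2) - 1
else:
    F, df_num, df_den = v2 / v1, len(guard2) - 1, len(guard1) - 1

F_CRIT = 10.97  # F_{0.01, 5, 5}: alpha / 2 = 0.01 for the two-tailed test at 0.02

print(f"means: {mean(guard1)}, {mean(guard2)}; variances: {v1}, {v2}")
print(f"F = {F:.2f} with ({df_num}, {df_den}) df; reject H0: {F > F_CRIT}")
```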

24) In the comparison of two kinds of paint, a consumer testing service finds that four 1-gallon cans

of one brand have a standard deviation of 31 square feet, while four 1-gallon cans of another brand

have a standard deviation of 26 square feet. Assume that the two populations are normal and test the

null hypothesis that 𝜎1 = 𝜎2 against the alternative that 𝜎1 > 𝜎2 at the level of significance of 5%.

Solution: 𝐻0 : 𝜎1 = 𝜎2 𝐻1 : 𝜎1 > 𝜎2 (right-tailed) 𝛼 = 0.05

Populations normal; 𝑛1 = 4; 𝑛2 = 4

$s_1 = 31$; $s_2 = 26$; $s_1^2 > s_2^2$

Test statistic $F = \dfrac{s_1^2}{s_2^2} = \dfrac{961}{676} = 1.4216$

$F_{\alpha,\,n_1-1,\,n_2-1} = F_{0.05,\,3,\,3} = 9.28$; $F < F_{0.05,\,3,\,3}$

The null hypothesis cannot be rejected at the level of significance of 0.05.

25) The lengths of the skulls of 10 fossil skeletons of an extinct species of bird have a standard deviation of 0.29 cm. Assume that such measurements are normally distributed and find a 95% confidence interval for the true variance of the skull length of this species of bird.

Solution: Population normal; n = 10

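The $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is

$$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right]$$

The 95% confidence interval for $\sigma^2$ is

$$\left[\frac{9\,(0.29)^2}{\chi^2_{0.025,\,9}},\ \frac{9\,(0.29)^2}{\chi^2_{0.975,\,9}}\right] = \left[\frac{9(0.0841)}{19.023},\ \frac{9(0.0841)}{2.700}\right] = (0.0398,\ 0.280)$$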
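Once the two chi-square percentage points are known, the interval in problem 25 is simple arithmetic. A small sketch with the table values for 9 degrees of freedom hard-coded; the last line, which takes square roots to obtain an interval for σ itself, goes slightly beyond what the problem asks.

```python
n, s = 10, 0.29

# Chi-square table values for n - 1 = 9 df (right-tail areas 0.025 and 0.975).
CHI2_0025_9 = 19.023
CHI2_0975_9 = 2.700

ss = (n - 1) * s**2                        # (n - 1) s^2
lower, upper = ss / CHI2_0025_9, ss / CHI2_0975_9

print(f"95% CI for sigma^2: ({lower:.4f}, {upper:.4f})")
# Square roots of both limits give the corresponding interval for sigma.
print(f"95% CI for sigma:   ({lower ** 0.5:.3f}, {upper ** 0.5:.3f})")
```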

26) Twelve randomly selected mature citrus trees of one variety have a standard deviation of 1.2 feet

and fifteen randomly selected mature citrus trees of another variety have a standard deviation of 1.5

feet. Assuming that the random samples were selected from normal populations, construct a 98%

confidence interval for the ratio of the variances of the two populations sampled.

Solution: n1 =12 n2 = 15

𝑠1 = 1.2 𝑠2 = 1.5

Populations normal;

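The $100(1-\alpha)\%$ confidence interval for $\dfrac{\sigma_1^2}{\sigma_2^2}$ is

$$\left[\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2,\,\nu_1,\,\nu_2}},\ \frac{s_1^2}{s_2^2}\cdot F_{\alpha/2,\,\nu_2,\,\nu_1}\right]$$

where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.

The 98% confidence interval for $\dfrac{\sigma_1^2}{\sigma_2^2}$ is

$$\left[\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{0.01,\,11,\,14}},\ \frac{s_1^2}{s_2^2}\cdot F_{0.01,\,14,\,11}\right] = \left[\frac{(1.2)^2}{(1.5)^2}\cdot\frac{1}{3.87},\ \frac{(1.2)^2}{(1.5)^2}\cdot 4.30\right] = [0.165,\ 2.752]$$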

27) The following are the heat-producing capacities of coal from two mines (in millions of calories

per ton):

Mine A: 8500 8330 8480 7960 8030

Mine B: 7710 7890 7920 8270 7860

Assume that the data constitute independent random samples from normal populations and construct

a 90% confidence interval for the ratio of the variances of the two populations sampled.

Solution: Populations normal; 𝑛1 = 5 𝑛2 = 5

Dividing every observation in both samples by the same constant does not change the ratio of the variances, because each variance is divided by the square of that constant. Let us divide every observation by 10.

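Mine A: $\bar{X} = \dfrac{4130}{5} = 826$

| $X_i$ | 850 | 833 | 848 | 796 | 803 |
|---|---|---|---|---|---|
| $X_i - \bar{X}$ | 24 | 7 | 22 | −30 | −23 |
| $(X_i - \bar{X})^2$ | 576 | 49 | 484 | 900 | 529 |

$s^2 = \dfrac{\sum (X_i - \bar{X})^2}{n-1} = \dfrac{2538}{4} = 634.5$

Mine B: $\bar{X} = \dfrac{3965}{5} = 793$

| $X_i$ | 771 | 789 | 792 | 827 | 786 |
|---|---|---|---|---|---|
| $X_i - \bar{X}$ | −22 | −4 | −1 | 34 | −7 |
| $(X_i - \bar{X})^2$ | 484 | 16 | 1 | 1156 | 49 |

$s^2 = \dfrac{\sum (X_i - \bar{X})^2}{n-1} = \dfrac{1706}{4} = 426.5$

$s_1^2 = 634.5$, $s_2^2 = 426.5$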

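The $100(1-\alpha)\%$ confidence interval for $\dfrac{\sigma_1^2}{\sigma_2^2}$ is

$$\left[\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2,\,\nu_1,\,\nu_2}},\ \frac{s_1^2}{s_2^2}\cdot F_{\alpha/2,\,\nu_2,\,\nu_1}\right]$$

where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.

The 90% confidence interval for $\dfrac{\sigma_1^2}{\sigma_2^2}$ is

$$\left[\frac{634.5}{426.5}\cdot\frac{1}{F_{0.05,\,4,\,4}},\ \frac{634.5}{426.5}\cdot F_{0.05,\,4,\,4}\right] = \left[\frac{634.5}{426.5}\cdot\frac{1}{6.39},\ \frac{634.5}{426.5}\cdot 6.39\right] = [0.233,\ 9.506]$$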
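A sketch for problem 27 that starts from the original, unscaled measurements and confirms that dividing every observation by 10 leaves the variance ratio unchanged. The value $F_{0.05,4,4} = 6.39$ is hard-coded from an F table rather than computed.

```python
from statistics import variance

mine_a = [8500, 8330, 8480, 7960, 8030]
mine_b = [7710, 7890, 7920, 8270, 7860]

ratio = variance(mine_a) / variance(mine_b)          # s1^2 / s2^2

# Dividing every observation by 10 divides both variances by 100: same ratio.
ratio_scaled = (variance([x / 10 for x in mine_a])
                / variance([x / 10 for x in mine_b]))

F_005_4_4 = 6.39   # F_{0.05, 4, 4} from an F table
lower, upper = ratio / F_005_4_4, ratio * F_005_4_4

print(f"s1^2 / s2^2 = {ratio:.4f} (scaled data: {ratio_scaled:.4f})")
print(f"90% CI for sigma1^2 / sigma2^2: ({lower:.3f}, {upper:.3f})")
```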

I Single sample correlation co-efficient

(a) A random sample of n pairs of observations is taken from a bivariate normal population with

the correlation co-efficient denoted by ρ. The correlation co-efficient in the sample is 𝑟. The null

hypothesis to be tested is $\rho = 0$ against the alternative $\rho \neq 0$.

Test statistic $t = \dfrac{r}{\sqrt{1-r^2}}\sqrt{n-2} \sim t_{n-2}$ [under $H_0$]

(b) A random sample of n pairs of observations is taken from a bivariate normal population with the correlation co-efficient denoted by ρ. The correlation co-efficient in the sample is 𝑟. The null hypothesis to be tested is $\rho = \rho_0$ against the alternative $\rho \neq \rho_0$.

Test statistic $\dfrac{Z - Z_0}{\sqrt{1/(n-3)}} \sim N(0, 1)$ [under $H_0$]

where $Z = \dfrac{1}{2}\log_e\left[\dfrac{1+r}{1-r}\right]$ and $Z_0 = \dfrac{1}{2}\log_e\left[\dfrac{1+\rho_0}{1-\rho_0}\right]$

Two independent random samples of $n_1$ and $n_2$ pairs of observations respectively are taken from two independent bivariate normal populations with the correlation co-efficients denoted by $\rho_1$ and $\rho_2$. The correlation co-efficients in the samples are $r_1$ and $r_2$. The null hypothesis to be tested is $\rho_1 = \rho_2$ against the alternative $\rho_1 \neq \rho_2$.

Test statistic $\dfrac{Z_1 - Z_2}{\sqrt{\dfrac{1}{n_1-3} + \dfrac{1}{n_2-3}}} \sim N(0, 1)$ [under $H_0$]

where $Z_1 = \dfrac{1}{2}\log_e\left[\dfrac{1+r_1}{1-r_1}\right]$ and $Z_2 = \dfrac{1}{2}\log_e\left[\dfrac{1+r_2}{1-r_2}\right]$

Important Note: The alternative hypotheses are two-tailed. For drawing inference, compare the absolute value of the calculated test statistic with $t_{n-2,\,\alpha/2}$ or $z_{\alpha/2}$, as the case may be.

28) Test the significance of the values of the correlation co-efficient obtained from samples of n pairs of observations from a bivariate normal population at the level of 5%:

(i) 𝑛 = 38; 𝑟 = 0.6 (ii) 𝑛 = 11; 𝑟 = 0.5

Solution:

(i) 𝐻0 : 𝜌 = 0 𝐻1 : 𝜌 ≠ 0 𝛼 = 0.05

$t = \dfrac{r}{\sqrt{1-r^2}}\sqrt{n-2} = \dfrac{0.6}{\sqrt{1-(0.6)^2}}\sqrt{38-2} = 4.5$

$t \sim t_{n-2}$; since $n > 30$, use normal distribution values.

Inference: $z_{0.025} = 1.96$; $4.5 > z_{0.025}$

Reject the null hypothesis at the level of 5%. The correlation co-efficient obtained from the sample

is suggestive of correlation between the variables in the population.

(ii) 𝐻0 : 𝜌 = 0 𝐻1 : 𝜌 ≠ 0 𝛼 = 0.05

$t = \dfrac{r}{\sqrt{1-r^2}}\sqrt{n-2} = \dfrac{0.5}{\sqrt{1-(0.5)^2}}\sqrt{11-2} = 1.73$

$t \sim t_{n-2}$

Inference: $t_{9,\,0.025} = 2.262$; $1.73 < t_{9,\,0.025}$

We cannot reject the null hypothesis at the level of 5%. The correlation co-efficient obtained from the sample does not indicate correlation between the variables in the population.
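Both parts of problem 28 use the same statistic, so one small helper covers them. This is a sketch: the helper name is illustrative, and the critical values (1.96 for the large-sample case and 2.262 for 9 degrees of freedom) are taken from tables, as in the solution.

```python
import math

def t_for_zero_correlation(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2); t-distributed with n - 2 df under H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# (n, r, two-tailed 5% critical value from the tables used in the solution)
cases = [(38, 0.6, 1.96),    # n > 30: normal value z_{0.025}
         (11, 0.5, 2.262)]   # t_{9, 0.025}

results = []
for n, r, crit in cases:
    t = t_for_zero_correlation(r, n)
    significant = abs(t) > crit
    results.append((t, significant))
    print(f"n = {n}, r = {r}: t = {t:.2f}, significant at 5%: {significant}")
```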

29) Find the least absolute value of 𝑟 in a sample of 27 pairs of observations from a bivariate normal population that would indicate significant correlation in the population at the level of 5%.

Solution:

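$$\frac{|r|}{\sqrt{1-r^2}}\sqrt{n-2} > t_{27-2,\,0.025}$$

$$\frac{|r|}{\sqrt{1-r^2}}\sqrt{25} > 2.060$$

$$\frac{|r|}{\sqrt{1-r^2}} > 0.412$$

$$\frac{r^2}{1-r^2} > 0.1697$$

$$1.1697\,r^2 > 0.1697$$

$$r^2 > 0.1451$$

$$r < -0.3809 \quad\text{or}\quad r > 0.3809$$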

The value of the sample correlation co-efficient must be less than −0.3809 or more than 0.3809 in order to indicate significant correlation in the population at the level of 5%.
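Squaring the inequality and solving for r gives a closed form: $r^2(n-2) = t^2(1-r^2)$ implies $r^2 = t^2/(t^2 + n - 2)$. A minimal sketch of that shortcut (the function name is hypothetical). It avoids the intermediate rounding of 0.412 and 0.1697 and gives about 0.381 for n = 27.

```python
import math

def critical_r(n, t_crit):
    """Smallest |r| that is significant: r = t / sqrt(t**2 + n - 2)."""
    return t_crit / math.sqrt(t_crit**2 + n - 2)

n = 27
T_25_0025 = 2.060            # t_{25, 0.025} from a t table
r_min = critical_r(n, T_25_0025)

# Plugging r_min back into the t statistic recovers the critical value.
t_back = r_min * math.sqrt(n - 2) / math.sqrt(1 - r_min**2)
print(f"|r| must exceed {r_min:.4f} (check: t = {t_back:.3f})")
```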

30) The correlation co-efficient in a sample of 18 pairs of observations from a bivariate normal population was found to be 0.5. It is claimed that the correlation co-efficient in the population is 0.7.

Does the sample correlation co-efficient justify this claim at 5% level?

Solution: 𝐻0 : 𝜌 = 0.7 𝐻1 : 𝜌 ≠ 0.7 𝛼 = 0.05

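$Z = \dfrac{1}{2}\log_e\left[\dfrac{1+r}{1-r}\right] = \dfrac{1}{2}\log_e\left[\dfrac{1.5}{0.5}\right] = 0.5493$

$Z_0 = \dfrac{1}{2}\log_e\left[\dfrac{1+\rho_0}{1-\rho_0}\right] = \dfrac{1}{2}\log_e\left[\dfrac{1.7}{0.3}\right] = 0.8673$

Test statistic $= \dfrac{|Z - Z_0|}{\sqrt{1/(n-3)}} = \dfrac{|0.5493 - 0.8673|}{\sqrt{1/15}} = 1.232$

Inference: $z_{0.025} = 1.96$; $1.232 < z_{0.025}$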

Accept the null hypothesis at 5% level. The sample correlation co-efficient is consistent with the claim that the correlation co-efficient in the population is 0.7.
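Fisher's transformation $\frac{1}{2}\log_e[(1+r)/(1-r)]$ is the inverse hyperbolic tangent, so it can be checked against `math.atanh`. A short sketch for problem 30 (the helper name is illustrative); note that $\frac{1}{2}\log_e 3 = 0.5493$, and the critical value 1.96 is $z_{0.025}$ from the normal table.

```python
import math

def fisher_z(r):
    """Fisher's transformation 0.5 * ln((1 + r) / (1 - r)); equal to math.atanh(r)."""
    return 0.5 * math.log((1 + r) / (1 - r))

n, r, rho0 = 18, 0.5, 0.7
Z, Z0 = fisher_z(r), fisher_z(rho0)
stat = (Z - Z0) * math.sqrt(n - 3)   # same as dividing by sqrt(1 / (n - 3))

print(f"Z = {Z:.4f}, Z0 = {Z0:.4f}, |statistic| = {abs(stat):.3f}")
print("reject H0" if abs(stat) > 1.96 else "do not reject H0")   # z_{0.025} = 1.96
```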

31) Two independent random samples of 5 and 12 pairs of observations respectively are taken from

two independent bivariate normal populations. The correlation co-efficients in the samples are 0.87

and 0.56. Test if the difference between the sample correlation co-efficients is significant at 5%

level.

Solution: Denote the correlation co-efficients in the two independent bivariate normal populations as

𝜌1 and 𝜌2. The null and alternative hypotheses are:

𝐻0 ∶ 𝜌1 = 𝜌2 𝐻1 ∶ 𝜌1 ≠ 𝜌2 𝛼 = 0.05

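$Z_1 = \dfrac{1}{2}\log_e\left[\dfrac{1+r_1}{1-r_1}\right] = \dfrac{1}{2}\log_e\left[\dfrac{1+0.87}{1-0.87}\right] = 1.3330$

$Z_2 = \dfrac{1}{2}\log_e\left[\dfrac{1+r_2}{1-r_2}\right] = \dfrac{1}{2}\log_e\left[\dfrac{1+0.56}{1-0.56}\right] = 0.6328$

Test statistic $= \dfrac{|Z_1 - Z_2|}{\sqrt{\dfrac{1}{n_1-3} + \dfrac{1}{n_2-3}}} = \dfrac{|1.3330 - 0.6328|}{\sqrt{\dfrac{1}{2} + \dfrac{1}{9}}} = 0.896$

Inference: $z_{0.025} = 1.96$; $0.896 < z_{0.025}$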

Accept the null hypothesis at 5% level. The difference between the sample correlation co-efficients is not significant, so there is no evidence that the correlation co-efficients in the two populations differ.
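The same transformation gives the two-sample statistic of problem 31. In this sketch `math.atanh(r)` stands in for $\frac{1}{2}\log_e[(1+r)/(1-r)]$ (the two are identical), and the helper name is illustrative.

```python
import math

def two_sample_corr_z(r1, n1, r2, n2):
    """(Z1 - Z2) / sqrt(1/(n1 - 3) + 1/(n2 - 3)), with Z = atanh(r) (Fisher's transformation)."""
    return (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))

z = two_sample_corr_z(0.87, 5, 0.56, 12)
print(f"z = {z:.3f}; significant at 5%: {abs(z) > 1.96}")   # z_{0.025} = 1.96
```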

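Let the two attributes be labeled $A$ and $B$. The different categories under $A$ and $B$ are $\{A_1, A_2, \ldots, A_i, \ldots, A_m\}$ and $\{B_1, B_2, \ldots, B_j, \ldots, B_n\}$.

The following is the contingency table, in which $(A_i)$ and $(B_j)$ denote the row and column totals and $N$ is the grand total:

| A / B | $B_1$ | $B_2$ | … | $B_j$ | … | $B_n$ | Total |
|---|---|---|---|---|---|---|---|
| $A_1$ | $f_{11}$ | $f_{12}$ | … | $f_{1j}$ | … | $f_{1n}$ | $(A_1)$ |
| $A_2$ | $f_{21}$ | $f_{22}$ | … | $f_{2j}$ | … | $f_{2n}$ | $(A_2)$ |
| ⋮ | ⋮ | ⋮ | | ⋮ | | ⋮ | ⋮ |
| $A_i$ | $f_{i1}$ | $f_{i2}$ | … | $f_{ij}$ | … | $f_{in}$ | $(A_i)$ |
| ⋮ | ⋮ | ⋮ | | ⋮ | | ⋮ | ⋮ |
| $A_m$ | $f_{m1}$ | $f_{m2}$ | … | $f_{mj}$ | … | $f_{mn}$ | $(A_m)$ |
| Total | $(B_1)$ | $(B_2)$ | … | $(B_j)$ | … | $(B_n)$ | $N$ |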

𝐻0 : The two attributes 𝐴 and 𝐵 are independent.

The alternative hypothesis:

𝐻1 : They are not independent.

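Under $H_0$ the expected frequency of the cell $(i, j)$, denoted $e_{ij}$, is calculated as follows:

$$e_{ij} = \frac{(A_i) \times (B_j)}{N}$$

The statistic

$$\chi^2 = \sum_i \sum_j \left[\frac{(f_{ij} - e_{ij})^2}{e_{ij}}\right] \sim \chi^2_{(m-1)(n-1)} \quad \text{[under } H_0\text{]}$$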

32) The following sample data pertain to the shipments received by a large firm from three different

vendors. Test at the 0.01 level of significance whether the three vendors ship products of equal

quality.

| | Number rejected | Number imperfect but acceptable | Number perfect | Total |
|---|---|---|---|---|
| Vendor A | 12 | 23 | 89 | 124 |
| Vendor B | 8 | 12 | 62 | 82 |
| Vendor C | 21 | 30 | 119 | 170 |
| Total | 41 | 65 | 270 | 376 |

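Solution: $H_0$: The three vendors ship products of equal quality (the vendor and the quality category are independent).

$H_1$: The quality of the products differs between the vendors.

$\alpha = 0.01$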

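Calculation of expected frequencies:

$e_{11} = \dfrac{124 \times 41}{376} = 13.5$, $e_{21} = \dfrac{82 \times 41}{376} = 8.9$

$e_{12} = \dfrac{124 \times 65}{376} = 21.4$, $e_{22} = \dfrac{82 \times 65}{376} = 14.2$

The remaining expected frequencies follow by subtraction from the row and column totals:

| | Number rejected | Number imperfect but acceptable | Number perfect | Total |
|---|---|---|---|---|
| Vendor A | 13.5 | 21.4 | 89.1 | 124 |
| Vendor B | 8.9 | 14.2 | 58.9 | 82 |
| Vendor C | 18.6 | 29.4 | 122.0 | 170 |
| Total | 41 | 65 | 270 | 376 |

Test statistic $\chi^2 = \sum_i \sum_j \left[\dfrac{(f_{ij} - e_{ij})^2}{e_{ij}}\right] = \dfrac{(12 - 13.5)^2}{13.5} + \dfrac{(8 - 8.9)^2}{8.9} + \cdots + \dfrac{(119 - 122)^2}{122} = 1.30$ (using the unrounded expected frequencies)

Inference: $\nu = (3-1)(3-1) = 4$; $\chi^2_{0.01,\,4} = 13.277$; $1.30 < \chi^2_{0.01,\,4}$

The null hypothesis cannot be rejected at the 0.01 level of significance. The data give no evidence that the quality of the products differs between the three vendors.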
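A general sketch of the test of independence for any m × n table of observed counts: expected frequencies from the row and column totals, then the chi-square sum with (m − 1)(n − 1) degrees of freedom. Applied to the vendor data it uses unrounded expected frequencies and gives χ² ≈ 1.30. The function name is illustrative and the critical value 13.277 is copied from the chi-square table.

```python
def contingency_chi2(table):
    """Pearson chi-square test of independence for a table of observed counts.

    Returns (statistic, degrees of freedom, table of expected frequencies).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    N = sum(row_tot)
    # e_ij = (row total i) * (column total j) / N
    expected = [[r * c / N for c in col_tot] for r in row_tot]
    stat = sum((f - e) ** 2 / e
               for f_row, e_row in zip(table, expected)
               for f, e in zip(f_row, e_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

# Rows: vendors A, B, C; columns: rejected, imperfect but acceptable, perfect.
vendors = [[12, 23, 89],
           [8, 12, 62],
           [21, 30, 119]]

stat, df, expected = contingency_chi2(vendors)
CHI2_001_4 = 13.277   # chi^2_{0.01, 4} from the table
print(f"chi2 = {stat:.2f} with {df} df; reject H0: {stat > CHI2_001_4}")
```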

33) In 360 tosses of a pair of dice, 74 sevens and 26 elevens are observed. Test the hypothesis that

the dice are fair. Use a level of significance of 5%.

Solution: 𝐻0 : The dice are fair; all six faces of each die are equally likely.

𝐻1 : The dice are biased.

𝛼 = 0.05

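Under $H_0$:

Probability of a score of seven from two fair dice $= P\{(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)\} = \dfrac{6}{36}$

Expected number of sevens in 360 tosses $= 360 \times \dfrac{6}{36} = 60$

Probability of a score of eleven from two fair dice $= P\{(5,6),(6,5)\} = \dfrac{2}{36}$

Expected number of elevens in 360 tosses $= 360 \times \dfrac{2}{36} = 20$

The remaining $360 - 74 - 26 = 260$ tosses, with expected number $360 - 60 - 20 = 280$, form a third class ("other scores"), so that the observed and expected frequencies have the same total:

| Score | 7 | 11 | Other | Total |
|---|---|---|---|---|
| $f_i$ | 74 | 26 | 260 | 360 |
| $e_i$ | 60 | 20 | 280 | 360 |

Test statistic $\chi^2 = \sum_i \left[\dfrac{(f_i - e_i)^2}{e_i}\right] = \dfrac{(74 - 60)^2}{60} + \dfrac{(26 - 20)^2}{20} + \dfrac{(260 - 280)^2}{280} = 6.495$

Inference:

The number of classes = 3

We have used one constraint, that the dice are tossed 360 times.

Number of degrees of freedom = 3 − 1 = 2

$\chi^2_{0.05,\,2} = 5.991$; $6.495 > \chi^2_{0.05,\,2}$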

There is sufficient reason to reject the null hypothesis at the level of significance of 5%. The dice are

suspected to be biased.
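A goodness-of-fit test needs the observed and expected frequencies to add up to the same total, so this sketch uses three classes: a score of seven, a score of eleven, and any other score. The probabilities come from enumerating the 36 equally likely outcomes, and `fractions.Fraction` keeps the arithmetic exact. With 2 degrees of freedom it gives χ² = 682/105 ≈ 6.495; the critical value 5.991 is copied from the chi-square table.

```python
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each total.
ways = {}
for a, b in product(range(1, 7), repeat=2):
    ways[a + b] = ways.get(a + b, 0) + 1
p7, p11 = Fraction(ways[7], 36), Fraction(ways[11], 36)   # 6/36 and 2/36

N = 360
observed = [74, 26, N - 74 - 26]                   # seven, eleven, any other score
expected = [N * p7, N * p11, N * (1 - p7 - p11)]   # 60, 20, 280

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1          # one constraint: both totals equal 360
CHI2_005_2 = 5.991              # chi^2_{0.05, 2} from the table
print(f"chi2 = {float(chi2):.3f} with {df} df; reject H0: {chi2 > CHI2_005_2}")
```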

34) Over 5 years, in T-20 cricket matches, the team 'Breezy Butterflies' played 60 matches and won 35 of them. They batted first in 24 matches and won 18 of these. An enthusiast commented that batting first is lucky for the team. Use the $\chi^2$ test to examine whether there is any association between first batting and winning. Use a level of significance of 5%.

Solution: 𝐻0 : First batting has no influence on winning.

𝐻1 : There is an association between first batting and winning.

𝛼 = 0.05

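The contingency table showing the observed frequencies is:

| | No. of games batting first | No. of games batting second | Total |
|---|---|---|---|
| No. of games won | 18 | 17 | 35 |
| No. of games lost | 6 | 19 | 25 |
| Total | 24 | 36 | 60 |

The contingency table showing the expected frequencies under $H_0$, $e_{ij} = \dfrac{(\text{row total}) \times (\text{column total})}{N}$, is:

| | No. of games batting first | No. of games batting second | Total |
|---|---|---|---|
| No. of games won | 14 | 21 | 35 |
| No. of games lost | 10 | 15 | 25 |
| Total | 24 | 36 | 60 |

Test statistic $\chi^2 = \sum_i \sum_j \left[\dfrac{(f_{ij} - e_{ij})^2}{e_{ij}}\right] = \dfrac{(18-14)^2}{14} + \dfrac{(17-21)^2}{21} + \dfrac{(6-10)^2}{10} + \dfrac{(19-15)^2}{15} = 4.57$

Inference: $\nu = (2-1)(2-1) = 1$; $\chi^2_{0.05,\,1} = 3.841$; $4.57 > \chi^2_{0.05,\,1}$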

There is sufficient reason to reject the null hypothesis at the level of significance of 5%. There is an

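association between first batting and winning. It is seen from the numbers that this is a positive association: the team won 18 of the 24 games in which it batted first (75%) but only 17 of the 36 games in which it batted second (about 47%). Batting first is significantly associated with winning for this team.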
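For a 2 × 2 table with cells a, b in the first row and c, d in the second, the four-term sum reduces to the well-known shortcut $\chi^2 = N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]$, without any continuity correction. A sketch that reproduces 4.57 with it (the function name is illustrative):

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: won, lost; columns: batting first, batting second.
stat = chi2_2x2(18, 17, 6, 19)
CHI2_005_1 = 3.841   # chi^2_{0.05, 1} from the table
print(f"chi2 = {stat:.2f}; reject H0 at 5%: {stat > CHI2_005_1}")
```

The shortcut needs only the four cell counts, so no table of expected frequencies has to be built.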

This is the test to determine whether a set of data may be taken as a random sample from a

population having a given distribution. The null hypothesis to be tested is:

𝐻0 : The population has the given distribution.

Test statistic $\chi^2 = \sum_{i=1}^{m}\left[\dfrac{(f_i - e_i)^2}{e_i}\right] \sim \chi^2_{m-t-1}$

where

$f_i$ = the observed frequency of the $i$-th class

$e_i$ = the theoretical (expected) frequency of the $i$-th class under $H_0$

$m$ = the number of classes

$t$ = the number of independent parameters estimated on the basis of the sample data

Critical region: $\chi^2 \geq \chi^2_{\alpha,\,m-t-1}$

Note: The expected frequencies $e_i$ are calculated from the theoretical distribution under the null hypothesis, subject to the condition $\sum_i e_i = \sum_i f_i$. This condition costs one further degree of freedom, which is the 1 subtracted in $m - t - 1$.

Important: The test is valid only if none of the $e_i$ is less than 5. If any $e_i$ is less than 5, that class must be combined with a neighbouring class, and $m$ is then the number of classes after combining.

35) Four coins were tossed 160 times, and 0, 1, 2, 3 or 4 heads showed, respectively, 19, 54, 58, 23 and 6 times. Use the 5% level of significance to test whether it is reasonable to suppose that the coins are balanced and randomly tossed.

Solution:

𝐻0 : The coins are balanced and randomly tossed so that the population is binomial with the

probability of heads = ½.

α = 0.05

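The following table shows the calculation of the expected frequencies and the test statistic:

| $i$ | $e_i = N \times \left({}^{n}C_{i}\, p^i (1-p)^{n-i}\right)$ | $f_i$ | $\dfrac{(f_i - e_i)^2}{e_i}$ |
|---|---|---|---|
| 0 | $160 \times \left({}^{4}C_{0}\,(\tfrac{1}{2})^4\right) = 10$ | 19 | 8.100 |
| 1 | $160 \times \left({}^{4}C_{1}\,(\tfrac{1}{2})^4\right) = 40$ | 54 | 4.900 |
| 2 | $160 \times \left({}^{4}C_{2}\,(\tfrac{1}{2})^4\right) = 60$ | 58 | 0.067 |
| 3 | $160 \times \left({}^{4}C_{3}\,(\tfrac{1}{2})^4\right) = 40$ | 23 | 7.225 |
| 4 | $160 \times \left({}^{4}C_{4}\,(\tfrac{1}{2})^4\right) = 10$ | 6 | 1.600 |
| Total | 160 | 160 | 21.892 |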

The value 𝑝 = ½ was taken from the null hypothesis and not the sample. Hence there is no loss of

any degree of freedom in that respect. The total N = 160 was taken from sample data. This constraint

makes one degree of freedom lost.

𝜈 = number of degrees of freedom = no. of classes –1= 5 – 1 = 4.

Inference: $\chi^2_{0.05,\,4} = 9.488$; $21.892 > \chi^2_{0.05,\,4}$

Reject the null hypothesis at 5%. It is not reasonable to suppose that the coins are balanced and randomly tossed.
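A sketch of the binomial goodness-of-fit computation for problem 35. Each contribution divides by the expected frequency $e_i$, not the observed one; with the frequencies 19, 54, 58, 23 and 6 this gives χ² ≈ 21.89. The critical value 9.488 is copied from the chi-square table.

```python
from math import comb

N, n_coins, p = 160, 4, 0.5          # p = 1/2 is fixed by H0, not estimated
observed = [19, 54, 58, 23, 6]       # tosses showing 0, 1, 2, 3, 4 heads

expected = [N * comb(n_coins, i) * p**i * (1 - p)**(n_coins - i)
            for i in range(n_coins + 1)]
# Each contribution divides by the expected frequency e_i.
contrib = [(f - e) ** 2 / e for f, e in zip(observed, expected)]
chi2 = sum(contrib)
df = len(observed) - 1               # only the total N is taken from the sample

CHI2_005_4 = 9.488                   # chi^2_{0.05, 4} from the table
print("expected:", expected)
print("contributions:", [round(c, 3) for c in contrib])
print(f"chi2 = {chi2:.3f} with {df} df; reject H0: {chi2 > CHI2_005_4}")
```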

36) Each day, Monday through Saturday, a baker bakes three large chocolate cakes. Those not sold

on the same day are given away to a food bank. Use the data shown in the following table to test at

the 5% level of significance whether they may be looked upon as values of a binomial random

variable.

| No. of cakes sold | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| No. of days | 1 | 16 | 55 | 228 |

Solution: 𝐻0 : The observed values can be looked upon as values of a binomial random variable.

α = 0.05

The value of the parameter p of the binomial distribution cannot be found using general reasoning.

In such a case, p is found from the sample.

The mean of the binomial distribution is np. Calculate the value of the mean of the sample to find p.

Mean $= \dfrac{1}{300}\left[0 \times 1 + 1 \times 16 + 2 \times 55 + 3 \times 228\right] = 2.7$

$np = 3p = 2.7 \Rightarrow p = 0.9$

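The following table shows the calculation of the expected frequencies and the test statistic:

| $i$ | $e_i = N \times \left[{}^{n}C_{i}\, p^i (1-p)^{n-i}\right]$ | $f_i$ | $\dfrac{(f_i - e_i)^2}{e_i}$ |
|---|---|---|---|
| 0 | $300 \times \left[{}^{3}C_{0}\,(0.9)^0 (0.1)^3\right] = 0.3$ | 1 | combined with $i = 1$ |
| 1 | $300 \times \left[{}^{3}C_{1}\,(0.9)^1 (0.1)^2\right] = 8.1$ | 16 | $\dfrac{(17 - 8.4)^2}{8.4} = 8.805$ |
| 2 | $300 \times \left[{}^{3}C_{2}\,(0.9)^2 (0.1)^1\right] = 72.9$ | 55 | $\dfrac{(55 - 72.9)^2}{72.9} = 4.395$ |
| 3 | $300 \times \left[{}^{3}C_{3}\,(0.9)^3 (0.1)^0\right] = 218.7$ | 228 | $\dfrac{(228 - 218.7)^2}{218.7} = 0.395$ |
| Total | 300 | 300 | 13.595 |

The expected frequency of the first class is 0.3 (< 5). This class has been combined with the next class, giving $f = 1 + 16 = 17$ and $e = 0.3 + 8.1 = 8.4$, so that the expected frequency is greater than 5.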

Calculation of number of degrees of freedom:

𝑚 = number of classes = 3

𝑡 = number of independent parameters estimated using sample data = 1

𝜈=𝑚−𝑡−1=1

Inference: $\chi^2_{0.05,\,1} = 3.841$; $13.595 > \chi^2_{0.05,\,1}$

Reject the null hypothesis at 5%. The observed values cannot be looked upon as values of a binomial random variable.
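The same steps for problem 36, with p estimated from the sample mean and the first two classes pooled because the first expected frequency (0.3) is below 5. This is a sketch: the pooling is hard-coded for this data set rather than done by a general rule, and 3.841 is copied from the chi-square table.

```python
from math import comb

days = [1, 16, 55, 228]          # days on which 0, 1, 2, 3 cakes were sold
n_trials = 3
N = sum(days)

# p is not given by H0, so estimate it from the sample mean: n * p = mean.
mean = sum(k * f for k, f in enumerate(days)) / N
p = mean / n_trials

expected = [N * comb(n_trials, k) * p**k * (1 - p)**(n_trials - k)
            for k in range(n_trials + 1)]

# The first expected frequency (about 0.3) is below 5, so pool classes 0 and 1.
obs_pooled = [days[0] + days[1]] + days[2:]
exp_pooled = [expected[0] + expected[1]] + expected[2:]

chi2 = sum((f - e) ** 2 / e for f, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1 - 1     # m - t - 1 with t = 1 estimated parameter
CHI2_005_1 = 3.841               # chi^2_{0.05, 1} from the table
print(f"p = {p:.2f}, chi2 = {chi2:.3f} with {df} df; reject H0: {chi2 > CHI2_005_1}")
```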

37) It is desired to test whether the number of gamma rays emitted per second by a certain

radioactive substance is a random variable having the Poisson distribution with λ = 2.4. Use the

following data obtained for 300 1-second intervals to test this null hypothesis at the level of

significance of 0.05.

| No. of gamma rays | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 or more |
|---|---|---|---|---|---|---|---|---|
| Frequency | 19 | 48 | 66 | 74 | 44 | 35 | 10 | 4 |

Solution: 𝐻0 : The given random variable has Poisson distribution with λ = 2.4.

α = 0.05

The calculation of the expected frequencies and the test statistic is shown in the table at the end of this solution.

The expected frequency of the last class ("7 or more") is 3.48 (< 5). This class has been combined with the previous class, giving $f = 10 + 4 = 14$ and $e = 7.22 + 3.48 = 10.70$.

Calculation of number of degrees of freedom:

𝑚 = number of classes = 7

𝑡 = number of independent parameters estimated using sample data = nil

𝜈=𝑚−𝑡−1=6

Inference: $\chi^2_{0.05,\,6} = 12.592$; $29.06 > \chi^2_{0.05,\,6}$

Reject the null hypothesis at 5%. The given random variable does not conform to the Poisson distribution with λ = 2.4.

| $x$ | $e_i = 300 \times \dfrac{e^{-2.4}(2.4)^x}{x!}$ | $f_i$ | $\dfrac{(f_i - e_i)^2}{e_i}$ |
|---|---|---|---|
| 0 | 27.22 | 19 | $\dfrac{(19 - 27.22)^2}{27.22} = 2.48$ |
| 1 | 65.32 | 48 | $\dfrac{(48 - 65.32)^2}{65.32} = 4.59$ |
| 2 | 78.38 | 66 | $\dfrac{(66 - 78.38)^2}{78.38} = 1.96$ |
| 3 | 62.70 | 74 | $\dfrac{(74 - 62.70)^2}{62.70} = 2.04$ |
| 4 | 37.62 | 44 | $\dfrac{(44 - 37.62)^2}{37.62} = 1.08$ |
| 5 | 18.06 | 35 | $\dfrac{(35 - 18.06)^2}{18.06} = 15.89$ |
| 6 | 7.22 | 10 | $\dfrac{(14 - 10.70)^2}{10.70} = 1.02$ (classes 6 and "7 or more" combined) |
| 7 or more | 3.48 | 4 | |
| Total | 300 | 300 | 29.06 |
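A sketch of the Poisson goodness-of-fit calculation for problem 37, taking P(X ≥ 7) as one minus the sum of the first seven probabilities. Because it uses unrounded expected frequencies, the statistic comes out at about 29.05 rather than the 29.06 obtained from the rounded table values; the conclusion is the same. The critical value 12.592 is copied from the chi-square table.

```python
import math

lam, N = 2.4, 300                           # lambda is fixed by H0
observed = [19, 48, 66, 74, 44, 35, 10, 4]  # counts for 0, 1, ..., 6 and "7 or more"

probs = [math.exp(-lam) * lam**x / math.factorial(x) for x in range(7)]
probs.append(1 - sum(probs))                # P(X >= 7) as the remainder
expected = [N * p for p in probs]

# "7 or more" has an expected frequency of about 3.48 (< 5): merge it with class 6.
obs_pooled = observed[:6] + [observed[6] + observed[7]]
exp_pooled = expected[:6] + [expected[6] + expected[7]]

chi2 = sum((f - e) ** 2 / e for f, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 0 - 1                # m - t - 1 with t = 0
CHI2_005_6 = 12.592                         # chi^2_{0.05, 6} from the table
print("expected:", [round(e, 2) for e in expected])
print(f"chi2 = {chi2:.2f} with {df} df; reject H0: {chi2 > CHI2_005_6}")
```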

Round and yellow: 315 Round and green: 108

Wrinkled and green: 32 Wrinkled and yellow: 101

Theory predicts that the frequencies should be in the proportion 9 : 3 : 3 : 1. Examine the

correspondence between theory and experiment at the level of significance of 0.05.

Solution: 𝐻0 : The frequencies are in the proportion 9 : 3 : 3 : 1

α = 0.05

Calculation of the expected frequencies:

Total number of seeds in the experiment = 556

Expected number of round and yellow seeds $= 556 \times \dfrac{9}{16} = 312.75$

Expected number of wrinkled and yellow seeds $= 556 \times \dfrac{3}{16} = 104.25$

Expected number of round and green seeds $= 556 \times \dfrac{3}{16} = 104.25$

Expected number of wrinkled and green seeds $= 556 \times \dfrac{1}{16} = 34.75$

$$\chi^2 = \sum_i \left[\frac{(f_i - e_i)^2}{e_i}\right] = \frac{(315 - 312.75)^2}{312.75} + \frac{(101 - 104.25)^2}{104.25} + \frac{(108 - 104.25)^2}{104.25} + \frac{(32 - 34.75)^2}{34.75} = 0.4700$$

𝑚 = number of classes = 4

𝑡 = number of independent parameters estimated using sample data = nil

𝜈=𝑚−𝑡−1=3

Inference: $\chi^2_{0.05,\,3} = 7.815$; $0.47 < \chi^2_{0.05,\,3}$

Accept $H_0$ at 5% level. The result of the experiment is consistent with the theory that the frequencies are in the proportion 9 : 3 : 3 : 1.
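A sketch of the test for the 9 : 3 : 3 : 1 ratio, using `fractions.Fraction` so that the expected frequencies 312.75, 104.25, 104.25 and 34.75 are exact. The category labels are only dictionary keys chosen for this example, and 7.815 is copied from the chi-square table.

```python
from fractions import Fraction

observed = {"round yellow": 315, "wrinkled yellow": 101,
            "round green": 108, "wrinkled green": 32}
ratio = {"round yellow": 9, "wrinkled yellow": 3,
         "round green": 3, "wrinkled green": 1}

N = sum(observed.values())                   # 556 seeds
parts = sum(ratio.values())                  # 16
expected = {k: Fraction(N * ratio[k], parts) for k in observed}

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1                       # no parameters estimated from the data
CHI2_005_3 = 7.815                           # chi^2_{0.05, 3} from the table
print({k: float(v) for k, v in expected.items()})
print(f"chi2 = {float(chi2):.4f} with {df} df; reject H0: {chi2 > CHI2_005_3}")
```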
