
Hypothesis Testing for Population Mean

Arun Kumar, Ravindra Gokhale, and Nagarajan Krishnamurthy
Quantitative Techniques-I, Term I, 2012
Indian Institute of Management Indore

Case: Quality Wireless B

Daily hold time at the service call center has a mean of 79.50
seconds and a standard deviation of 16.86 seconds.
Performance over ten days in Ray Jackson's absence:
Average hold time is 86.6 seconds.
Performance over ten days in Ray Jackson's presence:
Average hold time is 74.4 seconds.

Jackson's Management Approach?

Not accounting for process variation.

How to make a more informed decision?

Ascertain whether the average hold times of 74.4 and 86.6
seconds, which differ from the known average hold time of 79.5
seconds, can be attributed to natural process variation.

Statistical problem?

To ascertain whether an average hold time of 74.4 seconds (or
86.6 seconds) is an outlier even when process variation is taken
into account.

Framing a Hypothesis

Null Hypothesis
Stating what would be considered normal if no change
happened.
$H_0: \mu = 79.5$
Alternative Hypothesis
Stating what would happen when a change happens.
$H_A: \mu \neq 79.5$

What conclusions can we make?

We fail to reject the null hypothesis.
We reject the null hypothesis.

Two types of error

                                 Conclusion
Truth                            Do not reject $H_0$          Reject $H_0$
$H_0$ true (should not reject)   OK                           Type I error ($\alpha$)
$H_0$ false (should reject)      Type II error ($\beta$)      OK

Type I and Type II error

You cannot control both simultaneously.
Type I and Type II errors have an inverse relationship for a
fixed sample size.

Controlling Type II error

Effect of sample size on Type II error
Increasing the sample size reduces the Type II error while
keeping the Type I error constant. Of course, you can reduce
both Type I and Type II errors to a desirable level by having a
sufficiently large sample size, but cost will be a factor.
Power of a test
A reduction in $\beta$ implies that $1 - \beta$ increases.
$1 - \beta$ is also known as the power of the test.
Beware of tests that do not have much power.
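
The effect of sample size on power can be illustrated with a small sketch. It assumes a two-sided z-test with known standard deviation, uses the Quality Wireless numbers (mean 79.5, standard deviation 16.86), and treats 74.4 seconds as a hypothetical true mean chosen only for illustration. The function name `power_two_sided_z` is an assumption of this sketch, not something from the case.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under mu_true the test statistic is normal with this mean and variance 1.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    upper_tail = 1 - std_normal.cdf(z_crit - shift)   # P(Z > z_crit)
    lower_tail = std_normal.cdf(-z_crit - shift)      # P(Z < -z_crit)
    return upper_tail + lower_tail


# Hypothetical true mean of 74.4 s; alpha stays fixed at 0.05 throughout.
for n in (10, 40, 160):
    power = power_two_sided_z(79.5, 74.4, 16.86, n)
    print(f"n={n:4d}  power={power:.3f}  Type II error={1 - power:.3f}")
```

With the Type I error held at 5%, the printed Type II error should shrink as n grows, which is the trade-off described above.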

Significance Level

The acceptable probability of a Type I error is known as the
significance level.
The best practice is to set the significance level before the
data are collected.

What significance level do you want to set for the Quality
Wireless example?

5% and 1% are the commonly used levels of significance.

How to conduct the test?

Find the test statistic.
Find the critical value.
If the absolute value of the test statistic is greater than
the critical value, we reject the null hypothesis. Otherwise
we conclude that the data do not provide sufficient evidence
to reject the null hypothesis.

Test Statistic for $\mu$

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

$\mu_0$ is the hypothesized value for $\mu$.

Test statistic in Quality Wireless case

$$z = \frac{74.4 - 79.5}{16.86 / \sqrt{10}} \approx -0.95$$
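
The arithmetic can be checked with a short sketch. The helper name `z_statistic` is an assumption of this example; the inputs are the case figures. The unrounded value for Ray Jackson's presence is about -0.957, which the slides report as -0.95, and the same formula is applied to the 86.6-second average from his absence.

```python
from math import sqrt


def z_statistic(sample_mean, mu0, sigma, n):
    """Standardized distance of the sample mean from the hypothesized mean."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


z_present = z_statistic(74.4, 79.5, 16.86, 10)  # ten days with Ray Jackson
z_absent = z_statistic(86.6, 79.5, 16.86, 10)   # ten days without him
print(f"z (presence) = {z_present:.3f}")
print(f"z (absence)  = {z_absent:.3f}")
```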

Critical value

Critical value is $z_{\alpha/2}$, where $\alpha$ is the significance level.

Quality Wireless case

For $\alpha = 0.05$, the critical value is 1.96. We fail to
reject the null hypothesis because $|-0.95| < 1.96$.
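
A minimal sketch of this decision rule, using only the Python standard library, is given here. The function name `two_sided_z_decision` is hypothetical, and the 10% level in the loop is an extra illustrative choice that does not come from the case.

```python
from statistics import NormalDist


def two_sided_z_decision(z, alpha):
    """Return (critical value, reject?) for a two-sided z-test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_crit, abs(z) > z_crit


z = -0.95  # test statistic reported for the Quality Wireless case
for alpha in (0.10, 0.05, 0.01):
    z_crit, reject = two_sided_z_decision(z, alpha)
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"alpha={alpha:.2f}  critical value={z_crit:.3f}  {verdict}")
```

Running the loop over several levels is one way to explore how the conclusion depends on the chosen significance level.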

Conclusion from the statistical analysis

At the 5% significance level, we fail to reject the null
hypothesis that the mean hold time of the call center is 79.5
seconds. The observed performance is not significantly
different from 79.5 seconds.

Conclusions

The system was not out of control.
What if the significance level is changed?

Exercise 1

A Vice President in charge of sales for a large corporation
claims that salespeople are averaging no more than 15 sales
contacts per week. Ideally the number should be higher. In
order to check the claim, 36 salespeople are selected at random,
and the number of contacts made by each is recorded for a week.
The mean and the variance of the 36 measurements were 17
and 9, respectively. Does the evidence contradict the Vice
President's claim? Use a test with level $\alpha = 0.05$.
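
One possible way to set up this exercise is sketched here. It assumes a one-sided test ($H_0: \mu \le 15$ against $H_A: \mu > 15$) and, because $n = 36$ is reasonably large, uses the sample standard deviation in place of $\sigma$ in the z statistic. Both choices are assumptions of this sketch rather than a prescribed solution.

```python
from math import sqrt
from statistics import NormalDist

n = 36
sample_mean = 17.0
sample_variance = 9.0
mu0 = 15.0      # boundary value of the claim "no more than 15"
alpha = 0.05

s = sqrt(sample_variance)
z = (sample_mean - mu0) / (s / sqrt(n))

# One-sided (upper-tail) test: all of alpha sits in the right tail.
z_crit = NormalDist().inv_cdf(1 - alpha)
reject = z > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
print("Evidence contradicts the claim" if reject else "No evidence against the claim")
```

Note that the critical value differs from the two-sided 1.96 because only large values of the sample mean count against the claim.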

p-value

The p-value is the probability, computed assuming the null
hypothesis is true, of observing data at least as extreme as
the data actually observed.
We reject the null hypothesis if p-value < significance level.

Calculating the p-value

The p-value is $2 \cdot P(Z \ge |z|)$ when the alternative is
$\mu \neq \mu_0$. $Z$ is the standard normal random variable
and $z$ is the test statistic.

Quality Wireless Example

p-value $= 2 \cdot P(Z \ge |-0.95|) = 0.34$.
You will fail to reject the null hypothesis at both the 1% and
5% significance levels because the p-value is much bigger than
0.01 and 0.05.
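
The two-sided p-value can be reproduced with the standard library's normal distribution. The function name `two_sided_p_value` is an assumption of this sketch.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """2 * P(Z >= |z|) for a standard normal Z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_value = two_sided_p_value(-0.95)
print(f"p-value = {p_value:.2f}")
for alpha in (0.05, 0.01):
    verdict = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha={alpha}: {verdict}")
```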

Exercise 1...

A Vice President in charge of sales for a large corporation
claims that salespeople are averaging no more than 15 sales
contacts per week. Ideally the number should be higher. In
order to check the claim, 36 salespeople are selected at random,
and the number of contacts made by each is recorded for a week.
The mean and the variance of the 36 measurements were 17
and 9, respectively. Does the evidence contradict the Vice
President's claim? Answer the question using the p-value.
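
Because the claim here is one-sided, the p-value is a single tail area rather than the doubled tail used for a $\neq$ alternative. The sketch assumes $H_A: \mu > 15$ and a z statistic of $(17 - 15) / (3 / \sqrt{36}) = 4$; the function name `upper_tail_p_value` is hypothetical.

```python
from statistics import NormalDist


def upper_tail_p_value(z):
    """P(Z >= z) for a standard normal Z (one-sided, upper tail)."""
    return 1 - NormalDist().cdf(z)


z = (17 - 15) / (9 ** 0.5 / 36 ** 0.5)  # sample variance 9, n = 36
p_value = upper_tail_p_value(z)
print(f"z = {z:.1f}, one-sided p-value = {p_value:.2e}")
```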

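Understanding the z-test

Under the null hypothesis,
$$\frac{\bar{X}_n - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1)$$
(approximately) as long as $n$ is large.
Now if the significance level is $\alpha$, then any $\bar{x}$
(sample mean) whose standardized value is more than
$z_{\alpha/2}$ or less than $-z_{\alpha/2}$ provides evidence
against $\mu = \mu_0$.

What if $\sigma$ is unknown?
Ans. Our old friend, the t-distribution, will help.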
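
The same rule can be expressed on the scale of the sample mean: sample means within $\mu_0 \pm z_{\alpha/2} \, \sigma / \sqrt{n}$ do not provide evidence against $\mu = \mu_0$. The sketch applies this to the Quality Wireless figures; the function name `non_rejection_interval` is an assumption of this example.

```python
from math import sqrt
from statistics import NormalDist


def non_rejection_interval(mu0, sigma, n, alpha=0.05):
    """Range of sample means for which a two-sided z-test does not reject H0."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * sigma / sqrt(n)
    return mu0 - margin, mu0 + margin


low, high = non_rejection_interval(79.5, 16.86, 10)
print(f"Do not reject H0 when {low:.2f} <= sample mean <= {high:.2f}")
for sample_mean in (74.4, 86.6):
    inside = low <= sample_mean <= high
    print(f"{sample_mean}: {'inside' if inside else 'outside'} the interval")
```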

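t-test

Under the null hypothesis,
$$\frac{\bar{X}_n - \mu_0}{s / \sqrt{n}} \sim t_{n-1}$$
for any $n$, provided the population is normally distributed.
Now if the significance level is $\alpha$, then any $\bar{x}$
(sample mean) whose standardized value is more than
$t_{(\alpha/2,\, n-1)}$ or less than $-t_{(\alpha/2,\, n-1)}$
provides evidence against $\mu = \mu_0$.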

Remembering Quality Wireless example

Test statistic in Quality Wireless case

Assuming $s = 16.86$,
$$z = \frac{74.4 - 79.5}{16.86 / \sqrt{10}} \approx -0.95$$

Quality Wireless case: critical value using normal distribution

For $\alpha = 0.05$, the critical value is 1.96. We fail to
reject the null hypothesis because $|-0.95| < 1.96$.

Quality Wireless case: critical value using t-distribution

For $\alpha = 0.05$, the critical value is
$t_{(\alpha/2,\, 9)} = 2.262$. We fail to reject the null
hypothesis because $|-0.95| < 2.262$.

Quality Wireless case: What is the p-value using the t-distribution?

p-value > 0.2
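
A t-table only brackets the p-value, but it can also be approximated numerically. The sketch integrates the Student t density with Simpson's rule using only the standard library; the function names and the step count are choices made for this example, not part of the case.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_t_p_value(t_stat, df, steps=2000):
    """Approximate 2 * P(T >= |t_stat|) with Simpson's rule (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t_stat|)
    return 1 - 2 * central_area


t_stat = (74.4 - 79.5) / (16.86 / sqrt(10))
p_value = two_sided_t_p_value(t_stat, df=9)
print(f"t = {t_stat:.3f}, two-sided p-value with 9 df = {p_value:.3f}")
```

If SciPy is available, `2 * scipy.stats.t.sf(abs(t_stat), 9)` should give a matching figure without the hand-written integration.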

Exercise 1...

A Vice President in charge of sales for a large corporation
claims that salespeople are averaging no more than 15 sales
contacts per week. Ideally the number should be higher. In
order to check the claim, 36 salespeople are selected at random,
and the number of contacts made by each is recorded for a week.
The mean and the variance of the 36 measurements were 17
and 9, respectively. Does the evidence contradict the Vice
President's claim? Calculate the p-value using the
t-distribution.