Hi everyone! Hope you have had a good evening so far.

I have a lot of information to share with you tonight. We may not be able to go over all of
it in our seminar, but I have written it (hopefully) in a straightforward enough way that you
can go over it after our seminar and follow the concepts. My notes will give you the
big picture, and of course for a little more detail you also need to read the book along with
them.

I also want to use the whiteboard at the end of our seminar tonight to go over Z calculations
again, as they are very important in what we will do for the rest of this course. I know the Z
calculation is a challenging task, so hopefully showing the work to you visually
will make it easier to learn.

Please use all the options to get help in this class. You can use our office hours on
Sundays (9:00 pm to 11:00 pm ET on AIM) to get one-on-one help.

You can also email me your questions but I really prefer that you post your questions on
the board (under Any Questions link) so other students can also benefit from the
questions and their answers.

You can also use the NetTutor online tutoring service that is sponsored by Kaplan. To
access their service just click on the NetTutor icon on your "MyDesk" page.

Is anyone using them on a regular basis, and if so, are you happy with the service? I know
I have talked about this before but it won’t hurt to share it again.

It is not too late to catch up in this class but you need to hurry up a little as concepts are
going to get a little busier and you need to spend more time to understand them. It won’t
necessarily get harder, just busier. I don’t mean to scare you. I just want you to be aware
of it so you can plan your time accordingly.

I know some of you have some difficulties using Excel to do some of the work for
you. From now on, some of the calculations are a little more time consuming and
challenging, so I really suggest that you start using Excel commands and functions to
answer the problems more often, if you are able to.

You can get information about a statistical procedure by simply typing the name of
the procedure into Excel's Help. You will get an explanation and an example
for that command or procedure (e.g., mean, standard deviation, regression).

Before I start going over my notes, I would like to say that I have modified several Excel
templates to help you with your work. They are originally from the publisher (they are
inside the “Excel” folder, which is inside the Business Stats folder), but I have added a lot
of notes and colors to make them more user friendly for my students.
I color coded them so you can use them easily. The green parts are where
you enter the values. Once you do, the yellow part, which is the *result section*, will
change accordingly. Just make sure to press Enter after you enter your last
value.

In most cases all you have to do is enter the mean, standard deviation and sample size
(or a few other values) for Excel to do the work for you! I will go over one example using
one of them shortly. I have posted them all on Doc Sharing for you. They are a great help
with your assignments!

Let's talk about Estimation. Chapter 9 is about Point Estimates and Interval
Estimates.

In most cases we don’t know the mean and standard deviation of a particular variable of
interest (like height) in a large population (think of census data, for example). So, we get
an estimate of those values by taking a sample from the population and calculating the
mean and standard deviation of that sample. We call these sample mean and sample
standard deviation values the “Point Estimates” of the population mean and population
standard deviation.

Since the sample mean and standard deviation may not be very accurate (i.e., the
sample may not represent the population well), we set an
interval around the sample mean and state that this interval contains the true
population mean with a certain level of confidence. This is called a confidence interval.

The Computer Solutions part of each section tells us how to get the upper and lower
limits of the confidence interval.

Here is one example of calculating a confidence interval.

Suppose we observe that, in a sample of 50 commuters, the average length of travel to
work is 30 minutes, with a population standard deviation of 2.5 minutes. We can be 95
percent confident that the population mean is between 29.3 and 30.7 minutes. Here
is the work.

Click on a cell, type =CONFIDENCE(0.05,2.5,50) in the Excel input box and
press Enter. The three arguments are alpha (0.05 for 95% confidence), the standard
deviation and the sample size; the sample mean is not used in this command. You
will get the value of 0.692951.

So, the expression =CONFIDENCE(0.05,2.5,50) equals 0.692951, or 0.7 rounded.


Therefore, the interval for the average length of travel to work (30 minutes) is calculated
as 30 +/- 0.7 minutes: one time add 0.7 to 30 and one time subtract it. This gives
30 - 0.7 = 29.3 and 30 + 0.7 = 30.7.

So, we are 95% confident that the mean commute time is between 29.3 and 30.7 minutes.
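
For anyone who wants to check the Excel result outside of Excel, here is a small Python sketch of the same calculation. It assumes the margin is the usual Z-based one (critical Z times sigma divided by the square root of n); the function name z_margin_of_error is made up for this illustration and is not from the book.

```python
from math import sqrt
from statistics import NormalDist


def z_margin_of_error(alpha, sigma, n):
    """Half-width of a Z-based confidence interval for a mean (sigma known)."""
    # Two-sided critical Z: about 1.96 when alpha = 0.05
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / sqrt(n)


# Commuter example from the notes: n = 50, mean = 30, sigma = 2.5, 95% confidence
margin = z_margin_of_error(alpha=0.05, sigma=2.5, n=50)
lower, upper = 30 - margin, 30 + margin
print(f"margin = {margin:.6f}")
print(f"interval = ({lower:.1f}, {upper:.1f})")
```

The margin should agree with the Excel value to about five decimal places, and the rounded limits should match the 29.3 and 30.7 above.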

OK, now… Let's talk about Hypothesis Testing…

Basically, the t test statistic and the Z test statistic are used in hypothesis testing to reject or
accept a claim. The claim is usually the Null Hypothesis (called H0), and if we reject H0 we
automatically accept the Alternative Hypothesis (called H1) because that is the only other
option (kind of like plan B) available to us.

The Null and Alternative hypotheses are complements of each other. For example, if
the Null hypothesis claims that the mean value of something is less than or equal to a certain
value (the book calls this directional), then the Alternative would be that the mean is greater than that
value. Or, if the Null says the mean is equal to a certain value, then the Alternative says the mean is
NOT equal to that value. The book calls this non-directional because it can go in either direction.

Does it make sense?

Formulas for calculating t and Z are in the book. If the population standard deviation (sigma)
is not known, we use the t test, and if sigma is known (given) we use the Z test. These are not
new to us.

The calculations, the meaning of alpha and P-value, and the conclusion process are the same
in both methods, but the formulas are a little different. We will get familiar with getting a t
value from the table a little later on in our seminar tonight.

The flow chart on page 369 is a great help in guiding you on which method to use for any
particular hypothesis situation.

In chapter 10, we want to make inferences about a population mean when the standard deviation of
the population is NOT KNOWN.

This is the more practical case, as we usually don't have the standard deviation of a
population. If you recall, when we know the standard deviation of the
population we use the Z test. Now, we use the Student t test (which has a formula very
similar to the Z test formula) because the standard deviation of the population is not known.

The only difference is that we use the standard deviation of the sample, s, in the formula
instead of sigma. The calculation and conclusion of a t test are very similar to the calculation
and conclusion of a Z test. We can call these values the calculated t or calculated Z.

The formula of Z is: Z = (x-bar - mu) / (sigma / SQRT(n))

The formula of t is: t = (x-bar - mu) / (s / SQRT(n))

Here x-bar is the sample mean, mu is the hypothesized population mean, sigma is the population
standard deviation, s is the sample standard deviation and n is the sample size.

The only major difference is finding the value of t from the table (we can call it the critical
t). To find a t value from the t table in the back of the book, you need to take TWO things
to the table.

One is alpha (which you already know about and is usually given) and the other is the
DEGREES of FREEDOM. The degrees of freedom is just a number that adjusts the t value for
the size of the sample.

The DF (degrees of freedom) value is the sample size, n, minus 1 (n-1). It is basically another
factor that comes into play to bring accuracy to the calculations based on different sample
sizes. That is all there is to degrees of freedom for us!

So, as the book shows in Table t in the back of the book, if you are looking for a t value
when alpha is 5% (one-sided or one-tailed test) and the sample size is 74, you go to the table
and look up t (0.05, 73). The t value when the degrees of freedom is 73 and alpha is 0.05 is 1.666.
Let me know if you are not getting this value from the t table in the back of the book, right
before the Z table.

Just go to the page before the last page in the book (t distribution) and go down the df
column until you see the value of 73 (df for this example is df = sample size - 1 = 74 - 1 =
73).

Just remember that the values in the body of the table are t values, and alpha is the shaded
area (blue) to the right of that t value, as shown in the figure with Table t in the back of the book.
As the sample size approaches infinity, the t distribution approaches the standard
normal distribution and the two curves become identical. So, for example, Z for alpha
0.05 = 1.645, which is the same value as t (df = infinity, alpha 0.05) = 1.645.

To compare two population means, we use the t statistic with a different formula. This is
a very useful procedure because you can compare the performance of two populations
or groups by comparing their mean values. These groups could be two classes taught by
the same teacher or, say, two assembly groups for which we want to see if there is any
difference in performance.

The structure of the formula should be recognizable. The only difference is that we are
dealing with the difference between the two sample means and the difference between the
two population means.

The flow chart on page 426 very clearly gives us the formulas for whether the population
standard deviations are the same or not.

The whole process and conclusion are very similar to the t test (only different formulas are used).

If the standard deviations of the two populations are equal,
then the degrees of freedom would be df = v = n1 + n2 - 2, but if the standard deviations are
different then df should be calculated using the formula on page 426.
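
To make the equal-standard-deviation case concrete, here is a rough Python sketch of the standard pooled-variance t statistic with df = n1 + n2 - 2. I am assuming this is the same form as the equal-variance formula on page 426, so please check it against the book. The function name pooled_t and the class scores are made up for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Tests H0: the two population means are equal
    t = (mean(sample1) - mean(sample2)) / std_error
    return t, df


# Hypothetical scores for two classes taught by the same teacher (made-up numbers)
class_a = [78, 85, 69, 91, 74, 88]
class_b = [72, 80, 65, 84, 70, 79]
t_stat, df = pooled_t(class_a, class_b)
print(f"t = {t_stat:.3f}, df = {df}")
```

You would then compare the printed t with the table t for that df, exactly as in the one-sample case.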

I am going to cover a couple of examples in detail today. I will also tell you how to
use the required Excel function for the example on page 387.

If you learn how to deal with this problem, you should be able to approach and
understand most problems in this lesson because the process is just the same. Here is the
process. Basically, you calculate a test statistic (t, Z, F, etc.) and compare it with its
critical value to either reject or not reject the Null hypothesis.

I will wait a few seconds until you get to the example on page 387 in the book.

In this example, the Chekzar company claims that its tires' mean life-time is at least
60,000 miles in highway driving. The editors of a consumer magazine are skeptical about this
claim, so they buy 36 tires and test them on a highway.

They found that the mean tire life of these 36 tires is 58,341.69 miles and the standard
deviation is 3632.53 miles. Based on this data, the editors want to test the
tire company's claim that the mean tire life-time is at least 60,000 miles.
This is how the hypotheses are set up. The company says H0: Mean (tire life-time)
greater than or equal to 60,000, and H1: Mean less than 60,000. If the editors manage to
reject H0 with the sample that they took, they will get what they really wanted, which is
rejecting the company’s claim that the mean tire life-time is at least 60,000 miles.

To solve this problem you can use Excel (you do not have to do it now; you can
do it after the seminar). Start Excel, open a new blank document and go to the Excel
folder inside the “Business Stats data” folder on your C drive (or the location where you
saved it when you downloaded the CD onto your computer) to get the data file CX10CHECK.

Once you click on the data file, it displays the data in an Excel worksheet. Either highlight the
data that you opened or enter A1:A37 into the INPUT RANGE box. The instructions on how
to do it are also on page 378.

Now, click on Tools in your Excel menu bar, select Data Analysis Plus, and then open -t
Test: Mean- and -t Estimate: Mean-. Enter 60000 for the Hypothesized Mean and 0.01 in
the alpha box and click on OK. The Excel output will be displayed on the Excel
worksheet. Computer Solutions 10.2 will help you with the actual Excel work by
telling you step by step what you need to do.

This output is displayed on page 391 in the book. We get computed t = -2.739 (we call it
computed because if we work out the t formula by hand we get the same answer). This -computed-
t is called "t-stat" in the Excel output on page 391.

Again, ignore the MINITAB instructions in the book. Those are for colleges that have installed
MINITAB software on the computers in their computer labs or had students buy the
software.

The critical t is also listed in the output under the name *t-critical-one tail*. Its value is 2.438, and
it is usually given for the right side. You can also get this value from Table t at the end of the book
if you are doing your work manually. If you go to Table t for alpha 1% and degrees of
freedom n-1 = 36-1 = 35, you get t 0.01,35 = 2.438.

Since the calculated t has a negative value and H1 says "less than", we need to focus on the left tail for
the critical region, and therefore we should use a critical t value of -2.438 in this case
(instead of 2.438, which would mark the right tail). Recall that due to
symmetry, t < -2.438 and t > 2.438 have the same probability.

For a graphical example, look at figure 10.7 on page 388.


Basically, because this is a left-tailed test, you reject the null hypothesis if your computed t is
less than the critical t of -2.438, and you do not reject it otherwise. Here the computed
t = -2.739 is less than -2.438, so it falls in the rejection region and we reject the null (the claim).

In short, -2.739 is outside the range from -2.438 to 2.438.
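
If you want to verify the t-stat by hand rather than through Data Analysis Plus, the short Python sketch below plugs the tire numbers into the one-sample t formula and applies the left-tail rule. The critical value is simply typed in from the table; the variable names are my own.

```python
from math import sqrt

# Values from the tire example in the notes
n = 36
sample_mean = 58341.69
sample_sd = 3632.53
hypothesized_mean = 60000
critical_t = -2.438  # table value t(0.01, 35), made negative for a left-tailed test

t_stat = (sample_mean - hypothesized_mean) / (sample_sd / sqrt(n))
# Left tail: reject H0 when the computed t falls below the critical value
reject_h0 = t_stat < critical_t

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The computed value should round to the same -2.739 that Excel reports as t-stat.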

You can similarly compare alpha with the P-value to reject or not reject the null hypothesis. This
is much easier than comparing the critical t and the computed t.

The P-value is the probability that we get by taking the calculated test statistic (Z or t) to the table. It is
just a new name for something that we already knew how to calculate.

The P-value is given to you in the output. It is called *P(T<=t) one tail*. So, the P-value is
0.0048, and since the P-value is smaller than alpha (0.01) we reject the Null hypothesis
in favor of the alternative hypothesis.
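
For the curious, the one-tail P-value can also be approximated without Excel. The sketch below is only one possible way to do it: it adds up the area under the Student t curve numerically (Simpson's rule) using just the Python standard library. The function names t_pdf and left_tail_p_value are my own and are not from the book or from Excel.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Height of the Student t curve with df degrees of freedom at x."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def left_tail_p_value(t_stat, df, steps=2000):
    """Approximate P(T <= t_stat) with Simpson's rule (steps must be even)."""
    # Area between 0 and |t_stat|; symmetry around 0 gives the tail
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    return 0.5 - central_area if t_stat < 0 else 0.5 + central_area


# Tire example: computed t = -2.739 with df = 35, alpha = 0.01
p_value = left_tail_p_value(-2.739, df=35)
alpha = 0.01
print(f"P-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The result should come out close to the 0.0048 shown in the Excel output, which is below alpha = 0.01.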

So, you can either compare alpha with the P-value or the computed t with the critical t to
reject or not reject the Null hypothesis. In this problem (a left-tailed test), we reject H0 if the
P-value is less than alpha or the calculated test statistic (F, t, Z, etc.) is less than the
Critical (Table) value of the test statistic.

If the hypothesis was on the right side of the curve, then we reject H0 if the
P-value is less than alpha (same as before) or if the calculated test statistic (F, t, Z, etc.) is
greater than the Critical (Table) value of the test statistic.

So, basically, when the calculated value is in the rejection region (left side or right side) we
reject the Null hypothesis.

For this problem: the computed t is less than the critical t (or the P-value is less than alpha), so we
reject the Null hypothesis. If you have the output, it is easier to just compare the P-value with
alpha.

Now, what was the Null? It was: mean life-time of the tires is greater than or equal to 60,000. We
rejected this null hypothesis. Therefore, we are in favor of the alternative hypothesis that says
the mean tire life-time is less than 60,000.

If your Alternative hypothesis was H1: mean tire life-time not equal to 60,000, it means
that it can be less or more than 60,000, and that makes it a two-tail test (remember the
confidence interval concepts?). So, you need to use the values of *P(T<=t) two
tail* and *t-critical-two tail* in your comparisons in that case.

Now let's talk about two things about hypothesis testing that are usually a challenge for
some students.

1. After reading a problem, how do I know if a hypothesis is supposed to be one-sided
or two-sided?

If the null says more than or less than (at least or at most), then it is directional or one-sided. Otherwise,
it is a non-directional or two-sided test.

2. What would be different in our calculations if we know our problem is a two-sided
problem?

We use half of alpha in finding the critical value for t or Z because in a two-sided
test alpha is divided between the two tails. Each tail contains half of the alpha value.
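
Here is a quick Python illustration of the half-alpha idea, using Z because the standard library has the normal curve built in. It only shows how the critical value changes when alpha is split between two tails; the variable names are my own.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-sided test: all of alpha sits in one tail
z_one_sided = standard_normal.inv_cdf(1 - alpha)
# Two-sided test: alpha is split, so each tail holds alpha / 2
z_two_sided = standard_normal.inv_cdf(1 - alpha / 2)

print(f"one-sided critical Z = {z_one_sided:.3f}")
print(f"two-sided critical Z = {z_two_sided:.3f}")
```

The two-sided critical value is larger because each tail holds only 2.5% instead of 5%.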

If you followed and understood the process and what we did you should be able to apply
the same process to other problems in this lesson because the underlying process is
exactly the same.

Now another example:

A study of the process costs indicates that the average weight of the diamonds must be
greater than 0.5 karat in order for the process to be operated at a profitable level. Do the
six diamond-weight measurements, 0.46, 0.61, 0.52, 0.48, 0.57, 0.54, present sufficient
evidence to indicate that the average weight of the diamonds produced by the process is
in excess of 0.5 karat?

We use the t test because the population standard deviation is not known and the sample size
is 6 (less than 30). It is a one-sided test because
the question is about the value being “greater than”.

H0: population average weight of the diamonds (mu) = 0.5
H1: population average weight of the diamonds (mu) > 0.5

We decide on an alpha value of 0.05 (rejecting the top 5% of the t values). The degrees
of freedom is the sample size minus 1, so the degrees of freedom (df) for this problem is 6-1 = 5.
The critical t value has the format t alpha, df. So, for this problem, it is t 0.05, 5 =
2.015 (from the t table in the back of the book).

That is, we will reject H0 if the calculated t (calculated using the formula) is greater
than the maximum acceptable table t, which is 2.015 (for this problem). In that case, we say
the calculated t is too large to be accepted according to our 5% policy.
So, the Rejection Region for alpha = 5% and (6-1) = 5 degrees of freedom is when the
calculated t (using the formula) is greater than 2.015 (look at the t distribution figure at
the top of the t table in the back of the book; the red area is the rejection area).

If you use the t formula for this problem you will find the calculated t value to be 1.31. In
this case the calculated t is less than the critical t (table t); therefore, we do not reject H0.
This implies that the data do not present sufficient evidence to indicate that the mean
diamond weight exceeds 0.5 karat.
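
Here is a small Python sketch of the diamond calculation, in case you want to see each piece of the t formula (sample mean, sample standard deviation, standard error). The critical value is typed in from the table, and the variable names are my own.

```python
from math import sqrt
from statistics import mean, stdev

weights = [0.46, 0.61, 0.52, 0.48, 0.57, 0.54]  # karats, from the example
hypothesized_mean = 0.5
critical_t = 2.015  # table value t(0.05, 5) quoted in the notes

n = len(weights)
x_bar = mean(weights)
s = stdev(weights)  # sample standard deviation (divides by n - 1)
t_stat = (x_bar - hypothesized_mean) / (s / sqrt(n))
reject_h0 = t_stat > critical_t  # right-tailed test

print(f"mean = {x_bar:.2f}, s = {s:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The printed t should land close to the 1.31 quoted above (any small difference comes from rounding along the way), and it stays well below 2.015.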

You can reach the same conclusion by comparing alpha and the P-value too.
Alpha is the tail probability that corresponds to the table t, and the P-value is the tail
probability that corresponds to the calculated t.

So, you reject H0 in this problem if

Calculated t > Critical t (or Table t), or P-value < alpha. When you work by hand, it is easier to compare
the “calculated t” and “table t” for the conclusion.

Now, suppose the question was:

Do the six diamond-weight measurements, 0.46, 0.61, 0.52, 0.48, 0.57, 0.54, present
sufficient evidence to indicate that the average weight of the diamonds produced by the
process is less than or greater than (or simply not equal to) 0.5 karat?

We use a two-sided t test to answer the problem. We calculate the t value the same way and
compare it with the table t. The only difference is the value of the table t, as we need to pick t
(half-alpha, n-1), which is t 0.025, 5 = 2.571.

Since the calculated t (1.31) is between -2.571 and 2.571, we do not reject H0.

Still useful notes from Past Seminars.

Here are the 4 cases of Z calculations that I discussed in our Unit 4 seminar. If you draw
a normal curve, mark these values on the graph and visually see what is going on, it
will help you learn the Z calculations much more easily. Let’s go over them again here.

If your Z value (after you calculate it) is negative and the sign is less than (<), you subtract the
probability that you get from Table 3 for Z = -1.1 (looked up as 1.1, by symmetry) from 0.5.

For example: P (Z < -1.1) = P (Z < 0) - P (-1.1 < Z < 0) = 0.5 (left side of curve) - P (-1.1 < Z < 0) =
0.5 (left side of curve) - P (0 < Z < 1.1) = 0.5 - 0.3643 (from the table) = 0.1357

by symmetry, P (-1.1 < Z < 0) = P (0 < Z < 1.1) = 0.3643

If your Z value (after you calculate it) is negative and the sign is greater than (>), you add 0.5 to the
probability that you get from Table 3 for Z = -1.1 (looked up as 1.1, by symmetry).

For example: P (Z > -1.1) = P (-1.1 < Z < 0) + P (Z > 0) = P (-1.1 < Z < 0) + 0.5 (right side of curve) =
P (0 < Z < 1.1) + 0.5 (right side of curve) = 0.3643 (from the table) + 0.5 (right side of curve) = 0.8643

by symmetry, P (-1.1 < Z < 0) = P (0 < Z < 1.1) = 0.3643

If your Z value (after you calculate it) is positive and the sign is less than (<), you add 0.5 to the
probability that you get from Table 3 for Z = 1.1.

For example: P (Z < 1.1) = P (Z < 0) + P (0 < Z < 1.1) = 0.5 (left side of curve) + 0.3643 (from the
table) = 0.8643

If your Z value (after you calculate it) is positive and the sign is greater than (>), you subtract the
probability that you get from Table 3 for Z = 1.1 from 0.5.

For example: P (Z > 1.1) = P (Z > 0) - P (0 < Z < 1.1) = 0.5 (right side of curve) - P (0 < Z < 1.1) = 0.5 - 0.3643
(from the table) = 0.1357
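
The four cases can be checked with a few lines of Python. This sketch assumes the Z table lists the area between 0 and Z, as in the examples above, so it first rebuilds that table area and then adds it to or subtracts it from 0.5. The names are my own.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve
# P(0 < Z < 1.1), the kind of value the Z table lists
table_area = z.cdf(1.1) - 0.5

cases = {
    "P(Z < -1.1)": 0.5 - table_area,  # negative Z, less than
    "P(Z > -1.1)": 0.5 + table_area,  # negative Z, greater than
    "P(Z < 1.1)": 0.5 + table_area,   # positive Z, less than
    "P(Z > 1.1)": 0.5 - table_area,   # positive Z, greater than
}
for label, prob in cases.items():
    print(f"{label} = {prob:.4f}")
```

Notice there are really only two answers: the small tail (0.5 minus the table area) and the large remainder (0.5 plus the table area).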

Here is an example of how to find the area under the normal curve for a given X
value using Excel.

Suppose the test scores are normally distributed with a mean of 80 and a standard
deviation of 5.

Question: What percentage of test scores are less than 85? We can use Excel to
do it for us!

Just open a blank Excel sheet, click on a cell to make it active, type
=NORMDIST(85,80,5,TRUE) and press Enter. The cumulative
probability of all test scores less than or equal to 85 will be shown in the cell as a
decimal value. The answer will be 0.841345, or almost 84%.

In this formula, from left to right, the first value is the given test value, X, the second
value is the mean, and the third value is the standard deviation. The word TRUE means
"yes, give me the cumulative percentage from the X value all the way to the left of the
normal curve". If you use FALSE in the formula, you get the height of the normal curve
at X instead of a cumulative probability; here that height is approximately the probability of X
being between 84.5 and 85.5.

Now, if you are given the area under the curve (cumulative percentage) and you are
looking for the X value that corresponds to that cumulative probability, you use the
following formula: =NORMINV(0.841345,80,5) equals 85. So, X = 85.

If you are looking for a Z value given the cumulative percentage of an X, then you
use the NORMSINV command. So, =NORMSINV(0.841345) equals Z = 1.0, which agrees with
Z = (85 - 80) / 5 = 1.

If you are looking for the Z value of any X value you can use the following formula.
For instance, the Z value of X = 42 (given value) when the mean is 40 and the standard
deviation is 1.5 is =STANDARDIZE(42,40,1.5), which equals 1.333333
(Z = 1.333333).
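
If you do not have Excel handy, the same four lookups can be sketched in Python with the standard library's NormalDist. The pairing of each line with an Excel command is my own side-by-side comparison, not something from the book.

```python
from statistics import NormalDist

scores = NormalDist(mu=80, sigma=5)  # test-score distribution from the example

# Like =NORMDIST(85,80,5,TRUE): cumulative probability P(X <= 85)
p_below_85 = scores.cdf(85)
# Like =NORMINV(0.841345,80,5): X value for a given cumulative probability
x_value = scores.inv_cdf(0.841345)
# Like =NORMSINV(0.841345): Z value for a given cumulative probability
z_value = NormalDist().inv_cdf(0.841345)
# Like =STANDARDIZE(42,40,1.5): Z = (X - mean) / standard deviation
z_of_42 = (42 - 40) / 1.5

print(f"{p_below_85:.6f} {x_value:.2f} {z_value:.2f} {z_of_42:.6f}")
```

Since 85 is exactly one standard deviation above the mean of 80, the Z value that goes with a cumulative probability of 0.841345 comes out as 1.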
