
Jan Czarina D. Macatangay, BSN2-2

One Sample T-Test


The one sample t-test is a statistical procedure used to determine whether a sample of
observations could have been generated by a process with a specific mean. Suppose you are
interested in determining whether an assembly line produces laptop computers that weigh five
pounds. To test this hypothesis, you could collect a sample of laptop computers from the
assembly line, measure their weights, and compare the sample with a value of five using a
one-sample t-test.
Hypotheses
There are two kinds of hypotheses for a one sample t-test, the null hypothesis and the
alternative hypothesis. The alternative hypothesis assumes that some difference exists
between the true mean (μ) and the comparison value (m0), whereas the null hypothesis
assumes that no difference exists. The purpose of the one sample t-test is to determine if the
null hypothesis should be rejected, given the sample data. The alternative hypothesis can
assume one of three forms depending on the question being asked. If the goal is to measure
any difference, regardless of direction, a two-tailed hypothesis is used. If the direction of the
difference between the sample mean and the comparison value matters, either an upper-
tailed or lower-tailed hypothesis is used. The null hypothesis remains the same for each type
of one sample t-test. The hypotheses are formally defined below:
• The null hypothesis (H0) assumes that the difference between the true mean (μ) and the comparison value (m0) is equal to zero.
• The two-tailed alternative hypothesis (H1) assumes that the difference between the true mean (μ) and the comparison value (m0) is not equal to zero.
• The upper-tailed alternative hypothesis (H1) assumes that the true mean (μ) of the sample is greater than the comparison value (m0).
• The lower-tailed alternative hypothesis (H1) assumes that the true mean (μ) of the sample is less than the comparison value (m0).
The mathematical representations of the null and alternative hypotheses are defined below:
H0: μ = m0
H1: μ ≠ m0 (two-tailed)
H1: μ > m0 (upper-tailed)
H1: μ < m0 (lower-tailed)
Note. It is important to remember that hypotheses are never about data; they are about the processes that produce the data. If you are interested in knowing whether the mean weight of a sample of laptops is equal to five pounds, the real question being asked is whether the process that produced those laptops has a mean of five pounds.
Assumptions
As a parametric procedure (a procedure which estimates unknown parameters), the one
sample t-test makes several assumptions. Although t-tests are quite robust, it is good practice
to evaluate the degree of deviation from these assumptions in order to assess the quality of
the results. The one sample t-test has four main assumptions:
• The dependent variable must be continuous (interval/ratio).
• The observations are independent of one another.
• The dependent variable should be approximately normally distributed.
• The dependent variable should not contain any outliers.
Level of Measurement
The one sample t-test requires the sample data to be numeric and continuous, as it is based
on the normal distribution. Continuous data can take on any value within a range (income,
height, weight, etc.). In contrast, discrete or categorical data can take on only a limited set of values (Low, Medium, High, etc.). Occasionally, discrete data can be used to approximate a continuous scale, such as with Likert-type scales.
Independence
Independence of observations is usually not testable, but it can be reasonably assumed if the data were collected by random sampling without replacement. In our example, we would want to select laptop computers at random rather than following any systematic pattern. This minimizes the risk of collecting a biased sample that would yield inaccurate results.
Normality
To test the assumption of normality, a variety of methods are available, but the simplest is to
inspect the data visually using a histogram or a Q-Q scatterplot. Real-world data are almost
never perfectly normal, so this assumption can be considered reasonably met if the shape
looks approximately symmetric and bell-shaped. The data in the example figure below are approximately normally distributed.

Figure 1. Histogram of an approximately normally distributed variable.

Outliers
An outlier is a data value which is too extreme to belong in the distribution of interest. Let’s
suppose in our example that the assembly machine ran out of a particular component,
resulting in a laptop that was assembled at a much lower weight. This is a condition that is
outside of our question of interest, and therefore we can remove that observation prior to
conducting the analysis. However, just because a value is extreme does not make it an
outlier. Let’s suppose that our laptop assembly machine occasionally produces laptops which
weigh significantly more or less than five pounds, our target value. In this case, these extreme
values are absolutely essential to the question we are asking and should not be removed.
Box-plots are useful for visualizing the variability in a sample, as well as locating any outliers.
The boxplot on the left shows a sample with no outliers. The boxplot on the right shows a
sample with one outlier.
Figure 2. Boxplots of a variable without outliers (left) and with an outlier (right).

Procedure
The procedure for a one sample t-test can be summed up in four steps. The symbols to be
used are defined below:
Y = the random sample
yi = the ith observation in Y
n = the sample size
m0 = the hypothesized value
ȳ = the sample mean
σ̂ = the sample standard deviation
T = the critical value of a t-distribution with (n − 1) degrees of freedom
t = the t-statistic (t-test statistic) for a one sample t-test
p = the p-value (probability value) for the t-statistic
The four steps are listed below:
1. Calculate the sample mean.
ȳ = (y1 + y2 + ⋯ + yn) / n
2. Calculate the sample standard deviation.

σ̂ = √[ ((y1 − ȳ)² + (y2 − ȳ)² + ⋯ + (yn − ȳ)²) / (n − 1) ]
3. Calculate the test statistic.
t = (ȳ − m0) / (σ̂ / √n)
4. Calculate the probability of observing the test statistic under the null hypothesis. This value is obtained by comparing t to a t-distribution with (n − 1) degrees of freedom. This can be done by looking up the value in a table, such as those found in many statistical textbooks, or with statistical software for more accurate results.
p = 2 ⋅ Pr(T > |t|) (two-tailed)
p = Pr(T > t) (upper-tailed)
p = Pr(T < t) (lower-tailed)
Once the assumptions have been verified and the calculations are complete, all that remains
is to determine whether the results provide sufficient evidence to reject the null hypothesis in
favor of the alternative hypothesis.
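To make the four steps concrete, here is a minimal R sketch that walks through them. The sample of laptop weights is made up purely for illustration; only the comparison value m0 = 5 comes from the example above.

y  <- c(5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.1, 4.7)  # hypothetical weights (pounds)
m0 <- 5                                           # comparison value
n  <- length(y)
ybar  <- mean(y)                                  # Step 1: sample mean
sigma <- sd(y)                                    # Step 2: sample standard deviation
t     <- (ybar - m0) / (sigma / sqrt(n))          # Step 3: test statistic
p     <- 2 * pt(-abs(t), df = n - 1)              # Step 4: two-tailed p-value
c(t = t, p = p)

The built-in call t.test(y, mu = 5) reproduces the same t and p in one step.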
Interpretation
There are two types of significance to consider when interpreting the results of a one sample t-test: statistical significance and practical significance.
Statistical Significance
Statistical significance is determined by looking at the p-value. The p-value gives the
probability of observing the test results under the null hypothesis. The lower the p-value, the
lower the probability of obtaining a result like the one that was observed if the null hypothesis
was true. Thus, a low p-value indicates decreased support for the null hypothesis. However,
the possibility that the null hypothesis is true and that we simply obtained a very rare result
can never be ruled out completely. The cutoff value for determining statistical significance is
ultimately decided on by the researcher, but usually a value of .05 or less is chosen. This
corresponds to a 5% (or less) chance of obtaining a result like the one that was observed if
the null hypothesis was true.
Practical Significance
Practical significance depends on the subject matter. In general, a result is practically
significant if the size of the effect is large (or small) enough to be relevant to the research
questions being investigated. It is not uncommon, especially with large sample sizes, to
observe a result that is statistically significant but not practically significant. Returning to the
example of laptop weights, an average difference of .002 pounds might be statistically
significant. However, a difference this small is unlikely to be of any interest. In most cases,
both practical and statistical significance are required to draw meaningful conclusions.

ONE SAMPLE/SINGLE SAMPLE T-TEST HYPOTHESIS TEST


A coffee shop relocates to Italy and wants to make sure that all lattes are consistent. They
believe that each latte has an average of 4 oz of espresso. If this is not the case, they must
increase or decrease the amount. A random sample of 25 lattes shows a mean of 4.6 oz of
espresso and a standard deviation of .22 oz. Use alpha = .05 and run a one sample t-test to
compare with the known population mean.
Step 1: What is Ho and Ha?
Ho: mean amount of espresso in a latte = 4 oz
Ha: mean amount of espresso in a latte ≠ 4 oz.
Should you use t-test or z-test and why?
This is best as a t-test because our sample size is smaller (less than 30) and we know the
sample standard deviation, but not the population standard deviation.
Notice that this is a TWO-TAILED test. However, if the question had asked if the amount is
greater or less than, the test would have been different and would have been one-tailed.
It is also possible to run a one-tailed test here because the sample mean is greater than the
population mean. However, in this example, we will run the two-tailed test.
Our sample size n = 25.
Step 2: Calculating the t-test statistic (one sample t-test)
NOTE: There are three types of t-tests. There is the one sample t-test that compares a single
sample to a known population value (this example). There is an independent samples t-test
that compares two samples to each other. There is a paired data (also called correlated data)
t-test that compares two samples from data that is related (like pretest score and post test
score).
t -test = (sample mean – population mean)/[stddev/sqrt(n)]
The sample mean x̄ is 4.6 oz
The “mean” is the population mean of 4 oz.
The sample std dev is .22 oz
n = 25
df = n – 1 = 24
t -test = (sample mean – population mean)/[stddev/sqrt(n)] =(4.6 – 4) / [.22/sqrt(25)]
= (.6)/[.22/5] = .6/.044 = 13.6
Therefore, the t-test value is 13.6
Step 3: Determine if this value is in a rejection region (reject Ho) or not (do not reject Ho)
Our degrees of freedom for this one sample t-test is :
df = n – 1 = 25 – 1 = 24
Our alpha value is .05
Our test is two-tailed
Therefore, using any t-table, the two “critical values” that represent the cut-off points for
rejection are:
tc = +/- 2.064
This tells us that if our t-test result (which in this case is 13.6) is either bigger than 2.064 or less than -2.064, then we CAN reject the null because we ARE in the rejection region.
Result:
13.6 > 2.064
Reject Ho
Step 4: Understanding and writing the conclusion – what does this all mean
Recall that Ho says that there is no sig diff between our sample mean of the amount of
espresso in the coffee in Italy and the expected population amount. Ho always says that there
is no sig diff.
However, in this case, we REJECT Ho. In other words, we do not agree with Ho. We do not
think that Ho is correct (with a .05 error margin).
Because we reject Ho (do not choose Ho) we then choose Ha.
Ha tells us that there IS A SIG DIFF between the amount of espresso in the Italy coffee
versus the expected mean.
So – there is too much espresso being placed in the coffee in Italy and it should be reduced to
meet the normal (population) mean.
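Since only summary statistics are given here (R's t.test() needs raw data), the hand calculation can be checked directly in R. A minimal sketch:

xbar <- 4.6; mu0 <- 4; s <- 0.22; n <- 25
t  <- (xbar - mu0) / (s / sqrt(n))   # 13.64, matching the hand calculation
tc <- qt(1 - 0.05/2, df = n - 1)     # critical value 2.064
p  <- 2 * pt(-abs(t), df = n - 1)    # p-value far below .05
c(t = t, tc = tc, p = p)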

The Independent Samples t Test compares the means of two independent groups in order to
determine whether there is statistical evidence that the associated population means are
significantly different. The Independent Samples t Test is a parametric test.
This test is also known as:
Independent t Test
Independent Measures t Test
Independent Two-sample t Test
Student t Test
Two-Sample t Test
Uncorrelated Scores t Test
Unpaired t Test
Unrelated t Test
The variables used in this test are known as:
Dependent variable, or test variable
Independent variable, or grouping variable
A coffee chain has two locations, one in Queens and one in NYC. The coffee chain owner
wants to make sure that all lattes are consistent between the two locations.
A sample of 20 lattes is collected from the Queens store and is determined to have a sample
mean of 4.1 oz. of espresso per latte with a sample standard deviation of .12 oz.
A sample of 22 lattes is then taken from the NYC store and is determined to have a sample
mean of 4.2 oz. of espresso per latte and a standard deviation of .11 oz.
Use alpha = .05 and run an independent samples t-test to determine if there is a significant
difference between the amount of espresso between the two coffee shop lattes.
Step 1: What is Ho and Ha?
Ho: mean espresso in a latte in Queens = mean espresso in a latte in NYC
Ha: mean espresso in a latte in Queens ≠ mean espresso in a latte in NYC
Should you use t-test or z-test and why?
This is best as a t-test because we are comparing two sample means.
Notice that this is a TWO-TAILED test. We will determine if there is a significant difference.
Our sample size of the Queens sample is n1 = 20 and the std dev s1 is .12
Our sample size of the NYC sample is n2 = 22 and the std dev s2 is .11
Step 2: Calculating the t-test statistic for an independent samples t-test
NOTE: There are three types of t-tests. There is the one sample t-test that compares a single
sample to a known population value. There is an independent samples t-test (this example)
that compares two samples to each other. There is a paired data (also called correlated data)
t-test that compares two samples from data that is related (like pretest score and post test
score).
t -test = (sample mean 1 – sample mean 2)/[ sqrt ( s1^2/n1 + s2^2/n2) ]
Recall:
sample mean1 = 4.1, s1 = .12, n1 = 20
sample mean2 = 4.2 , s2 = .11, n2 = 22
NOTE: s1^2 = (.12) ^2 = .12 * .12 = .0144
NOTE: s2^2 = (.11) ^2 = .11 * .11 = .0121
T -test = (sample mean 1 – sample mean 2)/[ sqrt ( s1^2/n1 + s2^2/n2) ]
T-test = (4.1 – 4.2) / [sqrt ((.12)^2 / 20 + (.11)^2/22) ] =
T-test = (4.1 – 4.2) / [sqrt (.0144 / 20 + .0121/22) ] =
T-test = (4.1 – 4.2) / [sqrt (.00072 + .00055)] =
T-test = (4.1 – 4.2) / [sqrt (.00127)] =
T-test = (4.1 – 4.2) / [.03564]
T-test = (– 0.1) / [.03564]
T-test = – 2.8058
NOTE: You must use the order of operations when solving any expression.
Step 3: Determine if this value is in a rejection region (reject Ho) or not (do not reject Ho)
Next, using any t-table (these tables are always on the internet) we can get the critical values
(tc) for the two tailed test.
Our degrees of freedom for this independent samples t-test are:
df = n1 + n2 – 2 = 20 + 22 – 2 = 40
Our alpha value is .05
Our test is two-tailed
Therefore, using any t-table, the two “critical values” that represent the cut-off points for
rejection are:
tc = +/- 2.021
This tells us that if our t-test result (which in this case is -2.8058) is either bigger than 2.021 or less than -2.021, then we CAN reject the null because we ARE in the rejection region.
Result:
-2.8058 < -2.021
Therefore, we must Reject Ho
Step 4: Understanding and writing the conclusion – what does this all mean
Recall that Ho says that there is no sig diff between the lattes in Queens and the lattes in
NYC.
However, in this case, we REJECT Ho. Our t-test result WAS in the rejection region because the value was smaller than the cut-off of -2.021. In other words, we do not agree with Ho. We do not think that Ho is correct (with a .05 error margin).
Because we reject Ho (do not choose Ho) we then choose Ha.
Ha tells us that there IS A SIG DIFF between the amount of espresso in the Queens lattes
and the amount in the NYC lattes.
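As a cross-check, the same numbers can be reproduced in R from the summary statistics, using the unpooled standard error and df = n1 + n2 − 2 exactly as in the hand calculation above:

x1 <- 4.1; s1 <- 0.12; n1 <- 20   # Queens
x2 <- 4.2; s2 <- 0.11; n2 <- 22   # NYC
t  <- (x1 - x2) / sqrt(s1^2/n1 + s2^2/n2)   # -2.81, as above
tc <- qt(1 - 0.05/2, df = n1 + n2 - 2)      # critical value 2.021
c(t = t, tc = tc)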
2. Who do you think has a better sense of humor: women or men?
Researchers asked 10 men and 10 women in their study to categorize 30 cartoons as either "funny" or "not funny". Each participant received a score that represents her or his percentage of cartoons found to be "funny". Below are fictional data for 9 people; these fictional data have
approximately the same means as were reported in the original study (Azim, Mobbs, Jo,
Menon and Reiss, 2005).
Percentage of cartoons labelled as “funny”.
Women: 84, 97, 58, 90
Men: 88, 90, 52, 97, 86
How can we conduct an independent-samples t test for this scenario, using a two-tailed test and a significance level of 0.05?
Step 1: Assumptions
Population 1: Women exposed to humorous cartoons.
Population 2: Men exposed to humorous cartoons.
The comparison distribution will be a distribution of mean differences based on the null
hypothesis.
The hypothesis test will be an independent-samples t test because we have two samples composed of different groups of participants.
Step 2: State the null and alternative hypotheses.
Remember, hypotheses are always about populations, not about our specific samples.
Null hypothesis: On average, women categorize the same percentage of cartoons as “funny”
as men. \[H_0: \mu_1 = \mu_2\]
Alternative hypothesis: On average, women categorize a different percentage of cartoons as
“funny” as compared with men.
\[H_a: \mu_1 \neq \mu_2\]
Step 3: Determine the characteristics of the comparison distribution
We calculate the standard deviation for men and women separately. Remember that the
variance for the average of a sample is:
\[s^2_\bar{X} = \dfrac{s^2}{n}\].
For women:
Mean is
## [1] 82.25
And the variance of the mean (s²/n):
## [1] 72.4
For men:
Mean is
## [1] 82.6
And the variance of the mean (s²/n):
## [1] 61.96
Step 4: Determine the significance level
The exercise already did that for us.
\[\alpha = 0.05\]
Step 5: Calculate Test Statistic
\[t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s^2_{\bar{X}_1} + s^2_{\bar{X}_2}}} = \dfrac{(82.25 - 82.6) - 0}{\sqrt{72.39583 + 61.96}}\]
Hence, t statistic is
## [1] -0.0302
Step 6.1: Conclude (Statistical way)
Because this is a two-sided test (alternative hypothesis: not equal), p-value is going to be:
\[P (|T| > |-0.03019533|) = 2P (T < -0.03019533)\].
Moreover, the degrees of freedom are
\[df = (N_1 -1) + (N_2 - 1) = 4 - 1 + 5 -1 = 7\].
Therefore, p-value is:
## [1] 0.9768
So, p-value > 0.05. Therefore, we do not reject the null hypothesis.
Step 6.2: Conclude (English)
There is not enough (significant) evidence to conclude that either men or women are more likely than the other gender, on average, to find cartoons funny.
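The arithmetic above can be verified in R. Note that this uses the same df = 7 convention as the hand calculation; R's default t.test() applies the Welch correction, so its df and p-value would differ slightly.

women <- c(84, 97, 58, 90)
men   <- c(88, 90, 52, 97, 86)
t <- (mean(women) - mean(men)) /
     sqrt(var(women)/length(women) + var(men)/length(men))  # -0.0302
p <- 2 * pt(-abs(t), df = 7)                                # 0.9768
c(t = t, p = p)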
3. Calculate an independent samples t test for the following data sets:
Data set A: 1,2,2,3,3,4,4,5,5,6
Data set B: 1,2,4,5,5,5,6,6,7,9
Step 1: Sum the two groups:
A: 1 + 2 + 2 + 3 + 3 + 4 + 4 + 5 + 5 + 6 = 35
B: 1 + 2 + 4 + 5 + 5 + 5 + 6 + 6 + 7 + 9 = 50
Step 2: Square the sums from Step 1:
35² = 1225
50² = 2500
set these numbers aside for a moment.
Step 3: Calculate the means for the two groups:
A: (1 + 2 + 2 + 3 + 3 + 4 + 4 + 5 + 5 + 6)/10 = 35/10 = 3.5
B: (1 + 2 + 4 + 5 + 5 + 5 + 6 + 6 + 7 + 9)/10 = 50/10 = 5
Set these numbers aside for a moment.
Step 4: Square the individual scores and then add them up:
A: 1² + 2² + 2² + 3² + 3² + 4² + 4² + 5² + 5² + 6² = 145
B: 1² + 2² + 4² + 5² + 5² + 5² + 6² + 6² + 7² + 9² = 298
Set these numbers aside for a moment.
Step 5: Insert your numbers into the following pooled-variance formula and solve:

t = (μA − μB) / √[ ((ΣA² − (ΣA)²/nA) + (ΣB² − (ΣB)²/nB)) / (nA + nB − 2) × (1/nA + 1/nB) ]

where:
(ΣA)²: Sum of data set A, squared (Step 2)
(ΣB)²: Sum of data set B, squared (Step 2)
μA: Mean of data set A (Step 3)
μB: Mean of data set B (Step 3)
ΣA²: Sum of the squares of data set A (Step 4)
ΣB²: Sum of the squares of data set B (Step 4)
nA: Number of items in data set A
nB: Number of items in data set B
Step 6: Find the Degrees of freedom (nA-1 + nB-1) = 18
Step 7: Look up your degrees of freedom (Step 6) in the t-table. If you don’t know what
your alpha level is, use 5% (0.05).
18 degrees of freedom at an alpha level of 0.05 = 2.10.
Step 8: Compare your calculated value (Step 5) to your table value (Step 7). The calculated value of -1.69 is smaller in magnitude than the cut-off of 2.10 from the table. Therefore p > .05. As the p-value is greater than the alpha level, we cannot conclude that there is a difference between the means.
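R's built-in test with var.equal = TRUE applies the same pooled-variance formula and confirms the decision:

A <- c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6)
B <- c(1, 2, 4, 5, 5, 5, 6, 6, 7, 9)
t.test(A, B, var.equal = TRUE)   # t = -1.69, df = 18, p = 0.11: fail to reject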
4. A study of the effect of caffeine on muscle metabolism used eighteen male volunteers who
each underwent arm exercise tests. Nine of the men were randomly selected to take a
capsule containing pure caffeine one hour before the test. The other men received a placebo
capsule. During each exercise the subject's respiratory exchange ratio (RER) was measured.
(RER is the ratio of CO2 produced to O2 consumed and is an indicator of whether energy is
being obtained from carbohydrates or fats).
The question of interest to the experimenter was whether, on average, caffeine changes RER.
The two populations being compared are “men who have not taken caffeine” and “men who
have taken caffeine”. If caffeine has no effect on RER the two sets of data can be regarded as
having come from the same population.
The results were as follows:

RER (%)
        Placebo   Caffeine
        105       96
        119       99
        100       94
        97        89
        96        96
        101       93
        94        88
        95        105
        98        88
Mean    100.56    94.22
SD      7.70      5.61
The means show that, on average, caffeine appears to have altered RER from about 100.6% to 94.2%, a change of 6.4%. However, there is a great deal of variation between the data values in both samples and considerable overlap between them. So is the difference between the two means simply due to sampling variation, or does the data provide evidence that caffeine does, on average, reduce RER? The p-value obtained from an independent samples t-test answers this question.
The t-test tests the null hypothesis that the mean of the caffeine treatment equals the mean of
the placebo versus the alternative hypothesis that the mean of caffeine treatment is not equal
to the mean of the placebo treatment.
Computer output obtained for the RER data gives the sample means and the 95% confidence
interval for the difference between the means.
Computer output
The Independent Samples t-test in Minitab
Enter the data from both samples into one column and the group identity in a second column,
then select
Stat > Basic Statistics > 2-Sample t... to perform an independent sample t-test in Minitab
Two Sample T-Test and Confidence Interval
Two sample T for Caffeine vs Placebo

          N   Mean     StDev   SE Mean
Caffeine  9   94.22    5.61    1.9
Placebo   9   100.56   7.70    2.6

95% CI for mu Caffeine - mu Placebo: (-13.1, 0.4)
T-Test mu Caffeine = mu Placebo (vs not =): T = -1.99 P = 0.063 DF = 16
Both use Pooled StDev = 6.74
N.B. mu = m = mean
The Independent Samples t-test in SPSS
Enter the data from both samples into one column and the group identity in a second column,
then select
Analyze > Compare Means > Independent Samples T Test ...
T-Test
Note: The difference in signs obtained in the two outputs is because one calculation considers caffeine – placebo values, and the other placebo – caffeine. It makes no difference to the conclusions of the test, i.e., p = 0.063.
Result
The p-value is 0.063 and, therefore, the difference between the two means is not statistically
significantly different from zero at the 5% level of significance. There is an estimated change
of 6.4% (SE = 3.17%). However, there is insufficient evidence (p = 0.063) to suggest that
caffeine does change the mean RER.
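The Minitab and SPSS results can be reproduced in R from the raw data; a quick check:

placebo  <- c(105, 119, 100, 97, 96, 101, 94, 95, 98)
caffeine <- c(96, 99, 94, 89, 96, 93, 88, 105, 88)
t.test(caffeine, placebo, var.equal = TRUE)
# t = -1.99, df = 16, p = 0.063; 95% CI for the difference: (-13.1, 0.4)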

5. Data measured on n=3,539 participants who attended the seventh examination of the
Offspring in the Framingham Heart Study are shown below.

                            Men                      Women
Characteristic              n      Mean    s         n      Mean    s
Systolic Blood Pressure     1,623  128.2   17.5      1,911  126.5   20.1
Diastolic Blood Pressure    1,622  75.6    9.8       1,910  72.6    9.7
Total Serum Cholesterol     1,544  192.4   35.2      1,766  207.1   36.7
Weight                      1,612  194.0   33.8      1,894  157.7   34.6
Height                      1,545  68.9    2.7       1,781  63.4    2.5
Body Mass Index             1,545  28.8    4.6       1,781  27.6    5.9

Suppose we now wish to assess whether there is a statistically significant difference in mean
systolic blood pressures between men and women using a 5% level of significance.

Step 1. Set up hypotheses and determine level of significance

H0: μ1 = μ2
H1: μ1 ≠ μ2
α = 0.05

Step 2. Select the appropriate test statistic.

Because both samples are large (> 30), we can use the Z test statistic as opposed to t. Note that statistical computing packages use t throughout. Before implementing the formula, we first check whether the assumption of equality of population variances is reasonable. The guideline suggests investigating the ratio of the sample variances, s1²/s2². Suppose we call the men group 1 and the women group 2. Again, this is arbitrary; it only needs to be noted when interpreting the results. The ratio of the sample variances is 17.5²/20.1² = 0.76, which falls between 0.5 and 2, suggesting that the assumption of equality of population variances is reasonable. The appropriate test statistic is

Z = (x̄1 − x̄2) / (Sp · √(1/n1 + 1/n2))

Step 3. Set up decision rule.

This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H0 if Z < -1.960 or if Z > 1.960.

Step 4. Compute the test statistic.

We now substitute the sample data into the formula for the test statistic identified in Step 2. Before substituting, we will first compute Sp, the pooled estimate of the common standard deviation:

Sp = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ] = √[ (1622 · 17.5² + 1910 · 20.1²) / 3532 ] = √359.1 ≈ 19.0

Notice that the pooled estimate of the common standard deviation, Sp, falls in between the standard deviations in the comparison groups (i.e., 17.5 and 20.1). Sp is slightly closer in value to the standard deviation in the women (20.1) as there were slightly more women in the sample. Recall, Sp is a weighted average of the standard deviations in the comparison groups, weighted by the respective sample sizes.

Now the test statistic:

Z = (128.2 − 126.5) / (19.0 · √(1/1623 + 1/1911)) = 1.7 / 0.64 = 2.66
 Step 5. Conclusion.

We reject H0 because 2.66 > 1.960. We have statistically significant evidence at α=0.05 to
show that there is a difference in mean systolic blood pressures between men and
women. The p-value is p < 0.010.

Here again we find that there is a statistically significant difference in mean systolic blood
pressures between men and women at p < 0.010. Notice that there is a very small difference
in the sample means (128.2-126.5 = 1.7 units), but this difference is beyond what would be
expected by chance. Is this a clinically meaningful difference? The large sample size in this
example is driving the statistical significance. A 95% confidence interval for the difference in mean systolic blood pressures is 1.7 ± 1.26, or (0.44, 2.96). The confidence interval provides
an assessment of the magnitude of the difference between means whereas the test of
hypothesis and p-value provide an assessment of the statistical significance of the difference.
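A minimal R sketch of the pooled Z computation from the summary statistics:

n1 <- 1623; x1 <- 128.2; s1 <- 17.5   # men
n2 <- 1911; x2 <- 126.5; s2 <- 20.1   # women
Sp <- sqrt(((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / (n1 + n2 - 2))  # about 19.0
Z  <- (x1 - x2) / (Sp * sqrt(1/n1 + 1/n2))                   # 2.66
p  <- 2 * pnorm(-abs(Z))                                     # 0.008, i.e. p < 0.010
c(Sp = Sp, Z = Z, p = p)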

Above we performed a study to evaluate a new drug designed to lower total cholesterol. The study involved one sample of patients: each patient took the new drug for 6 weeks and had their cholesterol measured. As a means of evaluating the efficacy of the new drug, the mean total cholesterol following 6 weeks of treatment was compared to the NCHS-reported mean total cholesterol level in 2002 for all adults (203). At the end of the example, we discussed the appropriateness of the fixed comparator as well as an alternative study design to evaluate the effect of the new drug involving two treatment groups, where one group receives the new drug and the other does not. Here, we revisit the example with a concurrent or parallel control group, which is very typical in randomized controlled trials or clinical trials.

Correlated Samples t-Test

The correlated samples t-test, also called the direct difference t-test, compares scores from
two conditions in a within-subjects design or two groups in a matched-subjects design. The
formula shown here uses values that you have already learned to compute: Means,
Variances and Standard Deviations, and a product-moment correlation.
In this example, anxiety ratings were made before and after a brief treatment session. The
ratings range from 1 (no anxiety) to 5 (extreme anxiety). Pretest and posttest ratings of
anxiety are made on 91 subjects. Listed here are the summary statistics computed from the
raw data.
                      Group 1   Group 2
Mean                  3.17      2.96
Standard Deviation    0.893     0.975
Sample Size           N = 91 (pairs)
Correlation           r = 0.84
Use the formula below to compute the value of t:

t = (M1 − M2) / √[ (s1² + s2² − 2 · r · s1 · s2) / N ]

Look up the critical value of t (df = number of pairs - 1 = 91 - 1 = 90). With an alpha of 0.05,
the critical value of t is between 1.98 and 2.00. Because the observed t (3.75) exceeds the
critical value, we reject the null hypothesis and conclude that the treatment produced a
significant reduction in anxiety.
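The same computation in R, from the summary statistics alone:

M1 <- 3.17; M2 <- 2.96; s1 <- 0.893; s2 <- 0.975
N  <- 91; r <- 0.84
t  <- (M1 - M2) / sqrt((s1^2 + s2^2 - 2*r*s1*s2) / N)  # 3.75
tc <- qt(1 - 0.05/2, df = N - 1)                       # 1.99
c(t = t, tc = tc)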

1. A psychologist is interested in his new learning program. 15 subjects learn a list of 50 words. Learning performance is measured using a recall test. After the first test, all subjects are instructed how to use the learning program and then learn a second list of 50 words. Learning performance is again measured with the recall test. In the following table, the numbers of correctly remembered words are listed for both tests.
(Table: each of the 15 subjects' recall scores on test 1 and test 2, the per-subject difference, and its squared deviation; the individual values were garbled in extraction and are omitted here.)
n=15
The observed t-value is smaller than the (negative) critical t-value (1%, one-tailed), so the null hypothesis is rejected: the learning program has a positive effect. Subjects remember more words after the training.

2. Salary Wizard is an online tool that allows you to look up incomes for specific jobs for cities in the United States. We looked up the 25th percentile for income for six jobs in two cities: Boise, Idaho, and Los Angeles, California. The data are below.
## Jobs Boise Los Angeles
## 1 Executive chef 53047 62490
## 2 Genetics counselor 49958 58850
## 3 Grants/proposal writer 41974 49445
## 4 Librarian 44366 52263
## 5 Public schoolteacher 40470 47674
## 6 Social worker (with bachelor's degree) 36963 43542
How can we conduct a paired-samples t test to determine whether income in one of these cities differs, on average, from income in the other? We will use a two-tailed test and a significance level of 0.05.
Notice that the same individual (income for specific job) is measured twice (two different
cities).
Step 1: Assumptions
For the one-sample t test, we use individual scores; for the paired-samples t test, we
use difference scores. The comparison distribution is a distribution of mean difference
scores (rather than a distribution of means).
Population 1: Job types in Boise, Idaho.
Population 2: Job types in Los Angeles, California
The comparison distribution will be a distribution of mean differences.
The hypothesis test will be a paired-samples t test because we have two samples, and all
participants are in both samples.
We do not know whether the population is normally distributed, there are only 12 observations
in total, and there is not much variability in the data in our samples, so we should proceed
with caution. The data were not randomly selected, so we should be cautious when
generalizing beyond this sample of job types.
Step 2: State the null and alternative hypotheses.
Remember, hypotheses are always about populations, not about our specific samples.
Null hypothesis: Jobs in Boise pay the same, on average, as jobs in Los Angeles: \[H_0:
\mu_1 = \mu_2\]
Alternative hypothesis: Jobs in Boise pay different incomes, on average, from jobs in Los
Angeles
\[H_a: \mu_1 \neq \mu_2\]
Step 3: Determine the characteristics of the comparison distribution
With the paired-samples t test, we have a sample of difference scores. According to the null
hypothesis, there is no difference; that is, the mean difference score is 0. So the mean of the
comparison distribution is always 0, as long as the null hypothesis posits no difference.
Therefore, we calculate the difference between the scores, then we basically consider them
our sample to perform a one sample t-test.
The characteristics of the comparison distribution are what values of mean and standard
deviation we will use in our test statistic.
## Jobs Boise Los Angeles Difference
## 1 Executive chef 53047 62490 9443
## 2 Genetics counselor 49958 58850 8892
## 3 Grants/proposal writer 41974 49445 7471
## 4 Librarian 44366 52263 7897
## 5 Public schoolteacher 40470 47674 7204
## 6 Social worker (with bachelor's degree) 36963 43542 6579
We do not know the standard deviation of the population of difference scores, so we calculate it from the sample, along with the mean of the differences.
The mean is
## [1] 7914
And the standard deviation is:
\[s_\bar{D} = \dfrac{s}{\sqrt{n}}\]
## [1] 438.8
Step 4: Determine the significance level
The exercise already did that for us.
\[\alpha = 0.05\]
Step 5: Calculate Test Statistic
This step is identical to that for the single-sample t test.
\[t = \dfrac{\bar{d} - \mu}{s_{\bar{D}}} = \dfrac{7914.333 - 0}{438.8313}\].
Hence, t statistic is
## [1] 18.04
Step 6.1: Conclude (Statistical way)
Because this is a two-sided test (alternative hypothesis: not equal), p-value is going to be:
\[P(|T| > 18.03502) = 2P(T>18.03502)\].

Moreover, degrees of freedom: N - 1 = 6 - 1 = 5.


Therefore, p-value is:
## [1] 9.628e-06
So, p-value < 0.05. Therefore, we reject the null hypothesis.
Step 6.2: Conclude (English)
There is enough (significant) evidence to conclude that there is a difference between the population mean income of jobs in Los Angeles and that of jobs in Boise. (It appears that jobs in Los Angeles pay more, on average, than do jobs in Boise.)
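With the raw values, R reproduces the whole test in one call:

boise <- c(53047, 49958, 41974, 44366, 40470, 36963)
la    <- c(62490, 58850, 49445, 52263, 47674, 43542)
t.test(la, boise, paired = TRUE)   # t = 18.04, df = 5, p = 9.6e-06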
3. Let's perform a dependent samples t-test: Researchers want to test a new anti-hunger
weight loss pill. They have 10 people rate their hunger both before and after taking the pill.
Does the pill do anything? Use alpha = 0.05
Figure 1.
Steps for Dependent Samples t-Test

1. Define Null and Alternative Hypotheses


2. State Alpha
3. Calculate Degrees of Freedom
4. State Decision Rule
5. Calculate Test Statistic
6. State Results
7. State Conclusion

Let's begin.
1. Define Null and Alternative Hypotheses

Figure 2.
2. State Alpha
Alpha = 0.05
3. Calculate Degrees of Freedom

Figure 3.
4. State Decision Rule
Using an alpha of 0.05 with a two-tailed test with 9 degrees of freedom, we would expect our
distribution to look something like this:

Figure 4.
Use the t-table to look up a two-tailed test with 9 degrees of freedom and an alpha of 0.05.
We find a critical value of 2.2622. Thus, our decision rule for this two-tailed test is:
If t is less than -2.2622, or greater than 2.2622, reject the null hypothesis.
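Rather than a printed t-table, the same cutoff can be pulled from R:

qt(1 - 0.05/2, df = 9)   # 2.262157, the two-tailed critical value used here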
5. Calculate Test Statistic
The first step is for us to calculate the difference score for each pairing:

Figure 5.

Figure 6.
6. State Results
t = 3.61
Result: Reject the null hypothesis.
7. State Conclusion
The anti-hunger weight loss pill significantly affected hunger, t = 3.61, p < 0.05.
4. Calculate a paired t test by hand for the following data:

Step 1: Subtract each Y score from each X score.

Step 2: Add up all of the values from Step 1.


Set this number aside for a moment.

Step 3: Square the differences from Step 1.

Step 4: Add up all of the squared differences from Step 3.


Step 5: Use the following formula to calculate the t-score:

t = (ΣD / n) / √[ (ΣD² − (ΣD)² / n) / ((n − 1) · n) ]

where:
ΣD: Sum of the differences (sum of X − Y from Step 2)
ΣD²: Sum of the squared differences (from Step 4)
(ΣD)²: Sum of the differences (from Step 2), squared

Step 6: Subtract 1 from the sample size to get the degrees of freedom. We have 11 items, so 11 - 1 = 10.
Step 7: Look up the critical t-value in the t-table, using the degrees of freedom from Step 6. If you don't have a specified alpha level, use 0.05 (5%). For this sample problem, with df = 10, the critical t-value is 2.228.
Step 8: Compare your t-table value from Step 7 (2.228) to your calculated t-value (-2.74). The calculated t-value is greater in magnitude than the table value at an alpha level of .05. The p-value is less than the alpha level: p < .05. We can reject the null hypothesis that there is no difference between means.
Note: You can ignore the minus sign when comparing the two t-values, as ± indicates the
direction; the p-value remains the same for both directions.
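A small R helper makes the shortcut formula concrete; the before/after scores here are hypothetical (the example's own data table is not reproduced in these notes), chosen only to exercise the arithmetic:

x <- c(12, 15, 11, 14, 13, 16, 10, 12, 14, 15, 13)  # hypothetical "before" scores, 11 pairs
y <- c(14, 16, 12, 15, 15, 17, 11, 13, 15, 17, 15)  # hypothetical "after" scores
d <- x - y
n <- length(d)
t <- sum(d) / sqrt((n * sum(d^2) - sum(d)^2) / (n - 1))   # shortcut form of the paired t
c(t = t, check = unname(t.test(x, y, paired = TRUE)$statistic))  # both give the same value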
5. Suppose we wish to determine if the cholesterol levels of the men in Dixon and Massey
study changed from 1952 to 1962. We will use the paired t-test.

 H0: The average difference in cholesterol is 0 from 1952 to 1962.
 H1: The average difference in cholesterol is NOT 0 from 1952 to 1962.
 Our significance level is α = 0.05.

For α = 0.05 and 19 df, the critical value of t is 2.093. Since |-6.70| > 2.093, we reject H0 and state that we have significant evidence that the average difference in cholesterol from 1952 to 1962 is NOT 0. Specifically, there was an average decrease of 69.8 from 1952 to 1962.

data pairedtest; set original;
  d = x1 - x2;
run;

proc ttest data=pairedtest h0=0;
  var d;
run;

Hypotheses:
 H0: The average difference in cholesterol is 0 from 1952 to 1962.
 H1: The average difference in cholesterol is NOT 0 from 1952 to 1962.

First, create the difference, dchol.

data dm; set dixonmassey;
  dchol = chol62 - chol52;
run;

proc ttest data=dm;
  title 'Paired t-test with proc ttest, using dchol variable';
  var dchol;
run;

Again, we reject H0 (because p<0.05) and state that we have significant evidence that
cholesterol levels changed from 1952 to 1962, with an average decrease of 69.8 units, with
95% confidence limits of (-91.6, -48.0).

Alternatively, we can (only for a test of H0: μd = 0) use proc means:

proc means data=pairedtest n mean std t prt clm;
  title 'Paired t-test with proc means';
  var d;
run;

Note that the t option produces the t statistic for testing the null hypothesis that the mean of a
variable is equal to zero, and the prt option gives the associated p-value. The clm option
produces a 95% confidence interval for the mean. In this case, where the variable is a
difference, dchol, the null hypothesis is that the mean difference is zero and the 95%
confidence interval is for the mean difference.

proc means data=dm n mean std t prt clm;
  title 'Paired t-test with proc means';
  var dchol;
run;
A third method is to use the original data with the paired statement in proc ttest:

proc ttest data=original;
  title 'Paired t-test with proc ttest, paired statement';
  paired x1*x2;
run;

This produces identical output to the t-test on dchol.
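The same equivalence holds in R; a tiny sketch with made-up stand-ins for the cholesterol values (the Dixon and Massey raw data are not reproduced in these notes):

set.seed(1)
chol52 <- rnorm(20, 310, 40)           # hypothetical 1952 values
chol62 <- chol52 - rnorm(20, 70, 30)   # hypothetical 1962 values
all.equal(t.test(chol62 - chol52, mu = 0)$p.value,
          t.test(chol62, chol52, paired = TRUE)$p.value)   # TRUE: identical tests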

Example:

proc ttest data=work.dm;
  title 'Paired t-test with proc ttest, paired statement';
  paired chol62*chol52;
run;

Reporting results

We conducted a paired t-test to determine whether, on average, there was a change in cholesterol from 1952 to 1962.

 H0: There is no change, on average, in cholesterol from 1952 to 1962 (H0: μd = 0, where d = chol62 – chol52).
 H1: There is an average change in cholesterol from 1952 to 1962, i.e., H1: μd ≠ 0.
 Level of significance: α=0.05
 Cholesterol decreased between 1952 and 1962 by an average of 69.8. The 95%
confidence interval for μd is (-91.6,-48.0).
 The test statistic is t = -6.70, with 19 degrees of freedom, and p < 0.0001. Because the
p-value is less than α=0.05, we reject the null hypothesis and state that there is a
difference, on average, in cholesterol between 1952 and 1962.
 Conclusion: There is significant evidence that cholesterol decreased from 1952 to 1962 (p < 0.0001). On average, the cholesterol levels in 1962 were 69.8 mg/dl (95% CI: 48.0 to 91.6) lower than the cholesterol levels in 1952.
