
Basic Education Department

Holy Cross College of Calinan

Test of Hypothesis

Grade 11: Statistics and Probability


Raffy S. Centeno
Outline
1. Hypothesis Testing
Introduction to Hypothesis Testing
The Rare Event Rule
Null and Alternative Hypotheses
Formulating Null and Alternative Hypotheses (One Sample)
One-tailed and Two-tailed Tests
2. Formulating a Conclusion in Hypothesis Testing
Types of Error in Hypothesis Testing
Interpreting the P-Value
3. Statistical Test for the Assumption of Normality
4. Preliminaries
5. Tests Concerning Means/Medians for One Sample Data
One Sample T-test
One Sample Wilcoxon Test
6. Tests Concerning Means/Medians for Paired Data
Paired T-test
Wilcoxon Signed-Rank Test
7. Statistical Test for the Assumption of Homoscedasticity for Two
Sample Data
8. Tests Concerning Means/Medians for Two Samples
Two Sample T-test
Two Sample T-test with Welch Correction
Mann-Whitney Test
9. Statistical Test for the Assumption of Homoscedasticity for More
than Two Sample Data
10. Tests Concerning Means/Medians for More than Two Samples
The Analysis of Variance
ANOVA with Welch Correction
Kruskal-Wallis Test
11. Test of Relationship


Pearson Coefficient of Correlation
Spearman Rank Coefficient of Correlation
Chapter I
Hypothesis Testing
Introduction to Hypothesis Testing
Hypothesis Testing

Definition
In statistics, a hypothesis is an assumption about certain
characteristics of a population.
Hypothesis Testing

Definition
A hypothesis test is a statistical test that is used to determine
whether there is enough evidence in a sample of data to infer that a
certain condition is true for the entire population.

A hypothesis test examines two opposing hypotheses about a
population: the null hypothesis and the alternative hypothesis.
Hypothesis Testing

The Rare Event Rule


If, under a given assumption, the probability of a particular
observed event is exceptionally small, we conclude that the
assumption is probably not correct.
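
To see the rule at work, here is a minimal R sketch. The coin-flip scenario is a made-up illustration and not part of the lesson data: we assume a coin is fair and ask how likely it would be to see at least 9 heads in 10 flips.

```r
# Assumption being examined: the coin is fair (probability of heads = 0.5).
# Hypothetical observed event: at least 9 heads in 10 flips.
p_rare <- sum(dbinom(9:10, size = 10, prob = 0.5))
p_rare  # 11/1024, about 0.0107
```

Because this probability is only about 1%, observing such a result would lead us to doubt the assumption that the coin is fair.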
Null and Alternative Hypotheses
Definition
A null hypothesis, denoted by H0, is a statement that there is no
effect or no difference in the population, so that any pattern seen
in the observations is due to chance.

It is the statement being tested, and the one the researcher is
trying to disprove.

Remarks
The null hypothesis always includes the equal sign. Usually, it is a
statement of no effect or no difference.
Null and Alternative Hypothesis

Definition
An alternative hypothesis, denoted by Ha or H1 , is a hypothesis
used in hypothesis testing that is contrary to the null hypothesis.
Therefore, an alternative hypothesis is the negation of your null
hypothesis.

Remarks
The alternative hypothesis does not include the equal sign and it is
the statement you want to be able to conclude is true.
Null and Alternative Hypothesis

Note
If we reject the null hypothesis, we accept the alternative
hypothesis because we found enough evidence against the null
hypothesis. However, we never accept the null hypothesis. We
only fail to reject it.

Remember: Absence of evidence is not evidence of absence.


Formulating Null and Alternative
Hypothesis for One Sample Data

Example 1
Claim: According to Milo's advertisement, 4 out of 5 kids have an
energy gap.

Answer
H0 : p = 0.80
Ha : p ≠ 0.80
Formulating Null and Alternative
Hypothesis for One Sample Data

Example 2
Claim: According to research, the average life expectancy of
Filipinos is 68 years old.

Answer
H0 : μ = 68 years
Ha : μ ≠ 68 years
Formulating Null and Alternative
Hypothesis for One Sample Data

Example 3
Claim: Your friend claims that a Coke Litro contains less than 1 liter.

Answer
H0 : μ = 1 liter
Ha : μ < 1 liter
Formulating Null and Alternative
Hypothesis for One Sample Data

Example 4
Claim: A teacher is claiming that the average IQ of his students is
greater than 100.

Answer
H0 : μ = 100
Ha : μ > 100
Formulating Null and Alternative
Hypothesis for One Sample Data

Example 5
Claim: The average weekly earnings for men are higher than $670, the
women's average.

Answer
H0 : μ = $670
Ha : μ > $670
Formulating Null and Alternative
Hypothesis for One Sample Data

Example 6
Claim: The daily yield for a local chemical plant has averaged 880
tons for the last several years. The quality control manager would
like to know whether this average has changed in recent months.

Answer
H0 : μ = 880 tons
Ha : μ ≠ 880 tons
One-tailed and Two-tailed Tests
Two-tailed Tests

Definition
In a two-tailed test, the alternative hypothesis states that the
parameter of interest differs from the reference value specified in
the null hypothesis, without predicting whether it is larger or
smaller.

The alternative hypothesis uses the not-equal sign (≠).


Two-tailed Tests

Example of a Two-tailed Test


You want to test if the average life expectancy of Filipinos is not
equal to 68 years. Your hypotheses will be the following:

H0 : μ = 68 years
Ha : μ ≠ 68 years
Two-tailed Tests

Example of a Two-tailed Test


You want to check if the average height of male Filipinos is 162
centimeters.

H0 : μ = 162 cm
Ha : μ ≠ 162 cm
Two-tailed Tests

Example of a Two-tailed Test


A teacher wants to check if the average IQ of his students is equal to 100.

H0 : μ = 100
Ha : μ ≠ 100
One-tailed Tests
Definition
In a one-tailed test, the alternative hypothesis specifies whether
the true value of the parameter is greater than or less than the
reference value specified in the null hypothesis.
The alternative hypothesis uses either the greater-than sign (>) or
the less-than sign (<).

Remarks
The advantage of using a one-tailed test is increased power to
detect the specific effect you are interested in. The disadvantage is
that there is no power to detect an effect in the opposite direction.
One-tailed Tests

Example of a One-tailed Test


You want to test if the average life expectancy of Filipinos is
greater than 68 years old. Your hypotheses will be the following:

H0 : μ = 68 years
Ha : μ > 68 years
One-tailed Tests

Example of a One-tailed Test


You want to check if the average height of male Filipinos is less
than 162 centimeters

H0 : μ = 162 cm
Ha : μ < 162 cm
One-tailed Tests

Example of a One-tailed Test


A teacher wants to check if the average IQ of his students is greater
than 100.

H0 : μ = 100
Ha : μ > 100
Chapter II
Formulating a Conclusion in
Hypothesis Testing
Types of Error in Hypothesis Testing

Type I Error
When the null hypothesis is true and you reject it, you make
a type I error. The probability of making a type I error is α, which
is the level of significance you set for your hypothesis test.

To lower this risk, you must use a lower value for α. However, using
a lower value for α means that you will be less likely to detect a
true difference if one really exists.
Types of Error in Hypothesis Testing

Remarks
You must choose your significance level before you begin your
study. This protects you from picking a significance level just
because it conveniently gives you significant results!

The common alpha values of 0.05 and 0.01 are simply based on
tradition. For a significance level of 0.05, expect to obtain sample
means in the critical region 5% of the time when the null
hypothesis is true.
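
The 5% figure can be checked by simulation. The R sketch below is illustrative only: the sample size of 30, the 2000 repetitions, and the seed are arbitrary choices, and the data are simulated from a population in which the null hypothesis is true.

```r
set.seed(1)  # arbitrary seed so the run is repeatable
alpha <- 0.05
n_sims <- 2000

# Each simulated study draws 30 values from a population whose true mean is 0,
# so H0: mu = 0 is true in every study.
p_values <- replicate(n_sims, t.test(rnorm(30, mean = 0, sd = 1), mu = 0)$p.value)

# Share of studies that wrongly reject H0; it should land close to alpha.
type1_rate <- mean(p_values <= alpha)
type1_rate
```

Every rejection counted here is a type I error, since the null hypothesis was true by construction.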
Types of Error in Hypothesis Testing

Type II Error
When the null hypothesis is false and you fail to reject it,
you make a type II error. The probability of making a type II error
is β, and the power of the test is 1 − β.

You can decrease your risk of committing a type II error by ensuring
your sample size is large enough to detect a practical difference
when one truly exists.
Interpreting the P-Value
The P-Value

Definition
Statistically speaking, the p-value is the probability of obtaining a
result at least as extreme as the result actually obtained, assuming
the null hypothesis is true.

Put simply, the p-value tells you how likely sample data like yours
would be if the null hypothesis were true.
The P-value
Interpreting P-value
- If p-value ≤ α, then it indicates strong evidence against the
null hypothesis, so you reject the null hypothesis.
- If p-value > α, then it indicates weak evidence against the
null hypothesis, so you fail to reject the null hypothesis.

Remarks
Always report the p-value so your readers can draw their own
conclusions.
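
The decision rule above can be written as a small R helper. The function name decide is hypothetical and exists only for this illustration; it treats a p-value exactly equal to α as a rejection.

```r
# Hypothetical helper: turn a p-value and a significance level into a decision.
decide <- function(p_value, alpha = 0.05) {
  if (p_value <= alpha) "Reject H0" else "Fail to reject H0"
}

decide(0.0393, alpha = 0.05)  # p-value is below alpha
decide(0.0393, alpha = 0.01)  # same p-value, stricter alpha
```

The same p-value can lead to different decisions depending on the significance level, which is why α must be fixed before looking at the data.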
Interpreting P-value
Example Scenario 1
You want to check if the average height of male Filipinos is 162 cm.

H0 : μ = 162 cm
Ha : μ ≠ 162 cm
Then, you conducted a survey with correct sampling procedures and
research design. You found out that the average height of your
respondents is 163.5 cm with sample standard deviation of 4 cm.
After running the analysis, you obtained a p-value of 0.3547. What
will be your conclusion if α = 0.05?
Interpreting P-value
Example Scenario 2
Your friend claims that a Coke Litro contains less than 1 liter.

H0 : μ = 1 liter
Ha : μ < 1 liter
Then, you measure the volume of the Coke Litro every time you buy
one. After a couple of days, you have measured 50 Coke
Litros and found that the average volume of your sample is 0.99
liter with a sample standard deviation of 0.09 liter. Suppose you
measured all your samples accurately. Will you support your
friend's claim if your p-value is 0.6742 at α = 0.05?
Interpreting P-value

Example 3
At α = 0.05, what is your conclusion about the given result:

p-value = 0.9874

Conclusion
We failed to reject our null hypothesis.
Interpreting P-value

Example 4
At α = 0.05, what is your conclusion about the given result:

p-value = 0.3746

Conclusion
We failed to reject our null hypothesis.
Interpreting P-value

Example 5
At α = 0.05, what is your conclusion about the given result:

p-value = 0.0393

Conclusion
We reject our null hypothesis and accept our alternative hypothesis.
Interpreting P-value

Example 6
At α = 0.01, what is your conclusion about the given result:

p-value = 0.0393

Conclusion
We failed to reject our null hypothesis.
Interpreting P-value

Example 7
At α = 0.01, what is your conclusion about the given result:

p-value = 0.0523

Conclusion
We failed to reject our null hypothesis.
Interpreting P-value

Example 8
At α = 0.01, what is your conclusion about the given result:

p-value = 0.0523

Conclusion
We failed to reject our null hypothesis.
Chapter III
Statistical Test for the Assumption of
Normality
The Shapiro-Wilk Test

Definition
The Shapiro-Wilk test for normality is one of three general
normality tests designed to detect all departures from normality. Its
hypotheses are as follows:

H0 : The sample data is normally distributed.
Ha : The sample data is not normally distributed.

Normally, the test sets its significance level at α = 0.05.


The Shapiro-Wilk Test

Some Alternative Tests


- Kolmogorov-Smirnov Test
- Anderson-Darling Test
- QQ Plot
- Histogram
The Shapiro-Wilk Test

Consider the wages, in thousands, of the teachers at Holy Cross
College of Marilog below:

Teacher 1 2 3 4 5 6 7 8 9 10
Salary 15 18 16 14 15 15 12 17 30 35

Test, at α = 0.05, if the data violates the assumption of normality
using the Shapiro-Wilk Test.
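
One way to run this exercise in R is sketched below. It uses shapiro.test from base R, and the variable names are chosen only for this illustration.

```r
# Teacher salaries (in thousands) from the exercise above.
salary <- c(15, 18, 16, 14, 15, 15, 12, 17, 30, 35)

result <- shapiro.test(salary)
result$p.value

# H0: the data are normally distributed. Reject H0 when p-value <= alpha.
alpha <- 0.05
if (result$p.value <= alpha) {
  decision <- "Reject H0: the data violate the assumption of normality"
} else {
  decision <- "Fail to reject H0: no evidence against normality"
}
decision
```

The same call can be reused for the other data sets in this chapter by replacing the salary vector.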
The Shapiro-Wilk Test

The following data are the weights (in pounds) of 27 packages of
ground beef in a supermarket meat display:

1.08 0.99 0.97 1.18 1.41 1.28 0.83
1.06 1.14 1.38 0.75 0.96 1.08 0.87
0.89 0.89 0.96 1.12 1.12 0.93 1.24
0.89 0.98 1.14 0.92 1.18 1.17

Test, at α = 0.05, if the data violates the assumption of normality
using the Shapiro-Wilk Test.
The Shapiro-Wilk Test
A geologist collected 20 different ore samples, all of the same
weight, and randomly divided them into two groups. The titanium
contents of the samples, found using two different methods, are
listed below.
Method 1 0.011 0.013 0.013 0.015 0.014 0.013 0.010 0.013 0.011 0.012
Method 2 0.011 0.016 0.013 0.012 0.015 0.012 0.017 0.013 0.014 0.015

Test, at α = 0.05, if the data violates the assumption of normality
using the Shapiro-Wilk Test.
The Shapiro-Wilk Test
The table below shows the academic ability (X) and the arithmetic
achievement test scores (Y) of 10 students.

Academic Ability 130 125 115 108 104 98 92 88 84 80
Arithmetic Achievement 18 22 16 17 14 12 8 10 8 4

Test, at α = 0.05, if the data violates the assumption of normality
using the Shapiro-Wilk Test.
Chapter IV
Preliminaries
Conducting a Statistical Analysis

Steps in Conducting an Analysis


1. Formulate your hypotheses.
2. Select a tentative statistical tool.
3. Check if your data passes the assumptions.
4. Select your final statistical tool.
5. Analyze your data.
6. Make your conclusions.
Chapter V
Tests Concerning Means/Medians
for One Sample Data

Use
The One Sample Test determines whether the sample
mean/median is statistically different from a known or hypothesized
population mean/median.
Hypotheses

Null Hypothesis
H0 : There is no significant difference between the true mean and
the comparison value m0 .

H0 : μ = m0
Hypotheses

Alternative Hypothesis - Two Tailed


Ha : There is a significant difference between the true mean and
the comparison value m0

Ha : μ ≠ m0
Hypotheses

Alternative Hypothesis - One Tailed


Ha : The true mean is greater than the comparison value m0

Ha : μ > m0
Hypotheses

Alternative Hypothesis - One Tailed


Ha : The true mean is less than the comparison value m0

Ha : μ < m0
Example Scenarios

Scenario 1
Suppose you are interested in determining whether an assembly line
produces laptop computers that weigh five pounds.

Scenario 2
You want to check if the average height of male Filipinos is 162 cm.
One Sample T-test

Use
It is a parametric statistical procedure used to examine the mean
difference between the sample and the known value of the
population mean.

R Function
t.test(object1, mu = m0, alternative = "two.sided")  # alternative can also be "less" or "greater"
One Sample T-test

Assumptions
- Your variable should be measured at the interval or ratio level.
- The data are independent, which means that there is no
relationship between the observations.
- There should be no significant outliers.
- Your dependent variable should be approximately normally
distributed.
One Sample Wilcoxon Test

Use
It is a non-parametric alternative to a one-sample t-test. The test
determines whether the median of the sample is equal to some
specified value. Use this test if your data violates at least one
assumption in your parametric test.

R Function
wilcox.test(object1, mu = m0, alternative = "two.sided")  # alternative can also be "less" or "greater"
One Sample Wilcoxon Test

Assumptions
- Your variable should be measured at the ordinal level or higher.
- The data are independent, which means that there is no
relationship between the observations.
START: Is your data normally distributed?
    Yes: One Sample T-test
    No: One Sample Wilcoxon Test
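
The flowchart can be sketched as an R function. The helper name one_sample_test is hypothetical, and using the Shapiro-Wilk test at the same α as the normality check is an assumption of this sketch. The example data are the ground beef package weights used in this lesson's exercises.

```r
# Hypothetical helper (not from the lesson): pick the one-sample test
# by following the flowchart.
one_sample_test <- function(x, m0, alternative = "two.sided", alpha = 0.05) {
  # Shapiro-Wilk: a p-value above alpha means normality is not rejected.
  is_normal <- shapiro.test(x)$p.value > alpha
  if (is_normal) {
    t.test(x, mu = m0, alternative = alternative)
  } else {
    # exact = FALSE uses the normal approximation, which copes with tied values.
    wilcox.test(x, mu = m0, alternative = alternative, exact = FALSE)
  }
}

# Ground beef package weights (in pounds).
weights <- c(1.08, 0.99, 0.97, 1.18, 1.41, 1.28, 0.83,
             1.06, 1.14, 1.38, 0.75, 0.96, 1.08, 0.87,
             0.89, 0.89, 0.96, 1.12, 1.12, 0.93, 1.24,
             0.89, 0.98, 1.14, 0.92, 1.18, 1.17)

result <- one_sample_test(weights, m0 = 1.00)
result$method   # which test the flowchart selected
result$p.value  # compare with alpha to reach a conclusion
```

For a one-tailed question, pass alternative = "less" or alternative = "greater" instead of the default.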
One Sample Test

Consider the wages, in thousands, of the teachers at Holy Cross
College of Marilog below:

Teacher 1 2 3 4 5 6 7 8 9 10
Salary 15 18 16 14 15 15 12 17 30 35

Test, at α = 0.05, if your monthly salary of 14.5k is less than their
salary.
One Sample Test

The following data are the weights (in pounds) of 27 packages of
ground beef in a supermarket meat display:

1.08 0.99 0.97 1.18 1.41 1.28 0.83
1.06 1.14 1.38 0.75 0.96 1.08 0.87
0.89 0.89 0.96 1.12 1.12 0.93 1.24
0.89 0.98 1.14 0.92 1.18 1.17

Test, at α = 0.05, if the weight of these packages is significantly
different from 1.00 pound.
Chapter VI
Tests Concerning Means/Medians
for Paired Data

Use
The Paired test is a statistical hypothesis test used when
comparing two related samples, matched samples, or repeated
measurements on a single sample to assess whether their
population mean/mean ranks differ.
Hypotheses

Null Hypothesis
H0 : The true mean difference (μd) is equal to zero.

H0 : μd = 0
Hypotheses

Alternative Hypothesis - Two Tailed


Ha : The true mean difference (μd) is not equal to zero.

Ha : μd ≠ 0
Hypotheses

Alternative Hypothesis - One Tailed


Ha : The true mean difference (μd) is greater than zero.

Ha : μd > 0
Hypotheses

Alternative Hypothesis - One Tailed


Ha : The true mean difference (μd) is less than zero.

Ha : μd < 0
Example Scenarios

Scenario 1
You are a teacher and you want to check if your students learned
something in your discussion. You conducted a Pre-test and
Post-test to measure their learnings before and after the discussion.

Scenario 2
You are a coach and you want to check if your players improved
after taking some enhancement drugs.
Paired T-test

Use
The paired sample t-test, sometimes called the dependent
sample t-test, is a parametric statistical procedure used to
determine whether the mean difference between two sets of
observations is zero. In a paired sample t-test, each subject or
entity is measured twice, resulting in pairs of observations.

R Function
t.test(object1, object2, paired = TRUE, alternative = "two.sided")  # alternative can also be "less" or "greater"
Paired T-test

Assumptions
- Your variable should be measured at the interval or ratio level.
- Your independent variable should consist of two categorical,
related groups or matched pairs.
- There should be no significant outliers.
- The differences between the two sets of values should be
approximately normally distributed.
Wilcoxon Signed-Rank Test

Use
The Wilcoxon Signed-Rank test is a non-parametric statistical
hypothesis test used when comparing two related samples, matched
samples, or repeated measurements on a single sample to assess
whether their population mean ranks differ.

R Function
wilcox.test(object1, object2, paired = TRUE, alternative = "two.sided")  # alternative can also be "less" or "greater"
Wilcoxon Signed-Rank Test

Assumptions
- Your variable should be measured at the ordinal level or higher.
- The pairs of observations are independent of one another, which
means that there is no relationship between one pair and another.
START: Is the difference of your data normally distributed?
    Yes: Paired T-test
    No: Wilcoxon Signed-Rank Test
Paired T-test

A psychology class performed an experiment to compare whether a
recall score in which instructions to form images of 25 words were
given is better than an initial recall score for which no imagery
instructions were given. Twenty pupils participated in the
experiment, with the results shown in the table below.

Does it appear that the average recall score is higher when imagery
is used?
Pupil  With Imagery  Without Imagery  Pupil  With Imagery  Without Imagery
1 20 5 11 17 8
2 24 9 12 20 16
3 20 5 13 20 10
4 18 9 14 16 12
5 22 6 15 24 7
6 19 11 16 22 9
7 20 8 17 25 21
8 19 11 18 21 14
9 17 7 19 19 12
10 21 9 20 23 13
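
A possible way to analyze the table above in R is sketched below. The variable names are chosen for this illustration, and checking the normality of the differences with the Shapiro-Wilk test at α = 0.05 is an assumption that follows the paired-data flowchart.

```r
# Recall scores for pupils 1 to 20, read from the table above.
with_imagery    <- c(20, 24, 20, 18, 22, 19, 20, 19, 17, 21,
                     17, 20, 20, 16, 24, 22, 25, 21, 19, 23)
without_imagery <- c( 5,  9,  5,  9,  6, 11,  8, 11,  7,  9,
                      8, 16, 10, 12,  7,  9, 21, 14, 12, 13)

# The normality assumption applies to the paired differences.
differences <- with_imagery - without_imagery
is_normal <- shapiro.test(differences)$p.value > 0.05

# Ha: the mean difference (with minus without) is greater than zero.
result <- if (is_normal) {
  t.test(with_imagery, without_imagery, paired = TRUE, alternative = "greater")
} else {
  # exact = FALSE uses the normal approximation, which copes with tied values.
  wilcox.test(with_imagery, without_imagery, paired = TRUE,
              alternative = "greater", exact = FALSE)
}
result$method
result$p.value
```

The order of the two vectors matters for a one-tailed test: alternative = "greater" asks whether the first vector tends to be larger than the second.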
Paired T-test
A taxi company manager is trying to decide whether the use of
radial tires instead of regular belted tires improves fuel economy.
Twelve cars were equipped with radial tires and driven over a
prescribed test course. Without changing drivers, the same cars
were then equipped with regular belted tires and driven once again
over the test course. The gasoline consumption, in kilometers per
liter, was recorded.

Can we conclude that cars equipped with radial tires give better
fuel economy than those equipped with belted tires at α = 0.05?
Car Radial Tires Belted Tires Car Radial Tires Belted Tires
1 4.2 4.1 7 5.7 5.7
2 4.7 4.9 8 6.0 5.8
3 6.6 6.2 9 7.4 6.9
4 7.0 6.9 10 4.9 4.7
5 6.7 6.8 11 6.1 6.0
6 4.5 4.4 12 5.2 4.9
Chapter VII
Statistical Test for the
Assumption of Homoscedasticity
for Two Sample Data
Statistical Test for the Assumption of
Homoscedasticity
Definition
An F-test is used to test if the variances of two populations are
equal. Its hypotheses are as follows:

H0 : The true ratio of variances is equal to 1.


Ha : The true ratio of variances is not equal to 1.

R Function
var.test(object1, object2)
F-test

Some Alternative Tests


- Bartlett's Test
- Levene's Test
- Chi-Square Test
- Residual Plot
Chapter VIII
Tests Concerning Means/Medians
for Two Samples

Use
The Independent Samples Test compares the means/medians of
two independent groups in order to determine whether there is
statistical evidence that the associated population means/medians are
significantly different.
Hypotheses

Null Hypothesis
H0 : The population means/medians from the two unrelated groups
are equal.

H0 : μ1 = μ2
Hypotheses

Alternative Hypothesis - Two Tailed


Ha : The population means/medians from the two unrelated groups
are not equal.

Ha : μ1 ≠ μ2
Hypotheses

Alternative Hypothesis - One Tailed


Ha : The population mean/median from group one is greater than
population mean/median from group two.

Ha : μ1 > μ2
Hypotheses

Alternative Hypothesis - One Tailed


Ha : The population mean/median from group one is less than
population mean/median from group two.

Ha : μ1 < μ2
Example Scenarios

Scenario 1
You want to check whether there is a significant (or only random)
difference in the average cycle time to deliver a pizza from Pizza
Company A vs. Pizza Company B.

Scenario 2
Do two types of music, type-I and type-II, have different effects upon
the ability of college students to perform a series of mental tasks
requiring concentration?
Two Sample T-test

Use
The two-sample t-test is a hypothesis test for answering questions
about the mean where the data are collected from two random
samples of independent observations.

R Function
t.test(object1, object2, var.equal = TRUE, alternative = "two.sided")  # alternative can also be "less" or "greater"
Two Sample T-test

Assumptions
- Your variable should be measured at the interval or ratio level.
- The data are independent, which means that there is no
relationship between the observations.
- There should be no significant outliers.
- Your dependent variable should be approximately normally
distributed.
- The variance of each group should be equal.
Two Sample T-test with Welch
Correction

Use
Welch's t-test is an adaptation of Student's t-test that is more
reliable when the two samples have unequal variances.

R Function
t.test(object1, object2, var.equal = FALSE, alternative = "two.sided")  # alternative can also be "less" or "greater"
Two Sample T-test with Welch
Correction
Assumptions
- Your variable should be measured at the interval or ratio level.
- The data are independent, which means that there is no
relationship between the observations.
- There should be no significant outliers.
- Your dependent variable should be approximately normally
distributed.
Mann-Whitney Test

Use
The Mann-Whitney U test is used to compare differences between
two independent groups when the dependent variable is either ordinal
or continuous, but not normally distributed.

R Function
wilcox.test(object1, object2, alternative = "two.sided")  # alternative can also be "less" or "greater"
Mann-Whitney Test

Assumptions
- Your variable should be measured at the ordinal level or higher.
- The data are independent, which means that there is no
relationship between the observations.
Tests Concerning Means/Medians for
Two Samples
START: Is your data normally distributed?
    No: Mann-Whitney Test
    Yes: Are your variances equal?
        Yes: Two Sample T-test
        No: Two Sample T-test with Welch Correction
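
The flowchart can be sketched as an R function. The helper name two_sample_test is hypothetical. Checking normality with the Shapiro-Wilk test on each group separately, and equal variances with the F-test (var.test), both at the same α, are assumptions of this sketch. The example data are the rat weight gains from this chapter's exercises.

```r
# Hypothetical helper (not from the lesson): pick the two-sample test
# by following the flowchart.
two_sample_test <- function(x, y, alternative = "two.sided", alpha = 0.05) {
  # Normality is checked in each group separately.
  is_normal <- shapiro.test(x)$p.value > alpha && shapiro.test(y)$p.value > alpha
  if (!is_normal) {
    # exact = FALSE uses the normal approximation, which copes with tied values.
    return(wilcox.test(x, y, alternative = alternative, exact = FALSE))
  }
  # F-test: a p-value above alpha means equal variances are not rejected.
  equal_var <- var.test(x, y)$p.value > alpha
  # var.equal = FALSE gives the t-test with Welch correction.
  t.test(x, y, var.equal = equal_var, alternative = alternative)
}

# Weight gains of rats on the high and low protein diets.
high_protein <- c(134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123)
low_protein  <- c(70, 118, 101, 85, 107, 132, 94)

result <- two_sample_test(high_protein, low_protein, alternative = "greater")
result$method   # which test the flowchart selected
result$p.value  # compare with alpha to reach a conclusion
```

The first argument plays the role of group one, so alternative = "greater" tests whether the high protein group tends to gain more weight.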
We want to compare the heights in inches of two groups of
individuals. The data are given below.

Group A 175 168 168 190 156 181 182 175 174 179
Group B 120 180 125 188 130 190 110 185 112 188

Test at α = 0.05 if there is a significant difference between the two
groups.
We want to check if the average women's weight differs from the
average men's weight. The data are given below:

Woman 38.9 71.2 73.3 21.8 63.4 64.6 38.4 28.8 28.5
Man 67.8 60.0 63.4 76.0 89.4 73.3 67.3 61.3 62.4

Test at α = 0.05 if the claim is true.


Consider the gain in weight of 19 female rats between 28 and 84
days after birth. 12 were fed on a high protein diet and 7 on a low
protein diet.

High Protein 134 146 104 119 124 161 107 83 113 129 97 123
Low Protein 70 118 101 85 107 132 94

Test at α = 0.05 if the average weight gain of the high protein diet
rats is significantly greater than that of the other group.
Chapter IX
Statistical Test for the
Assumption of Homoscedasticity
for More than Two Sample Data
Definition
Bartlett's test is used to test if k samples are from populations
with equal variances. Equal variances across populations is called
homoscedasticity or homogeneity of variances.

R Function
bartlett.test(values ~ group, data = object)
Statistical Test for the
Assumption of Homoscedasticity
for More than Two Sample Data

Some Alternative Tests


- Levene's Test
- Standard Deviation Plot
- Boxplots
Chapter X
Tests Concerning Means/Medians
for More than Two Samples

Use
Tests concerning means/medians for more than two samples
compare the means/medians of three or more independent groups in
order to determine whether there is statistical evidence that the
associated population means/medians are significantly different.
Hypotheses

Null Hypothesis
H0 : The population mean/median is equal for all groups.

H0 : μ1 = μ2 = ... = μn
Hypotheses

Alternative Hypothesis - Two Tailed


Ha : At least one population mean/median is significantly different
from the others.
Example Scenarios
Scenario 1
Suppose we want to test the effect of five different exercises. For
this, we recruit 20 men and assign one type of exercise to 4 men (5
groups). Their weights are recorded after a few weeks.

Scenario 2
You want to study the effect of fertilizers on yield of wheat. We
apply five fertilizers, each of different quality, on five plots of land
each of wheat. The yield from each plot of land is recorded and the
difference in yield among the plots is observed.
The Analysis of Variance

Use
The one-way analysis of variance (ANOVA) is used to determine
whether there are any statistically significant differences between the
means of three or more independent (unrelated) groups.

R Function
oneway.test(values ~ group, data = object1, var.equal = TRUE)
The Analysis of Variance

Assumptions
- Your variable should be measured at the interval or ratio level.
- The data are independent, which means that there is no
relationship between the observations.
- There should be no significant outliers.
- Your dependent variable should be approximately normally
distributed.
- The variance of each group should be equal.
ANOVA with Welch Correction

Use
Welch's ANOVA compares three or more means to see if they are
equal. It is an alternative to the classic ANOVA and can be used even
if your data violates the assumption of homogeneity of variances.

R Function
oneway.test(values ~ group, data = object1, var.equal = FALSE)
ANOVA with Welch Correction

Assumptions
- Your variable should be measured at the interval or ratio level.
- The data are independent, which means that there is no
relationship between the observations.
- There should be no significant outliers.
- Your dependent variable should be approximately normally
distributed.
Kruskal-Wallis Test

Use
The Kruskal-Wallis H test (sometimes also called the one-way
ANOVA on ranks) is a rank-based nonparametric test that can be
used to determine if there are statistically significant differences
between two or more groups of an independent variable on a continuous
or ordinal dependent variable.

R Function
kruskal.test(values ~ group, data = object1)
Kruskal-Wallis Test

Assumptions
- Your variable should be measured at the ordinal level or higher.
- The data are independent, which means that there is no
relationship between the observations.
START: Is your data normally distributed?
    No: Kruskal-Wallis Test
    Yes: Are your variances equal?
        Yes: ANOVA
        No: ANOVA with Welch Correction
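
The flowchart can be sketched as an R function. The helper name k_sample_test is hypothetical. Checking normality with the Shapiro-Wilk test on each group separately, and equal variances with Bartlett's test, both at the same α, are assumptions of this sketch. The example data are the test scores from the background sound exercise in this chapter.

```r
# Hypothetical helper (not from the lesson): pick the test for more than
# two samples by following the flowchart.
k_sample_test <- function(values, group, alpha = 0.05) {
  group <- factor(group)
  # Shapiro-Wilk p-value for each group, computed separately.
  normality_p <- tapply(values, group, function(v) shapiro.test(v)$p.value)
  if (!all(normality_p > alpha)) {
    return(kruskal.test(values ~ group))
  }
  # Bartlett's test: a p-value above alpha means equal variances are not rejected.
  equal_var <- bartlett.test(values ~ group)$p.value > alpha
  # var.equal = FALSE gives the ANOVA with Welch correction.
  oneway.test(values ~ group, var.equal = equal_var)
}

# Test scores of the three groups of eight students.
score <- c(7, 4, 6, 8, 6, 6, 2, 9,   # Constant
           5, 5, 3, 4, 4, 7, 2, 2,   # Random
           2, 4, 7, 1, 2, 1, 5, 5)   # No Sound
sound <- rep(c("Constant", "Random", "No Sound"), each = 8)

result <- k_sample_test(score, sound)
result$method   # which test the flowchart selected
result$p.value  # compare with alpha to reach a conclusion
```

A significant result only says that at least one group differs; it does not say which one.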
Susan Sound predicts that students will learn most effectively with a
constant background sound, as opposed to an unpredictable sound or
no sound at all. She randomly divides twenty-four students into three
groups of eight. All students study a passage of text for 30 minutes.
Those in group 1 study with background sound at a constant volume
in the background. Those in group 2 study with noise that changes
volume periodically. Those in group 3 study with no sound at all.
After studying, all students take a 10 point multiple choice test over
the material. Their scores follow:
Constant 7 4 6 8 6 6 2 9
Random 5 5 3 4 4 7 2 2
No Sound 2 4 7 1 2 1 5 5
A farmer wants to check if there is a significant difference among the
insect sprays he uses on his farm. The counts of insects in agricultural
experimental units treated with different insecticides are listed below.

Spray A 10 7 20 14 14 12 10 23 17 20 14 13
Spray B 11 17 21 11 16 14 17 17 19 21 7 13
Spray C 3 5 3 5 3 6 1 1 3 2 6 4
Spray D 11 9 15 22 15 16 13 10 26 26 24 13

Test, at α = 0.05, if his claim is true.


Does physical exercise alleviate depression? We find some depressed
people and check that they are all equivalently depressed to begin
with. Then we allocate each person randomly to one of three groups:
no exercise; 20 minutes of jogging per day; or 60 minutes of jogging
per day. At the end of a month, we ask each participant to rate how
depressed they now feel, on a Likert scale that runs from 1 (totally
miserable) through to 100 (ecstatically happy).

No Exercise   23 26 51 49 58 37 29 44
20 minutes    22 27 39 29 46 48 49 65
60 minutes    59 66 38 49 56 60 56 62

Test, at α = 0.05, whether there is a significant difference among
the groups.
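One possible way to follow the decision flowchart (normality first, then equal variances) on this data set is sketched below. The choice of shapiro.test() for normality and bartlett.test() for equal variances is an assumption, since this section does not name the checks it relies on. The variable names are illustrative.

```r
# Ratings from the example, with a grouping factor
rating <- c(23, 26, 51, 49, 58, 37, 29, 44,   # no exercise
            22, 27, 39, 29, 46, 48, 49, 65,   # 20 minutes of jogging
            59, 66, 38, 49, 56, 60, 56, 62)   # 60 minutes of jogging
group <- factor(rep(c("none", "jog20", "jog60"), each = 8),
                levels = c("none", "jog20", "jog60"))

alpha <- 0.05

# Step 1 (assumed check): Shapiro-Wilk normality test within each group
normal_p <- tapply(rating, group, function(x) shapiro.test(x)$p.value)
is_normal <- all(normal_p > alpha)

# Step 2 (assumed check): Bartlett test for equal variances
equal_var <- bartlett.test(rating ~ group)$p.value > alpha

# Step 3: pick the test the flowchart points to
if (!is_normal) {
  result <- kruskal.test(rating ~ group)                   # Kruskal-Wallis
} else if (equal_var) {
  result <- oneway.test(rating ~ group, var.equal = TRUE)  # ANOVA
} else {
  result <- oneway.test(rating ~ group, var.equal = FALSE) # Welch ANOVA
}
print(result)
```

Whichever branch runs, result$p.value can then be compared with alpha to decide whether to reject the null hypothesis.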


Chapter XI
Test of Relationship
Test of Correlation

Use
A test of correlation is a statistical method that enables the
researcher to find out whether two variables are related and to what
extent. Correlation can be thought of as the sympathetic movement of
two or more variables: when a change in one variable is accompanied
by changes in the others, in either the same or the opposite
direction, the variables are said to be correlated.
Hypotheses

Null Hypothesis
H0 : There is no significant relationship between two variables.
H0 : ρ = 0

Alternative Hypothesis - Two Tailed
Ha : There is a significant relationship between two variables.
Ha : ρ ≠ 0

Alternative Hypothesis - One Tailed
Ha : There is a significant positive relationship between two
variables.
Ha : ρ > 0

Alternative Hypothesis - One Tailed
Ha : There is a significant negative relationship between two
variables.
Ha : ρ < 0
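In R, the three alternative hypotheses correspond to the alternative argument of cor.test(). The sketch below uses small hypothetical vectors x and y, invented only to show the argument, and they do not come from the lesson.

```r
# Hypothetical paired measurements, used only to illustrate the argument
x <- c(2, 4, 5, 7, 8, 10, 11, 13)
y <- c(1, 3, 6, 6, 9, 9, 12, 14)

# Two-tailed test, Ha: rho != 0
two_sided <- cor.test(x, y, method = "pearson", alternative = "two.sided")

# One-tailed test, Ha: rho > 0 (positive relationship)
positive <- cor.test(x, y, method = "pearson", alternative = "greater")

# One-tailed test, Ha: rho < 0 (negative relationship)
negative <- cor.test(x, y, method = "pearson", alternative = "less")

c(two_sided = two_sided$p.value,
  greater   = positive$p.value,
  less      = negative$p.value)
```

Because the sample correlation here is positive, the "greater" p-value is half the two-tailed p-value, while the "less" p-value is close to 1.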
Interpretation of Correlation Coefficient

Correlation Coefficient     Degree of Correlation

ρ = 1.00                    Perfect Positive Correlation
0.80 ≤ ρ < 1.00             Very Strong Positive Correlation
0.60 ≤ ρ < 0.80             Strong Positive Correlation
0.40 ≤ ρ < 0.60             Moderate Positive Correlation
0.20 ≤ ρ < 0.40             Weak Positive Correlation
0 < ρ < 0.20                Very Weak Positive Correlation
ρ = 0                       No Correlation

ρ = -1.00                   Perfect Negative Correlation
-1.00 < ρ ≤ -0.80           Very Strong Negative Correlation
-0.80 < ρ ≤ -0.60           Strong Negative Correlation
-0.60 < ρ ≤ -0.40           Moderate Negative Correlation
-0.40 < ρ ≤ -0.20           Weak Negative Correlation
-0.20 < ρ < 0               Very Weak Negative Correlation
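The interpretation table can be turned into a small lookup helper. The function name describe_correlation is hypothetical, and the sketch assumes that each band includes the cutoff nearer to zero (for example, a magnitude of exactly 0.80 counts as very strong).

```r
# Hypothetical helper: label a correlation coefficient using the table above
describe_correlation <- function(r) {
  stopifnot(is.numeric(r), length(r) == 1, abs(r) <= 1)
  if (r == 0) {
    return("No Correlation")
  }
  direction <- if (r > 0) "Positive" else "Negative"
  size <- abs(r)
  # Assumption: the cutoff nearer to zero belongs to the stronger band
  strength <- if (size == 1) {
    "Perfect"
  } else if (size >= 0.80) {
    "Very Strong"
  } else if (size >= 0.60) {
    "Strong"
  } else if (size >= 0.40) {
    "Moderate"
  } else if (size >= 0.20) {
    "Weak"
  } else {
    "Very Weak"
  }
  paste(strength, direction, "Correlation")
}

describe_correlation(-0.45)
```

The final call labels a coefficient of -0.45 as a moderate negative correlation.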
Example Scenarios

Scenario 1
You want to identify the level of linear relationship between age
and IQ.

Scenario 2
You want to check if there is a significant linear relationship between
the behavior and the academic performance of your students.
Pearson Coefficient of Correlation

Use
The Pearson Correlation Coefficient is a measure of the linear
dependence between two variables X and Y, giving a value between
+1 and -1 inclusive, where +1 is total positive linear correlation,
0 is no linear correlation, and -1 is total negative linear
correlation.

R Function
cor.test(object1, object2, method = "pearson", alternative = "two.sided")
where alternative is "two.sided", "greater", or "less"

Assumptions
- Your data should be measured at the interval or ratio level.
- Each participant or observation should have a pair of values.
- There should be no significant outliers.
- There is a linear relationship between your variables.
- Your variables should be approximately normally distributed.
- The variances should be equal (homoscedasticity): the spread of
  one variable should be roughly constant across the values of the
  other.
Spearman Rank Coefficient of Correlation

Use
The Spearman Rank Correlation Coefficient is a nonparametric
measure of rank correlation between two variables. It assesses how
well the relationship between two variables can be described using a
monotonic function.

R Function
cor.test(object1, object2, method = "spearman", alternative = "two.sided")
where alternative is "two.sided", "greater", or "less"

Definition
A monotonic relationship is a relationship that does one of the
following:
- as the value of one variable increases, so does the value of the
  other variable; or
- as the value of one variable increases, the value of the other
  variable decreases.
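To illustrate the difference between a monotonic and a linear relationship, the sketch below uses hypothetical data in which y always increases with x but not along a straight line. The vectors are invented for illustration only.

```r
# Hypothetical data: y increases with x, but along a curve
x <- 1:10
y <- x^3

# Spearman works on ranks, so a perfectly monotonic pattern gives rho = 1
spearman <- cor.test(x, y, method = "spearman")

# Pearson measures linear dependence, so the curve lowers r below 1
pearson <- cor.test(x, y, method = "pearson")

c(spearman_rho = unname(spearman$estimate),
  pearson_r    = unname(pearson$estimate))
```

The Spearman coefficient equals 1 here because the ranks of x and y agree exactly, while the Pearson coefficient is smaller because the points do not fall on a line.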

Assumptions
- Your data should be measured on at least an ordinal scale.
- Each participant or observation should have a pair of values.
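
Choosing the coefficient (decision flowchart)
START: Is the relationship linear?
- No: Spearman Rank Correlation Coefficient
- Yes: Are the variables normal?
  - No: Spearman Rank Correlation Coefficient
  - Yes: Are the variances equal?
    - No: Spearman Rank Correlation Coefficient
    - Yes: Pearson Correlation Coefficient
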
The popular ice cream franchise Coldstone Creamery posted the
nutritional information for its ice cream offerings in three serving
sizes ("Like it", "Love it", and "Gotta Have it") on its website. A
portion of that information for the "Like it" serving is shown in the
table below.

Analyze the data and identify the level of linear relationship between
Calories and Total Fat (grams), and check whether the linear
relationship is significant using the Pearson Correlation Coefficient
at α = 0.05. Assume that the data pass the assumptions of the Pearson
Correlation Coefficient.

Flavor                   Calories   Total Fat (grams)
Cake Batter                 340           19
Cinnamon Bun                370           21
French Toast                330           19
Mocha                       320           20
OREO Creme                  440           31
Peanut Butter               370           24
Strawberry Cheesecake       320           21

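A minimal sketch of this analysis in R, assuming a two-tailed test and illustrative names for the data frame and its columns, could look like this:

```r
# Nutritional data from the table (Like it serving)
ice_cream <- data.frame(
  flavor    = c("Cake Batter", "Cinnamon Bun", "French Toast", "Mocha",
                "OREO Creme", "Peanut Butter", "Strawberry Cheesecake"),
  calories  = c(340, 370, 330, 320, 440, 370, 320),
  total_fat = c(19, 21, 19, 20, 31, 24, 21)
)

# Pearson correlation test, two-tailed (Ha: rho != 0)
result <- cor.test(ice_cream$calories, ice_cream$total_fat,
                   method = "pearson", alternative = "two.sided")
print(result)

r <- unname(result$estimate)   # sample correlation coefficient
alpha <- 0.05
significant <- result$p.value < alpha
```

For these seven flavors r is about 0.92, a very strong positive correlation in the interpretation table, and the p-value is below 0.05, so the linear relationship is significant at that level.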
Math is simply beautiful, almost perfect and it takes time
and creativity to uncover its mask.

-Anonymous
