Data Analysis and Hypothesis Testing

Selection of Appropriate Test for Analysis
Hypothesis
• A hypothesis is an educated guess about something in the world around you. It should be testable, either by experiment or by observation. For example:
• If (I decrease the amount of water given to herbs), then (the herbs will increase in size).
Hypothesis Testing
• Hypothesis testing in statistics is a way to test the results of a survey or experiment to see whether they are meaningful. You are essentially checking whether your results are valid by working out how likely it is that they happened by chance alone. If your results could easily have happened by chance, they are unlikely to be repeatable and offer little evidence for your hypothesis.
• To test a hypothesis, you need to:
  • Figure out your hypothesis,
  • State your hypothesis,
  • Choose what kind of test you need to perform,
  • Either support or reject the hypothesis.
Steps in Hypothesis Testing
• Formulate the null and alternative hypotheses
• Select an appropriate test
• Choose the level of significance
• Collect data and calculate the test statistic
• Compare the test statistic with the critical value
• Reject or do not reject the null hypothesis
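The six steps can be sketched as a right-tailed one-sample z-test. This is a minimal sketch under stated assumptions: the data are hypothetical, the population standard deviation σ is assumed known (which is what makes a z-test the appropriate choice in step 2), and `one_sample_z_test` is an illustrative name, not a library function.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Right-tailed one-sample z-test (illustrative helper).

    Step 1 (hypotheses): H0: mu = mu0, H1: mu > mu0.
    Step 2 (test): z-test, because sigma is assumed known.
    """
    # Step 3: the level of significance fixes the critical value
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Step 4: compute the test statistic from the collected data
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    # Steps 5-6: compare with the critical value and decide
    reject_h0 = z > z_crit
    return z, z_crit, reject_h0


# Hypothetical plant heights (cm); H0: mu = 20, H1: mu > 20, sigma = 2 assumed known
heights = [21.0, 22.5, 20.8, 23.1, 21.9, 22.4, 20.5, 22.0, 21.7, 23.0]
z, z_crit, reject_h0 = one_sample_z_test(heights, mu0=20, sigma=2)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

With these made-up numbers z is about 2.99, above the 5% critical value of about 1.645, so H0 would be rejected.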
Hypothesis Testing
Hypothesis testing is formulated in terms of two hypotheses:
• H0: the null hypothesis
• H1: the alternate hypothesis
We want to test whether H1 is “likely” true. So, there are two possible outcomes:
• Reject H0 and accept H1 because there is sufficient evidence in the sample in favor of H1.
• Do not reject H0 because there is insufficient evidence to support H1.

Hypothesis testing
A company manufacturing RAM chips claims the defective rate of
the population is 5%. Let p denote the true defective probability.
We want to test if:
• H0 : p = 0.05
• H1 : p > 0.05
We are going to use a sample of 100 chips from production to test these hypotheses.
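Under H0, and assuming chips are defective independently of each other, the number of defective chips X in a sample of n = 100 follows a Binomial(100, 0.05) distribution, so an exact one-sided test can be computed directly. A minimal sketch: the observed count of 10 defects is hypothetical, and the function name is illustrative. Comparing the p-value with α is equivalent to comparing the test statistic with the critical value.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n, p0, alpha = 100, 0.05, 0.05   # sample size, p under H0, level of significance
defects = 10                      # hypothetical number of defective chips observed
# One-sided p-value: probability of at least this many defects if H0 (p = 0.05) is true
p_value = binomial_upper_tail(defects, n, p0)
print(f"p-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```

Here P(X ≥ 10) under p = 0.05 is below 0.05, so with this hypothetical sample H0 would be rejected in favor of p > 0.05.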
Types of errors
• Because we are making a decision based on a finite sample, there is a possibility that we will make mistakes. The possible errors are:
• Type I Error: Rejecting H0 (accepting H1) when H0 is true is called a Type I error. The probability of committing a Type I error is called the level of significance and is denoted by α.
• Type II Error: Failing to reject H0 when H1 is true is called a Type II error. The probability of committing a Type II error is denoted by β.
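For the RAM chip example, both error probabilities can be computed exactly once a decision rule is fixed. This sketch assumes a hypothetical rule "reject H0 if the sample of 100 contains at least c defective chips" and a hypothetical alternative p = 0.10; the rule, the value of p under H1, and the function names are illustrative assumptions.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


def error_rates(c, n=100, p0=0.05, p1=0.10):
    """alpha and beta for the rule 'reject H0 if the sample has >= c defective chips'."""
    alpha = 1 - binom_cdf(c - 1, n, p0)  # P(reject H0 | p = p0): Type I error
    beta = binom_cdf(c - 1, n, p1)       # P(do not reject H0 | p = p1): Type II error
    return alpha, beta


for c in (8, 10, 12):
    alpha, beta = error_rates(c)
    print(f"reject H0 if defects >= {c}: alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Raising c lowers α but raises β: with a fixed sample size, the two kinds of error trade off against each other.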
How to select an appropriate statistical test?
• What kind of data are we dealing with?
  • More specifically, is the variable continuous or categorical?
• Do our data follow the normal distribution?
  • There is no need to check the distribution of ordinal and nominal data; the distribution should only be checked for interval and ratio data. If your data follow the normal distribution, a parametric statistical test should be used; nonparametric tests should be used only when the normal distribution is not followed.
• What is the aim of the study?
  • Do we want to study differences or relationships?
Scheme of Analysis
                        Dependent Variable
Independent Variable    Categorical                Continuous
Categorical             Chi-square test            Independent t-test
                        Logistic regression        One-way ANOVA
                        McNemar's test
Continuous              Logistic regression        Correlation
                        Ordinal regression         Linear regression
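The scheme of analysis can be expressed as a simple lookup from the types of the independent and dependent variables to candidate tests. This is only a sketch: the dictionary mirrors the cells of the scheme, `candidate_tests` is a hypothetical helper rather than part of any library, and it does not check normality or the number of groups.

```python
# Scheme of analysis: (independent variable type, dependent variable type) -> candidate tests
SCHEME = {
    ("categorical", "categorical"): ["Chi-square test", "Logistic regression", "McNemar's test"],
    ("categorical", "continuous"): ["Independent t-test", "One-way ANOVA"],
    ("continuous", "categorical"): ["Logistic regression", "Ordinal regression"],
    ("continuous", "continuous"): ["Correlation", "Linear regression"],
}


def candidate_tests(independent, dependent):
    """Return the tests the scheme suggests for the given variable types."""
    key = (independent.strip().lower(), dependent.strip().lower())
    if key not in SCHEME:
        raise ValueError("variable types must be 'categorical' or 'continuous'")
    return SCHEME[key]


# Example: group membership (categorical) vs. exam score (continuous)
print(candidate_tests("Categorical", "Continuous"))
```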
Parametric vs. Nonparametric Tests
Basis for comparison          Parametric test                   Nonparametric test
Meaning                       A statistical test in which       A statistical test that makes no
                              specific assumptions are made     specific assumptions about the
                              about the population parameters   population distribution; used
                                                                for non-metric data
Basis of test statistic       Distribution                      Arbitrary
Measurement level             Interval or ratio                 Nominal or ordinal
Measure of central tendency   Mean                              Median
Correlation test              Pearson                           Spearman
Parametric vs. Nonparametric Tests
Parametric Tests                        Nonparametric Tests
Independent samples t-test              Mann-Whitney U test
Paired samples t-test                   Wilcoxon signed-rank test
One-way analysis of variance (ANOVA)    Kruskal-Wallis test
Pearson correlation                     Spearman correlation
Logistic regression
Linear regression
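Pearson correlation pairs with its rank-based counterpart, Spearman correlation. A minimal sketch of the relationship: Spearman's rho is Pearson's r computed on the ranks, with tied values given their average rank. The data are hypothetical and the function names are illustrative.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient r."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # hypothetical: always increasing, but not linearly
print(f"Pearson r = {pearson(x, y):.3f}, Spearman rho = {spearman(x, y):.3f}")
```

Because y rises with x at every step, the ranks agree perfectly and rho = 1, while Pearson's r (about 0.80) is pulled down by the non-linear jump to 100.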
Independent t-test
• A t-test is commonly used to determine whether the mean of a population differs significantly from a specific value (called the hypothesized mean) or from the mean of another population. The independent t-test compares the means of two independent groups.
Assumptions
• One independent, categorical variable that has two levels/groups.
• One continuous dependent variable.
• The dependent variable is approximately normally distributed within each group.
• Homogeneity of variances: the outcome variable has equal standard deviations in the two (sub)populations. Levene's test is used to test this assumption.
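A minimal sketch of the independent t-test statistic under the equal-variance (homogeneity) assumption, using a pooled variance. The data are hypothetical and the function name is illustrative; the sketch returns only the statistic and its degrees of freedom, since a p-value needs the t distribution, which a statistics package provides (for example, SciPy's `scipy.stats.ttest_ind`).

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group1, group2):
    """Student's t statistic for two independent groups, assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: weighted average of the two sample variances (n - 1 denominators)
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    t = (mean(group1) - mean(group2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2  # statistic and degrees of freedom


# Hypothetical scores for two independent groups
a = [5, 6, 7, 8, 9]
b = [3, 4, 5, 6, 7]
t, df = independent_t(a, b)
print(f"t = {t:.3f}, df = {df}")
```

Here t = 2.0 with df = 8; the two-tailed 5% critical value for 8 degrees of freedom is about 2.306, so H0 would not be rejected.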
Paired t-test
• The dependent t-test (called the paired-samples t-test in SPSS Statistics) compares the means of two related groups on the same continuous dependent variable. For example, you could use a dependent t-test to understand whether there was a difference in smokers' daily cigarette consumption before and after a 6-week hypnotherapy programme.
• Assumptions
  • The dependent variable should be measured on a continuous scale.
  • The independent variable should consist of "matched pairs", i.e., the same subjects are present in both groups.
  • The differences between the paired values should be approximately normally distributed.
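The paired t-test reduces to a one-sample t-test on the within-subject differences. A minimal sketch: the before/after cigarette counts are hypothetical, `paired_t` is an illustrative name, and `scipy.stats.ttest_rel` is a library alternative that also returns a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-samples t statistic: a one-sample t-test on the differences."""
    if len(before) != len(after):
        raise ValueError("each subject needs one 'before' and one 'after' value")
    diffs = [x - y for x, y in zip(before, after)]
    n = len(diffs)
    # Standard error of the mean difference
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical daily cigarettes for five smokers before and after the programme
before = [20, 25, 18, 30, 22]
after = [15, 20, 17, 24, 18]
t, df = paired_t(before, after)
print(f"t = {t:.2f}, df = {df}")
```

With these made-up counts t is about 4.88 on 4 degrees of freedom, above the two-tailed 5% critical value of about 2.776.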
One way Analysis of Variance (ANOVA)
• The one-way ANOVA ("analysis of variance") compares the means of two or more independent groups to determine whether there is statistical evidence that the associated population means are significantly different.
• Assumptions
  • Dependent variable that is continuous (i.e., interval or ratio level)
  • Independent variable that is categorical (i.e., two or more groups)
  • Normal distribution (approximately) of the dependent variable for each group (i.e., for each level of the factor)
  • Homogeneity of variances (i.e., variances approximately equal across groups); if this assumption is violated, use the Welch statistic (Welch's ANOVA) instead
  • Each group should have at least 6 subjects
One way Analysis of Variance (ANOVA)
• H0: µ1 = µ2 = µ3 = ... = µk ("all k population means are equal")
  H1: At least one µi is different ("at least one of the k population means is not equal to the others")
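The one-way ANOVA compares the variability between group means with the variability within groups through the F statistic. A minimal sketch with hypothetical data; the function name is illustrative, and `scipy.stats.f_oneway` computes the same statistic together with a p-value.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for three independent groups
g1, g2, g3 = [4, 5, 6], [6, 7, 8], [8, 9, 10]
f, df_between, df_within = one_way_anova_f(g1, g2, g3)
print(f"F = {f:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

Here F = 12 with (2, 6) degrees of freedom, above the 5% critical value of about 5.14, so H0 would be rejected and a post hoc test would be needed to see which groups differ.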
Post hoc tests?
• The ANOVA tells you whether there is an overall difference between your groups, but it does not tell you which specific groups differed; post hoc tests do.
• Which post hoc test should be used?
  • If your data met the assumption of homogeneity of variances, use Tukey's honestly significant difference (HSD) post hoc test.
  • If your data did not meet the homogeneity of variances assumption, consider running the Games-Howell post hoc test.
Mann-Whitney U Test
• The Mann-Whitney U test is used to compare differences between two independent groups when the dependent variable is either ordinal or continuous, but not normally distributed.
• Assumptions
  • The dependent variable should be measured at the ordinal or continuous level.
  • The independent variable should consist of two categorical, independent groups.
  • It is used when the data in the two groups are not normally distributed.
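A minimal sketch of how the Mann-Whitney U statistic is obtained from the ranks of the pooled data, with tied values sharing their average rank. The data and function names are hypothetical; the sketch returns the smaller of U1 and U2, the value usually compared with tables of critical values, whereas library implementations may report U for one particular sample instead.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return the smaller of U1 and U2 for two independent samples."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    r1 = sum(ranks[:n1])           # rank sum of the first group
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1              # U1 + U2 always equals n1 * n2
    return min(u1, u2)


# Hypothetical ordinal ratings from two independent groups
group_a = [1, 2, 2]
group_b = [2, 3, 4]
print(mann_whitney_u(group_a, group_b))
```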
Kruskal Wallis Test
• The Kruskal-Wallis H test (sometimes also called the "one-way ANOVA on ranks") is a rank-based nonparametric test that can be used to determine whether there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable.
• For example, a Kruskal-Wallis H test can be used to understand whether exam performance, measured on a continuous scale from 0-100, differed based on test anxiety levels (i.e., your dependent variable would be "exam performance" and your independent variable would be "test anxiety level", which has three independent groups: students with "low", "medium" and "high" test anxiety levels).
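A minimal sketch of the H statistic for the exam-anxiety example, computed from the rank sums R_i of the k groups: H = 12 / (N(N + 1)) * Σ R_i² / n_i - 3(N + 1). The exam scores are hypothetical, the function names are illustrative, and this version omits the correction for ties.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1  # average 1-based rank of the run
        start = end + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic without the tie correction."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical exam scores (0-100) by test anxiety level
low = [80, 85, 90]
medium = [70, 75, 82]
high = [60, 65, 72]
h = kruskal_wallis_h(low, medium, high)
print(f"H = {h:.3f}, df = {3 - 1}")
```

H is usually compared with a chi-square distribution with k - 1 = 2 degrees of freedom (5% critical value about 5.991), so these made-up scores (H about 5.69) would not quite reach significance. With only three subjects per group the chi-square approximation is rough, and exact tables are preferable.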
Chi-Square Test of Independence
• The Chi-Square Test of Independence determines whether there is an association between categorical variables (i.e., whether the variables are independent or related). It is a nonparametric test.
• H0: "[Variable 1] is independent of [Variable 2]"
  H1: "[Variable 1] is not independent of [Variable 2]"
Assumptions
• Relatively large sample size: the expected frequency for each cell is at least 1.
• Expected frequencies should be at least 5 for the majority (80%) of the cells.
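A minimal sketch of the test of independence for an r × c contingency table: each expected count is row total × column total / grand total, the statistic sums (O - E)² / E over all cells, and a second helper checks the expected-frequency assumptions. The 2 × 2 table is hypothetical, the function names are illustrative, and no continuity (Yates) correction is applied.

```python
def chi_square_independence(observed):
    """Return (chi2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


def expected_counts_ok(expected):
    """Every expected count >= 1 and at least 80% of them >= 5."""
    cells = [e for row in expected for e in row]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_at_least_5 >= 0.8


# Hypothetical 2 x 2 table: rows = smoker / non-smoker, columns = disease yes / no
observed = [[30, 20],
            [20, 30]]
chi2, df, expected = chi_square_independence(observed)
print(f"chi2 = {chi2:.2f}, df = {df}, assumptions met: {expected_counts_ok(expected)}")
```

Here χ² = 4.0 with 1 degree of freedom, above the 5% critical value of about 3.841, so H0 (independence) would be rejected.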
