
Analyzing Experimental Data

Chapter 11

Null Hypothesis Significance Testing (NHST)


Treatment condition means can differ because of:
- The treatment effect
- Error variance (e.g., individual differences, measurement error)

NHST allows us to make a decision:
- Is the difference large enough to be highly unlikely (e.g., p < .05) to be due to chance?
- OR, is the difference small enough that it could reflect error variance?

Null Hypothesis Significance Testing (NHST)


- Null hypothesis (H0): the independent variable has no effect on the dependent variable
- Experimental or alternative hypothesis (H1): the independent variable does have an effect on the dependent variable
- Even though we are interested in H1, NHST tests the null hypothesis (H0).

Null Hypothesis Significance Testing (NHST)


- Estimate how much the means might differ due to error variance even if the independent variable has no effect (H0).
- If the obtained difference exceeds that amount, reject H0 (the difference is highly unlikely [e.g., p < .05] to be due to chance).
- Otherwise, fail to reject H0 (no significant difference).
  - Any differences may have been due to chance (p > .05)
  - Still, we cannot logically conclude that there is no effect; we might not have enough power.
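As an illustration, the decision rule above can be sketched in Python. The data here are hypothetical, and `scipy` is assumed to be available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical scores: the treatment group is drawn from a shifted distribution
control = rng.normal(loc=50, scale=10, size=30)
treatment = rng.normal(loc=58, scale=10, size=30)

t_stat, p_value = stats.ttest_ind(treatment, control)

if p_value < .05:
    decision = "reject H0"           # difference unlikely to be due to chance
else:
    decision = "fail to reject H0"   # could reflect error variance alone

print(decision, round(p_value, 4))
```

Note that "fail to reject" is not "accept": a nonsignificant result may simply reflect low power.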

Errors in Hypothesis Testing


Type I errors:
- Rejecting H0 (the null hypothesis) when it is actually true
- Concluding that there is an effect when there actually is not
- The alpha level (α) is the probability of making a Type I error.

Type II errors:
- Failing to reject H0 (the null hypothesis) when it is actually false
- Concluding that there is not an effect when there actually is
- Beta (β) is the probability of making a Type II error.

Type I and Type II Errors


                          Actual Situation
Experimenter's Decision   No Effect (H0 True)   Effect Exists (H0 False)
Reject H0                 Type I error          Correct decision
Retain H0                 Correct decision      Type II error

Statistical Power
- The power of a test is the probability that the test will correctly reject a false null hypothesis
  - i.e., it will detect a treatment effect if one exists
- Power = 1 − β, or 1 minus the probability of making a Type II error
- Power is usually estimated before starting a study, using:
  - Effect size
  - Sample size
  - Alpha level
  - Type of test (one- vs. two-tailed)
- Power analysis suggests how many participants are needed to detect a significant effect at a specific power level.
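A rough sketch of such a power estimate for a two-sample t-test, using a normal approximation (exact calculations, as in tools like G*Power, use the noncentral t distribution):

```python
from scipy import stats

def approx_power_two_sample(d, n_per_group, alpha=0.05, tails=2):
    """Approximate power of a two-sample t-test via the normal
    approximation (a sketch; exact answers use the noncentral t)."""
    z_crit = stats.norm.ppf(1 - alpha / tails)
    # Expected z for effect size d with n participants per group
    ncp = d * (n_per_group / 2) ** 0.5
    return 1 - stats.norm.cdf(z_crit - ncp)

# Medium effect (d = .50), 64 per group, two-tailed alpha = .05
print(round(approx_power_two_sample(0.5, 64), 2))
```

Larger effects, larger samples, and less stringent alpha levels all push the result toward 1.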

Influences on Power
Power is increased with:
- larger effect sizes
- larger sample sizes
- larger (less stringent) alpha levels (e.g., α = .05)
- one-tailed (instead of two-tailed) tests

- The sample size needed can be estimated a priori with G*Power
- Post hoc power can also be estimated

Power Illustration
- Sampling distribution if H0 is true vs. sampling distribution if H0 is false
- α = probability of making a Type I error
- β = probability of making a Type II error
- Power = 1 − β

[Figures: overlapping H0 and H1 sampling distributions, with the region of the H1 distribution beyond the critical value shaded as power (1 − β). Shown for power = .90 with a 1-tailed z-test at α = .05, and power = .85 with a 2-tailed z-test at α = .05.]

Effect Size
- Statistical significance (rejecting H0) does not necessarily tell you about the size or practical significance of an effect
  - Statistically significant effects are not always practically significant!
- Effect size measures the absolute magnitude of a treatment effect, independent of sample size

Two Kinds of Effect Size Indicators


- Proportion-of-variance effect size (r²): the proportion of variance in the dependent variable that is due to the independent variable; ranges from .00 to 1.00
- Mean-difference effect size (Cohen's d): the size of the difference between two means relative to the standard deviation of the data; typically ranges from −3.0 to +3.0
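Both indicators can be computed directly from data; a minimal sketch with made-up scores:

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d: mean difference in pooled-standard-deviation units."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

def r_squared_from_t(t, df):
    """Proportion of variance accounted for, from a t statistic."""
    return t ** 2 / (t ** 2 + df)

# Hypothetical scores for two conditions
x = np.array([12.0, 14.0, 11.0, 15.0, 13.0])
y = np.array([10.0, 9.0, 11.0, 8.0, 12.0])
print(round(cohens_d(x, y), 2))
```

Note that d depends only on the means and standard deviations, not on sample size.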

Cohen's d (size of difference in standard deviation units)

Magnitude of d    Evaluation of Effect Size
d = .20           Small effect
d = .50           Medium effect
d = .80           Large effect

r² (percentage of variance accounted for)

Magnitude of r²   Evaluation of Effect Size
r² = .01          Small effect
r² = .09          Medium effect
r² = .25          Large effect

What power is adequate?


- Power of .80 (an 80% chance of detecting an effect) is generally considered desirable
- In psychology, average power levels are about:
  - .20 for detecting small effect sizes
  - .50–.60 for detecting medium effect sizes
  (Clark-Carter, 1997; Cohen, 1962; Rossi, 1990; Sedlmeier & Gigerenzer, 1989)

Power examples

All 2-tailed tests with α = .05, calculated with G*Power

Type of Test               Effect Size         N needed for power = .80
Correlation                Small, |r| = .10    N = 779
                           Medium, |r| = .30   N = 82
                           Large, |r| = .50    N = 26
t-test with equal Ns       Small, d = .20      N = 788
                           Medium, d = .50     N = 128
                           Large, d = .80      N = 52
2x2 ANOVA                  Small, f = .10      N = 787
(interaction term)         Medium, f = .25     N = 128
                           Large, f = .40      N = 52
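The t-test rows can be approximated with a closed-form sample-size formula. This is a normal-approximation sketch, so it lands slightly below the exact t-based values in the table:

```python
from math import ceil
from scipy import stats

def n_per_group_normal_approx(d, power=0.80, alpha=0.05):
    """Per-group n for a two-tailed, two-sample t-test (normal
    approximation; exact t-based tools such as G*Power give slightly
    larger answers)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_power = stats.norm.ppf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

# Medium effect, d = .50 (the table's exact value is 64 per group, N = 128)
print(n_per_group_normal_approx(0.5))
```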

Issues with Null Hypothesis Significance Testing (NHST)


- There is nothing sacred about p < .05
  - What about p = .051?
  - False dichotomy (effect vs. no effect)
- NHST cannot directly tell us the probability that the null hypothesis (H0) is true
- We may just not have enough power to detect the (small) effect
- In reality, most null hypotheses may be false (Loftus, 1996)
  - Example H0: μ1 = μ2 = μ3 = μ4
  - This means there are ZERO differences (not even .00001) in the population

How can we ameliorate the problems with NHST?


- Always report effect size in addition to statistical significance.
- Use meta-analysis of effect sizes across studies, when possible.
- Try to maintain adequate power (ideally > .80).
- Report plots and graphs of data, when appropriate.
- Report confidence intervals (CIs)
  - E.g., "We can be 95% confident that the population mean difference lies between 0.04 and 1.36"
  - CIs can be used to test hypotheses (does the CI include 0?)
  - CIs also provide a direct estimate of the population means
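A sketch of computing such a CI for a mean difference, with hypothetical data. For simplicity this uses the pooled-style df of n1 + n2 − 2 with an unpooled standard error; a Welch-style df would be slightly different:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for two conditions
x = np.array([5.1, 6.0, 5.8, 6.4, 5.5, 6.1])
y = np.array([4.8, 5.2, 4.9, 5.6, 5.0, 5.3])

diff = np.mean(x) - np.mean(y)
se = np.sqrt(np.var(x, ddof=1) / len(x) + np.var(y, ddof=1) / len(y))
df = len(x) + len(y) - 2            # simple df choice for this sketch
t_crit = stats.t.ppf(0.975, df)     # two-tailed 95% critical value
lo, hi = diff - t_crit * se, diff + t_crit * se

# If the CI excludes 0, a two-tailed test at alpha = .05 would reject H0
print(round(lo, 2), round(hi, 2))
```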

t-tests
- t-tests are used to test the difference between two means
  - One-sample t-test: compare a sample mean to a population mean or baseline
  - Independent-samples t-test: compare means for two independent groups
  - Related-samples or paired t-test: compare means for two related groups (matched subjects or repeated measures)
- Error variance is used to estimate how much the condition means would differ due to error variance alone; the observed difference between the means is compared with this estimate.
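The three varieties map directly onto `scipy.stats` functions; a sketch with made-up scores:

```python
import numpy as np
from scipy import stats

pre = np.array([10.0, 12.0, 9.0, 11.0, 13.0, 10.0])
post = np.array([12.0, 13.0, 11.0, 12.0, 15.0, 12.0])
other_group = np.array([9.0, 10.0, 8.0, 11.0, 9.0, 10.0])

# One-sample: compare a sample mean to a known baseline (here, 10)
t1, p1 = stats.ttest_1samp(pre, popmean=10)

# Independent samples: two unrelated groups
t2, p2 = stats.ttest_ind(pre, other_group)

# Paired (related samples): the same participants measured twice
t3, p3 = stats.ttest_rel(pre, post)

print(round(p1, 3), round(p2, 3), round(p3, 3))
```

The paired test removes the variance due to stable individual differences, which is why repeated-measures designs tend to have more power.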

Conducting an independent measures t-test


- Calculate a critical t-value using the t-table
  - Pick an alpha level (usually α = .05)
  - Find df
- Calculate the obtained t-statistic
  - = difference between means / standard error of the difference between means
- Compare the obtained and critical t-statistics
  - If the obtained t exceeds the critical t, reject the null hypothesis
  - If not, fail to reject the null hypothesis

Hypotheses
- A directional hypothesis states which of the two condition means is expected to be larger
  - Use a one-tailed test
- A nondirectional hypothesis states that the two means are expected to differ but does not specify which will be larger
  - Use a two-tailed test
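With `scipy` (version 1.6 or later, where `ttest_ind` takes an `alternative` argument), the choice looks like this. When the observed difference is in the predicted direction, the one-tailed p-value is half the two-tailed p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(52, 10, size=25)   # hypothetical treatment scores
b = rng.normal(50, 10, size=25)   # hypothetical control scores

# Nondirectional hypothesis: two-tailed test (the default)
_, p_two = stats.ttest_ind(a, b, alternative="two-sided")

# Directional hypothesis (a > b): one-tailed test
_, p_one = stats.ttest_ind(a, b, alternative="greater")

print(round(p_two, 3), round(p_one, 3))
```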
