
Analyzing Experimental Data

Chapter 11

Null Hypothesis Significance Testing (NHST)


Treatment condition means can differ because of:
- The treatment effect
- Error variance (e.g., individual differences, measurement error)

NHST allows us to make a decision:
- Is the difference large enough to be highly unlikely (e.g., p < .05) to be due to chance?
- OR, is the difference small enough that it could reflect error variance?

Null Hypothesis Significance Testing (NHST)


- Null hypothesis (H0): the independent variable has no effect on the dependent variable
- Experimental or alternative hypothesis (H1): the independent variable does have an effect on the dependent variable
- Even though we are interested in H1, NHST tests the null hypothesis (H0).

Null Hypothesis Significance Testing (NHST)


- Estimate how much the means might differ due to error variance even if the independent variable has no effect (H0).
- If the obtained difference exceeds that amount, reject H0 (the difference is highly unlikely [e.g., p < .05] to be due to chance).
- Otherwise, fail to reject H0 (no significant difference).
  - Any differences may have been due to chance (p > .05)
  - Still, we cannot logically conclude that there is no effect; we might not have enough power.
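As an illustration, the decision rule above can be sketched in Python. The data here are hypothetical, and `scipy` is assumed to be available:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical scores: the treatment group is drawn from a shifted distribution
control = rng.normal(loc=50, scale=10, size=30)
treatment = rng.normal(loc=58, scale=10, size=30)

t_stat, p_value = stats.ttest_ind(treatment, control)

if p_value < .05:
    decision = "reject H0"           # difference unlikely to be due to chance
else:
    decision = "fail to reject H0"   # could reflect error variance alone

print(decision, round(p_value, 4))
```

Note that "fail to reject" is not "accept": a nonsignificant result may simply reflect low power.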

Errors in Hypothesis Testing


Type I errors:
- Rejecting H0 (the null hypothesis) when it is actually true
- Concluding that there is an effect when there actually is not
- The alpha level (α) is the probability of making a Type I error.

Type II errors:
- Failing to reject H0 (the null hypothesis) when it is actually false
- Concluding that there is not an effect when there actually is
- Beta (β) is the probability of making a Type II error.

Type I and Type II Errors


                          Actual Situation
Experimenter's Decision   No Effect (H0 True)   Effect Exists (H0 False)
Reject H0                 Type I error          Correct decision
Retain H0                 Correct decision      Type II error

Statistical Power
- The power of a test is the probability that the test will correctly reject a false null hypothesis
  - i.e., it will detect a treatment effect if one exists
- Power = 1 − β, or 1 minus the probability of making a Type II error
- Power is usually estimated before starting a study, using:
  - Effect size
  - Sample size
  - Alpha level
  - Type of test (one- vs. two-tailed)
- Power analysis suggests how many participants are needed to detect a significant effect at a specific power level.
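A rough sketch of such a power estimate for a two-sample t-test, using a normal approximation (exact calculations, as in tools like G*Power, use the noncentral t distribution):

```python
from scipy import stats

def approx_power_two_sample(d, n_per_group, alpha=0.05, tails=2):
    """Approximate power of a two-sample t-test via the normal
    approximation (a sketch; exact answers use the noncentral t)."""
    z_crit = stats.norm.ppf(1 - alpha / tails)
    # Expected z for effect size d with n participants per group
    ncp = d * (n_per_group / 2) ** 0.5
    return 1 - stats.norm.cdf(z_crit - ncp)

# Medium effect (d = .50), 64 per group, two-tailed alpha = .05
print(round(approx_power_two_sample(0.5, 64), 2))
```

Larger effects, larger samples, and less stringent alpha levels all push the result toward 1.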

Influences on Power
Power is increased with:
- larger effect sizes
- larger sample sizes
- larger (less stringent) alpha levels (e.g., α = .05)
- one-tailed (instead of two-tailed) tests

- The sample size needed can be estimated a priori with G*Power
- Post hoc power can also be estimated

Power Illustration
- Sampling distribution if H0 is true vs. sampling distribution if H0 is false
- α = probability of making a Type I error
- β = probability of making a Type II error
- Power = 1 − β

[Figures: overlapping H0 and H1 sampling distributions, with the region of the H1 distribution beyond the critical value shaded as power (1 − β). Shown for power = .90 with a 1-tailed z-test at α = .05, and power = .85 with a 2-tailed z-test at α = .05.]

Effect Size
- Statistical significance (rejecting H0) does not necessarily tell you about the size or practical significance of an effect
  - Statistically significant effects are not always practically significant!
- Effect size measures the absolute magnitude of a treatment effect, independent of sample size

Two Kinds of Effect Size Indicators


- Proportion-of-variance effect size (r²): the proportion of variance in the dependent variable that is due to the independent variable; ranges from .00 to 1.00
- Mean-difference effect size (Cohen's d): the size of the difference between two means relative to the standard deviation of the data; typically ranges from −3.0 to +3.0
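Both indicators can be computed directly from data; a minimal sketch with made-up scores:

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d: mean difference in pooled-standard-deviation units."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

def r_squared_from_t(t, df):
    """Proportion of variance accounted for, from a t statistic."""
    return t ** 2 / (t ** 2 + df)

# Hypothetical scores for two conditions
x = np.array([12.0, 14.0, 11.0, 15.0, 13.0])
y = np.array([10.0, 9.0, 11.0, 8.0, 12.0])
print(round(cohens_d(x, y), 2))
```

Note that d depends only on the means and standard deviations, not on sample size.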

Cohen's d (size of difference in standard deviation units)

Magnitude of d    Evaluation of Effect Size
d = .20           Small effect
d = .50           Medium effect
d = .80           Large effect

r² (percentage of variance accounted for)

Magnitude of r²   Evaluation of Effect Size
r² = .01          Small effect
r² = .09          Medium effect
r² = .25          Large effect

What power is adequate?


- Power of .80 (an 80% chance of detecting an effect) is generally considered desirable
- In psychology, average power levels are about:
  - .20 for detecting small effect sizes
  - .50–.60 for detecting medium effect sizes
  (Clark-Carter, 1997; Cohen, 1962; Rossi, 1990; Sedlmeier & Gigerenzer, 1989)

Power examples

All 2-tailed tests with α = .05, calculated with G*Power

Type of Test               Effect Size         N needed for power = .80
Correlation                Small, |r| = .10    N = 779
                           Medium, |r| = .30   N = 82
                           Large, |r| = .50    N = 26
t-test with equal Ns       Small, d = .20      N = 788
                           Medium, d = .50     N = 128
                           Large, d = .80      N = 52
2x2 ANOVA                  Small, f = .10      N = 787
(interaction term)         Medium, f = .25     N = 128
                           Large, f = .40      N = 52
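The t-test rows can be approximated with a closed-form sample-size formula. This is a normal-approximation sketch, so it lands slightly below the exact t-based values in the table:

```python
from math import ceil
from scipy import stats

def n_per_group_normal_approx(d, power=0.80, alpha=0.05):
    """Per-group n for a two-tailed, two-sample t-test (normal
    approximation; exact t-based tools such as G*Power give slightly
    larger answers)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_power = stats.norm.ppf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

# Medium effect, d = .50 (the table's exact value is 64 per group, N = 128)
print(n_per_group_normal_approx(0.5))
```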

Issues with Null Hypothesis Significance Testing (NHST)


- There is nothing sacred about p < .05
  - What about p = .051?
  - False dichotomy (effect vs. no effect)
- NHST cannot directly tell us the probability that the null hypothesis (H0) is true
- We may just not have enough power to detect the (small) effect
- In reality, most null hypotheses may be false (Loftus, 1996)
  - Example H0: μ1 = μ2 = μ3 = μ4
  - This means there are ZERO differences (not even .00001) in the population

How can we ameliorate the problems with NHST?


- Always report effect size in addition to statistical significance.
- Use meta-analysis of effect sizes across studies, when possible.
- Try to maintain adequate power (ideally > .80).
- Report plots and graphs of data, when appropriate.
- Report confidence intervals (CIs)
  - E.g., "We can be 95% confident that the population mean difference lies between 0.04 and 1.36"
  - CIs can be used to test hypotheses (does the CI include 0?)
  - CIs also provide a direct estimate of the population means
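A sketch of computing such a CI for a mean difference, with hypothetical data. For simplicity this uses the pooled-style df of n1 + n2 − 2 with an unpooled standard error; a Welch-style df would be slightly different:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for two conditions
x = np.array([5.1, 6.0, 5.8, 6.4, 5.5, 6.1])
y = np.array([4.8, 5.2, 4.9, 5.6, 5.0, 5.3])

diff = np.mean(x) - np.mean(y)
se = np.sqrt(np.var(x, ddof=1) / len(x) + np.var(y, ddof=1) / len(y))
df = len(x) + len(y) - 2            # simple df choice for this sketch
t_crit = stats.t.ppf(0.975, df)     # two-tailed 95% critical value
lo, hi = diff - t_crit * se, diff + t_crit * se

# If the CI excludes 0, a two-tailed test at alpha = .05 would reject H0
print(round(lo, 2), round(hi, 2))
```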

t-tests
- t-tests are used to test the difference between two means
  - One-sample t-test: compare a sample mean to a population mean or baseline
  - Independent-samples t-test: compare means for two independent groups
  - Related-samples or paired t-test: compare means for two related groups (matched subjects or repeated measures)
- Error variance is used to estimate how much the condition means would differ due to error variance alone; the observed difference between the means is compared with this estimate.
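The three varieties map directly onto `scipy.stats` functions; a sketch with made-up scores:

```python
import numpy as np
from scipy import stats

pre = np.array([10.0, 12.0, 9.0, 11.0, 13.0, 10.0])
post = np.array([12.0, 13.0, 11.0, 12.0, 15.0, 12.0])
other_group = np.array([9.0, 10.0, 8.0, 11.0, 9.0, 10.0])

# One-sample: compare a sample mean to a known baseline (here, 10)
t1, p1 = stats.ttest_1samp(pre, popmean=10)

# Independent samples: two unrelated groups
t2, p2 = stats.ttest_ind(pre, other_group)

# Paired (related samples): the same participants measured twice
t3, p3 = stats.ttest_rel(pre, post)

print(round(p1, 3), round(p2, 3), round(p3, 3))
```

The paired test removes the variance due to stable individual differences, which is why repeated-measures designs tend to have more power.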

Conducting an independent measures t-test


- Calculate a critical t-value using the t-table
  - Pick an alpha level (usually α = .05)
  - Find df
- Calculate the obtained t-statistic
  - = difference between means / standard error of the difference between means
- Compare the obtained and critical t-statistics
  - If the obtained t exceeds the critical t, reject the null hypothesis
  - If not, fail to reject the null hypothesis

Hypotheses
- A directional hypothesis states which of the two condition means is expected to be larger
  - Use a one-tailed test
- A nondirectional hypothesis states that the two means are expected to differ but does not specify which will be larger
  - Use a two-tailed test
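With `scipy` (version 1.6 or later, where `ttest_ind` takes an `alternative` argument), the choice looks like this. When the observed difference is in the predicted direction, the one-tailed p-value is half the two-tailed p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(52, 10, size=25)   # hypothetical treatment scores
b = rng.normal(50, 10, size=25)   # hypothetical control scores

# Nondirectional hypothesis: two-tailed test (the default)
_, p_two = stats.ttest_ind(a, b, alternative="two-sided")

# Directional hypothesis (a > b): one-tailed test
_, p_one = stats.ttest_ind(a, b, alternative="greater")

print(round(p_two, 3), round(p_one, 3))
```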
