
Research Questions, Variables, and Hypotheses
2/21/06

Review
- What are research questions?
- What are variables?
  - Definition
  - Function
  - Measurement scale

Hypotheses
- OK, now that we know how to set up a research project by posing research questions and labeling variables, it is time to move on to a more formal way of structuring and interpreting research.

Hypotheses: Definitions
- Hypotheses are predictions about the relationship among two or more variables or groups, based on a theory or previous research (Pittenger, 2003).
- Hypotheses are assumptions or theories that a researcher makes and tests.
- Why are hypotheses important?

Importance of Hypotheses
- Hypotheses:
  - Direct our observations
    - They identify the variables to be examined and the data to be collected.
  - Describe a relationship among variables
    - They can state that as one variable increases, the other will decrease; as one variable increases, the other will increase; and so on.
  - Refer to populations
    - Hypotheses help researchers infer that results from a sample will generalize to a population.

4 Functions of Hypotheses
- Hypotheses can:
  - Estimate population characteristics
  - Correlate variables
  - Display differences among two or more populations
  - Show possible cause and effect
- What research designs relate to each of these 4 functions?

Symbols Used in Hypotheses
- M = mean
- µ (mu, pronounced "mew") = population mean
- Roman letters (e.g., A, B, C, D) are used to represent statistics.
- Greek letters (e.g., α, β) are used to represent parameters.
- α = significance level; the probability of committing a Type I error (e.g., α = .05)
- p = probability value (e.g., p = .05)
- Null hypothesis: H0: µ1 - µ2 = 0, or equivalently H0: µ1 = µ2
- Alternative hypothesis: H1: µ1 - µ2 ≠ 0, or equivalently H1: µ1 ≠ µ2
  - Sometimes you may see it noted as HA.

Types of Hypotheses
- Research hypotheses
- Statistical hypotheses

Research Hypotheses
- Research hypothesis: a statement of the relationship among two or more variables or groups, whose acceptance or non-acceptance is based on resolving a logical alternative against a null hypothesis.
- Example: Graduate students who read the text in research methods will score higher on their comprehensive exams than graduate students who do not read their research methods text.

Research Hypotheses Cont.
- Research hypotheses can be stated as directional or non-directional.
- Directional hypotheses predict the specific direction of the relationship among two or more variables or groups:
  - Graduate students who read the text in research methods will score higher on their comprehensive exams than graduate students who do not read their research methods text.
    H0: µ1 ≤ µ2    H1: µ1 > µ2
  - IQ scores will correlate positively with self-esteem scores.
  - Cats will bark less frequently than dogs.

Research Hypotheses Cont.
- Non-directional hypotheses predict that there will be differences among two or more groups, but do not specify the direction of the differences:
  - Men and women will differ in their recall of phone numbers.
  - Scores on the Geriatric Depression Scale will differ between people with stroke and people with Alzheimer's disease.
  - IQ scores will correlate with self-esteem scores.
    H0: µ1 = µ2    H1: µ1 ≠ µ2

Research Hypotheses Cont.
- Your turn!
  - Come up with some directional and non-directional research hypotheses.
  - Share with the class.

Statistical Hypotheses
- Statistical hypotheses are mathematical or logical statements that help researchers interpret the results of research.
- Statistical hypotheses consist of the null hypothesis (H0), the hypothesis of no difference, and the alternative hypothesis (H1 or HA), which is similar in form to the research hypothesis.
  - Null: H0: µ1 - µ2 = 0
  - Alternative: H1: µ1 - µ2 ≠ 0

Statistical Hypotheses Cont.
- Written-out examples:
  - Null: There will be no difference in the comprehensive exam scores of graduate students who read the text in research methods and those who do not read their research methods text.
  - Alternative: Graduate students who read the text in research methods will score higher on their comprehensive exams than graduate students who do not read their research methods text.

Statistical Hypotheses Cont.
- Remember, and this is important:
  - The null hypothesis always implies that there is no relationship or statistical difference between variables or groups.
  - The alternative hypothesis implies that there is a meaningful relationship among variables or groups.

Testing Hypotheses
- Is it possible that our observations occurred by chance?

A Story (borrowed from Gonick & Smith, 1993)
- In the southern US, several legal cases examined racial bias in jury selection.
- In the 1950s and 1960s, panels of jurors were drawn from a list of eligible citizens, and very few African Americans (roughly 4 out of 80) were found on the jury panels, despite the fact that approximately 50% of the eligible citizens were African American.
- Could this be a result of pure chance?

A Story Cont.
- If the selection of potential jurors was random, then the number of African Americans on an 80-person panel would be X, a binomial random variable with n = 80 trials and p = .5.
- This means the chance of getting a panel of 80 jurors with only four African Americans is .0000000000000000014, or 1.4 × 10^-18. These odds are similar to getting three consecutive royal flushes in poker.

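As a quick check of these numbers, here is a minimal sketch in Python (assuming scipy is available; like the slide, it treats panel selection as a simple binomial draw):

```python
from scipy.stats import binom

# If jurors were selected at random from a pool that is 50% African
# American, the count on an 80-person panel is Binomial(n=80, p=0.5).
n, p = 80, 0.5

# Probability of seeing 4 or fewer African Americans on the panel.
prob = binom.cdf(4, n, p)
print(f"P(X <= 4) = {prob:.2e}")  # ~1.4e-18, matching the slide
```
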
Testing Hypotheses Cont.
- We use a variety of statistical procedures to test null hypotheses. The choice of procedure depends on a variety of factors, including:
  - the research hypothesis,
  - the data,
  - the sampling strategy,
  - and what we want to be able to say as a result of our testing.

Types of Tests
- Statistical procedures that are commonly used for hypothesis testing include correlation, analysis of variance (ANOVA), analysis of covariance (ANCOVA), regression, multivariate analysis of variance (MANOVA), t-tests, and chi-square. Each of these procedures has an associated test statistic, which is used to determine significance. For example, ANOVA, ANCOVA, and regression use F statistics and their associated p-values.
- Multivariate procedures, like MANOVA, use a variety of test statistics with interesting names, like Wilks' lambda. These are then related to a more common test statistic, like F.
- The secret here, for the layperson, is that all test statistics are eventually related to a probability distribution and a p-value. These p-values mean the same thing across test statistics (see the sketch below).

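To make the "same p across test statistics" point concrete, here is a minimal sketch (Python with numpy and scipy assumed; the scores are fabricated): a t-test on two groups and a one-way ANOVA on three groups each yield a test statistic and a p-value that are read the same way.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=10, size=30)  # hypothetical scores
group_b = rng.normal(loc=55, scale=10, size=30)
group_c = rng.normal(loc=52, scale=10, size=30)

# Two groups: the t-test yields a t statistic and a p-value.
t_stat, p_t = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_t:.4f}")

# Three groups: one-way ANOVA yields an F statistic and a p-value.
f_stat, p_f = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_f:.4f}")

# Different test statistics, but each p is the probability of data at
# least this extreme if the null hypothesis were true.
```
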
Error Types
- In hypothesis testing, we must contend with two types of errors: Type I and Type II.
  - Errors are mistakes that we can make when judging the null hypothesis.
- A Type I error is what happens when a true null hypothesis is falsely rejected. (It is when you say you found something, but that something is really an error.) A Type I error is a false positive.
- A Type II error is what happens when a false null hypothesis is not rejected (Hays, 1986). (It is when you don't find something that is, in fact, there.) A Type II error is a false negative.

Error Types Cont.
- Alpha is the probability (pre-set by the researcher) that a true null hypothesis will be falsely rejected. Alpha is the pre-set risk of a Type I error. In other words, alpha is the degree of risk that you accept, in advance of conducting the study, that what you find will be an error.
- Beta is the probability (often neglected by the researcher) that a false null hypothesis will not be rejected. Beta is the probability that you won't find what you are looking for if, in fact, it is really there.

Error Types Cont.
- The chart on the next slide, which is modified from Hays (1986) and Ferguson and Takane (1989), illustrates the relationships among Type I and Type II errors and alpha and beta.

Error Types Cont.
- Error Types Chart

  Decision                     H0 is True        H1 is True
  ---------------------------  ----------------  ----------------
  Reject H0                    Type I (α)        Correct (1 - β)
  Fail to reject H0
  (decide in favor of H0)      Correct (1 - α)   Type II (β)

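The upper-left cell can be demonstrated by simulation. In this minimal sketch (Python, numpy and scipy assumed), both samples always come from the same population, so H0 is true, and roughly a fraction alpha of the tests reject it anyway:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_tests = 0.05, 10_000

rejections = 0
for _ in range(n_tests):
    # Both samples come from the SAME population, so H0 is true.
    a = rng.normal(0, 1, size=20)
    b = rng.normal(0, 1, size=20)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        rejections += 1  # a Type I error: rejecting a true H0

print(f"Type I error rate: {rejections / n_tests:.3f}")  # close to 0.05
```
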
Example
- Do we use null hypotheses in the real world? "Innocent until proven guilty."

  Verdict                          Defendant Innocent   Defendant Guilty
  -------------------------------  -------------------  ----------------
  Reject presumption of
  innocence (guilty verdict)       Type I error         Correct
  Fail to reject presumption of
  innocence (not guilty verdict)   Correct              Type II error

Power, Effect Size, and Measurement
STATISTICAL POWER
- Statistical power is "the probability of rejecting a null hypothesis that is, in fact, false" (Williams, 1986, p. 67). Put more simply, statistical power is the probability of finding relationships or differences that in fact exist (Cohen, 1988).
- In our legal example, power is the probability of being found guilty if you committed a crime. In terms of beta (the probability of a Type II error, i.e., the odds of getting off despite being guilty), statistical power = 1 - beta.
- So, if you have a 20 percent chance of being found not guilty even though you committed the crime, the statistical power of the judicial system is 1 - .20 = .80.

Power
- Statistical power is a function of "the preset significance criterion [alpha], the reliability of sample results, and the effect size [the actual size of the difference or strength of the relationship]..." (Cohen, 1988, p. 4).
- Considering the complex interrelationships of the above criteria, one can say that the researcher can easily set alpha, but cannot easily set beta.
  - Alpha and beta are directly, but not perfectly, related.
  - Lowering alpha increases beta and lowers power.
  - Increasing alpha decreases beta and increases power.

Power, Effect Size, and Measurement Cont.
- Statistical power is then related to:
  - Sample size
  - Effect size
  - Statistical design (including number of groups, 1- vs. 2-tailed tests)
  - Significance criteria

Power, Effect Size, and Measurement Cont.
EFFECT SIZE
- Effect size (ES) refers to the amount of common variance between the independent variable(s) (IV) and the dependent variable(s) (DV), or the degree to which changes in the IV(s) result in changes in the DV(s).

Power, Effect Size, and Measurement Cont.
Effect Size Cont.
- For example, if I am interested in the differences between male and female UF students in the number of parking tickets received over two years, my effect size would be the size of the difference between the means of the two groups (see the sketch below).
- Or, if I wanted to test a specific intervention for students with learning disabilities, and I had a test that I believed measured the effectiveness of my intervention, then my effect size might be the difference in test scores between an experimental group that received the intervention and a control group that did not.

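For two-group comparisons like these, a common standardized effect size is Cohen's d (the slides do not name it, but it is the standard choice). A minimal sketch with hypothetical data:

```python
import numpy as np

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * np.var(group1, ddof=1) +
                  (n2 - 1) * np.var(group2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(group1) - np.mean(group2)) / np.sqrt(pooled_var)

# Hypothetical parking-ticket counts over two years.
male_tickets = np.array([3, 5, 2, 4, 6, 3, 5, 4])
female_tickets = np.array([2, 3, 1, 4, 2, 3, 2, 3])

print(f"Cohen's d = {cohens_d(male_tickets, female_tickets):.2f}")
```
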
Power, Effect Size, and Measurement Cont.
Effect Size Cont.
- Similarly, if I wanted to examine the impact of a specific course on research anxiety, the effect size could be the difference in mean research-anxiety scores between an experimental group who completed the course and a control group who did not.

Power, Effect Size, and Measurement Cont.
Effect Size Cont.
- Here is a large problem: effect size depends on what measure we use to operationalize the construct.
- For example, effect size depends on the net we use, the test we select, etc. Actual effect sizes may be much larger than observed effect sizes. What might be considered a moderate to large effect in a laboratory situation may appear as a small effect in the real world, where you can't control numerous sources of extraneous variance, e.g., variability in individual characteristics, treatment implementation, and environmental characteristics (Cohen, 1988).
  - Small effect sizes are common and should be expected in ex post facto and quasi-experimental situations (Cohen, 1988).

Power, Effect Size, and Measurement Cont.
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER
- This is just a conceptual introduction. We will return to validity of measurement in a future lecture.
- All research depends on an operational definition of the constructs of interest. In intervention research, the operational definitions of both the treatments and the outcomes influence effect size. As we are all aware, there are a variety of frames of reference regarding interventions and outcomes.

Power, Effect Size, and Measurement Cont.
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
- Consider the elephant fable, with the researchers who each mapped different parts of the elephant. Their descriptions of the elephant differed considerably.
- What we see in research depends, at least in part, on what facet(s) of the construct of interest is (are) operationalized by our outcome measure(s). It is always better to look at the construct in more than one way (more than one facet) in order to limit threats to validity from mono-operation bias.
- In other words, looking at the elephant from different angles can improve the degree to which our descriptions of the elephant actually describe the elephant.

Power, Effect Size, and Measurement Cont.
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
- Now, consider measuring the same elephant with portable X-ray machines. Pictures of each part of the elephant are taken and then compared with each other. Not only do these pictures not resemble each other, but they also don't resemble the descriptions provided by the previous group of researchers. This chapter of the elephant fable shows how what we see is shaped by our method of observation or measurement.
- Again, a researcher interested in a deeper understanding of the elephant may choose multiple methods of measurement in order to avoid threats to validity from mono-method bias.

Power, Effect Size, and Measurement Cont.
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
- The relationship of measurement, research design, and statistical power means that large treatment effects can actually be observed as small effects. In other words, even if an intervention is very effective, measurement and design complications may make the effect appear small and thus require high statistical power for detection.
- The following telescope model depicts this interrelation. The effect is obscured when we only look at part of the construct of interest. The apparent effect size is then attenuated by the extent to which our operational definitions (including our measurement techniques) do not reliably and validly capture the construct of interest (i.e., intervention effectiveness).

Power, Effect Size, and Measurement Cont.
- Telescope Model

  Actual effect size -> Measurement -> Research design -> Statistical design -> Observed effect size

  (The actual effect is viewed through a series of "lenses"; each one can attenuate what is finally observed.)

Power, Effect Size, and Measurement Cont.
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
- Apparent effect size is further attenuated when the research design does not fully filter out extraneous sources of variation (e.g., counselor or client differences). Violations of the assumptions of statistical procedures can further attenuate effect size. Interestingly, problems in research design and statistical design can also introduce sources of Type I error (e.g., dust on the lens, or false positive results).
- The relationship of effect size, measurement, and design is further complicated by the frame of reference or angle from which one approaches or operationalizes the construct, e.g., the researcher who only looks at the leg of the elephant.

Power, Effect Size, and Measurement Cont.
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.
- Validity is a key element of the relationship of effect size, measurement, and design. Clearly, qualitative methods can further the valid operationalization of constructs. Multiple operational definitions and multiple methods, as recommended by Cook and Campbell (1979), can enhance the validity of research, including counseling effectiveness research.

Power, Effect Size, and Measurement Cont.
PRE-ANALYSIS STATISTICAL POWER
- Pre-analysis statistical power estimation is a recommended technique. The following steps will allow you to consider statistical power in research planning.
1. Estimate effect size from past research and the type of experimental design planned. When you are unsure, underestimate effect size so as not to overestimate power. Also, in quasi-experimental or ex post facto circumstances, it is usually best to estimate a small effect size unless otherwise indicated.
2. Decide on the exact statistical test and significance criterion.
3. Determine an acceptable level of power: .80 is nice, but .70 may be acceptable in some circumstances.

Power, Effect Size, and Measurement Cont.
PRE-ANALYSIS STATISTICAL POWER Cont.
4. Use power tables for that statistical test, or an appropriate computer program, to determine the number of subjects required for the specified significance criterion and desired level of power (a code sketch follows this list).
5. If you have a fixed number of subjects, consider adjusting the significance criterion (alpha) or the statistical design if necessary to obtain adequate power.
- Recall, the .05 significance criterion is not sacred, especially when it results in a power of less than .30 (i.e., less than a 30% chance of finding differences that actually exist) (Szymanski & Parker, 1992).

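For step 4, a computer program can stand in for power tables. A minimal sketch using Python's statsmodels (an assumed toolchain; the effect size, alpha, and power values are illustrative):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Step 4 by software: solve for n per group for an independent-samples
# t-test, given effect size, alpha, and desired power.
n_per_group = analysis.solve_power(effect_size=0.5,  # medium d (Cohen)
                                   alpha=0.05,
                                   power=0.80)
print(f"Required n per group: {n_per_group:.0f}")  # ~64

# Step 5: with a fixed n, solve instead for the power you would have.
power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30)
print(f"Power with n = 30 per group: {power:.2f}")
```
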
Power, Effect Size, and Measurement Cont.
ALPHA INFLATION
- Multiple comparisons can increase alpha, the probability of a Type I error.
- The probability of a Type I error escalates with the number of comparisons made in the study. The experiment-wise alpha for n comparisons is computed as:

  1 - (1 - alpha)^n

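A quick worked check of the formula in plain Python:

```python
alpha = 0.05

# Experiment-wise (family-wise) alpha grows quickly with the number
# of comparisons: 1 - (1 - alpha)^n.
for n in (1, 5, 10, 20):
    familywise = 1 - (1 - alpha) ** n
    print(f"{n:2d} comparisons -> experiment-wise alpha = {familywise:.3f}")
# 1 -> 0.050, 5 -> 0.226, 10 -> 0.401, 20 -> 0.642
```
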
Power, Effect Size, and Measurement Cont.
ALPHA INFLATION Cont.
- One way to guard against alpha inflation is to use a Bonferroni-type procedure and split alpha by the number of comparisons. There are a variety of such procedures that can be used (see, e.g., Marascuilo & Serlin, 1988) according to the relative importance of the tested hypotheses.
- The problem with reducing alpha is that it inflates beta. In situations in which alpha inflation is accepted due to a problem with power, one must look to replications for confidence in the findings.

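A minimal sketch of the Bonferroni split (plain Python; the p-values are fabricated). Note how the reduced per-test alpha turns several nominally significant results into non-rejections, which is the beta inflation described above:

```python
# Hypothetical p-values from five planned comparisons.
p_values = [0.010, 0.020, 0.030, 0.004, 0.045]
alpha = 0.05

# Bonferroni: split alpha evenly across the comparisons.
per_test_alpha = alpha / len(p_values)  # 0.01
for i, p in enumerate(p_values, start=1):
    verdict = "reject H0" if p < per_test_alpha else "fail to reject H0"
    print(f"Comparison {i}: p = {p:.3f} -> {verdict}")
```
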
Power, Effect Size, and Measurement Cont.
- See power tables.

Test Statistics, Probability, and Significance
- In order to test a hypothesis, we compare the obtained value of a test statistic (e.g., the obtained F) to a critical value of the test statistic (e.g., a critical F) that is associated with the preset significance level (alpha).
- If the obtained value of the test statistic is greater than the critical value, we determine that there is a significant difference or relationship.

Test Statistics, Probability, and Significance Cont.
- Test statistic: The specific statistic (i.e., the tool) that is chosen to test the null hypothesis. Examples include F, t, and r.
- Obtained value: The actual value obtained when applying the test statistic to the data of interest. The probability value associated with the obtained value is p.
- Critical value: The value of the test statistic that is associated with the chosen significance level (alpha). If the obtained value is greater than the critical value, the result is significant.

Test Statistics, Probability, and Significance Cont.
- Probability value: The probability that observed relationships or differences are due to chance.
- Alpha: Also known as the significance level or rejection region. It is the level of probability set by the researcher as grounds for rejection of the null hypothesis (Williams, 1986, p. 58). Alpha is the probability level associated with the critical value of the test statistic.
  - In other words, alpha is our predetermined risk that differences we declare to be real are actually due to chance.
- Obtained probability (p): The significance of the test statistic. It is the "probability that the data could have arisen if H0 were true" (Cohen, 1994, p. 998).

Test Statistics, Probability, and Significance Cont.
- Significance: What happens when the obtained probability p is less than our predetermined alpha. Significance also occurs when the obtained value of the test statistic is greater than the critical value of the test statistic.

  Test statistic  <->  probability value
  Critical value  <->  significance level (alpha)
  Obtained value  <->  obtained or actual probability (p)

- Note that larger obtained values of test statistics are generally associated with smaller values of p.
- If obtained value > critical value, then significance.
- If p < alpha, then significance.

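Both decision rules can be seen side by side in a minimal sketch (Python, scipy assumed; the obtained t and the degrees of freedom are hypothetical):

```python
from scipy import stats

alpha = 0.05
df = 28              # degrees of freedom (e.g., n1 + n2 - 2)
t_obtained = 2.40    # hypothetical obtained value of the test statistic

# Critical value for a two-tailed test at the chosen alpha.
t_critical = stats.t.ppf(1 - alpha / 2, df)

# Obtained (two-tailed) p-value for the obtained statistic.
p_obtained = 2 * stats.t.sf(abs(t_obtained), df)

print(f"critical t = {t_critical:.3f}, obtained t = {t_obtained:.3f}")
print(f"alpha = {alpha}, obtained p = {p_obtained:.4f}")

# The two rules agree: |t_obtained| > t_critical  <=>  p_obtained < alpha.
print("significant:", abs(t_obtained) > t_critical, p_obtained < alpha)
```
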
Revisiting the Bell Curve
(Figure not reproduced in this text version.)

Test Statistics, Probability, and Significance Cont.
- Whether you are looking at obtained values of test statistics in relation to critical values, or you are looking at actual probability levels, it is important to note that test statistics and their associated probabilities only tell us the probability that a difference or relationship occurred by chance...
- These statistics DO NOT tell us the size of the group differences or the strength of the relationships (see the sketch below).

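To make this concrete, a minimal simulation sketch (Python, numpy and scipy assumed; data fabricated): a trivial difference can produce a tiny p-value when the sample is huge, while a large difference in a small sample may not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A tiny true difference (0.05 SD) with an enormous sample...
a = rng.normal(0.00, 1, size=200_000)
b = rng.normal(0.05, 1, size=200_000)
t, p = stats.ttest_ind(a, b)
print(f"tiny effect, huge n:   t = {t:.1f}, p = {p:.1e}")

# ...versus a large true difference (0.8 SD) with a small sample.
c = rng.normal(0.0, 1, size=15)
d = rng.normal(0.8, 1, size=15)
t, p = stats.ttest_ind(c, d)
print(f"large effect, small n: t = {t:.1f}, p = {p:.3f}")

# A very small p can accompany a trivial effect, which is why effect
# size should be reported alongside the p-value.
```
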
Steps in Hypothesis Testing for Quantitative Research Designs
- Hypothesis testing is a four-phase procedure:
  - Phase I: Research hypotheses, design, and variables
  - Phase II: Statistical hypotheses
  - Phase III: Hypothesis testing
  - Phase IV: Decision/interpretation

Phase I: Research Hypotheses, Design, and Variables
1. State your research hypotheses.
2. Decide on a research design based on your research problem, your hypotheses, and what you really want to be able to say about your results (e.g., if you want to say that A caused B, you will need an experimental or time-series design; if probable cause is sufficient, a quasi-experimental design would be appropriate).
3. Operationally define your variables. Recall that one variable can have more than one operational definition.

Phase II: Statistical Hypotheses
1. Consider your chosen statistical procedures.
2. Write one statistical null hypothesis for each operational definition of each variable, reflecting the statistical operations to be performed.

Phase III: Hypothesis Testing
- Complete the following steps for each statistical null hypothesis (a worked sketch follows this list):
1. Select a significance level (alpha).
2. Compute the value of the test statistic (e.g., F, r, t).
3. Compare the obtained value of the test statistic with the critical value associated with the selected significance level, or compare the obtained p-value with the pre-selected alpha value.
4. If the obtained value of the test statistic is greater than the critical value (or if the obtained p-value is less than the pre-selected alpha value), reject the null hypothesis. If the obtained value is less than the critical value of the test statistic, fail to reject the null hypothesis.
- Another way of looking at it: if p is less than or equal to alpha, reject the null hypothesis.

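A minimal end-to-end sketch of these steps in Python (scipy assumed; the exam scores are fabricated stand-ins for the reader/non-reader example used earlier):

```python
import numpy as np
from scipy import stats

# Hypothetical comprehensive-exam scores.
read_text = np.array([88, 92, 85, 90, 95, 87, 91, 89])
no_text = np.array([80, 84, 78, 86, 82, 79, 85, 81])

# Step 1: select a significance level.
alpha = 0.05

# Step 2: compute the test statistic (independent-samples t-test).
t_obtained, p_obtained = stats.ttest_ind(read_text, no_text)

# Steps 3-4: compare p with alpha and decide.
print(f"t = {t_obtained:.2f}, p = {p_obtained:.4f}")
if p_obtained <= alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```
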
Phase IV: Decision/Interpretation
1. For each research hypothesis, consider the decisions regarding the statistical null hypotheses.
2. For each research hypothesis, consider qualitative contextual information relating to potential plausibility.
3. Cautiously explain your findings with respect to the research hypotheses.
4. List and discuss the limitations (threats to valid inference).
- Note: Null hypothesis testing is currently under scrutiny (see, e.g., Cohen, 1994; Kirk, 1996).
- It is generally recommended that you report the effect size along with the value of the test statistic and the p-value. An alternative is to report confidence intervals.

Points to Consider about Hypothesis Testing
FISHING IN LAKE ALICE
- We don't prove the null hypothesis.
  - If you go fishing on Lake Alice and you don't catch fish, you cannot conclude that there are no fish in the lake!!!

Points to Consider about Hypothesis Testing Cont.
- What circumstances might keep us from finding fish in the lake? Possible problems include:
1. wrong or insensitive outcome measures (using the large net for small fish),
2. sampling problems (looking in the wrong part of the lake), or
3. methodological problems (scooping the top of the lake rather than the bottom, where the fish hang out).

Points to Consider about Hypothesis Testing Cont.
- Returning to hypothesis testing: failure to reject the null hypothesis cannot be interpreted as proof that no differences or relationships exist. Existing differences or relationships might be obscured by:
1. insensitive outcome measures (the wrong fishnet),
2. inappropriate statistical designs,
3. poor sampling strategies, and
4. low statistical power.

Questions about Hypotheses
