
Research Questions, Variables, and Hypotheses
2/21/06

1
Review
What are research questions?
What are variables?
Definition
Function

Measurement Scale

2
Hypotheses
OK, now that we know how to set
up a research project by posing
research questions and labeling
variables, it is time to move on to a
more formal way of structuring and
interpreting research.

3
Hypotheses Definitions
Hypotheses are predictions about the
relationship among two or more
variables or groups based on a theory
or previous research (Pittenger, 2003)
Hypotheses are assumptions or
theories that a researcher makes and
tests.
Why are hypotheses important?

4
Importance of Hypotheses
Hypotheses:
Direct our observations
Identify the variables to be examined and the data to be collected
Describe a relationship among variables
Can state that as one variable increases, the other will decrease; as one variable increases, the other will increase; and so on
Refer to populations
Hypotheses help researchers infer that results from a sample will generalize to a population

5
4 Functions of Hypotheses
Hypotheses can:
Estimate Population Characteristics
Correlate Variables
Display Differences among Two or More Populations
Show Possible Cause and Effect

What research designs relate to each of these 4 functions?
6
Symbols used in
Hypotheses
M = mean
μ (mu, pronounced "mew") = population mean
Roman letters (e.g., A, B, C, D) are used to represent statistics
Greek letters (e.g., μ, σ) are used to represent parameters
α (alpha) = significance level; probability of committing a Type I error (α = .05)
p = probability value (p = .05)
Null Hypothesis: H0: μ1 - μ2 = 0, or H0: μ1 = μ2
Alternative Hypothesis: H1: μ1 - μ2 ≠ 0, or H1: μ1 ≠ μ2
Sometimes you may see it noted as HA
7
Types of Hypotheses
Research Hypotheses
Statistical Hypotheses

8
Research Hypotheses
Research Hypothesis: A statement of the relationship among two or more variables or groups, whose acceptance or non-acceptance is based on resolving a logical alternative against a null hypothesis.

Example: Graduate students who read the text in research methods will score higher on their comprehensive exams than graduate students who did not read their research methods text.
9
Research Hypotheses
Cont.
Research hypotheses can be stated as
Directional or Non-directional.
Directional hypotheses predict the
specific relationship among two or more
variables or groups:
Graduate students who read the text in
research methods will score higher on their
comprehensive exams than graduate
students who did not read their research
methods text.
H0: μ1 ≤ μ2    H1: μ1 > μ2
IQ scores will correlate in a positive manner with Self-Esteem scores
Cats will bark less frequently than dogs
10
Research Hypotheses
Cont.
Non-Directional Hypotheses predict that
there will be differences among two or
more groups, but do not specify the
direction of the differences
Men and Women will differ in their recall of
phone numbers
The scores on the Geriatric Depression Scale will differ between people with stroke and people with Alzheimer's disease
IQ scores will correlate with Self-Esteem scores
H0: μ1 = μ2    H1: μ1 ≠ μ2
11
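A minimal sketch of how this distinction plays out in software, assuming Python with SciPy and invented exam-score data (the group names and numbers are hypothetical): a non-directional hypothesis maps to a two-tailed test, a directional hypothesis to a one-tailed test.

```python
# Two-sample t-test run both ways on invented data (a recent SciPy is
# assumed for the `alternative` argument).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
readers = rng.normal(loc=82, scale=8, size=30)       # hypothetical exam scores
non_readers = rng.normal(loc=76, scale=8, size=30)

# Non-directional: H0: mu1 = mu2  vs  H1: mu1 != mu2
t_two, p_two = stats.ttest_ind(readers, non_readers, alternative='two-sided')

# Directional: H0: mu1 <= mu2  vs  H1: mu1 > mu2
t_one, p_one = stats.ttest_ind(readers, non_readers, alternative='greater')

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

When the observed difference falls in the predicted direction, the one-tailed p-value is half the two-tailed value, which is why a directional prediction must be stated before the data are seen.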
Research Hypotheses
Cont.
Your Turn!
Come up with some directional and
non-directional research
hypotheses
Share with the class

12
Statistical Hypotheses
Statistical hypotheses are mathematical or logical statements that help researchers interpret the results of research.

Statistical hypotheses consist of the Null Hypothesis (H0), the hypothesis of no difference, and the Alternative Hypothesis (H1 or HA), which is similar in form to the research hypothesis.

Null: H0: μ1 - μ2 = 0
Alternative: H1: μ1 - μ2 ≠ 0

13
Written out examples:
Null: There will be no difference in the comprehensive test scores of graduate students who read the text in research methods and those who did not read their research methods text.

Alternative: Graduate students who read the text in research methods will score higher on their comprehensive exams than graduate students who did not read their research methods text.
14
Statistical Hypotheses
Cont.
Remember, and this is important:

The null hypothesis always implies that there is no relationship or statistical difference between variables or groups.

The alternative hypothesis implies that there is a meaningful relationship among variables or groups.
15
Testing Hypotheses
Is it possible that our observations
occurred by chance?

16
A story (borrowed from
Gonick & Smith, 1993)
In the southern US several legal cases
examined racial bias in jury selection.

In the '50s and '60s, panels of jurors were drawn from a list of eligible citizens, and very few African Americans (roughly 4 out of 80) were found on the jury panels, despite the fact that approximately 50% of the eligible citizens were African American.

Can this be a result of pure chance?
17


A story - continued
If the selection of potential jurors was random, then the number of African Americans on an 80-person panel would be X, a binomial random variable with n = 80 trials and p = .5.

This means the chance of getting a panel of 80 jurors with only four African Americans is .0000000000000000014, or 1.4 × 10^-18. These odds are similar to getting three consecutive royal flushes in poker.

18
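The quoted figure can be checked directly with a binomial calculation; a short sketch, assuming Python with SciPy:

```python
# Probability of 4 or fewer African American jurors on an 80-person panel
# when each selection is effectively a fair coin flip (n = 80, p = .5).
from scipy.stats import binom

p_four_or_fewer = binom.cdf(4, n=80, p=0.5)
print(p_four_or_fewer)   # roughly 1.4e-18, matching the figure above
```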
Testing Hypotheses Cont.

We use a variety of statistical procedures to test null hypotheses. The choice of procedure depends on a variety of factors, including:

the research hypothesis,
the data,
the sampling strategy,
and what we want to be able to say as a result of our testing.
19
Types of Tests
Statistical procedures that are commonly used for
hypothesis testing include: correlation, analysis of
variance (ANOVA), analysis of covariance (ANCOVA),
regression, multivariate analysis of variance
(MANOVA), t-tests, and chi-square. Each of these procedures has an associated test statistic, which is used to determine significance. For example, ANOVA, ANCOVA, and regression use F statistics and their associated p-values.
Multivariate procedures, like MANOVA, use a variety of test statistics with interesting names, like Wilks' lambda. These are then related to a more common test statistic, like F.
The secret here, for the layperson, is that all test statistics are eventually related to a probability distribution and a p-value. These p-values mean the same thing across test statistics.
20
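To illustrate that every procedure ends in a p-value that is read the same way, here is a sketch with made-up data running three of the procedures listed above (Python with NumPy and SciPy assumed):

```python
# Three procedures, three different test statistics (F, t, chi-square),
# but each reports a p-value interpreted identically.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1, g2, g3 = (rng.normal(m, 1, 25) for m in (0.0, 0.2, 0.5))   # invented groups

f_stat, p_anova = stats.f_oneway(g1, g2, g3)             # ANOVA -> F and p
t_stat, p_ttest = stats.ttest_ind(g1, g3)                # t-test -> t and p
table = np.array([[20, 30], [35, 15]])                   # invented counts
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)   # chi-square -> chi2 and p

print(p_anova, p_ttest, p_chi2)
```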
Error Types
In hypothesis testing, we must contend with two
types of errors -- Type I and Type II.
Errors are mistakes that we can make when judging
the null hypothesis

A Type I error is what happens when the tested hypothesis is falsely rejected. (It is when you say you found something, but that something is really an error.) A Type I error is a false positive.

A Type II error is what happens when a false tested hypothesis is not rejected (Hays, 1986). (It is when you don't find something that is, in fact, there.) A Type II error is a false negative.
21
Error Types Cont.
Alpha is the level of probability (pre-set by
the researcher) that the tested hypothesis
will be falsely rejected. Alpha is the pre-set risk
of a Type I error. In other words, alpha is the
degree of risk that you accept, in advance of
conducting the study, that what you find will
be an error.
Beta is the probability (often neglected by the
researcher) that a false null hypothesis will not
be rejected. Beta is the probability that you won't find what you are looking for if, in fact, it is really there.

22
Error Types Cont.
The picture on the next slide,
which is modified from Hays (1986)
and Ferguson and Takane (1989),
illustrates the relationships among
Type I and Type II errors and alpha
and beta.

23
Error Types Cont.
Error Types Chart
Decision                        H0 is True        H1 is True
Reject H0                       Type I Error      Correct (1 - β)
Fail to Reject H0               Correct (1 - α)   Type II Error
(decide in favor of H0)
24
Example
Do we use null hypotheses in the real world? Innocent until proven guilty.

                                          Defendant Innocent   Defendant Guilty
Reject Presumption of Innocence           Type I Error         Correct
(Guilty Verdict)
Fail to Reject Presumption of Innocence   Correct              Type II Error
(Not Guilty Verdict)

25
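A small simulation makes the Type I error concrete: when the null hypothesis is true by construction, roughly alpha of all tests will still reject it. A sketch, assuming Python with NumPy and SciPy:

```python
# Repeatedly test two groups drawn from the SAME population; every rejection
# is, by construction, a Type I error (false positive).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, n_experiments = 0.05, 10_000

false_positives = 0
for _ in range(n_experiments):
    a = rng.normal(0, 1, 30)   # H0 (no difference) is true by construction
    b = rng.normal(0, 1, 30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1

print(false_positives / n_experiments)   # close to 0.05
```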
26
Power, Effect Size, and
Measurement
STATISTICAL POWER

Statistical power is the probability of rejecting a null hypothesis that is, in fact, false (Williams, 1986, p. 67). Put more simply, statistical power is the probability of finding relationships or differences that in fact exist (Cohen, 1988).

In our legal example, power is the probability of being found guilty if you committed a crime. In terms of beta (the probability of a Type II error, i.e., the odds of getting off despite being guilty), statistical power = 1 - beta.

So, if you have a 20 percent chance of being found not guilty even though you committed the crime, the statistical power of the judicial system is:

1 - .20 = .80

27
Power
Statistical power is a function of the preset significance criterion [alpha], the reliability of sample results, and the effect size [the actual size of the difference or strength of the relationship]... (Cohen, 1988, p. 4).

Considering the complex interrelationships of the above criteria, one can say that the researcher can easily set alpha, but cannot easily set beta.
Alpha and beta are inversely, but not perfectly, related.
Lowering alpha increases beta and lowers power.
Increasing alpha decreases beta and increases power.
28
Power, Effect Size, and
Measurement Cont.
Statistical power is then related to:
Sample size
Effect size
Statistical design (including
number of groups, 1- vs. 2-tailed
tests)
Significance criteria
29
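A sketch of how these factors trade off, using the power routines in statsmodels for an independent-samples t-test (the sample sizes and effect sizes below are illustrative, not recommendations):

```python
# Power of a two-tailed independent-samples t-test at alpha = .05 for
# several per-group sample sizes and effect sizes.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in (20, 50, 100):
    for d in (0.2, 0.5, 0.8):   # small, medium, large effects (Cohen, 1988)
        power = analysis.power(effect_size=d, nobs1=n, alpha=0.05,
                               alternative='two-sided')
        print(f"n per group = {n:3d}, d = {d}: power = {power:.2f}")
```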
Power, Effect Size, and
Measurement Cont.
EFFECT SIZE
Effect size (ES) refers to the
amount of common variance
between the independent
variable(s) (IV) and the dependent
variable(s) (DV), or the degree to
which changes in the IV(s) result in
changes in the DV(s).
30
Power, Effect Size, and
Measurement Cont.
Effect Size Cont.

For example, if I am interested in the differences between male and female UF students in the number of parking tickets over two years, my effect size would be the size of the difference between the means of the two groups.

Or, if I wanted to test a specific intervention for students with learning disabilities, and I had a test which I believed measured the effectiveness of my intervention, then my effect size might be the difference in test scores between an experimental group that received the intervention and a control group that did not.
31
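One common way to put a number on such a two-group difference is Cohen's d, the standardized mean difference. A minimal sketch with invented data, assuming Python with NumPy:

```python
# Cohen's d: difference between group means divided by the pooled
# standard deviation.
import numpy as np

def cohens_d(group_a, group_b):
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * np.var(group_a, ddof=1) +
                  (nb - 1) * np.var(group_b, ddof=1)) / (na + nb - 2)
    return (np.mean(group_a) - np.mean(group_b)) / np.sqrt(pooled_var)

rng = np.random.default_rng(3)
treatment = rng.normal(75, 10, 40)   # hypothetical intervention group scores
control = rng.normal(70, 10, 40)     # hypothetical control group scores
print(cohens_d(treatment, control))
```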
Power, Effect Size, and
Measurement Cont.
Effect Size Cont.
Similarly, if I wanted to examine the
impact of a specific course on research
anxiety, effect size could be the
differences in the mean scores of
research anxiety between an
experimental group who completed
the course and a control group who did
not.
32
Power, Effect Size, and
Measurement Cont.
Effect Size Cont.
Here is a large problem: Effect size depends on what
measure we use to operationalize the construct.
For example, effect size depends on the net we use,
the test we select, etc. Actual effect sizes may be
much larger than observed effect sizes. What might
be considered a moderate to large effect in a
laboratory situation may appear as a small effect in
the real world where you can't control numerous
sources of extraneous variance, e.g., variability in
individual characteristics, treatment implementation,
environmental characteristics (Cohen, 1988).
Small effect sizes are common and should be expected in ex post facto and quasi-experimental situations (Cohen, 1988).

33
Power, Effect Size, and
Measurement Cont.
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER

This is just a conceptual introduction. We will return to validity of measurement in a future lecture.

All research depends on an operational definition of the constructs of interest. In intervention research, the operational definitions of both the treatments and the outcomes influence effect size. As we are all aware, there are a variety of frames of reference regarding interventions and outcomes.

34
Power, Effect Size, and
Measurement Cont.
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.

Consider the elephant fable, with the researchers who mapped different parts of the elephant. Their descriptions of the elephant differed considerably.

What we see in research depends, at least in part, on what facet(s) of the construct of interest is (are) operationalized by our outcome measure(s). It is always better to look at the construct in more than one way (more than one facet) in order to limit threats to validity from mono-operational bias.

In other words, looking at the elephant from different angles can improve the degree to which our descriptions of the elephant actually describe the elephant.
35
Power, Effect Size, and
Measurement Cont.
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.

Now, consider measuring the same elephant with portable X-ray machines. Pictures of each part of the elephant are taken and then compared with each other. Not only do these pictures not resemble each other, but they also don't resemble the descriptions provided by the previous group of researchers. This chapter of the elephant fable shows how what we see is shaped by our method of observation or measurement.

Again, a researcher interested in a deeper understanding of the elephant may choose multiple methods of measurement in order to avoid threats to validity from mono-method bias.

36
Power, Effect Size, and
Measurement Cont.
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.

The relationship of measurement, research design, and statistical power means that large treatment effects can actually be observed as small effects. In other words, even if an intervention is very effective, measurement and design complications may make the effect appear small and thus require high statistical power for detection.

The following telescope model depicts the interrelation. The effect is obscured when we only look at part of the construct of interest. The apparent effect size is then attenuated by the extent to which our operational definitions (including our measurement techniques) do not reliably and validly capture the construct of interest (i.e., intervention effectiveness).

37
Power, Effect Size, and
Measurement Cont.
Telescope Model (diagram): the actual effect size is filtered through measurement, research design, and statistical design before it appears as the observed effect size.
38
Power, Effect Size, and
Measurement Cont.
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.

Apparent effect size is further attenuated when research design does not fully filter out extraneous sources of variation (e.g., counselor or client differences). Violations of assumptions of statistical procedures can further attenuate effect size. Interestingly, problems in research design and statistical design can also introduce sources of Type I error (e.g., dust on the lens, or false positive results).

The relationship of effect size, measurement, and design is further complicated by the frame of reference or angle from which one approaches or operationalizes the construct, e.g., the researcher who only looks at the leg of the elephant.

39
Power, Effect Size, and
Measurement Cont.
RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont.

Validity is a key element of the relationship of effect size, measurement, and design. Clearly, qualitative methods can further valid operationalization of constructs. Multiple operational definitions and multiple methods, as recommended by Cook and Campbell (1979), can enhance the validity of research, including counseling effectiveness research.

40
Power, Effect Size, and
Measurement Cont.
PRE-ANALYSIS STATISTICAL POWER

Pre-analysis statistical power estimation is a recommended technique. The following steps will allow you to consider statistical power in research planning.

1. Estimate effect size from past research and the type of experimental design planned. When you are unsure, underestimate effect size so as not to overestimate power. Also, in quasi-experimental or ex post facto circumstances, it is usually best to estimate a small effect size unless otherwise indicated.

2. Decide on the exact statistical test and significance criterion.

3. Determine an acceptable level of power; .80 is nice, but .70 may be acceptable in some circumstances.
41
Power, Effect Size, and
Measurement Cont.
PRE-ANALYSIS STATISTICAL POWER Cont.

4. Use power tables for that statistical test, or an appropriate computer program, to determine the number of subjects required for the specified significance criterion and desired level of power.

5. If you have a fixed number of subjects, consider adjusting the significance criterion (alpha) or the statistical design if necessary to obtain adequate power.

Recall, the .05 significance criterion is not sacred, especially when it results in a power of less than .30 (i.e., less than a 30% chance of finding differences that actually exist). (Szymanski & Parker, 1992)

42
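A software alternative to printed power tables is a power routine such as the one in statsmodels; the sketch below walks through steps 1-4 with illustrative values (the effect size, alpha, and power target here are assumptions, not prescriptions):

```python
# Solve for the required per-group sample size of a two-tailed
# independent-samples t-test.
from statsmodels.stats.power import TTestIndPower

effect_size = 0.3    # step 1: deliberately modest estimate from past research
alpha = 0.05         # step 2: chosen test and significance criterion
target_power = 0.80  # step 3: acceptable level of power

n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=alpha,
                                          power=target_power,
                                          alternative='two-sided')
print(round(n_per_group))   # subjects needed in each group
```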
Power, Effect Size, and
Measurement Cont.
ALPHA INFLATION
Multiple comparisons can increase alpha, the probability of a Type I error.

The probability of a Type I error escalates with the number of comparisons made in the study. The experiment-wise alpha is computed as: 1 - (1 - alpha)^n, where n is the number of comparisons.
43
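A quick worked example of the formula above, in plain Python (the per-test alpha and comparison counts are illustrative):

```python
# Experiment-wise alpha = 1 - (1 - alpha)^n for a per-test alpha of .05.
alpha = 0.05
for n in (1, 3, 5, 10, 20):
    experimentwise_alpha = 1 - (1 - alpha) ** n
    print(f"{n:2d} comparisons -> experiment-wise alpha = {experimentwise_alpha:.3f}")
```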
Power, Effect Size, and
Measurement Cont.
ALPHA INFLATION Cont.

One way to guard against alpha inflation is to use a Bonferroni-type procedure and split alpha by the number of comparisons. There are a variety of such procedures that can be used (see, e.g., Marascuilo & Serlin, 1988), according to the relative importance of the tested hypotheses.

The problem with reducing alpha is that it inflates beta. In situations in which alpha inflation is accepted due to a problem with power, one must look to replications for confidence in the findings.
44
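A sketch of the Bonferroni-type split described above, again with illustrative numbers: dividing alpha by the number of comparisons pulls the experiment-wise error rate back near the nominal level, at the cost of a smaller per-test alpha (and therefore a larger beta).

```python
# Bonferroni-style split: per-comparison alpha = overall alpha / number of comparisons.
alpha, n_comparisons = 0.05, 10
per_test_alpha = alpha / n_comparisons
experimentwise = 1 - (1 - per_test_alpha) ** n_comparisons
print(f"per-test alpha = {per_test_alpha:.4f}, "
      f"experiment-wise alpha = {experimentwise:.4f}")   # back near .05
```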
Power, Effect Size, and
Measurement Cont.
See Power Tables

45
Test Statistics,
Probability, and
Significance
In order to test a hypothesis, we
compare the obtained value of a test
statistic (e.g., the obtained F) to a
critical value of the test statistic (e.g., a
critical F) that is associated with the
preset significance level (alpha).
If the obtained value of the test statistic
is greater than the critical value, we
determine that there is a significant
difference or relationship.

46
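A sketch of that comparison for an F test, using SciPy to look up the critical value at the preset alpha (the degrees of freedom and obtained F below are invented for illustration):

```python
# Compare an obtained F with the critical F at alpha = .05.
from scipy.stats import f

alpha = 0.05
df_between, df_within = 2, 57                    # hypothetical ANOVA df
f_critical = f.ppf(1 - alpha, df_between, df_within)

f_obtained = 4.21                                # made-up obtained value
p_obtained = f.sf(f_obtained, df_between, df_within)

print(f"critical F = {f_critical:.2f}, obtained F = {f_obtained}, p = {p_obtained:.4f}")
print("significant" if f_obtained > f_critical else "not significant")
```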
Test Statistics,
Probability, and
Significance Cont.
Test Statistic: The specific statistic (i.e., the tool) that is chosen to test the null hypothesis. Examples include F, t, and r.
Obtained Value: The actual value obtained when applying the test statistic to the data of interest. The probability value associated with the obtained value is p.
Critical Value: The critical value of the test statistic that is associated with the chosen significance level (alpha). If the obtained value is greater than the critical value, the result is significant.
47
Test Statistics,
Probability, and
Significance Cont.
Probability Value: The probability that observed relationships or differences are due to chance.
Alpha: Alpha is also known as the significance level or rejection region. It is the level of probability set by the researcher as grounds for rejection of the null hypothesis (Williams, 1986, p. 58). Alpha is the probability level associated with the critical value of the test statistic.
In other words, alpha is our predetermined risk that differences we declare to be real are actually due to chance.
Obtained Probability: This is also known as the obtained probability (p), the significance of the test statistic. It is the probability that the data could have arisen if H0 were true (Cohen, 1994, p. 998).

48
Test Statistics,
Probability, and
Significance
Cont.
Significance: What happens when the obtained probability (p) is less than our predetermined alpha. Significance also occurs when the obtained value of the test statistic is greater than the critical value of the test statistic.

Test Statistic    <->  Probability Value
Critical Value    <->  Significance Level (alpha)
Obtained Value    <->  Obtained or Actual Probability (p)

Note that larger obtained values of test statistics are generally associated with smaller values of p.
If Obtained Value > Critical Value, then *Significance*
If p < Alpha, then *Significance*

49
Revisiting the Bell Curve

50
Test Statistics,
Probability, and
Significance Cont.
Whether you are looking at obtained values of test statistics in relation to critical values, or you are looking at actual probability levels, it is important to note that the test statistics and their associated probabilities only tell us the probability that a difference or relationship occurred by chance.

These statistics DO NOT tell us the size of the group differences or the strength of relationships.
51
Steps in Hypothesis Testing
for Quantitative Research
Designs
Hypothesis testing is a 4-phase procedure:
Phase I: Research Hypotheses,
Design, and Variables
Phase II: Statistical Hypotheses

Phase III: Hypotheses Testing

Phase IV: Decision/Interpretation

52
Phase I: Research
Hypotheses, Design, and
Variables
1. State your research hypotheses.
2. Decide on a research design based on your
research problem, your hypotheses, and
what you really want to be able to say about
your results (e.g., if you want to say that A
caused B, you will need an experimental or
time-series design; if probable cause is
sufficient, a quasi-experimental design would
be appropriate).
3. Operationally define your variables. Recall
that one variable can have more than one
operational definition.

53
Phase II: Statistical
Hypotheses
1. Consider your chosen statistical
procedures.
2. Write one statistical null hypothesis for each operational definition of each variable, reflecting the statistical operations to be performed.

54
Phase III: Hypotheses
Testing
Complete the following steps for each statistical null hypothesis:

1. Select a significance level (alpha).

2. Compute the value of the test statistic (e.g., F, r, t).

3. Compare the obtained value of the test statistic with the critical value associated with the selected significance level, or compare the obtained p-value with the pre-selected alpha value.

4. If the obtained value of the test statistic is greater than the critical value (or if the obtained p-value is less than the pre-selected alpha value), reject the null hypothesis. If the obtained value is less than the critical value, fail to reject the null hypothesis.

Another way of looking at it: if p is less than or equal to alpha, reject the null hypothesis.
55
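Putting the four steps together, a minimal end-to-end sketch for one statistical null hypothesis, assuming a two-sample t-test in Python with SciPy and invented data:

```python
import numpy as np
from scipy import stats

alpha = 0.05                                    # step 1: significance level

rng = np.random.default_rng(4)
group_1 = rng.normal(52, 10, 35)                # hypothetical scores
group_2 = rng.normal(47, 10, 35)

t_obtained, p_obtained = stats.ttest_ind(group_1, group_2)   # step 2

if p_obtained <= alpha:                         # steps 3-4
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"

print(f"t = {t_obtained:.2f}, p = {p_obtained:.4f} -> {decision}")
```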
Phase IV:
Decision/Interpretation
1. For each research hypothesis, consider the
decisions regarding the statistical null hypotheses.
2. For each research hypothesis, consider qualitative contextual information relating to potential plausibility.
3. Cautiously explain your findings with respect to the
research hypotheses.
4. List and discuss the limitations (threats to valid
inference).
Note: Null hypothesis testing is currently under scrutiny
(see e.g., Cohen, 1994; Kirk, 1996).

It is generally recommended that you report the effect size along with the value of the test statistic and the p-value. An alternative is to report confidence intervals.

56
Points to Consider about
Hypotheses Testing
FISHING IN LAKE ALICE
We don't prove the null hypothesis.
If you go fishing on Lake Alice and you don't catch fish, you cannot conclude that there are no fish in the lake!!!

57
Points to Consider about
Hypotheses Testing Cont.
What circumstances might keep us
from finding fish in the lake? Possible
problems include:
1. wrong or insensitive outcome measures
(using the large net for small fish),
2. sampling problems (looking in the wrong
part of the lake), or
3. methodological problems (scooping the
top of the lake rather than the bottom,
where the fish hang out).

58
Points to Consider about
Hypotheses Testing Cont.
Returning to hypothesis testing:
Failure to reject the null hypothesis cannot
be interpreted as proof that no differences
or relationships exist. Existing differences
or relationships might be obscured by:
1. insensitive outcome measures (the wrong
fishnet),
2. inappropriate statistical designs,
3. poor sampling strategies, and
4. low statistical power.

59
Questions about
Hypotheses

60
