Академический Документы
Профессиональный Документы
Культура Документы
Deale et. al. Cognitive behavior therapy for chronic fatigue syndrome: A randomized controlled trial. The American
1
Study design
2
Results
Good outcome
Yes No Total
Treatment 19 8 27
Group
Control 5 21 26
Total 24 29 53
3
Results
Good outcome
Yes No Total
Treatment 19 8 27
Group
Control 5 21 26
Total 24 29 53
3
Results
Good outcome
Yes No Total
Treatment 19 8 27
Group
Control 5 21 26
Total 24 29 53
4
Understanding the results
• Suppose you flip a coin 100 times. While the chance a coin
lands heads in any given coin flip is 50%, we probably won’t
observe exactly 50 heads. This type of fluctuation is part of
almost any type of data generating process.
• The observed difference between the two groups (70 - 19 =
51%) may be real, or may be due to natural variation.
• Since the difference is quite large, it is more believable that
the difference is real.
• We need statistical tools to determine if the difference is so
large that we should reject the notion that it was due to
chance.
4
Generalizing the results
Are the results of this study generalizable to all patients with chronic
fatigue syndrome?
5
Generalizing the results
Are the results of this study generalizable to all patients with chronic
fatigue syndrome?
5
Data basics
Classroom survey
6
Data matrix
variable
↓
Stu. gender intro extra ··· dread
1 male extravert ··· 3
2 female extravert ··· 2
3 female introvert ··· 4 ←
4 female extravert ··· 2 observation
.. .. .. .. ..
. . . . .
86 male extravert ··· 3
7
Types of variables
all variables
numerical categorical
regular
continuous discrete categorical ordinal
8
Types of variables (cont.)
• gender:
9
Types of variables (cont.)
• gender: categorical
9
Types of variables (cont.)
• gender: categorical
• sleep:
9
Types of variables (cont.)
• gender: categorical
• sleep: numerical, continuous
9
Types of variables (cont.)
• gender: categorical
• sleep: numerical, continuous
• bedtime:
9
Types of variables (cont.)
• gender: categorical
• sleep: numerical, continuous
• bedtime: categorical, ordinal
9
Types of variables (cont.)
• gender: categorical
• sleep: numerical, continuous
• bedtime: categorical, ordinal
• countries:
9
Types of variables (cont.)
• gender: categorical
• sleep: numerical, continuous
• bedtime: categorical, ordinal
• countries: numerical, discrete
9
Types of variables (cont.)
• gender: categorical
• sleep: numerical, continuous
• bedtime: categorical, ordinal
• countries: numerical, discrete
• dread:
9
Types of variables (cont.)
• gender: categorical
• sleep: numerical, continuous
• bedtime: categorical, ordinal
• countries: numerical, discrete
• dread: categorical, ordinal - could also be used as numerical
9
Practice
10
Practice
10
Relationships among variables
4.0
GPA
3.5
3.0
0 10 20 30 40 50 60 70
11
Relationships among variables
4.0
GPA
3.5
3.0
0 10 20 30 40 50 60 70
Can you spot anything unusual about any of the data points?
11
Relationships among variables
4.0
GPA
3.5
3.0
0 10 20 30 40 50 60 70
Can you spot anything unusual about any of the data points?
There is one student with GPA > 4.0, this is likely a data error.
11
Explanatory and response variables
12
Two primary types of data collection
13
Two primary types of data collection
13
Association vs. causation
14
http:// xkcd.com/ 552/
Practice
●
● ● ●
●
●
●
●
● ● ●
● ● ● ● ● ● ●
● ●
55 ● ●
●● ●
●
●
●
● ●
●
● ●
●
● ●
●
●
●
● ●
●
●
50 ●
85 90 95 100
15
Practice
●
● ● ●
●
●
●
●
● ● ●
● ● ● ● ● ● ●
● ●
55 ● ●
●● ●
●
●
●
● ●
●
● ●
●
● ●
●
●
●
● ●
●
●
50 ●
85 90 95 100
15
Sampling principles and strategies
Populations and samples
finding-your-ideal-running-form
16
Populations and samples
finding-your-ideal-running-form
16
Populations and samples
finding-your-ideal-running-form
16
Populations and samples
finding-your-ideal-running-form
16
Populations and samples
finding-your-ideal-running-form
finding-your-ideal-running-form
18
Census
18
http:// www.npr.org/ templates/ story/ story.php?storyId=125380052
19
Exploratory analysis to inference
• Sampling is natural.
20
Exploratory analysis to inference
• Sampling is natural.
• Think about sampling something you are cooking - you taste
(examine) a small part of what you’re cooking to get an idea
about the dish as a whole.
20
Exploratory analysis to inference
• Sampling is natural.
• Think about sampling something you are cooking - you taste
(examine) a small part of what you’re cooking to get an idea
about the dish as a whole.
• When you taste a spoonful of soup and decide the spoonful
you tasted isn’t salty enough, that’s exploratory analysis.
20
Exploratory analysis to inference
• Sampling is natural.
• Think about sampling something you are cooking - you taste
(examine) a small part of what you’re cooking to get an idea
about the dish as a whole.
• When you taste a spoonful of soup and decide the spoonful
you tasted isn’t salty enough, that’s exploratory analysis.
• If you generalize and conclude that your entire soup needs
salt, that’s an inference.
20
Exploratory analysis to inference
• Sampling is natural.
• Think about sampling something you are cooking - you taste
(examine) a small part of what you’re cooking to get an idea
about the dish as a whole.
• When you taste a spoonful of soup and decide the spoonful
you tasted isn’t salty enough, that’s exploratory analysis.
• If you generalize and conclude that your entire soup needs
salt, that’s an inference.
• For your inference to be valid, the spoonful you tasted (the
sample) needs to be representative of the entire pot (the
population).
• If your spoonful comes only from the surface and the salt is
collected at the bottom of the pot, what you tasted is probably
not representative of the whole pot.
20
• If you first stir the soup thoroughly before you taste, your
Sampling bias
21
Sampling bias
21
Sampling bias
21
Sampling bias
21
Sampling bias
In 1936, Landon
sought the
Republican
presidential
nomination
opposing the
re-election of FDR.
22
The Literary Digest Poll
23
The Literary Digest Poll – what went wrong?
24
Large samples are preferable, but...
25
Practice
(a) Only I (b) I and II (c) I and III (d) III and (e) Only IV
IV
26
Practice
(a) Only I (b) I and II (c) I and III (d) III and (e) Only IV
IV
26
Observational studies
27
Obtaining good samples
28
Simple random sample
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ● ●
●
●
●
●
●
●
●
● ●
● ●
● ●
● ● ●
● ● ●
●
● ●
●
●
●●
●
● ●
●
●
●
●
● ●
●
● ● ●
●
● ●
● ●
● ● ●
● ●
●
● ● ●
●
● ●
●
● ●
●
●
●
● ● ●
● ●
●
●
●
●
● ●
● ●
●
● ● ●
● ●
●
●
●
● ●
●
● ●
●
● ●
●
●
●
●
●
● ●
● ●
●
● ●
●
● ●
●
●
● ●
●
●
● ●
●
●
●
●
●
●
●
● ●
Stratified sample ●
●
●
● ●
●
●
●
●
● ●
●
● ● ●
●
● ●
●
● ●
●
●
●
● ● ●
● ●
●
●
●
● ● ●
●
● ●
●
●
●
● ●
● ●
●
●
● ●
random sample
● from each stratum.
●
●
●
● ●
●
●
●
●
● ●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
● ●
● ●
● ●
● ● ●
● ●●
● ●
● ●
●
● ● ● ●
● ● ● ●
● ● ●
● ●
Stratum 1 ●
●
● ●●
● ●
●
● ●
●
● ●
●
● ● ●
● ●
●
●
●
●
●
●
●
●
● ● ●
●
●
●
●● ●
●
●
●
●
● ●
● ●
● ●
● ●
● ●
●
●
● ●
● ●
●● ● ●
●
●
● ●
● ●
● ●
Stratum 5
30
Cluster sample
Cluster 9
Cluster 2 Cluster 5
●
●
Cluster 7 ●
●
● ●
● ●
● ●
● ●
● ● ● ● ● ●
● ●
● ● ●
● ●●
●●
●
●
Cluster 3 ●
●
● ●
●
● ● ●
● ● ● ● ●
●
● ●● ● ●
●
●● ●● ●
●
●
●
●
●
● ● ● ●
● ● ●●
● ● ●
Cluster 8
● ●
●
●
●
●
●
●
Cluster 4
●
●
● ●
●
●
●
● ● ●
●
● ●
●
● ●
●
●●
●
●
●
● ●
●
●
●● ● ●
●
● ●
●●●
●
●
●
●
● ●
●
●
●● ●
●
● ●
● ● ●●
●
●
● ●
●
●
● ●
●
●
●
● ●
●
●● ●
●
●
●
● ●
● ●
●
●
●
● ●
●
● ●
●
●● ●
●
●
●
●
●
●
●
●
●●
●
● ●
●
●
● ●
● ●
● ●
●
●
●
●
Cluster 6 ●
● ●
● ●
●●
●
●●
●
●
Cluster 1
31
Cluster 9
Cluster 5
● ● ● ● ● ●
● ● ●
● ●●
●
●●
●
●
Cluster 3 ●
●
● ●
●
● ● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●●
●
● ● ●
Cluster 8
● ●
●
●
●
●
●
●
Cluster 4
●
●
● ●
●
● ●
● ● ● ●
● ●
● ●
●
●
●●
●
●
●
● ●
●
●
●● ● ●
●
●
●●
●
Clusters are usually not made up●of homogeneous observations.
● ● ●●
●
● ●
●
●
●
●●
● ●
●●
● ●
● ● ● ● ● ●
●
●
●
●
●
●
●● ●
●●
●
● ●
●
●
●
● ●
● ●
● ●
● ●
●
●● ●
●
●
● ●
●
●
●
●
Cluster 6 ●
● ●
● ● ●
●● ●
●
●
simple random
Cluster 1 sample of observations from the sampled clusters.
Cluster 9
Cluster 2 Cluster 5
Index ●
●● ● Cluster 7 ● ● ●
●
● ● ● ● ●
●
● ● ●
● ● ●
●
● ● ●
● ●
● ● ● ● ● ●
● Cluster 3 ●
●
●
●
●●
●
● ● ● ●
● ● ●
● ●
● ●
● ● ●
●
● ●
● ● ●
●
●
●
●
●
●
Cluster 8
● ●
● ●
● ●
●● Cluster 4 ●●
●
●
●
●
● ●
●
●
●
●
● ● ● ●
●
●
●
● ●
● ●
● ●
● ● ● ●
● ● ● ●
●●
●
● ●
●
●●
●
●
●
●
●
●●
●
●
● ●●
● ●
●
●● ●
●
● ● ●
●
●
●
●
●●
●
● ●
● ●
●
● Cluster 6 ●
● ●
● ●
● ●
Cluster 1
32
Practice
33
Practice
33
Experiments
Principles of experimental design
34
More on blocking
35
More on blocking
35
More on blocking
35
More on blocking
A study is designed to test the effect of light level and noise level on
exam performance of students. The researcher also believes that
light and noise levels might have different effects on males and fe-
males, so wants to make sure both genders are equally represented
in each group. Which of the below is correct?
A study is designed to test the effect of light level and noise level on
exam performance of students. The researcher also believes that
light and noise levels might have different effects on males and fe-
males, so wants to make sure both genders are equally represented
in each group. Which of the below is correct?
37
More experimental design terminology...
38
Practice
39
Practice
39
Random assignment vs. random sampling
most
ideal Random No random observational
experiment assignment assignment studies
Causal conclusion, No causal conclusion,
Random correlation statement
generalized to the whole Generalizability
sampling population. generalized to the whole
population.
most bad
Causation Correlation observational
experiments
studies
40