October 2, 2017
Identical Twins Case Study
Outline:
1. What’s the population of interest?
October 9, 2017
Assumptions
Since the t-tools are based on computing sample means, they are
not resistant to outliers.
1. reducing skewness
The idea: if our samples don’t meet the t-tool assumptions on the
original scale of measurement, maybe they will on a transformed
scale.
The Log Transformation
I Square root transformation, $\sqrt{Y}$
  I only for positive data
  I good for counts; measurements of area
I Reciprocal transformation, $1/Y$
  I good for waiting times (e.g., to recurrence, to arrival)
I Arcsin square root, $\arcsin\sqrt{Y}$, and logit, $\log\frac{Y}{1-Y}$, transformations
  I good for proportion data (between 0 and 1)
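The transformations listed above can be sketched in a few lines of Python. The function names and the sample counts are illustrative assumptions, not part of the original slides.

```python
import math

def sqrt_transform(y):
    """Square root transformation (y must not be negative)."""
    return math.sqrt(y)

def reciprocal_transform(y):
    """Reciprocal transformation, 1/y (y must be nonzero)."""
    return 1.0 / y

def arcsin_sqrt_transform(p):
    """Arcsin square root transformation for a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))

def logit_transform(p):
    """Logit transformation, log(p / (1 - p)), for a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

# Hypothetical counts, used only to show the call pattern.
counts = [4, 9, 16]
print([sqrt_transform(c) for c in counts])
print(logit_transform(0.5))
```

Note that the logit is undefined at exactly 0 or 1, so proportions on the boundary would need separate handling.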
Multiplicative Treatment Effect
$\log(Y^*) = \log(Y) + \gamma$
Multiplicative Treatment Effect
$Y^* = e^{\log(Y) + \gamma}$
$= e^{\log(Y)} e^{\gamma}$
$= Y e^{\gamma}$
$Y^* = Y e^{\gamma}$.
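A quick numerical check of this identity, as a minimal Python sketch. The function name `treated_response` and the values of y and gamma are hypothetical, chosen only for illustration.

```python
import math

def treated_response(y, gamma):
    """Add gamma on the log scale, then transform back to the original scale."""
    return math.exp(math.log(y) + gamma)

gamma = 0.5                   # hypothetical additive effect on the log scale
y_values = [1.0, 2.5, 10.0]   # hypothetical positive responses

for y in y_values:
    # An additive shift of gamma on the log scale multiplies y by e^gamma.
    assert math.isclose(treated_response(y, gamma), y * math.exp(gamma))

print(math.exp(gamma))  # the multiplicative effect on the original scale
```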
Multiplicative Treatment Effect
(Display 3.9)
Some Caution
I I’ve written $Y^* = Y + \delta$
October 11, 2017
Log Transformations
(2) If the two samples are paired then you should use the paired
t-test.
I You should still examine a plot of the two samples, with lines
connecting the pairs, to make sure that any difference in the
pairs can be adequately explained by an additive effect.
I Starting on Friday we’ll look at some alternatives to the paired
t-test.
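For reference, the paired t-statistic is the mean of the within-pair differences divided by its standard error. The sketch below is a minimal Python illustration; the function name and the example numbers are hypothetical and do not come from the twins data.

```python
import math
import statistics

def paired_t_statistic(group1, group2):
    """Return (t, df) for paired samples, based on within-pair differences."""
    if len(group1) != len(group2):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(group1, group2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return mean_diff / std_error, n - 1

# Hypothetical paired measurements, for illustration only.
t_stat, df = paired_t_statistic([2, 4, 6], [1, 2, 3])
print(t_stat, df)
```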
Assumptions
(3) If the two underlying populations from which the samples are
obtained are Normally shaped, then the t-distribution is the
exact distribution for evaluating differences in population
means, regardless of sample sizes, provided the two population
variances are equal.
I It’s based on the ranks (i.e., the order) of the data rather than
the data themselves, making it useful when there are censored
observations.
H0 : δ = 0
It turns out that this approximation works well except when the
sample sizes are small (e.g., 5 or so), or there are a lot of ties (i.e.,
many of the ranks are the same).
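Assuming the approximation in question is the usual Normal approximation to the rank-sum statistic, the calculation can be sketched in Python as follows. Tied values receive the average of their ranks, no continuity correction is applied, and the function names and example data are hypothetical.

```python
import math
import statistics

def midranks(values):
    """Rank all values, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks

def rank_sum_z(group1, group2):
    """Return (T, mean of T, SD of T, z), where T is the rank sum of group1."""
    n1, n2 = len(group1), len(group2)
    ranks = midranks(list(group1) + list(group2))
    t_obs = sum(ranks[:n1])
    mean_t = n1 * statistics.mean(ranks)
    sd_t = statistics.stdev(ranks) * math.sqrt(n1 * n2 / (n1 + n2))
    return t_obs, mean_t, sd_t, (t_obs - mean_t) / sd_t

# Hypothetical data, for illustration only.
print(rank_sum_z([1, 2], [3, 4, 5]))
```

The z value would then be compared with a standard Normal distribution to obtain an approximate p-value.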
Rank-Sum in R
The two approaches give virtually identical results when there are
not a lot of ties, and when the sample sizes are relatively large.
The Challenger Data
Launch temperature    Number of O-ring incidents
Below 65°F            1 1 1 3
Above 65°F            0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2
The Challenger Data
I The sample sizes are small (n1 = 4, n2 = 20) and not the
same.
1. Denote by Tobs the value of the test statistic for the observed
data.
n! = n × (n − 1) × (n − 2) × · · · × 1.
The Number of Permutations
First, for the Challenger data, let’s write down the specific
permutations that result in a sum of 6 or more in the cold
temperature group:
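One way to check the count by brute force is to enumerate every way of assigning 4 of the 24 launches to the cold group. The Python sketch below uses the incident counts from the Challenger table and takes the sum of incidents in the cold group as the test statistic; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

cold = [1, 1, 1, 3]              # launches below 65°F
warm = [0] * 17 + [1, 1, 2]      # launches above 65°F
pooled = cold + warm

n1 = len(cold)
t_obs = sum(cold)                # observed sum in the cold group

# Number of ways to choose which 4 of the 24 launches are labelled "cold".
total_groupings = comb(len(pooled), n1)

# Groupings whose cold-group sum is at least as large as the observed sum.
extreme_groupings = sum(
    1 for group in combinations(pooled, n1) if sum(group) >= t_obs
)

p_value = extreme_groupings / total_groupings
print(total_groupings, extreme_groupings, round(p_value, 5))
# prints: 10626 105 0.00988
```

Enumeration is feasible here only because the number of groupings is small; for larger samples one would typically sample random permutations instead.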
This is a test that is designed for paired data (e.g., the identical
twins data).
I The idea:
I count the number of pairs in which the observation in group 1
exceeds that in group 2
I under the null hypothesis of no difference, this count should be
roughly one half of the pairs
In R, you can perform the sign test using the binom.test function.
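The calculation behind the sign test is an exact binomial tail probability with success probability one half. The following is a minimal Python sketch of the one-sided version; the function name and the example counts are hypothetical, and R's binom.test reports a two-sided p-value unless told otherwise.

```python
from math import comb

def sign_test_p_value(k, n):
    """One-sided p-value P(X >= k) for X ~ Binomial(n, 1/2).

    k is the number of pairs in which group 1 exceeds group 2,
    n is the number of pairs (pairs with a zero difference are assumed dropped).
    """
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

# Hypothetical example: group 1 exceeds group 2 in 3 of 4 pairs.
print(sign_test_p_value(3, 4))  # prints: 0.3125
```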
The Wilcoxon Signed-Rank Test
Like the Sign Test, this is a distribution-free and resistant test for
paired samples.
$F = \frac{s_1^2}{s_2^2}$,
where $s_1^2$ and $s_2^2$ are the sample variances from samples 1 and 2,
respectively.
Provided that the two samples are drawn from Normal populations,
the sampling distribution of F is an F-distribution with $n_1 - 1$ and
$n_2 - 1$ df.
var.test in R.
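The statistic itself is simple to compute by hand. The Python sketch below returns the variance ratio together with its degrees of freedom, using n − 1 for each sample as is usual for a sample variance. The function name and the data are hypothetical, and the p-value (which requires the F-distribution) is left to a statistics package.

```python
import statistics

def variance_ratio(sample1, sample2):
    """Return (F, df1, df2), where F is the ratio of the two sample variances."""
    f_stat = statistics.variance(sample1) / statistics.variance(sample2)
    return f_stat, len(sample1) - 1, len(sample2) - 1

# Hypothetical samples, for illustration only.
f_stat, df1, df2 = variance_ratio([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0])
print(f_stat, df1, df2)
```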