
AMS572.01 Final Exam, Fall 2013

Name ___________________________________ID ______________________Signature________________________


Instruction: This is a closed-book exam. Anyone who cheats on the exam shall receive a grade of F. Please provide
complete solutions for full credit. The exam runs from 11:15 am to 1:45 pm. (*Extended time at the DSS as required.*)
Calculators are allowed. Please use the given statistical tables. Good luck!

1. Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or
more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales
are shown below.
Number of TV Ads 1 3 2 1 3
Number of Cars Sold 14 24 18 17 27
(a) Find the least squares regression line.
(b) Test at α = 0.05 whether there is a significant linear relationship between these two variables.
(c) What percentage of variation in numbers of cars sold is explained by the number of TV ads?
(d) Please write up the entire SAS code necessary to answer questions (a), (b), (c) above. In addition, please
write up a SAS program to compute the sample correlation coefficient between the two variables and to
test whether the corresponding population correlation is zero or not.
Solution: This is a simple linear regression problem.
(a) n = 5, x̄ = 2, ȳ = 20

Sxy = ∑xy − n x̄ ȳ = 220 − 5∗2∗20 = 20

Sxx = ∑x² − n x̄² = 24 − 5∗2² = 4

Syy = ∑y² − n ȳ² = 2114 − 5∗20² = 114

β̂1 = Sxy/Sxx = 20/4 = 5

β̂0 = ȳ − β̂1 x̄ = 20 − 5∗2 = 10

The fitted least squares regression line is:
ŷ = 10 + 5x
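For example, if Reed runs x = 3 TV ads during the weekend before a sale, the fitted line predicts ŷ = 10 + 5(3) = 25 cars sold.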
(*Dear students, many of you forgot to add the hat to Y; then I realized that we had such a typo in the
homework solutions, so you are forgiven and no points were deducted. But please do remember the hat in the future.*)

(b) The estimate of σ based on the mean square error (MSE) is:

σ̂ = √MSE = √[SSE/(n − 2)] = √[(SST − SSR)/(n − 2)] = √[(Syy − β̂1² Sxx)/(n − 2)] = √[(114 − 5²∗4)/(5 − 2)] = 2.16
The hypotheses are: 𝐻0 : 𝛽1 = 0 versus 𝐻𝑎 : 𝛽1 ≠ 0
Test statistic:

t0 = (β̂1 − 0)/SE(β̂1) = β̂1/(σ̂/√Sxx) = 5/(2.16/√4) = 4.63 > t_{3,0.025} = 3.182
Therefore we reject the null hypothesis at α = 0.05 and conclude that there is a significant linear
relationship between these two variables.
(c)
R² = Sxy²/(Sxx Syy) = 20²/(4∗114) = 0.877
Therefore we claim that 87.7% of the variation in the number of cars sold is explained by the number of TV ads.
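As an optional sanity check (not part of the required SAS answer in (d)), the hand calculations above can be reproduced in a short data step; the variable names below are only for illustration:

data _null_;
   Sxy = 220 - 5*2*20;                    /* 20  */
   Sxx = 24  - 5*2**2;                    /* 4   */
   Syy = 2114 - 5*20**2;                  /* 114 */
   b1  = Sxy/Sxx;                         /* slope = 5 */
   b0  = 20 - b1*2;                       /* intercept = 10 */
   s   = sqrt((Syy - b1**2*Sxx)/(5 - 2)); /* sigma-hat, about 2.16 */
   t0  = b1/(s/sqrt(Sxx));                /* about 4.63 */
   r2  = Sxy**2/(Sxx*Syy);                /* about 0.877 */
   put b1= b0= s= t0= r2=;
run;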
(d)
data carsell;
input x y;
datalines;
1 14
3 24
2 18
1 17
3 27
;
run;

proc reg data = carsell;
model y = x;
run;

proc corr data = carsell;
var x y;
run;
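Note: by default, proc corr should report the Pearson sample correlation coefficient together with the p-value for testing whether the corresponding population correlation is zero, so no extra options should be needed for the last part of (d).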

2. A firm wishes to compare four programs for training workers to perform a certain manual task. Twenty new
employees are randomly assigned to the training programs, with 5 in each program. At the end of the training
period, a test is conducted to see how quickly trainees can perform the task. The number of times the task is
performed per minute is recorded for each trainee, with the following results:
Observation Program 1 Program 2 Program 3 Program 4
1 9 10 12 9
2 12 6 14 8
3 14 9 11 11
4 11 9 13 7
5 13 10 11 8
(a) Using the hypothetical data provided above, test at α = 0.05 whether the four training programs are
equally effective. What assumptions are necessary for your test?
(b) Please write up the entire SAS code necessary to answer question (a) above.
(c) Please compare training programs 3 and 4 using the usual pooled-variance t-test at the significance level
α = 0.05.
(d) At α = 0.05, please compare training programs 3 and 4 using an optimal test – that is, the best test you
can find based on the given data – this test should be better than the pooled variance t-test in part (c).
(e) Please derive your optimal test in part (d) using the pivotal quantity method under the same assumptions
mentioned in part (a), and for the following general setting, at the significance level α.
Observation Program 1 Program 2 ⋯ Program k
1 X11 X21 ⋯ Xk1
2 X12 X22 ⋯ Xk2
⋮ ⋮ ⋮ ⋮ ⋮
n X1n X2n ⋯ Xkn
(f) (extra credit) Please derive your optimal test in part (d) using the likelihood ratio test method using the
same assumptions and general setting given in (e). Prove whether the tests in (e) and (f) are equivalent.

Solution: This is a one-way ANOVA problem with 4 independent samples.


(a) We need to perform an ANOVA F-test. The first assumption is that all four populations are normal. The
second is that all four population variances are unknown but equal.

H0 : μ1 = μ2 = μ3 = μ4
Ha : the above is not true

Analysis of Variance
Source               SS       d.f.   MS      F
Training Program     54.95    3      18.32   7.04
Error                41.6     16     2.6
Total                96.55    19

Since F0 = 7.04 > F3,16,0.05 = 3.24, we reject the null hypothesis, and claim that the four training programs
are not equally effective.
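The table entries can be double-checked with a short data step (an optional sketch; the group means and the within-group sums of squares 14.8, 10.8, 6.8, and 9.2 are computed from the data above):

data _null_;
   m1 = 11.8; m2 = 8.8; m3 = 12.2; m4 = 8.6;                       /* group means */
   gm = 10.35;                                                     /* grand mean  */
   SStr  = 5*((m1-gm)**2 + (m2-gm)**2 + (m3-gm)**2 + (m4-gm)**2);  /* 54.95 */
   SSE   = 14.8 + 10.8 + 6.8 + 9.2;                                /* 41.6  */
   F0    = (SStr/3)/(SSE/16);                                      /* about 7.04 */
   Fcrit = finv(0.95, 3, 16);                                      /* about 3.24 */
   put SStr= SSE= F0= Fcrit=;
run;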
(b)
data training;
input program speed;
datalines;
1 9
1 12
1 14
1 11
1 13
2 10
2 6
2 9
2 9
2 10
3 12
3 14
3 11
3 13
3 11
4 9
4 8
4 11
4 7
4 8
;
run;

proc anova data = training;
class program;
model speed = program;
run;

(c) By the ANOVA assumptions, we assume that both populations (training programs 3 and 4) are normal, and that the
population variances are unknown but equal (σ3² = σ4²).
Now we perform the pooled-variance t-test to test whether the two population means are equal.
H0: μ3 − μ4 = 0 versus Ha: μ3 − μ4 ≠ 0

Sp² = [(n3 − 1)S3² + (n4 − 1)S4²]/(n3 + n4 − 2) = [(5 − 1)∗1.7 + (5 − 1)∗2.3]/(5 + 5 − 2) = 2

Test statistic:

T0 = (X̄3 − X̄4 − 0)/(Sp√(1/n3 + 1/n4)) = (12.2 − 8.6)/√(2∗(1/5 + 1/5)) = 4.02

∴ |T0| = 4.02 > t_{8,0.025} = 2.306 (p-value = 0.00384, 2-sided)

∴ We reject the null hypothesis at α = 0.05 and conclude that there is evidence of a difference in mean
speed between these two training programs.
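For reference, here is a sketch of the same comparison in SAS, reusing the training data set created in part (b); the Pooled row of the PROC TTEST output should match the hand calculation above (t0 = 4.02 with 8 df):

proc ttest data = training;
   where program in (3, 4);  /* keep only programs 3 and 4 */
   class program;
   var speed;
run;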

(d) & (e) Derivation of the optimal test, a 2-sided t-test based on the variance pooled from all k samples, using the pivotal quantity approach
Suppose we have k independent random samples, each of size n, from k normal populations with unknown but
equal population variances: Xi1, Xi2, ⋯, Xin ~ N(μi, σ²), i = 1, ⋯, k. Here is a simple outline of the derivation of
the test H0: μi − μj = 0 versus Ha: μi − μj ≠ 0, where 1 ≤ i ≠ j ≤ k, using the pivotal quantity approach.

[1]. We start with the point estimator of the parameter of interest (μi − μj): X̄i − X̄j. Its distribution is
N(μi − μj, σ²(1/n + 1/n)), obtained using the mgf of N(μ, σ²), which is M(t) = exp(μt + σ²t²/2), and the
independence properties of the random samples. From this we have
Z = [(X̄i − X̄j) − (μi − μj)] / [σ√(2/n)] ~ N(0, 1).
Unfortunately, Z cannot serve as the pivotal quantity because σ is unknown.
[2]. We next look for a way to get rid of the unknown σ, following an approach similar to the construction of the
pooled-variance t-statistic. We find that
W = [(n − 1)S1² + (n − 1)S2² + ⋯ + (n − 1)Sk²]/σ² ~ χ²_{k(n−1)},
using the mgf of χ²_k, which is M(t) = [1/(1 − 2t)]^{k/2}, and the independence properties of the random samples.
[3]. Then we find, from the theorem on sampling from the normal population and the independence properties
of the random samples, that Z and W are independent, and therefore, by the definition of the t-distribution, we
have obtained our pivotal quantity:
T = Z / √(W/[k(n − 1)]) = [(X̄i − X̄j) − (μi − μj)] / √(2S²/n) ~ t_{k(n−1)},
where
S² = (S1² + S2² + ⋯ + Sk²)/k = MSE is the pooled sample variance from all k samples.
[4]. The rejection region is derived from P(|T0| ≥ c | H0) = α, where T0 = (X̄i − X̄j − 0)/√(2S²/n) ~ t_{k(n−1)} under H0. Thus
c = t_{k(n−1), α/2}. Therefore, at the significance level α, we reject H0 in favor of Ha iff |T0| ≥ t_{k(n−1), α/2}.
For the given problem (part (d)), we have: H0: μ3 − μ4 = 0 versus Ha: μ3 − μ4 ≠ 0

Test statistic:
T0 = (X̄3 − X̄4 − 0)/√(2S²/n) = (12.2 − 8.6)/√(2.6∗(1/5 + 1/5)) = 3.53

∴ |T0| = 3.53 > t_{16,0.025} = 2.12 (p-value = 0.00278, 2-sided; we can see that this p-value is indeed smaller than that of the
pooled-variance t-test in part (c) because this test is more powerful, having the largest error degrees of freedom possible.)
∴ We reject the null hypothesis at α = 0.05 and conclude that there is evidence of a difference in mean
speed between these two training programs.
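In SAS, one way to sketch this optimal comparison is to fit the one-way ANOVA model with PROC GLM and request pairwise comparisons, which are based on the MSE pooled from all four groups (16 error degrees of freedom); this is a sketch, not a required part of the answer:

proc glm data = training;
   class program;
   model speed = program;
   lsmeans program / pdiff stderr;  /* pairwise p-values based on the pooled MSE */
run;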

(f) Derivation of the optimal test in parts (d)/(e) (2-sided) using the likelihood ratio test approach

Under the same assumptions and general setting as in (e), i.e., k independent random samples, each of size n, from
normal populations with a common unknown variance σ², we now derive the likelihood ratio test for:

H0: μi = μj vs Ha: μi ≠ μj --- Without loss of generality, for the sake of simplicity, we will set i = 1, j = 2 for
the derivation of the likelihood ratio test.

Under H0, let μ1 = μ2 = μ. Then
ω = {−∞ < μ1 = μ2 = μ, μ3, μ4, ⋯, μk < +∞, 0 < σ² < +∞},
Ω = {−∞ < μ1, μ2, μ3, μ4, ⋯, μk < +∞, 0 < σ² < +∞}

L(ω) = L(μ, μ3, μ4, ⋯, μk, σ²) = [1/(2πσ²)]^{kn/2} exp{ −[1/(2σ²)] ( ∑_{m=1}^n (x_{1m} − μ)² + ∑_{m=1}^n (x_{2m} − μ)² + ∑_{l=3}^k ∑_{m=1}^n (x_{lm} − μ_l)² ) },
and there are k parameters (μ, μ3, ⋯, μk, and σ²).

ln L(ω) = −(kn/2) ln(2πσ²) − [1/(2σ²)] ( ∑_{m=1}^n (x_{1m} − μ)² + ∑_{m=1}^n (x_{2m} − μ)² + ∑_{l=3}^k ∑_{m=1}^n (x_{lm} − μ_l)² ).

Since it contains k parameters, we take the partial derivatives with respect to μ, μ3, μ4, ⋯, μk and σ² respectively and set each partial
derivative equal to 0. Then we have:

μ̂ = (X̄1 + X̄2)/2

μ̂_{l,ω} = X̄l, l = 3, ⋯, k

σ̂²_ω = [1/(kn)] [ ∑_{m=1}^n (x_{1m} − μ̂)² + ∑_{m=1}^n (x_{2m} − μ̂)² + ∑_{l=3}^k ∑_{m=1}^n (x_{lm} − μ̂_{l,ω})² ]

L(Ω) = L(μ1, μ2, μ3, μ4, ⋯, μk, σ²) = [1/(2πσ²)]^{kn/2} exp{ −[1/(2σ²)] ∑_{l=1}^k ∑_{m=1}^n (x_{lm} − μ_l)² },
and there are (k + 1) parameters.

ln L(Ω) = −(kn/2) ln(2πσ²) − [1/(2σ²)] ∑_{l=1}^k ∑_{m=1}^n (x_{lm} − μ_l)²

We take the partial derivatives with respect to μ1, μ2, μ3, μ4, ⋯, μk and σ² respectively and set them all equal to 0. Then we
have:
μ̂_{l,Ω} = X̄l, l = 1, 2, 3, ⋯, k

σ̂²_Ω = [1/(kn)] ∑_{l=1}^k ∑_{m=1}^n (x_{lm} − μ̂_{l,Ω})² = [(n − 1)/n] MSE
= [(n − 1)/n] S² (where S² is the pooled sample variance from all k samples as defined in part (e))

At this point we have estimated all the parameters. Then, after some cancellations/simplifications, we have:

λ = L(ω̂)/L(Ω̂) = [1/(2πσ̂²_ω)]^{kn/2} / [1/(2πσ̂²_Ω)]^{kn/2} = [σ̂²_Ω / σ̂²_ω]^{kn/2}

= [ ∑_{l=1}^k ∑_{m=1}^n (x_{lm} − μ̂_{l,Ω})² / ( ∑_{m=1}^n (x_{1m} − μ̂)² + ∑_{m=1}^n (x_{2m} − μ̂)² + ∑_{l=3}^k ∑_{m=1}^n (x_{lm} − μ̂_{l,ω})² ) ]^{kn/2}

Noting that ∑_{m=1}^n (x_{1m} − μ̂)² + ∑_{m=1}^n (x_{2m} − μ̂)² = ∑_{m=1}^n (x_{1m} − X̄1)² + ∑_{m=1}^n (x_{2m} − X̄2)² + n(X̄1 − X̄2)²/2
(because X̄1 − μ̂ = (X̄1 − X̄2)/2 and X̄2 − μ̂ = −(X̄1 − X̄2)/2), the denominator equals SSE + n(X̄1 − X̄2)²/2, where
SSE = ∑_{l=1}^k ∑_{m=1}^n (x_{lm} − X̄l)² = k(n − 1)S². Therefore,

λ = [1 + t0²/(k(n − 1))]^{−kn/2}

where t0 is the test statistic of the pooled-variance t-test, with t0² = (X̄1 − X̄2)²/(2S²/n). Therefore, λ ≤ λ* is equivalent to |t0| ≥ c. Thus, at the
significance level α, we reject the null hypothesis in favor of the alternative when |t0| ≥ c = t_{k(n−1), α/2}. This
test is identical to the test we have derived in part (e).

3. In order to test the accuracy of speedometers purchased from a subcontractor, the purchasing department of
an automaker orders a test of a sample of speedometers at a controlled speed of 55 mph. At this speed, it is
estimated that the variance of the readings is 1.
(a) How many speedometers need to be tested to have a 95% power to detect a bias of 0.5 mph or greater
using a 0.01 level test?
(b) A sample of the size determined in (a) has a mean of 55.2 and standard deviation of 0.8. Can you
conclude that the speedometers have a bias?
(c) Calculate the power of the test if 50 speedometers are tested and the actual bias is 0.5 mph. Assume a
population standard deviation of 0.8.

Solution:

(a) H0: μ = μ0 = 55 versus Ha: μ = μa = 55.5 > 55
power = 0.95 ⟹ β = 0.05. σ = 1, α = 0.01.

n = (z_α + z_β)²σ²/(μa − μ0)² = (2.326 + 1.645)²∗1²/(55.5 − 55)² ≈ 63.1, so we round up to n = 64.
(*Note: if σ = 0.8, then n = (2.326 + 1.645)²∗0.8²/(55.5 − 55)² ≈ 40.4, i.e., n = 41.*)
Hence, 64 speedometers need to be tested. (*Note: only 41 speedometers are needed if σ = 0.8.*)

(b) H0: μ = μ0 = 55 versus Ha: μ > 55
s = 0.8, n = 64, α = 0.01, X̄ = 55.2.

z0 = (X̄ − μ0)/(s/√n) = (55.2 − 55)/(0.8/√64) = 2. (*Note: Z0 = (X̄ − μ0)/(s/√n) ~ N(0, 1) under H0; this is the large-sample z-test based on the central limit
theorem, which is suitable even if the population distribution is not normal.*)
Since z0 = 2 < z_{0.01} = 2.326, we cannot conclude that the speedometers have a bias.

(**Note: Here you can also use the t-test – but remember to mention that the t-test is suitable if we assume the population
distribution is normal!)
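If you take the t-test route mentioned in the note (assuming a normal population), a quick data-step sketch of the comparison is given below; since the t critical value exceeds z(0.01) = 2.326, the conclusion is the same:

data _null_;
   t0    = (55.2 - 55)/(0.8/sqrt(64));  /* 2 */
   tcrit = tinv(0.99, 63);              /* upper 0.01 t critical value with 63 df */
   put t0= tcrit=;
run;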
(c) σ = 0.8, α = 0.01, n = 50

H0: μ = μ0 = 55 versus Ha: μ = μa = 55.5 > 55

Power = P(reject H0 | Ha)
= P(Z0 ≥ z_{0.01} | μ = 55.5)
= P( (X̄ − μ0)/(σ/√n) ≥ z_{0.01} | μ = 55.5 )
= P( (X̄ − μa)/(σ/√n) ≥ z_{0.01} − (μa − μ0)/(σ/√n) | μ = 55.5 )
= P( Z ≥ 2.326 − (55.5 − 55)/(0.8/√50) )
= P( Z ≥ −2.09 ) = 0.9817
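For reference, both the sample size in (a) and the power in (c) can be sketched with PROC POWER; note that its onesamplemean analysis is based on the one-sample t-test, so the results may differ slightly from the z-based values of 64 and 0.9817 above:

proc power;
   onesamplemean
      mean     = 55.5
      nullmean = 55
      stddev   = 1
      alpha    = 0.01
      sides    = 1
      power    = 0.95
      ntotal   = .;   /* part (a): solve for the sample size */
run;

proc power;
   onesamplemean
      mean     = 55.5
      nullmean = 55
      stddev   = 0.8
      alpha    = 0.01
      sides    = 1
      ntotal   = 50
      power    = .;   /* part (c): solve for the power */
run;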

4. You are an epidemiologist for the US Department of Health and Human Services. You are studying the
prevalence of a certain disease in two states (MA and CA). In MA, 74 of 1500 people surveyed were
diseased and in CA, 129 of 1500 were diseased.
(a) At the significance level of .05, can you conclude that the prevalence rates are different?
(b) Can you test the hypotheses mentioned in (a) using another test?
(c) Are the two tests in parts (a) and (b) equivalent or not? Please justify your claim in a general setting –
that is, suppose we have X1 diseased subjects among a total of n1 people surveyed in MA, and X2
diseased subjects among a total of n2 people surveyed in CA. Furthermore, the significance level is α.
Solution:
(a)
            Diseased        Not-Diseased         Total
MA          a = 74 (P1)     b = 1500 − 74        1500
CA          c = 129 (P2)    d = 1500 − 129       1500

H0: P1 = P2
Ha: P1 ≠ P2

P̂ = (74 + 129)/(1500 + 1500) ≈ 0.0677

Z0 = (p̂1 − p̂2 − 0)/√[P̂(1 − P̂)(1/n1 + 1/n2)] = (0.0493 − 0.086)/√[0.0677(1 − 0.0677)(1/1500 + 1/1500)] ≈ −3.998

|Z0| ≈ 3.998 > Z_{0.025} = 1.96


We reject H0 at α = 0.05 and conclude that, based on the given data, the prevalence of this disease is
different between CA and MA.

(b) & (c) Now we denote the probabilities of the four table cells as follows:
Diseased Not-Diseased
MA 𝑝11 (= 𝑝1 ) 𝑝12 (= 1 − 𝑝1 )
CA 𝑝21 (= 𝑝2 ) 𝑝22 (= 1 − 𝑝2 )
The original hypotheses of equal population proportions (versus not equal):
𝐻0 : 𝑝1 = 𝑝2
𝐻𝑎 : 𝑝1 ≠ 𝑝2
are equivalent to the hypotheses for the homogeneity test for a two-way contingency table:
𝐻0 : 𝑝11 = 𝑝21 , 𝑝12 = 𝑝22
𝐻𝑎 : the above is not true

The (large sample) test statistic is:

χ0² = [a − P̂(a+b)]²/[P̂(a+b)] + [b − (1−P̂)(a+b)]²/[(1−P̂)(a+b)] + [c − P̂(c+d)]²/[P̂(c+d)] + [d − (1−P̂)(c+d)]²/[(1−P̂)(c+d)]

= [a − (a+c)(a+b)/N]²/[(a+c)(a+b)/N] + [b − (b+d)(a+b)/N]²/[(b+d)(a+b)/N]
+ [c − (a+c)(c+d)/N]²/[(a+c)(c+d)/N] + [d − (b+d)(c+d)/N]²/[(b+d)(c+d)/N] = Z0²

where N = a + b + c + d, P̂ = (a+c)/N, and χ0² ~ χ²_{(2−1)(2−1)} = χ²_1 under H0.

At the significance level α, we reject the null hypothesis iff χ0² > χ²_{1,α,upper}, which is equivalent to rejecting the
null hypothesis iff |Z0| > Z_{α/2}. This is because α = P(|Z0| > Z_{α/2}) = P(Z0² > Z²_{α/2}) = P(χ0² > Z²_{α/2}), and
therefore χ²_{1,α,upper} = Z²_{α/2}. Thus we claim that these two tests are entirely equivalent. So we have done part
(c).
Now back to part (b), we can simply plug in the values (or you can do part (c) first and then use its general
result to perform part (b); either way is OK with me) and obtain the chi-square statistic as follows:
χ0² = Z0² = (−3.998)² ≈ 16
Since χ0² = 16 > χ²_{1,0.05,upper} = 3.84 (it is indeed 1.96 squared), we reject the null hypothesis and claim
that the prevalence rates are different between MA and CA.
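As a check, the same chi-square test of homogeneity can be run in SAS on the 2 by 2 table of counts; the data set and variable names below are hypothetical:

data disease;
   input state $ status $ count;
   datalines;
MA yes 74
MA no 1426
CA yes 129
CA no 1371
;
run;

proc freq data = disease;
   weight count;                 /* cell counts */
   tables state*status / chisq;  /* chi-square test of homogeneity */
run;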

5. We have two independent samples X1, ⋯, Xn1 iid ~ N(μ1, σ1²) and Y1, ⋯, Yn2 iid ~ N(μ2, σ2²), where
σ1² = σ2² = σ² (the variance is unknown), and n1 = n2 = n. For the hypotheses H0: μ1 − μ2 = 0 versus Ha: μ1 − μ2 = Δ > 0:
(a) Please derive the general formula for power calculation for the pooled variance t-test based on an effect
size of EFF at the significance level of α.

Recall - Definition: Effect size = EFF = Δ/σ (e.g., EFF = 1); in the calculation below it is estimated by Δ/Sp.

(b) With a sample size of 26 per group, α = 0.05, and an estimated effect size ranging from 1 to 1.5, please
calculate the power of your pooled variance t-test.
Solution:

(a) Test statistic: T0 = (X̄ − Ȳ − 0)/(Sp√(1/n1 + 1/n2)) = (X̄ − Ȳ)/(Sp√(2/n)) ~ t_{2n−2} under H0.

At the significance level α, we reject H0 in favor of Ha iff T0 ≥ t_{2n−2, α}.

Power = 1 − β = P(reject H0 | Ha) = P(T0 ≥ t_{2n−2, α} | Ha: μ1 − μ2 = Δ > 0)

= P( (X̄ − Ȳ)/(Sp√(2/n)) ≥ t_{2n−2, α} | Ha: μ1 − μ2 = Δ )

= P( (X̄ − Ȳ − Δ)/(Sp√(2/n)) ≥ t_{2n−2, α} − Δ/(Sp√(2/n)) | Ha: μ1 − μ2 = Δ )

≈ P( T ≥ t_{2n−2, α} − Eff∗√(n/2) | Ha: μ1 − μ2 = Δ ), where T ~ t_{2n−2} and the effect size Eff = Δ/σ is estimated by Δ/Sp
(the approximation replaces the random quantity Δ/(Sp√(2/n)) in the cutoff by the constant Eff∗√(n/2), i.e., it treats Sp as if it were σ).
(b) With n = 26, α = 0.05, Eff = 1 to 1.5, the power is calculated as follows:
Power (Eff = 1) = P( T ≥ t_{50,0.05} − 1∗√(26/2) | Ha: μ1 − μ2 = Δ )

= P( T ≥ 1.676 − 3.606 | Ha: μ1 − μ2 = Δ ) = P( T ≥ −1.93 | Ha: μ1 − μ2 = Δ )

By our t-table, we estimate that the above power is between 95% and 97.5%.
(In fact, if you check with R, the above power is about 97%.)

Power (Eff = 1.5) = P( T ≥ t_{50,0.05} − 1.5∗√(26/2) | Ha: μ1 − μ2 = Δ )

= P( T ≥ 1.676 − 5.409 | Ha: μ1 − μ2 = Δ ) = P( T ≥ −3.733 | Ha: μ1 − μ2 = Δ )

By our t-table, we estimate that the above power is greater than 99.95%.
(In fact, if you check with R, the above power is about 99.98%.)

Note: the T statistic above follows a t-distribution with 50 (=26+26-2) degrees of freedom.
Therefore we conclude that the power ranges from about 97% to about 99.98% (by our t-table, from above 95% to above 99.95%) as the effect size ranges from 1 to 1.5.
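These powers can also be checked with a PROC POWER sketch; with stddev = 1, meandiff plays the role of the effect size, and sides = 1 matches the one-sided alternative above:

proc power;
   twosamplemeans
      test      = diff
      meandiff  = 1 1.5
      stddev    = 1
      npergroup = 26
      alpha     = 0.05
      sides     = 1
      power     = .;
run;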
