MODULE NAME
Statistics with Excel®
MODULE OBJECTIVE
To introduce the readers to the use of
Microsoft® Office Excel® 2007 to analyze
statistical data
Topics:
Introduction; tests of hypotheses for two
means, paired observations, and two
variances; one-way and two-way ANOVA; and
simple and multiple regression analysis
Evaluations:
Examples and exercises. Solution to the
exercises.
Hypotheses
α = p(Type I error)
= p(Reject Ho when Ho is true)
β = p(Type II error)
= p(Accept Ho when Ho is false)
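a) (n1, n2) ≥ 30, independent random samples

TS: $Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$

CI: $\bar{X}_1 - \bar{X}_2 \pm Z_{\alpha/2} \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$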
The Academy of Aerospace
Quality
b) (n1, n2) < 30, sampling from independent normal populations with
unknown variances

b1) σ1 = σ2

TS: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$  (df = n1 + n2 − 2)

$S_p = \sqrt{\dfrac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}}$

CI: $\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2,\, n_1 + n_2 - 2}\, S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
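b2) σ1 ≠ σ2

TS: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$

CI: $\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2,\, df} \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$

$df = \dfrac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{\left(S_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_2^2 / n_2\right)^2}{n_2 - 1}}$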
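The unequal-variance statistic and its degrees of freedom can be sketched in a few lines of Python. The function name `welch_t` and the summary statistics passed to it are illustrative assumptions, not part of the original slides.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df) for the two-sample t test with unequal variances."""
    v1 = var1 / n1  # S1^2 / n1
    v2 = var2 / n2  # S2^2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Approximate degrees of freedom from the df formula in case b2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Illustrative inputs (assumed): sample means, sample variances, sample sizes
t_stat, df = welch_t(415.196172, 0.00076576, 10, 415.2041, 0.001233, 12)
print(f"t = {t_stat:.4f}, df = {df:.2f}")
```

The df value is generally not an integer; a table lookup would need it rounded first.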
Example
Certain engine head dimension is going to be compared
from two different production lines. A sample of 10 items
was taken from from line 1 and a sample of 12 items was
taken from line 2:
First check that the DATA ANALYSIS menu is activated: under
DATA, select DATA ANALYSIS. If it is not available, select EXCEL OPTIONS,
ADD-INS, GO, check ANALYSIS TOOLPAK and ANALYSIS TOOLPAK-VBA,
and click OK. Return to the DATA ANALYSIS menu and you should see the list of analysis tools.
                                Variable 1    Variable 2
Mean                            415.196172    415.2041
Variance                        0.00076576    0.001233
Observations                    10            12
Pooled Variance                 0.00102293
Hypothesized Mean Difference    0
df                              20
t Stat                          -0.5804293
P(T<=t) one-tail                0.28405404
t Critical one-tail             1.72471822
P(T<=t) two-tail                0.56810808
t Critical two-tail             2.08596344

For a two-sided test the null hypothesis is not rejected
(p-value = 0.568 > 0.05 = alpha).
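As a rough check, the pooled-variance statistic can be recomputed from the summary rows of the Excel output. This is only a sketch: the function name `pooled_t` is an assumption, and because the printed means and variances are rounded, the result agrees with Excel only to about two or three significant digits.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, pooled variance, df) for the equal-variance t test."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df  # pooled variance Sp^2
    t = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, sp2, df

# Mean, Variance and Observations rows of the Excel output
t_stat, sp2, df = pooled_t(415.196172, 0.00076576, 10, 415.2041, 0.001233, 12)

t_crit_two_tail = 2.08596344  # "t Critical two-tail" copied from the output
reject_h0 = abs(t_stat) > t_crit_two_tail
print(f"Sp^2 = {sp2:.8f}, t = {t_stat:.4f}, df = {df}, reject Ho: {reject_h0}")
```

Comparing |t| with the two-tail critical value leads to the same decision as comparing the two-tail p-value with alpha.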
$H_0: \mu_d = 0 \qquad H_a: \mu_d \neq 0$

TS: $t = \dfrac{\bar{d} - \delta}{S_d / \sqrt{n}}$; reject Ho if $|t| > t_{\alpha/2,\, n-1}$
                                Variable 1    Variable 2
Mean                            47.544        47.428
Variance                        0.014027      0.014773
Observations                    10            10
Pearson Correlation             0.994779
Hypothesized Mean Difference    0
df                              9
t Stat                          29
P(T<=t) one-tail                1.68E-10
t Critical one-tail             1.833113
P(T<=t) two-tail                3.36E-10
t Critical two-tail             2.262157
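The paired statistic can be approximately recovered from the summary rows alone. The sketch below relies on the standard identity Var(d) = S1² + S2² − 2·r·S1·S2 for the variance of the differences, which the slides do not state; the function name `paired_t_from_summary` is likewise an assumption.

```python
import math

def paired_t_from_summary(mean1, var1, mean2, var2, r, n, delta=0.0):
    """Paired t statistic rebuilt from means, variances and correlation."""
    # Variance of the differences d = x1 - x2 (standard identity, assumed here)
    var_d = var1 + var2 - 2 * r * math.sqrt(var1 * var2)
    se_d = math.sqrt(var_d / n)  # Sd / sqrt(n)
    return (mean1 - mean2 - delta) / se_d

# Mean, Variance, Pearson Correlation and Observations rows of the output
t_stat = paired_t_from_summary(47.544, 0.014027, 47.428, 0.014773, 0.994779, 10)

t_crit_two_tail = 2.262157  # "t Critical two-tail" copied from the output
print(f"t = {t_stat:.2f}, reject Ho: {abs(t_stat) > t_crit_two_tail}")
```

With the raw pairs available, computing the differences directly and applying the TS formula is simpler and avoids the rounding in the printed summary values.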
$H_0: \sigma_1^2 = \sigma_2^2$

[Figure: F distribution showing the rejection region of Ho (RRHo)]

F transformation:

$F_{prob,\, n_1 - 1,\, n_2 - 1} = \dfrac{1}{F_{1 - prob,\, n_2 - 1,\, n_1 - 1}}$
Line 1    Line 2
415.20    415.23
415.20    415.24
415.23    415.18
415.16    415.24
415.22    415.24
415.15    415.16
415.23    415.15
415.18    415.22
415.19    415.22
415.20    415.16
          415.19
          415.23
Using Excel: Data, Data Analysis, F-test Two-sample for Variances
Variable 1 Variable 2
Mean 415.196 415.205
Variance 0.00073778 0.0012091
Observations 10 12
df 9 11
F 0.61019215
P(F<=f) one-tail 0.23368318
F Critical one-tail 0.32232223
Since the p-value = 0.234 is greater than alpha (5%), the equality of the two
variances is not rejected.
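The variance ratio in the Excel output can be reproduced from the Line 1 and Line 2 data with Python's standard library. This is a minimal sketch; the variable names are assumptions, and the critical value is copied from the output rather than computed.

```python
from statistics import mean, variance

line1 = [415.20, 415.20, 415.23, 415.16, 415.22,
         415.15, 415.23, 415.18, 415.19, 415.20]
line2 = [415.23, 415.24, 415.18, 415.24, 415.24, 415.16,
         415.15, 415.22, 415.22, 415.16, 415.19, 415.23]

var1 = variance(line1)  # sample variance, divisor n1 - 1
var2 = variance(line2)  # sample variance, divisor n2 - 1
f_stat = var1 / var2    # F with (n1 - 1, n2 - 1) degrees of freedom

f_crit_lower = 0.32232223  # "F Critical one-tail" copied from the output
reject_h0 = f_stat < f_crit_lower  # lower-tail comparison since F < 1

print(f"means: {mean(line1):.3f}, {mean(line2):.3f}")
print(f"variances: {var1:.8f}, {var2:.7f}")
print(f"F = {f_stat:.8f}, reject Ho: {reject_h0}")
```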
The Analysis of Variance

            Replicates
Material    1       2       3       Mean (ȳi)    Variance (s²i)
A           2.05    2.03    2.02    2.033333     0.0002333
B           1.98    1.99    2.00    1.990000     0.0001000
C           2.07    2.05    2.05    2.056667     0.0001333
$SS_t = \sum_i \dfrac{y_{i.}^2}{n} - \dfrac{y_{..}^2}{N} = \dfrac{\sum (\text{each group sum})^2}{n} - \dfrac{(\text{total sum})^2}{N}$

$SS_t = \dfrac{(6.1)^2 + (5.97)^2 + (6.17)^2}{3} - \dfrac{(18.24)^2}{9} = 0.006867$
Sources of Variation         SS     df       MS                     F
Treatments (t)               SSt    a − 1    MSt = SSt / (a − 1)    MSt / MSE
Error (E) (by difference)    SSE    N − a    MSE = SSE / (N − a)
Total                        SST    N − 1

MS = SS/df
F = comparison of the between-group and within-group sources of variation
For this example:
Sources of Variation    SS          df    MS          F
Treatments (t)          0.006867    2     0.003434    22.08
Error (E)               0.000933    6     0.000156
Total                   0.00780     8
[Figure: F distribution with the rejection region of Ho (α = 5% = 0.05); F(0.05,2,6) = 5.14, F(anova) = 22.08, p-value = 0.002]
Since the p-value < alpha, the equality of the process means
is rejected.
Using Excel: Data, Data Analysis
Select ANOVA-SINGLE FACTOR and enter the input range (select all the table)
and hit OK
SUMMARY
Groups    Count    Sum     Average     Variance
A         3        6.1     2.033333    0.000233
B         3        5.97    1.99        0.0001
C         3        6.17    2.056667    0.000133

ANOVA
Source of Variation    SS          df    MS          F           P-value     F crit
Between Groups         0.006867    2     0.003433    22.07143    0.001713    5.143253
Within Groups          0.000933    6     0.000156
Total                  0.0078      8
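The single-factor ANOVA table can be rebuilt from the three materials' replicates using the sum-of-squares formulas. This is a sketch with assumed variable names; the critical value is copied from the Excel output rather than computed.

```python
groups = {
    "A": [2.05, 2.03, 2.02],
    "B": [1.98, 1.99, 2.00],
    "C": [2.07, 2.05, 2.05],
}

a = len(groups)                             # number of treatments
N = sum(len(v) for v in groups.values())    # total number of observations
total = sum(sum(v) for v in groups.values())
correction = total ** 2 / N                 # (total sum)^2 / N

# Treatment SS: sum of (group sum)^2 / n minus the correction term
ss_treat = sum(sum(v) ** 2 / len(v) for v in groups.values()) - correction
# Total SS: sum of squared observations minus the correction term
ss_total = sum(y ** 2 for v in groups.values() for y in v) - correction
ss_error = ss_total - ss_treat              # error SS by difference

ms_treat = ss_treat / (a - 1)
ms_error = ss_error / (N - a)
f_stat = ms_treat / ms_error

f_crit = 5.143253  # F(0.05, 2, 6) copied from the Excel output
print(f"SSt = {ss_treat:.6f}, SSE = {ss_error:.6f}, SST = {ss_total:.6f}")
print(f"F = {f_stat:.5f}, reject Ho: {f_stat > f_crit}")
```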
Uses of regression
y = β 0 + β1x + ε
y=dependent variable (response)
x=independent variable (predictor of y)
ε = error component, a random variable (RV)
β0 = intercept. If the data include x = 0, it represents the
mean of the distribution of y when x = 0. It has no
particular meaning if the data do not include zero
β1 = slope. It is the change in the mean of y for every
unit change in x
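$\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

$S_{xy} = \sum xy - \dfrac{(\sum x)(\sum y)}{n} \qquad S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n}$

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$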
Coefficients
β0 Intercept 75.269132
β1 X Variable 1 -0.268447184
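These coefficients can be reproduced with the Sxy and Sxx formulas. The sketch below assumes the data are the Hardness (y) and TEMP (x1) columns of the thermal treatment example that appears in this module; the variable names are illustrative.

```python
temp = [101, 115, 115, 140, 123, 107, 135, 135,
        105, 110, 110, 135, 125, 132, 130]           # x
hardness = [49, 44, 46, 38, 43, 47, 41, 38,
            47, 45, 43, 37, 44, 40, 39]               # y

n = len(temp)
sum_x, sum_y = sum(temp), sum(hardness)

# Sxx = sum(x^2) - (sum x)^2 / n ; Sxy = sum(xy) - (sum x)(sum y) / n
sxx = sum(x * x for x in temp) - sum_x ** 2 / n
sxy = sum(x * y for x, y in zip(temp, hardness)) - sum_x * sum_y / n

b1 = sxy / sxx                      # slope estimate
b0 = sum_y / n - b1 * sum_x / n     # intercept estimate

print(f"Sxx = {sxx:.1f}, Sxy = {sxy:.1f}")
print(f"b0 = {b0:.6f}, b1 = {b1:.9f}")
```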
Although it’s vital to check model assumptions, these are not going to be
covered here
[Figure: two scatter plots, one with β1 = 0 and one with β1 ≠ 0]
Sources of Variation SS df MS F
$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$
TS: F = MSR/MSE
Reject Ho if $F > F(\text{tables}) = F_{\alpha,\, 1,\, n-2}$
ANOVA
df SS MS F Significance F
Regression 1 168.3701 168.3701 76.63029 8.23247E-07
Residual 13 28.56326 2.197174
Total 14 196.9333
Since the SIGNIFICANCE F (or p-value) is less than an alpha value of 0.05
then the regression model is statistically significant
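For β1:
$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$

$t = \dfrac{\hat{\beta}_1}{se(\hat{\beta}_1)} = \dfrac{\hat{\beta}_1}{\sqrt{MSE / S_{xx}}} = \dfrac{-0.2684}{\sqrt{2.1995 / 2336.4}} = -8.75$

$t_{\alpha/2,\, n-2} = t_{0.025,\, 13} = 2.16$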
                     Coefficients    Std. Error     t Stat      P-value
Intercept (β0)       75.269132       3.736385138    20.14491    3.47E-11
X Variable 1 (β1)    -0.2684472      0.030666104    -8.75387    8.23E-07

The t Stat column holds the t-tests; the P-value column holds the p-values for the t-tests.
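The slope's standard error and t statistic follow directly from MSE and Sxx. This sketch takes MSE as the Residual MS of the regression ANOVA output (2.197174) and Sxx = 2336.4 from the slide; the variable names are assumptions.

```python
import math

b1 = -0.2684472     # slope estimate from the coefficients output
mse = 2.197174      # Residual MS from the regression ANOVA output
sxx = 2336.4        # Sxx as used on the slide

se_b1 = math.sqrt(mse / sxx)   # standard error of the slope
t_stat = b1 / se_b1

t_crit = 2.16       # t(0.025, 13) as quoted on the slide
print(f"se(b1) = {se_b1:.7f}, t = {t_stat:.4f}")
print(f"reject Ho: {abs(t_stat) > t_crit}")

# In simple regression the squared t statistic equals the ANOVA F statistic
print(f"t^2 = {t_stat ** 2:.3f}")
```

The last line is a handy consistency check: with one predictor, the t-test on β1 and the ANOVA F-test are the same test.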
Exercise 5
Perform t tests for the study of Voltage vs.
Current (see Exercise 3)
Coefficient of determination

$R^2 = r^2 = \dfrac{SSR}{S_{yy}}, \qquad 0 \le R^2 \le 1$
It’s the proportion of variation explained by the
regression model, or the % of variation in y
explained by x. You can get it as part of the
standard regression printout
Regression Statistics
Multiple R 0.92464
R Square 0.85496
Adjusted R Square 0.843803
Standard Error 5.105593
Observations 15
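The first three rows of this printout can be checked from the regression ANOVA sums of squares. The adjusted R² formula used below, 1 − (1 − R²)(n − 1)/(n − k − 1), is the standard one and is not stated on the slides; the variable names are assumptions.

```python
import math

ssr = 168.3701   # Regression SS from the ANOVA output
syy = 196.9333   # Total SS (Syy) from the ANOVA output
n = 15           # observations
k = 1            # number of predictor variables

r_square = ssr / syy
multiple_r = math.sqrt(r_square)
# Standard adjusted R^2 formula (an assumption, not shown in the slides)
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(f"Multiple R = {multiple_r:.5f}")
print(f"R Square = {r_square:.5f}")
print(f"Adjusted R Square = {adj_r_square:.6f}")
```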
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
(hyperplane in k dimensions)
n = number of observations
p = number of parameters (βs)
k = number of variables (Xs)
p = k + 1
to have the model $\hat{y} = X\hat{\beta}$
Example
Assume that in the thermal treatment example one additional
variable is introduced: Unit Temperature
Hardness (y)   TEMP (x1)   U. temp (x2)
49 101 848
44 115 845
46 115 847
38 140 837
43 123 844
47 107 847
41 135 840
38 135 838
47 105 846
45 110 845
43 110 844
37 135 836
44 125 845
40 132 840
39 130 839
Using Excel (Regression): enter the Input Y Range and the Input X Range
                     Coefficients     Standard Error    t Stat      P-value
Intercept (β0)       -574.4598054     107.2563109       -5.35595    0.000172
X Variable 1 (β1)    -0.060622474     0.037783719       -1.60446    0.134592
X Variable 2 (β2)    0.741089213      0.122318119       6.058703    5.68E-05

The t Stat column holds the t-tests; the P-value column holds their p-values.
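For two predictors the least-squares coefficients can be obtained without a matrix library by solving the 2×2 normal equations on centered data. This is a sketch of one way to do it, not necessarily how Excel computes them; the helper names `mean` and `centered_cross` are assumptions.

```python
hardness = [49, 44, 46, 38, 43, 47, 41, 38, 47, 45, 43, 37, 44, 40, 39]  # y
temp = [101, 115, 115, 140, 123, 107, 135, 135,
        105, 110, 110, 135, 125, 132, 130]                                # x1
unit_temp = [848, 845, 847, 837, 844, 847, 840, 838,
             846, 845, 844, 836, 845, 840, 839]                           # x2

def mean(values):
    return sum(values) / len(values)

def centered_cross(u, v):
    """Sum of (u - mean(u)) * (v - mean(v)), e.g. Sxx when u is v."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

s11 = centered_cross(temp, temp)
s22 = centered_cross(unit_temp, unit_temp)
s12 = centered_cross(temp, unit_temp)
s1y = centered_cross(temp, hardness)
s2y = centered_cross(unit_temp, hardness)

# Solve [[s11, s12], [s12, s22]] [b1, b2]^T = [s1y, s2y]^T by Cramer's rule
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(hardness) - b1 * mean(temp) - b2 * mean(unit_temp)

print(f"b0 = {b0:.6f}, b1 = {b1:.9f}, b2 = {b2:.9f}")
```

Centering the variables before solving keeps the arithmetic well conditioned, which matters here because the unit temperatures are large numbers with a small spread.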
PROTOTYPE DENSITY
1 4.5 7.8 6.7
2 3.8 5.6 9.1
3 7.6 4.6 7.6
4 3.5 3.5 4.8
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 13.60917 3 4.536389 1.329345 0.33111 4.066181
Within Groups 27.3 8 3.4125
Total 40.90917 11
Total 380.875 7
ANOVA
df SS MS F Significance F
Regression 1 48173.33 48173.33 2136.125 1.8389E-18
Residual 16 360.827759 22.55173
Total 17 48534.1578
cept is not needed in the model. It may be deleted provided certain conditions a