Industrial Statistics
1. Introduction to Statistical Inference
1.1 Overview
The aim of statistical inference is to make decisions and draw conclusions about populations.
Step 1: selection of a suitable distribution family

Problem: the characteristic X ∼ F, but F is unknown. Frequently the user has prior information about F. It holds that F ∈ ℱ = {F_ϑ : ϑ ∈ Θ}. Θ is called the parameter set.

Example:
- X = quality of produced bulbs (1 if intact, else 0). Here X ∼ B(1, p) with p = P(X = 1), thus ϑ = p, Θ = (0, 1), and ℱ = {B(1, p) : p ∈ (0, 1)}.
- X = body height. Various studies have shown that body height is roughly normally distributed. Thus X ∼ N(μ, σ²), ϑ = (μ, σ), and Θ = ℝ × (0, ∞).
Note that if X is a continuous variable then the true distribution is usually not a member of the selected distribution family. These distributions only provide an approximation.
Step 2: drawing of a sample

In order to draw conclusions on ϑ, data x₁, …, xₙ are collected. The set of all possible samples is called the sample space.

Basic idea: x₁, …, xₙ are considered to be realizations of the random sample X₁, …, Xₙ from X. X₁, …, Xₙ have the same distribution as X (identically distributed).

In many cases it is assumed that the variables X₁, …, Xₙ are independent and identically distributed (briefly: i.i.d.).
Major areas of statistical inference: parameter estimation, confidence intervals, and hypothesis testing.
1.2 Confidence Intervals
Assume that a random sample X₁, …, Xₙ is given with Xᵢ ∼ F_ϑ, ϑ ∈ Θ.
Aim: derivation of a region (interval) which contains the parameter ϑ with a given probability.

Let α ∈ (0, 1). Suppose that L and U only depend on the sample variables X₁, …, Xₙ. If

  P_ϑ(L ≤ ϑ ≤ U) ≥ 1 − α  for all ϑ ∈ Θ   (*)

then the interval [L, U] is called a two-sided 100(1 − α)% confidence interval for ϑ. L is called the lower confidence limit, U the upper confidence limit, and 1 − α is the confidence coefficient. If equality holds in (*) for all ϑ, the confidence interval is called exact.

In practice α is usually chosen equal to 0.1, 0.05, or 0.01.

Remark: because the width U − L should be small, it is desirable to have exact confidence intervals.

[L, ∞) is called a one-sided lower 100(1 − α)% confidence interval for ϑ if P_ϑ(L ≤ ϑ) ≥ 1 − α for all ϑ ∈ Θ, and (−∞, U] is called a one-sided upper 100(1 − α)% confidence interval for ϑ if P_ϑ(ϑ ≤ U) ≥ 1 − α for all ϑ ∈ Θ.
Example:
- risk behavior of a financial investment → upper confidence interval
- tear strength of a rope → lower confidence interval
1.2.1 Confidence Intervals for the Parameters of a Normal Distribution
Suppose that the sample variables X₁, …, Xₙ are i.i.d. with Xᵢ ∼ N(μ, σ²) for i = 1, …, n.

Aim: confidence intervals for μ if σ is known.

Development of the confidence interval: first estimate μ by X̄. Since X̄ ∼ N(μ, σ²/n), the following structure is chosen for the confidence interval:

  [X̄ − c·σ/√n, X̄ + c·σ/√n]
with c > 0. The constant c is chosen as a function of α such that (*) is valid. Note that

  μ ∈ [X̄ − c·σ/√n, X̄ + c·σ/√n]  ⟺  |√n·(X̄ − μ)/σ| ≤ c.

Since √n·(X̄ − μ)/σ ∼ N(0, 1), the quantity c is determined such that

  P(|√n·(X̄ − μ)/σ| ≤ c) = 2Φ(c) − 1 = 1 − α.

Consequently c = Φ⁻¹(1 − α/2) = z_{α/2}, where z_{α/2} is the upper 100·α/2 percentage point of the standard normal distribution.
100(1 − α)% confidence interval for μ (σ known):
  [X̄ − z_{α/2}·σ/√n, X̄ + z_{α/2}·σ/√n]
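This interval can be computed directly in R. A minimal sketch (the data vector x and the value of σ below are illustrative assumptions, not from the slides):

program (sketch):
x <- c(499.2, 555.2, 398.8, 391.9, 453.4);   # any numeric sample (illustrative)
sigma <- 80;                                 # assumed known standard deviation
alpha <- 0.05;
z <- qnorm(1 - alpha / 2);                   # upper alpha/2 percentage point z_{alpha/2}
ci <- mean(x) + c(-1, 1) * z * sigma / sqrt(length(x));
print(ci);                                   # lower and upper confidence limits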
Now: confidence interval for μ if σ is unknown.

100(1 − α)% confidence interval for μ (σ unknown):
  [X̄ − t_{n−1;α/2}·S/√n, X̄ + t_{n−1;α/2}·S/√n],
where t_{n−1;α/2} is the upper 100·α/2 percentage point of the t distribution with n − 1 degrees of freedom.
Example: mean annual rainfall (in millimeters) in Australia from 1983 to 2002:
1983   1984   1985   1986   1987   1988   1989   1990   1991   1992
499.2  555.2  398.8  391.9  453.4  459.8  483.7  417.6  469.2  452.4

1993   1994   1995   1996   1997   1998   1999   2000   2001   2002
499.3  340.6  522.8  469.9  527.2  565.5  584.1  727.3  558.6  338.6
Here n = 20, x̄ = 485.755, s = 90.33872, and t_{19;0.025} = 2.093. Thus the confidence interval is given by
  [485.755 − 42.27934, 485.755 + 42.27934] = [443.4757, 528.0343].
program:
year <- c(1983:2002);
rain <- c(499.2, 555.2, 398.8, 391.9, 453.4, 459.8, 483.7, 417.6, 469.2, 452.4, 499.3, 340.6, 522.8, 469.9, 527.2, 565.5, 584.1, 727.3, 558.6, 338.6);
# (or rain <- read.table("rain.txt");
# use setwd() (set working directory) and getwd() (get working directory))
# histogram #
hist(rain, breaks = 8, freq = FALSE, main = "histogram", xlab = "Mean Annual Rainfall", ylab = "");
# box plot #
boxplot(rain, range = 0, ylab = "Mean Annual Rainfall");
# normal qq plot #
qqnorm(rain, datax = TRUE, main = "Normal QQ Plot"); qqline(rain, datax = TRUE);
output: [figures omitted: histogram, box plot, and normal QQ plot of the rainfall data]
program:
# confidence interval #
alpha <- 0.05; n <- length(rain);
lcl <- mean(rain) - qt(1 - alpha / 2, n - 1) * sd(rain) / sqrt(n);
ucl <- mean(rain) + qt(1 - alpha / 2, n - 1) * sd(rain) / sqrt(n);
ci <- c(lcl, ucl); print(ci);
output:
result: [443.4752, 528.0348]
Now: confidence interval for σ if μ is unknown.

100(1 − α)% confidence interval for σ² (μ unknown):
  [(n − 1)·S²/χ²_{n−1;α/2}, (n − 1)·S²/χ²_{n−1;1−α/2}],
where χ²_{n−1;α} denotes the upper 100·α percentage point of the chi-square distribution with n − 1 degrees of freedom. Taking square roots of both limits yields a confidence interval for σ.
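A minimal R sketch of this chi-square interval, reusing the rain vector from the earlier program (the variable names are the ones introduced there):

program (sketch):
alpha <- 0.05; n <- length(rain);
var.ci <- (n - 1) * var(rain) / qchisq(c(1 - alpha / 2, alpha / 2), df = n - 1);
print(var.ci);        # confidence interval for sigma^2
print(sqrt(var.ci));  # confidence interval for sigma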
1.2.2 Large-Sample Confidence Intervals
Using the central limit theorem, large-sample confidence intervals for arbitrary distributions (discrete or continuous) can be derived. This means that

  lim_{n→∞} P_ϑ(L(X₁, …, Xₙ) ≤ ϑ ≤ U(X₁, …, Xₙ)) ≥ 1 − α  for all ϑ ∈ Θ.

Example: Suppose that X₁, X₂, … are i.i.d. with E(Xᵢ) = μ for all i ≥ 1.
(i) Confidence interval for μ if σ² = Var(Xᵢ) is known: approximately [X̄ − z_{α/2}·σ/√n, X̄ + z_{α/2}·σ/√n] (rule of thumb: n ≥ 30).

(ii) Confidence interval for μ if σ² = Var(Xᵢ) is unknown: approximately [X̄ − z_{α/2}·S/√n, X̄ + z_{α/2}·S/√n] (rule of thumb: n ≥ 40).
Example: mercury contamination in largemouth bass (in ppm); a sample of fish was selected from 53 Florida lakes.
1.230 1.330 0.040 0.044 1.200 0.270
0.490 0.190 0.830 0.810 0.710 0.500
0.490 1.160 0.050 0.150 0.190 0.770
1.080 0.980 0.630 0.560 0.410 0.730
0.590 0.340 0.340 0.840 0.500 0.340
0.280 0.340 0.750 0.870 0.560 0.170
0.180 0.190 0.040 0.490 1.100 0.160
0.100 0.210 0.860 0.520 0.650 0.270
0.940 0.400 0.430 0.250 0.270
It holds that n = 53, x̄ = 0.5250, s = 0.3486, and z_{0.025} = 1.96. Thus the asymptotic confidence interval is equal to [0.4311, 0.6188].
program:
# MerCon is a numeric vector holding the 53 mercury concentrations listed above
# histogram #
hist(MerCon, breaks = 14, freq = FALSE, main = "histogram", xlab = "concentration", ylab = "");
# normal qq plot #
qqnorm(MerCon, datax = TRUE, main = "normal qq plot"); qqline(MerCon, datax = TRUE);
output: [figures omitted: histogram and normal QQ plot of the mercury concentrations]
program:
# confidence interval #
alpha <- 0.05; n <- length(MerCon);
lcl <- mean(MerCon) - qnorm(1 - alpha / 2) * sd(MerCon) / sqrt(n);
ucl <- mean(MerCon) + qnorm(1 - alpha / 2) * sd(MerCon) / sqrt(n);
ci <- c(lcl, ucl); print(ci);
output:
result: ci = [0.4311, 0.6188]
Overview: Confidence Intervals
1.3 Hypothesis Testing
1.3.1 Introduction
Let X be the characteristic of interest with X ∼ F_ϑ, ϑ ∈ Θ. The test problem is given by H₀: ϑ ∈ Θ₀ against H₁: ϑ ∈ Θ₁. H₀ is called the null hypothesis and H₁ the alternative hypothesis. Θ₀ and Θ₁ are disjoint with Θ₀ ∪ Θ₁ = Θ. Thus a decision problem between two hypotheses, a so-called test problem, is present.
Example: burning rate of a solid propellant; H₀: μ = 50 (centimeters per second) against H₁: μ ≠ 50.
Procedure: based on the sample x₁, …, xₙ a decision about a particular hypothesis is made. Such a decision rule is called a statistical test.
procedure:
An upper bound α for the type I error is fixed, e.g. α ∈ {0.01, 0.05, 0.1}. The critical region C for the test (the set of samples for which H₀ is rejected) is determined such that the type I error fulfills this condition.
Such a test is called a test of significance at level α for H₀ if the probability of a type I error is smaller than or equal to α, i.e.

  P_ϑ((X₁, …, Xₙ) ∈ C) ≤ α  for all ϑ ∈ Θ₀.

α is called the significance level.
Because a test of significance controls only the type I error and not the type II error, the type II error may be large. For that reason it is only possible to accept H₁, i.e. to reject H₀; it is not correct to speak of accepting the null hypothesis H₀.
1.3.2 Tests for Univariate Samples
Suppose that the random sample X₁, …, Xₙ is i.i.d. with Xᵢ ∼ N(μ, σ²).

Test problem: H₀: μ = μ₀ against H₁: μ ≠ μ₀
Gauss test (σ known): reject H₀ if |T| > z_{α/2}, where T = √n·(X̄ − μ₀)/σ; under H₀ it holds that T ∼ N(0, 1).
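A minimal R sketch of this test (the function name gauss.test and its arguments are illustrative, not part of base R):

program (sketch):
gauss.test <- function(x, mu.0, sigma, alpha = 0.05) {
  T <- sqrt(length(x)) * (mean(x) - mu.0) / sigma;   # test statistic
  reject <- abs(T) > qnorm(1 - alpha / 2);           # compare with z_{alpha/2}
  p.value <- 2 * (1 - pnorm(abs(T)));                # two-sided p-value
  list(statistic = T, reject = reject, p.value = p.value);
}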
Example: power function G(μ) = P_μ(|√n·(X̄ − μ₀)/σ| > z_{α/2}) of the two-sided Gauss test (i.e. the probability to accept H₁ as a function of μ) for α = 0.05, n = 5, σ = 1, and μ₀ = 0.
[Figure omitted: plot of the power function G(μ), which attains its minimum value α at μ = μ₀.]
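The plot can be reproduced with a few lines of R. The closed form used below follows from X̄ ∼ N(μ, σ²/n); parameter values as on the slide:

program (sketch):
alpha <- 0.05; n <- 5; sigma <- 1; mu.0 <- 0;
z <- qnorm(1 - alpha / 2);
G <- function(mu) {                          # P_mu(|T| > z_{alpha/2})
  d <- sqrt(n) * (mu - mu.0) / sigma;
  1 - pnorm(z - d) + pnorm(-z - d);
}
mu <- seq(-3, 3, by = 0.01);
plot(mu, G(mu), type = "l", xlab = "mu", ylab = "G(mu)");
abline(h = alpha, lty = 2);                  # the level alpha is attained at mu = mu.0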
Large-Sample Tests
Suppose that the variables X₁, …, Xₙ are i.i.d. with E(Xᵢ) = μ and Var(Xᵢ) = σ² for i = 1, …, n.
Because in most cases the distribution of the underlying characteristic X is unknown, approximations to the critical values are determined using the asymptotic distribution of the test statistic.

Test problem: H₀: μ = μ₀ against H₁: μ ≠ μ₀. Reject H₀ if |T| > z_{α/2}, where T = √n·(X̄ − μ₀)/S; under H₀ the statistic T is asymptotically N(0, 1).
Tests for the Mean and the Variance of a Single Sample
The random sample X₁, …, Xₙ is assumed to be i.i.d.
For each test problem, H₀ is rejected if the test statistic T falls into the indicated critical region (critical values taken from the distribution of T under H₀):

normal distribution, σ known:
  T = (X̄ − μ₀)/(σ/√n), under H₀: T ∼ N(0, 1)
  H₀: μ = μ₀ vs. H₁: μ ≠ μ₀: reject if |T| > z_{α/2}
  H₀: μ ≤ μ₀ vs. H₁: μ > μ₀: reject if T > z_α
  H₀: μ ≥ μ₀ vs. H₁: μ < μ₀: reject if T < −z_α

normal distribution, σ unknown:
  T = (X̄ − μ₀)/(S/√n), under H₀: T ∼ t_{n−1} (approx. N(0, 1) for n > 100)
  H₀: μ = μ₀ vs. H₁: μ ≠ μ₀: reject if |T| > t_{n−1;α/2} (z_{α/2})
  H₀: μ ≤ μ₀ vs. H₁: μ > μ₀: reject if T > t_{n−1;α} (z_α)
  H₀: μ ≥ μ₀ vs. H₁: μ < μ₀: reject if T < −t_{n−1;α} (−z_α)

arbitrary distribution with expectation μ:
  T = (X̄ − μ₀)/(S/√n), under H₀: T approx. N(0, 1) (t_{n−1} for 30 ≤ n ≤ 100)
  H₀: μ = μ₀ vs. H₁: μ ≠ μ₀: reject if |T| > z_{α/2} (t_{n−1;α/2})
  H₀: μ ≤ μ₀ vs. H₁: μ > μ₀: reject if T > z_α (t_{n−1;α})
  H₀: μ ≥ μ₀ vs. H₁: μ < μ₀: reject if T < −z_α (−t_{n−1;α})

binomial distribution, n small:
  T = number of successes, under H₀: T ∼ B(n, p₀)
  H₀: p = p₀ vs. H₁: p ≠ p₀: reject if T ∉ [c_{1−α/2}, c_{α/2}]
  H₀: p ≤ p₀ vs. H₁: p > p₀: reject if T > c_α
  H₀: p ≥ p₀ vs. H₁: p < p₀: reject if T < c_{1−α}
  (c_γ denotes the percentage points of B(n, p₀))

binomial distribution, n ≥ 100 or n·p̂·(1 − p̂) > 5:
  T = (p̂ − p₀)/√(p₀(1 − p₀)/n), under H₀: T approx. N(0, 1)
  H₀: p = p₀ vs. H₁: p ≠ p₀: reject if |T| > z_{α/2}
  H₀: p ≤ p₀ vs. H₁: p > p₀: reject if T > z_α
  H₀: p ≥ p₀ vs. H₁: p < p₀: reject if T < −z_α

normal distribution, test for the variance:
  T = (n − 1)·S²/σ₀² (for known expectation μ: T = n·S̃²/σ₀²), under H₀: T ∼ χ²_{n−1} (χ²_n for known μ)
  H₀: σ² = σ₀² vs. H₁: σ² ≠ σ₀²: reject if T ∉ [χ²_{n−1;1−α/2}, χ²_{n−1;α/2}]
  H₀: σ² ≤ σ₀² vs. H₁: σ² > σ₀²: reject if T > χ²_{n−1;α}
  H₀: σ² ≥ σ₀² vs. H₁: σ² < σ₀²: reject if T < χ²_{n−1;1−α}

Here S² = (1/(n − 1))·Σᵢ₌₁ⁿ (Xᵢ − X̄)² and, for known expectation μ, S̃² = (1/n)·Σᵢ₌₁ⁿ (Xᵢ − μ)².
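In R, the binomial rows of this table correspond to binom.test (exact test, small n) and prop.test (normal approximation). A short sketch with hypothetical counts (x, n, and p.0 below are illustrative assumptions):

program (sketch):
x <- 12; n <- 20; p.0 <- 0.5;                                          # hypothetical data
binom.test(x, n, p = p.0, alternative = "two.sided");                  # exact test based on B(n, p.0)
prop.test(x, n, p = p.0, alternative = "two.sided", correct = FALSE);  # normal approximation (reports the squared z statistic)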
Example: performance of new golf clubs; ratio of the outgoing velocity of a golf ball to the incoming velocity (coefficient of restitution).
0.8411 0.8580 0.8042
0.8191 0.8532 0.8730
0.8182 0.8483 0.8282
0.8125 0.8276 0.8359
0.8750 0.7983 0.8660
Test problem: H₀: μ ≤ 0.82 against H₁: μ > 0.82
Since n = 15, x̄ = 0.83724, and s = 0.0245571, the test statistic equals
  T = (x̄ − 0.82)/(s/√15) = 2.718979.
Assuming normality the critical value is t_{14;0.05} = 1.761. Because T exceeds this value, the null hypothesis is rejected.
program:
CoR <- c(0.8411, 0.8191, 0.8182, 0.8125, 0.8750, 0.8580, 0.8532, 0.8483, 0.8276, 0.7983, 0.8042, 0.8730, 0.8282, 0.8359, 0.8660);
# box plot #
boxplot(CoR, range = 0, ylab = "Coefficient of Restitution");
# histogram #
hist(CoR, breaks = 8, freq = FALSE, main = "Histogram", xlab = "Coefficient of Restitution", ylab = "");
program:
# normal qq plot #
qqnorm(CoR, datax = TRUE, main = "Normal QQ Plot"); qqline(CoR, datax = TRUE);
# one-sample t-test #
t.test(CoR, alternative = "greater", mu = 0.82, conf.level = 0.95); #-> t test for mu.0 = 0.82 (one-sided) including a confidence bound
# alternative procedure: direct calculation of the quantities #
mu.0 <- 0.82;
n <- length(CoR);
T <- (mean(CoR) - mu.0) / sd(CoR) * sqrt(n); #-> test statistic
p.value <- 1 - pt(T, n - 1); #-> p-value directly
output: [figures omitted: box plot, histogram, and normal QQ plot of the coefficient-of-restitution data]
The p-Value Approach
The p-value α* is the probability, computed under H₀, of observing a value of the test statistic at least as extreme as the one obtained from the given data set. Thus it is the smallest level of significance that would lead to rejection of the null hypothesis H₀ for the given data; it can be considered as the observed significance level. For the one-sided Gauss test (H₁: μ > μ₀) it holds that α* = P_{μ₀}(T > t) = 1 − Φ(t), where t is the observed value of the test statistic.
[Figure omitted: density of the test statistic under H₀ for the one-sided Gauss test with α = 0.05; the observed value t yields the p-value α* = 0.0179.]
The smaller the p-value α*, the stronger the evidence against H₀. If an upper bound α for the type I error is given, then H₀ is rejected if α* < α; otherwise H₀ is not rejected.
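In R the p-value of the one-sided Gauss test is obtained directly from the standard normal distribution function. A sketch (the observed value t = 2.1 is an assumption, chosen so that it reproduces the α* ≈ 0.0179 quoted above):

program (sketch):
t <- 2.1;                        # observed value of the test statistic (assumed)
alpha.star <- 1 - pnorm(t);      # p-value alpha* = 1 - Phi(t), approx. 0.0179
alpha.star < 0.05;               # TRUE: reject H0 at level alpha = 0.05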
Relationship between tests of significance and confidence intervals
To illustrate the relationship, suppose that X ∼ N(μ, σ²). The sample variables X₁, …, Xₙ are assumed to be independent and identically distributed. We are interested in statements about the expectation μ.
Suppose that a confidence interval for μ with level 1 − α is given. Then

  ci = [X̄ − t_{n−1;α/2}·S/√n, X̄ + t_{n−1;α/2}·S/√n].

In order to deal with the test problem H₀: μ = μ₀ against H₁: μ ≠ μ₀, it is sufficient to check whether μ₀ ∈ ci. If μ₀ ∉ ci, then H₁ is accepted; otherwise H₀ is not rejected. This procedure is equivalent to the t test.
Conversely, suppose that the t test for the test problem H₀: μ = μ₀ against H₁: μ ≠ μ₀ is given. Then the test statistic is equal to T = √n·(X̄ − μ₀)/S. Consequently

  P_{μ₀}(|T| ≤ t_{n−1;α/2}) = 1 − α = P_{μ₀}(μ₀ ∈ ci)

with ci as above, i.e. ci is a confidence interval for μ with confidence level 1 − α. Thus the test directly provides a confidence interval.
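This equivalence can be checked numerically in R. A sketch reusing CoR and mu.0 from the golf-club program above (with a two-sided test at level 0.05):

program (sketch):
out <- t.test(CoR, mu = mu.0, alternative = "two.sided", conf.level = 0.95);
inside <- out$conf.int[1] <= mu.0 & mu.0 <= out$conf.int[2];   # is mu.0 in the confidence interval?
identical(out$p.value < 0.05, !inside);                        # TRUE: reject exactly when mu.0 lies outside ci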
Overview: Statistical Inference for a Single Sample
1.4 Statistical Inference for Two Samples
Suppose that X₁ and X₂ are independent characteristics. Let X_{11}, …, X_{1n₁} be a random sample of X₁ and let X_{21}, …, X_{2n₂} be a random sample of X₂.
Two Independent Samples
Assumption: Suppose that X_{11}, …, X_{1n₁}, X_{21}, …, X_{2n₂} are independent. Let E(Xᵢ) = μᵢ and Var(Xᵢ) = σᵢ² for i = 1, 2.
Test on Difference in Means
Test problem: H₀: μ₁ = μ₂ against H₁: μ₁ ≠ μ₂
Test of Equality of the Variances
Test problem: H₀: σ₁² = σ₂² against H₁: σ₁² ≠ σ₂²
Tests for the Means and the Variances of a Bivariate Sample I
normal distribution, σ₁ and σ₂ known:
  T = (X̄₁ − X̄₂)/√(σ₁²/n₁ + σ₂²/n₂), under H₀: T ∼ N(0, 1)
  H₀: μ₁ = μ₂ vs. H₁: μ₁ ≠ μ₂: reject if |T| > z_{α/2}
  H₀: μ₁ ≤ μ₂ vs. H₁: μ₁ > μ₂: reject if T > z_α
  H₀: μ₁ ≥ μ₂ vs. H₁: μ₁ < μ₂: reject if T < −z_α

normal distribution, σ₁, σ₂ unknown with σ₁ = σ₂, small sample sizes:
  T = (X̄₁ − X̄₂)/√(S̃²·(1/n₁ + 1/n₂)), under H₀: T ∼ t_{n₁+n₂−2}
  H₀: μ₁ = μ₂ vs. H₁: μ₁ ≠ μ₂: reject if |T| > t_{n₁+n₂−2;α/2}
  H₀: μ₁ ≤ μ₂ vs. H₁: μ₁ > μ₂: reject if T > t_{n₁+n₂−2;α}
  H₀: μ₁ ≥ μ₂ vs. H₁: μ₁ < μ₂: reject if T < −t_{n₁+n₂−2;α}

normal distribution, σ₁, σ₂ unknown with σ₁ ≠ σ₂ (Welch test):
  T = (X̄₁ − X̄₂)/√(S₁²/n₁ + S₂²/n₂), under H₀: T approx. t_{df} with
  df = (1 + R)²/(R²/(n₁ − 1) + 1/(n₂ − 1)) and R = n₂·S₁²/(n₁·S₂²)
  H₀: μ₁ = μ₂ vs. H₁: μ₁ ≠ μ₂: reject if |T| > t_{df;α/2}
  H₀: μ₁ ≤ μ₂ vs. H₁: μ₁ > μ₂: reject if T > t_{df;α}
  H₀: μ₁ ≥ μ₂ vs. H₁: μ₁ < μ₂: reject if T < −t_{df;α}

with S_k² = (1/(n_k − 1))·Σᵢ₌₁^{n_k} (X_{ki} − X̄_k)², k = 1, 2, and the pooled variance
S̃² = ((n₁ − 1)·S₁² + (n₂ − 1)·S₂²)/(n₁ + n₂ − 2).
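The pooled two-sample t test (σ₁ = σ₂ unknown) above can be written out in a few lines; t.test(..., var.equal = TRUE) performs the same computation. A sketch with hypothetical vectors x1 and x2:

program (sketch):
x1 <- c(3, 7, 25, 10, 15); x2 <- c(48, 44, 40, 38, 33);       # hypothetical data
n1 <- length(x1); n2 <- length(x2);
s2.pooled <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2);
T <- (mean(x1) - mean(x2)) / sqrt(s2.pooled * (1 / n1 + 1 / n2));
2 * (1 - pt(abs(T), n1 + n2 - 2));                            # two-sided p-value
t.test(x1, x2, var.equal = TRUE)$p.value;                     # same result from t.test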
Tests for the Means and the Variances of a Bivariate Sample II
arbitrary distribution, σ₁, σ₂ unknown, large sample sizes:
  T = (X̄₁ − X̄₂)/√(S₁²/n₁ + S₂²/n₂), under H₀: T approx. N(0, 1)
  H₀: μ₁ = μ₂ vs. H₁: μ₁ ≠ μ₂: reject if |T| > z_{α/2}
  H₀: μ₁ ≤ μ₂ vs. H₁: μ₁ > μ₂: reject if T > z_α
  H₀: μ₁ ≥ μ₂ vs. H₁: μ₁ < μ₂: reject if T < −z_α

binomial distribution, large sample sizes:
  T = (p̂₁ − p̂₂)/√(p̂·(1 − p̂)·(1/n₁ + 1/n₂)) with p̂ = (n₁·p̂₁ + n₂·p̂₂)/(n₁ + n₂), under H₀: T approx. N(0, 1)
  H₀: p₁ = p₂ vs. H₁: p₁ ≠ p₂: reject if |T| > z_{α/2}
  H₀: p₁ ≤ p₂ vs. H₁: p₁ > p₂: reject if T > z_α
  H₀: p₁ ≥ p₂ vs. H₁: p₁ < p₂: reject if T < −z_α

normal distribution, test for the variances:
  T = S₁²/S₂², under H₀: T ∼ F_{n₁−1,n₂−1}
  H₀: σ₁² = σ₂² vs. H₁: σ₁² ≠ σ₂²: reject if T > F_{n₁−1,n₂−1;α/2} or T < F_{n₁−1,n₂−1;1−α/2}
  H₀: σ₁² ≤ σ₂² vs. H₁: σ₁² > σ₂²: reject if T > F_{n₁−1,n₂−1;α}
  H₀: σ₁² ≥ σ₂² vs. H₁: σ₁² < σ₂²: reject if T < F_{n₁−1,n₂−1;1−α}
Example: arsenic in drinking water; drinking water arsenic concentration in parts per billion (ppb) for 10 metropolitan Phoenix communities and 10 communities in rural Arizona.
Metro Phoenix          Rural Arizona
Phoenix, 3             Rimrock, 48
Chandler, 7            Goodyear, 44
Gilbert, 25            New River, 40
Glendale, 10           Apache Junction, 38
Mesa, 15               Buckeye, 33
Paradise Valley, 6     Nogales, 21
Peoria, 12             Black Canyon City, 20
Scottsdale, 25         Sedona, 12
Tempe, 15              Payson, 1
Sun City, 7            Casa Grande, 18
program:
# arsenic concentration #
MetroPhoenix <- c(3, 7, 25, 10, 15, 6, 12, 25, 15, 7); RuralArizona <- c(48, 44, 40, 38, 33, 21, 20, 12, 1, 18); ArsenicConcentration <- cbind(MetroPhoenix, RuralArizona);
program:
# box plot #
boxplot(ArsenicConcentration);
# normal qq plot #
qqnorm(MetroPhoenix, datax = TRUE, main = "Normal QQ Plot: Metro Phoenix"); qqline(MetroPhoenix, datax = TRUE);
qqnorm(RuralArizona, datax = TRUE, main = "Normal QQ Plot: Rural Arizona"); qqline(RuralArizona, datax = TRUE);
# test for variance homogeneity #
var.test(MetroPhoenix, RuralArizona, ratio = 1, conf.level = 0.95);
# unpaired ttest #
t.test(MetroPhoenix, RuralArizona, alternative = "two.sided", paired = FALSE, var.equal = FALSE, conf.level = 0.95);
output: [figures omitted: box plots and normal QQ plots of the two arsenic samples]
output:
F test to compare two variances
data: MetroPhoenix and RuralArizona