
Industrial Statistics

1. Introduction to Statistical Inference


1.1 Overview

The aim of statistical inference is to make decisions and draw conclusions about populations.

step 1: selection of a suitable distribution family

problem: the characteristic X has distribution function F, but F is unknown. Frequently the user has prior information about F. It is assumed that F ∈ F = {F_θ : θ ∈ Θ}. Θ is called the parameter set.

Example:

X = quality of produced bulbs (1 if intact, else 0). Here X ~ B(1, p) with p = P(X = 1), thus θ = p, Θ = (0, 1), and F = {B(1, p) : p ∈ (0, 1)}.

X = body height. Various studies have shown that body height is roughly normally distributed. Thus X ~ N(µ, σ²), θ = (µ, σ) and Θ = IR × (0, ∞).

Note that if X is a continuous variable then the true distribution is usually not a member of the selected distribution family. These distributions only provide an approximation.

step 2: drawing of a sample

In order to draw conclusions on θ, data x₁, …, xₙ are collected. The set of all possible samples is called the sample space.

basic idea: x₁, …, xₙ are considered to be realizations of the random sample X₁, …, Xₙ from X. X₁, …, Xₙ have the same distribution as X (identically distributed).

In many cases it is assumed that the variables X₁, …, Xₙ are independent and identically distributed (briefly: i.i.d.).

major areas of statistical inference: parameter estimation, confidence intervals, and hypothesis testing


1.2 Conﬁdence Intervals


Assume that a random sample X₁, …, Xₙ is given with Xᵢ ~ F_θ, θ ∈ Θ.

aim: derivation of a region (interval) which contains the parameter θ with a given probability

Let α ∈ (0, 1). Suppose that L and U only depend on the sample variables X₁, …, Xₙ. If

P_θ(L ≤ θ ≤ U) ≥ 1 − α  for all θ ∈ Θ   (*)

then the interval [L, U] is called a two-sided 100(1 − α)% confidence interval for θ. L is called the lower confidence limit, U the upper confidence limit, and 1 − α is the confidence coefficient. If both sides in (*) are equal then the confidence interval is called exact. In practice α is usually chosen equal to 0.1, 0.05 or 0.01.

interpretation: Because U − L should be small it is desirable to have exact confidence intervals.

[L, ∞) is called a one-sided lower 100(1 − α)% confidence interval for θ if P_θ(L ≤ θ) ≥ 1 − α for all θ, and (−∞, U] is called a one-sided upper 100(1 − α)% confidence interval for θ if P_θ(θ ≤ U) ≥ 1 − α for all θ.
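The defining property (*) can be checked by simulation; a minimal sketch in R, assuming a normal population with known variance (all numbers are arbitrary choices for illustration, not from the slides):

```r
# Monte Carlo check of property (*): simulate many samples from N(mu, sigma^2)
# and count how often the two-sided interval covers the true mu.
set.seed(1);
alpha <- 0.05; n <- 20; mu <- 5; sigma <- 2;
covered <- replicate(10000, {
  x <- rnorm(n, mean = mu, sd = sigma);
  half <- qnorm(1 - alpha / 2) * sigma / sqrt(n);
  (mean(x) - half <= mu) && (mu <= mean(x) + half);
});
mean(covered);   # close to the confidence coefficient 1 - alpha = 0.95
```

The empirical coverage fluctuates around 1 − α, as (*) requires for an exact interval.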

Example:

risk behavior of a financial investment → upper c.i.
tear strength of a rope → lower c.i.



1.2.1 Conﬁdence Intervals for the Parameters of a Normal Distribution

Suppose that the sample variables X₁, …, Xₙ are i.i.d. with Xᵢ ~ N(µ, σ²) for i = 1, …, n.

aim: confidence interval for µ if σ is known

development of the confidence interval: first estimate µ by X̄. Since X̄ ~ N(µ, σ²/n) the following structure is chosen for the confidence interval

[X̄ − c σ/√n, X̄ + c σ/√n]

with c > 0. c is chosen as a function of α such that (*) is valid. Note that

µ ∈ [X̄ − c σ/√n, X̄ + c σ/√n]  ⟺  √n |X̄ − µ|/σ ≤ c.

Since √n (X̄ − µ)/σ ~ N(0, 1), the quantity c is determined such that

P_µ(√n |X̄ − µ|/σ ≤ c) = 2Φ(c) − 1 = 1 − α.

Consequently c = Φ⁻¹(1 − α/2) = z_{α/2}. z_{α/2} is the upper 100α/2% percentage point of the standard normal distribution.

100(1 − α)% confidence interval for µ (σ known):

[X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n]
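The constant c = z_{α/2} in this interval can be checked numerically in R (a quick verification, not part of the original slides):

```r
# numerical check of the defining equation 2 * Phi(c) - 1 = 1 - alpha
alpha <- 0.05;
c.val <- qnorm(1 - alpha / 2);   # c = z_{alpha/2}, approximately 1.96
2 * pnorm(c.val) - 1;            # gives back 1 - alpha = 0.95
```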



now: confidence interval for µ if σ is unknown

100(1 − α)% confidence interval for µ (σ unknown):

[X̄ − t_{n−1;α/2} S/√n, X̄ + t_{n−1;α/2} S/√n]

with t_{n−1;α/2} = F⁻¹_{t_{n−1}}(1 − α/2), the upper 100α/2% percentage point of the t distribution with n − 1 degrees of freedom.

Example: mean annual rainfall (in millimeters) in Australia from 1983 to 2002:

1983: 499.2  1984: 555.2  1985: 398.8  1986: 391.9  1987: 453.4
1988: 459.8  1989: 483.7  1990: 417.6  1991: 469.2  1992: 452.4
1993: 499.3  1994: 340.6  1995: 522.8  1996: 469.9  1997: 527.2
1998: 565.5  1999: 584.1  2000: 727.3  2001: 558.6  2002: 338.6

It is n = 20, x̄ = 485.755, s = 90.33872, and t_{19;0.025} = 2.093. Thus the confidence interval is given by

[485.755 − 42.27934, 485.755 + 42.27934] = [443.4757, 528.0343].

program:


year <- c(1983:2002); rain <- c(499.2, 555.2, 398.8, 391.9, 453.4, 459.8, 483.7, 417.6, 469.2, 452.4, 499.3, 340.6, 522.8, 469.9, 527.2, 565.5, 584.1, 727.3, 558.6, 338.6);

# use setwd() (set working directory) and getwd() (get working directory))

#--- histogram ---#

hist(rain, breaks = 8, freq = FALSE, main = "Histogram", xlab = "Mean Annual Rainfall", ylab = "");

#--- box plot ---#

boxplot(rain, range = 0, ylab = "Mean Annual Rainfall");

#--- normal qq plot ---#

qqnorm(rain, datax = TRUE, main = "Normal QQ Plot"); qqline(rain, datax = TRUE);

output: histogram, box plot, and normal QQ plot of the mean annual rainfall (figures omitted).

program:

#--- confidence interval ---#


alpha <- 0.05; n <- length(rain); lcl <- mean(rain) - qt(1 - alpha / 2, n - 1) * sd(rain) / sqrt(n); ucl <- mean(rain) + qt(1 - alpha / 2, n - 1) * sd(rain) / sqrt(n); ci <- c(lcl, ucl); print(ci);

output: [443.4752, 528.0348]


now: confidence interval for σ² if µ is unknown

100(1 − α)% confidence interval for σ²:

[(n − 1)S²/χ²_{n−1;α/2}, (n − 1)S²/χ²_{n−1;1−α/2}]
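A sketch in R of this χ² interval (my own illustration, not from the slides), applied to the rainfall data of the previous example:

```r
# 95% confidence interval for sigma^2 based on the chi-square distribution
rain <- c(499.2, 555.2, 398.8, 391.9, 453.4, 459.8, 483.7, 417.6, 469.2, 452.4,
          499.3, 340.6, 522.8, 469.9, 527.2, 565.5, 584.1, 727.3, 558.6, 338.6);
alpha <- 0.05; n <- length(rain);
lcl <- (n - 1) * var(rain) / qchisq(1 - alpha / 2, n - 1);
ucl <- (n - 1) * var(rain) / qchisq(alpha / 2, n - 1);
c(lcl, ucl);   # interval for sigma^2; take square roots for sigma
```

Note that the interval is not symmetric around S², because the χ² distribution is skewed.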


1.2.2 Large-Sample Conﬁdence Intervals

Using the central limit theorem, large-sample confidence intervals for arbitrary distributions (discrete or continuous) can be derived. This means that

lim_{n→∞} P_θ(L(X₁, …, Xₙ) ≤ θ ≤ U(X₁, …, Xₙ)) ≥ 1 − α  for all θ ∈ Θ.

Example: Suppose that X₁, X₂, … are i.i.d. with E(Xᵢ) = µ for all i ≥ 1.

i) confidence interval for µ if σ² = Var(Xᵢ) is known

large-sample confidence interval for µ (σ known):

[X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n]

rule of thumb: n ≥ 30

ii) confidence interval for µ if σ² = Var(Xᵢ) is unknown

large-sample confidence interval for µ (σ unknown):

[X̄ − z_{α/2} S/√n, X̄ + z_{α/2} S/√n]

rule of thumb: n ≥ 40


example: mercury contamination in largemouth bass (in ppm) - a sample of fish was selected from 53 Florida lakes

1.230 1.330 0.040 0.044 1.200 0.270
0.490 0.190 0.830 0.810 0.710 0.500
0.490 1.160 0.050 0.150 0.190 0.770
1.080 0.980 0.630 0.560 0.410 0.730
0.590 0.340 0.340 0.840 0.500 0.340
0.280 0.340 0.750 0.870 0.560 0.170
0.180 0.190 0.040 0.490 1.100 0.160
0.100 0.210 0.860 0.520 0.650 0.270
0.940 0.400 0.430 0.250 0.270

It holds that n = 53, x̄ = 0.5249811, s = 0.3486250, and z_{0.025} = 1.96. Thus the asymptotic confidence interval is equal to [0.4311, 0.6188].

program:

#--- data ---#

MerCon <- c(1.230, 1.330, 0.040, 0.044, 1.200, 0.270, 0.490, 0.190, 0.830, 0.810, 0.710, 0.500, 0.490, 1.160, 0.050, 0.150, 0.190, 0.770, 1.080, 0.980, 0.630, 0.560, 0.410, 0.730, 0.590, 0.340, 0.340, 0.840, 0.500, 0.340, 0.280, 0.340, 0.750, 0.870, 0.560, 0.170, 0.180, 0.190, 0.040, 0.490, 1.100, 0.160, 0.100, 0.210, 0.860, 0.520, 0.650, 0.270, 0.940, 0.400, 0.430, 0.250, 0.270);

#--- histogram ---#

hist(MerCon, breaks = 14, freq = FALSE, main = "Histogram", xlab = "Concentration", ylab = "");

#--- normal qq plot ---#

qqnorm(MerCon, datax = TRUE, main = "Normal QQ Plot"); qqline(MerCon, datax = TRUE);

output:

histogram and normal QQ plot of the mercury concentration (figures omitted).
program:

#--- confidence interval ---#


alpha <- 0.05; n <- length(MerCon); lcl <- mean(MerCon) - qnorm(1 - alpha / 2) * sd(MerCon) / sqrt(n); ucl <- mean(MerCon) + qnorm(1 - alpha / 2) * sd(MerCon) / sqrt(n); ci <- c(lcl, ucl); print(ci);

output: ci = [0.4311, 0.6188]


Overview: Confidence Intervals

distribution | θ | confidence interval
arbitrary distr., n large | µ | X̄ − z_{α/2} σ/√n ≤ µ ≤ X̄ + z_{α/2} σ/√n (or S instead of σ if σ unknown)
B(1, p), n very large | p | X̄ − z_{α/2} √(X̄(1−X̄)/n) ≤ p ≤ X̄ + z_{α/2} √(X̄(1−X̄)/n)
B(1, p), n large | p | n[X̄ + z²_{α/2}/(2n) − B]/(n + z²_{α/2}) ≤ p ≤ n[X̄ + z²_{α/2}/(2n) + B]/(n + z²_{α/2}), with B = z_{α/2} √(X̄(1−X̄)/n + (z_{α/2}/(2n))²)
N(µ, σ²), σ² known | µ | X̄ − z_{α/2} σ/√n ≤ µ ≤ X̄ + z_{α/2} σ/√n
N(µ, σ²), σ² unknown | µ | X̄ − t_{n−1;α/2} S/√n ≤ µ ≤ X̄ + t_{n−1;α/2} S/√n
N(µ, σ²), µ known | σ² | n S̃²/χ²_{n;α/2} ≤ σ² ≤ n S̃²/χ²_{n;1−α/2}
N(µ, σ²), µ unknown | σ² | (n − 1)S²/χ²_{n−1;α/2} ≤ σ² ≤ (n − 1)S²/χ²_{n−1;1−α/2}
2-dim. normal distr. | ρ | ρ̂ − z_{α/2} (1 − ρ̂²)/√n ≤ ρ ≤ ρ̂ + z_{α/2} (1 − ρ̂²)/√n

with S̃² = (1/n) Σᵢ₌₁ⁿ (Xᵢ − µ)²
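To illustrate the two B(1, p) rows, a sketch in R comparing the simple large-sample interval with the refined one; the counts (40 successes in n = 100 trials) are an assumed example, not from the slides:

```r
n <- 100; x.bar <- 40 / n;        # sample proportion
alpha <- 0.05; z <- qnorm(1 - alpha / 2);
# simple interval (row "n very large")
wald <- x.bar + c(-1, 1) * z * sqrt(x.bar * (1 - x.bar) / n);
# refined interval (row "n large"): n[X.bar + z^2/(2n) -+ B] / (n + z^2)
B <- z * sqrt(x.bar * (1 - x.bar) / n + (z / (2 * n))^2);
refined <- n * (x.bar + z^2 / (2 * n) + c(-1, 1) * B) / (n + z^2);
rbind(wald, refined);
```

The refined interval (Wilson form) is shifted slightly towards 1/2; the two intervals agree for very large n.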


1.3 Hypothesis Testing

1.3.1 Introduction

Let X be the characteristic of interest with X ~ F_θ, θ ∈ Θ. The test problem is given by H₀: θ ∈ Θ₀ against H₁: θ ∈ Θ₁. H₀ is called the null hypothesis and H₁ is called the alternative hypothesis. Θ₀ and Θ₁ are disjoint and Θ₀ ∪ Θ₁ = Θ. Thus a decision problem between two hypotheses, a so-called test problem, is present.

Example: burning rate of solid propellant - H₀: µ = 50 (centimeters per second) against H₁: µ ≠ 50

procedure: Based on the sample x₁, …, xₙ a decision is made in favor of H₀ or H₁. Such a decision rule is called a statistical test.

Table: Type I Error and Type II Error

decision \ reality | θ ∈ Θ₀ | θ ∈ Θ₁
H₀ not rejected | no error | type II error
H₁ accepted | type I error | no error


procedure:

An upper bound α for the type I error is fixed, e.g. α ∈ {0.01, 0.05, 0.1}. The critical region C for the test (reject H₀) is determined such that the type I error fulfills this condition.

Such a test is called a test of significance at level α for H₀ if the probability of a type I error is smaller than or equal to α, i.e.

P_θ((X₁, …, Xₙ) ∈ C) ≤ α  for all θ ∈ Θ₀.

α is called the significance level.

Because only the type I error and not the type II error is controlled by a test of signiﬁcance, the size of the type II error may be large. For that reason it is only possible to accept H 1 , i.e. to reject H 0 . It is incorrect to accept the null hypothesis H 0 .


1.3.2 Tests for Univariate Samples

Suppose that the random sample X₁, …, Xₙ is i.i.d. with Xᵢ ~ N(µ, σ²).

test problem: H₀: µ = µ₀ against H₁: µ ≠ µ₀

Gauss test (σ known):

√n |X̄ − µ₀|/σ > z_{α/2}  ⇒ accept H₁ (reject H₀)
√n |X̄ − µ₀|/σ ≤ z_{α/2}  ⇒ fail to reject H₀

t test (σ unknown):

√n |X̄ − µ₀|/S > t_{n−1;α/2}  ⇒ accept H₁ (reject H₀)
√n |X̄ − µ₀|/S ≤ t_{n−1;α/2}  ⇒ fail to reject H₀

test problem: H₀: σ² = σ₀² against H₁: σ² ≠ σ₀²

H₀ is rejected if

(n − 1)S²/σ₀² < χ²_{n−1;1−α/2}  or  (n − 1)S²/σ₀² > χ²_{n−1;α/2}.
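As a numerical illustration (my own, not from the slides), the two-sided t test applied to the rainfall data of Section 1.2.1 with the hypothetical value µ₀ = 500:

```r
# two-sided one-sample t test of H0: mu = 500 for the rainfall data
rain <- c(499.2, 555.2, 398.8, 391.9, 453.4, 459.8, 483.7, 417.6, 469.2, 452.4,
          499.3, 340.6, 522.8, 469.9, 527.2, 565.5, 584.1, 727.3, 558.6, 338.6);
mu.0 <- 500; alpha <- 0.05; n <- length(rain);
T <- sqrt(n) * abs(mean(rain) - mu.0) / sd(rain);   # test statistic
T > qt(1 - alpha / 2, n - 1);   # TRUE: accept H1; FALSE: fail to reject H0
```

Here the statistic (about 0.71) stays below t_{19;0.025} = 2.093, so H₀ is not rejected.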


Example: Power function G(µ) = P_µ(√n |X̄ − µ₀|/σ > z_{α/2}) of the two-sided Gauss test (i.e. the probability to accept H₁ as a function of µ) for α = 0.05, n = 5, σ = 1 and µ₀ = 0:
(figure omitted: plot of G(µ) for µ ∈ [−3, 3]; the curve equals α = 0.05 at µ = µ₀ = 0, where H₀ holds, and increases towards 1 in both H₁ directions)
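Since √n(X̄ − µ₀)/σ ~ N(δ, 1) with δ = √n(µ − µ₀)/σ, the power function has the closed form G(µ) = 1 − Φ(z_{α/2} − δ) + Φ(−z_{α/2} − δ). A sketch in R reproducing the curve for the parameters above:

```r
# power function of the two-sided Gauss test
G <- function(mu, mu.0 = 0, sigma = 1, n = 5, alpha = 0.05) {
  delta <- sqrt(n) * (mu - mu.0) / sigma;
  z <- qnorm(1 - alpha / 2);
  1 - pnorm(z - delta) + pnorm(-z - delta);
}
G(0);   # equals alpha = 0.05 at mu = mu.0
curve(G, from = -3, to = 3, xlab = "mu", ylab = "G(mu)");
```

G(µ₀) = α by construction, and G(µ) → 1 as |µ − µ₀| grows.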


Large-Sample Tests

Suppose that the variables X₁, X₂, … are i.i.d. with E(Xᵢ) = µ and Var(Xᵢ) = σ² for i = 1, …, n.

Because in most cases the distribution of the underlying characteristic X is unknown, approximations to the critical values are determined using the asymptotic distribution of the test statistic.

test problem: H₀: µ = µ₀ against H₁: µ ≠ µ₀

The null hypothesis H₀ is rejected if √n |X̄ − µ₀|/S > z_{α/2}.

rule of thumb: n > 100; for 30 ≤ n ≤ 100 use t_{n−1;α/2} instead of z_{α/2}


Tests for the Mean and the Variance of a Single Sample

The random sample X₁, …, Xₙ is assumed to be i.i.d.

distribution | H₀ | H₁ | test statistic T | T under H₀ | reject H₀ if
normal distr., σ known | µ = µ₀ / µ ≤ µ₀ / µ ≥ µ₀ | µ ≠ µ₀ / µ > µ₀ / µ < µ₀ | (X̄ − µ₀)/(σ/√n) | N(0, 1) | |T| > z_{α/2} / T > z_α / T < −z_α
normal distr., σ unknown | µ = µ₀ / µ ≤ µ₀ / µ ≥ µ₀ | µ ≠ µ₀ / µ > µ₀ / µ < µ₀ | (X̄ − µ₀)/(S/√n) | t_{n−1} (N(0, 1) for n > 100) | |T| > t_{n−1;α/2} (z_{α/2}) / T > t_{n−1;α} (z_α) / T < −t_{n−1;α} (−z_α)
arbitrary distr. with expectation µ | µ = µ₀ / µ ≤ µ₀ / µ ≥ µ₀ | µ ≠ µ₀ / µ > µ₀ / µ < µ₀ | (X̄ − µ₀)/(S/√n) | approx. N(0, 1) (t_{n−1} for 30 ≤ n ≤ 100) | |T| > z_{α/2} (t_{n−1;α/2}) / T > z_α (t_{n−1;α}) / T < −z_α (−t_{n−1;α})
binomial distr., n small | p = p₀ / p ≤ p₀ / p ≥ p₀ | p ≠ p₀ / p > p₀ / p < p₀ | number of successes | B(n, p₀) | T ∉ [c_{1−α/2}, c_{α/2}] / T > c_α / T < c_{1−α} (percentage points of B(n, p₀))
binomial distr., n ≥ 100 or n p̂(1 − p̂) > 5 | p = p₀ / p ≤ p₀ / p ≥ p₀ | p ≠ p₀ / p > p₀ / p < p₀ | (p̂ − p₀)/√(p₀(1 − p₀)/n) | approx. N(0, 1) | |T| > z_{α/2} / T > z_α / T < −z_α
normal distr. | σ² = σ₀² / σ² ≤ σ₀² / σ² ≥ σ₀² | σ² ≠ σ₀² / σ² > σ₀² / σ² < σ₀² | (n − 1)S²/σ₀² (n S̃²/σ₀² for known expectation µ *) | χ²_{n−1} (χ²_n *) | T ∉ [χ²_{n−1;1−α/2}, χ²_{n−1;α/2}] / T > χ²_{n−1;α} / T < χ²_{n−1;1−α}

with S² = 1/(n − 1) Σᵢ₌₁ⁿ (Xᵢ − X̄)² and S̃² = (1/n) Σᵢ₌₁ⁿ (Xᵢ − µ)²
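For the binomial row with small n, the percentage points come from B(n, p₀) itself; R's binom.test implements this exact test. A sketch with assumed counts (3 successes in n = 20 trials, p₀ = 0.05), not taken from the slides:

```r
# exact test of H0: p = 0.05 against H1: p != 0.05 based on T ~ B(n, p0)
res <- binom.test(x = 3, n = 20, p = 0.05, alternative = "two.sided");
res$p.value;   # compare with the chosen significance level alpha
```

The reported p-value is computed directly from the B(20, 0.05) probabilities, so no normal approximation is involved.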


Example: performance of new golf clubs - ratio of outgoing velocity of a golf ball to the incoming velocity (coefficient of restitution)

0.8411 0.8191 0.8182 0.8125 0.8750
0.8580 0.8532 0.8483 0.8276 0.7983
0.8042 0.8730 0.8282 0.8359 0.8660

test problem: H₀: µ ≤ 0.82 against H₁: µ > 0.82

Since n = 15, x̄ = 0.83724, s = 0.0245571 the test statistic is equal to

T = √15 (x̄ − 0.82)/s = 2.718979.

Assuming normality the percentile is t_{14;0.05} = 1.761. Because T is larger than this value, the null hypothesis is rejected.

program:

CoR <- c(0.8411, 0.8191, 0.8182, 0.8125, 0.8750, 0.8580, 0.8532, 0.8483, 0.8276, 0.7983, 0.8042, 0.8730, 0.8282, 0.8359, 0.8660);

#--- box plot ---#

boxplot(CoR, range = 0, ylab = "Coefficient of Restitution");

#--- histogram ---#

hist(CoR, breaks = 8, freq = FALSE, main = "Histogram", xlab = "Coefficient of Restitution", ylab = "");

program:


#--- normal qq plot ---#


qqnorm(CoR, datax = TRUE, main = "Normal QQ Plot"); qqline(CoR, datax = TRUE);

#--- one-sample t-test ---#

t.test(CoR, alternative = "greater", mu = 0.82, conf.level = 0.95); #---> t-test for mu.0 = 0.82 including confidence interval

#--- alternative procedure: direct calculation of the quantities ---#

mu.0 <- 0.82;

n <- length(CoR);

T <- (mean(CoR) - mu.0) / sd(CoR) * sqrt(n); #---> test statistic
p.value <- 1 - pt(T, n - 1); #---> p-value of the one-sided test

output: box plot, histogram, and normal QQ plot of the coefficient of restitution (figures omitted).


The P-Value Approach

The p-value is the smallest level of significance that would lead to rejection of the null hypothesis H₀ for the given data. It can be considered as the observed significance level. For the one-sided Gauss test with observed statistic t it holds that p = P_{µ₀}(T > t) = 1 − Φ(t).

(figure omitted: one-sided Gauss test with α = 0.05; the rejection region starts at z_α, and the tail area to the right of the observed value t is the p-value, here p = 0.0179)

The smaller the p-value, the more unlikely is H₀.

If an upper bound α for the type I error is given then H₀ is rejected if p < α. Else, H₀ is not rejected.
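In R the p-value of the one-sided Gauss test is a one-liner; with the observed statistic t = 2.1 (an assumed value) it reproduces the value p = 0.0179 quoted above:

```r
t.obs <- 2.1;                 # observed test statistic (assumed)
p <- 1 - pnorm(t.obs);        # p-value of the one-sided Gauss test
p;                            # approximately 0.0179 < alpha = 0.05, so reject H0
```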


Relationship between tests of signiﬁcance and conﬁdence intervals

To illustrate the relationship suppose that X N ( µ, 2 ) . The sample variables are assumed to be independent and identically distributed. We are interested in statements about the expectation.

Suppose that a confidence interval for µ with level 1 − α is given. Then

ci = [X̄ − t_{n−1;α/2} S/√n, X̄ + t_{n−1;α/2} S/√n].

In order to deal with the test problem H₀: µ = µ₀ against H₁: µ ≠ µ₀, it is sufficient to check whether µ₀ ∈ ci. If this is not the case then H₁ is accepted. This procedure is equivalent to the t-test.

Suppose that the t-test for the test problem H₀: µ = µ₀ against H₁: µ ≠ µ₀ is given. The test statistic is equal to

T = √n (X̄ − µ₀)/S.

Consequently

P(|T| ≤ t_{n−1;α/2}) = 1 − α = P_{µ₀}(µ₀ ∈ ci)

with ci as above. ci is a confidence interval for µ with confidence level 1 − α. Thus the test directly provides a confidence interval.
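The equivalence can be verified with t.test, whose output contains both the p-value and the confidence interval; a sketch with the rainfall data of Section 1.2.1 and the hypothetical value µ₀ = 500:

```r
rain <- c(499.2, 555.2, 398.8, 391.9, 453.4, 459.8, 483.7, 417.6, 469.2, 452.4,
          499.3, 340.6, 522.8, 469.9, 527.2, 565.5, 584.1, 727.3, 558.6, 338.6);
mu.0 <- 500; alpha <- 0.05;
tt <- t.test(rain, mu = mu.0, conf.level = 1 - alpha);
# mu.0 lies in the confidence interval exactly when the p-value exceeds alpha
c(tt$conf.int[1] <= mu.0 && mu.0 <= tt$conf.int[2], tt$p.value > alpha);
```

Both entries are TRUE here (or would both be FALSE for a µ₀ outside the interval), which is the stated equivalence.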



1.4 Statistical Inference for Two Samples

Suppose that X₁ and X₂ are independent characteristics. Let X₁₁, …, X₁ₙ₁ be a random sample of X₁ and let X₂₁, …, X₂ₙ₂ be a random sample of X₂.

Two Independent Samples

Assumption: Suppose that X₁₁, …, X₁ₙ₁, X₂₁, …, X₂ₙ₂ are independent. Let E(Xᵢ) = µᵢ and Var(Xᵢ) = σᵢ² for i = 1, 2.

Test on Difference in Means

test problem: H₀: µ₁ = µ₂ against H₁: µ₁ ≠ µ₂

Test of Equality of the Variances

test problem: H₀: σ₁² = σ₂² against H₁: σ₁² ≠ σ₂²


Tests for the Means and the Variances of a Bivariate Sample I

distribution | H₀ | H₁ | test statistic T | T under H₀ | reject H₀ if
normal distr., σ₁ and σ₂ known | µ₁ = µ₂ / µ₁ ≤ µ₂ / µ₁ ≥ µ₂ | µ₁ ≠ µ₂ / µ₁ > µ₂ / µ₁ < µ₂ | (X̄₁ − X̄₂)/√(σ₁²/n₁ + σ₂²/n₂) | N(0, 1) | |T| > z_{α/2} / T > z_α / T < −z_α
normal distr., σ₁, σ₂ unknown, σ₁ = σ₂, small sample size | µ₁ = µ₂ / µ₁ ≤ µ₂ / µ₁ ≥ µ₂ | µ₁ ≠ µ₂ / µ₁ > µ₂ / µ₁ < µ₂ | (X̄₁ − X̄₂)/√(S̃²(1/n₁ + 1/n₂)) | t_{n₁+n₂−2} | |T| > t_{n₁+n₂−2;α/2} / T > t_{n₁+n₂−2;α} / T < −t_{n₁+n₂−2;α}
normal distr., σ₁, σ₂ unknown, σ₁ ≠ σ₂ | µ₁ = µ₂ / µ₁ ≤ µ₂ / µ₁ ≥ µ₂ | µ₁ ≠ µ₂ / µ₁ > µ₂ / µ₁ < µ₂ | (X̄₁ − X̄₂)/√(S₁²/n₁ + S₂²/n₂) | t_{df} with df = (1 + R)²/(R²/(n₁ − 1) + 1/(n₂ − 1)), R = n₂S₁²/(n₁S₂²) | |T| > t_{df;α/2} / T > t_{df;α} / T < −t_{df;α}

with S_k² = 1/(n_k − 1) Σᵢ₌₁^{n_k} (X_{ki} − X̄_k)², k = 1, 2, and S̃² = [(n₁ − 1)S₁² + (n₂ − 1)S₂²]/(n₁ + n₂ − 2)
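A sketch checking the pooled statistic from the second row against R's t.test with var.equal = TRUE, using two small assumed samples (illustration only):

```r
x1 <- c(5.1, 4.8, 5.4, 5.0, 4.9); x2 <- c(4.2, 4.6, 4.4, 4.8);
n1 <- length(x1); n2 <- length(x2);
# pooled variance estimator S-tilde^2
s2.pool <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2);
T <- (mean(x1) - mean(x2)) / sqrt(s2.pool * (1 / n1 + 1 / n2));
T.ref <- t.test(x1, x2, var.equal = TRUE)$statistic;
c(T, T.ref);   # both give the same value
```

Under H₀ the statistic follows t with n₁ + n₂ − 2 = 7 degrees of freedom, as in the table.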


Tests for the Means and the Variances of a Bivariate Sample II

distribution | H₀ | H₁ | test statistic T | T under H₀ | reject H₀ if
arbitrary distr., σ₁, σ₂ unknown, large sample size | µ₁ = µ₂ / µ₁ ≤ µ₂ / µ₁ ≥ µ₂ | µ₁ ≠ µ₂ / µ₁ > µ₂ / µ₁ < µ₂ | (X̄₁ − X̄₂)/√(S₁²/n₁ + S₂²/n₂) | approx. N(0, 1) | |T| > z_{α/2} / T > z_α / T < −z_α
binomial distr., large sample size | p₁ = p₂ / p₁ ≤ p₂ / p₁ ≥ p₂ | p₁ ≠ p₂ / p₁ > p₂ / p₁ < p₂ | (p̂₁ − p̂₂)/√(p̂(1 − p̂)(1/n₁ + 1/n₂)) with p̂ = (n₁p̂₁ + n₂p̂₂)/(n₁ + n₂) | approx. N(0, 1) | |T| > z_{α/2} / T > z_α / T < −z_α
normal distr. | σ₁² = σ₂² / σ₁² ≤ σ₂² / σ₁² ≥ σ₂² | σ₁² ≠ σ₂² / σ₁² > σ₂² / σ₁² < σ₂² | S₁²/S₂² | F_{n₁−1,n₂−1} | T > F_{n₁−1,n₂−1;α/2} or T < F_{n₁−1,n₂−1;1−α/2} / T > F_{n₁−1,n₂−1;α} / T < F_{n₁−1,n₂−1;1−α}


Example: arsenic in drinking water - drinking water arsenic concentration in parts per billion (ppb) for 10 metropolitan Phoenix communities and 10 communities in rural Arizona

Metro Phoenix: Phoenix 3, Chandler 7, Gilbert 25, Glendale 10, Mesa 15, Peoria 12, Scottsdale 25, Tempe 15, Sun City 7
Rural Arizona: Rimrock 48, Goodyear 44, New River 40, Apache Junction 38, Buckeye 33, Black Canyon City 20, Sedona 12, Payson 1, Casa Grande 18

program:

#--- arsenic concentration ---#

MetroPhoenix <- c(3, 7, 25, 10, 15, 6, 12, 25, 15, 7); RuralArizona <- c(48, 44, 40, 38, 33, 21, 20, 12, 1, 18); ArsenicConcentration <- cbind(MetroPhoenix, RuralArizona); program:

#--- box plot ---#


boxplot(ArsenicConcentration);

#--- normal q-q plot ---#


qqnorm(MetroPhoenix, datax = TRUE, main = "Normal Q-Q Plot: Metro Phoenix"); qqline(MetroPhoenix, datax = TRUE);

qqnorm(RuralArizona, datax = TRUE, main = "Normal Q-Q Plot: Rural Arizona"); qqline(RuralArizona, datax = TRUE);

#--- test for variance homogeneity ---#

var.test(MetroPhoenix, RuralArizona, ratio = 1, conf.level = 0.95);

#--- unpaired t-test ---#

t.test(MetroPhoenix, RuralArizona, alternative = "two.sided", paired = FALSE, var.equal = FALSE, conf.level = 0.95);

output: box plots and normal Q-Q plots for Metro Phoenix and Rural Arizona (figures omitted).

output:


F test to compare two variances

data: MetroPhoenix and RuralArizona