
Industrial Statistics

1. Introduction to Statistical Inference


1.1 Overview

The aim of statistical inference is to make decisions and draw conclusions about populations.

step 1: selection of a suitable distribution family

problem: the characteristic X ~ F, but F is unknown. Frequently the user has prior information about F. It holds that F ∈ 𝓕 = { F_ϑ : ϑ ∈ Θ }. Θ is called the parameter set.

Example:

X = quality of produced bulbs (1 if intact, else 0). Here X ~ B(1, p) with p = P(X = 1), thus ϑ = p, Θ = (0, 1), and 𝓕 = { B(1, p) : p ∈ (0, 1) }.

X = body height. Various studies have shown that body height is roughly normally distributed. Thus X ~ N(µ, σ²), ϑ = (µ, σ) and Θ = ℝ × (0, ∞).

Note that if X is a continuous variable then the true distribution is usually not a member of the selected distribution family. These distributions only provide an approximation.

step 2: drawing of a sample

In order to draw conclusions on ϑ, data x₁, …, xₙ are collected. The set of all possible samples is called the sample space.

basic idea: x₁, …, xₙ are considered to be realizations of a random sample X₁, …, Xₙ from X. X₁, …, Xₙ have the same distribution as X (identically distributed). In many cases it is assumed that the variables X₁, …, Xₙ are independent and identically distributed (briefly: i.i.d.).

major areas of statistical inference: parameter estimation, confidence intervals, and hypothesis testing


1.2 Confidence Intervals


Assume that a random sample X₁, …, Xₙ is given with X_i ~ F_ϑ, ϑ ∈ Θ.

aim: derivation of a region (interval) which contains the parameter ϑ with a given probability

Let α ∈ (0, 1). Suppose that L and U depend only on the sample variables X₁, …, Xₙ. If

  P_ϑ( L ≤ ϑ ≤ U ) ≥ 1 − α  for all ϑ ∈ Θ   (*)

then the interval [L, U] is called a two-sided 100(1 − α)% confidence interval for ϑ. L is called the lower confidence limit, U the upper confidence limit, and 1 − α is the confidence coefficient. If both sides in (*) are equal then the confidence interval is called exact.

In practice α is usually chosen equal to 0.1, 0.05 or 0.01.


interpretation: Because U − L should be small it is desirable to have exact confidence intervals.

[L, ∞) is called a one-sided lower 100(1 − α)% confidence interval for ϑ if P_ϑ(L ≤ ϑ) ≥ 1 − α for all ϑ ∈ Θ, and (−∞, U] is called a one-sided upper 100(1 − α)% confidence interval for ϑ if P_ϑ(ϑ ≤ U) ≥ 1 − α for all ϑ ∈ Θ.


Example:

risk behavior of a financial investment → upper confidence interval; tear strength of a rope → lower confidence interval


1.2.1 Confidence Intervals for the Parameters of a Normal Distribution

Suppose that the sample variables X₁, …, Xₙ are i.i.d. with X_i ~ N(µ, σ²) for i = 1, …, n.

aim: confidence interval for µ if σ is known

development of the confidence interval: first estimate µ by X̄. Since X̄ ~ N(µ, σ²/n), the following structure is chosen for the confidence interval

  [ X̄ − c σ/√n , X̄ + c σ/√n ]

with c > 0. c is chosen as a function of α such that (*) is valid. Note that

  µ ∈ [ X̄ − c σ/√n , X̄ + c σ/√n ]  ⟺  √n |X̄ − µ| / σ ≤ c.

Since √n (X̄ − µ)/σ ~ N(0, 1), the quantity c is determined such that

  P_µ( √n |X̄ − µ| / σ ≤ c ) = 2Φ(c) − 1 = 1 − α.

Consequently c = Φ⁻¹(1 − α/2) = z_{α/2}. z_{α/2} is the upper 100·α/2% percentage point of the standard normal distribution.

100(1 − α)% confidence interval for µ (σ known)

  [ X̄ − z_{α/2} σ/√n , X̄ + z_{α/2} σ/√n ]
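As a quick cross-check of this formula: the slides use R, but the interval can be sketched with the Python standard library as well (the function name z_interval is ours, not from the slides):

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_interval(x, sigma, alpha=0.05):
    """Two-sided 100(1 - alpha)% confidence interval for mu, sigma known."""
    n = len(x)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, e.g. 1.96 for alpha = 0.05
    half = z * sigma / sqrt(n)
    xbar = fmean(x)
    return (xbar - half, xbar + half)
```

For α = 0.05 the half-width is 1.96 σ/√n, exactly as in the box above.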


now: confidence interval for µ if σ is unknown

100(1 − α)% confidence interval for µ (σ unknown)

  [ X̄ − t_{n−1;α/2} S/√n , X̄ + t_{n−1;α/2} S/√n ]

with t_{n−1;α/2} = F⁻¹_{t_{n−1}}(1 − α/2), the upper 100·α/2% percentage point of the t distribution with n − 1 degrees of freedom.

Example: mean annual rainfall (in millimeters) in Australia from 1983 to 2002:

year  1983   1984   1985   1986   1987   1988   1989   1990   1991   1992
rain  499.2  555.2  398.8  391.9  453.4  459.8  483.7  417.6  469.2  452.4

year  1993   1994   1995   1996   1997   1998   1999   2000   2001   2002
rain  499.3  340.6  522.8  469.9  527.2  565.5  584.1  727.3  558.6  338.6

It is n = 20, x̄ = 485.755, s = 90.33872, and t_{19;0.025} = 2.093. Thus the confidence interval is given by

  [485.755 − 42.27934, 485.755 + 42.27934] = [443.4757, 528.0343].

program:

year <- c(1983:2002); rain <- c(499.2, 555.2, 398.8, 391.9, 453.4, 459.8, 483.7, 417.6, 469.2, 452.4, 499.3, 340.6, 522.8, 469.9, 527.2, 565.5, 584.1, 727.3, 558.6, 338.6);

# (or rain <- read.table("rain.txt");

# use setwd() (set working directory) and getwd() (get working directory))

#--- histogram ---#

hist(rain, breaks = 8, freq = FALSE, main = "histogram", xlab = "Mean Annual Rainfall", ylab = "");

#--- box plot ---#

boxplot(rain, range = 0, ylab = "Mean Annual Rainfall");

#--- normal qq plot ---#

qqnorm(rain, datax = TRUE, main = "Normal QQ Plot"); qqline(rain, datax = TRUE);


output: histogram of the mean annual rainfall and normal QQ plot (Theoretical Quantiles vs. Sample Quantiles)

program:

#--- confidence interval ---#

alpha <- 0.05; n <- length(rain);
lcl <- mean(rain) - qt(1 - alpha / 2, n - 1) * sd(rain) / sqrt(n);
ucl <- mean(rain) + qt(1 - alpha / 2, n - 1) * sd(rain) / sqrt(n);
ci <- c(lcl, ucl); print(ci);

result: [443.4752, 528.0348]


now: confidence interval for σ if µ is unknown

100(1 − α)% confidence interval for σ²

  [ (n − 1)S² / χ²_{n−1;α/2} , (n − 1)S² / χ²_{n−1;1−α/2} ]
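Applied to the rainfall data from the previous example, the interval for σ² can be evaluated directly. A Python sketch (the χ² percentage points for 19 degrees of freedom are hardcoded from standard tables, since the Python standard library has no χ² quantile function):

```python
from statistics import variance

# rainfall data from the example above
rain = [499.2, 555.2, 398.8, 391.9, 453.4, 459.8, 483.7, 417.6, 469.2, 452.4,
        499.3, 340.6, 522.8, 469.9, 527.2, 565.5, 584.1, 727.3, 558.6, 338.6]
n = len(rain)           # 20
s2 = variance(rain)     # sample variance S^2 (s = 90.33872 on the slides)

# chi-square percentage points for n - 1 = 19 df, alpha = 0.05 (from tables)
chi2_upper = 32.852     # chi^2_{19; 0.025}
chi2_lower = 8.907      # chi^2_{19; 0.975}

ci_var = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)
```

Note that the larger χ² percentage point enters the lower endpoint and vice versa.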


1.2.2 Large-Sample Confidence Intervals

Using the central limit theorem, large-sample confidence intervals for arbitrary distributions (discrete or continuous) can be derived. This means that

  lim_{n→∞} P_ϑ( L(X₁, …, Xₙ) ≤ ϑ ≤ U(X₁, …, Xₙ) ) ≥ 1 − α  for all ϑ ∈ Θ.

Example: Suppose that X₁, X₂, … are i.i.d. with E(X_i) = µ for all i ≥ 1.

i) confidence interval for µ if σ² = Var(X_i) is known

large-sample confidence interval for µ (σ known)

  [ X̄ − z_{α/2} σ/√n , X̄ + z_{α/2} σ/√n ]

rule of thumb: n ≥ 30

ii) confidence interval for µ if σ² = Var(X_i) is unknown

large-sample confidence interval for µ (σ unknown)

  [ X̄ − z_{α/2} S/√n , X̄ + z_{α/2} S/√n ]

rule of thumb: n ≥ 40


example: mercury contamination in largemouth bass (in ppm) - a sample of fish was selected from 53 Florida lakes

1.230  1.330  0.040  0.044  1.200  0.270
0.490  0.190  0.830  0.810  0.710  0.500
0.490  1.160  0.050  0.150  0.190  0.770
1.080  0.980  0.630  0.560  0.410  0.730
0.590  0.340  0.340  0.840  0.500  0.340
0.280  0.340  0.750  0.870  0.560  0.170
0.180  0.190  0.040  0.490  1.100  0.160
0.100  0.210  0.860  0.520  0.650  0.270
0.940  0.400  0.430  0.250  0.270

It holds that n = 53, x̄ = 0.5319583, s = 0.3567051, and z_{0.025} = 1.96. Thus the asymptotic confidence interval is equal to [0.4311, 0.6188].

program:

# MerCon contains the 53 mercury concentrations listed above
#--- histogram ---#

hist(MerCon, breaks = 14, freq = FALSE, main = "histogram", xlab = "concentration", ylab = "");

#--- normal qq plot ---#

qqnorm(MerCon, datax = TRUE, main = "normal qq plot"); qqline(MerCon, datax = TRUE);

output: histogram of the mercury concentration and normal QQ plot (Theoretical Quantiles vs. Sample Quantiles)

program:

#--- confidence interval ---#

alpha <- 0.05; n <- length(MerCon);
lcl <- mean(MerCon) - qnorm(1 - alpha / 2) * sd(MerCon) / sqrt(n);
ucl <- mean(MerCon) + qnorm(1 - alpha / 2) * sd(MerCon) / sqrt(n);
ci <- c(lcl, ucl); print(ci);

result: ci = [0.4311, 0.6188]


Overview: Confidence Intervals

distribution: arbitrary, n large
  parameter µ:  X̄ − z_{α/2} σ/√n ≤ µ ≤ X̄ + z_{α/2} σ/√n  (or S instead of σ if σ is unknown)

distribution: B(1, p), n very large
  parameter p:  X̄ − z_{α/2} √( X̄(1 − X̄)/n ) ≤ p ≤ X̄ + z_{α/2} √( X̄(1 − X̄)/n )

distribution: B(1, p), n large
  parameter p:  ( X̄ + z²_{α/2}/(2n) − B ) / ( 1 + z²_{α/2}/n ) ≤ p ≤ ( X̄ + z²_{α/2}/(2n) + B ) / ( 1 + z²_{α/2}/n )
  with B = z_{α/2} √( X̄(1 − X̄)/n + z²_{α/2}/(4n²) )

distribution: N(µ, σ²), σ² known
  parameter µ:  X̄ − z_{α/2} σ/√n ≤ µ ≤ X̄ + z_{α/2} σ/√n

distribution: N(µ, σ²), σ² unknown
  parameter µ:  X̄ − t_{n−1;α/2} S/√n ≤ µ ≤ X̄ + t_{n−1;α/2} S/√n

distribution: N(µ, σ²), µ known
  parameter σ²:  n S̃² / χ²_{n;α/2} ≤ σ² ≤ n S̃² / χ²_{n;1−α/2}

distribution: N(µ, σ²), µ unknown
  parameter σ²:  (n − 1)S² / χ²_{n−1;α/2} ≤ σ² ≤ (n − 1)S² / χ²_{n−1;1−α/2}

distribution: 2-dim. normal distr.
  parameter ρ:  ρ̂ − z_{α/2} (1 − ρ̂²)/√n ≤ ρ ≤ ρ̂ + z_{α/2} (1 − ρ̂²)/√n

with S̃² = (1/n) Σ_{i=1}^{n} (X_i − µ)²
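The two B(1, p) rows above can be sketched in code. The slides use R; the following Python helpers (function names ours, not from the slides) implement the simple interval for very large n and the corrected interval with the term B:

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(xbar, n, alpha=0.05):
    """Simple large-sample interval: xbar -/+ z_{alpha/2} * sqrt(xbar(1-xbar)/n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(xbar * (1 - xbar) / n)
    return (xbar - half, xbar + half)

def corrected_interval(xbar, n, alpha=0.05):
    """Interval with correction term B from the overview table (Wilson-type)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    b = z * sqrt(xbar * (1 - xbar) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    center = xbar + z * z / (2 * n)
    return ((center - b) / denom, (center + b) / denom)
```

For X̄ = 0.5 both intervals are symmetric around 0.5; the corrected interval is slightly narrower here and, unlike the simple one, never leaves (0, 1).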


1.3 Hypothesis Testing

1.3.1 Introduction

Let X be the characteristic of interest with X ~ F_ϑ, ϑ ∈ Θ. The test problem is given by H₀: ϑ ∈ Θ₀ against H₁: ϑ ∈ Θ₁. H₀ is called the null hypothesis and H₁ is called the alternative hypothesis. Θ₀ and Θ₁ are disjoint and Θ₀ ∪ Θ₁ = Θ. Thus a decision problem between two hypotheses, a so-called test problem, is present.

Example: burning rate of solid propellant - H₀: µ = 50 (centimeters per second) against H₁: µ ≠ 50

procedure: Based on the sample x₁, …, xₙ a decision about a particular hypothesis is made. Such a decision rule is called a statistical test.

Table: Type I Error and Type II Error

                      reality
decision              ϑ ∈ Θ₀          ϑ ∈ Θ₁
H₀ not rejected       no error        type II error
H₁ accepted           type I error    no error


procedure: An upper bound α for the type I error is fixed, e.g. α ∈ {0.01, 0.05, 0.1}. The critical region C for the test (reject H₀) is determined such that the type I error fulfills this condition.

Such a test is called a test of significance at level α for H₀ if the probability of a type I error is smaller than or equal to α, i.e.

  P_ϑ( (X₁, …, Xₙ) ∈ C ) ≤ α  for all ϑ ∈ Θ₀.

α is called the significance level.

Because only the type I error and not the type II error is controlled by a test of significance, the type II error may be large. For that reason it is only possible to accept H₁, i.e. to reject H₀. It is incorrect to accept the null hypothesis H₀.


1.3.2 Tests for Univariate Samples

Suppose that the random sample X₁, …, Xₙ is i.i.d. with X_i ~ N(µ, σ²).

test problem: H₀: µ = µ₀ against H₁: µ ≠ µ₀

Gauss test (σ known):
  √n |X̄ − µ₀| / σ > z_{α/2}   ⇒ accept H₁ (reject H₀)
  √n |X̄ − µ₀| / σ ≤ z_{α/2}   ⇒ fail to reject H₀

t test (σ unknown):
  √n |X̄ − µ₀| / S > t_{n−1;α/2}   ⇒ accept H₁ (reject H₀)
  √n |X̄ − µ₀| / S ≤ t_{n−1;α/2}   ⇒ fail to reject H₀

test problem: H₀: σ² = σ₀² against H₁: σ² ≠ σ₀²

H₀ is rejected if

  (n − 1)S²/σ₀² < χ²_{n−1;1−α/2}  or  (n − 1)S²/σ₀² > χ²_{n−1;α/2}


Example: power function G(µ) = P_µ( √n |X̄ − µ₀| / σ > z_{α/2} ) of the two-sided Gauss test (i.e. the probability to accept H₁ as a function of µ) for α = 0.05, n = 5, σ = 1 and µ₀ = 0.

(figure: plot of the power function G(µ) for µ ∈ [−3, 3]; G(µ) is minimal at µ₀ = 0 with G(µ₀) = α and approaches 1 as |µ − µ₀| grows; the regions left and right of µ₀ belong to H₁, the point µ₀ to H₀)
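The power function has the closed form G(µ) = 1 − Φ(z_{α/2} − δ) + Φ(−z_{α/2} − δ) with δ = √n (µ − µ₀)/σ, since the test statistic is N(δ, 1) under µ. A small Python sketch (standard library only; the function name is ours) reproduces the curve in the figure:

```python
from math import sqrt
from statistics import NormalDist

def gauss_power(mu, mu0=0.0, sigma=1.0, n=5, alpha=0.05):
    """Power G(mu) of the two-sided Gauss test: probability to accept H1."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    delta = sqrt(n) * (mu - mu0) / sigma   # standardized shift under mu
    return 1 - nd.cdf(z - delta) + nd.cdf(-z - delta)
```

At µ = µ₀ this gives G(µ₀) = α, and the curve is symmetric around µ₀, as the figure shows.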


Large-Sample Tests

Suppose that the variables X₁, X₂, … are i.i.d. with E(X_i) = µ and Var(X_i) = σ² for all i.

Because in most cases the distribution of the underlying characteristic X is unknown, approximations to the critical values are determined using the asymptotic distribution of the test statistic.

test problem: H 0 : µ = µ 0 against H 1 : µ 6= µ 0

The null hypothesis H₀ is rejected if √n |X̄ − µ₀| / S > z_{α/2}.

rule of thumb: n > 100; for 30 ≤ n ≤ 100 use t_{n−1;α/2} instead of z_{α/2}


Tests for the Mean and the Variance of a Single Sample

The random sample X₁, …, Xₙ is assumed to be i.i.d.

normal distr., σ known:
  H₀: µ = µ₀ / µ ≤ µ₀ / µ ≥ µ₀ against H₁: µ ≠ µ₀ / µ > µ₀ / µ < µ₀
  test statistic: T = (X̄ − µ₀) / (σ/√n), under H₀: N(0, 1)
  reject H₀ if: |T| > z_{α/2} / T > z_α / T < −z_α

normal distr., σ unknown:
  H₀: µ = µ₀ / µ ≤ µ₀ / µ ≥ µ₀ against H₁: µ ≠ µ₀ / µ > µ₀ / µ < µ₀
  test statistic: T = (X̄ − µ₀) / (S/√n), under H₀: t_{n−1}
  reject H₀ if: |T| > t_{n−1;α/2} / T > t_{n−1;α} / T < −t_{n−1;α}

arbitrary distr. with expectation µ:
  H₀: µ = µ₀ / µ ≤ µ₀ / µ ≥ µ₀ against H₁: µ ≠ µ₀ / µ > µ₀ / µ < µ₀
  test statistic: T = (X̄ − µ₀) / (S/√n), under H₀ approx. N(0, 1) for n > 100 (t_{n−1} for 30 ≤ n ≤ 100)
  reject H₀ if: |T| > z_{α/2} (t_{n−1;α/2}) / T > z_α (t_{n−1;α}) / T < −z_α (−t_{n−1;α})

binomial distr., n small:
  H₀: p = p₀ / p ≤ p₀ / p ≥ p₀ against H₁: p ≠ p₀ / p > p₀ / p < p₀
  test statistic: T = number of successes, under H₀: B(n, p₀)
  reject H₀ if: T ∉ [c_{1−α/2}, c_{α/2}] / T > c_α / T < c_{1−α} (c: percentage points of B(n, p₀))

binomial distr., n ≥ 100 or n p̂(1 − p̂) > 5:
  H₀: p = p₀ / p ≤ p₀ / p ≥ p₀ against H₁: p ≠ p₀ / p > p₀ / p < p₀
  test statistic: T = (p̂ − p₀) / √( p₀(1 − p₀)/n ), under H₀ approx. N(0, 1)
  reject H₀ if: |T| > z_{α/2} / T > z_α / T < −z_α

normal distr., µ unknown:
  H₀: σ² = σ₀² / σ² ≤ σ₀² / σ² ≥ σ₀² against H₁: σ² ≠ σ₀² / σ² > σ₀² / σ² < σ₀²
  test statistic: T = (n − 1)S²/σ₀², under H₀: χ²_{n−1}
  reject H₀ if: T ∉ [χ²_{n−1;1−α/2}, χ²_{n−1;α/2}] / T > χ²_{n−1;α} / T < χ²_{n−1;1−α}
  (for known expectation µ use T = n S̃²/σ₀², which is χ²_n under H₀)

with S² = 1/(n − 1) Σ_{i=1}^{n} (X_i − X̄)² and S̃² = (1/n) Σ_{i=1}^{n} (X_i − µ)²


Example: performance of new golf clubs - ratio of outgoing velocity of a golf ball to the incoming velocity (coefficient of restitution)

0.8411  0.8580  0.8042
0.8191  0.8532  0.8730
0.8182  0.8483  0.8282
0.8125  0.8276  0.8359
0.8750  0.7983  0.8660

test problem: H₀: µ ≤ 0.82 against H₁: µ > 0.82

Since n = 15, x̄ = 0.83724, s = 0.0245571, the test statistic is equal to

  T = (x̄ − 0.82) / (s/√15) = 2.718979.

Assuming normality the percentage point is t_{14;0.05} = 1.761. Because it is smaller than T, the null hypothesis is rejected.


program:

CoR <- c(0.8411, 0.8191, 0.8182, 0.8125, 0.8750, 0.8580, 0.8532, 0.8483, 0.8276, 0.7983, 0.8042, 0.8730, 0.8282, 0.8359, 0.8660);

#--- box plot ---#

boxplot(CoR, range = 0, ylab = "Coefficient of Restitution");

#--- histogram ---#

hist(CoR, breaks = 8, freq = FALSE, main = "Histogram", xlab = "Coefficient of Restitution", ylab = "");


program:

#--- normal qq plot ---#

qqnorm(CoR, datax = TRUE, main = "Normal QQ Plot"); qqline(CoR, datax = TRUE);

#--- one-sample t-test ---#

t.test(CoR, alternative = "greater", mu = 0.82, conf.level = 0.95); #---> t-test for mu.0 = 0.82 with the one-sided alternative mu > 0.82

#---alternative procedure: direct calculation of the quantities---#

mu.0 <- 0.82;

n <- length(CoR);

T <- (mean(CoR) - mu.0) / sd(CoR) * sqrt(n); #---> test statistic
p.value <- 1 - pt(T, n - 1); #---> p-value of the one-sided test (pt, the t cdf; dt would give the density)

output: box plot, histogram, and normal QQ plot of the coefficient of restitution

The P-Value Approach

The p-value is the probability, computed under H₀, of obtaining a test statistic at least as extreme as the one observed for the given data set. Thus it is the smallest level of significance that would lead to rejection of the null hypothesis H₀ for the given data. It can be considered as the observed significance level. For the one-sided Gauss test it holds that ε = P_{µ₀}(T > t) = 1 − Φ(t), where t is the observed value of the test statistic.

(figure: density of the test statistic under H₀ for the one-sided Gauss test with α = 0.05; the observed value t lies beyond z_α, and the tail area to its right is the p-value ε = 0.0179)

The smaller the p-value, the more unlikely H₀ is.

If an upper bound α for the type I error is given then H₀ is rejected if ε < α. Otherwise, H₀ is not rejected.
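For the one-sided Gauss test the p-value is just the upper tail area beyond the observed statistic. A minimal Python sketch (function name ours); an observed value of t ≈ 2.1 reproduces, up to rounding, the ε = 0.0179 shown in the figure:

```python
from statistics import NormalDist

def gauss_p_value(t):
    """p-value of the one-sided Gauss test: eps = 1 - Phi(t)."""
    return 1 - NormalDist().cdf(t)
```

Since 1 − Φ(2.1) ≈ 0.018 < 0.05, H₀ would be rejected at level α = 0.05.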


Relationship between tests of significance and confidence intervals

To illustrate the relationship suppose that X ~ N(µ, σ²). The sample variables are assumed to be independent and identically distributed. We are interested in statements about the expectation µ.

Suppose that a confidence interval for µ with level 1 − α is given. Then

  ci = [ X̄ − t_{n−1;1−α/2} S/√n , X̄ + t_{n−1;1−α/2} S/√n ].

In order to deal with the test problem H₀: µ = µ₀ against H₁: µ ≠ µ₀, it is sufficient to check whether µ₀ ∈ ci. If this is not the case then H₁ is accepted. This procedure is equivalent to the t-test.

Suppose that the t-test for the test problem H₀: µ = µ₀ against H₁: µ ≠ µ₀ is given. Then the test statistic is equal to

  T = √n (X̄ − µ₀) / S.

Consequently

  P_{µ₀}( |T| ≤ t_{n−1;1−α/2} ) = 1 − α = P_{µ₀}( µ₀ ∈ ci )

with ci as above. ci is a confidence interval for µ with confidence level 1 − α. Thus the test directly provides a confidence interval.
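This equivalence can be checked numerically with the rainfall data from Section 1.2.1 (Python sketch; the percentage point 2.093 = t_{19;0.025} is taken from the slides, since the standard library has no t quantile function):

```python
from math import sqrt
from statistics import fmean, stdev

# rainfall data from Section 1.2.1
rain = [499.2, 555.2, 398.8, 391.9, 453.4, 459.8, 483.7, 417.6, 469.2, 452.4,
        499.3, 340.6, 522.8, 469.9, 527.2, 565.5, 584.1, 727.3, 558.6, 338.6]
n, xbar, s = len(rain), fmean(rain), stdev(rain)
t_quantile = 2.093                # t_{19; 0.025}, from the slides

# two-sided 95% confidence interval for mu
ci = (xbar - t_quantile * s / sqrt(n), xbar + t_quantile * s / sqrt(n))

def t_statistic(mu0):
    """t-test statistic for H0: mu = mu0."""
    return sqrt(n) * (xbar - mu0) / s

# mu0 lies inside ci  <=>  |t_statistic(mu0)| <= t_quantile
```

For instance µ₀ = 500 lies inside ci and is not rejected, while µ₀ = 540 lies outside and is rejected.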


Overview: Statistical Inference for a Single Sample

1.4 Statistical Inference for Two Samples

Suppose that X₁ and X₂ are independent characteristics. Let X₁₁, …, X₁ₙ₁ be a random sample of X₁ and let X₂₁, …, X₂ₙ₂ be a random sample of X₂.

Two Independent Samples

Assumption: Suppose that X₁₁, …, X₁ₙ₁, X₂₁, …, X₂ₙ₂ are independent. Let E(X_i) = µ_i and Var(X_i) = σ_i² for i = 1, 2.

Test on Difference in Means

test problem: H 0 : µ 1 = µ 2 against H 1 : µ 1 6= µ 2

Test of Equality of the Variances

test problem: H₀: σ₁² = σ₂² against H₁: σ₁² ≠ σ₂²


Tests for the Means and the Variances of a Bivariate Sample I

normal distr., σ₁ and σ₂ known:
  H₀: µ₁ = µ₂ / µ₁ ≤ µ₂ / µ₁ ≥ µ₂ against H₁: µ₁ ≠ µ₂ / µ₁ > µ₂ / µ₁ < µ₂
  test statistic: T = (X̄₁ − X̄₂) / √( σ₁²/n₁ + σ₂²/n₂ ), under H₀: N(0, 1)
  reject H₀ if: |T| > z_{α/2} / T > z_α / T < −z_α

σ₁, σ₂ unknown, σ₁ = σ₂, small sample size:
  H₀: µ₁ = µ₂ / µ₁ ≤ µ₂ / µ₁ ≥ µ₂ against H₁: µ₁ ≠ µ₂ / µ₁ > µ₂ / µ₁ < µ₂
  test statistic: T = (X̄₁ − X̄₂) / √( S̃² (1/n₁ + 1/n₂) ), under H₀: t_{n₁+n₂−2}
  reject H₀ if: |T| > t_{n₁+n₂−2;α/2} / T > t_{n₁+n₂−2;α} / T < −t_{n₁+n₂−2;α}

σ₁, σ₂ unknown, σ₁ ≠ σ₂:
  H₀: µ₁ = µ₂ / µ₁ ≤ µ₂ / µ₁ ≥ µ₂ against H₁: µ₁ ≠ µ₂ / µ₁ > µ₂ / µ₁ < µ₂
  test statistic: T = (X̄₁ − X̄₂) / √( S₁²/n₁ + S₂²/n₂ ), under H₀ approx. t_{df} with
    df = (1 + R)² / ( R²/(n₁ − 1) + 1/(n₂ − 1) ),  R = n₂ S₁² / (n₁ S₂²)
  reject H₀ if: |T| > t_{df;α/2} / T > t_{df;α} / T < −t_{df;α}

with S_k² = 1/(n_k − 1) Σ_{i=1}^{n_k} (X_{ki} − X̄_k)², k = 1, 2, and
S̃² = (n₁ − 1)/(n₁ + n₂ − 2) · S₁² + (n₂ − 1)/(n₁ + n₂ − 2) · S₂²
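The approximate degrees of freedom df in the unequal-variance case can be written as a small helper; for equal variances and equal sample sizes R = 1 and df reduces to n₁ + n₂ − 2, matching the pooled t-test. A Python sketch (function name ours):

```python
def welch_df(s1_sq, n1, s2_sq, n2):
    """Approximate degrees of freedom df = (1 + R)^2 / (R^2/(n1-1) + 1/(n2-1)),
    with R = (s1^2/n1) / (s2^2/n2) = n2 * s1^2 / (n1 * s2^2)."""
    r = (s1_sq / n1) / (s2_sq / n2)
    return (1 + r) ** 2 / (r ** 2 / (n1 - 1) + 1 / (n2 - 1))
```

In general df lies between min(n₁, n₂) − 1 and n₁ + n₂ − 2, so the approximation is never more liberal than the pooled test.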


Tests for the Means and the Variances of a Bivariate Sample II

arbitrary distr., σ₁, σ₂ unknown, large sample size:
  H₀: µ₁ = µ₂ / µ₁ ≤ µ₂ / µ₁ ≥ µ₂ against H₁: µ₁ ≠ µ₂ / µ₁ > µ₂ / µ₁ < µ₂
  test statistic: T = (X̄₁ − X̄₂) / √( S₁²/n₁ + S₂²/n₂ ), under H₀ approx. N(0, 1)
  reject H₀ if: |T| > z_{α/2} / T > z_α / T < −z_α

binomial distr., large sample size:
  H₀: p₁ = p₂ / p₁ ≤ p₂ / p₁ ≥ p₂ against H₁: p₁ ≠ p₂ / p₁ > p₂ / p₁ < p₂
  test statistic: T = (p̂₁ − p̂₂) / √( p̂(1 − p̂)(1/n₁ + 1/n₂) ) with p̂ = (n₁ p̂₁ + n₂ p̂₂)/(n₁ + n₂), under H₀ approx. N(0, 1)
  reject H₀ if: |T| > z_{α/2} / T > z_α / T < −z_α

normal distr.:
  H₀: σ₁² = σ₂² / σ₁² ≤ σ₂² / σ₁² ≥ σ₂² against H₁: σ₁² ≠ σ₂² / σ₁² > σ₂² / σ₁² < σ₂²
  test statistic: T = S₁² / S₂², under H₀: F_{n₁−1,n₂−1}
  reject H₀ if: T > F_{n₁−1,n₂−1;α/2} or T < F_{n₁−1,n₂−1;1−α/2} / T > F_{n₁−1,n₂−1;α} / T < F_{n₁−1,n₂−1;1−α}
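The pooled two-proportion statistic from the binomial row can be sketched as follows (Python; the function name is ours, not from the slides):

```python
from math import sqrt

def two_prop_z(p1_hat, n1, p2_hat, n2):
    """Pooled large-sample test statistic for H0: p1 = p2."""
    p_hat = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)   # pooled estimate
    se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se
```

Pooling is appropriate here because the standard error is computed under H₀: p₁ = p₂, where both samples share the common success probability p.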


Example: arsenic in drinking water - drinking water arsenic concentration in parts per billion (ppb) for 10 metropolitan Phoenix communities and 10 communities in rural Arizona

Metro Phoenix: Phoenix 3, Chandler 7, Gilbert 25, Glendale 10, Mesa 15, Paradise Valley 6, Peoria 12, Scottsdale 25, Tempe 15, Sun City 7

Rural Arizona: Rimrock 48, Goodyear 44, New River 40, Apache Junction 38, Buckeye 33, Nogales 21, Black Canyon City 20, Sedona 12, Payson 1, Casa Grande 18

program:

#--- arsenic concentration ---#

MetroPhoenix <- c(3, 7, 25, 10, 15, 6, 12, 25, 15, 7); RuralArizona <- c(48, 44, 40, 38, 33, 21, 20, 12, 1, 18); ArsenicConcentration <- cbind(MetroPhoenix, RuralArizona);

#--- box plot ---#

boxplot(ArsenicConcentration);

#--- normal q-q plot ---#

qqnorm(MetroPhoenix, datax = TRUE, main = "Normal Q-Q Plot: Metro Phoenix"); qqline(MetroPhoenix, datax = TRUE);

qqnorm(RuralArizona, datax = TRUE, main = "Normal Q-Q Plot: Rural Arizona"); qqline(RuralArizona, datax = TRUE);

#--- test for variance homogeneity ---#

var.test(MetroPhoenix, RuralArizona, ratio = 1, conf.level = 0.95);

#--- unpaired t-test ---#

t.test(MetroPhoenix, RuralArizona, alternative = "two.sided", paired = FALSE, var.equal = FALSE, conf.level = 0.95);

output: box plots of MetroPhoenix and RuralArizona and normal Q-Q plots for both samples

output:

F test to compare two variances

data: MetroPhoenix and RuralArizona