
ST102: Text for the gaps in lecture slides, Lent term

Version: March 20, 2014


This document contains the text that has been omitted from the lecture slides and filled in
during the lectures. The slides are referred to by page number.
This document is updated after each section of material. The date of the latest update is under
the title above.
Slide 507: A scientific subject of collecting and making sense of data.
Slide 508: Statistics provides a way of answering these types of questions using data.
Slide 510: In both cases, the conclusion is drawn about a population (i.e. all the objects
concerned) based on the information from a sample (i.e. a subset of the population).
Slide 511: We do not know the entire population in practice.
A sample is a (randomly) selected subset of a population, and is known in practice.
Slide 512: The population is unknown. We represent a population by a probability
distribution.
... in other words, that the data are representative of the population.
Slide 515: Some basic descriptive statistics are n = 253, $\bar{x} = 47.126$, and $s = 6.843$.
$$\bar{x} \pm s = (40.283,\ 53.969)$$
Also,
$$\bar{x} \pm 1.96s = (33.714,\ 60.538)$$
Therefore this suggests the data are $N(\bar{x}, s^2)$!
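A minimal Python sketch of this check (the sample here is simulated with the same n, mean and standard deviation, not the lecture's data; assumes NumPy): if the data are roughly Normal, about 68% of observations fall within one s of $\bar{x}$ and about 95% within 1.96s.

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(47.126, 6.843, size=253)  # simulated stand-in sample

    xbar, s = data.mean(), data.std(ddof=1)
    for k in (1.0, 1.96):
        lo, hi = xbar - k * s, xbar + k * s
        cover = np.mean((data >= lo) & (data <= hi))
        print(f"xbar +/- {k}*s = ({lo:.3f}, {hi:.3f}) covers {cover:.1%} of the data")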
Slide 517: For a given problem, we typically assume a population to be a probability
distribution $F(\cdot\,;\theta)$, where the form of distribution F is known (such as Normal,
Poisson etc.), and $\theta$ denotes some unknown characteristic (such as the mean, variance etc.)
and is called a parameter.
Slide 519: A sample of size n: {X1 , . . . , Xn }, is also called a random sample.
Furthermore, a sample is also viewed as n independent and identically distributed (IID)
random variables, when we assess the performance of a statistical method.
Slide 520: Is the sample mean $\bar{X}$ a good estimator for the unknown true lifetime $\mu$?
If the distribution of $\bar{X}$ concentrates closely around the (unknown) $\mu$, $\bar{X}$ is a good estimator
for $\mu$.
Any known function of a random sample is called a statistic.

Slide 521: They are seen as a realisation of n IID random variables X1 , . . . , Xn .


Slide 523: Probability: A mathematical subject
Statistics: An application-oriented subject (which uses probability heavily)
Slide 529: Such a $\hat{\theta}$ is called a point estimator for $\theta$.
Slide 530: Since $\mu$ is the mean of the population, a natural estimator would be the sample
mean
$$\hat{\mu} = \bar{X}.$$
The value 11.2 is an estimate for $\mu$.
Slide 531: Let $\mu_k$ denote the k-th population moment, $k = 1, 2, \ldots$. Then $\mu_k$ depends on the
unknown parameter $\theta$, as everything else about the distribution $F(\cdot\,;\theta)$ is known. Denote the
k-th sample moment by $M_k$.
The MM estimator (MME) $\hat{\theta}$ for $\theta$ is the solution of the p equations
$$\mu_k(\hat{\theta}) = M_k, \qquad k = 1, \ldots, p.$$

Slide 532: Let
$$\mu = \mu_1 = M_1, \qquad \mu_2 = M_2 = \frac{1}{n}\sum_{i=1}^n X_i^2.$$
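A minimal method-of-moments sketch in Python for $(\mu, \sigma^2)$ (simulated Normal sample, assumed true values $\mu = 5$, $\sigma^2 = 4$): matching the first two moments gives $\hat{\mu} = M_1$ and $\hat{\sigma}^2 = M_2 - M_1^2$.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(5.0, 2.0, size=1000)  # sample with true mu = 5, sigma^2 = 4

    m1 = x.mean()            # first sample moment M1
    m2 = np.mean(x**2)       # second sample moment M2
    mu_mm = m1               # solves mu = M1
    sigma2_mm = m2 - m1**2   # solves mu2 = sigma^2 + mu^2 = M2
    print(mu_mm, sigma2_mm)  # close to 5 and 4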

Slide 533: We call $E(\hat{\sigma}^2) - \sigma^2 = -\sigma^2/n$ the estimation bias.
...and it has zero bias, i.e. it is an unbiased estimator.
Slide 534: A useful formula for computation:
$$S^2 = \frac{1}{n-1}\left(\sum_{i=1}^n X_i^2 - n\bar{X}^2\right)$$
...converges to $\sigma^2$.
Slide 538: where $X_{(1)}, \ldots, X_{(n)}$ are the order statistics obtained by rearranging $X_1, \ldots, X_n$
into ascending order.
Slide 539: A good estimator would make $|\hat{\theta} - \theta|$ as small as possible.
The Mean Squared Error (MSE) of $\hat{\theta}$ is defined as
$$\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big],$$
and the Mean Absolute Deviation (MAD) of $\hat{\theta}$ is defined as
$$\mathrm{MAD}(\hat{\theta}) = E\big[|\hat{\theta} - \theta|\big].$$

 
Slide 540: If $E(\hat{\theta}^2) < \infty$, it holds that
$$\mathrm{MSE}(\hat{\theta}) = \big[\mathrm{Bias}(\hat{\theta})\big]^2 + \mathrm{Var}(\hat{\theta}),$$
where the bias is $\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$.
Slide 541: If $\mathrm{Bias}(\hat{\theta}) = 0$, $\hat{\theta}$ is an unbiased estimator for $\theta$.
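A Monte Carlo sketch of this decomposition in Python (simulated data; the estimator is the biased variance estimator $\hat{\sigma}^2$ from slide 533, with assumed $n = 10$ and $\sigma^2 = 4$, so the theoretical bias is $-\sigma^2/n = -0.4$):

    import numpy as np

    rng = np.random.default_rng(2)
    n, sigma2, reps = 10, 4.0, 100_000
    est = np.array([rng.normal(0.0, 2.0, n).var(ddof=0) for _ in range(reps)])

    bias = est.mean() - sigma2                 # theory: -sigma^2/n = -0.4
    mse = np.mean((est - sigma2) ** 2)
    print(bias, mse, bias**2 + est.var())      # the last two values agree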
Slide 542: We compute the bias and variance separately.
$$E(\hat{\mu}) = E\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n}\sum_{i=1}^n E(X_i) = \frac{1}{n}\sum_{i=1}^n \mu = \mu.$$
$$\mathrm{Var}(\hat{\mu}) = \sigma^2/n.$$
Slide 543: Hence, $\mathrm{MSE}(\hat{\mu}) = \mathrm{MSE}(\bar{X}) = \sigma^2/n$.
Slide 544: $\hat{\mu} = \bar{X}$ is a better estimator than $X_1$ as
$$\mathrm{MSE}(\hat{\mu}) = \frac{\sigma^2}{n} < \mathrm{MSE}(X_1) = \sigma^2.$$

Such an estimator is called a (mean-square) consistent estimator.


Slide 545: . . . it holds that
$$E(\bar{X}) = \mu, \qquad \mathrm{Var}(\bar{X}) = \sigma^2/n.$$

Slide 547: Least squares estimator (LSE):
$$\hat{\mu} = \bar{X} = \arg\min_a \sum_{i=1}^n (X_i - a)^2.$$

Slide 548: Gauge estimation errors: Standard error
By the Central Limit Theorem, as $n \to \infty$,
$$P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le x\right) \to \Phi(x),$$

Slide 549: This gives an approximation
$$P\big(|\bar{X} - \mu| \le 1.96\,S/\sqrt{n}\big) \approx 0.95.$$
The standard error is
$$\mathrm{S.E.}(\bar{X}) = \frac{S}{\sqrt{n}} = \left[\frac{1}{n(n-1)}\sum_{i=1}^n (X_i - \bar{X})^2\right]^{1/2}.$$

Slide 552: An alternative way to estimate $\mu$ may be to minimise the sum of absolute deviations:
$$\min_a \sum_{i=1}^n |X_i - a|.$$

Slide 553: Suppose we toss a coin 10 times, and record the number of Heads as a r.v. X.
Then
$$X \sim \mathrm{Bin}(10; \theta).$$
Slide 554: Maximising $L(\theta)$ is equivalent to maximising
$$l(\theta) = \log(L(\theta)) = 8\log\theta + 2\log(1-\theta) + c,$$
where c is a constant. Setting $dl(\theta)/d\theta = 0$, we obtain the MLE $\hat{\theta} = 0.8$.
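The same MLE can be found numerically; a minimal sketch (assumes SciPy), maximising the log-likelihood above by minimising its negative:

    import numpy as np
    from scipy.optimize import minimize_scalar

    # l(theta) = 8 log(theta) + 2 log(1 - theta), maximised at theta = 0.8
    negloglik = lambda t: -(8 * np.log(t) + 2 * np.log(1 - t))
    res = minimize_scalar(negloglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
    print(res.x)   # approximately 0.8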
Slide 556: Then the maximum likelihood estimator (MLE) for $\theta$ based on the observations
$X_1, \ldots, X_n$ is defined as the $\hat{\theta}$ for which
$$f(X_1, \ldots, X_n; \hat{\theta}) = \max_{\theta} f(X_1, \ldots, X_n; \theta).$$
i. The MLE defined above depends on the observations $X_1, \ldots, X_n$ only:
$$\hat{\theta} = \hat{\theta}(X_1, \ldots, X_n),$$
i.e. $\hat{\theta}$ is a statistic.
ii. If $\{X_1, \ldots, X_n\}$ is a random sample from a population with probability density function
$f(x; \theta)$, the joint probability density function for $(X_1, \ldots, X_n)$ is
$$\prod_{i=1}^n f(x_i; \theta).$$

Slide 557: The joint pdf is $\prod_{i=1}^n f(x_i; \theta)$.
Slide 558: The likelihood function is $L(\theta) = \prod_{i=1}^n f(X_i; \theta)$.
ii. The MLE: $\hat{\theta} = \arg\max_{\theta} L(\theta)$, i.e. $L(\hat{\theta}) = \max_{\theta} L(\theta)$.
iii. It is often more convenient to use the log-likelihood function:
$$l(\theta) = \log L(\theta) = \sum_{i=1}^n \log\big(f(X_i; \theta)\big),$$

Slide 559: as it transforms a product in $L(\theta)$ into a sum. Note
$$\hat{\theta} = \arg\max_{\theta} l(\theta), \qquad \text{or} \qquad l(\hat{\theta}) = \max_{\theta} l(\theta).$$

Slide 561: Log-likelihood function:
$$l(\mu, \sigma^2) = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (X_i - \mu)^2 - c,$$
where $c = (n/2)\log(2\pi)$.
It follows from the Lemma below that
$$\hat{\sigma}^2 = \sum_{i=1}^n (X_i - \bar{X})^2 \big/ n.$$
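A minimal Python check of these Normal MLEs on a simulated sample (assumed true values $\mu = 10$, $\sigma = 3$):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(10.0, 3.0, size=500)

    mu_hat = x.mean()                          # MLE for mu
    sigma2_hat = np.mean((x - mu_hat) ** 2)    # MLE for sigma^2 (divide by n, not n-1)
    print(mu_hat, sigma2_hat)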

Slide 562: Since $\theta = (\mu, \sigma)$, by the invariance principle the MLE for $\theta$ is
$$\hat{\theta} = (\hat{\mu}, \hat{\sigma}) = \left(\frac{1}{n}\sum_{i=1}^n X_i,\ \sqrt{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2}\right).$$

Slide 563: Likelihood function:
$$L(\theta) = \theta^{2n}\exp\Big[-\theta\sum_{i=1}^n X_i\Big]\prod_{i=1}^n X_i = \theta^{2n}\exp[-n\theta\bar{X}]\prod_{i=1}^n X_i.$$

Slide 564:
$$L(\theta) = \prod_{i=1}^n p(X_i; \theta) = p(1; \theta)^{n_1}\, p(2; \theta)^{n_2}\, p(3; \theta)^{n - n_1 - n_2}
= \theta^{2n_1}\,\big(2\theta(1-\theta)\big)^{n_2}\,(1-\theta)^{2(n - n_1 - n_2)} \propto \theta^{2n_1 + n_2}(1-\theta)^{2n - 2n_1 - n_2}.$$

Slide 566: where $I(\theta)$ is the Fisher information defined as
$$I(\theta) = -\int f(x; \theta)\, \frac{\partial^2 \log f(x; \theta)}{\partial\theta^2}\, dx.$$

Slide 567: Therefore
$$I(\mu) = \int \frac{1}{\sigma^2}\, f(x; \mu)\, dx = \frac{1}{\sigma^2}.$$

Slide 568: Therefore
$$I(\lambda) = \frac{1}{\lambda^2}\sum_{x=0}^{\infty} x\, p(x; \lambda) = \frac{1}{\lambda^2}\, E(X_1) = \frac{1}{\lambda}.$$

Slide 572: Point estimation is simple but not informative enough, since a point estimator is
always subject to errors.
An intuitive guess: For estimating the population mean,
$$L = \bar{X} - k\,\mathrm{S.E.}(\bar{X}), \qquad U = \bar{X} + k\,\mathrm{S.E.}(\bar{X}),$$

Slide 573: Then, typically, the coverage probability
$$P\big(L(X_1, \ldots, X_n) < \theta < U(X_1, \ldots, X_n)\big) < 1.$$

Slide 574: Therefore the interval covering $\mu$ with probability 0.95 is
$$\big(\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n}\big).$$
Slide 575: Answer: (1.27, 3.23) is one realisation of the random interval $(\bar{X} - 0.98,\ \bar{X} + 0.98)$ which covers $\mu$ with probability 0.95.
Slide 576:
$$0.90 = P\big(\sqrt{n}\,|\bar{X} - \mu|/\sigma \le 1.645\big) = P\big(\bar{X} - 1.645\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.645\,\sigma/\sqrt{n}\big),$$
$$0.95 = P\big(\sqrt{n}\,|\bar{X} - \mu|/\sigma \le 1.960\big) = P\big(\bar{X} - 1.960\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.960\,\sigma/\sqrt{n}\big),$$
$$0.99 = P\big(\sqrt{n}\,|\bar{X} - \mu|/\sigma \le 2.576\big) = P\big(\bar{X} - 2.576\,\sigma/\sqrt{n} < \mu < \bar{X} + 2.576\,\sigma/\sqrt{n}\big).$$
Slide 578: leading to the confidence interval for $\mu$ of the form
$$\big(\bar{X} - kS/\sqrt{n},\ \bar{X} + kS/\sqrt{n}\big),$$
where k is a constant determined by the confidence level and also by the distribution of the
statistic
$$\frac{\bar{X} - \mu}{S/\sqrt{n}}.$$

Slide 579: Therefore, we have an approximate 95% confidence interval for $\mu$:
$$\big(\bar{X} - 1.96\,S/\sqrt{n},\ \bar{X} + 1.96\,S/\sqrt{n}\big),$$

Slide 581: This leads to the following approximate 95% confidence interval for $\theta$:
$$\left(\hat{\theta} - 1.96\big[nI(\hat{\theta})\big]^{-1/2},\ \hat{\theta} + 1.96\big[nI(\hat{\theta})\big]^{-1/2}\right).$$

Slide 582: Let $Z_1, \ldots, Z_k$ be independent N(0, 1) r.v.s. Let
$$X = Z_1^2 + \cdots + Z_k^2 = \sum_{i=1}^k Z_i^2.$$

Slide 584: where all $Z_i$'s are independent N(0, 1) r.v.s. Hence
$$X_1 + X_2 = \sum_{i=1}^{k+p} Z_i^2 \sim \chi^2_{k+p}.$$

Slide 588: Hence
$$\frac{1}{\sigma^2}\sum_{i=1}^n (Y_i - \mu)^2 \sim \chi^2_n.$$

Slide 589:
$$\frac{1}{\sigma^2}\sum_{i=1}^n (Y_i - \bar{Y})^2 \sim \chi^2_{n-1}.$$
Thus decomposition (1) is an instance of the relationship
$$\chi^2_n = \chi^2_{n-1} + \chi^2_1.$$
Slide 590: Hence a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is
$$(M/k_2,\ M/k_1).$$

Slide 593: Then the distribution of the random variable
$$T = \frac{Z}{\sqrt{X/k}}$$

Slide 594: As $k \to \infty$, the distribution of $t_k$ converges to N(0, 1).


Slide 596: The table below lists some values of $c_\alpha$ defined by the equation
$$P(t_k > c_\alpha) = \alpha.$$
Slide 597: $\bar{X}$ and $S^2$ are independent, and therefore
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\sqrt{(n-1)S^2\big/\big((n-1)\sigma^2\big)}} = \frac{\bar{X} - \mu}{\mathrm{E.S.E.}(\bar{X})} \sim t_{n-1}.$$
Slide 602: Based on the data, a (statistical) test makes a binary decision on a hypothesis,
denoted by H0:
Reject H0 or Not reject H0.
Slide 603: ...this particular hypothesis can be formally represented as
$$H_0: \theta = 0.5.$$

Slide 604: Again statistical estimation cannot provide a firm answer, due to random
fluctuations between different samples.
Slide 606: The observed value $\bar{x} = 17$ is one standard deviation away from $\mu$, and may be
regarded as a typical observation from the distribution.
Slide 607: The observed value $\bar{x} = 17$ begins to look a bit extreme, as it is two standard
deviations away from $\mu$.
Slide 608: The observed value $\bar{x} = 17$ is very extreme, as it is three standard deviations
away from $\mu$.
Slide 609: This probability is called the p-value.
Slide 610: But this does not imply that this hypothesis is necessarily true, as, for example,
$\mu = 17$ or 18 are at least as likely as $\mu = 16$.
Not Reject $\ne$ Accept
Slide 611: P-values may be seen as a risk measure of rejecting H0.
Slide 612: where $\theta_0$ is a fixed value, $\Theta_1$ is a set, and $\theta_0 \notin \Theta_1$.
H0 is called the null hypothesis.

H1 is called the alternative hypothesis.


Decision: Reject H0 if p-value $\le \alpha$.
Slide 614: We are interested in testing
$$H_0: \theta = 0.5 \quad \text{v.} \quad H_1: \theta \ne 0.5.$$

Slide 615: Under H0, $E(Y) = n\theta_0 = 10$. Hence 3 is as extreme...


Slide 616: Hence the test statistic may be defined as
Slide 618: For a given significance level $\alpha$, we may find the critical value $c_\alpha$ such that
$$P_{\theta_0}(|T| > c_\alpha) = \alpha.$$
Slide 619: With a given significance level $\alpha$, the critical value c should be chosen such that
$$\alpha = P_{\theta_0}(T \ge c) = P\big(N(0, 1) \ge c\big).$$

Slide 621: The distribution of a test statistic under H0 must be known in order to calculate
p-values or critical values.
Slide 622: The test statistic is then the famous t statistic:
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{\sqrt{n}(\bar{X} - \mu_0)}{S}, \qquad S = \left[\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2\right]^{1/2}.$$

Slide 623: Under H0, $T \sim t_{n-1}$. Hence...
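A minimal one-sample t test in Python (simulated sample; the null value $\mu_0 = 16$ is an assumed illustration, not the lecture's example; assumes NumPy/SciPy):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    x = rng.normal(16.5, 2.0, size=25)   # simulated sample; test H0: mu = 16

    t = np.sqrt(len(x)) * (x.mean() - 16) / x.std(ddof=1)
    p = 2 * stats.t.sf(abs(t), df=len(x) - 1)   # two-sided p-value
    print(t, p)
    print(stats.ttest_1samp(x, 16))             # library version, same numbers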


Slide 627: Identify a critical region C such that
$$P_{H_0}(T \in C) = \alpha.$$
Slide 628:

                              Decision made
                              H0 not rejected       H0 rejected
  True state      H0          Correct decision      Type I error
  of nature       H1          Type II error         Correct decision

Slide 629: Power: The power function of the test is defined as
$$\beta(\theta) = P_\theta(H_0 \text{ is rejected}),$$
i.e. $\beta(\theta) = 1 -$ Probability of a Type II error.

Slide 631: Let $S^2 = \sum_{i=1}^n (X_i - \bar{X})^2\big/(n-1)$. Then $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$. Under H0,
$$T = \frac{(n-1)S^2}{\sigma_0^2} = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma_0^2} \sim \chi^2_{n-1}.$$

Slide 632:
$$\beta(\sigma^2) = P(H_0 \text{ is rejected}) = P\big(T > \chi^2_{\alpha,\,n-1}\big)
= P\left(\frac{(n-1)S^2}{\sigma_0^2} > \chi^2_{\alpha,\,n-1}\right)
= P\left(\frac{(n-1)S^2}{\sigma^2} > \frac{\sigma_0^2}{\sigma^2}\,\chi^2_{\alpha,\,n-1}\right)
= P\left(\chi^2_{n-1} > \frac{\sigma_0^2}{\sigma^2}\,\chi^2_{\alpha,\,n-1}\right),$$

Slide 633: Clearly, $\beta(\sigma^2)$ increases as $\sigma^2$ increases.


Slide 634: For a two-sided alternative $H_1: \sigma^2 \ne \sigma_0^2$, we should reject H0 if
$$t \ge \chi^2_{\alpha/2,\,n-1} \qquad \text{or} \qquad t \le \chi^2_{1-\alpha/2,\,n-1}.$$

Slide 635:

  Null hypothesis, H0 | Test statistic, T | Distribution of T under H0
  $\mu = \mu_0$ ($\sigma^2$ known) | $\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ | $N(0, 1)$
  $\mu = \mu_0$ | $\dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$ | $t_{n-1}$
  $\sigma^2 = \sigma_0^2$ | $\dfrac{(n-1)S^2}{\sigma_0^2}$ | $\chi^2_{n-1}$

Slide 636: We are interested in testing the hypothesis
$$H_0: \mu_X = \mu_Y. \qquad (1)$$

Slide 637: Let $Z_i = X_i - Y_i$, $i = 1, \ldots, n$. Then $\{Z_1, \ldots, Z_n\}$ is a random sample from the
population $N(\mu, \sigma^2)$, where
$$\mu = \mu_X - \mu_Y, \qquad \sigma^2 = \sigma_X^2 + \sigma_Y^2.$$

Slide 638: Hypothesis (1) can be expressed as
$$H_0: \mu = 0.$$

Slide 639:
$$\beta(\mu) = P(H_0 \text{ is rejected}) = P\big(T > t_{\alpha,\,n-1}\big) = P\big(\sqrt{n}\bar{Z}/S > t_{\alpha,\,n-1}\big)
= P\left(\frac{\sqrt{n}(\bar{Z} - \mu)}{S} > t_{\alpha,\,n-1} - \frac{\sqrt{n}\mu}{S}\right)
= P\big(t_{n-1} > t_{\alpha,\,n-1} - \sqrt{n}\mu/S\big).$$

Slide 640: Let $\{X_1, \ldots, X_n\}$ and $\{Y_1, \ldots, Y_m\}$ be two independent random samples...

Slide 641:
$$\frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\sigma_X^2/n + \sigma_Y^2/m}} \Bigg/ \sqrt{\big[(n-1)S_X^2/\sigma_X^2 + (m-1)S_Y^2/\sigma_Y^2\big]\big/(n+m-2)}$$
$$= \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{1/n + 1/m}}\,\sqrt{\frac{n+m-2}{(n-1)S_X^2 + (m-1)S_Y^2}} \sim t_{n+m-2}.$$

Slide 642: A $100(1-\alpha)\%$ confidence interval for $(\mu_X - \mu_Y)$ is
$$\left(\bar{X} - \bar{Y} - z_{\alpha/2}\sqrt{\sigma_X^2/n + \sigma_Y^2/m},\ \ \bar{X} - \bar{Y} + z_{\alpha/2}\sqrt{\sigma_X^2/n + \sigma_Y^2/m}\right).$$

Slide 643: A $100(1-\alpha)\%$ confidence interval for $(\mu_X - \mu_Y)$ is
$$\bar{X} - \bar{Y} \pm t_{\alpha/2,\,n+m-2}\left[\big(1/n + 1/m\big)\,\frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}\right]^{1/2}.$$
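A minimal Python sketch of the pooled two-sample test and the matching interval (simulated samples with assumed means 5.0 and 4.5 and common variance; assumes NumPy/SciPy):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    x = rng.normal(5.0, 1.0, size=30)
    y = rng.normal(4.5, 1.0, size=40)
    n, m = len(x), len(y)

    print(stats.ttest_ind(x, y))   # pooled two-sample t test of H0: mu_X = mu_Y

    sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
    half = stats.t.ppf(0.975, n + m - 2) * np.sqrt(sp2 * (1 / n + 1 / m))
    d = x.mean() - y.mean()
    print(d - half, d + half)      # 95% confidence interval for mu_X - mu_Y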

Slide 646: i. Zero is not in the confidence interval for $\mu_X - \mu_Y$.


Slide 647: Hence, we reject H0 when |t| > 1.96 at the 5% significance level, where
$$T = \big(\bar{X} - \bar{Y}\big)\Big/\sqrt{S_X^2/100 + S_Y^2/100}.$$
Slide 648: Hence we reject H0 if |t| > t0.025, 198 = 1.97 where
Slide 649: Different methods lead to different but not contradictory conclusions, as
Not reject $\ne$ Accept.

Slide 651: $\rho$ measures only the linear relationship between X and Y. When $\rho = 0$, X and
Y are linearly independent.
...as there may exist some non-linear relationship between X and Y.
Slide 652: Given pairwise observations $(X_i, Y_i)$, $i = 1, \ldots, n$, a natural estimator for $\rho$ is
defined as
$$\hat{\rho} = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\left[\sum_{i=1}^n (X_i - \bar{X})^2 \sum_{j=1}^n (Y_j - \bar{Y})^2\right]^{1/2}},$$

Slide 656: It may be proved that under H0 the statistic
$$T = \hat{\rho}\,\sqrt{\frac{n-2}{1 - \hat{\rho}^2}} \sim t_{n-2}.$$
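A minimal Python check of this test statistic (simulated correlated pairs; assumes NumPy/SciPy):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    x = rng.normal(size=50)
    y = 0.3 * x + rng.normal(size=50)

    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt((n - 2) / (1 - r**2))
    p = 2 * stats.t.sf(abs(t), df=n - 2)   # two-sided p-value for H0: rho = 0
    print(t, p)
    print(stats.pearsonr(x, y))            # library version gives the same p-value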

Slide 657: Two r.v.s X and Y are jointly Normal if aX + bY is Normal for any constants
a, b.
Slide 658: Then the distribution of
$$X = \frac{U/p}{V/k}$$

Slide 659: Therefore,
$$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} = \frac{\sigma_Y^2\, S_X^2}{\sigma_X^2\, S_Y^2} \sim F_{n-1,\,m-1}.$$

Slide 660: ...a $100(1-\alpha)\%$ confidence interval for $\sigma_Y^2/\sigma_X^2$ is
$$\left(F_{1-\alpha/2,\,n-1,\,m-1}\, S_Y^2/S_X^2,\ \ F_{\alpha/2,\,n-1,\,m-1}\, S_Y^2/S_X^2\right).$$

Slide 663: We measure the risks in terms of variances, and test
$$H_0: \sigma_X^2 = \sigma_Y^2 \quad \text{v.} \quad H_1: \sigma_X^2 > \sigma_Y^2.$$

Slide 664:

  H0 | Test statistic, T | Dist. under H0
  $\mu_X - \mu_Y = \delta$ ($\sigma_X^2$, $\sigma_Y^2$ known) | $\dfrac{\bar{X} - \bar{Y} - \delta}{\sqrt{\sigma_X^2/n + \sigma_Y^2/m}}$ | $N(0, 1)$
  $\mu_X - \mu_Y = \delta$ ($\sigma_X^2 = \sigma_Y^2$ unknown) | $\dfrac{\bar{X} - \bar{Y} - \delta}{\sqrt{\frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}\,(1/n + 1/m)}}$ | $t_{n+m-2}$
  $\rho = 0$ ($n = m$) | $\hat{\rho}\sqrt{\dfrac{n-2}{1-\hat{\rho}^2}}$ | $t_{n-2}$
  $\sigma_Y^2/\sigma_X^2 = k$ | $k\, S_X^2/S_Y^2$ | $F_{n-1,\,m-1}$

Slide 670: Therefore the methods are more robust as they are free from any distributional
assumptions. But they are less powerful when applied to Normal distributions.
Slide 671: We are interested in testing
$$H_0: \theta = \theta_0 \quad \text{v.} \quad H_1: \theta > \theta_0 \quad (\text{or } \theta < \theta_0).$$

Slide 672: Intuitively, Y increases as $\theta$ increases. Hence H0 is rejected for...
Then the p-value is
$$p = P_{\theta_0}(Y \ge y) = \sum_{i=y}^n \frac{n!}{i!(n-i)!}\,\theta_0^i (1 - \theta_0)^{n-i}.$$
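A minimal sketch of this exact binomial p-value in Python (the values n = 20, y = 15 are assumed for illustration; assumes SciPy):

    from scipy import stats

    # Exact p-value for H0: theta = 0.5 v. H1: theta > 0.5, having observed y of n
    n, y, theta0 = 20, 15, 0.5
    p = stats.binom.sf(y - 1, n, theta0)   # P(Y >= y), the Bin(n, theta0) tail sum
    print(p)                               # approximately 0.0207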

Slide 673: Is there significant evidence that this type of policy offers a poor financial return?
We cannot assume a Normal distribution!
Slide 674: ...under fair play, i.e. the median = 3.5.
$$H_0: \text{median} = 3.5 \quad \text{v.} \quad H_1: \text{median} < 3.5.$$
The hypotheses concerned may be equivalently expressed as
$$H_0: \theta = 0.5 \quad \text{v.} \quad H_1: \theta < 0.5.$$
Slide 675: Let $Y = \sum_{i=1}^{10} X_i$. Then $Y \sim \mathrm{Bin}(10, \theta)$.

Slide 676: Hence $E(X_i) = \theta$ and $\mathrm{Var}(X_i) = \theta(1-\theta)$. By the CLT,
$$\frac{Y - n\theta}{\sqrt{n\theta(1-\theta)}} = \frac{1}{\sqrt{n\theta(1-\theta)}}\sum_{i=1}^n \big(X_i - E(X_i)\big) \to N(0, 1)$$

Slide 677: We test
H0: Audiences are indifferent about the two endings
v.
H1: Audiences prefer the happy ending.
Normal distribution is not applicable!


Slide 678: If we read + as 1, and − as 0, these are binary data. Let
Y = No. of +'s.
Then $Y \sim \mathrm{Bin}(8, \theta)$, where $\theta$ = P(audience prefers happy ending). We are testing
$$H_0: \theta = 0.5 \quad \text{v.} \quad H_1: \theta > 0.5.$$

Slide 679: Have we used all the information available to us?


Slide 681: Wilcoxon signed-rank test statistic:
T = sum of the negative ranks.
Slide 683: For large n,
$$P(T \le t) = P\left(\frac{T - E(T)}{\sqrt{\mathrm{Var}(T)}} \le \frac{t - E(T)}{\sqrt{\mathrm{Var}(T)}}\right) \approx P\left(N(0, 1) \le \frac{t - E(T)}{\sqrt{\mathrm{Var}(T)}}\right).$$
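A minimal Python sketch of the Wilcoxon signed-rank test with this Normal approximation (simulated differences with an assumed positive shift; uses the standard moments $E(T) = n(n+1)/4$ and $\mathrm{Var}(T) = n(n+1)(2n+1)/24$, and assumes no ties or zeros):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    d = rng.normal(0.3, 1.0, size=40)   # differences; H1: shifted above 0

    n = len(d)
    ranks = stats.rankdata(np.abs(d))
    t_neg = ranks[d < 0].sum()                 # T = sum of the negative ranks
    mean_t = n * (n + 1) / 4
    var_t = n * (n + 1) * (2 * n + 1) / 24
    z = (t_neg - mean_t) / np.sqrt(var_t)
    print(stats.norm.cdf(z))                   # approximate p-value P(T <= t)
    print(stats.wilcoxon(d, alternative="greater"))   # library version for comparison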

Slide 684: ...if a given distribution (such as $N(\mu, \sigma^2)$, $\mathrm{Bin}(n, \theta)$ etc.) fits the data well.
Slide 685: where
$$O_j = \text{No. of } X_i\text{'s equal to } j, \qquad j = 1, \ldots, k,$$

Slide 686: Under H0, the expected no. of observations in the j-th category is
$$E_j = n\,p_{j0}, \qquad j = 1, \ldots, k.$$
If H0 is true, we expect $O_j \approx E_j = n\,p_{j0}$ when n is large.


Slide 687: Test statistic: $T = \sum_{j=1}^k (O_j - E_j)^2\big/E_j$.
It is important that all $E_j \ge 5$ (at least).



Slide 688: The underlying distribution has three categories: 1 = A, 2 = B, and 3 = Others,
and the null hypothesis is
$$H_0: p_1 = 0.45, \quad p_2 = 0.4, \quad p_3 = 0.15.$$
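A minimal sketch of this goodness-of-fit test in Python (the observed counts are hypothetical, not the lecture's data; assumes SciPy):

    from scipy import stats

    observed = [80, 86, 34]                        # hypothetical counts, n = 200
    expected = [200 * p for p in (0.45, 0.40, 0.15)]
    print(stats.chisquare(observed, f_exp=expected))   # T and its chi^2 p-value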

Slide 690: The expected frequencies are then calculated using
$$E_i = n\,p_i(\hat{\theta}), \qquad i = 1, \ldots, k.$$

Slide 691: The log-likelihood function: $l(\lambda) = \log(\lambda)\sum_{i=1}^{100} X_i - 100\lambda$.

Slide 692: With $\hat{\lambda} = 1.81$, the expected frequencies are
$$E_i = n\,p_i(\hat{\lambda}) = 100\,\frac{(1.81)^i e^{-1.81}}{i!}, \qquad i = 0, 1, \ldots.$$

Slide 693: In the latter case, we replace the unknown parameters by their MLEs.
Slide 694: We test instead a simplified null hypothesis
$$H_0: p_i = F_0(a_i) - F_0(a_{i-1}), \qquad i = 1, \ldots, k.$$
Slide 696: Certainly the most frequent assumption made in statistics, either explicitly or
implicitly, is that data are drawn from a Normal distribution $N(\mu, \sigma^2)$.
Slide 698: The expected frequencies in all the intervals are $n \times 10\% = 5$.
Slide 699: Therefore there is no significant evidence to reject the Normal distribution at
the 5% significance level.
Slide 701: If we subtract 65.5 from all the observations, there are 25 negative points and 25
positive points. Let
Y = No. of positive points.
Slide 702: Since we observe y = 25, the p-value is
$$P(Y \ge 25) = P\left(\frac{Y - 50 \times 0.5}{\sqrt{50 \times 0.5(1-0.5)}} \ge \frac{25 - 50 \times 0.5}{\sqrt{50 \times 0.5(1-0.5)}}\right) \approx P\big(N(0, 1) \ge 0\big) = 0.5.$$
Slide 704: The p-value is
$$P(T \le 473) = P\left(\frac{T - E(T)}{\sqrt{\mathrm{Var}(T)}} \le \frac{473 - E(T)}{\sqrt{\mathrm{Var}(T)}}\right) \approx P\left(N(0, 1) \le \frac{473 - 637.5}{103.59}\right) = P\big(N(0, 1) \le -1.588\big) = 0.0561.$$

Slide 706: X and Y are independent iff
$$p_{ij} = p_{i\cdot}\, p_{\cdot j} \qquad \text{for } i = 1, \ldots, r \text{ and } j = 1, \ldots, c.$$

Slide 708: The goodness-of-fit test statistic is defined as
$$T = \sum_{i=1}^r \sum_{j=1}^c (O_{ij} - E_{ij})^2\big/E_{ij}.$$

Slide 709: For testing independence, it always holds that
$$O_{i\cdot} - E_{i\cdot} = 0 \qquad \text{and} \qquad O_{\cdot j} - E_{\cdot j} = 0.$$

Slide 711: ...there is significant evidence from the data indicating that beer preference and
the gender of the beer drinker are not independent.
Slide 713: We are interested in testing the hypothesis
H0 : p11 = p12 = p13 .

Slide 715: This is different from the independence tests; now we test for equal binomial
probabilities using independent samples.
Slide 717: We are asked to test the hypothesis H0 : p11 = p22 .
Slide 718: The treatment above is still valid as conditional inference, conditionally on
O1 = 200 and O2 = 100.
Slide 719: First we may test for independence: $H_0: p_{ij} = p_{i\cdot}\, p_{\cdot j}$ for all i, j.
Slide 720: Under H0,
$$T = \sum_{i=1}^3 \sum_{j=1}^2 \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(3-1)(2-1)} = \chi^2_2.$$
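A minimal sketch of such a contingency-table test in Python (the 3x2 counts are hypothetical, not the lecture's data; assumes NumPy/SciPy):

    import numpy as np
    from scipy import stats

    table = np.array([[70, 30],     # hypothetical 3x2 counts: rows = shifts,
                      [60, 40],     # columns = quality (good, defective)
                      [40, 60]])

    chi2, p, dof, expected = stats.chi2_contingency(table)
    print(chi2, p, dof)             # dof = (3 - 1)(2 - 1) = 2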

Slide 721:
the 7am shift produces the best quality, as $O_{11} - E_{11} \gg 0$ and $O_{12} - E_{12} \ll 0$;
the 11pm shift is the worst, as $O_{31} - E_{31} \ll 0$ and $O_{32} - E_{32} \gg 0$.
Slide 722: The quality of the 3pm shift is between the other two. Looking at the
three together, the differences become less significant.

Slide 727: It aims to model an explicit relationship between one dependent variable, often
denoted as y, and one or several regressors (also called covariates, or independent variables)
Slide 728: ...a straight line through the middle of the data points:
$$y = \beta_0 + \beta_1 x + \varepsilon,$$
For a given population x, the predicted sales are $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$.
Slide 729: ...through the middle of the data cloud:
$$y = \beta_0 + \beta_1 x + \varepsilon,$$
For a given height x, the predicted value $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ may be viewed as a kind of standard
weight.
Slide 730: How to draw a line through data clouds (i.e. to estimate $\beta_0$ and $\beta_1$)?
Slide 731: Parameters in the model: $\beta_0$, $\beta_1$ and $\sigma^2$.
Treating $x_1, \ldots, x_n$ as constants, we have
$$E(y_i) = \beta_0 + \beta_1 x_i, \qquad \mathrm{Var}(y_i) = \sigma^2.$$
Slide 732: $\hat{\beta}_0$, $\hat{\beta}_1$ are the values of $(\beta_0, \beta_1)$ at which the function
$$L(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$$
obtains its minimum.
Theorem: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$, and
$$\hat{\beta}_1 = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) \Big/ \sum_{i=1}^n (x_i - \bar{x})^2.$$
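A minimal Python sketch of these closed-form estimates on simulated data (assumed true values $\beta_0 = 2$, $\beta_1 = 0.5$; assumes NumPy):

    import numpy as np

    rng = np.random.default_rng(9)
    x = rng.uniform(0, 10, size=50)
    y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=50)   # true beta0 = 2, beta1 = 0.5

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    print(b0, b1)
    print(np.polyfit(x, y, 1))   # (slope, intercept) from the library, same values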

Slide 733: This leads to
$$\sum_{i=1}^n y_i - n\beta_0 - \beta_1\sum_{i=1}^n x_i = 0, \qquad \text{or } \beta_0 = \bar{y} - \beta_1\bar{x}.$$
This leads to
$$0 = \sum_{i=1}^n x_i(y_i - \beta_0 - \beta_1 x_i) = \sum_{i=1}^n x_i\big(y_i - \bar{y} - \beta_1(x_i - \bar{x})\big) = \sum_{i=1}^n x_i(y_i - \bar{y}) - \beta_1\sum_{i=1}^n x_i(x_i - \bar{x}).$$

Slide 734: since
$$\sum_{i=1}^n c(y_i - \bar{y}) = c\sum_{i=1}^n (y_i - \bar{y}) = 0.$$

Slide 735:
$$L(\beta_0, \beta_1) = \sum_{i=1}^n \Big(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i + \big(\hat{\beta}_0 - \beta_0 + (\hat{\beta}_1 - \beta_1)x_i\big)\Big)^2
= L\big(\hat{\beta}_0, \hat{\beta}_1\big) + \sum_{i=1}^n \big(\hat{\beta}_0 - \beta_0 + (\hat{\beta}_1 - \beta_1)x_i\big)^2 + 2B,$$

Slide 736:
$$L(\beta_0, \beta_1) = L\big(\hat{\beta}_0, \hat{\beta}_1\big) + \sum_{i=1}^n \big(\hat{\beta}_0 - \beta_0 + (\hat{\beta}_1 - \beta_1)x_i\big)^2 \ge L\big(\hat{\beta}_0, \hat{\beta}_1\big).$$

Slide 737: Therefore,
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n x_i(y_i - \bar{y})}{\sum_{i=1}^n x_i(x_i - \bar{x})} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}.$$

Slide 738: Proposition: $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased estimators, i.e.
$$E\big(\hat{\beta}_0\big) = \beta_0, \qquad E\big(\hat{\beta}_1\big) = \beta_1.$$

Slide 739: Now
$$E\big(\hat{\beta}_0\big) = E\big(\bar{y} - \hat{\beta}_1\bar{x}\big) = \beta_0 + \beta_1\bar{x} - \beta_1\bar{x} = \beta_0.$$

Slide 740: To work out the variances, the key is to write $\hat{\beta}_1$ and $\hat{\beta}_0$ as linear estimators
(i.e. linear combinations of $y_i$).
Slide 743: To calculate $\hat{\beta}_1$ numerically, use the formula:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^n x_i^2 - n\bar{x}^2}.$$

Slide 745: Residuals:
$$\hat{\varepsilon}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i, \qquad i = 1, \ldots, n.$$

This is the basic idea for diagnostic checking of the model.


Slide 746: The model is linear in $\beta_0$ and $\beta_1$, when x may be replaced by...
Slide 750: Plot residuals $\hat{\varepsilon}_i$ against $x_i$: validating a model assumption.
Slide 751: The LSEs $\hat{\beta}_0$ and $\hat{\beta}_1$ are the best linear unbiased estimators for, respectively,
$\beta_0$ and $\beta_1$: for any other linear unbiased estimators $\tilde{\beta}_0 = \sum_i c_i y_i$ and $\tilde{\beta}_1 = \sum_i d_i y_i$,
where $c_i$, $d_i$ are some constants, it holds that
$$\mathrm{Var}\big(\tilde{\beta}_0\big) \ge \mathrm{Var}\big(\hat{\beta}_0\big) \qquad \text{and} \qquad \mathrm{Var}\big(\tilde{\beta}_1\big) \ge \mathrm{Var}\big(\hat{\beta}_1\big).$$

Slide 752: Then $y_1, \ldots, y_n$ are independent (but not identically distributed), and
$$y_i \sim N(\beta_0 + \beta_1 x_i,\ \sigma^2).$$
Slide 753: In practice, we replace $\sigma^2$ by
$$\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^n \big(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\big)^2$$

Slide 754: Theorem:
i. $(n-2)\hat{\sigma}^2/\sigma^2 = \sum_{i=1}^n \big(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\big)^2\big/\sigma^2 \sim \chi^2_{n-2}$.
ii. $\hat{\beta}_0$ and $\hat{\sigma}^2$ are independent, hence $\big(\hat{\beta}_0 - \beta_0\big)\big/\mathrm{S.E.}\big(\hat{\beta}_0\big) \sim t_{n-2}$.
iii. $\hat{\beta}_1$ and $\hat{\sigma}^2$ are independent, hence $\big(\hat{\beta}_1 - \beta_1\big)\big/\mathrm{S.E.}\big(\hat{\beta}_1\big) \sim t_{n-2}$.
Slide 755: To validate the use of the regression model, we need to make sure that $\beta_1 \ne 0$, or
more practically that $\hat{\beta}_1$ is significantly non-zero.
Under H0,
$$T = \frac{\hat{\beta}_1}{\mathrm{S.E.}\big(\hat{\beta}_1\big)} \sim t_{n-2}.$$
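A minimal Python sketch of this slope test (simulated data as before; uses the standard fact $\mathrm{Var}(\hat{\beta}_1) = \sigma^2/\sum_i (x_i - \bar{x})^2$; assumes NumPy/SciPy):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(10)
    x = rng.uniform(0, 10, size=50)
    y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=50)
    n = len(x)

    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    sigma2_hat = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

    se_b1 = np.sqrt(sigma2_hat / sxx)       # S.E. of the slope
    t = b1 / se_b1                          # test statistic for H0: beta1 = 0
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    print(t, p)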

Slide 758: Therefore, for any $\beta_0$, $\beta_1$ and $\sigma^2 > 0$,
$$l\big(\beta_0, \beta_1, \sigma^2\big) \le l\big(\hat{\beta}_0, \hat{\beta}_1, \sigma^2\big).$$
Slide 759: The MLE $\tilde{\sigma}^2$ is a biased estimator for $\sigma^2$.
Slide 762: The magnitude of $\hat{\beta}_1$ itself is not important...

Slide 763: Note
$$\sum_i \hat{\varepsilon}_i = 0.$$

Slide 766: In case $\bar{x} = 0$, $\hat{\beta}_0$ and $\hat{\beta}_1$ are uncorrelated.
Slide 772: We have the ANOVA decomposition:
$$\sum_{i=1}^n (y_i - \bar{y})^2 = \hat{\beta}_1^2\sum_{i=1}^n (x_i - \bar{x})^2 + \sum_{i=1}^n \big(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\big)^2.$$

Slide 773:
Total SS (Sum of Squares): $\sum_{i=1}^n (y_i - \bar{y})^2 \sim \sigma^2\chi^2_{n-1}$.
Regression SS: $\hat{\beta}_1^2\sum_{i=1}^n (x_i - \bar{x})^2 \sim \sigma^2\chi^2_1$.
Residual (error) SS: $\sum_{i=1}^n \big(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\big)^2 \sim \sigma^2\chi^2_{n-2}$.

Slide 776: If we view Total SS as the total variation (or energy) in y, R2 is the percentage
of the total variation explained by x.
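A minimal Python sketch of $R^2$ from this decomposition (same simulated data set-up as before):

    import numpy as np

    rng = np.random.default_rng(11)
    x = rng.uniform(0, 10, size=50)
    y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=50)

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()

    total_ss = np.sum((y - y.mean()) ** 2)
    resid_ss = np.sum((y - b0 - b1 * x) ** 2)
    print(1 - resid_ss / total_ss)   # R^2: share of the variation explained by x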
Slide 782: For the analysis to be more informative, we would like to have some error bars
for our prediction.
Slide 783: Theorem: $\hat{\mu}(x)$ is normally distributed with mean $\mu(x)$ and variance
$$\mathrm{Var}\big(\hat{\mu}(x)\big) = \sigma^2\,\frac{\sum_{i=1}^n (x_i - x)^2}{n\sum_{j=1}^n (x_j - \bar{x})^2}.$$

Slide 787: It does not cover y with probability $1 - \alpha$.


Slide 788: We may assume that the y to be predicted is independent from $y_1, \ldots, y_n$ used
in estimation.
Therefore
$$\big(y - \hat{\mu}(x)\big)\Bigg/\left\{\hat{\sigma}^2\left(1 + \frac{\sum_{i=1}^n (x_i - x)^2}{n\sum_{j=1}^n (x_j - \bar{x})^2}\right)\right\}^{1/2} \sim t_{n-2}.$$
Slide 789: The predictive interval for y is longer than the confidence interval for E(y).
Slide 790: For the first task, a predictive interval would be more appropriate. For the
second task, he needs to know the average price and, therefore, a confidence interval.
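A minimal Python sketch showing both intervals at a new point (simulated data; the prediction point $x_0 = 7$ is an assumed illustration; assumes NumPy/SciPy):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(12)
    x = rng.uniform(0, 10, size=50)
    y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=50)
    n, x0 = len(x), 7.0              # x0: the new point to predict at

    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    sigma2_hat = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

    mu_hat = b0 + b1 * x0
    tc = stats.t.ppf(0.975, n - 2)
    half_pred = tc * np.sqrt(sigma2_hat * (1 + 1 / n + (x0 - x.mean()) ** 2 / sxx))
    half_mean = tc * np.sqrt(sigma2_hat * (1 / n + (x0 - x.mean()) ** 2 / sxx))
    print(mu_hat - half_pred, mu_hat + half_pred)   # predictive interval for y
    print(mu_hat - half_mean, mu_hat + half_mean)   # narrower CI for E(y)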

Slide 791: Because predicting the selling price for one car is more difficult, the corresponding
interval is wider.
Slide 793: ...are (approximately) independent and Normal with constant variance 2 .
Slide 794: ...it is very likely that at least one assumption is violated!
Slide 800: To mitigate the impact of both outliers and influential observations, we could use
robust regression, i.e. estimate $\beta_0$ and $\beta_1$ by minimising the sum of absolute deviations:
$$\mathrm{SAD}(\beta_0, \beta_1) = \sum_{i=1}^n |y_i - \beta_0 - \beta_1 x_i|$$

Slide 801: Stock returns:
$$\text{Return} = \frac{\text{Current price} - \text{Previous price}}{\text{Previous price}} \approx \log\frac{\text{Current price}}{\text{Previous price}}$$

Slide 802: Daily prices are definitely not independent. However, daily returns may be seen
as a sequence of uncorrelated random variables.
Slide 804: There is clear synchronisation between the movements of the two series of returns.
Slide 807: The null hypothesis $H_0: \beta_1 = 0$ is rejected with p-value 0.000: extremely
significant.
There are many standardised residual values $\ge 2$ or $\le -2$, indicating a non-Normal error
distribution.
Slide 811: ... the residual distribution has heavier tails than $N(0, \sigma^2)$.
Slide 813: Parameters in the model: $\beta_0, \beta_1, \ldots, \beta_p$ and $\sigma^2$.
Slide 814: The LSEs $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$ are obtained by minimising
$$\sum_{i=1}^n \Big(y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j\Big)^2.$$

Slide 815: Testing for all zero regression coefficients:
$$H_0: \beta_1 = \cdots = \beta_p = 0.$$
Slide 816: The effect of changing $x_j$ on y, holding the other x's fixed; this is unfortunately
not always practical.
Slide 837: Special care should be exercised when predicting with x out of the range of the
observations used to fit the model, which is called extrapolation.

Slide 844: We need to test the hypothesis
$$H_0: \mu_1 = \mu_2 = \mu_3.$$
Slide 845: i.e. all of them should be close to the overall mean
$$\bar{X} = (\bar{X}_1 + \bar{X}_2 + \bar{X}_3)/3 = (79 + 74 + 66)/3 = 73,$$
Slide 847: k independent samples available from k Normal distributions $N(\mu_j, \sigma^2)$.
Test the hypothesis $H_0: \mu_1 = \cdots = \mu_k$ (against the alternative H1: not all $\mu_j$ are the same).
Slide 848: Between-treatments variation: $B = \sum_{j=1}^k n_j\big(\bar{X}_j - \bar{X}\big)^2$, with $k - 1$ degrees of
freedom.
Within-treatments variation: $W = \sum_{j=1}^k \sum_{i=1}^{n_j} \big(X_{ij} - \bar{X}_j\big)^2$, with. . .

Slide 849: B and W are also called, respectively, between-groups variation and
within-groups variation.
Slide 856: When k = 2, ANOVA is effectively a t test.
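A minimal one-way ANOVA sketch in Python (three hypothetical groups with the means from slide 845; assumes NumPy/SciPy), also illustrating the k = 2 case:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(13)
    g1 = rng.normal(79, 5, size=12)   # three hypothetical treatment groups
    g2 = rng.normal(74, 5, size=12)
    g3 = rng.normal(66, 5, size=12)

    print(stats.f_oneway(g1, g2, g3))   # one-way ANOVA F test of equal means
    print(stats.f_oneway(g1, g2))       # with k = 2 ...
    print(stats.ttest_ind(g1, g2))      # ... F equals t^2 and the p-values match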
Slide 868: The variation of $X_{ij}$ is driven by a treatment factor at different levels $1, \ldots, k$,
in addition to random fluctuations (i.e. random errors).
$\mu$: average effect
$\beta_j$: treatment effect at the j-th level.
Slide 869:
$\mu$ represents the average effect
$\beta_1, \ldots, \beta_c$ represent c different treatment (column) levels
$\gamma_1, \ldots, \gamma_r$ represent r different block (row) levels
$\varepsilon_{ij} \sim N(0, \sigma^2)$ and are independent.

