The sample mean: $\hat\mu = \bar X$. The value 11.2 is an estimate for $\mu$.
Slide 531: Let $\mu_k = \mu_k(\theta)$ denote the $k$-th population moment, $k = 1, 2, \ldots$. Then $\mu_k$ depends on the unknown parameter $\theta$, as everything else about the distribution $F(\cdot\,; \theta)$ is known. Denote the $k$-th sample moment by
$$M_k = \frac{1}{n}\sum_{i=1}^n X_i^k.$$
The MM estimator (MME) $\hat\theta$ for $\theta$ is the solution of the $p$ equations
$$\mu_k(\hat\theta) = M_k, \qquad k = 1, \ldots, p.$$
For example, $\mu_1 = M_1$ gives $\hat\mu = \bar X$, and $\mu_2 = M_2 = \frac{1}{n}\sum_{i=1}^n X_i^2$ gives
$$\hat\sigma^2 = \frac{1}{n}\left(\sum_{i=1}^n X_i^2 - n\bar X^2\right).$$
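The moment-matching recipe above can be sketched in a few lines of Python (an illustration, not part of the slides; the Normal sample and the helper name are assumptions for the example):

```python
import random

def mm_estimates(xs):
    """Method-of-moments estimates for a Normal(mu, sigma^2) sample:
    match the first two population moments to the sample moments."""
    n = len(xs)
    m1 = sum(xs) / n                  # M1, the sample mean
    m2 = sum(x * x for x in xs) / n   # M2, the second sample moment
    return m1, m2 - m1 * m1           # mu = M1, sigma^2 = M2 - M1^2

random.seed(0)
sample = [random.gauss(5.0, 2.0) for _ in range(100_000)]
mu_hat, s2_hat = mm_estimates(sample)
print(mu_hat, s2_hat)  # both close to the true values 5 and 4
```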
...converges to
Slide 538: where $X_{(1)}, \ldots, X_{(n)}$ are the order statistics obtained by rearranging $X_1, \ldots, X_n$ into ascending order.
Slide 539: A good estimator would make $|\hat\theta - \theta|$ as small as possible.
The Mean Squared Error (MSE) of $\hat\theta$ is defined as
$$\mathrm{MSE}(\hat\theta) = E\big[(\hat\theta - \theta)^2\big],$$
and the Mean Absolute Deviation (MAD) of $\hat\theta$ is defined as
$$\mathrm{MAD}(\hat\theta) = E\big[|\hat\theta - \theta|\big].$$
Slide 540: If $E\hat\theta^2 < \infty$, it holds that
$$\mathrm{MSE}(\hat\theta) = \big[\mathrm{Bias}(\hat\theta)\big]^2 + \mathrm{Var}(\hat\theta),$$
where the bias is $\mathrm{Bias}(\hat\theta) = E\hat\theta - \theta$.
Slide 541: If $\mathrm{Bias}(\hat\theta) = 0$, $\hat\theta$ is an unbiased estimator for $\theta$.
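The decomposition can be checked by simulation (a sketch, assuming a deliberately biased estimator $0.9\bar X$ chosen purely for illustration):

```python
import random

random.seed(1)
theta, n, reps = 2.0, 10, 100_000

est = []
for _ in range(reps):
    xs = [random.gauss(theta, 1.0) for _ in range(n)]
    est.append(0.9 * sum(xs) / n)            # a deliberately biased estimator

mean_est = sum(est) / reps
bias = mean_est - theta                      # Bias = E(theta_hat) - theta
var = sum((e - mean_est) ** 2 for e in est) / reps
mse = sum((e - theta) ** 2 for e in est) / reps
print(mse, bias ** 2 + var)                  # the two sides agree
```

The identity is exact for the empirical moments, which is why the two printed numbers match to rounding error.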
Slide 542: We compute the bias and variance separately.
$$E(\hat\mu) = E\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n}\sum_{i=1}^n E(X_i) = \frac{1}{n}\cdot n\mu = \mu,$$
and $\mathrm{Var}(\bar X) = \sigma^2/n$.
Slide 543: Hence, $\mathrm{MSE}(\hat\mu) = \mathrm{MSE}(\bar X) = \sigma^2/n$.
Slide 544: $\hat\mu = \bar X$ is a better estimator than $X_1$ as
$$\mathrm{MSE}(\hat\mu) = \frac{\sigma^2}{n} < \mathrm{MSE}(X_1) = \sigma^2.$$
$\hat\mu = \bar X$ minimises the sum of squared deviations:
$$\min_a \sum_{i=1}^n (X_i - a)^2.$$
$$P\big(|\bar X - \mu| \le 1.96\, S/\sqrt n\big) \approx 0.95.$$
Slide 552: An alternative way to estimate $\mu$ may be to minimise the sum of absolute deviations:
$$\min_a \sum_{i=1}^n |X_i - a|.$$
Slide 553: Suppose we toss a coin 10 times, and record the number of Heads as a r.v. X. Then
$$X \sim \mathrm{Bin}(10, \theta).$$
Slide 554: Maximising $L(\theta)$ is equivalent to maximising
$$l(\theta) = \log L(\theta) = 8\log\theta + 2\log(1-\theta) + c,$$
where $c$ is a constant. Setting $dl(\theta)/d\theta = 0$, we obtain the MLE $\hat\theta = 0.8$.
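The same MLE can be found numerically (a sketch; the grid search stands in for solving $dl(\theta)/d\theta = 0$):

```python
import math

def log_lik(theta):
    # l(theta) = 8*log(theta) + 2*log(1 - theta), constant c dropped
    return 8 * math.log(theta) + 2 * math.log(1 - theta)

grid = [i / 10000 for i in range(1, 10000)]   # theta in (0, 1)
theta_hat = max(grid, key=log_lik)
print(theta_hat)  # 0.8
```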
Slide 556: Then the maximum likelihood estimator (MLE) for $\theta$ based on the observations $X_1, \ldots, X_n$ is defined as the $\hat\theta$ for which
$$f(X_1, \ldots, X_n; \hat\theta) = \max_\theta f(X_1, \ldots, X_n; \theta).$$
For i.i.d. observations, $f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^n f(x_i; \theta)$, so the likelihood function is
$$L(\theta) = \prod_{i=1}^n f(X_i; \theta).$$
ii. The MLE: $\hat\theta = \arg\max_\theta L(\theta)$, i.e. $L(\hat\theta) = \max_\theta L(\theta)$, or equivalently $l(\hat\theta) = \max_\theta l(\theta)$.
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2,$$
and
$$\hat\theta = (\hat\mu, \hat\sigma) = \left(\frac{1}{n}\sum_{i=1}^n X_i,\ \left[\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2\right]^{1/2}\right).$$
Slide 564:
$$L(\theta) = \prod_{i=1}^n f(X_i; \theta) = \prod_{i=1}^n \theta^2 X_i e^{-\theta X_i} = \theta^{2n}\exp\big[-n\theta\bar X\big]\prod_{i=1}^n X_i.$$
The Fisher information is
$$I(\theta) = -\int f(x;\theta)\,\frac{\partial^2}{\partial\theta^2}\log f(x;\theta)\,dx.$$
For the Exponential($\theta$) distribution,
$$I(\theta) = \frac{1}{\theta^2}\int f(x;\theta)\,dx = \frac{1}{\theta^2}.$$
For the Poisson($\theta$) distribution,
$$I(\theta) = \frac{1}{\theta^2}\sum_{x=0}^{\infty} x\,p(x;\theta) = \frac{1}{\theta^2}E(X_1) = \frac{1}{\theta}.$$
Slide 572: Point estimation is simple but not informative enough, since a point estimator is always subject to errors.
An intuitive guess: For estimating the population mean, use the interval $(L, U)$ with
$$L = \bar X - k\,\mathrm{S.E.}(\bar X), \qquad U = \bar X + k\,\mathrm{S.E.}(\bar X),$$
e.g. $(\bar X - 1.96\,\sigma/\sqrt n,\ \bar X + 1.96\,\sigma/\sqrt n)$.
Slide 575: Answer: $(1.27, 3.23)$ is one realisation of the random interval $(\bar X - 0.98,\ \bar X + 0.98)$ which covers $\mu$ with probability 0.95.
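The "covers with probability 0.95" statement can be checked by simulating many samples and counting how often the known-$\sigma$ interval contains $\mu$ (an illustrative sketch; the parameter values are assumptions):

```python
import random

random.seed(2)
mu, sigma, n, reps = 0.0, 1.0, 25, 20_000
half = 1.96 * sigma / n ** 0.5     # half-width of the known-sigma interval

covered = 0
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    covered += (xbar - half < mu < xbar + half)
print(covered / reps)              # close to 0.95
```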
Slide 576:
$$0.90 = P\big(\sqrt n\,|\bar X - \mu|/\sigma \le 1.645\big),$$
$$0.95 = P\big(\sqrt n\,|\bar X - \mu|/\sigma \le 1.960\big) = P\big(\bar X - 1.960\,\sigma/\sqrt n < \mu < \bar X + 1.960\,\sigma/\sqrt n\big),$$
$$0.99 = P\big(\sqrt n\,|\bar X - \mu|/\sigma \le 2.576\big).$$
When $\sigma$ is unknown, use $(\bar X - k S/\sqrt n,\ \bar X + k S/\sqrt n)$, where $k$ is a constant determined by the confidence level and also by the distribution of the statistic
$$\frac{\bar X - \mu}{S/\sqrt n},$$
e.g. $(\bar X - 1.96\, S/\sqrt n,\ \bar X + 1.96\, S/\sqrt n)$.
Slide 581: This leads to the following approximate 95% confidence interval for $\theta$:
$$\left(\hat\theta - 1.96\,\big[nI(\hat\theta)\big]^{-1/2},\ \ \hat\theta + 1.96\,\big[nI(\hat\theta)\big]^{-1/2}\right).$$
$$Z_1^2 + \cdots + Z_k^2 = \sum_{i=1}^k Z_i^2 \sim \chi^2_k, \qquad \sum_{i=1}^{k+p} Z_i^2 \sim \chi^2_{k+p}, \qquad \frac{1}{\sigma^2}\sum_{i=1}^n (Y_i - \mu)^2 \sim \chi^2_n.$$
Slide 589:
$$\frac{1}{\sigma^2}\sum_{i=1}^n (Y_i - \bar Y)^2 \sim \chi^2_{n-1}, \qquad \chi^2_n = \chi^2_{n-1} + \chi^2_1.$$
Slide 590: Hence a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is
$$(M/k_2,\ M/k_1).$$
Slide 604: Again statistical estimation cannot provide a firm answer, due to random
fluctuations between different samples.
Slide 606: The observed value $\bar x = 17$ is one standard deviation away from $\mu$, and may be regarded as a typical observation from the distribution.
Slide 607: The observed value $\bar x = 17$ begins to look a bit extreme, as it is two standard deviations away from $\mu$.
Slide 608: The observed value $\bar x = 17$ is very extreme, as it is three standard deviations away from $\mu$.
Slide 609: This probability is called the p-value.
Slide 610: But this does not imply that this hypothesis is necessarily true, as, for example, $\mu = 17$ or $\mu = 18$ are at least as likely as $\mu = 16$.
Not Reject $\ne$ Accept
Slide 611: P -values may be seen as a risk measure of rejecting H0 .
Slide 612: where $\theta_0$ is a fixed value, $\Theta_1$ is a set, and $\theta_0 \notin \Theta_1$.
$H_0$ is called the null hypothesis.
E.g. $H_0: \theta = 0.5$ v. $H_1: \theta \ne 0.5$.
Slide 621: The distribution of a test statistic under H0 must be known in order to calculate
p-values or critical values.
Slide 622: The test statistic is then the famous t statistic:
$$T = \frac{\bar X - \mu_0}{S/\sqrt n} = \frac{\sqrt n\,(\bar X - \mu_0)}{S}, \qquad S = \left(\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2\right)^{1/2}.$$

                          True state of nature
  Decision made           H0                  H1
  H0 not rejected         Correct decision    Type II error
  H0 rejected             Type I error        Correct decision
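The t statistic translates directly into code (a sketch; the data values are made up for illustration):

```python
import math
import statistics

def t_statistic(xs, mu0):
    """One-sample t statistic: T = sqrt(n) * (xbar - mu0) / S,
    with S the sample standard deviation (divisor n - 1)."""
    n = len(xs)
    return math.sqrt(n) * (statistics.fmean(xs) - mu0) / statistics.stdev(xs)

data = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.4, 5.1]
print(t_statistic(data, 5.0))  # compare against t quantiles with n - 1 = 7 df
```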
$$T = \frac{(n-1)S^2}{\sigma_0^2} = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{\sigma_0^2} \sim \chi^2_{n-1}.$$
Slide 632:
$$\beta(\sigma) = P(H_0 \text{ is rejected}) = P\big(T > \chi^2_{\alpha,\,n-1}\big) = P\left(\frac{(n-1)S^2}{\sigma^2} > \frac{\sigma_0^2}{\sigma^2}\,\chi^2_{\alpha,\,n-1}\right) = P\left(\chi^2_{n-1} > \frac{\sigma_0^2}{\sigma^2}\,\chi^2_{\alpha,\,n-1}\right),$$
and for the two-sided test, reject $H_0$ when $t \le \chi^2_{1-\alpha/2,\,n-1}$ or $t \ge \chi^2_{\alpha/2,\,n-1}$.
Slide 635:

  Null hypothesis, H0                 Test statistic, T                     Distribution of T under H0
  $\mu = \mu_0$ ($\sigma^2$ known)    $(\bar X - \mu_0)/(\sigma/\sqrt n)$   $N(0, 1)$
  $\mu = \mu_0$                       $(\bar X - \mu_0)/(S/\sqrt n)$        $t_{n-1}$
  $\sigma^2 = \sigma_0^2$             $(n-1)S^2/\sigma_0^2$                 $\chi^2_{n-1}$
For two samples, the natural statistic is the difference $\bar X - \bar Y$.
Slide 639:
$$\beta(\mu) = P(H_0 \text{ is rejected}) = P\big(T > t_{\alpha,\,n-1}\big) = P\big(\sqrt n\,\bar Z/S > t_{\alpha,\,n-1}\big) = P\left(\frac{\sqrt n\,(\bar Z - \mu)}{S} > t_{\alpha,\,n-1} - \frac{\sqrt n\,\mu}{S}\right).$$
Let $\{X_1, \ldots, X_n\}$ and $\{Y_1, \ldots, Y_m\}$ be two independent random samples.
Slide 641:
$$\frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{\sigma_X^2/n + \sigma_Y^2/m}} \sim N(0, 1).$$
When $\sigma_X^2 = \sigma_Y^2$, dividing by
$$\left[\big((n-1)S_X^2/\sigma_X^2 + (m-1)S_Y^2/\sigma_Y^2\big)\big/(n+m-2)\right]^{1/2}$$
gives
$$\frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{S\sqrt{1/n + 1/m}} \sim t_{n+m-2},$$
where
$$S = \left[\frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}\right]^{1/2}$$
is the pooled sample standard deviation.
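A minimal sketch of the pooled two-sample statistic (the helper name and the data are assumptions for illustration):

```python
import math
import statistics

def pooled_t(xs, ys):
    """Two-sample t statistic with pooled variance,
    assuming sigma_X^2 = sigma_Y^2 and mu_X - mu_Y = 0 under H0."""
    n, m = len(xs), len(ys)
    s2 = ((n - 1) * statistics.variance(xs)
          + (m - 1) * statistics.variance(ys)) / (n + m - 2)
    se = math.sqrt(s2 * (1 / n + 1 / m))
    return (statistics.fmean(xs) - statistics.fmean(ys)) / se

t = pooled_t([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(t)  # compare against t quantiles with n + m - 2 = 6 df
```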
Slide 651: $\rho$ measures only the linear relationship between X and Y. When $\rho = 0$, X and Y are linearly independent (uncorrelated).
...as there may exist some non-linear relationship between X and Y.
Slide 652: Given pairwise observations $(X_i, Y_i)$, $i = 1, \ldots, n$, a natural estimator for $\rho$ is defined as
$$\hat\rho = \frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\left[\sum_{i=1}^n (X_i - \bar X)^2 \sum_{j=1}^n (Y_j - \bar Y)^2\right]^{1/2}}.$$
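Slide 652's estimator translates directly (a sketch; the perfectly linear data are chosen so that $\hat\rho = 1$):

```python
import math

def rho_hat(xs, ys):
    """Sample correlation coefficient from the formula on Slide 652."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - xbar) ** 2 for x in xs)
                    * sum((y - ybar) ** 2 for y in ys))
    return num / den

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r = rho_hat(xs, [2 * x + 1 for x in xs])
print(r)  # exact linear relation, so 1.0
```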
Slide 657: Two r.v.s X and Y are jointly Normal if aX + bY is Normal for any constants
a, b.
Slide 658: Then the distribution of
$$X = \frac{U/p}{V/k}$$
is the F distribution with $(p, k)$ degrees of freedom.
Slide 660: ...a $100(1-\alpha)\%$ confidence interval for $\sigma_Y^2/\sigma_X^2$ is
$$\left(F_{1-\alpha/2,\,n-1,\,m-1}\, S_Y^2/S_X^2,\ \ F_{\alpha/2,\,n-1,\,m-1}\, S_Y^2/S_X^2\right).$$
$$H_0: \sigma_X^2 = \sigma_Y^2 \quad \text{v.} \quad H_1: \sigma_X^2 > \sigma_Y^2.$$
Slide 664:

  H0                                                             Test statistic, T                                                                                               Dist. under H0
  $\mu_X - \mu_Y = \delta$ ($\sigma_X^2, \sigma_Y^2$ known)      $\dfrac{\bar X - \bar Y - \delta}{\sqrt{\sigma_X^2/n + \sigma_Y^2/m}}$                                          $N(0, 1)$
  $\mu_X - \mu_Y = \delta$ ($\sigma_X^2 = \sigma_Y^2$ unknown)   $\dfrac{\bar X - \bar Y - \delta}{S\sqrt{1/n + 1/m}}$, $S^2 = \dfrac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}$           $t_{n+m-2}$
  $\rho = 0$ ($n = m$)                                           $\dfrac{\sqrt{n-2}\,\hat\rho}{\sqrt{1 - \hat\rho^2}}$                                                           $t_{n-2}$
  $\sigma_X^2/\sigma_Y^2 = k$                                    $S_X^2\big/\big(k\, S_Y^2\big)$                                                                                 $F_{n-1,\,m-1}$
Slide 670: Therefore the methods are more robust as they are free from any distributional
assumptions. But they are less powerful when applied to Normal distributions.
Slide 671: We are interested in testing
$$H_0: \theta = \theta_0 \quad \text{v.} \quad H_1: \theta > \theta_0 \ \ (\text{or } \theta < \theta_0).$$
The p-value is
$$\sum_{i=y}^n \frac{n!}{i!\,(n-i)!}\,\theta_0^i\,(1-\theta_0)^{n-i}.$$
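The exact tail sum is easy to compute (a sketch; n = 10 and y = 8 echo the coin example from Slide 553):

```python
from math import comb

def binom_upper_p(n, y, theta0):
    """P(Y >= y) under H0: theta = theta0, where Y ~ Bin(n, theta0)."""
    return sum(comb(n, i) * theta0 ** i * (1 - theta0) ** (n - i)
               for i in range(y, n + 1))

p = binom_upper_p(10, 8, 0.5)
print(p)  # 56/1024 = 0.0546875
```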
Slide 673: Is there significant evidence that this type of policy offers a poor financial return?
We cannot assume a Normal distribution!
Slide 674: ...under fair play, i.e. the median $= 3.5$.
$$H_0: \text{median} = 3.5 \quad \text{v.} \quad H_1: \theta < 0.5.$$
Let $Y = \sum_{i=1}^{10} X_i$. Then $Y \sim \mathrm{Bin}(10, \theta)$, and
$$\frac{Y - n\theta}{\sqrt{n\theta(1-\theta)}} = \frac{1}{\sqrt{n\theta(1-\theta)}}\sum_{i=1}^{n}\big(X_i - E(X_i)\big) \approx N(0, 1).$$
Slide 677: $H_0: \theta = 0.5$ v. $H_1: \theta > 0.5$.
$$P(T \ge t) = P\left(\frac{T - E(T)}{\sqrt{\mathrm{Var}(T)}} \ge \frac{t - E(T)}{\sqrt{\mathrm{Var}(T)}}\right) \approx P\left(N(0, 1) \ge \frac{t - E(T)}{\sqrt{\mathrm{Var}(T)}}\right).$$
Slide 684: ...if a given distribution (such as $N(\mu, \sigma^2)$, $\mathrm{Bin}(n, \theta)$, etc.) fits the data well.
Slide 685: where
$$O_j = \text{No. of } X_i\text{'s equal to } j, \qquad j = 1, \ldots, k,$$
Slide 686: Under $H_0$, the expected no. of observations in the j-th category is
$$E_j = n\,p_{j0}, \qquad j = 1, \ldots, k.$$
The test statistic is
$$T = \sum_{j=1}^k (O_j - E_j)^2 \big/ E_j.$$
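A sketch of the statistic T (the observed counts below are hypothetical; the category probabilities are those of Slide 688):

```python
def chi2_stat(obs, p0):
    """Goodness-of-fit statistic T = sum_j (O_j - E_j)^2 / E_j."""
    n = sum(obs)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(obs, p0))

obs = [40, 50, 10]                      # hypothetical counts, n = 100
t = chi2_stat(obs, [0.45, 0.40, 0.15])  # E_j = 45, 40, 15
print(t)  # compare against chi-square quantiles with k - 1 = 2 df
```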
Slide 688: The underlying distribution has three categories: 1 = A, 2 = B, and 3 = Others, and the null hypothesis is
$$H_0: p_1 = 0.45, \quad p_2 = 0.4, \quad p_3 = 0.15.$$
$i = 1, \ldots, k$. The MLE of the Poisson mean is $\hat\lambda = \sum_{i=1}^{100} X_i \big/ 100$, with categories $i = 0, 1, \ldots$.
Slide 693: In the latter case, we replace the unknown parameters by their MLEs.
Slide 694: We test instead a simplified null hypothesis
$$H_0: p_i = F_0(a_i) - F_0(a_{i-1}), \qquad i = 1, \ldots, k.$$
Slide 696: Certainly the most frequent assumption made in statistics, either explicitly or
implicitly, is that data are drawn from a Normal distribution N (, 2 ).
Slide 698: The expected frequencies in all the intervals are $n \times 10\% = 5$.
Slide 699: Therefore there is no significant evidence to reject the Normal distribution at
the 5% significance level.
Slide 701: If we subtract 65.5 from all the observations, there are 25 negative points and 25
positive points. Let
Y = No. of positive points.
Slide 702: Since we observe $y = 25$, the p-value is
$$P(Y \ge 25) = P\left(\frac{Y - 50 \times 0.5}{\sqrt{50 \times 0.5(1-0.5)}} \ge \frac{25 - 50 \times 0.5}{\sqrt{50 \times 0.5(1-0.5)}}\right) \approx P\big(N(0, 1) \ge 0\big) = 0.5.$$
Slide 704: The p-value is
$$P(T \le 473) = P\left(\frac{T - E(T)}{\sqrt{\mathrm{Var}(T)}} \le \frac{473 - E(T)}{\sqrt{\mathrm{Var}(T)}}\right) \approx P\left(N(0, 1) \le \frac{473 - 637.5}{103.59}\right) = P\big(N(0, 1) \le -1.588\big) = 0.0561.$$
for $i = 1, \ldots, r$ and $j = 1, \ldots, c$. The test statistic is
$$T = \sum_{i=1}^r \sum_{j=1}^c (O_{ij} - E_{ij})^2 \big/ E_{ij},$$
and $\sum_j (O_j - E_j) = 0$.
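For a contingency table, $E_{ij}$ comes from the row and column totals (a sketch with a hypothetical 2×2 table):

```python
def chi2_independence(table):
    """T = sum_ij (O_ij - E_ij)^2 / E_ij with E_ij = (row total)(col total)/n."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2
               / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))

t = chi2_independence([[30, 20], [20, 30]])  # hypothetical counts
print(t)  # compare against chi-square with (r-1)(c-1) = 1 df
```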
Slide 711: ...there is significant evidence from the data indicating that beer preference and
the gender of beer drinker are not independent.
Slide 713: We are interested in testing the hypothesis
H0 : p11 = p12 = p13 .
Slide 715: This is different from the independence tests; now we test for equal binomial probabilities using independent samples.
Slide 717: We are asked to test the hypothesis H0 : p11 = p22 .
Slide 718: The treatment above is still valid as conditional inference, conditionally on
O1 = 200 and O2 = 100.
Slide 719: First we may test for independence: $H_0: p_{ij} = p_{i\cdot}\, p_{\cdot j}$ for all $i, j$.
Slide 720: Under $H_0$,
$$T = \sum_{i=1}^2 \sum_{j=1}^3 \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(3-1)(2-1)} = \chi^2_2.$$
Slide 721:
the 7am shift produces the best quality, as $O_{11} - E_{11} \gg 0$ and $O_{12} - E_{12} \ll 0$;
the 11pm shift is the worst, as $O_{31} - E_{31} \ll 0$ and $O_{32} - E_{32} \gg 0$.
Slide 722: The quality of the 3pm shift is between the other two. Looking at the
three together, the differences become less significant.
Slide 727: It aims to model an explicit relationship between one dependent variable, often denoted as y, and one or several regressors (also called covariates, or independent variables).
Slide 728: ...a straight line through the middle of the data points:
$$y = \beta_0 + \beta_1 x + \varepsilon.$$
For a given population $x$, the predicted sales are $\hat y = \beta_0 + \beta_1 x$.
Slide 729: ...through the middle of the data cloud:
$$y = \beta_0 + \beta_1 x + \varepsilon.$$
For a given height $x$, the predicted value $\hat y = \beta_0 + \beta_1 x$ may be viewed as a kind of standard weight.
Slide 730: How to draw a line through data clouds (i.e. to estimate $\beta_0$ and $\beta_1$)?
Slide 731: Parameters in the model: $\beta_0$, $\beta_1$ and $\sigma^2$.
Treating $x_1, \ldots, x_n$ as constants, we have
$$E(y_i) = \beta_0 + \beta_1 x_i, \qquad \mathrm{Var}(y_i) = \sigma^2.$$
Slide 732:
$\hat\beta_0, \hat\beta_1$ are the values of $(\beta_0, \beta_1)$ at which the function
$$L(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$$
is minimised; the solution involves the sums $\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)$ and $\sum_{i=1}^n (x_i - \bar x)^2$.
Setting $\partial L/\partial \beta_0 = 0$ gives
$$\sum_{i=1}^n y_i - n\beta_0 - \beta_1 \sum_{i=1}^n x_i = 0, \qquad \text{or} \qquad \beta_0 = \bar y - \beta_1 \bar x.$$
This leads to
$$0 = \sum_{i=1}^n x_i(y_i - \beta_0 - \beta_1 x_i) = \sum_{i=1}^n x_i\big(y_i - \bar y - (\beta_1 x_i - \beta_1 \bar x)\big) = \sum_{i=1}^n x_i(y_i - \bar y) - \beta_1 \sum_{i=1}^n x_i(x_i - \bar x).$$
Note that
$$\sum_{i=1}^n c\,(y_i - \bar y) = c \sum_{i=1}^n (y_i - \bar y) = 0.$$
Slide 735:
$$L(\beta_0, \beta_1) = \sum_{i=1}^n \Big(y_i - \hat\beta_0 - \hat\beta_1 x_i + \hat\beta_0 - \beta_0 + (\hat\beta_1 - \beta_1)x_i\Big)^2 = L\big(\hat\beta_0, \hat\beta_1\big) + \sum_{i=1}^n \Big(\hat\beta_0 - \beta_0 + (\hat\beta_1 - \beta_1)x_i\Big)^2 + 2B,$$
Slide 736:
$$L(\beta_0, \beta_1) = L\big(\hat\beta_0, \hat\beta_1\big) + \sum_{i=1}^n \Big(\hat\beta_0 - \beta_0 + (\hat\beta_1 - \beta_1)x_i\Big)^2 \ge L\big(\hat\beta_0, \hat\beta_1\big).$$
$$\hat\beta_1 = \frac{\sum_{i=1}^n x_i(y_i - \bar y)}{\sum_{i=1}^n x_i(x_i - \bar x)} = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}.$$
$$E\big(\hat\beta_0\big) = E\big(\bar y - \hat\beta_1 \bar x\big) = \beta_0 + \beta_1 \bar x - \beta_1 \bar x = \beta_0.$$
Slide 740: To work out the variances, the key is to write $\hat\beta_1$ and $\hat\beta_0$ as linear estimators (i.e. linear combinations of $y_i$).
Slide 743: To calculate $\hat\beta_1$ numerically, use the formula:
$$\hat\beta_1 = \frac{\sum_{i=1}^n x_i y_i - n\,\bar x\,\bar y}{\sum_{i=1}^n x_i^2 - n\,\bar x^2}.$$
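The numerical formula is a one-liner in code (a sketch; the data points are invented to lie near the line y = 1 + 2x):

```python
def least_squares(xs, ys):
    """LSEs via the formula on Slide 743, plus b0 = ybar - b1 * xbar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = ((sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar)
          / (sum(x * x for x in xs) - n * xbar * xbar))
    return ybar - b1 * xbar, b1

b0, b1 = least_squares([1.0, 2.0, 3.0, 4.0], [3.1, 4.9, 7.2, 8.8])
print(b0, b1)  # close to the generating line y = 1 + 2x
```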
i = 1, . . . , n.
$$\mathrm{Var}\big(\tilde\beta_1\big) \ge \mathrm{Var}\big(\hat\beta_1\big).$$
Slide 752: Then $y_1, \ldots, y_n$ are independent (but not identically distributed), and
$$y_i \sim N\big(\beta_0 + \beta_1 x_i,\ \sigma^2\big).$$
Slide 753: In practice, we replace $\sigma^2$ by
$$\hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^n \big(y_i - \hat\beta_0 - \hat\beta_1 x_i\big)^2.$$
i. $\sum_{i=1}^n \big(y_i - \hat\beta_0 - \hat\beta_1 x_i\big)^2 \big/ \sigma^2 \sim \chi^2_{n-2}$.
ii. $\hat\beta_0$ and $\hat\sigma^2$ are independent, hence $\big(\hat\beta_0 - \beta_0\big)\big/\mathrm{S.E.}\big(\hat\beta_0\big) \sim t_{n-2}$.
iii. $\hat\beta_1$ and $\hat\sigma^2$ are independent, hence $\big(\hat\beta_1 - \beta_1\big)\big/\mathrm{S.E.}\big(\hat\beta_1\big) \sim t_{n-2}$.
Slide 755: To validate the use of the regression model, we need to make sure that $\beta_1 \ne 0$ or, more practically, that $\hat\beta_1$ is significantly non-zero.
Under $H_0$,
$$T = \frac{\hat\beta_1}{\mathrm{S.E.}\big(\hat\beta_1\big)} \sim t_{n-2}.$$
The residuals satisfy $\sum_{i=1}^n \hat e_i = 0$ and $\sum_{i=1}^n \hat e_i x_i = 0$.
Slide 773:
Total SS (Sum of Squares): $\sum_{i=1}^n (y_i - \bar y)^2$.
Regression SS: $\hat\beta_1^2 \sum_{i=1}^n (x_i - \bar x)^2 \sim \sigma^2 \chi^2_1$.
Residual SS: $\sum_{i=1}^n \big(y_i - \hat\beta_0 - \hat\beta_1 x_i\big)^2 \sim \sigma^2 \chi^2_{n-2}$.
Slide 776: If we view Total SS as the total variation (or energy) in y, $R^2$ is the percentage of the total variation explained by x.
Slide 782: For the analysis to be more informative, we would like to have some error bars
for our prediction.
Slide 783: Theorem:
$\hat\mu(x)$ is normally distributed with mean $\mu(x)$ and variance
$$\mathrm{Var}\big(\hat\mu(x)\big) = \frac{\sum_{i=1}^n (x_i - x)^2}{n \sum_{j=1}^n (x_j - \bar x)^2}\,\sigma^2,$$
and, for a new observation $y$ at $x$,
$$\frac{y - \hat\mu(x)}{\hat\sigma\left(1 + \dfrac{\sum_{i=1}^n (x_i - x)^2}{n \sum_{j=1}^n (x_j - \bar x)^2}\right)^{1/2}} \sim t_{n-2}.$$
Slide 789: The predictive interval for y is longer than the confidence interval for E(y).
Slide 790: For the first task, a predictive interval would be more appropriate. For the
second task, he needs to know the average price and, therefore, a confidence interval.
Slide 791: Because predicting the selling price for one car is more difficult, the corresponding
interval is wider.
Slide 793: ...are (approximately) independent and Normal with constant variance 2 .
Slide 794: ...it is very likely that at least one assumption is violated!
Slide 800: To mitigate the impact of both outliers and influential observations, we could use robust regression, i.e. estimate $\beta_0$ and $\beta_1$ by minimising the sum of absolute deviations:
$$\mathrm{SAD}(\beta_0, \beta_1) = \sum_{i=1}^n |y_i - \beta_0 - \beta_1 x_i|.$$
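Minimising SAD has no closed form; a crude grid search illustrates the robustness (everything here, data and grid range alike, is an assumption for the sketch; real robust fits use specialised solvers):

```python
def sad(b0, b1, xs, ys):
    """Sum of absolute deviations from Slide 800."""
    return sum(abs(y - b0 - b1 * x) for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 40.0]          # y = 1 + 2x plus one gross outlier
best = min(((sad(i / 10, j / 10, xs, ys), i / 10, j / 10)
            for i in range(50) for j in range(50)),
           key=lambda t: t[0])
print(best[1], best[2])  # recovers b0 = 1, b1 = 2 despite the outlier
```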
$$\frac{\text{current price}}{\text{previous price}}$$
Slide 802: Daily prices are definitely not independent. However, daily returns may be seen
as a sequence of uncorrelated random variables.
Slide 804: There is clear synchronisation between the movements of the two series of returns.
Slide 807: The null hypothesis H0 : 1 = 0 is rejected with p-value 0.000: extremely
significant.
There are many standardised residual values $\ge 2$ or $\le -2$, indicating a non-Normal error distribution.
Slide 811: ... the residual distribution has heavier tails than $N(0, \sigma^2)$.
Slide 813: Parameters in the model: $\beta_0, \beta_1, \ldots, \beta_p$ and $\sigma^2$.
Slide 814: The LSEs $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p$ are obtained by minimising
$$\sum_{i=1}^n \left(y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j\right)^2.$$
Between-treatments variation: $B = \sum_{j=1}^k n_j\big(\bar X_j - \bar X\big)^2$, with $k - 1$ degrees of freedom.
Within-treatments variation: $W = \sum_{j=1}^k \sum_{i=1}^{n_j} \big(X_{ij} - \bar X_j\big)^2$, with $n - k$ degrees of freedom.
Slide 849: B and W are also called, respectively, between-groups variation and
within-groups variation.
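B and W can be computed directly from the group samples (a sketch with hypothetical groups; note that B + W equals the total sum of squares):

```python
def anova_b_w(groups):
    """Between-groups B and within-groups W variation."""
    all_obs = [x for g in groups for x in g]
    grand = sum(all_obs) / len(all_obs)
    means = [sum(g) / len(g) for g in groups]
    b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return b, w

b, w = anova_b_w([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]])
print(b, w)  # B = 42.0, W = 6.0; B + W = total SS = 48.0
```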
Slide 856: When k = 2, ANOVA is effectively a t test.
Slide 868: The variation of $X_{ij}$ is driven by a treatment factor at different levels $\mu_1, \ldots, \mu_k$, in addition to random fluctuations (i.e. random errors).
$\mu$: average effect;
$\gamma_j$: treatment effect at the j-th level.
Slide 869:
$\mu$ represents the average effect;
$\gamma_1, \ldots, \gamma_c$ represent $c$ different treatment (column) levels;
$\delta_1, \ldots, \delta_r$ represent $r$ different block (row) levels;
$\varepsilon_{ij} \sim N(0, \sigma^2)$ and are independent.