1.1 Setup
Let the observed sample set be {(x1, y1), (x2, y2), ..., (xN, yN)}. X and Y are random variables that can take on any value (xi, yi) within the range of the sample set. One variable is the independent (regressor, predictor) variable, typically X, and the other is the dependent (regressand, predicted) variable, typically Y.
\[ S_{xy} = S_{yx} = \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) \quad \text{constant} \tag{1} \]
\[ S_{xx} = \sum_{i=1}^{N} (x_i - \bar{x})^2 \quad \text{constant} \tag{2} \]
\[ S_{yy} = \sum_{i=1}^{N} (y_i - \bar{y})^2 \quad \text{constant} \tag{3} \]
The equation E(Y|x) = β0 + β1x is called the Population Regression Function, PRF. Including the error ε, the prediction of the dependent variable would be
\[ Y = \beta_0 + \beta_1 x + \varepsilon \]
which is called the simple linear regression model for the population. E(Y|x) is often hypothetical because we would not know β0, β1 unless we knew the population. We do not care about the distribution of Y (μY, σ²Y) here, as regression is always one-sided[3]. For Y, we do it the other way around, but that is another story along similar lines.
[1] https://en.wikipedia.org/wiki/Estimator
[2] https://stats.stackexchange.com/a/17789/202481
[3] Unless we standardize the dataset, which leads to symmetry and the correlation coefficient.
Other Main Parameters: β0 , β1
For a given sample (xi, yi) from the sample set (X, Y), the fitted value and residual are ŷi = b0 + b1xi and ei = yi − ŷi. Using OLS,
\[ \hat{\beta}_1 = \frac{\sum_{(x,y)} (y - \bar{Y})(x - \bar{X})}{\sum_{x} (x - \bar{X})^2} \quad \text{Slope RV, Estimator of RV } \beta_1 \tag{10} \]
\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \quad \text{y-intercept RV, Estimator of RV } \beta_0 \tag{11} \]
\[ b_1 = \frac{\sum_{i} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i} (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} \quad \text{Slope constant, Estimate of RV } \beta_1 \tag{12} \]
\[ b_0 = \bar{y} - b_1 \bar{x} \quad \text{y-intercept constant, Estimate of RV } \beta_0 \tag{13} \]
β̂0, β̂1 are estimators of β0, β1 for any sample set; b0, b1 are estimates of β0, β1 for the given sample set.
Estimator (estimate): ε̂(0, s²), X̂(x̄, s²_X), Ŷ(ŷ = ȳ, s²_{Y|x} = s²), β̂1(b1), β̂0(b0)
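Equations (12)–(13) can be sketched as a small function; the name `ols_fit` and the toy data are our own, not the notes':

```python
def ols_fit(xs, ys):
    # Least-squares estimates: b1 = S_xy / S_xx (eq. 12), b0 = y_bar - b1 * x_bar (eq. 13).
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    S_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = S_xy / S_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Toy data (illustrative only).
b0, b1 = ols_fit([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 8.1, 9.8])
```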
\[ SSE = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{N} \left[ y_i - (b_0 + b_1 x_i) \right]^2 \quad \text{constant} \tag{14} \]
Variance estimation of σ²:
\[ S^2 = \hat{\sigma}^2 = \frac{\sum_{y} (y - \hat{Y})^2}{n - 2} \quad \text{RV, Variance Estimator of RV } \varepsilon \tag{15} \]
where n − 2 is the degrees of freedom, because it requires β̂0, β̂1 to be calculated (in other words, β0, β1 to be estimated) before the summation.
\[ s^2 = \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{n - 2} = \frac{SSE}{n - 2} \quad \text{constant, Variance Estimate of RV } \varepsilon \tag{16} \]
S² is the estimator of σ² for any sample set; s² is the estimate of σ² for the given sample set.
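A sketch of the variance estimate s² = SSE/(n − 2), continuing the same illustrative toy data (the values are ours, not from the notes):

```python
# Toy data (illustrative), with OLS estimates from eqs. (12)-(13).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

# Eq. (14): SSE from the residuals; eq. (16): s^2 = SSE / (n - 2).
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
SSE = sum(e ** 2 for e in residuals)
s2 = SSE / (n - 2)  # df = n - 2, since b0 and b1 were estimated first
```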
\[ SST = S_{yy} = \sum_{i=1}^{N} (y_i - \bar{y})^2 \quad \text{constant} \tag{17} \]
\[ r_d^2 = 1 - \frac{SSE}{SST} \quad \text{constant} \tag{18} \]
\[ 0 \le r_d^2 \le 1 \tag{19} \]
\[ r^2 = r_d^2 \quad \text{where } r \text{ is the sample correlation coefficient} \tag{20} \]
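Equation (18) on the same made-up toy data, as a sketch (variable names are ours):

```python
# Coefficient of determination r_d^2 = 1 - SSE/SST (eq. 18) on illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

SSE = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # eq. (14)
SST = sum((y - y_bar) ** 2 for y in ys)                      # eq. (17), = S_yy
r_d2 = 1 - SSE / SST                                         # eq. (18)
```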
Note that in the above table, for columns 2 and 3, the estimand is a parameter of the estimator β̂1 itself, not of β1. That is, we are interested in the mean and variance of the estimator β̂1.
\[ \hat{\beta}_1 = \frac{\sum_{(x,y)} (x - \bar{X})(y - \bar{Y})}{\sum_{x} (x - \bar{X})^2} = \sum_{y} c\, y \quad \text{Slope RV, Estimator of RV } \beta_1 \tag{21} \]
\[ c = \frac{x - \bar{X}}{\sum_{x} (x - \bar{X})^2} \quad \text{constant} \tag{22} \]
\[ b_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2} = \sum_{i=1}^{N} c_i y_i \quad \text{Slope constant, Estimate of RV } \beta_1 \tag{23} \]
\[ c_i = \frac{x_i - \bar{x}}{\sum_{i=1}^{N} (x_i - \bar{x})^2} \quad \text{constant} \tag{24} \]
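A quick numeric check (on our own toy data) that the slope really is the linear combination of eq. (23); the agreement holds because the weights ci sum to zero:

```python
# Check b1 = sum(c_i * y_i) with weights c_i from eq. (24), on illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
S_xx = sum((x - x_bar) ** 2 for x in xs)

cs = [(x - x_bar) / S_xx for x in xs]                                      # eq. (24)
b1_direct = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / S_xx  # eq. (12)
b1_combo = sum(c * y for c, y in zip(cs, ys))                              # eq. (23)
# They agree because sum(c_i) = 0, so sum(c_i * (y_i - y_bar)) = sum(c_i * y_i).
```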
Because each Yi is normal (as the underlying ε is normal) and β̂1 = Σ ci Yi is a linear combination of the Yi, β̂1 should also be normal.
Mean of β̂1:
\[ E(\hat{\beta}_1) = \beta_1 \quad \text{RV} \tag{25} \]
Variance of β̂1:
\[ \sigma^2_{\hat{\beta}_1} = V(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{x} (x - \bar{x})^2} \quad \text{RV} \tag{26} \]
\[ S^2_{\hat{\beta}_1} = \hat{\sigma}^2_{\hat{\beta}_1} = \frac{S^2}{\sum_{x} (x - \bar{x})^2} \quad \text{RV, Variance Estimator of } \sigma^2_{\hat{\beta}_1} \tag{27} \]
\[ s^2_{\hat{\beta}_1} = \frac{s^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} = \frac{s^2}{S_{xx}} \quad \text{constant, Variance Estimate of } \sigma^2_{\hat{\beta}_1} \tag{28} \]
S²_{β̂1} is the estimator of σ²_{β̂1} for any resulting sampling distribution of β̂1 (or multiple SRFs); s²_{β̂1} is the estimate of σ²_{β̂1} for the given sampling distribution of β̂1 (or multiple SRFs).
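Equation (28) on the same illustrative toy data, giving the estimated variance and standard error of the slope (a sketch; names are ours):

```python
import math

# Estimated slope variance s^2_{b1} = s^2 / S_xx (eq. 28), on illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
S_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / S_xx
b0 = y_bar - b1 * x_bar

s2 = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)  # eq. (16)
s2_b1 = s2 / S_xx            # eq. (28)
se_b1 = math.sqrt(s2_b1)     # standard error of the slope
```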
From here, confidence intervals and hypothesis-testing procedures for β1 can be built (the immediate next step would be showing that the standardized β̂1 has a t distribution with N − 2 degrees of freedom).
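A sketch of such a confidence interval on the toy data: b1 ± t · se(b1) with df = n − 2. The critical value t_{0.025, 3} ≈ 3.182 is hard-coded from a t table, since the Python standard library has no Student-t quantile function (with SciPy one would take it from `scipy.stats.t.ppf`):

```python
import math

# Two-sided 95% CI for beta_1 on illustrative data: b1 +/- t_{0.025, n-2} * se(b1).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
S_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / S_xx
b0 = y_bar - b1 * x_bar

s2 = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
se_b1 = math.sqrt(s2 / S_xx)

t_crit = 3.182  # t_{0.025, df=3}, from a t table
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
```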
\[ r = \frac{S_{xy}}{\sqrt{S_{xx}}\,\sqrt{S_{yy}}} \]
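A final numeric check (toy data again) that this r satisfies r² = r_d² from eq. (18), as algebra requires for the least-squares line:

```python
import math

# r = S_xy / (sqrt(S_xx) * sqrt(S_yy)), compared with r_d^2 = 1 - SSE/SST.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
S_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
S_xx = sum((x - x_bar) ** 2 for x in xs)
S_yy = sum((y - y_bar) ** 2 for y in ys)

r = S_xy / (math.sqrt(S_xx) * math.sqrt(S_yy))

# r^2 equals 1 - SSE/SST because SSE = S_yy - S_xy**2 / S_xx for the OLS fit.
b1 = S_xy / S_xx
b0 = y_bar - b1 * x_bar
SSE = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
r_d2 = 1 - SSE / S_yy
```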