Logistic Regression
• Consider the Multiple Linear Regression Model:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i$$
Logistic Regression
• The response variable, y, is really just a
Bernoulli trial, with
E(y) = π
where
π = probability of a success on any given
trial
• π can only take on values between 0 and 1
Logistic Regression
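• Thus, the Multiple Regression Model
$$\pi = \mu_{y|x} = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$$
is not appropriate for a dichotomous response: the right-hand side is not restricted to values between 0 and 1.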
Logistic Regression
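• When the response variable is dichotomous, a
more appropriate linear model is the logistic regression model:
$$\operatorname{logit}(\pi) = \log\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$$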
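As a minimal sketch of the logit transform, the following Python functions map a probability to its log-odds and back. The function names `logit` and `inverse_logit` are illustrative choices, not taken from the slides.

```python
import math

def logit(pi):
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")
    return math.log(pi / (1.0 - pi))

def inverse_logit(eta):
    """Map any real-valued linear predictor back to a probability."""
    # Branch on the sign so that exp() is only called on non-positive values.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# The logit is negative below 0.5, zero at 0.5, and positive above 0.5.
for pi in (0.1, 0.5, 0.9):
    print(pi, round(logit(pi), 4))
```

Because the logit is unbounded in both directions, a linear predictor on this scale can take any real value while the implied probability stays between 0 and 1.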
Logistic Regression
• The ratio
$$\frac{\pi}{1-\pi}$$
is the odds of a success.
Logistic Regression
• Consider the model with one predictor
(k=1):
Logit:
$$\log\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta x$$
Odds:
$$\frac{\pi}{1-\pi} = e^{\alpha + \beta x} = e^{\alpha}\left(e^{\beta}\right)^{x}$$
Logistic Regression
Odds:
$$\frac{\pi}{1-\pi} = e^{\alpha}\left(e^{\beta}\right)^{x}$$
Logistic Regression
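$$x=1:\quad \frac{\pi}{1-\pi} = e^{\alpha}\left(e^{\beta}\right)$$
$$x=2:\quad \frac{\pi}{1-\pi} = e^{\alpha}\left(e^{\beta}\right)^{2} = e^{\alpha}\left(e^{\beta}\right)\left(e^{\beta}\right)$$
$$x=3:\quad \frac{\pi}{1-\pi} = e^{\alpha}\left(e^{\beta}\right)^{3} = e^{\alpha}\left(e^{\beta}\right)\left(e^{\beta}\right)\left(e^{\beta}\right)$$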
Logistic Regression
• Thus, $e^{\beta}$ is the odds ratio, comparing the
odds at x+1 with the odds at x
• An odds ratio equal to 1 (i.e., $e^{\beta} = 1$)
occurs when β = 0, which describes the
situation where the predictor x has no
association with the response y.
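The odds-ratio interpretation can be checked numerically. This is a small sketch with hypothetical values of α and β chosen only for illustration; they are not estimates from the slides.

```python
import math

def odds(x, alpha, beta):
    """Odds of success at x under logit(pi) = alpha + beta * x."""
    return math.exp(alpha) * math.exp(beta) ** x

# Hypothetical parameter values (assumptions for illustration only).
alpha, beta = -3.0, 0.5

for x in (1, 2, 3):
    ratio = odds(x + 1, alpha, beta) / odds(x, alpha, beta)
    # The ratio of successive odds does not depend on x.
    print(x, round(odds(x, alpha, beta), 4), round(ratio, 4))
```

Whatever starting value of x is used, the ratio of the odds at x+1 to the odds at x comes out as exp(beta), and it is 1 when beta is 0.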
Logistic Regression
• As with regular linear regression, we
obtain a sample of n observations, with
each observation measured on all k
predictor variables and on the response
variable.
• We use these sample data to fit our model
and estimate the parameters
Logistic Regression
• Using the sample data, we obtain the
model:
$$\log\left(\frac{p}{1-p}\right) = a + bx$$
for a single predictor
Logistic Regression
• The estimate of π is
$$p = \frac{e^{a+bx}}{1+e^{a+bx}}$$
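To make the formula concrete, the sketch below evaluates it for the eight age groups of the Coronary Heart Disease example. The values of a and b are an assumption: they are backed out from the first two entries of the Log(Odds) column of the CHD example table in this document, since the slides do not list them directly.

```python
import math

# Assumption: a + b*1 = -2.31929 and a + b*2 = -1.76117, read from the
# Log(Odds) column of the CHD example table for age groups 1 and 2.
b = -1.76117 - (-2.31929)
a = -2.31929 - b

def predicted_probability(x, a, b):
    """Estimate of pi at x: exp(a + b*x) / (1 + exp(a + b*x))."""
    e = math.exp(a + b * x)
    return e / (1.0 + e)

fitted = [round(predicted_probability(x, a, b), 5) for x in range(1, 9)]
print(fitted)
```

The resulting values can be compared with the π-hat column of the CHD example table.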
Example: Coronary Heart Disease
[Figure: Probability of Coronary Heart Disease. Observed and predicted proportion (vertical axis, 0.00 to 1.00) plotted against Age Group (horizontal axis, 1 to 8).]
Parameter Estimation
• A 100(1-α)% Confidence Interval for β is:
$$\hat{\beta} \pm z_{\alpha/2} \times \text{a.s.e.}(\hat{\beta})$$
Parameter Estimation
• A 100(1-α)% Confidence Interval for the odds
ratio $e^{\beta}$ is:
$$e^{\hat{\beta} \pm z_{\alpha/2} \times \text{a.s.e.}(\hat{\beta})}$$
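A short sketch of both intervals, computed from an estimate and its asymptotic standard error. The numbers passed in are hypothetical, because the slides do not report the standard error; the function name `wald_intervals` is also an illustrative choice.

```python
import math
from statistics import NormalDist

def wald_intervals(beta_hat, ase, level=0.95):
    """Return the interval for beta and the interval for the odds ratio."""
    # z_{alpha/2} is the upper alpha/2 quantile of the standard normal.
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    lower = beta_hat - z * ase
    upper = beta_hat + z * ase
    # Exponentiating the endpoints gives the interval for exp(beta).
    return (lower, upper), (math.exp(lower), math.exp(upper))

# Hypothetical estimate and standard error (assumptions, not slide values).
beta_ci, odds_ratio_ci = wald_intervals(0.5, 0.1)
print([round(v, 4) for v in beta_ci])
print([round(v, 4) for v in odds_ratio_ci])
```

Note that the interval for the odds ratio is not symmetric around exp(beta_hat), since it is obtained by transforming the endpoints rather than by adding and subtracting a margin.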
Hypothesis Testing
H0: β = 0
H1: β ≠ 0
Hypothesis Testing
Wald Test:
$$Q_W = \left(\frac{\hat{\beta}}{\text{a.s.e.}(\hat{\beta})}\right)^{2} \sim \chi^2_1$$
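The Wald statistic and its p-value can be sketched with the standard library alone. The p-value uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so its upper tail can be written with the complementary error function. The inputs are hypothetical values, not results from the slides.

```python
import math

def wald_test(beta_hat, ase):
    """Wald chi-square statistic (1 df) and its p-value for H0: beta = 0."""
    q_w = (beta_hat / ase) ** 2
    # P(chi-square_1 > q) = P(|Z| > sqrt(q)) = erfc(sqrt(q / 2)).
    p_value = math.erfc(math.sqrt(q_w / 2.0))
    return q_w, p_value

# Hypothetical estimate and standard error (assumptions, not slide values).
q_w, p_value = wald_test(0.5, 0.1)
print(round(q_w, 4), p_value)
```

A small p-value leads to rejecting H0: β = 0 in favor of H1: β ≠ 0.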
Goodness of Fit
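• Let m = # of levels of x (m=8 for CHD example)
• Let $n_i$ = number of observations in the ith level of x
• Let k = number of parameters (k=2 for CHD
example)
• Let $p_i$ = observed proportion of successes and $\hat{\pi}_i$ = predicted probability in the ith level of x
$$X^2_{\text{Pearson}} = \sum_{i=1}^{m} n_i \frac{(p_i - \hat{\pi}_i)^2}{\hat{\pi}_i} \sim \chi^2_{m-k}$$
$$X^2_{\text{Deviance}} = 2 \sum_{i=1}^{m} n_i\, p_i \log\left(\frac{p_i}{\hat{\pi}_i}\right) \sim \chi^2_{m-k}$$
Each sum shows the term for the success category; the corresponding failure term, with $1 - p_i$ and $1 - \hat{\pi}_i$ in place of $p_i$ and $\hat{\pi}_i$, is added to obtain the full statistic.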
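As a rough sketch, both statistics can be computed for the CHD example from the counts and π-hat values listed in the example table in this document. The code sums over both the success and the failure cell at each level of x, which is an assumption about how the formulas are meant to be applied; the helper name `term` is illustrative.

```python
import math

# CHD cases, group sizes, and pi-hat values copied from the example table.
y = [1, 2, 3, 5, 6, 5, 13, 8]
n = [10, 15, 12, 15, 13, 8, 17, 10]
pi_hat = [0.08954, 0.14664, 0.23093, 0.34413,
          0.47831, 0.61569, 0.73681, 0.83027]

def term(p, q):
    """p * log(p / q), taken to be 0 when p == 0."""
    return 0.0 if p == 0 else p * math.log(p / q)

def goodness_of_fit(y, n, pi_hat):
    """Pearson and deviance statistics over success and failure cells."""
    pearson = 0.0
    deviance = 0.0
    for yi, ni, fit in zip(y, n, pi_hat):
        p = yi / ni  # observed proportion in this level of x
        pearson += ni * ((p - fit) ** 2 / fit
                         + ((1 - p) - (1 - fit)) ** 2 / (1 - fit))
        deviance += 2 * ni * (term(p, fit) + term(1 - p, 1 - fit))
    return pearson, deviance

pearson, deviance = goodness_of_fit(y, n, pi_hat)
df = len(y) - 2  # m - k, with m = 8 levels and k = 2 parameters
print(round(pearson, 3), round(deviance, 3), df)
```

Each statistic is then compared with a chi-square distribution on m − k = 6 degrees of freedom; large values indicate lack of fit.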
Example: Coronary Heart Disease
Age  Age Group (x)  CHD Present  N    p (observed)  Odds e^(a+bx)  Log(Odds) a+bx  Predicted π-hat
25   1              1            10   0.10          0.09834        -2.31929        0.08954
30   2              2            15   0.13          0.17184        -1.76117        0.14664
35   3              3            12   0.25          0.30028        -1.20305        0.23093
40   4              5            15   0.33          0.52470        -0.64493        0.34413
45   5              6            13   0.46          0.91685        -0.08681        0.47831
50   6              5            8    0.63          1.60209         0.47131        0.61569
55   7              13           17   0.76          2.79947         1.02943        0.73681
60   8              8            10   0.80          4.89175         1.58755        0.83027
Total N: 100
The Odds and Log(Odds) columns are computed from the fitted model: from one age group to the next, the odds change multiplicatively and the log-odds change additively.
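The fitted values in this table can be reproduced, approximately, by maximum likelihood on the grouped counts. The following is a minimal Newton-Raphson sketch in plain Python; it is one common way to fit the model and is not necessarily the algorithm used to produce the slide values.

```python
import math

# Grouped CHD data from the table: age group x, CHD cases y, group size n.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 2, 3, 5, 6, 5, 13, 8]
n = [10, 15, 12, 15, 13, 8, 17, 10]

def fit_logistic(x, y, n, iterations=25):
    """Newton-Raphson for log(pi/(1-pi)) = a + b*x on grouped data."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        pi = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Score vector: gradient of the binomial log-likelihood.
        g0 = sum(yi - ni * p for yi, ni, p in zip(y, n, pi))
        g1 = sum(xi * (yi - ni * p) for xi, yi, ni, p in zip(x, y, n, pi))
        # Information matrix entries, with weights n * pi * (1 - pi).
        w = [ni * p * (1.0 - p) for ni, p in zip(n, pi)]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * step = score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

a, b = fit_logistic(x, y, n)
print(round(a, 4), round(b, 4), round(math.exp(b), 4))
```

The last printed value, exp(b), is the estimated odds ratio between adjacent age groups, which corresponds to the constant factor between successive entries of the Odds column.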
Obs Age Age Group CHD
1 25 1 1
2 25 1 0
3 25 1 0
4 25 1 0
5 25 1 0
6 25 1 0
7 25 1 0
8 25 1 0
9 25 1 0
10 25 1 0
11 30 2 1
12 30 2 1
13 30 2 0
14 30 2 0
15 30 2 0
16 30 2 0
17 30 2 0
18 30 2 0
19 30 2 0
20 30 2 0
21 30 2 0
22 30 2 0
23 30 2 0
24 30 2 0
25 30 2 0
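The listing shows the first 25 of the individual-level records behind the grouped CHD table. As a sketch of how the two layouts relate, the code below expands the grouped counts into one row per observation. Placing the CHD cases first within each age group is an assumption made to match the ordering in the listing.

```python
# Grouped counts from the CHD example table.
ages = [25, 30, 35, 40, 45, 50, 55, 60]
cases = [1, 2, 3, 5, 6, 5, 13, 8]
sizes = [10, 15, 12, 15, 13, 8, 17, 10]

def expand(ages, cases, sizes):
    """Return (obs, age, age_group, chd) rows, one per individual."""
    rows = []
    obs = 1
    for group, (age, yes, total) in enumerate(zip(ages, cases, sizes), start=1):
        for j in range(total):
            # Assumed ordering: CHD cases first, then non-cases.
            chd = 1 if j < yes else 0
            rows.append((obs, age, group, chd))
            obs += 1
    return rows

rows = expand(ages, cases, sizes)
for row in rows[:12]:
    print(*row)
```

Fitting the model to the individual-level rows or to the grouped counts gives the same parameter estimates, since the likelihoods differ only by a constant factor.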
SAS Code:
Proc Logistic
Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Proc Logistic Output: CHD Example