
Quantile Regression

The Problem
The Estimator
Computation
Properties of the Regression
Properties of the Estimator
Hypothesis Testing
Bibliography
Software
The Problem
The distribution of Y, the dependent variable,
conditional on the covariate X, may have thick tails.
The conditional distribution of Y may be asymmetric.
The conditional distribution of Y may not be
unimodal.

Neither regression nor ANOVA gives robust
results in these cases: outliers are influential, the mean is pulled
toward the skewed tail, and multiple modes are not
revealed.
ANOVA and regression provide information
only about the conditional mean.
More knowledge about the distribution of the
statistic may be important.
The covariates may shift not only the location
or scale of the distribution; they may affect the
shape as well.
[Figure: Five Treatments, 100 Patients. Raw data: Score by Treatment (1 to 5).]
[Figure: Five Treatments, 100 Patients. Means with error bars: Score by Treatment (1 to 5).]
[Figure: Quantiles. Score by Treatment, showing the 10th, 25th, 50th, 75th, and 90th percentiles and the mean.]
The Estimator
Ordinarily we specify a quadratic loss
function. That is, $L(u) = u^2$.
Under quadratic loss we use the
conditional mean, via regression or
ANOVA, as our predictor of Y for a given
X=x.
Definition: Given $p \in [0, 1]$, a pth quantile
of a random variable Z is any number $\xi_p$
such that $\Pr(Z < \xi_p) \le p \le \Pr(Z \le \xi_p)$. The
solution always exists, but need not be
unique.
Ex: Suppose Z = {3, 4, 7, 9, 9, 11, 17, 21}
and p = 0.5. Then
$\Pr(Z < 9) = 3/8 \le 1/2 \le \Pr(Z \le 9) = 5/8$, so 9 is a median.
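The definition can be checked mechanically on the example sample. The following is a minimal sketch, treating the eight values as equally likely; the helper name `is_pth_quantile` is made up for illustration.

```python
from fractions import Fraction

def is_pth_quantile(sample, q, p):
    """Return True if Pr(Z < q) <= p <= Pr(Z <= q) under the empirical distribution."""
    n = len(sample)
    below = Fraction(sum(z < q for z in sample), n)         # Pr(Z < q)
    at_or_below = Fraction(sum(z <= q for z in sample), n)  # Pr(Z <= q)
    return below <= p <= at_or_below

Z = [3, 4, 7, 9, 9, 11, 17, 21]
half = Fraction(1, 2)
quarter = Fraction(1, 4)

# Observed values that qualify as a median of the sample.
medians = [q for q in sorted(set(Z)) if is_pth_quantile(Z, q, half)]
print(medians)

# For p = 1/4 the quantile is not unique: 4, 5 and 7 all satisfy the definition.
print([q for q in (4, 5, 7) if is_pth_quantile(Z, q, quarter)])
```

Exact fractions are used so that the boundary comparisons are not affected by floating-point rounding.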
Quantiles can be used to characterize a
distribution:
Median
Interquartile Range
Interdecile Range
Symmetry = $(\xi_{.75} - \xi_{.5}) / (\xi_{.5} - \xi_{.25})$
Tail Weight = $(\xi_{.90} - \xi_{.10}) / (\xi_{.75} - \xi_{.25})$
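As a rough illustration, these summaries can be computed from sample quantiles. The sketch below uses the standard library's `statistics.quantiles` with its "inclusive" interpolation rule, which is only one of several sample-quantile conventions; the function name `quantile_summaries` and the test data are invented for the example.

```python
from statistics import quantiles

def quantile_summaries(data):
    """Quantile-based summaries of a sample, from cut points in 5% steps."""
    # cuts[k - 1] is the (5 * k)th percentile under the inclusive method.
    cuts = quantiles(data, n=20, method="inclusive")
    q10, q25, q50, q75, q90 = cuts[1], cuts[4], cuts[9], cuts[14], cuts[17]
    return {
        "median": q50,
        "interquartile_range": q75 - q25,
        "interdecile_range": q90 - q10,
        "symmetry": (q75 - q50) / (q50 - q25),
        "tail_weight": (q90 - q10) / (q75 - q25),
    }

# Evenly spaced values 0..100: a symmetric sample, so the symmetry ratio is 1.
summary = quantile_summaries(list(range(101)))
print(summary)
```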

Suppose Z is a continuous r.v. with cdf
F(.). Then $\Pr(Z < z) = \Pr(Z \le z) = F(z)$ for every
z in the support, and a pth quantile is any
number $\xi_p$ such that $F(\xi_p) = p$.
If F is continuous and strictly increasing
then the inverse exists and $\xi_p = F^{-1}(p)$.
Definition: The asymmetric absolute loss function is

$$L_p(u) = \left[\, p\, I(u > 0) + (1 - p)\, I(u < 0) \,\right] |u| = \left[\, p - I(u < 0) \,\right] u$$

where u is the prediction error we have made and I(.) is
an indicator function of the sort

$$I(u > 0) = \begin{cases} 1 & \text{if } u > 0 \\ 0 & \text{if } u \le 0 \end{cases}$$
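A small sketch of the loss function may help: positive errors are weighted by p and negative errors by 1 - p, and the two algebraic forms agree. The function names are chosen for this example only.

```python
def asymmetric_abs_loss(u, p):
    """L_p(u) = [p * I(u > 0) + (1 - p) * I(u < 0)] * |u|."""
    return (p * (u > 0) + (1 - p) * (u < 0)) * abs(u)

def asymmetric_abs_loss_compact(u, p):
    """Equivalent form L_p(u) = [p - I(u < 0)] * u."""
    return (p - (u < 0)) * u

# With p = 0.7, an error of +2 costs more than an error of -2.
print(asymmetric_abs_loss(2.0, 0.7), asymmetric_abs_loss(-2.0, 0.7))
```

With p = 0.5 the loss is half the absolute error, which is why the median is the matching predictor.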
[Figure: Absolute Loss vs. Quadratic Loss. Loss plotted against u from -2 to 2 for quadratic loss (Quad) and asymmetric absolute loss with p=.5 and p=.7.]
Proposition: Under the asymmetric
absolute loss function $L_p$, a best predictor of
Y given X=x is a pth conditional quantile, $\xi_p(x)$.
For example, if p=.5 then the best predictor
is the median.
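The proposition can be illustrated numerically in the simplest case with no covariates. The sketch below searches the observed values of a sample for the predictor with the smallest average asymmetric absolute loss; restricting the search to observed values is enough here because the average loss is piecewise linear and convex in the predictor. The sample and the function names are illustrative assumptions.

```python
def average_loss(sample, c, p):
    """Average asymmetric absolute loss from predicting every value by c."""
    total = 0.0
    for z in sample:
        u = z - c  # prediction error
        total += (p if u > 0 else 1 - p) * abs(u)
    return total / len(sample)

def best_predictor(sample, p):
    """Observed value that minimizes the average loss."""
    return min(sorted(set(sample)), key=lambda c: average_loss(sample, c, p))

Z = [3, 4, 7, 9, 9, 11, 17, 21]
median_predictor = best_predictor(Z, 0.5)
print(median_predictor)
```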
Definition: A parametric quantile regression
model is correctly specified if, for example,

$$\xi_p(x) = q(x, \theta) = \alpha + \beta x$$

That is, $\alpha + \beta x$ is a particular linear combination
of the independent variable(s) such that

$$p = \Pr(Y \le \xi_p(x) \mid X = x) = F(\xi_p(x) \mid x)$$
$$\phantom{p} = \Pr(Y \le \alpha + \beta x \mid X = x) = F(\xi_p(x) - \alpha - \beta x)$$

where F( ) is some univariate distribution.
[Figure: the conditional quantile line $\xi_{.25}(x) = \alpha + \beta x$.]
Definition: A quantile regression model is
identifiable if

$$\min_{\alpha, \beta} E_F\, L_p(Y - \alpha - \beta x)$$

has a unique solution.
A family of conditional quantiles of Y given
X=x.
Let $Y = \alpha + \beta x + u$ with $\alpha = \beta = 1$.
[Figure: conditional quantile lines of Y against x at the 10%, 25%, 50%, 75%, and 90% levels.]
[Figure: Daily High Temperature. Today's temperature plotted against yesterday's temperature.]
[Figure: Cool Yesterday (n=259). Frequency of Temperature Today.]
[Figure: Hot Yesterday (n=259). Frequency of Temperature Today.]
Quantiles at .9, .75, .5, .25, and .10. Today's temperature fitted to a quartic in yesterday's temperature.
[Figure: Temperature Quantiles. Today vs. Yesterday.]
Computation of the Estimate
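Estimation
The quantile regression coefficients are the
solution to

$$\min_{\beta} \frac{1}{n} \sum_{i=1}^{n} \left[ p - \tfrac{1}{2} + \tfrac{1}{2}\,\mathrm{sgn}(y_i - x_i'\beta) \right] (y_i - x_i'\beta) \qquad (1)$$

The k first order conditions are

$$\frac{1}{n} \sum_{i=1}^{n} \left[ p - \tfrac{1}{2} + \tfrac{1}{2}\,\mathrm{sgn}(y_i - x_i'\hat\beta_p) \right] x_i = 0 \qquad (2)$$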
The fitted line will go through k data points.
The # of negative residuals ≤ np ≤ # of negative
residuals + # of zero residuals.
The computational algorithm is to set up the
objective function as a linear programming
problem.
The solution to (1)-(2) above need not
be unique.
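The first two properties above can be seen in a small example with an intercept and one covariate (k = 2). The sketch below is not the linear programming algorithm itself: it is a brute-force stand-in that tries every line through two data points and keeps the one with the smallest total loss, which works because an optimum of the linear program can be found among such lines. The data and all function names are made up for illustration.

```python
from itertools import combinations

def check_loss(u, p):
    """Asymmetric absolute loss L_p(u)."""
    return (p if u > 0 else 1 - p) * abs(u)

def fit_line_by_enumeration(x, y, p):
    """Try every line through two data points; keep the smallest total loss."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid fit
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        total = sum(check_loss(yk - intercept - slope * xk, p)
                    for xk, yk in zip(x, y))
        if best is None or total < best[0]:
            best = (total, intercept, slope)
    return best[1], best[2]

def residual_counts(x, y, intercept, slope, tol=1e-9):
    """Number of negative and of zero residuals (zero up to a tolerance)."""
    residuals = [yk - intercept - slope * xk for xk, yk in zip(x, y)]
    n_neg = sum(r < -tol for r in residuals)
    n_zero = sum(abs(r) <= tol for r in residuals)
    return n_neg, n_zero

# Made-up data, n = 10.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 2.9, 4.2, 4.8, 6.5, 6.9, 8.3, 9.1, 9.8, 11.4]
p = 0.25

intercept, slope = fit_line_by_enumeration(x, y, p)
n_neg, n_zero = residual_counts(x, y, intercept, slope)
print(intercept, slope, n_neg, n_zero)
```

Enumeration costs on the order of n^k candidate fits, so it is only practical for tiny examples; it is used here because it needs nothing beyond the standard library.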

Properties of the Regression
Transformation equivariance
For any monotone increasing function h(.),

$$Q_{h(T)}(\tau \mid x) = h(Q_T(\tau \mid x))$$

since P(T<t|x) = P(h(T)<h(t)|x). This is
especially important where the response
variable has been censored, i.e., top coded.
Properties
The mean does not have transformation
equivariance since, in general, $E[h(Y)] \ne h(E[Y])$.
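A quick numerical check of the contrast between the median and the mean, using a made-up sample with an odd number of points (so the sample median is one of the observations) and h = exp as an example of an increasing transformation:

```python
import math
from statistics import mean, median

y = [0.0, 1.0, 2.0, 3.0, 10.0]   # made-up sample
h = math.exp                      # a strictly increasing transformation

median_of_transformed = median([h(v) for v in y])
transformed_median = h(median(y))

mean_of_transformed = mean([h(v) for v in y])
transformed_mean = h(mean(y))

print(median_of_transformed, transformed_median)  # identical
print(mean_of_transformed, transformed_mean)      # very different
```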
Equivariance
Practical implications

(i) $\hat\beta(\theta;\ \lambda y,\ x) = \lambda\, \hat\beta(\theta;\ y,\ x)$, $\lambda > 0$
(ii) $\hat\beta(1-\theta;\ \lambda y,\ x) = \lambda\, \hat\beta(\theta;\ y,\ x)$, $\lambda \le 0$
(iii) $\hat\beta(\theta;\ y + x\gamma,\ x) = \hat\beta(\theta;\ y,\ x) + \gamma$, $\gamma \in \Re^k$
(iv) $\hat\beta(\theta;\ y,\ xA) = A^{-1} \hat\beta(\theta;\ y,\ x)$, A nonsingular

(i) and (ii) imply scale equivariance
(iii) is a shift or regression equivariance
(iv) is equivariance to reparameterization
of design
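Properties (i) to (iii) can be checked by hand in the intercept-only special case (k = 1, x identically 1), where the estimate is just a sample quantile. This is a sketch under that simplifying assumption; `sample_quantile`, the sample, and the values of `lam` and `gamma` are all hypothetical choices. The quantile level is picked so that n times theta is not an integer, which makes the sample quantile unique.

```python
import math

def sample_quantile(values, theta):
    """The ceil(n * theta)-th order statistic.

    This is the unique theta-quantile when n * theta is not an integer.
    """
    ordered = sorted(values)
    return ordered[math.ceil(len(ordered) * theta) - 1]

y = [3, 4, 7, 9, 9, 11, 17, 21]
theta = 0.3             # n * theta = 2.4
lam, gamma = 2.0, 5.0   # hypothetical scale factor and shift

base = sample_quantile(y, theta)

# (i) positive rescaling of y rescales the estimate
scaled = sample_quantile([lam * v for v in y], theta)
# (ii) negative rescaling maps the theta fit to the (1 - theta) fit
flipped = sample_quantile([-lam * v for v in y], 1 - theta)
# (iii) shifting y by gamma shifts the estimate by gamma
shifted = sample_quantile([v + gamma for v in y], theta)

print(base, scaled, flipped, shifted)
```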
Properties
Robust to outliers. As long as the sign of
the residual does not change, any $Y_i$ may
be changed without shifting the conditional
quantile line.
The regression quantiles are correlated.
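The robustness property can be illustrated in the intercept-only case, where the fitted median "line" is the sample median. In this made-up example the largest observation is moved far out; its residual stays positive, so the median does not move, while the mean does.

```python
from statistics import mean, median

y = [3, 4, 7, 9, 9, 11, 17, 21]
y_outlier = y[:-1] + [2100]   # largest point moved far out, same residual sign

median_before, median_after = median(y), median(y_outlier)
mean_before, mean_after = mean(y), mean(y_outlier)

print(median_before, median_after)  # unchanged
print(mean_before, mean_after)      # pulled toward the outlier
```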
Properties of the Estimator
Asymptotic Distribution

$$\sqrt{n}\,(\hat\beta_\theta - \beta_\theta) \xrightarrow{L} N(0, \Lambda_\theta)$$

where

$$\Lambda_\theta = \theta(1-\theta)\left(E[f_{u_\theta}(0 \mid x_i)\, x_i x_i']\right)^{-1} E[x_i x_i'] \left(E[f_{u_\theta}(0 \mid x_i)\, x_i x_i']\right)^{-1}$$

The covariance depends on the unknown
f(.) and the value of the vector x at which
the covariance is being evaluated.
When the error is independent of x then the
coefficient covariance reduces to

$$\Lambda_\theta = \frac{\theta(1-\theta)}{f_{u_\theta}(0)^2} \left(E[x_i x_i']\right)^{-1}$$

where

$$E(x x') = \frac{1}{n} \sum_{i=1}^{n} x_i x_i'$$
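To get a feel for the scalar factor in the independent-error formula, the sketch below evaluates it for standard normal errors, which is purely an assumption for the example. It also assumes that $u_\theta$ is the error recentered to have its $\theta$th quantile at zero, so that $f_{u_\theta}(0)$ is the error density evaluated at the error's $\theta$th quantile. The function name is invented.

```python
import math
from statistics import NormalDist

def scalar_variance_factor(theta, dist=NormalDist()):
    """theta * (1 - theta) / f(q)^2, with q the theta-quantile of the errors."""
    q = dist.inv_cdf(theta)
    return theta * (1 - theta) / dist.pdf(q) ** 2

median_factor = scalar_variance_factor(0.5)       # equals pi / 2 for N(0, 1)
upper_tail_factor = scalar_variance_factor(0.9)   # larger: the density is thinner there

print(median_factor, math.pi / 2, upper_tail_factor)
```

Under these assumed N(0, 1) errors the factor at the median is about 1.57, compared with the error variance of 1 that plays the same role in the OLS covariance.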

The quantile regression estimator can be more efficient
than OLS, for example when the error distribution has thick tails;
with normal errors OLS is the more efficient of the two.
The efficient estimator requires knowledge
of the true error distribution.
Coefficient Interpretation

$$\frac{\partial Q_\theta(y_i \mid x_i)}{\partial x_{ij}}$$

The marginal change in the $\theta$th conditional
quantile due to a marginal change in the jth
element of x. There is no guarantee that
the ith person will remain in the same
quantile after her x is changed.
Hypothesis Testing
Given asymptotic normality, one can construct
asymptotic t-statistics for the coefficients
The error term may be heteroscedastic. The test
statistic is, in construction, similar to the Wald
Test.
A test for symmetry, also resembling a Wald
Test, can be built relying on the invariance
properties referred to above.

Heteroscedasticity
Model: $y_i = \beta_0 + \beta_1 x_i + u_i$, with iid errors.
The quantiles are a vertical shift of one
another.
Model: $y_i = \beta_0 + \beta_1 x_i + \sigma(x_i) u_i$, errors are
now heteroscedastic.
The quantiles now exhibit a location shift as
well as a scale shift.
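The difference between the two models shows up in the slopes of the conditional quantile lines. The sketch below assumes standard normal errors, made-up coefficients, and a made-up linear scale function `sigma`; none of these come from the slides. With iid errors every quantile line has the same slope, while in the heteroscedastic model the slope changes with the quantile level.

```python
from statistics import NormalDist

BETA0, BETA1 = 1.0, 1.0   # hypothetical coefficients

def sigma(x):
    """Hypothetical scale function for the heteroscedastic model."""
    return 1.0 + 0.5 * x

def quantile_iid(p, x):
    """p-th conditional quantile of y = b0 + b1*x + u, with u ~ N(0, 1)."""
    return BETA0 + BETA1 * x + NormalDist().inv_cdf(p)

def quantile_hetero(p, x):
    """p-th conditional quantile of y = b0 + b1*x + sigma(x)*u, with u ~ N(0, 1)."""
    return BETA0 + BETA1 * x + sigma(x) * NormalDist().inv_cdf(p)

def slope(quantile_fn, p, x0=0.0, x1=1.0):
    """Slope of a conditional quantile line between x0 and x1."""
    return (quantile_fn(p, x1) - quantile_fn(p, x0)) / (x1 - x0)

levels = [0.1, 0.25, 0.5, 0.75, 0.9]
iid_slopes = [slope(quantile_iid, p) for p in levels]
hetero_slopes = [slope(quantile_hetero, p) for p in levels]

print(iid_slopes)      # all equal to BETA1: parallel lines
print(hetero_slopes)   # increasing in p: the lines fan out
```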
Khmaladze-Koenker Test Statistic
Bibliography
Koenker and Hallock (2001), Quantile
Regression, Journal of Economic
Perspectives, Vol. 15, pp. 143-156.
Buchinsky (1998), Recent Advances in
Quantile Regression Models, Journal of
Human Resources, Vol. 33, pp. 88-126.
Software
S+ Programs - Lib.stat.cmu.edu/s
www.econ.uiuc.edu/~roger
http://Lib.stat.cmu.edu/R/CRAN
TSP
Limdep