
International Journal of Forecasting xxx (xxxx) xxx

Contents lists available at ScienceDirect

International Journal of Forecasting


journal homepage: www.elsevier.com/locate/ijforecast

Quantile forecasting with mixed-frequency data



Luiz Renato Lima a,b,∗, Fanning Meng a, Lucas Godeiro c
a Department of Economics, University of Tennessee, Knoxville, United States
b Department of Economics, Federal University of Paraiba, Brazil
c Department of Applied Social Sciences, Federal Rural University of the Semi-arid Region - UFERSA, Brazil

Article info

Keywords: High-frequency predictors; Quantile regression; LASSO; Elastic net

Abstract

We analyze the quantile combination approach (QCA) of Lima and Meng (2017) in situations with mixed-frequency data. The estimation of quantile regressions with mixed-frequency data leads to a parameter proliferation problem, which can be addressed through extensions of the MIDAS and soft (hard) thresholding methods towards quantile regression. We use the proposed approach to forecast the growth rate of the industrial production index, and our results show that including high-frequency information in the QCA achieves substantial gains in terms of forecasting accuracy.

Published by Elsevier B.V. on behalf of International Institute of Forecasters.

∗ Corresponding author. E-mail address: llima@utk.edu (L.R. Lima).

https://doi.org/10.1016/j.ijforecast.2018.09.011
0169-2070/Published by Elsevier B.V. on behalf of International Institute of Forecasters.

Please cite this article as: L.R. Lima, F. Meng and L. Godeiro, Quantile forecasting with mixed-frequency data. International Journal of Forecasting (2019), https://doi.org/10.1016/j.ijforecast.2018.09.011.

1. Introduction

Elliott and Timmermann (2016) point out that a central issue in out-of-sample point forecasting is the minimization of the effect of estimation error on the predictive accuracy. In order to understand this issue, consider a linear regression model: y = Xβ + e, where E(e) = 0 and Var(e) = σ². Tibshirani (1996) showed that the MSE of the forecast of the model is related directly to the MSE of the estimator; that is, MSE(y − ŷ) = E[(y − ŷ)²] = MSE(β̂) + σ², where ŷ is the forecast of y and β̂ is an estimator of β. Thus, it is clear that if we can reduce the MSE of β̂, we will obtain a considerable reduction in the MSE of the forecasts and a marked improvement in the model's predictive accuracy.

The discussion above is important for understanding the main contribution of this paper. First, suppose that we want to forecast the growth rate of the industrial production (IP) index and that, according to the true model, which is unknown to us, high-frequency predictors contain important predictive information. However, we use instead a random walk model (which corresponds to an autoregressive model with lag coefficient equal to one) as our benchmark forecasting model. Then, by setting the regression coefficients on the high-frequency predictors to zero, the random walk model produces a biased estimator β̂, but one with low variance. In contrast, OLS estimation of a larger model that includes many high-frequency predictors will produce low bias with high variance. Thus, dimension reduction techniques, such as MIDAS or shrinkage methods, can be interpreted as a combination of these two cases, which can lead to a biased estimator with a low variance. Hence, dimension reduction techniques may still generate forecasts which are better (have lower MSEs) than the benchmark forecast.

However, another potential source of estimation error lies in the lack of robustness of the OLS estimator to outliers or extreme observations. It is well known that the influence function of the OLS estimator is unbounded, implying that the estimation of the conditional mean by OLS may lead to large estimation errors when the data are not Gaussian (Xiao & Lima, 2014). In order to minimize the effect of occasional but extreme observations on the predictive accuracy, the forecasting literature has utilized what is known as a quantile combination approach (QCA). The idea is that the conditional mean of a random variable (which is the optimal MSE forecast) can be approximated through the combination of quantile forecasts. This approach has been used in the forecasting literature by Judge, Hill, Griffiths, Lütkepohl, and Lee (1988), Lima and Meng (2017), Ma and Pohlman (2008), Meligkotsidou, Panopoulou, Vrontos, and Vrontos (2014) and Taylor (2007). In this sense, MIDAS and QCA share the same strategy — that is, they aim at obtaining low-MSE forecasts by minimizing MSE(β̂) — but they target different sources of estimation errors.

Thus, this paper studies the performance of the quantile combination approach in situations with many high-frequency predictors. The idea is that if a model includes high-frequency predictors and there exist occasional but extreme observations, then one can minimize MSE(β̂) by employing dimension reduction techniques within a quantile combination approach. In other words, we aim to improve the predictive accuracy by reducing the effects of both forms of estimation error.

Our empirical results show that including high-frequency information in the QCA produces a substantial gain in terms of the forecasting accuracy of IP growth rates. Furthermore, we show that the overwhelming performance of the QCA is explained by its ability to lower the forecast bias further while keeping the forecast variance under control. We also computed how much forecasting accuracy is lost when one uses a MIDAS lag structure for the high-frequency predictor rather than selecting it by either LASSO or elastic net. We found that substantial accuracy gains are obtained when such a lag structure is selected by either the LASSO or the elastic net.

Although the use of the QCA leads to robust estimates of the conditional mean (MSE) forecast, we found that using the QCA with a fixed lag structure (that is, assuming that the lag structure in each conditional quantile is identical to that found for the conditional mean) results in significant improvements in terms of computational efficiency relative to the more general QCA where a specific lag structure is selected for each quantile. This last result leads us to recommend using the QCA with a fixed lag structure (selected by either the LASSO or the elastic net) because it is not only robust to outliers, but also computationally efficient.

The remainder of this paper is organized as follows. The quantile combination approach is introduced in Section 2. Section 3 discusses some existing solutions to the parameter proliferation problem and extends them towards quantile regression. Section 4 presents an empirical analysis of IP growth forecasting, and Section 5 concludes.

2. The quantile combination approach

Suppose that an agent is interested in forecasting a stationary time series, y_{t+h}, using information available at time t, I_t. We consider the following data-generating process (DGP) studied by Lima and Meng (2017):

y_{t+h} = X_t′α + (X_t′γ)η_{t+h},   η_{t+h}|X_t ∼ i.i.d. F_η(0, 1),   (1)

where η_{t+h} is assumed to follow an unknown distribution F_η with mean 0 and unit variance. X_t ∈ I_t is a vector of predictors that are observable at time t. The conditional mean of y_{t+h} is E(y_{t+h}|X_t) = X_t′α, and the conditional quantile of y_{t+h} is Q_τ(y_{t+h}|X_t) = X_t′β(τ), where β(τ) = α + γF_η^{−1}(τ), and F_η^{−1}(τ) represents the unconditional quantile of η_{t+h}.

In addition, we assume that the loss function L(·) is defined as per Patton and Timmermann (2007),¹ that is:

Assumption 1 (Loss Function). The loss function L(·) is a homogeneous function solely of the forecast error e_{t+h}; that is, L = L(e_{t+h}), and L(ae) = g(a)L(e) for some positive function g(·).

Based on the DGP in Eq. (1) and the loss function L(·), Lima and Meng (2017) showed that the optimal forecast is given by:

ŷ_{t+h,t} = Q_τ(y_{t+h}|X_t) = X_t′α + (X_t′γ)δ∗ = E(y_{t+h}|X_t) + κ_τ,   (2)

where κ_τ = (X_t′γ)δ∗. Thus, the bias term κ_τ explicitly depends on the loss function through the parameter δ∗ = arg min_{δ̂} ∫ L(η_{t+1} − δ̂) dF_η(η). Indeed, if L is the quadratic function, then δ∗ = 0, and the optimal forecast will be ŷ_{t+h,t} = E(y_{t+h}|X_t). If L corresponds to the absolute value function, then δ∗ = median(η), and therefore the optimal forecast will be ŷ_{t+h} = median(y_{t+h}|X_t). More generally, if L is the asymmetric lin-lin function, then δ∗ = F_η^{−1}(τ), and the optimal forecast will be ŷ_{t+h} = Q_τ(y_{t+h}|X_t) for τ ≠ 0.5 (see the proof in Appendix 1). Thus, in our setting, where the DGP corresponds to a location-scale model, the bias term κ_τ will capture deviations from the conditional mean forecast.²

In practice, the conditional mean is estimated by the usual ordinary least squares (OLS) estimator. However, as Zhao and Xiao (2014) showed, if the data are not distributed normally, OLS estimation is usually less efficient than methods that exploit distributional information.³ Nonetheless, given a set of quantile levels τ = (τ₁, τ₂, ..., τ_n), one can still approximate the conditional mean (MSE forecast) by combining n quantile forecasts,

∑_{τ=τ₁}^{τ_n} ω_τ Q_τ(y_{t+h}|X_t) ≈ E(y_{t+h}|X_t),   (3)

where ω_τ ∈ (0, 1) is the weight assigned to the quantile forecast at level τ. Notice that the weights are quantile-specific, since they are aimed at approximating the mean of η_{t+h}, which is zero. This is the quantile combination approach studied by Lima and Meng (2017). The next section discusses the parameter proliferation problem that arises when we try to include high-frequency information in the quantile function (Eq. (2)).

1 The assumption on the loss function is the same as Assumption L2 of Patton and Timmermann (2007). It nests many loss functions, including the most common MSE, MAE, lin-lin, and asymmetric quadratic loss functions.
2 A proof is provided in Appendix A.
3 Appendix B provides a short explanation of the robustness of quantile regression vis-à-vis OLS.

3. The parameter proliferation problem

If we restrict the predictors X_t in the optimal forecast (Eq. (2)) to those sampled at the same frequency as y_{t+h}, we miss out on other critical information resources, especially those that are updated more frequently. Thus, we capture more informative resources by assuming that X_t = (W_t, X_t^M)′, where W_t is a k × 1 vector of pre-determined predictors, such as lags of y_{t+h} and predictors sampled at the same frequency as y_{t+h}, and X_t^M = (X_t^1, X_t^2, ..., X_t^p)′ is a p × 1 vector of high-frequency predictors, with X_t^j, j = 1, 2, ..., p, being updated m_j times between t − 1 and t, where m_j > 1 for all j. For example, if y_t is a quarterly variable and X_t^j is a monthly variable, we would have m_j = 3. Moreover, if y_{t+h} is affected by four (q = 4) of its own lags, then there would be p_j = q × m_j = 12 lags of the monthly predictor X_t^j; that is, X_t^j = (x_t^j, x_{t−1/3}^j, x_{t−2/3}^j, x_{t−1}^j, ..., x_{t−11/3}^j)′. Given p high-frequency predictors with p_j lag predictors each, there would be up to ∑_{j=1}^p p_j = P parameters. Without additional restrictions, the total number of coefficients to be estimated is equal to K = 1 + k + ∑_{j=1}^p p_j = 1 + k + P.⁴ In the case where K is close to or even larger than the total number of observations T, we are faced with a parameter proliferation problem.

3.1. Solutions to the parameter proliferation problem

One successful solution that has been used by the forecasting literature is the mixed data sampling (MIDAS) regression model, where one assumes a restriction on the form in which the distributed lags are included in the regression equation (Clements & Galvão, 2008, 2009; Ghysels, Santa-Clara, & Valkanov, 2004, 2005; Kuzin, Marcellino, & Schumacher, 2011). One caveat with the MIDAS approach is that it is limited by the number of high-frequency predictors that can be included. In other words, MIDAS is able to reduce the number of parameters only if a few high-frequency predictors are included in the regression model.

A special case of the MIDAS approach includes lags of X_t^j in the regression model through the following polynomial:

B(L^{1/m_j}; θ_j) = ∑_{i=0}^{p_j−1} b(i; θ_j) L^{i/m_j},

where L^{1/m_j} is the lag operator, L^{i/m_j} X_t^j = x_{t−i/m_j}^j, and b(i; θ_j) is the weight assigned to the ith lagged term, i = 1, 2, ..., p_j, which is assumed to take an Almon form, that is:

b(i; θ_j) = ∑_{k=0}^{q_j−1} θ_k i^k,   i = 1, ..., p_j,

where θ_j = (θ_0, θ_1, ..., θ_{q_j−1}), q_j ≪ p_j, and q_j is the order of the polynomial. Thus, the p_j × 1 vector of high-frequency data X_t^j can be converted to a vector of q_j transformed variables X̃_t^j based on the matrix Q:

Q = [ 1   1   1   ...   1
      1   2   3   ...   p_j
      .   .   .   ...   .
      1   2^{q_j−1}   3^{q_j−1}   ...   p_j^{q_j−1} ],

so that the (k + 1)th row of Q contains the powers i^k, i = 1, ..., p_j. After this transformation, the vector of predictors included in the location-scale model in Eq. (1) becomes equal to X_t∗ = (W_t, X̃_t^j)′, where X̃_t^j = QX_t^j. Notice that the total number of parameters considered for estimation is reduced substantially, to K∗ = 1 + k + q_j, and the optimal forecast (Eq. (2)) becomes

ŷ_{t+h} = Q_τ(y_{t+h}|X_t∗),   (4)

where Q_τ(y_{t+h}|X_t∗) is the τth quantile of y_{t+h} conditional on X_t∗, which now can be estimated using the standard quantile regression estimator since the parameter proliferation problem has been eliminated.⁵

Unfortunately, this restricted MIDAS approach is likely to be undermined by the "ad-hoc" choice of the Q matrix, which may lead to a lag structure for the high-frequency predictor that does not maximize the forecast accuracy. We overcome this problem by proposing a new procedure that relies on the use of the LASSO or the elastic net for choosing a lag structure for the high-frequency predictors. In what follows, we describe the proposed approach.⁶

3.2. Step 1: soft thresholding (lag selection)

We first apply a penalization method (either LASSO or elastic net) at each quantile level τ in order to select the most predictive lag structure for the high-frequency predictors. Bai and Ng (2008) called this first step the soft thresholding rule.

The LASSO estimator of the quantile function at level τ ∈ (τ₁, ..., τ_n) was developed by Belloni and Chernozhukov (2011) and can be summarized as follows:

min_{β,φ} ∑_t ρ_τ(y_{t+h} − W_t′β(τ) − X_t^M′φ(τ)) + λ_τ √(τ(1−τ)/m) ∥φ(τ)∥_{ℓ1},   (5)

∥φ(τ)∥_{ℓ1} = ∑_{i=1}^P |φ(τ)_i|,

where the first part, ρ_τ(e) ≡ [τ − 1(e ≤ 0)]e, represents the standard quantile estimation, and the function 1(e ≤ 0) equals one if e ≤ 0, and zero otherwise. The LASSO estimator imposes a penalty, λ_τ √(τ(1−τ)/m), on the coefficients of all (lags of) high-frequency predictors, where m stands for the estimation sample size. The optimal value of the tuning parameter λ_τ for each quantile τ is obtained following Belloni and Chernozhukov (2011).

4 Recall that Q_τ(y_{t+h}|X_t) = X_t′α + X_t′γF_η^{−1}(τ) = X_t′β(τ). The vector of quantile coefficients β(τ) includes K coefficients, where β_k(τ) = α_k + γ_k F_η^{−1}(τ), k = 0, 1, ..., K − 1.
5 This restricted MIDAS approach (based on the Almon lag polynomial) was implemented recently by Pettenuzzo, Timmermann, and Valkanov (2016) for forecasting the growth rate of the industrial production index.
6 An alternative approach that is not included in this paper is the unrestricted MIDAS (UMIDAS), which can be estimated by OLS but becomes heavily parameterized when the frequency mismatch is large (Foroni, Marcellino, & Schumacher, 2015). As was suggested by one of the referees, one could potentially consider a hybrid solution where the LASSO (elastic net) is used to select part of the lag structure imposed by UMIDAS.


One distinguishing feature of LASSO is that the coefficients of insignificant predictors can be exactly zero, leaving the forecasts unaffected by uninformative predictors. Although LASSO is successful at variable selection, it also has two potential limitations that can affect the forecasting accuracy. First, in the case where the number of parameters K is larger than the sample size T, as in our empirical exercise, LASSO selects at most T variables before it saturates. Second, according to Bai and Ng (2008, p. 307) and Li (2015, p. 385), if there is a group of variables with high pairwise correlations, then LASSO tends to select only one variable from the group, without caring which one. Zou and Hastie (2005) proposed the elastic net in order to address these two problems. Indeed, the idea of the elastic net is to stretch the fishing net in order to retain all big "fishes". Like LASSO, the elastic net shrinks the estimates and performs model selection simultaneously. The LASSO penalty is convex, but not strictly convex. The elastic net relies on strict convexity, which enforces the grouping effect so that predictors with similar properties will have similar coefficients (Bai & Ng, 2008, p. 307).

The elastic net (EN) objective function is:

min_{β,φ} ∑_t ρ_τ(y_{t+h} − W_t′β(τ) − X_t^M′φ(τ)) + λ_{1τ}∥φ(τ)∥_{ℓ1} + λ_{2τ}∥φ(τ)∥_{ℓ2},   (6)

where ∥φ(τ)∥_{ℓ1} = ∑_{i=1}^P |φ(τ)_i| and ∥φ(τ)∥_{ℓ2} = ∑_{i=1}^P φ(τ)_i². Thus, the elastic net nests both the LASSO and Ridge quantile regressions. When the two tuning parameters λ_{1τ} and λ_{2τ} satisfy the relationship λ_{2τ}/(λ_{1τ} + λ_{2τ}) > 0, the EN penalty is strictly convex, which forces highly-correlated predictors to have similar coefficients. As a result, EN can capture all significant predictors even if they are highly correlated.

We increase the computational efficiency by following Bai and Ng (2008)⁷ and reformulating the EN as a LASSO problem. This reformulation has a computationally appealing property because we can solve the EN objective function by using the LASSO algorithms proposed by Belloni and Chernozhukov (2011). We implement this representation by defining new variables (assuming for simplicity that the W variables are absent):

y_t⁺ = ( y_t ; O_P ),   X_t⁺ = (1/√(1 + λ₂)) ( X_t^M ; √λ₂ I_P ),

where O_P represents a P × 1 vector of zeros and I_P is a P × P identity matrix. Note that the sample size is now equal to T + P, which means that the elastic net can potentially select all P high-frequency predictors in all situations.

Based on the new variables, y_t⁺ and X_t⁺, the EN objective function in Eq. (6) can be rewritten in terms of the ℓ1-penalized quantile regression method studied by Belloni and Chernozhukov (2011), that is:

min_{β,φ} ∑_{t=1}^{T∗+P} ρ_τ(y⁺_{t+h} − X_t⁺′φ(τ)) + γ_τ ∥φ(τ)∥_{ℓ1},   (7)

where γ_τ = λ_{1τ}/√(1 + λ_{2τ}). As was noticed by Zou and Hastie (2005), the optimizer in Eq. (7) is probably different from that in Eq. (6), but since our interest at this stage is in variable selection, we follow Bai and Ng (2008) and consider only the ordering of variables provided by the elastic net in Eq. (7). The optimal value of λ_{2τ}⁸ is obtained by minimizing the mean cross-validated errors of the forecasting model, with the EN mixing parameter restricted to α = 0.5 (Friedman, Hastie, & Tibshirani, 2010).⁹ For both LASSO and EN, the soft thresholding rule is that by which a predictor will be selected if the absolute value of its estimated coefficient is different from zero.

3.3. Step 2: hard thresholding

Given the predictors (lags) selected by either the LASSO or the elastic net in the first step, we construct a k_max × 1 vector of common factors PC_t^τ, where the optimal number of factors, k_max, is chosen via the eigenvalue ratio approach developed by Ahn and Horenstein (2013). Then, as per Bai and Ng (2008), we keep only the factors with p-values that are less than or equal to 0.01. For example, if the optimal number is three, we consider the first three principal components but keep only those with p-values that are less than or equal to 0.01. This approach is equivalent to that developed by Bai and Ng (2008), in which soft and hard threshold methods are applied in a forecasting problem with too many predictors; i.e., they proposed first using LASSO (EN) to select the most predictive variables from a large set of covariates and then considered common factors of the selected variables as the final predictors. Our approach extends this idea towards quantile regressions. Thus, the optimal forecast (Eq. (2)) becomes

Q_τ(y_{t+h}|W_t, PC_t^τ) = W_t′β(τ) + PC_t^τ′ϕ(τ),   (8)

where PC_t^τ is the vector of selected common factors at the quantile level τ.

3.4. Step 3: quantile combination

For n widely-spread quantile levels, we generate a conditional-mean (MSE) forecast of y_{t+h} by weighting its corresponding quantile forecasts equally; that is:

f_{t+h,t} = (1/n) ∑_{τ=τ₁}^{τ_n} Q_τ(y_{t+h}|W_t, PC_t^τ).   (9)

We name these two quantile-combination-based (MSE) forecasts f_{t+h,t}: LASSO and f_{t+h,t}: EN, respectively, depending on whether LASSO or EN is used to select the predictors in the first step. Likewise, we refer to the following conditional mean forecast as f_{t+h,t}^MIDAS:

f_{t+h,t}^MIDAS = (1/n) ∑_{τ=τ₁}^{τ_n} Q_τ(y_{t+h}|W_t, X̃_t^j),   (10)

where X̃_t^j is the transformed vector of high-frequency predictors obtained using the Almon lag polynomial function. The next section makes use of these two methods for forecasting the industrial production (IP) growth rate.

7 Unlike us, Bai and Ng (2008) estimate the conditional mean directly based on the MSE loss function.
8 Since a random number is involved in the estimation of λ_{2τ} as part of the cross-validation process, we repeat the process 100 times in order to reduce the randomness in the results.
9 The penalty of EN models with parameter λ and mixing parameter α can be represented by ((1 − α)/2)λ∥φ∥_{ℓ2} + αλ∥φ∥_{ℓ1}. Hence, if α = 1, we have a LASSO estimator; if α = 0, we have Ridge. To combine LASSO and Ridge, we choose α = 0.5.
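The augmentation that maps the elastic-net problem in Eq. (6) into the ℓ1-only problem in Eq. (7) is mechanical and worth spelling out. The sketch below (our illustration; the value of λ₂ is a placeholder and the W variables are omitted, as in the text) builds (y⁺, X⁺) and confirms that the effective sample size becomes T + P, which is why the elastic net can select all P predictors even when P > T:

```python
import numpy as np

def augment_for_elastic_net(y, X, lam2):
    """Build (y+, X+) so that an l1-only problem on the augmented data
    corresponds to the elastic net, as in the move from Eq. (6) to Eq. (7)."""
    T, P = X.shape
    y_plus = np.concatenate([y, np.zeros(P)])                        # stack O_P below y
    X_plus = np.vstack([X, np.sqrt(lam2) * np.eye(P)]) / np.sqrt(1.0 + lam2)
    return y_plus, X_plus

rng = np.random.default_rng(2)
T, P = 50, 80                        # more lag predictors than observations
X = rng.normal(size=(T, P))
y = rng.normal(size=T)

y_plus, X_plus = augment_for_elastic_net(y, X, lam2=0.5)   # lam2 is a placeholder value
print(y_plus.shape, X_plus.shape)    # augmented sample size is T + P = 130
```

The augmented rows attach √λ₂ I_P to the design, so the ℓ1 solver applied to (y⁺, X⁺) implicitly carries the ridge part of the penalty.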


Finally, we also consider a quantile-combination-based forecast with a fixed lag structure. In other words, instead of applying step 1 at various quantiles τ ∈ (0, 1), we perform the lag selection in the mean model only and use the selected lags to compute a single set of common factors, PC_t. Specifically, we consider the following forecasting model:

g_{t+h,t} = (1/n) ∑_{τ=τ₁}^{τ_n} Q_τ(y_{t+h}|W_t, PC_t).   (11)

The important aspect to notice here is that the common factor PC_t is the same across quantiles τ ∈ (0, 1).¹⁰ We name these two additional quantile-combination-based forecasts g_{t+h,t}: LASSO and g_{t+h,t}: EN, respectively, depending on whether LASSO or EN is used to select a lag structure for the high-frequency predictors in the mean model. Notice that the only difference between g_{t+h,t} and f_{t+h,t} is that the latter also performs lag selection across quantiles, whereas the former relies on the lag structure of the mean model. Thus, g_{t+h,t} is more efficient than f_{t+h,t} computationally, but less accurate. The paper investigates this trade-off and makes a recommendation about which forecasting model should be used in practice.

4. Empirical analysis

4.1. Data

We obtain monthly data on the U.S. growth in industrial production from the Federal Reserve Bank of St. Louis FRED database. Specifically, let Y_t denote the monthly seasonally adjusted industrial production index (IPI) at time t and define y_t = ln(Y_t/Y_{t−1}) × 100. The monthly predictor variables are an updated version of the 132 macroeconomic series used by Ludvigson and Ng (2009). Following Pettenuzzo et al. (2016), these series are selected to represent broad categories of macroeconomic quantities such as real output and income, employment and hours, real retail, manufacturing and trade sales, consumer spending, housing starts, inventories and inventory sales ratios, orders and unfilled orders, compensation and labor costs, capacity utilization measures, price indexes, bond and stock market indexes, and foreign exchange measures. Again following Pettenuzzo et al. (2016), we extract two common factors from the 132 macroeconomic series using principal components.

Regarding the daily series, we take advantage of the previous result reported by Pettenuzzo et al. (2016), in that, overall, the models that included the ADS daily business cycle index of Aruoba, Diebold, and Scotti (2009) outperformed those that included daily series other than the ADS index.¹¹ In other words, the ADS index has the best predictive power among all other daily series, and thus this paper did not consider forecasting models with daily series other than the ADS index. Complete descriptions of the variables used in this paper and the numbers of lags included in the regression equation are provided in Table 1.

Given 669 observations¹² of IP growth, we use the first 228 observations, from 1963m1 to 1981m12, as the initial estimation sample. For longer forecast horizons h = (3, 6, 9, 12), the initial estimation windows are shortened, ranging from 1962m12 + h to 1981m9 − h. This estimation sample is expanded periodically as the out-of-sample forecast moves forwards (recursive forecasting scheme).

4.2. Forecasting models and MSE decomposition

We estimate two classes of forecasting models based on whether or not they include the macroeconomic factors. The first class includes the macroeconomic factors along with up to 12 months of past ADS index observations and four lags of y_t. As per Pettenuzzo et al. (2016), we include two lags for each macroeconomic factor. The second class does not include the macroeconomic factors but keeps the autoregressive lags and the ADS index. Within each class, we compute the quantile combination forecasts described by Eqs. (9)–(11), as well as the OLS-based forecasts y_{t+h,t} (mean model). Note that the mean model y_{t+h,t} does not take advantage of the distributional information in order to enhance the forecasting accuracy, and is therefore to be interpreted as a non-robust version of f_{t+h,t} and g_{t+h,t}. Finally, recall that the lag structure used by the forecasts g_{t+h,t} is identical to that used by y_{t+h,t}. This leads us to propose the following unique decomposition of the mean squared error (MSE):

MSE(y_{t+h,t}^MIDAS) − MSE(g_{t+h,t}^j) = [MSE(y_{t+h,t}^MIDAS) − MSE(y_{t+h,t}^j)] + [MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j)].   (12)

If the quantile combination method is more robust than OLS, then forecasts from g_{t+h,t}^j will be more accurate than those from y_{t+h,t}^j, and therefore MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j) will be positive.¹³ This leads us to re-write Eq. (12) as:

[MSE(y_{t+h,t}^MIDAS) − MSE(g_{t+h,t}^j)] / [MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j)]
= 1 + [MSE(y_{t+h,t}^MIDAS) − MSE(y_{t+h,t}^j)] / [MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j)].   (13)

Hence, Eq. (13) can be interpreted as a goodness-of-fit measure for out-of-sample forecasting. It computes the accuracy loss (relative to g_{t+h,t}^j) that is caused by the MIDAS model.

10 We select the lags in the mean model using LASSO (elastic net) by replacing the tilting function ρ_τ with the quadratic function and obtaining the optimal values of the tuning parameters through the procedure suggested in the GLMNET R package (linear regression section) developed by Trevor Hastie and Junyang Qian.
11 We also ran the models with other daily series, but the results are qualitatively identical to those reported by Pettenuzzo et al. (2016).
12 Recall that we include four autoregressive lags of y_t, meaning that the dependent variable starts in 1963m1.
13 Recall that g_{t+h,t}^j and y_{t+h,t}^j use exactly the same lag structure. Our empirical results confirm that forecasts from g_{t+h,t}^j are more accurate (lower MSEs) than those obtained from y_{t+h,t}^j and y_{t+h,t}^MIDAS, especially for forecasting horizons as short as h = 1, 3.
Table 1
Description of variables.

- IP growth rate (y_t): Monthly; 1962m1–2017m9; 4 lags; monthly log industrial production index (IP); data source: Federal Reserve Bank of St. Louis.
- 132 macro series: Monthly; 1962m1–2017m9; 12 lags; "output and income", "labor market", "housing", "consumption, orders and inventories", "money and credit" and "bond and exchange rates"; data source: Jurado, Ludvigson, and Ng (2015).
- ADS: Daily; Jan. 1, 1962–Sep. 30, 2017; 365 lags; ADS daily business cycle variable of Aruoba et al. (2009); data source: Federal Reserve Bank of Philadelphia.
There are three potential results:

(i) [MSE(y_{t+h,t}^MIDAS) − MSE(g_{t+h,t}^j)] / [MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j)] = 1. In this case, MSE(y_{t+h,t}^MIDAS) = MSE(y_{t+h,t}^j), and therefore there is no advantage in using LASSO (EN) to select the lags in the mean model. The accuracy of the mean model y_{t+h,t}^j is identical to the accuracy of y_{t+h,t}^MIDAS (the second term on the right-hand side is zero).

(ii) [MSE(y_{t+h,t}^MIDAS) − MSE(g_{t+h,t}^j)] / [MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j)] > 1. In this case, MSE(y_{t+h,t}^MIDAS) > MSE(y_{t+h,t}^j), and therefore MIDAS is less accurate than models where the lag structure is selected by LASSO or elastic net (the second term on the right-hand side is positive).

(iii) [MSE(y_{t+h,t}^MIDAS) − MSE(g_{t+h,t}^j)] / [MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j)] < 1. In this case, MSE(y_{t+h,t}^MIDAS) < MSE(y_{t+h,t}^j), and therefore MIDAS is more accurate than models where the lag structure is selected by LASSO or elastic net (the second term on the right-hand side is negative).

The decomposition above is useful for illustrating the contributions of LASSO and the elastic net to the selection of the lag structure in the mean models y_{t+h,t}^j, but it does not show how much more accuracy is gained when one applies LASSO or the elastic net to the selection of the lag structure across quantiles. Indeed, the forecasting model proposed in this paper, f_{t+h,t}^j, differs from g_{t+h,t}^j in the sense that it does not impose the same lag structure of the mean model on all quantiles. Thus, we evaluate the contribution of relaxing such a restriction through the following decomposition:

MSE(y_{t+h,t}^j) − MSE(f_{t+h,t}^j) = [MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j)] + [MSE(g_{t+h,t}^j) − MSE(f_{t+h,t}^j)].   (14)

Again, if the quantile combination method is more robust than OLS, then forecasts from g_{t+h,t}^j will be more accurate than those from y_{t+h,t}^j, and therefore MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j) will be positive.¹⁴ This leads us to re-write Eq. (14) as:

[MSE(y_{t+h,t}^j) − MSE(f_{t+h,t}^j)] / [MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j)]
= 1 + [MSE(g_{t+h,t}^j) − MSE(f_{t+h,t}^j)] / [MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j)].

This ratio computes the accuracy loss that is caused by using g_{t+h,t}^j instead of f_{t+h,t}^j; that is, it measures the trade-off between fixed and flexible lag structures for the quantile combination methods. Although both methods f_{t+h,t}^j and g_{t+h,t}^j are robust to outliers, the latter is more efficient computationally than the former; however, it imposes the restriction that the lag structure of the quantile function is identical to that used by the mean model. Thus, there are three possible results:

(i) [MSE(y_{t+h,t}^j) − MSE(f_{t+h,t}^j)] / [MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j)] = 1. In this case, MSE(g_{t+h,t}^j) = MSE(f_{t+h,t}^j), and therefore there is no advantage in selecting a different lag structure for different quantiles.

(ii) [MSE(y_{t+h,t}^j) − MSE(f_{t+h,t}^j)] / [MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j)] > 1. In this case, MSE(g_{t+h,t}^j) > MSE(f_{t+h,t}^j), and therefore the flexible quantile combination method f_{t+h,t}^j is more accurate than the fixed quantile combination method g_{t+h,t}^j. Hence, performing lag selection at different quantiles results in more accurate forecasts. In this case, the forecasting accuracy of the quantile combination method will also depend on our ability to select an appropriate lag structure for each quantile. This is carried out by model f_{t+h,t}^j, which is computationally inefficient relative to g_{t+h,t}^j.

(iii) [MSE(y_{t+h,t}^j) − MSE(f_{t+h,t}^j)] / [MSE(y_{t+h,t}^j) − MSE(g_{t+h,t}^j)] < 1. In this case, g_{t+h,t}^j is not only more efficient computationally than f_{t+h,t}^j, but also more accurate.

4.3. Forecast evaluation

We evaluate the performances of different models for forecasting the IP growth rate by computing the root mean squared error (RMSE) of each model ỹ_{t+h,t} ∈ (f_{t+h,t}, g_{t+h,t}, y_{t+h,t}, f_{t+h,t}^MIDAS, y_{t+h,t}^MIDAS) relative to the benchmark model, ŷ_{t+h,t}.¹⁵

Our initial estimation sample spans the period from 1962m1 to 1981m12, and we have two out-of-sample (OOS) periods: 1982m1–2017m9 and 1982m1–2011m12. The second out-of-sample period is identical to that utilized by Pettenuzzo et al. (2016), so that part of our results are comparable to part of theirs. We use the recursive forecasting scheme so that the estimation sample increases as we move toward the end of the sample period.
( ) ( ) The RMSE is calculated as follows:
j j
MSE gt +h,t −MSE ft +h,t √∑
T∗
=1+ ( ) ( ). (15) t =1 (yt +h − ỹjt +h,t )2
MSE
j
yt +h,t −MSE j
gt +h,t RMSEjh = √∑ , (16)
T∗ 2
t =1 (yt +h − ŷt +h,t )
Thus, the decomposition in Eq. (15) can also be in-
terpreted as a goodness-of-fit measure for out-of-sample
forecasting. It computes the accuracy loss obtained from 15 Following Pettenuzzo et al. (2016), we estimated two benchmark
models: (i) a naive random walk where the forecast of yt is set equal
to yt −h ; and (ii) an AR model with four lags, the forecast of which is a
14 This is confirmed strongly by our empirical results, especially for linear combination of (yt −h , yt −h−1 , yt −h−2 , yt −h−3 ), where h is the forecast
forecasting horizons as short as h = 1,3. horizon.

Please cite this article as: L.R. Lima, F. Meng and L. Godeiro, Quantile forecasting with mixed-frequency data. International Journal of Forecasting (2019),
https://doi.org/10.1016/j.ijforecast.2018.09.011.
8 L.R. Lima, F. Meng and L. Godeiro / International Journal of Forecasting xxx (xxxx) xxx
Fig. 1. IP growth rate from Jan. 1963 to Sep. 2017. As can be seen, extreme observations appear in 1971, 1978, 1980 and 2008.
where T∗ is the number of h-step-ahead out-of-sample (OOS) forecasts. If the value of RMSEjh is lower than one, model j outperforms the benchmark model in terms of RMSEs, producing better IP growth forecasts.

We test whether a forecasting model j produces significantly better forecasts by testing the null hypothesis of equal predictability using the tests proposed by Diebold and Mariano (2002) and Clark and West (2007). The latter was designed for comparing nested models, whereas the former has the correct size only when the models are non-nested. If the forecasting models are nested, the Diebold–Mariano test may be undersized under the null and may have low power under the alternative hypothesis.

4.4. Empirical results

As was stated in the introduction to this paper, Tibshirani (1996) emphasized that the MSE of a forecast is related directly to that of an estimator. Fig. 1 shows the monthly IP growth rates from 1963m1 to 2017m9. Extremely large negative growth rates appear during recession periods, such as those following the oil shocks in the seventies and during the mortgage crisis in 2008–2009. The existence of extreme observations signals that the data may violate the normal-distribution assumptions required for the efficiency of the OLS estimator. For this reason, forecasting models that rely on the OLS estimator could be outperformed by models that use robust estimation methods, such as the quantile combination approach proposed in this paper.

Tables 2 and 3 present the RMSEs. Panel A of each table shows the result for the entire out-of-sample period, whereas panel B shows the results for the out-of-sample period considered by Pettenuzzo et al. (2016). Table 2 considers the random walk model as our benchmark forecasting model, while Table 3 uses an AR(4) model. Like Pettenuzzo et al. (2016), we use the Diebold and Mariano (2002) test to compare each forecasting model to the benchmark.

Table 2 confirms the findings of Pettenuzzo et al. (2016), in that the MIDAS approach easily outperforms the random walk model across all forecast horizons. We also notice that selecting lags through LASSO and the elastic net improves the forecasting accuracy relative to MIDAS, and considering a quantile combination approach improves the forecasting accuracy over the mean models (yt+h,t) estimated by OLS. Table 3 considers an AR(4) benchmark, which is more accurate than a random walk. This implies that such a benchmark will be harder to beat than the naive random walk. Indeed, the results reported in Table 3 clearly indicate that the proposed forecasting models no longer outperform the AR(4) for h > 6. Interestingly, the only models that outperform the AR(4) for h = 3 are those that use the quantile combination approach (ft+h,t). All in all, Tables 2 and 3 suggest that using LASSO and the elastic net to select a lag structure for the ADS index results in better forecasts than imposing a MIDAS structure. Moreover, OLS-based forecasts (yt+h,t) are outperformed
Fig. 2. Scatterplot of the forecast variance and squared forecast bias for the FAR-ADS ft +h,t , gt +h,t and yt +h,t models relative to AR(4), 1982M1–2017M9,
with a forecast horizon of h = 1. Notes: The y-axis and the x-axis respectively represent the relative forecast variance and squared forecast bias of the
ft +h,t , gt +h,t and yt +h,t models, calculated as the difference between the forecast variance (squared bias) of the conditional model and that of the AR(4). Each
point on the dotted line represents a forecast with the same MSE as the AR(4); the points to the right are forecasts that are outperformed by the AR(4),
while the points to the left represent forecasts that outperform the AR(4). In general, the ft +h,t models are able to reduce the relative squared forecast bias
substantially. The green points are yt +h,t models and the blue points are ft +h,t and gt +h,t . (For interpretation of the references to colour in this figure legend,
the reader is referred to the web version of this article.)
by their robust counterpart obtained from the quantile combination method (ft+h,t).

In general, the approach proposed in this paper (quantile combination) produces a strong forecasting accuracy gain. In fact, Table 4 reports the p-values for the null hypothesis of equal RMSEs for the mean forecasting model (yt+h,t) and its corresponding quantile combination version (ft+h,t). The alternative hypothesis corresponds to a lower RMSE for the quantile-combination-based forecast, ft+h,t. Since these two models are nested, we test such a hypothesis using the test proposed by Clark and West (2007). The results displayed in Table 4 clearly indicate that the null hypothesis of equal forecasting accuracy (measured by the RMSE) is rejected for most of the forecasting horizons, suggesting that the quantile-combination-based forecasts ft+h,t are more accurate than their corresponding OLS-based forecasts, yt+h,t.

The above result suggests that we can improve the forecasting accuracy by considering robust estimation methods, which is in line with the result reported by Tibshirani (1996). The approach proposed in this paper approximates the conditional mean (the MSE forecast) through the combination of conditional quantiles. However, the selection of a lag structure that is specific to each quantile is inefficient computationally. Tables 2 and 3 report results for the forecast gt+h,t, which also relies on the quantile combination approach but, unlike ft+h,t, utilizes the same lag structure as the mean model yt+h,t. For this reason, gt+h,t retains the robustness of ft+h,t but requires much less computation time.

The results displayed in Tables 2 and 3 show that gt+h,t is less accurate than ft+h,t, but the accuracy loss does not seem too large. Indeed, Table 5 reports the p-value for the Clark–West test of the null hypothesis of equal accuracy between gt+h,t and ft+h,t, and one can see that the null is not rejected at the usual level of 5% for most of the specifications and forecasting horizons. These results suggest that we could take gt+h,t as an efficient approximation of ft+h,t, meaning that it can be computed faster and yet retain the robustness property of ft+h,t.

4.5. Explaining the benefits of the QCA with mixed-frequency data

This section decomposes the mean square forecast error (MSE) into two parts: the forecast variance and the squared forecast bias. We calculate the MSE of any forecast ỹjt+h,t as $\frac{1}{T^{*}}\sum_{t}\left(y_{t+h}-\tilde{y}^{j}_{t+h,t}\right)^{2}$ and the unconditional forecast variance as $\frac{1}{T^{*}}\sum_{t}\left(\tilde{y}^{j}_{t+h,t}-\frac{1}{T^{*}}\sum_{t}\tilde{y}^{j}_{t+h,t}\right)^{2}$, where T∗ is the total number of out-of-sample forecasts. The squared forecast bias is computed as the difference between the MSE and the forecast variance (Elliott & Timmermann, 2013; Rapach, Strauss, & Zhou, 2010).
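The decomposition just described is straightforward to compute over the OOS window. Below is a minimal sketch; the smooth toy forecast is our own illustration, not the paper's data:

```python
import numpy as np

def mse_decomposition(y, fcast):
    """Split the out-of-sample MSE as in Section 4.5:

    MSE   = (1/T*) sum_t (y_t - f_t)^2
    var   = (1/T*) sum_t (f_t - mean(f))^2   (unconditional forecast variance)
    bias2 = MSE - var                        (squared forecast bias, by definition here)
    """
    mse = np.mean((y - fcast) ** 2)
    var = np.mean((fcast - np.mean(fcast)) ** 2)
    return mse, var, mse - var

# A smooth, low-variance forecast trades variance for bias, much like a
# parsimonious benchmark. The series below are simulated for illustration.
rng = np.random.default_rng(1)
y = rng.normal(size=300)
smooth = 0.2 * y + 0.5  # toy parsimonious forecast

mse, var, bias2 = mse_decomposition(y, smooth)
# The relative quantities plotted in Figs. 2 and 3 are differences of these
# components between each candidate model and the AR(4) benchmark.
```

By construction the three components satisfy MSE = variance + squared bias, which is what allows Figs. 2 and 3 to plot the two pieces on the same scale.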
Fig. 3. Scatterplot of the forecast variance and squared forecast bias for the AR-ADS ft +h,t , gt +h,t and yt +h,t models relative to AR(4), 1982M1–2017M9, with
a forecast horizon of h = 1. Notes: The y-axis and the x-axis respectively represent the relative forecast variance and squared forecast bias of ft +h,t , gt +h,t
and yt +h,t models, calculated as the difference between the forecast variance (squared bias) of the conditional model and that of the AR(4). Each point on the
dotted line represents a forecast with the same MSE as the AR(4); the points to the right are forecasts that are outperformed by the AR(4), while the points
to the left represent forecasts that outperform the AR(4). In general, the ft +h,t models are able to reduce the relative squared forecast bias substantially. The
green points are yt +h,t models and the blue points are ft +h,t and gt +h,t . (For interpretation of the references to colour in this figure legend, the reader is
referred to the web version of this article.)
Figs. 2 and 3 depict the relative forecast variances and squared forecast biases of all forecasting models ỹjt+h,t ∈ (ft+h,t, gt+h,t, yt+h,t, fMIDASt+h,t, yMIDASt+h,t) for the entire out-of-sample period with h = 1. The relative forecast variance (squared bias) is calculated as the difference between the forecast variance (squared bias) of the jth model and the forecast variance (squared bias) of the benchmark AR(4) model. Hence, the value of the relative forecast variance (squared bias) for the AR(4) is necessarily equal to zero. Each point on the dotted line represents a forecast with the same MSE as the AR(4); points to the right of the line are forecasts that are outperformed by the AR(4), and points to the left represent forecasts that outperform the AR(4). Finally, the forecast variance and the squared forecast bias are both measured on the same scale, so that it is possible to determine the trade-off between the variance and bias of each forecasting model.

Since the AR(4) forecast is a parsimonious version of the other forecasting models, it will have the lowest variance among all of them, but it will be biased. Hence, a model will outperform the AR(4) forecast only if it is able to reduce the forecast bias substantially relative to the AR(4). We report results for forecasts that use both distributional and high-frequency information, i.e., ft+h,t and gt+h,t, as well as for forecasts that rely only on high-frequency information, i.e., yt+h,t. According to our explanation, if robust estimation and correct selection of the lag structure are carried out by the user, one can improve the forecasting accuracy over the strong AR(4) benchmark.

Figs. 2 and 3 confirm that the benchmark AR(4) forecast has the lowest variance, but it is biased. Therefore, the only channel through which one can outperform the benchmark is a reduction of the forecast bias. A good model will be one that can substantially reduce bias while keeping variance under control. Figs. 2 and 3 show that if one uses the lag structure suggested by the MIDAS model, then the gain over the benchmark is not very large. It seems that the lag structure imposed by the MIDAS model is not maximizing forecasting accuracy because it is not able to reduce bias substantially.

If we replace the lag structure imposed by the MIDAS model with a lag structure selected by either LASSO or the elastic net, the forecasting accuracy of the mean models (yt+h) improves substantially relative to the MIDAS model. Furthermore, additional improvements can be obtained if we also approximate the conditional mean through the combination of conditional quantiles (forecasts ft+h and gt+h), because quantile combination reduces the squared bias of the forecasts further. Note that the accuracy loss that results from using gt+h rather than ft+h is explained by the fact that the latter is able to reduce the forecast bias more than the former, but the difference is not very large.

We identify the contribution of replacing MIDAS with LASSO or the elastic net, as well as of using the computationally more efficient forecast gt+h rather than the more accurate forecast ft+h, by reporting the values of Eqs. (13) and (15) for the entire out-of-sample period and h = 1 in Table 6.
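Given the OOS MSEs of the four forecasts, the two ratios reported in Table 6 reduce to simple arithmetic. In the sketch below the helper names and the stylized MSE values are ours, and Eq. (13), which appears earlier in the paper, is assumed to take the MSE-difference ratio form described in the text:

```python
def eq13_ratio(mse_y_midas, mse_y_j, mse_g_j):
    """Gain from replacing the MIDAS lag structure with LASSO/EN in the mean model.
    Values above one mean MIDAS is less accurate than the LASSO/EN mean model."""
    return (mse_y_midas - mse_g_j) / (mse_y_j - mse_g_j)

def eq15_ratio(mse_y_j, mse_f_j, mse_g_j):
    """Eq. (15): gain from quantile-specific lag selection (f) over a fixed one (g).
    Values above one mean the flexible method f is more accurate than g."""
    return (mse_y_j - mse_f_j) / (mse_y_j - mse_g_j)

# Stylized MSE values (not the paper's): LASSO/EN pays off, while f barely beats g.
print(eq13_ratio(1.75, 1.20, 1.00))  # 3.75: well above one
print(eq15_ratio(1.20, 1.04, 1.05))  # about 1.07: only slightly above one
```

This mirrors the qualitative pattern in Table 6: Eq. (13) well above one, Eq. (15) only slightly above one.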
Table 2
RMSEs of models and equal-weighted forecast combinations relative to the RW.
h=1 h=3 h=6 h=9 h = 12
Models: RMSE DM RMSE DM RMSE DM RMSE DM RMSE DM
Panel A. Out-of-sample period 1982m1–2017m9
FC-all 0.424 0.000 0.757 0.000 0.732 0.000 0.730 0.000 0.693 0.000
FC-(ft +h,t , gt +h,t ) 0.403 0.000 0.755 0.000 0.733 0.000 0.732 0.000 0.692 0.000
FC-yt +h,t 0.484 0.000 0.764 0.000 0.734 0.000 0.731 0.000 0.698 0.000
yt +h,t -FAR-LASSO 0.559 0.000 0.769 0.000 0.755 0.000 0.758 0.000 0.715 0.000
yt +h,t -FAR-EN 0.491 0.000 0.767 0.000 0.756 0.000 0.759 0.000 0.728 0.000
yt +h,t -FAR-MIDAS 0.677 0.000 0.781 0.000 0.754 0.000 0.758 0.000 0.726 0.000
ft +h,t -FAR-LASSO 0.393 0.000 0.759 0.000 0.743 0.000 0.764 0.000 0.714 0.000
ft +h,t -FAR-EN 0.394 0.000 0.755 0.000 0.745 0.000 0.763 0.000 0.710 0.000
ft +h,t -FAR-MIDAS 0.669 0.000 0.770 0.000 0.748 0.000 0.762 0.000 0.724 0.000
yt +h,t -AR-LASSO 0.537 0.000 0.800 0.000 0.779 0.000 0.745 0.000 0.706 0.000
yt +h,t -AR-EN 0.490 0.000 0.822 0.000 0.783 0.000 0.750 0.000 0.711 0.000
yt +h,t -AR-MIDAS 0.677 0.000 0.789 0.000 0.766 0.000 0.750 0.000 0.708 0.000
ft +h,t -AR-LASSO 0.392 0.000 0.780 0.000 0.761 0.000 0.738 0.000 0.696 0.000
ft +h,t -AR-EN 0.393 0.000 0.772 0.000 0.757 0.000 0.740 0.000 0.696 0.000
ft +h,t -AR-MIDAS 0.670 0.000 0.780 0.000 0.764 0.000 0.751 0.000 0.708 0.000
gt +h,t -FAR-LASSO 0.400 0.000 0.764 0.000 0.746 0.000 0.752 0.000 0.721 0.000
gt +h,t -FAR-EN 0.399 0.000 0.755 0.000 0.746 0.000 0.754 0.000 0.716 0.000
gt +h,t -AR-LASSO 0.398 0.000 0.777 0.000 0.773 0.000 0.744 0.000 0.706 0.000
gt +h,t -AR-EN 0.400 0.000 0.793 0.000 0.776 0.000 0.747 0.000 0.709 0.000
Panel B. Out-of-sample period 1982m1–2011m12
FC-all 0.417 0.000 0.751 0.000 0.722 0.000 0.725 0.000 0.691 0.000
FC-(ft +h,t , gt +h,t ) 0.397 0.000 0.750 0.000 0.722 0.000 0.727 0.000 0.689 0.000
FC-yt +h,t 0.478 0.000 0.758 0.000 0.723 0.000 0.726 0.000 0.696 0.000
yt +h,t -FAR-LASSO 0.565 0.000 0.760 0.000 0.742 0.000 0.752 0.000 0.711 0.000
yt +h,t -FAR-EN 0.483 0.000 0.756 0.000 0.743 0.000 0.753 0.000 0.725 0.000
yt +h,t -FAR-MIDAS 0.664 0.000 0.772 0.000 0.738 0.000 0.751 0.000 0.722 0.000
ft +h,t -FAR-LASSO 0.389 0.000 0.751 0.000 0.729 0.000 0.758 0.000 0.710 0.000
ft +h,t -FAR-EN 0.390 0.000 0.747 0.000 0.731 0.000 0.758 0.000 0.705 0.000
ft +h,t -FAR-MIDAS 0.657 0.000 0.763 0.000 0.733 0.000 0.755 0.000 0.721 0.000
yt +h,t -AR-LASSO 0.543 0.000 0.799 0.000 0.774 0.000 0.742 0.000 0.707 0.000
yt +h,t -AR-EN 0.485 0.000 0.822 0.000 0.779 0.000 0.749 0.000 0.711 0.000
yt +h,t -AR-MIDAS 0.666 0.000 0.787 0.000 0.757 0.000 0.747 0.000 0.708 0.000
ft +h,t -AR-LASSO 0.388 0.000 0.778 0.000 0.754 0.000 0.735 0.000 0.696 0.000
ft +h,t -AR-EN 0.389 0.000 0.769 0.000 0.750 0.000 0.738 0.000 0.695 0.000
ft +h,t -AR-MIDAS 0.661 0.000 0.778 0.000 0.756 0.000 0.748 0.000 0.708 0.000
gt +h,t -FAR-LASSO 0.396 0.000 0.757 0.000 0.733 0.000 0.746 0.000 0.717 0.000
gt +h,t -FAR-EN 0.395 0.000 0.748 0.000 0.733 0.000 0.748 0.000 0.712 0.000
gt +h,t -AR-LASSO 0.394 0.000 0.776 0.000 0.767 0.000 0.741 0.000 0.706 0.000
gt +h,t -AR-EN 0.396 0.000 0.792 0.000 0.771 0.000 0.745 0.000 0.709 0.000
Notes: The table shows the RMSEs of models and equal-weighted forecast combinations relative to the RW benchmark. A value less than one suggests that
the forecast from a given model outperforms that of the RW benchmark. We also report the p-values of the Diebold and Mariano (2002) test for the null
hypothesis of equal accuracy between the benchmark and another given model. If the p-value is below 0.1, we conclude that the forecast from a given
model is more accurate statistically than that of the benchmark. Panel A reports the out-of-sample period 1982m1–2017m9, whereas Panel B reports the
period considered by Pettenuzzo et al. (2016), 1982m1 to 2011m12.
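The relative RMSEs and Diebold–Mariano comparisons reported in Tables 2 and 3 can be sketched in a few lines. This is an illustration on simulated data, not the paper's code; the naive DM statistic below also omits the HAC variance that multi-step horizons (h > 1) would require:

```python
import numpy as np

def relative_rmse(y, fcast_j, fcast_bench):
    """Eq. (16): RMSE of model j divided by the RMSE of the benchmark.
    The 1/T* factors cancel, so means give the same ratio as sums."""
    return np.sqrt(np.mean((y - fcast_j) ** 2)) / np.sqrt(np.mean((y - fcast_bench) ** 2))

def diebold_mariano(y, fcast_j, fcast_bench):
    """Naive DM t-statistic for squared-error loss (no HAC correction)."""
    d = (y - fcast_bench) ** 2 - (y - fcast_j) ** 2  # positive when model j wins
    return np.mean(d) / (np.std(d, ddof=1) / np.sqrt(len(d)))

# Toy out-of-sample window: a no-change benchmark versus an informative forecast.
rng = np.random.default_rng(0)
y = rng.normal(size=200)                           # target, e.g. IP growth
bench = np.zeros(200)                              # naive benchmark forecast
model = 0.5 * y + rng.normal(scale=0.5, size=200)  # stylized competing forecast

print(relative_rmse(y, model, bench))   # below one: the model beats the benchmark
print(diebold_mariano(y, model, bench))
```

A relative RMSE below one corresponds to the table entries that favor a model over the RW or AR(4) benchmark, and the DM statistic is what generates the p-values shown next to them.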
Recall that if the ratio in Eq. (13) is greater (less) than one, then the MIDAS approach produces less (more) accurate forecasts than the approaches that use LASSO or the elastic net to select the lag structure for the high-frequency predictor. Likewise, if the ratio in Eq. (15) is greater (less) than one, then the fixed quantile-combination method gt+h is less accurate than the flexible quantile-combination method ft+h. The results in Table 6 suggest two main conclusions: (i) it is worthwhile using LASSO or the elastic net to select a lag structure for the high-frequency predictor (Eq. (13) is well above one); and (ii) despite the accuracy gain obtained when one uses quantile combination methods with a lag structure selected for each quantile included in the combination (model ft+h,t), using a fixed lag structure for all quantiles (model gt+h,t) results in a similar accuracy (Eq. (15) is only slightly greater than one).

4.6. Forecast combination

This paper covers two large groups of forecasts, which differ in terms of the inclusion of the high-frequency predictor and whether or not distributional information is considered for forecasting. Thus, a forecaster will potentially face model uncertainty, which could be minimized if he/she adopted a strategy of forecast combination as per Andreou, Ghysels, and Kourtellos (2013) and Pettenuzzo et al. (2016). Due to our sample size restriction, we combine forecasts using equal weights, but other weighting schemes could be considered, such as the squared discounted MSE weight used by Andreou et al. (2013), for example. The equal-weight forecast is defined as follows:

$$\tilde{y}^{EW}_{t+h,t}=\sum_{j=1}^{m}\frac{1}{m}\,\tilde{y}^{j}_{t+h,t},$$
Table 3
RMSEs of models and equal-weighted forecast combinations relative to the AR(4).
h=1 h=3 h=6 h=9 h = 12
Models: RMSE DM RMSE DM RMSE DM RMSE DM RMSE DM
Panel A. Out-of-sample period 1982m1–2017m9
FC-all 0.530 0.000 0.945 0.003 0.957 0.023 0.990 0.264 0.993 0.354
FC-ft +h,t , gt +h,t 0.504 0.000 0.943 0.001 0.958 0.020 0.992 0.295 0.991 0.303
FC-yt +h,t 0.605 0.000 0.954 0.017 0.960 0.039 0.990 0.292 1.000 0.501
yt +h,t -FAR-LASSO 0.699 0.000 0.961 0.113 0.987 0.365 1.027 0.828 1.023 0.786
yt +h,t -FAR-EN 0.613 0.000 0.957 0.066 0.988 0.372 1.029 0.852 1.043 0.918
yt +h,t -FAR-MIDAS 0.847 0.000 0.976 0.218 0.986 0.355 1.027 0.827 1.039 0.859
ft +h,t -FAR-LASSO 0.491 0.000 0.947 0.029 0.972 0.216 1.035 0.870 1.023 0.778
ft +h,t -FAR-EN 0.493 0.000 0.943 0.023 0.974 0.237 1.034 0.867 1.016 0.713
ft +h,t -FAR-MIDAS 0.836 0.000 0.962 0.103 0.978 0.272 1.032 0.862 1.037 0.851
yt +h,t -AR-LASSO 0.671 0.000 0.999 0.481 1.018 0.935 1.009 0.730 1.011 0.716
yt +h,t -AR-EN 0.613 0.000 1.027 0.878 1.023 0.981 1.017 0.949 1.018 0.847
yt +h,t -AR-MIDAS 0.846 0.000 0.985 0.212 1.002 0.537 1.017 0.877 1.014 0.789
ft +h,t -AR-LASSO 0.490 0.000 0.974 0.009 0.994 0.222 1.000 0.474 0.997 0.287
ft +h,t -AR-EN 0.491 0.000 0.964 0.002 0.990 0.121 1.003 0.784 0.996 0.256
ft +h,t -AR-MIDAS 0.838 0.000 0.974 0.067 0.999 0.476 1.018 0.928 1.014 0.830
gt +h,t -FAR-LASSO 0.500 0.000 0.953 0.048 0.975 0.211 1.020 0.763 1.032 0.871
gt +h,t -FAR-EN 0.499 0.000 0.943 0.017 0.975 0.208 1.022 0.789 1.025 0.825
gt +h,t -AR-LASSO 0.498 0.000 0.970 0.006 1.010 0.877 1.008 0.861 1.011 0.766
gt +h,t -AR-EN 0.500 0.000 0.991 0.211 1.015 0.937 1.013 0.967 1.015 0.839
Panel B. Out-of-sample period 1982m1–2011m12
FC-all 0.519 0.000 0.937 0.002 0.949 0.011 0.986 0.199 0.988 0.285
FC-ft +h,t , gt +h,t 0.493 0.000 0.936 0.001 0.949 0.010 0.987 0.222 0.986 0.228
FC-yt +h,t 0.594 0.000 0.945 0.008 0.951 0.021 0.986 0.230 0.996 0.440
yt +h,t -FAR-LASSO 0.703 0.000 0.948 0.064 0.975 0.261 1.022 0.769 1.017 0.711
yt +h,t -FAR-EN 0.601 0.000 0.944 0.029 0.977 0.276 1.024 0.793 1.037 0.875
yt +h,t -FAR-MIDAS 0.825 0.000 0.963 0.130 0.971 0.234 1.020 0.748 1.034 0.810
ft +h,t -FAR-LASSO 0.484 0.000 0.936 0.015 0.959 0.144 1.030 0.821 1.015 0.689
ft +h,t -FAR-EN 0.485 0.000 0.932 0.011 0.962 0.159 1.029 0.820 1.009 0.613
ft +h,t -FAR-MIDAS 0.817 0.000 0.951 0.062 0.964 0.181 1.026 0.800 1.031 0.795
yt +h,t -AR-LASSO 0.675 0.000 0.997 0.431 1.017 0.914 1.008 0.701 1.011 0.707
yt +h,t -AR-EN 0.604 0.000 1.026 0.859 1.024 0.980 1.018 0.946 1.018 0.838
yt +h,t -AR-MIDAS 0.828 0.000 0.981 0.174 0.996 0.419 1.016 0.853 1.013 0.773
ft +h,t -AR-LASSO 0.482 0.000 0.971 0.006 0.991 0.139 0.999 0.384 0.996 0.222
ft +h,t -AR-EN 0.484 0.000 0.959 0.001 0.987 0.072 1.002 0.703 0.995 0.201
ft +h,t -AR-MIDAS 0.822 0.000 0.971 0.051 0.994 0.361 1.016 0.901 1.013 0.800
gt +h,t -FAR-LASSO 0.493 0.000 0.944 0.029 0.964 0.136 1.014 0.687 1.026 0.805
gt +h,t -FAR-EN 0.492 0.000 0.934 0.010 0.964 0.130 1.017 0.723 1.018 0.740
gt +h,t -AR-LASSO 0.490 0.000 0.968 0.006 1.009 0.830 1.008 0.826 1.010 0.733
gt +h,t -AR-EN 0.493 0.000 0.988 0.171 1.014 0.914 1.012 0.951 1.015 0.824
Notes: The table shows the RMSEs of models and equal-weighted forecast combinations relative to the AR(4) benchmark. A value less than one suggests
that the forecast from a given model outperforms that of the AR(4) benchmark. We also report the p-values of the Diebold and Mariano (2002) test for the
null hypothesis of equal accuracy between the benchmark and another given model. If the p-value is below 0.1, we conclude that the forecast from a given
model is more accurate statistically than that of the benchmark. Panel A reports the out-of-sample period 1982m1–2017m9, whereas Panel B reports the
period considered by Pettenuzzo et al. (2016), 1982m1 to 2011m12.
where ỹjt+h,t represents a forecast. Thus, our combination includes forecasts from each individual group ft+h,t, gt+h,t or yt+h,t, as well as from all groups.

Tables 2 and 3 display the RMSEs of various forecast combinations relative to the benchmark random walk and AR(4) models. A value below one suggests that the combined forecasts outperform the benchmarks. The p-value of the Diebold–Mariano (DM) test is presented next to each RMSE. If the p-value is lower than 0.05, we conclude that the forecast combination model produces significantly better forecasts than the benchmarks over the out-of-sample periods.

Tables 2 and 3 suggest that combining forecasts from models that incorporate high-frequency and distributional information, FC-(ft+h,t, gt+h,t), produces sizable accuracy gains relative to the benchmark models. A smaller but significant accuracy gain is obtained if we only combine forecasts that use high-frequency information, FC-yt+h,t. Finally, the combination that includes forecasts from all groups, FC-all, also produces sizable accuracy gains, confirming the idea that forecast combination can be seen as a hedge against misspecified models.

5. Conclusion

This paper proposes a unified solution for addressing two forms of estimation error that affect the out-of-sample predictive accuracy. Specifically, we minimize the effect of occasional extreme observations on the predictive accuracy by considering a combination of quantile forecasts and by using LASSO or the elastic net to select the lag structure of the high-frequency predictor in a quantile forecasting model. Thus, this approach incorporates distributional and high-frequency information into the same
Table 4
Clark–West p-values for the null hypothesis of equal predictive ability.
Comparison h=1 h=3 h=6 h=9 h = 12
LASSO models
ft +h,t FAR-LASSO vs. yt +h,t FAR-LASSO 0.005 0.097 0.005 0.838 0.476
ft +h,t AR-LASSO vs. yt +h,t AR-LASSO 0.012 0.038 0.017 0.232 0.200
EN models
ft +h,t FAR-EN vs. yt +h,t FAR-EN 0.000 0.060 0.027 0.678 0.017
ft +h,t AR-EN vs. yt +h,t AR-EN 0.000 0.003 0.004 0.098 0.077
MIDAS models
ft +h,t FAR-MIDAS vs. yt +h,t FAR-MIDAS 0.020 0.010 0.076 0.827 0.345
ft +h,t AR-MIDAS vs. yt +h,t AR-MIDAS 0.034 0.008 0.280 0.606 0.537
Notes: The table displays the p-values of the one-sided Clark and West (2007) test, under the null hypothesis
of equal predictability of the ft +h,t and yt +h,t models across the three major methods.
Table 5
Clark–West p-values for the null hypothesis of equal predictive ability.
Comparison h=1 h=3 h=6 h=9 h = 12
LASSO models
ft +h,t FAR-LASSO vs. gt +h,t -FAR-LASSO 0.070 0.269 0.347 0.953 0.217
ft +h,t AR-LASSO vs. gt +h,t -AR-LASSO 0.066 0.706 0.036 0.116 0.134
EN models
ft +h,t FAR-EN vs. gt +h,t - FAR-EN 0.098 0.503 0.434 0.921 0.240
ft +h,t AR-EN vs. gt +h,t AR-EN 0.057 0.004 0.013 0.082 0.075
Notes: The table displays the p-values of the one-sided Clark and West (2007) test, under the null hypothesis
of equal predictability of the ft +h,t and gt +h,t models across the two major methods.
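The comparisons in Tables 4 and 5 rely on the Clark and West (2007) MSPE-adjusted statistic for nested models; a minimal sketch on simulated data follows (the plain normal p-value, and the absence of a HAC variance for h > 1, are our simplifications):

```python
import numpy as np
from math import erf, sqrt

def clark_west(y, fcast_small, fcast_large):
    """Clark and West (2007) MSPE-adjusted test for nested models.

    f_t = (y_t - f1_t)^2 - [(y_t - f2_t)^2 - (f1_t - f2_t)^2],
    where f1 comes from the parsimonious (nested) model and f2 from the
    larger one. Large positive t-statistics favor the larger model.
    """
    f = (y - fcast_small) ** 2 - ((y - fcast_large) ** 2
                                  - (fcast_small - fcast_large) ** 2)
    t_stat = np.mean(f) / (np.std(f, ddof=1) / np.sqrt(len(f)))
    p_value = 1.0 - 0.5 * (1.0 + erf(t_stat / sqrt(2.0)))  # one-sided normal p-value
    return t_stat, p_value

# Toy nested comparison: a no-change null against a forecast that uses a predictor.
rng = np.random.default_rng(2)
x = rng.normal(size=250)
y = 0.7 * x + rng.normal(scale=0.5, size=250)
t_stat, p_value = clark_west(y, np.zeros(250), 0.7 * x)
```

The adjustment term (f1 − f2)² is what makes the statistic approximately normal under the null even though the models are nested, which is why the tables report one-sided p-values.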
Table 6
Accuracy loss decomposition for h = 1.
Model Eq. (15) Eq. (13)
FAR LASSO 1.05 1.75
FAR EN 1.11 3.14
AR LASSO 1.04 2.01
AR EN 1.08 3.09

Notes: The table displays the accuracy loss decomposition for h = 1. The first column shows the accuracy loss from using gt+h,t rather than ft+h,t. The second column shows the accuracy loss from using the lag structure of the MIDAS model rather than selecting it through LASSO or the elastic net.

forecasting model, and hence helps to eliminate simultaneously two important sources of estimation error that affect the out-of-sample predictive accuracy.

Our empirical analysis of the U.S. industrial production growth rate suggests that the new approach can improve the forecasting accuracy by a substantial margin. Specifically, the average MSEs of models that explore both distributional and high-frequency information are much lower than those from forecasts that do not use distributional information. Our results indicate that shrinkage methods (LASSO and the elastic net) can be used to select the lags of the high-frequency predictors, and that more accurate MSE forecasts can be obtained if one performs robust estimation of the conditional mean through the combination of conditional quantiles. This paper reports the first empirical evidence in favor of this approach.

A simple decomposition of the MSEs of various forecasting models shows that substantial accuracy gains are obtained when we replace the MIDAS lag structure with that selected by either LASSO or the elastic net. Although the use of a flexible lag structure across quantiles improves the forecasting accuracy, our results suggest that such a gain is marginal, and therefore does not justify the computational time required to calculate it. For this reason, we recommend using a quantile combination approach with a fixed lag structure selected by either LASSO or the elastic net. Our results confirm that this method is not only robust to estimation errors, but also computationally efficient.16

Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.ijforecast.2018.09.011.

References

Ahn, S. C., & Horenstein, A. R. (2013). Eigenvalue ratio test for the number of factors. Econometrica, 81(3), 1203–1227.
Andreou, E., Ghysels, E., & Kourtellos, A. (2013). Should macroeconomic forecasters use daily financial data and how? Journal of Business & Economic Statistics, 31(2), 240–251.
Aruoba, S., Diebold, F. X., & Scotti, C. (2009). Real-time measurement of business conditions. Journal of Business & Economic Statistics, 27(4), 417–427.
Bai, J., & Ng, S. (2008). Forecasting economic time series using targeted predictors. Journal of Econometrics, 146(2), 304–317.
Belloni, A., & Chernozhukov, V. (2011). ℓ1-penalized quantile regression in high-dimensional sparse models. The Annals of Statistics, 39(1), 82–130.
Clark, T. E., & West, K. D. (2007). Approximately normal tests for equal predictive accuracy in nested models. Journal of Econometrics, 138(1), 291–311.

16 See supplementary material at https://doi.org/10.1016/j.ijforecast.2018.09.011.
Clements, M. P., & Galvão, A. B. (2008). Macroeconomic forecasting with mixed-frequency data: forecasting output growth in the United States. Journal of Business & Economic Statistics, 26(4), 546–554.
Clements, M. P., & Galvão, A. B. (2009). Forecasting US output growth using leading indicators: An appraisal using MIDAS models. Journal of Applied Econometrics, 24(7), 1187–1206.
Diebold, F. X., & Mariano, R. S. (2002). Comparing predictive accuracy. Journal of Business & Economic Statistics, 20(1), 134–144.
Elliott, G., & Timmermann, A. (2013). Handbook of economic forecasting. Elsevier.
Elliott, G., & Timmermann, A. (2016). Economic forecasting. Princeton University Press.
Foroni, C., Marcellino, M., & Schumacher, C. (2015). Unrestricted mixed data sampling (MIDAS): MIDAS regressions with unrestricted lag polynomials. Journal of the Royal Statistical Society, Series A (Statistics in Society), 178(1), 57–82.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
Meligkotsidou, L., Panopoulou, E., Vrontos, I. D., & Vrontos, S. D. (2014). A quantile regression approach to equity premium prediction. Journal of Forecasting, 33(7), 558–576.
Patton, A. J., & Timmermann, A. (2007). Properties of optimal forecasts under asymmetric loss and nonlinearity. Journal of Econometrics, 140(2), 884–918.
Pettenuzzo, D., Timmermann, A., & Valkanov, R. (2016). A MIDAS approach to modeling first and second moment dynamics. Journal of Econometrics, 193(2), 315–334.
Rapach, D. E., Strauss, J. K., & Zhou, G. (2010). Out-of-sample equity premium prediction: combination forecasts and links to the real economy. The Review of Financial Studies, 23(2), 821–862.
Taylor, J. W. (2007). Using exponentially weighted quantile regression to estimate value at risk and expected shortfall. Journal of Financial Econometrics, 6(3), 382–406.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 58, 267–288.
Xiao, Z., & Lima, L. R. (2014). Testing covariance stationarity. Econometric Reviews, 26(6), 643–667.
Ghysels, E., Santa-Clara, P., & Valkanov, R. (2004). The MIDAS touch: Mixed
Reviews, 26(6), 643–667.
data sampling regression models, UCLA: Finance.
Zhao, Z., & Xiao, Z. (2014). Efficient regressions via optimally combining
Ghysels, E., Santa-Clara, P., & Valkanov, R. (2005). There is a risk-return
quantile information. Econometric Theory, 30(6), 1272–1314.
trade-off after all. Journal of Financial Economic, 76(3), 509–548.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the
Judge, G. G., Hill, R. C., Griffiths, W. E., Lütkepohl, H., & Lee, T. -C. (1988).
elastic net. Journal of the Royal Statistical Society. Series B. Statistical
Introduction to the theory and practice of econometrics. New York: John
Methodology, 67(2), 301–320.
Wiley & Sons.
Jurado, K., Ludvigson, S. C., & Ng, S. (2015). Measuring uncertainty. The
American Economic Review, 105(3), 1177–1216.
Kuzin, V., Marcellino, M., & Schumacher, C. (2011). MIDAS vs. mixed- Luiz Renato Lima is an associate professor of economics in the University
frequency VAR: Nowcasting GDP in the euro area. International Journal of Tennessee at Knoxville. He holds a Ph.D. in economics from the Univer-
of Forecasting, 27(2), 529–542. sity of Illinois at Urbana-Champaign and has published several papers on
Li, J. (2015). Sparse and stable portfolio selection with parameter uncer- economic forecasting using quantile regression techniques.
tainty. Journal of Business & Economic Statistics, 33(3), 381–392.
Lima, L. R., & Meng, F. (2017). Out-of-Sample return predictability: a Fanning Meng holds a Ph.D. in economics from the University of Ten-
quantile combination approach. Journal of Applied Econometrics, 32(4), nessee at Knoxville. She currently works for the Scotia Bank in Toronto,
877–895. Canada where she does model validation for credit risk analysis.
Ludvigson, S. C., & Ng, S. (2009). Macro factors in bond risk premia. The
Review of Financial Studies, 22(12), 5027–5067.
Lucas Godeiro holds a Ph.D. in economics from the Federal University
Ma, L., & Pohlman, L. (2008). Return forecasts and optimal portfolio
of Paraiba-Brazil. He conducts research on text regression and machine
construction: a quantile regression approach. The European Journal of
learning applied to economic forecasting.
Finance, 14(5), 409–425.

