
Econometric Analysis of Networks

Christian Brownlees

Universitat Pompeu Fabra, Barcelona GSE

Introduction

Network analysis has emerged prominently in many fields of science in recent years: economics, finance, computer science, social networks, ...

Network analysis is a powerful tool to represent and synthesize the interconnections of large multivariate systems.


In these slides...

These slides introduce network techniques for the analysis of large panels of economic and financial time series.

The focus is on recent advances from statistics and econometrics and on the empirical evidence emerging from network applications in economics and finance.


Network Analysis: What is the Fuss?

In statistics, network/graphical modeling has been around for quite some time.

However, interest in this field has been renewed by the development of estimation techniques that make it possible to work with high-dimensional applications.

In particular, Meinshausen and Bühlmann (2006) is probably one of the first contributions showing how to estimate high-dimensional network models using the LASSO, and it spurred renewed interest in the field.


Network Analysis: What is the Fuss?

In economics and finance, network models have become particularly important from both a theoretical and an empirical perspective over the last few years.

Influential research by Acemoglu et al. (2012) has introduced economic models in which aggregate fluctuations are determined by the most interconnected sectors.

The great financial crisis has played an important role in popularizing networks. In particular, one of the lessons of the crisis is that a high degree of interconnectedness among financial firms can make the whole financial system vulnerable.


Roadmap

Basic Concepts
Network techniques for the analysis of economic and financial panels
Networks for Static Data
  Partial Correlation Network
Networks for Dynamic Data
  Granger Network
  Connectedness Table
  NETS

Basic Concepts

What is a Network?
Mathematically, a network is a graph.

Roughly, a graph is a collection of vertices connected by lines.

There are multiple graph definitions: graphs can be defined in different ways depending on the purpose of the application.

The notation for graphs can be quite extensive. We are going to focus on a subset of notions useful for the scope of the course.


Graphs

A graph G is defined as a pair of vertices and edges

G = (V, E)

The vertices V are (any) set of elements
  In this set of slides we will have throughout V = {1, . . . , n}

The edges E connect vertices
  The set of edges is defined as E ⊆ V × V
  (i, j) ∈ E ⇐⇒ i and j are connected by an edge
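To fix ideas, here is a minimal sketch of this definition in Python (the vertex labels and edges below are illustrative, not taken from the slides):

V = {1, 2, 3, 4, 5}                     # vertex set
E = {(1, 2), (2, 3), (1, 4), (4, 5)}    # edge set, a subset of V x V

def connected(i, j, E):
    # For an undirected graph an edge may be stored in either orientation
    return (i, j) in E or (j, i) in E

print(connected(1, 2, E))  # True
print(connected(2, 4, E))  # False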


History of Graphs: Königsberg Bridge Problem


[Figure: map of Königsberg's seven bridges and its graph representation]

In mathematics, graphs turn out to be convenient for representing and analysing a number of problems. One of the early examples of graph theory applications is the Königsberg Bridge Problem. The problem consists of finding out whether there exists a path that crosses the 7 bridges exactly once and begins and finishes on the same vertex. In 1736, Leonhard Euler showed that no such path exists using graph theory.

Types of Graphs

There are many types of graph definitions.

In what follows we will focus on:


Undirected & Directed Graphs
Unweighted & Weighted Graphs


Undirected Graphs
If the edges do not have a directionality, the graph is undirected
(i.e. an edge from i to j is the same as an edge from j to i).
Example:

[Figure: undirected graph on vertices A, B, C, D, E]

V = {A, B, C, D, E}
E = {{A, B}, {B, C}, {A, D}, {D, E}}


Directed Graphs
If the edges have a directionality, the graph is directed
(i.e. an edge from i to j is different from an edge from j to i).
Example:

[Figure: directed graph on vertices A, B, C, D, E]

V = {A, B, C, D, E}
E = {(A, B), (B, C), (A, D), (D, E)}


Unweighted and Weighted Graphs

If the edges carry weights, the graph is weighted; otherwise the graph is unweighted.
Example: Weighted Graph

[Figure: weighted graph on vertices A, B, C, D, E with edge weights]


... and there are many more

There are many other graph types, for example:


Mixed Graph (containing both directed and undirected edges)
Coloured Graph (Four Color Map Theorem)
Multiple Layered Graph
...


Network Representation

Important matrices associated with G

Assume the graph is undirected, unweighted and defined over a set of n vertices.

The adjacency matrix AG:
  An n × n matrix with aij = [AG]ij = 1 if i and j are connected by an edge and zero otherwise.

The degree matrix DG:
  An n × n diagonal matrix with dii = [DG]ii = Σ_{j=1}^n aij on the diagonal.

The Laplacian LG:
  LG = DG − AG.


Network Representation

Adjacency Matrix

[Figure: graph on vertices A, B, C, D, E]

AG = [ 0 1 0 1 0
       1 0 1 0 0
       0 1 0 0 0
       1 0 0 0 1
       0 0 0 1 0 ]


Network Representation

Degree Matrix

[Figure: graph on vertices A, B, C, D, E]

DG = [ 2 0 0 0 0
       0 2 0 0 0
       0 0 1 0 0
       0 0 0 2 0
       0 0 0 0 1 ]


Network Representation

Laplacian Matrix

[Figure: graph on vertices A, B, C, D, E]

LG = [  2 −1  0 −1  0
       −1  2 −1  0  0
        0 −1  1  0  0
       −1  0  0  2 −1
        0  0  0 −1  1 ]


Network Representation: Remarks

We will not have time to dig into the properties of these matrices.

However, it is important to at least point out that several key network properties turn out to be elegantly embedded in these matrices.

For instance, it can be shown that the number of connected components of a graph is equal to the number of zero eigenvalues of its Laplacian.
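This property is easy to verify numerically. Below is a small NumPy sketch using the five-vertex example graph from the previous slides (the numerical check itself is mine, not part of the original deck):

import numpy as np

# Adjacency matrix of the example graph on vertices A, B, C, D, E
A = np.array([[0, 1, 0, 1, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 0, 0],
              [1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]])

D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # Laplacian

# Number of (numerically) zero eigenvalues = number of connected components
eigvals = np.linalg.eigvalsh(L)
print(np.sum(np.isclose(eigvals, 0)))  # 1: the example graph is connected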


Some Notions from Network Analysis

It turns out that in real world networks there are a large number
of patterns that are commonly encountered

Some of the most relevant ones are


Hubs
Power Law Structure
Community Structure


Hubs

In many real world networks there are typically vertices that are “more important” than others.

Importance in a network can be defined in different ways. It is common to measure the importance of a vertex on the basis of how central the vertex is. A natural measure of centrality is the number of connections a vertex has.

Vertices with a high centrality/large number of connections are typically called hubs.


Power Law Structure

Real world networks often exhibit a power law structure, that is, the empirical distribution of the degrees of the vertices in the network is power law distributed.

Recall that the power law distribution is a heavy tailed distribution. This implies that a power law network will have a non-negligible fraction of vertices with a large degree (i.e. hubs).

Networks with a power law structure exhibit small world effects / six degrees of separation.


Community Structure

Real world networks often exhibit a community structure or clustering, that is:
  Vertices are partitioned into communities/groups
  There is a higher frequency of edges within the same community than between different communities

An implication of communities is that the network has an approximate block structure.

Networks for Panels of Economic and Financial Time Series

Networks in Econ and Finance

Let yt denote a multivariate time series of interest

yt = (y1t, y2t, . . . , ynt)'        t = 1, ..., T

e.g. returns of a portfolio of assets, CDS spreads of a panel of sovereign bonds, GDP of a panel of countries, ...


Networks in Econ and Finance


1. We are concerned with introducing a network representation for the multivariate process yt where
   vertices represent variables and
   edges denote the presence of an appropriate measure of dependence between two variables

   [Figure: network with vertices y1t, ..., y5t]

2. Developing estimation techniques that allow us to detect the network from the data


Networks in Econ and Finance


Network analysis of large panels of time series has several highlights:

Dimensionality Reduction
Analysing and understanding the properties of a large dimensional system is challenging. The network representation can be used as a dimensionality reduction technique that can enhance interpretation.

Regularised Estimation
It turns out that (most) network estimation techniques boil down to the estimation of a large dimensional model subject to appropriate regularization constraints. In a large dimensional setting regularization can enhance efficiency.
(cf. Ledoit and Wolf, 2004)


Networks in Econ and Finance

Central issue: Which measure of dependence should we use?

There is no unique answer. It depends on the context of the application.

In practice, network definitions in econometrics/statistics differ in the dependence measure chosen to build up the network.


Network Definitions

Network classification by dependence type

linear, contemporaneous
  Partial Correlation Network (Meinshausen and Bühlmann, 2006; Peng et al., 2009)

linear, dynamic
  Granger Network (Billio et al., 2012), Connectedness Table (Diebold and Yilmaz, 2014), NETS (Barigozzi and Brownlees, 2016)

nonlinear, contemporaneous
  SKEPTIC (Liu et al., 2012), Tail Networks (Hautsch et al., 2012)

Partial Correlation Network

Partial Correlation Network: Definition


Historically, the first network definition proposed in the literature is the Partial Correlation Network, introduced by Dempster (1972).

Let yt be an iid multivariate series

yt ∼ D(0, Σ)

where D(µ, Σ) denotes a distribution with mean µ and covariance Σ.

The partial correlation network associated with the system is an undirected/unweighted graph where
1. the components of yt denote vertices
2. the presence of an edge between components i and j denotes that i and j are partially correlated given all other components


Partial Correlation Networks

Partial correlation measures (cross-sectional) linear conditional dependence between yit and yjt given all other variables:

ρij = Cor(yit, yjt | {ykt : k ≠ i, j}).

The partial correlation network is defined as

EPC = {{i, j} ∈ V × V : ρij ≠ 0}

Note that if the data are Gaussian, absence of partial correlation implies conditional independence.


Partial Correlation Networks: Properties


Partial correlation is related to linear regression:
For instance, consider the model

y1t = c + θ12 y2t + θ13 y3t + θ14 y4t + θ15 y5t + u1t

θ13 is different from 0 ⇔ 1 and 3 are partially correlated

The partial correlation of yit and yjt is equivalently defined as the linear correlation between the residuals of yit and yjt obtained from the regression of the two components on all the other variables in the system.

Partial correlation is related to correlation:
If there exists a partial correlation path between vertices i and j, then i and j are correlated (and vice versa).

Characterizing the Partial Correlation Network


There is a well known and interesting connection between the partial correlation network and the concentration matrix K = Σ⁻¹.

It is easy to see this through the regression representation of the variables in the system. Each variable i can be expressed as a linear combination of the other variables and an error term

yit = Σ_{j≠i} θij yjt + uit        ui ∼ N(0, σ²(−i))

or, in compact form,

yit = θi' xit + uit        ui ∼ N(0, σ²(−i))

with xit = y(−i)t and Cov(ui, yj) = 0 for each j ≠ i.

If θij is zero, then i and j have zero partial correlation.


Characterizing the Partial Correlation Network

It is easy to see through the formula for the inverse of a partitioned matrix that the regression parameters θij can be expressed as a function of the elements of K.

Let kij denote the (i, j) element of K. Then the relation between θij and σ²(−i) is given by

θij = − kij / kii = ρij √( kjj / kii )

and

σ²(−i) = 1 / kii


A Deeper Look...

The regression parameters can be expressed using K.

The relation between θij and σ²(−i) is given by

θij = − kij / kii = ρij √( kjj / kii )

and

σ²(−i) = 1 / kii

It is interesting and straightforward to show this result.


Proof: Step 1 of 3
Partition yt into two subvectors

yt = [ y1t ; y2t ]

Define

Σ = [ Σ11  Σ12 ; Σ21  Σ22 ]        K = Σ⁻¹ = [ Σ¹¹  Σ¹² ; Σ²¹  Σ²² ]

Recall this result for partitioned symmetric matrices.
Let C = Σ11 − Σ12 Σ22⁻¹ Σ21. Then

Σ⁻¹ = [ C⁻¹               −C⁻¹ Σ12 Σ22⁻¹
        −Σ22⁻¹ Σ21 C⁻¹     Σ22⁻¹ + Σ22⁻¹ Σ21 C⁻¹ Σ12 Σ22⁻¹ ]

Finally, let y1t = yit and y2t = y(−i)t = (y1, . . . , yi−1, yi+1, . . . , yN)'

Proof: Step 2 of 3
θ as a function of Σ:

0 = Cov(ui, y(−i)t)
  = Cov(yi − θ' y(−i)t , y(−i)t)
  = Cov(yi, y(−i)t) − θ' Cov(y(−i)t, y(−i)t)
  = Σ12 − θ' Σ22

θ = Σ22⁻¹ Σ21

σ²(−i) as a function of Σ:

Var(ui) = Cov(yi − θ' y(−i)t , yi − θ' y(−i)t)
        = Cov(yi, yi − θ' y(−i)t)
        = Cov(yi, yi) − θ' Cov(yi, y(−i)t)

σ²(−i) = Σ11 − Σ12 Σ22⁻¹ Σ21


Proof: Step 3 of 3
σ²(−i) as a function of K:

Σ¹¹ = (Σ11 − Σ12 Σ22⁻¹ Σ21)⁻¹
Σ¹¹ = (σ²(−i))⁻¹
σ²(−i) = 1 / kii

θ as a function of K:

Σ²¹ = −Σ22⁻¹ Σ21 C⁻¹
Σ²¹ = −θ Σ¹¹
θ = −Σ²¹ [Σ¹¹]⁻¹
θj = − kij / kii


Characterizing the Partial Correlation Network

An analogous formula can be obtained for the generic (i, j) partial correlation

ρij = − kij / √( kii kjj )

Implications:
  The partial correlation network is entirely characterized by K
  If kij is nonzero, then nodes i and j are connected by an edge
  We can reformulate the estimation of the partial correlation network as the estimation of a concentration matrix
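As a quick illustration, the following NumPy sketch maps a concentration matrix into partial correlations and an edge set (the covariance matrix below is a made-up example with a Markov structure, so the first and third variables are conditionally uncorrelated):

import numpy as np

Sigma = np.array([[1.00, 0.50, 0.25],
                  [0.50, 1.00, 0.50],
                  [0.25, 0.50, 1.00]])

K = np.linalg.inv(Sigma)  # concentration matrix

# Partial correlations: rho_ij = -k_ij / sqrt(k_ii * k_jj)
d = np.sqrt(np.diag(K))
rho = -K / np.outer(d, d)
np.fill_diagonal(rho, 1.0)

# Edges of the partial correlation network: nonzero rho_ij with i < j
n = len(rho)
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if not np.isclose(rho[i, j], 0)]
print(edges)  # [(0, 1), (1, 2)]: no edge between the first and third variable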

Partial Correlation Network: Estimation

Sparse Estimation

The partial correlation network is entirely characterized by a matrix parameter K that is assumed to be sparse (i.e. it contains zero entries).

We need to introduce a sparse estimator of K: an estimator that simultaneously selects and estimates the nonzero entries of K.

As we will see, other network definitions lead to analogous problems.


Sparse Estimation
The workhorse of sparse estimation is the LASSO.

Consider the regression model

Yt = θ'Xt + et        et ∼ N(0, σ²)    t = 1, ..., T

with Xt ∈ R^P.

The (classic) LASSO estimator of this model is defined as

θ̂λL = arg min_θ  Σ_{t=1}^T (Yt − θ'Xt)² + λ Σ_{j=1}^P |θj| ,    λ ≥ 0

where λ is the LASSO tuning parameter.


Sparse Estimation
The (classic) LASSO estimator of this model is defined as

θ̂λL = arg min_θ  Σ_{t=1}^T (Yt − θ'Xt)² + λ Σ_{j=1}^P |θj| ,    λ ≥ 0

where λ is the LASSO tuning parameter.

Remarks:
LASSO is a shrinkage type estimator. When λ = 0 the estimator coincides with least squares. When λ is large the effect of the penalty is to shrink estimates towards zero (like Ridge regression).

A key feature of the LASSO is that this penalty delivers estimates in which some components are estimated as exact zeros. In other words, the LASSO estimator has the tendency of delivering sparse estimates when λ is large enough.

LASSO & Sparsity: Graphical Intuition


Consider the LASSO objective function when P = 2

Σ_{t=1}^T (Yt − θ1 X1t − θ2 X2t)² + λ|θ1| + λ|θ2|

The LASSO objective function can be thought of as the Lagrangian of the following constrained optimization problem

min_θ  Σ_{t=1}^T (Yt − θ1 X1t − θ2 X2t)²

subject to

|θ1| + |θ2| ≤ rλ


LASSO & Sparsity: Graphical Intuition

[Figure: graphical illustration of the LASSO constraint region and sparse corner solutions]

LASSO & Sparsity: Sketch of Proof


Consider

Yt = θXt + et        et ∼ N(0, σ²)

where Yt and Xt are de-meaned and Xt is scalar.
Let θ̂ be the least squares estimator. Then,

θ̂λL = arg min_θ { Σt (Yt − θXt)² + λ|θ| }
     = arg min_θ { Σt (Yt − θXt − θ̂Xt + θ̂Xt)² + λ|θ| }
     = arg min_θ { Σt Xt² (θ − θ̂)² + λ|θ| }
     = arg min_θ { Σt Xt² θ² − 2 Σt Xt² θ̂ θ + λ|θ| }

(the cross term involving the least squares residuals vanishes by the normal equations, and terms not involving θ are dropped).

Notice that if θ̂ > 0 then θ̂λL ≥ 0.



LASSO & Sparsity: Sketch of Proof


Differentiating the objective function for θ ≥ 0 and setting it to zero we get the FOC

0 = 2 Σt Xt² (θ − θ̂) + λ   ⇒   θ̂λL = θ̂ − λ / (2 Σt Xt²).

Notice that when the right-hand side is negative we truncate the solution to zero.

Thus, for θ̂ > 0 the solution is

θ̂λL = ( θ̂ − λ / (2 Σt Xt²) )₊

where (x)₊ means max(x, 0).

By carrying out analogous computations for the case θ̂ < 0 one gets that the LASSO solution is

θ̂λL = sign(θ̂) ( |θ̂| − λ / (2 Σt Xt²) )₊
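The closed-form solution can be checked numerically; a small Python sketch with simulated data (the brute-force grid comparison is mine):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=200)
Y = 0.3 * X + rng.normal(size=200)
lam = 20.0

theta_ls = (X @ Y) / (X @ X)  # least squares estimate

# Soft-thresholding formula for the univariate LASSO solution
theta_lasso = np.sign(theta_ls) * max(abs(theta_ls) - lam / (2 * (X @ X)), 0.0)

# Brute-force check: minimize the penalized objective on a grid
grid = np.linspace(-1.0, 1.0, 20001)
obj = ((Y[:, None] - grid[None, :] * X[:, None]) ** 2).sum(axis=0) + lam * np.abs(grid)
print(theta_lasso, grid[obj.argmin()])  # the two should (approximately) agree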


LASSO Estimator

Highlights of the LASSO:

1. Sparsity Detection. The effect of the absolute value penalty is to shrink some of the estimated θ coefficients to exact zeros. Under appropriate conditions, the LASSO can asymptotically detect the true nonzero parameters of the model.

2. High-Dimensionality. Under appropriate conditions, the LASSO estimator is well behaved even when the number of parameters P is much larger than the number of observations T.


LASSO Properties: Sketch of Main Assumptions

1. The total number of parameters is allowed to grow as a function of the number of observations.

2. The total number of nonzero parameters is allowed to grow as a function of the number of observations; however, it has to be small relative to the number of observations. Typically, o(√(T / log T)).

3. Nonzero coefficients are larger in absolute value than a signal threshold.

4. The correlation between variables Xit associated with zero and nonzero coefficients cannot be too large.


LASSO Properties

The LASSO literature is typically concerned with establishing two results. Let θ0 denote the true value of the parameter.

Estimation Consistency.

||θ̂λL − θ0|| →p 0

Selection Consistency.

P( sign(θ̂λL,i) = sign(θ0,i) ) → 1


LASSO Properties: Comments

It is fair to say that the LASSO assumptions are not innocent.

There are several variants of the LASSO which tackle the issues of the baseline version (for instance, the Adaptive LASSO).

However, in practice it is important to address to what extent the LASSO assumptions truly make sense in the context of a given application.


LASSO Computation

The LASSO estimator cannot be computed in closed form.

However, there are several optimization algorithms that can be used to compute the estimator.

One of the first and most commonly used algorithms proposed in the literature is the so-called shooting algorithm.


LASSO Computation
Shooting Algorithm

Initialize θ̂L with the least squares estimator.
For k in 1, ..., P, 1, ..., P, 1, ..., P until convergence:

1. Define

Y(−k)t = Yt − Σ_{j≠k} θ̂jL Xjt

2. Compute the LS estimate of the k-th coefficient

θ̂kLS = ( Σ_{t=1}^T Y(−k)t Xkt ) / ( Σ_{t=1}^T Xkt² )

3. Update the LASSO estimate of the k-th coefficient

θ̂kL = sign(θ̂kLS) ( |θ̂kLS| − λ / (2 Σ_{t=1}^T Xkt²) )₊
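A compact NumPy implementation of the algorithm as stated above (a sketch: the least squares initialization assumes T > P, and a fixed number of sweeps stands in for a proper convergence check):

import numpy as np

def soft_threshold(z, c):
    # sign(z) * max(|z| - c, 0)
    return np.sign(z) * np.maximum(np.abs(z) - c, 0.0)

def shooting_lasso(X, Y, lam, n_sweeps=100):
    # Coordinate-descent (shooting) LASSO for Y_t = theta' X_t + e_t
    T, P = X.shape
    theta = np.linalg.lstsq(X, Y, rcond=None)[0]  # LS initialization
    for _ in range(n_sweeps):
        for k in range(P):
            # Partial residual excluding the k-th regressor
            r = Y - X @ theta + X[:, k] * theta[k]
            xk2 = X[:, k] @ X[:, k]
            theta_ls_k = (r @ X[:, k]) / xk2       # LS estimate of k-th coefficient
            theta[k] = soft_threshold(theta_ls_k, lam / (2 * xk2))
    return theta

# Small demonstration with a sparse true coefficient vector
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
theta_true = np.zeros(10)
theta_true[:3] = [1.0, -0.5, 0.25]
Y = X @ theta_true + rng.normal(size=500)
print(np.round(shooting_lasso(X, Y, lam=50.0), 3))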

LASSO Estimation of the Partial Correlation Network

Can we use LASSO techniques to obtain a sparse estimator of K?

Oui!

There are two approaches to estimating K using the LASSO:

Regression Based
  Neighborhood Selection (Meinshausen and Bühlmann, 2006)
  SPACE (Peng et al., 2009)

Concentration Penalization Approach
  GLASSO (Yuan and Lin, 2007; Banerjee et al., 2008; Friedman et al., 2008)

All of these methods are implemented in several R packages, are straightforward to apply and work well in fairly high-dimensional settings.


Regression Approach
Regression approaches are based on the regression representation of the series in the panel

yit = Σ_{j≠i} θij yjt + uit ,    i = 1, . . . , n,

with Var(ui) = σ²(−i).

The regression coefficients and residual variance of the regression are related to the entries of the concentration matrix:

kii = 1 / σ²(−i)

and

kij = −θij kii


Regression Approach - Neighborhood Selection

Meinshausen and Bühlmann (2006) put forward a simple strategy to estimate the network.

The idea is to exploit the regression representation of the variables in the system to estimate the neighbours of each node.


Regression Approach - Neighborhood Selection

Neighborhood Selection

1. For each i = 1, ..., n, use LASSO regression to estimate the parameters of the regression

yit = Σ_{j≠i} θij yjt + uit

2. Then, kij is set to zero if θ̂ijL = 0 OR θ̂jiL = 0
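A sketch of this procedure with scikit-learn's Lasso (note that scikit-learn's alpha is a rescaled version of the λ above; the rule in step 2 is implemented in the last lines):

import numpy as np
from sklearn.linear_model import Lasso

def neighborhood_selection(Y, lam):
    # Y is a T x n data matrix; returns a boolean adjacency matrix
    T, n = Y.shape
    theta = np.zeros((n, n))
    for i in range(n):
        others = [j for j in range(n) if j != i]
        fit = Lasso(alpha=lam, fit_intercept=False).fit(Y[:, others], Y[:, i])
        theta[i, others] = fit.coef_
    # kij is set to zero if theta_ij = 0 OR theta_ji = 0,
    # i.e. the edge is kept only when both coefficients are nonzero
    adj = (theta != 0) & (theta.T != 0)
    np.fill_diagonal(adj, False)
    return adj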


Regression Approach - Neighborhood Selection

The procedure is simple but has some limitations:

It does not really estimate K but only its sparsity pattern.

Also, it does not fully exploit all the information in the system.


Regression Approach - SPACE

Peng et al. (2009) develop a smart algorithm to estimate a sparse K, building on Neighborhood Selection.

The idea is that if the diagonal elements of K are known, then it is possible to write an auxiliary linear regression model whose unknown parameters are the partial correlation coefficients.


Regression Approach - SPACE


SPACE

1. Given an estimate of kii, estimate ρij by minimizing

Σ_{i=1}^n Σ_{t=1}^T ( yit − Σ_{j≠i} ρij √( k̂jj / k̂ii ) yjt )² + λ Σ_{i=2}^n Σ_{j=1}^{i−1} |ρij|

2. Given an estimate of ρij, estimate kii as the inverse of the residual variance

3. If the algorithm has not converged, go back to 1

4. Estimate the nondiagonal entries of K as

k̂ij = −ρ̂ij √( k̂ii k̂jj )


Regression Approach - SPACE

SPACE allows one to simultaneously select and estimate the entries of K.

Note, however, that the procedure does not ensure that the estimate of K is positive definite. In practice, if K is sufficiently sparse, then the estimator will also be positive definite with high probability.


Concentration Penalization Approach - GLASSO

Rather than using the regression representation, Yuan and Lin (2007) suggest estimating a sparse K by directly penalizing the Gaussian log likelihood of the concentration matrix with a LASSO penalty.

Estimate K by optimizing

K̂ = arg min_{K ∈ Sⁿ} { tr(ΣK) − log det(K) + λ Σ_{i≠j} |kij| }

where Σ is the sample covariance estimator and Sⁿ is the set of n × n symmetric positive definite matrices.
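In practice this estimator is available off the shelf. A minimal sketch with scikit-learn's GraphicalLasso on simulated data (the penalty value is illustrative; with independent simulated series few or no edges should appear):

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
Y = rng.multivariate_normal(np.zeros(5), np.eye(5), size=500)

gl = GraphicalLasso(alpha=0.1).fit(Y)
K_hat = gl.precision_  # sparse estimate of the concentration matrix

edges = [(i, j) for i in range(5) for j in range(i + 1, 5)
         if abs(K_hat[i, j]) > 1e-8]
print(edges)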


Concentration Penalization Approach - GLASSO

The approach is natural. Also, Ravikumar et al. (2011) provide general conditions for the consistency of the estimator.

However, optimization of the objective function is challenging. The original algorithm proposed by Yuan and Lin (2007) does not perform well in high dimensions.

Banerjee et al. (2008) and Friedman et al. (2008) show that the optimization can be recast as a sequence of simple LASSO regression problems, which makes the estimation procedure appealing for large problems.

Partial Correlation Network: Implementation Issues

Choosing λ
Fitting a network involves choosing a value of λ.

In practice, networks are estimated for different values of λ and information criteria like the AIC and BIC are used to choose an “optimal” λ from the data.

Many prefer using the BIC because it penalizes more heavily and makes the network more sparse.

For the SPACE algorithm, the BIC can be computed as

BIC(λ) = log(RSS(λ)) + #{(i, j) : i ≠ j, ρ̂ij ≠ 0} · log(T) / T
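Schematically, the selection of λ can be organized as a grid search (a sketch: fit_network is a hypothetical wrapper around any of the estimators above, assumed to return the residuals and the number of estimated edges):

import numpy as np

def bic_lambda_search(Y, lambdas, fit_network):
    # Pick lambda by BIC; fit_network(Y, lam) -> (residuals, n_edges)
    T = Y.shape[0]
    best = None
    for lam in lambdas:
        resid, n_edges = fit_network(Y, lam)
        bic = np.log(np.sum(resid ** 2)) + n_edges * np.log(T) / T
        if best is None or bic < best[0]:
            best = (bic, lam)
    return best[1]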


Factors & Networks

Is the sparsity assumption always satisfied in economic panels? No!

An important case in which sparsity is violated is when the components of the process are influenced by common explanatory variables and common factors.

In this case, the influence of the common factors has to be filtered out first, and network analysis can then be carried out on the factor residuals.

Generally speaking, network analysis can be viewed as the complement of factor analysis.

Partial Correlation Network: Empirical Illustration

Empirical Illustration
The focus is on estimating the network of idiosyncratic interconnections of stock returns. We consider a sample of daily log returns for 93 U.S. blue chips between 2000 and 2013.

CAPM one-factor type model:

rt = β rmt + εt        εt ∼ N(0, Σε)

Estimation:
1. Estimate ε̂t as least squares residuals
2. Estimate the partial correlation network of ε using SPACE

The tuning parameter λ is chosen using the BIC.


Idiosyncratic Risk Network


[Figure: estimated idiosyncratic risk network of the 93 U.S. blue chips; vertices labeled by ticker]

Return Network: Empirical Properties

Centrality. Financials, Energy and Technology are some of the most central sectors. In particular, AIG is a hub in the network.

Power Law. The degree distribution is heterogeneous and the most interconnected vertices have a large number of connections relative to the total.

Community Structure. Vertices that belong to the same industry are more likely to be linked.

Networks For Time Series

Limitations of Partial Correlations for Time Series

Defining the network on the basis of partial correlations is motivated by the analysis of serially uncorrelated Gaussian data.

However, this is not always satisfactory for economic and financial applications, where data typically exhibit serial dependence.

A number of proposals have been put forward in the literature to overcome these limitations.


Networks for Time Series

Proposals:
(Pairwise) Granger Network
  Billio, Getmansky, Lo and Pelizzon (2012)
Connectedness Table
  Diebold and Yilmaz (2014)
NETS
  Barigozzi and Brownlees (2016)


Networks for Time Series: Remarks

It is interesting to note that, although these definitions define connections in fairly different ways, they are all based on a Vector Autoregressive (VAR) representation of the data.

A limitation of defining connections on the basis of a VAR is that all these network definitions essentially focus on linear dependence.

Granger Networks

Billio, Getmansky, Lo and Pelizzon (2012) propose to construct Granger Causality Networks.

Granger Causality: In time series analysis, x is said to Granger cause y if past values of x help predict y above and beyond past information of y itself.


Granger Causality

Let x and y be two time series. We say that x does not Granger cause y if the MSE of a forecast for y based on the past of y and x is the same as the MSE of a forecast based on the past of y only:

MSE(E(yt+s | yt, yt−1, ...)) = MSE(E(yt+s | yt, yt−1, ..., xt, xt−1, ...))

for all s > 0.


Granger Causality: Remarks

The absence of Granger causality implies restrictions on the VAR representation of the data. These restrictions can be tested using standard methods to assess the evidence of no Granger causality.

Achtung! The term Granger causality is a bit ambiguous: Granger causality does not really measure causality.


Granger Networks

Billio et al. (2012) focus on the analysis of spillover effects among financial institutions during the financial crisis.

To this end, they consider monthly returns for a panel of financial companies divided into different industry groups.

Granger Networks: Definition

Granger Networks: Linear Model

Consider the model

yA t = βA ym t + γA yA t−1 + γAB yB t−1 + εA t        εA t ∼ N(0, σ²A)

where
  yA t is the return of firm A in period t
  yB t is the return of firm B in period t
  ym t is the return of the market in period t

If γAB is significantly different from zero, then yB Granger causes yA.

Granger Networks: Estimation

Estimation

Estimation of the pairwise Granger network is straightforward.

The model

yA t = βA ym t + γA yA t−1 + γAB yB t−1 + εA t        εA t ∼ N(0, σ²A)

can simply be estimated by least squares.


Construction of a Granger Causality Network


The Billio, Getmansky, Lo and Pelizzon (2012) procedure:

Granger Network - BGLP2012

Consider all of the possible pairs of companies in the panel.

For each pair in the panel:
1. Estimate the bivariate model (by least squares)
2. Run a Granger causality test between companies A and B,
   i.e. test the null hypothesis H0 : γAB = 0 at the 1% significance level
3. If company A Granger causes B, then add a directed edge from A to B
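A sketch of the pairwise procedure with statsmodels (illustrative choices of mine: an intercept is added, which the bivariate specification above omits, and a t-test on γAB stands in for a formal Granger causality F-test):

import numpy as np
import statsmodels.api as sm

def granger_edges(R, rm, alpha=0.01):
    # R: T x n matrix of firm returns, rm: market return of length T.
    # Adds the directed edge B -> A when lagged B helps predict A.
    T, n = R.shape
    edges = []
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            y = R[1:, a]
            X = sm.add_constant(np.column_stack([rm[1:], R[:-1, a], R[:-1, b]]))
            fit = sm.OLS(y, X).fit()
            if fit.pvalues[3] < alpha:  # p-value of the coefficient on lagged B
                edges.append((b, a))    # B Granger causes A
    return edges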

Granger Networks: Empirical Illustration

Granger Causality Networks: Empirical Application

This methodology is used to analyze monthly returns of a panel of financial institutions between 1994 and 2008.

Firms are classified into Hedge Funds, Banks, Broker-Dealers and Insurance companies.

Since there is evidence that networks change over time, the analysis is carried out over different time windows.

[Figures: estimated Granger causality networks over the rolling windows 2002-2004, 2002-2006 and 2006-2008]

Granger Causality Networks: Final Remarks

Billio et al. (2012) have the merit of being the first to introduce network analysis in the field. It is a great tool for exploratory analysis.

Some caveats:
  Networks change a lot over time!
  Spurious correlation problem
  Networks are very dense. Could it be that some common factors are missing?

Connectedness Table

In a number of papers, Diebold and Yilmaz propose a network definition based on classic variance decomposition analysis.

They are interested in answering the question: “What fraction of the H-step ahead prediction error variance of series i is due to shock j?”

The answer to this question is given by the variance decomposition, and we denote this fraction by dijH.

Connectedness Table: Definition

Variance Decomposition
Assume yt has an infinite MA representation

yt = µ + εt + Ψ1 εt−1 + Ψ2 εt−2 + ...        Var(ε) = Σε

The VMA representation is useful to obtain the error in forecasting H periods ahead

yt+H − ŷt+H|t = εt+H + Ψ1 εt+H−1 + Ψ2 εt+H−2 + ... + ΨH−1 εt+1

The MSE of the H-period-ahead forecast is

MSE(ŷt+H|t) = E(yt+H − ŷt+H|t)(yt+H − ŷt+H|t)'
            = Σε + Ψ1 Σε Ψ1' + Ψ2 Σε Ψ2' + ... + ΨH−1 Σε ΨH−1'


Variance Decomposition
Consider the orthogonalized representation of the shocks εt, that is

εt = A ut = a1 u1t + a2 u2t + ... + aN uNt

where aj is the j-th column of A and the ujt are uncorrelated.
(A can be obtained from the Cholesky decomposition)

This implies that

Σε = Var(εt)
   = E(εt εt') = E(A ut ut' A') = A E(ut ut') A'
   = a1 a1' Var(u1t) + a2 a2' Var(u2t) + ... + aN aN' Var(uNt)


Variance Decomposition

Substituting the expression of the variance Σε into the MSE of the H-step ahead prediction, one can decompose the total H-step ahead prediction error into the sum of N orthogonalized components: MSE(ŷt+H|t) is

Σ_{j=1}^N Var(ujt) [ aj aj' + Ψ1 aj aj' Ψ1' + Ψ2 aj aj' Ψ2' + ... + ΨH−1 aj aj' ΨH−1' ]

Thus,

Var(ujt) [ aj aj' + Ψ1 aj aj' Ψ1' + Ψ2 aj aj' Ψ2' + ... + ΨH−1 aj aj' ΨH−1' ]

measures the contribution of shock j.


Variance Decomposition

The H-step ahead proportion of the variance of the prediction error of i due to j is

dijH = Var(ujt) [ aj aj' + Ψ1 aj aj' Ψ1' + ... + ΨH−1 aj aj' ΨH−1' ]ii
       / Σ_{l=1}^N Var(ult) [ al al' + Ψ1 al al' Ψ1' + ... + ΨH−1 al al' ΨH−1' ]ii

As H increases, the MSE converges to the unconditional variance; thus when H is large we can interpret the variance decomposition as the portion of the unconditional variance explained by uj.
(Brownlees) 81/1
Connectedness Table

Connectedness Table

(Brownlees) 82/1
Connectedness Table

Orthogonalizing the Shocks

Variance decomposition is based on appropriately orthogonalizing the system shocks. However, this is not always possible to do, especially in large dimensional systems.

To this end, Diebold and Yilmaz suggest constructing variance decompositions based on the Generalized Variance Decomposition (GVD) proposed by Pesaran and Shin (1998).


GVD

The GVD connectedness table is constructed using

δijH = σjj⁻¹ Σ_{h=0}^H (ei' Ψh Σε ej)² / Σ_{h=0}^H (ei' Ψh Σε Ψh' ei)

and

dijH = δijH / Σ_{j=1}^N δijH

(note that we need to standardize the δ's appropriately, as the generalized variance decompositions do not sum to one since the errors are allowed to be correlated)
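A NumPy sketch of these two formulas (Psi is a list of VMA matrices Ψ0 = I, Ψ1, ..., ΨH and Sigma is the innovation covariance; the function name is mine):

import numpy as np

def gvd_table(Psi, Sigma):
    # Generalized variance decomposition shares d_ij
    N = Sigma.shape[0]
    delta = np.zeros((N, N))
    for i in range(N):
        denom = sum(Psi[h][i] @ Sigma @ Psi[h][i] for h in range(len(Psi)))
        for j in range(N):
            num = sum((Psi[h][i] @ Sigma[:, j]) ** 2 for h in range(len(Psi)))
            delta[i, j] = num / (Sigma[j, j] * denom)
    # Row-standardize so that each row sums to one
    return delta / delta.sum(axis=1, keepdims=True)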

Connectedness Table: Estimation

Estimation

It is straightforward to estimate the connectedness table. The VMA representation can be obtained by estimating a VAR by LS and then using the VAR companion form to obtain the infinite VMA representation.

Details can be found in the classic time series textbooks by Hamilton or Lütkepohl.
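For completeness, a sketch of the standard recursion (equivalent to iterating the companion form) that maps estimated VAR matrices A1, ..., Ap into the VMA matrices Ψh:

import numpy as np

def var_to_vma(A_list, H):
    # Psi_0 = I and Psi_h = sum_{k=1}^{min(h, p)} A_k Psi_{h-k}
    n = A_list[0].shape[0]
    Psi = [np.eye(n)]
    for h in range(1, H + 1):
        Psi.append(sum(A_list[k] @ Psi[h - 1 - k]
                       for k in range(min(h, len(A_list)))))
    return Psi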


Parameter Time Variation

Diebold and Yilmaz comment that the parameters of the VMA representation are likely to change slowly over time.

To this end, they suggest estimating the connectedness table using a rolling window to visualize how connectedness changes over time.

(Brownlees) 86/1
Connectedness Table

Connectedness Table:
Empirical Illustration
Connectedness Table

Empirical Illustration

Analysis of the connectedness of 13 large US financial institutions.

The focus is on interdependence in volatility, measured as realized volatility.

Analysis: full sample static analysis & 100-day rolling window

VAR(3) / H = 12 / GVD

Empirical Illustration: Unconditional Connectedness

[Table not recovered in extraction]

Empirical Illustration: Time-Varying Total Connectedness

[Figure not recovered in extraction]

Empirical Illustration: Time-Varying Net Connectedness

[Figure not recovered in extraction]

Large Dimensional Connectedness Tables

The baseline methodology of Diebold and Yilmaz was proposed for small dimensional VAR systems.

In practice, one may want to apply these tools to analyse large systems.

In more recent research, Diebold, Yilmaz and co-authors propose to estimate the VAR using the elastic net (a combination of LASSO and Ridge) and apply this to study large dimensional systems.

NETS

NETS (network estimation for time series) has been proposed in Barigozzi & Brownlees (2016).

The idea is to provide a generalization of the Partial Correlation Network for dependent data.

NETS: Definition

VAR Approximation

Approximate the yt process using a VAR

yt = Σ_{k=1}^p Ak yt−k + εt        εt ∼ wn(0, Σε)

A natural representation for such a process is the union of two networks:
1. a Granger network capturing the dynamic structure of the process, and
2. a contemporaneous network capturing contemporaneous dependence


NETS

1 Granger network
Directed network in which the set of edges EG is such that

(i, j) ∈ EG ⇔ i Granger causes j

2 contemporaneous network
undirected network in which the set of edges EC is such that

(i, j) ∈ EC ⇔ i and j are partially correlated given the past


NETS
The network can be characterized in terms of the autoregressive matrices Ak and the covariance matrix of the VAR innovations Σε

1. Granger network
   Directed network in which the set of edges EG is such that

   (i, j) ∈ EG ⇔ [Ak]ji ≠ 0 for at least one k

2. Contemporaneous network
   Undirected network in which the set of edges EC is such that

   (i, j) ∈ EC ⇔ [Σε⁻¹]ij ≠ 0

NETS: Estimation

Sparse Estimation

We work under the assumption that the VAR approximation is sparse, in the sense that the matrices Ak and Σε⁻¹ are assumed to be sparse.

We introduce a LASSO estimation algorithm which allows us to simultaneously estimate the parameters Ak and Σε⁻¹.

We work with an alternative parameterization of the model, θ = (a1', . . . , an', ρ')', where ai contains the stacked autoregressive coefficients of series i and ρ is the vector of partial correlations implied by Σε⁻¹.


NETS Steps

It can be shown that the loss function for the estimation of the model parameters can be written as

LT(θ) = Σ_{t=1}^T Σ_{i=1}^n ( yit − Σ_{j=1}^n ( aij − Σ_{k≠i} ρik √(ckk/cii) akj ) yjt−1 − Σ_{k≠i} ρik √(ckk/cii) ykt )²

where ckk denotes the k-th diagonal entry of Σε⁻¹.

In order to obtain sparse estimates, we optimize such an objective function subject to a LASSO penalty

LT(θ) + λ ( Σ_{i=1}^n Σ_{j=1}^n |aij| + Σ_{i=1}^n Σ_{k≠i} |ρik| )

It turns out that a variant of the standard shooting algorithm can be used to carry out this optimization via coordinate descent.

NETS: Empirical Application

Empirical Application

We are interested in estimating the network of interconnections of stock returns and stock return volatilities.

Daily log returns / log volatilities (high-low range) for 93 U.S. blue chips between 2000 and 2013.

The influence of common factors is netted out.


“Reading” the Network

It turns out that the networks share many of the characteristics of social networks.

1. Centrality. Financials, Energy and Technology are some of the most central sectors.

2. Community Structure. Vertices that are similar are linked (industry linkages).

3. Power Law Structure. Evidence of “Small World Effects”.


Volatility Network

[Figure: estimated Granger volatility network; vertices labeled by ticker]

Volatility Network
[Figure: estimated contemporaneous volatility network; vertices labeled by ticker]

Out–of–sample Validation

It is interesting to evaluate the network in terms of out-of-sample prediction.

To this end, we use the estimated Granger network to predict future volatility and compare the forecasts with those obtained using an array of alternative techniques.

[Table: out-of-sample forecast comparison results (not recovered in extraction)]
