
Bayesian statistics and MCMC methods for portfolio selection
Facoltà di Ingegneria dell'Informazione, Informatica e Statistica
Corso di Laurea Magistrale in Scienze Statistiche e Decisionali
Candidate
Jacopo Primavera
ID number 1219046
Thesis Advisor
Prof. Pierpaolo Brutti
Academic Year 2012/2013
Thesis not yet defended
Bayesian statistics and MCMC methods for portfolio selection
Master's thesis. Sapienza University of Rome
© 2013 Jacopo Primavera. All rights reserved
This thesis has been typeset by LaTeX and the Sapthesis class.
Author's email: j.ac@hotmail.it
Contents
1 Introduction
2 Portfolio selection
2.1 Mean-variance portfolio
2.1.1 Diversification and hedging
2.1.2 Two-assets case
2.1.3 Mean-variance model
2.2 The importance of estimation error
2.3 Bayesian inference for portfolio selection
2.3.1 Bayesian theory review
2.4 Allocation as decision
2.4.1 Choice under uncertainty
2.4.2 Maximum expected utility allocation
2.4.3 Bayesian allocation decision
2.4.4 Bayesian paradigm justification
3 Non-normal financial markets
3.1 Skewness and portfolio selection
3.2 Skewed-elliptical models
3.2.1 Skewed-normal model
3.3 Simulation-based inference
3.3.1 Gibbs sampler
3.3.2 Sampling from the Bayesian skewed-normal model
4 Hedge fund portfolio application
4.1 Data description
4.1.1 Univariate statistics
4.1.2 Multivariate statistics
4.2 Model implementation
4.3 Portfolio weights
4.4 Out-of-sample performance
5 Conclusions
Appendices
A MCMC diagnostics
A.1 Gelman-Rubin diagnostic
A.2 Geweke diagnostic
List of Figures
2.1 Two-assets mean-variance
2.2 Efficient frontier
2.3 Mean-variance allocation instability
2.4 Non-informative Bayesian allocation
2.5 Sample and conjugate Bayesian frontiers
4.1 Hedge Fund Rate-of-Return time series
4.2 Univariate graphical summaries
4.3 QQ-plot univariate
4.4 QQ-plot multivariate
4.5 Bivariate normal level curves
4.6 Traceplots MCMC posterior mean vector m
4.7 Traceplots MCMC posterior variance V
4.8 Kernel MCMC densities for m
4.9 Kernel MCMC densities for V
4.10 Allocation plot 1
4.11 Allocation plot 2
4.12 Out-of-sample MV allocations
4.13 Out-of-sample MV-skewed allocations
4.14 Out-of-sample analysis
4.15 Out-of-sample analysis (one plot)
A.1 Gelman plot for m
A.2 Gelman plot for V
A.3 Geweke plot for m
A.4 Geweke plot for V
List of Tables
2.1 True parameters for non-informative analysis
4.1 Univariate descriptive statistics
4.2 Shapiro-Wilk normality test
4.3 Multivariate Shapiro test
4.4 WinBugs summary statistics
A.1 Gelman-Rubin diagnostic for m
A.2 Gelman-Rubin diagnostic for V
A.3 Geweke diagnostic for m
A.4 Geweke diagnostic for V
Chapter 1
Introduction
The attempt to rationalize an investment strategy has a long history, but it is only since the 1950s and the pioneering work of Markowitz that the study made the necessary quantum leap and began to be properly framed as a statistical decision problem. The dramatic intuitiveness of the Markowitz model has greatly influenced the entire financial industry, and the mean-variance (MV) paradigm still represents a milestone in modern finance theory (Meucci, 2005). Although it retains great importance for academics and remains a guideline for practitioners, its actual use in the financial industry is by now rare (see McNeil (2005)). This thesis addresses two fundamental drawbacks of the classical mean-variance portfolio which have led the financial industry to distrust it, and shows how to overcome them through the use of Bayesian statistics and Markov chain Monte Carlo (MCMC) methods. The shortcomings of the classical mean-variance portfolio have even led some authors to revert to purely qualitative selection strategies, such as the naive equally weighted portfolio, which allocates equal parts of the initial budget to each financial opportunity (see De Miguel (2007) for a discussion of the performance of the equally weighted portfolio with respect to the mean-variance allocation).

The limitations of the Markowitz portfolio can be cast into two major categories: (1) instability of the allocation results and (2) empirically violated assumptions. The instability of the mean-variance optimization is the result of the highly non-linear relations existing between the estimated inputs and the allocation results. The allocations turn out to be extremely sensitive to the estimates plugged into the optimizer, and they can be twisted even by minor changes in the initial inputs. It is therefore evident that accounting for estimation error in portfolio selection is fundamental, since the optimization step may exacerbate an error committed in the estimation step. But the classical implementation of the mean-variance optimizer involves a naive plug-in of the sample market estimates, thus accounting only for the variability in the data (i.e. financial risk). Considerable effort has been devoted to this issue with the goal of improving on the performance of classical models. A prominent role in this vast literature is played by Bayesian statistics, which can be a viable alternative to traditional methods of implementation, naturally providing more robust (i.e. more stable) results, as well as solutions that are more consistent from a statistical decision point of view.

Concerning the assumptions underlying the mean-variance paradigm, they turn out to be too simple to capture the proven complexity of financial systems. Indeed, in the mean-variance portfolio the prices are assumed to follow a Brownian motion, which is the standard model of finance theory and whose increments (i.e. financial returns) are normally distributed; but rather than this simple model, return distributions more closely resemble a non-linear, chaotic system (see Mandelbrot (2006)). For instance, they have begun to be studied by a new interdisciplinary research field called Econophysics, which attempts to use concepts from statistical physics to describe financial market behaviors (see Mantegna R. (2000) for an introduction to econophysics). These findings are not surprising given that the financial markets are the result of the interactions of millions of users worldwide, who act according to the most disparate criteria of rationality. Thus, it is evident that the standard Gaussian model is not suitable to explain the actual movements of the markets, and from this point of view more detailed models should be used. There have been numerous proposed solutions to this issue, with different levels of complexity; here a simple extension of the normal model will be investigated, which explicitly allows for asymmetry in the financial return series. In particular, the model considered is an elliptical skewed-normal in the class of skewed distributions developed by Sahu (2003).

Issues (1) and (2) have renewed the interest in portfolio choice problems and constitute the main motivation of this study. To jointly account for estimation error and non-normal markets, a Bayesian framework is adapted to the skewed-normal model. The ensuing model is analytically intractable, as is often the case when departing from standard assumptions, and it needs to be studied by means of a stochastic simulation method, such as an MCMC method.

The thesis is structured as follows. The first part presents the mean-variance portfolio developed by Markowitz, highlighting the main drawbacks coming from its classical implementation. There follows an introduction to Bayesian methods and their applications to the mean-variance portfolio. The second part presents the elliptical-skewed models and the skewed-normal distribution, followed by a presentation of the MCMC methods needed to deal with a Bayesian version of the skewed-normal model, with particular focus on the Gibbs sampler. The third part proposes an application to hedge fund industry data to empirically compare the classical mean-variance strategy with the one derived from fitting the Bayesian skewed-normal model to the data and accounting for skewness in the investor's utility function.
Chapter 2
Portfolio selection
In a classical portfolio choice framework the investor has access to N risky financial opportunities such as stocks, bonds, currencies, mutual funds and hedge funds. Denote as r_t the rates of return (or returns) at time t for this N-dimensional financial market. For a given initial wealth the investor wants to select a profitable combination of these financial opportunities. The investor's market is assumed to be open only at the time T when the allocation decision is made and at the time T + τ when the investment horizon expires. In a classical single-period framework we can set τ = 1 without loss of generality. This simple setting implies that hedging activities are not allowed and no trading can be made between times T and T + 1. Let W_0 be the initial wealth at the disposal of the investor and α_k the number of units (shares in the case of equities, contracts in the case of futures, etc.) of the k-th asset that the investor decides to hold in his portfolio. The ensuing portfolio for the investor is the vector,

    α = (α_1, α_2, ..., α_N)    (2.1)

A portfolio with a non-null allocation vector can be equally represented by the percentages w_k (k = 1, ..., N) of initial wealth allocated to each asset,

    w = (w_1, w_2, ..., w_N)    (2.2)

The main concern of the investor is the return r^P_{T+1} on the portfolio, which is defined as the linear combination,

    r^P_{T+1} = w^T r_{T+1} = ∑_k w_k r_{k,T+1}    (2.3)

and in particular the investor will be interested in his final wealth, given by:

    W = W_0 (1 + w^T r_{T+1})    (2.4)
Now consider an investor starting to observe the N-dimensional market at time t = 1. The information collected up to time T is denoted as I_T and contains the N time series of length T, for a total of N × T single observations. The standard model in finance theory implies that each of the N time series is:

- independent, i.e. each price change appears independently from the last;
- stationary, i.e. the process generating price changes, whatever it may be, stays the same over time;
- normally distributed, i.e. price changes follow the proportions of the bell curve (most changes are small, an extremely few are large) in predictable and rapidly declining frequency.
Besides, the joint distribution of the returns is multivariate normal and, like the marginal distributions, has constant parameters over time. Summarizing, the returns r_t are distributed as a multivariate normal with mean vector μ and covariance matrix Σ for every t = 1, ..., T:

    r_t ~ N(μ, Σ),  (t = 1, ..., T)    (2.5)

By the linear properties of the normal distribution it follows from 2.5 that,

    r^P_t ~ N(w^T μ, w^T Σ w),  (t = 1, ..., T).    (2.6)

A normal distribution has the characteristic property of being completely specified by its first two moments. It turns out that an investor with an objective based on 2.3 would discriminate the potential satisfactions ensuing from an investment only by means of μ and Σ.
2.1 Mean-variance portfolio
Loosely speaking, the mean-variance portfolio asserts that an investor likes return and dislikes risk, and recognizes these two components in the mean and the variance of the portfolio return distribution 2.6. Given this brief preamble the mean-variance portfolio developed by Markowitz (1952) does not seem particularly striking. To tell the truth, before the 1950s the desirability of an investment was mainly equated to its return component, neglecting the risk counterpart as a significant driver of investment decisions (see McNeil (2005) for a discussion).
2.1.1 Diversification and hedging
The main idea underlying mean-variance optimization (MVO) is that, while the return of an investment is given by the sum of its parts, the risk of the investment can be less than the sum of its parts. This is due to the additivity of the expected value operator and the subadditivity of the standard deviation operator. The mean-variance optimization takes advantage of this and in doing so explicitly formalizes the fundamental financial concepts of diversification and hedging. To better see these fundamental aspects of portfolio selection theory, denote as μ_P the expected portfolio return, σ²_P the variance of the portfolio return, σ²_j the variance of the return on the j-th asset and σ_{jk} the covariance between the returns of the j-th and k-th assets:

    μ_j ≡ (μ)_j,  j = 1, ..., N
    σ²_j ≡ (Σ)_{jj},  j = 1, ..., N
    σ_{jk} ≡ (Σ)_{jk},  j ≠ k
    σ²_P ≡ w^T Σ w = ∑_k w²_k σ²_k + ∑_k ∑_{j≠k} w_k w_j σ_{jk}    (2.7)
Moreover, denote as ρ_{jk} the Pearson correlation coefficient between the returns of assets j and k,

    ρ_{jk} ≡ σ_{jk} / (σ_j σ_k),  j, k = 1, ..., N    (2.8)

so that the expression of portfolio variance (last equation in 2.7) can be reformulated as,

    σ²_P = ∑_j w²_j σ²_j + ∑_j ∑_{k≠j} w_k w_j ρ_{jk} σ_j σ_k    (2.9)

Equation 2.9 has an expressive geometric interpretation since it can be viewed as a reformulation of Carnot's theorem, the law of cosines (see Castellani G. (2005)), which expresses the length of one side of a triangle starting from the lengths of the other two and the magnitude of the (external) angle between them. From a mean-variance portfolio point of view this theorem highlights the interdependencies between the sum of all the covariance matrix elements (i.e. portfolio variance in the general case of non-null correlations between the assets) and the sum of the diagonal of the covariance matrix (i.e. portfolio variance in the particular case of perfectly uncorrelated assets). In particular it is interesting to see how different values of ρ_{jk} lead the mean-variance investor to adopt diversification strategies (i.e. all long positions) or hedging strategies (i.e. long-short positions).
2.1.2 Two-assets case
Consider a market composed of only two assets (a_1, a_2) with expected returns and variances respectively denoted as (μ_1, μ_2) and (σ²_1, σ²_2). For a non-trivial example let us suppose that,

    μ_1 > μ_2 > 0  and  σ²_1 > σ²_2

In this scenario there is no stochastic dominance (always assuming a normal distribution for the assets' returns) and the mean-variance analysis will provide the desired interesting insights. Assuming a budget constraint (i.e. the investor invests all the wealth at his disposal) the portfolio w = (w_1, w_2) is such that w_1 + w_2 = 1, and we can set w = w_1 and w_2 = 1 − w without loss of generality. The expressions of the expected portfolio return and portfolio variance can be written as:

    μ_P = μ_2 + (μ_1 − μ_2) w
    σ²_P = w² σ²_1 + (1 − w)² σ²_2 + 2 w (1 − w) ρ_{12} σ_1 σ_2.    (2.10)

which highlights the linear dependence of μ_P on the quota w invested in a_1 and the non-linearity of σ²_P. The plots in Figure 2.1 draw the functions for portfolio return and variance against the allocation weight assigned to the first asset. As we can see, the first plot shows the linearity between portfolio return and allocation weights. Indeed, the investor wishing to maximize portfolio return, whatever the ensuing variance, would allocate all his wealth to the asset yielding the maximum expected return. Besides, portfolio variance draws a parabola with axis parallel to the ordinate axis and upward concavity. Indeed, the investor wishing to minimize portfolio variance, whatever the ensuing expected return, would hold a portfolio lying on the vertex of the parabola. Due to the non-linearity of variance this point is not necessarily the portfolio concentrating all the initial budget on the minimum variance asset. As shown by the expression identifying the vertex, it depends on the correlation coefficient:

    w* = (σ²_2 − ρ_{12} σ_1 σ_2) / (σ²_1 + σ²_2 − 2 ρ_{12} σ_1 σ_2).    (2.11)

Letting the correlation coefficient vary, the minimum-variance investor will be prompted to either diversify or hedge. In particular, from 2.11 it follows that w* < 0 if and only if ρ_{12} > σ_2 / σ_1.
Figure 2.1. Two-assets portfolio expected return (left) and portfolio return variance with ρ_{12} > σ_2/σ_1 (right) as functions of w. Red points indicate μ_2 and σ²_2; blue points indicate μ_1 and σ²_1; the violet point indicates the minimum variance portfolio.
Portfolios characterized by values of w less than zero (or equivalently greater than one) represent situations of short-selling¹. This means that the investor tries to reduce the overall risk of his investment by holding strongly positively correlated assets with opposite signs (i.e. hedging), in an attempt to (statistically) offset simultaneous unfavorable events. Portfolios characterized by values of w between zero and one represent long positions, meaning that the investor holds positive quantities of all the assets composing the portfolio (i.e. diversification). It follows that the mean-variance investor is induced to diversify as long as there is no strong positive correlation between the assets, while he prefers to hedge when this positive correlation becomes more and more pronounced.

¹ The selling of a security that the seller does not own, or any sale that is completed by the delivery of a security borrowed by the seller. Short sellers assume that they will be able to buy the stock at a lower amount than the price at which they sold short (www.investopedia.com).
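To make the diversify-or-hedge mechanism concrete, the following minimal sketch evaluates equation 2.11 for a few correlation values. The volatilities sigma1 = 0.20 and sigma2 = 0.10 are hypothetical choices, not values from the thesis, so the hedging threshold σ_2/σ_1 sits at 0.5.

```python
# A minimal sketch of equation 2.11; sigma1 and sigma2 are hypothetical
# illustrative values, not taken from the thesis.
import numpy as np

sigma1, sigma2 = 0.20, 0.10

def min_variance_weight(rho):
    """Weight w* on asset 1 minimizing portfolio variance (eq. 2.11)."""
    num = sigma2**2 - rho * sigma1 * sigma2
    den = sigma1**2 + sigma2**2 - 2 * rho * sigma1 * sigma2
    return num / den

for rho in (-0.5, 0.0, 0.3, 0.8):
    w = min_variance_weight(rho)
    strategy = "diversification (long-long)" if 0 <= w <= 1 else "hedging (long-short)"
    print(f"rho = {rho:+.1f}  ->  w* = {w:+.3f}  ({strategy})")
```

For ρ above σ_2/σ_1 = 0.5 the weight turns negative, reproducing the short-selling (hedging) regime described above.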
2.1.3 Mean-variance model
The mean-variance portfolio is formalized by means of a two-step procedure. First, the investor draws the efficient portfolios in the plane (μ_P, σ²_P) conditioning on a target value μ̄ for the portfolio return.

    w* = argmin_{w : w^T μ = μ̄, w^T 1 = 1, w ≥ 0} w^T Σ w    (2.12)

When short selling is allowed, the constraint w ≥ 0 in 2.12 can be removed, yielding the following problem, which has an explicit solution:

    w* = argmin_{w : w^T μ = μ̄, w^T 1 = 1} w^T Σ w
       = [B Σ⁻¹ 1 − A Σ⁻¹ μ + (C Σ⁻¹ μ − A Σ⁻¹ 1) μ̄] / D,    (2.13)

where A = μ^T Σ⁻¹ 1 = 1^T Σ⁻¹ μ, B = μ^T Σ⁻¹ μ, C = 1^T Σ⁻¹ 1, D = BC − A².

The optimal portfolios in 2.13 are Pareto optima, in the sense that they are attainable portfolios whose expected return cannot be made better off without making the risk worse off, and vice versa. For different values of μ̄ they form a continuum of portfolios usually referred to as the efficient frontier (see Figure 2.2).

Figure 2.2. Simulated Markowitz efficient frontier with N = 8 assets.

Each point on the frontier gives on the abscissa the minimum portfolio risk for a given portfolio expected return on the ordinate. The points inside the frontier represent the eight single-asset portfolios from which the frontier is built. The fact that these points lie on the right side of the graph testifies to the effect of diversification: the single-asset portfolios can be mixed to create returns with the same or lower risk and the same or higher expected return. The figure also shows the crucial trade-off between risk and return. Starting from the most diversified (i.e. minimum risk) portfolio at the left corner of the figure, higher expected returns can be gained only at the cost of greater risk. Once the efficient frontier is drawn, the investor can move to the second step, that is, to pick the portfolio lying on the frontier which best suits his preferences (in terms of aversion to risk). When a short-selling constraint w ≥ 0 is added in 2.13 the optimization no longer admits an analytical solution, but the efficient frontier can still be computed numerically and efficiently when the constraints remain affine.
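Since 2.13 is fully explicit, the unconstrained frontier takes only a few lines to compute. The sketch below uses an illustrative three-asset market (mu and Sigma are made-up inputs, not data from the thesis); the printed checks confirm that the weights sum to one and hit the target return.

```python
# A minimal sketch of the closed-form frontier solution 2.13, under a
# hypothetical three-asset market (mu and Sigma are illustrative only).
import numpy as np

mu = np.array([0.03, 0.065, 0.11])           # expected returns
Sigma = np.array([[0.010, 0.002, 0.001],
                  [0.002, 0.040, 0.004],
                  [0.001, 0.004, 0.090]])    # covariance matrix
ones = np.ones_like(mu)

Si = np.linalg.inv(Sigma)
A = mu @ Si @ ones
B = mu @ Si @ mu
C = ones @ Si @ ones
D = B * C - A**2

def frontier_weights(mu_target):
    """Minimum-variance weights for a target expected return (eq. 2.13)."""
    g = (B * (Si @ ones) - A * (Si @ mu)) / D
    h = (C * (Si @ mu) - A * (Si @ ones)) / D
    return g + h * mu_target

w = frontier_weights(0.06)
print(np.round(w, 4), w.sum(), w @ mu)       # weights sum to 1, return = 0.06
```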
2.2 The importance of estimation error
Markowitz's theory assumes μ and Σ known. That is, in the mean-variance optimization the process of selecting a portfolio is posed as a problem whose first and second-order market characteristics are given, and one has only to choose the allocation vector according to the predetermined optimality criteria. Since in practice μ and Σ are unknown, a commonly used approach is to estimate them from historical data, under the assumption that returns are i.i.d. Under the standard model, the maximum likelihood estimates of μ and Σ are the sample mean μ̂ and the sample covariance matrix Σ̂. Asymptotic results guarantee that, under the assumption of a normal model, in an infinite sample these estimators converge to the true market parameters. Although asymptotic results can be useful to characterize the uncertainty of sample estimates, from a practitioner's point of view the real issue is obviously the finite-sample performance. Moreover, the reliability of these results deteriorates drastically with the number of assets held in the portfolio. Empirical tests have found that replacing μ and Σ by their sample counterparts μ̂ and Σ̂ may yield bad results, and a major guideline in the literature is to find alternative estimators that produce better portfolios when they are plugged into the optimizers (Chen, 2011). The way the Markowitz portfolio is actually implemented is a crucial issue, since mean-variance portfolios are sensitive to small changes in the input data, and errors committed in the estimation step may be multiplied in the allocation results. This is well documented in the literature. Chopra (1993) shows that even minor changes in the estimates of expected returns or risks can make MVO produce very different results. Best (1991) analyzes the sensitivity of optimal portfolios to changes in expected return estimates. Other related literature is in Jobson and Korkie (1981) and Broadie (1993), among others. Michaud (1989) shows the error-maximization attitude of the mean-variance optimizer, which exacerbates estimation error by tending to overweight overestimated assets and underweight underestimated ones. Moreover, it is widely accepted that most of the estimation risk in optimal portfolios comes from errors in the estimates of expected returns, rather than in the estimates of portfolio variance. To graphically address the sensitivity of the mean-variance optimization, in Figure 2.3 we have randomly generated 100 Monte Carlo time series of normally distributed returns and numerically computed the ensuing efficient frontiers, replacing the market parameters by their sample analogues. These estimated frontiers are compared to the true efficient frontier computed using the true covariance matrix and expected return vector.
Figure 2.3. Simulated experiment with T = 60 N-dimensional normal observations: 100 MC efficient frontiers (blue lines), true frontier (red line). Axes: standard deviation (%) against expected return (%).
Summarizing, when an investor runs the mean-variance optimization implemented with the sample counterparts of the unknown mean and variance, he is actually selecting portfolios lying on the estimated frontier (the only one available to him), which is likely to be far from the true efficient frontier. Moreover, the distance between the estimated and the true frontier increases more than linearly with the estimation error. Given the sensitivity of the optimal mean-variance allocation to its inputs, it is evident why estimation error has played such an important role in the related literature. Starting with Markowitz himself, all authors dealing with portfolio selection seem to agree upon the need for market estimators with an overall better performance than the classical estimators. In this sense, Bayesian statistics is widely accepted as a convenient tool which adds flexibility to the inferential machinery and allows estimators to be calibrated to the specific, real-life application.
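The experiment behind Figure 2.3 is easy to replicate in outline. The sketch below re-estimates the global minimum-variance weights on 100 simulated samples of T = 60 observations; the true market parameters are hypothetical stand-ins (the figure's actual inputs are not reported here), and the spread of the estimates quantifies the instability discussed above.

```python
# A minimal sketch of the Figure 2.3 experiment: simulate T = 60 normal
# return vectors, re-estimate the covariance, and compare the implied
# minimum-variance weights across replications. The "true" parameters
# below are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
mu_true = np.array([0.05, 0.07, 0.10, 0.12])
Sigma_true = np.diag([0.01, 0.02, 0.05, 0.08])
T, n_rep = 60, 100

def gmv_weights(Sigma):
    """Global minimum-variance weights: Sigma^-1 1 / (1' Sigma^-1 1)."""
    x = np.linalg.solve(Sigma, np.ones(len(Sigma)))
    return x / x.sum()

weights = []
for _ in range(n_rep):
    R = rng.multivariate_normal(mu_true, Sigma_true, size=T)
    weights.append(gmv_weights(np.cov(R, rowvar=False)))

weights = np.array(weights)
print("true GMV weights:      ", np.round(gmv_weights(Sigma_true), 3))
print("mean estimated weights:", np.round(weights.mean(axis=0), 3))
print("std of estimates:      ", np.round(weights.std(axis=0), 3))
```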
2.3 Bayesian inference for portfolio selection
In the Bayesian framework, accounting for the intrinsic uncertainty affecting any estimation process arises naturally. Indeed, within a Bayesian approach the investor eliminates the dependence of the optimization problem on the unknown parameters by replacing the true (and unknown) values with a probability distribution, rather than a point estimate. This probability law depends only on the data the investor observes and on the personal ex-ante (prior) beliefs the investor may have had about the unknown parameters before analyzing the experiment. The portfolio weights derived within this paradigm are optimal with respect to this parameter distribution but sub-optimal with respect to the true parameter values. However, this sub-optimality is not relevant since the truth is never revealed anyway. To the extent that this parameter distribution incorporates all of the available information (as opposed to just a point estimate), this approach is to many the most appealing. This kind of inferential framework falls into the Bayesian methods. The appellative "Bayesian" is named after Thomas Bayes and his pioneering work published in 1763, "An Essay towards solving a Problem in the Doctrine of Chances".
2.3.1 Bayesian theory review
Recalling that the purpose of an inferential analysis is to retrieve causes (formalized by the parameters of statistical models) from effects (summarized by the observations) (Robert, 2007), the Bayesian scheme implicitly assumes that the most conclusive answer to this research has to be a probability law (Piccinato, 2009). Denote the parameter of interest as θ and the observations as y. The output of the Bayesian inference has to be a probability distribution on the possible values that θ can assume. Before observing y, a prior distribution on θ is assigned. This distribution describes the uncertainty around the true value of θ. The prior distribution can be derived from personal beliefs and/or expert opinions. Once the data are observed, through Bayes' theorem, one updates the prior distribution, obtaining a posterior distribution for θ. In this context, Bayes' theorem allows the information on θ contained in the prior distribution to be combined with that provided by the experiment. From a Bayesian point of view θ is a random variable with prior distribution π(θ), and the data y coming from the experiment are like a state variable upon which to condition the final distribution of θ. Bayes' theorem provides the posterior distribution:

    π(θ|y) = f(y|θ) π(θ) / m(y)    (2.14)

where m(y) = ∫ f(y|θ) π(θ) dθ is the marginal density of y. Since the derivation of a posterior distribution is generally done through proportionality relations, Bayes' theorem is often reported in the form

    π(θ|y) ∝ L(θ|y) π(θ)    (2.15)

where L(θ|y) is the likelihood function.

Bayesian inference is particularly interesting in predictive problems, where the statistician tries to draw conclusions about the realization of a future experiment y_{T+1} governed by the same laws regulating the past (observed) data y = (y_T, y_{T−1}, ...). The "parameter" of interest in this case is the future realization y_{T+1}. Adopting the Bayesian reasoning we have a natural candidate to describe the initial uncertainty on this quantity, that is, the marginal distribution m(y_{T+1}) appearing in the denominator of Bayes' theorem 2.14 and formalizing the information available on y_{T+1} before observing the data:

    m(y_{T+1}) = ∫ f(y_{T+1}|θ) π(θ) dθ    (2.16)

Once the experiment has been run, m(y_{T+1}) is updated with the new information contained in the data. This is done by substituting the posterior distribution for the prior distribution of θ in equation 2.16:

    m(y_{T+1}|y) = ∫ f(y_{T+1}|θ) π(θ|y) dθ    (2.17)

Coherently with the Bayesian paradigm, the final predictive distribution reflects the uncertainty around the true value of the future experiment, which is treated as a parameter, rather than being a naive projection of the sample distribution conditioned on what we have learned about the parameters up to the time when the inference takes place.
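The predictive update 2.17 rarely needs to be computed analytically: drawing θ from the posterior and then y_{T+1} from the sampling model yields draws from m(y_{T+1}|y). A minimal univariate sketch, assuming a normal model with known variance and a conjugate normal prior (a deliberate simplification of the multivariate setting used later in the thesis), shows the predictive standard deviation exceeding the sampling one, i.e. the tail-thickening effect of parameter uncertainty:

```python
# A minimal sketch of eq. 2.17 by simulation; the univariate normal
# model with known variance is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, scale=2.0, size=50)   # observed data
sigma = 2.0                                   # known sampling sd
m0, s0 = 0.0, 10.0                            # prior: theta ~ N(m0, s0^2)

# Conjugate posterior theta | y ~ N(m1, s1^2), via precision addition
s1 = 1.0 / np.sqrt(1.0 / s0**2 + len(y) / sigma**2)
m1 = s1**2 * (m0 / s0**2 + y.sum() / sigma**2)

# Predictive draws: integrate theta out by simulation (eq. 2.17)
theta = rng.normal(m1, s1, size=100_000)
y_next = rng.normal(theta, sigma)

print(f"posterior mean: {m1:.3f}")
print(f"predictive sd : {y_next.std():.3f}  >  sampling sd {sigma}")
```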
2.4 Allocation as decision
Reformulating portfolio selection in terms of the theory of choice under uncertainty allows the mean-variance rule to be generalized and takes advantage of an axiomatic framework to rigorously categorize investors' preferences. After this generalization, Bayesian inference will turn out to be the most coherent strategy for dealing with portfolio selection. Before the formal introduction of the Bayesian allocation we briefly review the fundamentals of statistical decision theory.

2.4.1 Choice under uncertainty
The foundations of a decision-theoretic approach to statistical problems are formulated in important manuals such as Wald (1950), Savage (1954), and Ferguson (1967), among others. In its simplest formulation a generic decision problem describes the following idealized situation: at a given time a person has to choose an element δ from a set Δ. The choice of δ determines a loss L_δ(ω), which evaluates the penalty associated with decision δ and the realization ω of a random variable called the state of nature. The state of nature formalizes the uncertainty around the phenomenon affecting the consequence of the decision. In the subjective formalization of decision theory this quantity is considered a random variable. Then a probability law P is defined over (Ω, A_Ω), where A_Ω is an appropriate σ-algebra of subsets of Ω. It is worth noting that the "consequences" arising from the couple (δ, ω) are generic quantities. They can be anything affecting the decision-maker, and they are not necessarily expressed on a numerical scale. I will denote this "primitive" concept of consequence as γ = C_δ(ω). Therefore the loss function L_δ(ω) has to be intended as a mapping L : Γ → R that takes the consequences onto a numerical scale and allows for a formal comparison between two generic consequences. When it is possible to compare any two elements in a set, this set is said to carry a total preordering. A preordering on a set X is a binary relation R enjoying the following properties:
1 xRx for every x ∈ X (Reflexivity)
2 xRx′ and x′Rx″ ⟹ xRx″ (Transitivity)

If, in addition to (1) and (2), it satisfies the property:

3 xRx′ and x′Rx ⟹ x = x′ (Antisymmetry)

then it is said to be an ordering. Assume that Γ carries a total preordering denoted as ≽ ("weak preference") and denote the space of all loss functions as L. Then the relation ≤ ("less than or equal to") on L is said to induce the relation ≽ on Γ,

    γ′ ≽ γ″ ⟺ L(γ′) ≤ L(γ″)    (2.18)

The relation ≽ captures a generic concept of preference, but it takes a precise interpretation with respect to the losses. That is, the loss function allows the qualitative concept of preference on the consequences to be formalized. The left-hand side of expression 2.18 can be read as "γ′ is preferable at least as much as γ″", or "γ′ is weakly preferable to γ″". The wish of the decision-maker is to choose an element δ such that the corresponding consequence is the most preferable or, equivalently, the corresponding loss function L_δ in L_Δ (the set of all possible loss functions given Δ) is minimal. Once the loss function is defined, the search for a minimum with respect to δ is not a trivial problem, since a decision does not uniquely determine the corresponding loss, which is affected by the state of nature as well. To tackle the issue of the uncertainty about the state of nature the decision-maker needs to specify a functional K : L_Δ → R, which reduces the conditional loss to a single number. Finally the decision-maker can determine his optimal decision by solving the following minimization problem:

    δ* = argmin_{δ ∈ Δ} K(L_δ)    (2.19)

The functional K is called the optimality criterion. As we have seen, the classical framework of decision theory assumes the existence of a loss function through which the generic consequences (possibly non-numeric) are quantified. As pointed out in Robert (2007), the actual determination of the loss function is often awkward in practice. This is true especially when Δ or Ω are sets with an infinite number of elements. In general the quantification of the consequence of each decision is challenging, since it has to do with the human perception of some "loss" or "gain", which is typically non-linear and not intuitive. Another important issue in the framework of decision theory is the determination of the optimality criterion.
Utility theory A solution for determining both the loss function and the optimality criterion is given by the utility theory developed by Von Neumann (1944). Utility is defined as the opposite of loss, so the theory could equivalently be stated in terms of either measure. Utility is by far the more adopted in economics, where the "consequences" are usually measured as "rewards" of an action rather than "losses". As pointed out in Piccinato (2009), the solution to the problem of specifying a utility function (or, equivalently, a loss function) can be summarized as follows. Adopting the subjective framework of decision theory, to every δ there corresponds a particular probability distribution on Γ, denoted as Q_δ; in this context the Q_δ are usually called lotteries, in order to highlight the fact that the final realization is uncertain. Then, assuming a "coherent" set of preferences on the lotteries, a unique utility function is determined, which implies the use of the expected value as the optimality criterion. The approach followed by Von Neumann (1944) is, to some extent, to consider the problem of determining a utility function from an indirect point of view. That is, they first formalize a set of assumptions concerning the utility function which appeal to intuition and can describe a large spectrum of scenarios with reasonable approximation; second, they accept these "rational" assumptions as axioms and derive the ensuing conclusions about the relative utility function. The main result of utility theory is the expected utility principle, which states that when the ordering of the preferences satisfies precise requirements of rationality, this ordering can always be represented by means of the expected value operator. Under certain circumstances² the requirements can be grouped into the following three categories: (1) algebraic, (2) Archimedean and (3) substitution. Requirements (1) contain the properties already stated above, namely reflexivity, transitivity and antisymmetry. They represent natural coherence criteria asking that a preference be indifferent with respect to itself (reflexivity); that, starting from a three-element chain relation, the relation between the elements at the extremes can be derived from the relations involving the common element (transitivity); and finally that there can never be undecidability (antisymmetry). Requirement (2) reads,

4 if x ≻ x′ ≻ x″ then there exist α, β ∈ (0, 1) such that: αx + (1 − α)x″ ≻ x′ ≻ βx + (1 − β)x″

and has the intuitive meaning that if x″ is less preferable than x′, and if x′ is less preferable than x, then element x″ will never be so non-preferable with respect to x′ as to be unusable in a mixture with x that results more preferable than x′. Finally, requirement (3) reads,

5 if x ≻ x′ then, whatever x″ ∈ A and α ∈ (0, 1], it must be: αx + (1 − α)x″ ≻ αx′ + (1 − α)x″

Intuitively, if element x is preferable to x′, the preference must still hold after "substituting" a part of x and x′ with any other element x″. While these requirements must necessarily be satisfied in order to invoke the expected utility principle, numerous other characteristics are usually requested with the aim of best describing the actual decision-maker's preference scale (see Castellani G. (2005) for a review of classes of utility functions).

² When the set of opportunities is composed of random variables with a finite number of determinations.
2.4.2 Maximum expected utility allocation
Given this theoretical framework it is possible to restate the problem of asset allocation as a decision problem. The action that the investor has to take is formalized by the vector w ∈ R^N, and the "consequence" of the action is the final wealth W. Denoting the utility function describing the preferences of the investor as U(·), the problem of selecting the optimal allocation reduces to the following expected utility maximization:

    w* = argmax_w E[U(W)]    (2.20)

Given the constraints imposed by the rationality axioms of the expected utility principle, there are infinitely many choices for describing the preferences of an investor. A vast literature has focused on the derivation of utility functions starting from the imposition of specific desirable properties, such as those referring to absolute risk aversion:

    r(x) = −U⁽²⁾(x) / U⁽¹⁾(x)    (2.21)

where U⁽ᵏ⁾(·) denotes the k-th derivative. Besides, under fairly mild regularity conditions (i.e. infinite differentiability) a utility function U(W) can be approximated by an infinite Taylor expansion around a wealth W_0:

    U(W) = ∑_{k=0}^{∞} [U⁽ᵏ⁾(W_0) / k!] (W − W_0)^k    (2.22)

This result is important since it allows the mean-variance criterion to be linked to a maximization of expected utility. Indeed, assuming that W is the end-of-period return of an investment, W_0 is the expected return, and the investor has the quadratic utility

    U(W) = W − (a/2) W²    (2.23)

then the derivatives of order higher than two are null, and from 2.22 it follows that the investor is concerned only with the expected return and the variance of the return, as in the mean-variance paradigm.
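The link can be verified directly; the following display assumes nothing beyond the quadratic utility 2.23:

```latex
% Expected quadratic utility depends only on the first two moments:
\mathbb{E}[U(W)]
  = \mathbb{E}\!\left[W - \tfrac{a}{2}W^{2}\right]
  = \mathbb{E}[W] - \tfrac{a}{2}\,\mathbb{E}[W^{2}]
  = \mathbb{E}[W] - \tfrac{a}{2}\left(\operatorname{Var}(W) + \mathbb{E}[W]^{2}\right)
```

so that, holding E[W] fixed, maximizing expected utility amounts to minimizing Var(W): exactly the mean-variance criterion.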
2.4.3 Bayesian allocation decision
The expected utility in 2.20 is taken with respect to the probability distribution function of the source of uncertainty in the portfolio choice problem, namely the next period's returns of the assets composing the portfolio. Then 2.20 can be rewritten as follows:

    w* = argmax_w ∫ U(r^P_{T+1}) f(r_{T+1}|μ, Σ) dr_{T+1}    (2.24)

Formally, since the distribution in 2.24 will necessarily depend on the unknown market parameters (μ, Σ), the investor needs to integrate out this dependence to actually compute the expected utility. As we have seen, in the classical implementation the investor naively uses estimates of the parameters in place of the true parameter values, i.e. he ignores estimation risk. The importance of estimation risk in portfolio optimization has already been discussed, but here we get another insight into the drawbacks of the classical asset allocation. Indeed, by ignoring estimation risk the investor also fails to account for the modification of his probabilistic knowledge about next-period returns, which differs from the conditional sample distribution. That is, the uncertainty concerning the true values of the market parameters should spread out the probabilities and thicken the tails of the distribution for predicting next-period asset returns. In other words, the investor who considers the market parameters as unknown but fixed entities fails to account for both parameter uncertainty and the ensuing model uncertainty. Both these issues are naturally taken into account in a Bayesian formulation of the same problem. As we have seen, in the Bayesian framework the unknown parameters are a random variable whose possible outcomes are described by the posterior probability density function π(θ|y). Now consider the allocation problem 2.24. In order to smooth the sensitivity of the allocation function to the parameters, it is quite natural to consider the weighted average of the argument of the optimization 2.20 over all the possible outcomes of the market parameters:

    w̄ = argmax_w ∫ E[U(r^P_{T+1})] π((μ, Σ)|y) d(μ, Σ)    (2.25)

Consider now the posterior predictive distribution 2.17, which is defined in terms of the posterior distribution of the parameters. It describes the statistical features of the market, taking into account that the value of the parameter is not known with certainty, i.e. accounting for estimation risk. This thickened predictive distribution may represent structural uncertainty about bad events. Using the definition of the posterior predictive density in the average allocation 2.25 and exchanging the order of integration, it is immediate to check that the average allocation can be written as follows:

    w*_B = argmax_w ∫ U(r^P_{T+1}) f(r_{T+1}|I_T) dr_{T+1}    (2.26)

where I_T represents all information available up to time T. Expression 2.26 is called the Bayesian allocation decision. It can be argued that a rational decision maker chooses an action by maximizing expected utility, the expectation being taken with respect to the posterior predictive distribution of the future returns. Because of computational issues, and because the moments of the predictive distribution are approximated by the moments of the posterior distribution, predictive returns are often ignored and utility is frequently stated in terms of the model parameters. Since an effective determination of the predictive distribution can be obtained only by allowing for conditioning on the observations, Bayesian inference turns out to be the most effective methodology in this context. Polson (2000) highlights the difference between the posterior variance and the posterior predictive variance. Kan and Zhou (2007) provide a thorough discussion of the difference between the plug-in and the Bayes estimator of the optimal decision under parameter-based utility. Their discussion highlights the difference between a proper Bayes rule, defined as the decision that maximizes expected utility, and a rule that plugs the Bayes estimate of the weights or of the parameters into the sampling distribution.
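In practice 2.26 is evaluated by simulation: average the utility over posterior-predictive draws of r_{T+1}, then optimize the weights. The sketch below does this for two assets with an exponential (CARA) utility and synthetic predictive draws; both choices are illustrative assumptions, not the utility function or the sampler used later in the thesis.

```python
# A minimal sketch of the Bayesian allocation 2.26: Monte Carlo expected
# utility over (synthetic) posterior-predictive draws, maximized on a
# coarse grid of long-only two-asset weights. All inputs are illustrative.
import numpy as np

rng = np.random.default_rng(2)
# Stand-in posterior-predictive draws of next-period returns (S x 2)
r_pred = rng.multivariate_normal([0.04, 0.08],
                                 [[0.010, 0.002], [0.002, 0.040]],
                                 size=20_000)

def expected_utility(w, a=3.0):
    """Mean CARA utility of portfolio returns over the predictive draws."""
    rp = r_pred @ w
    return np.mean(-np.exp(-a * rp))

grid = np.linspace(0.0, 1.0, 101)
best_w1 = max(grid, key=lambda w1: expected_utility(np.array([w1, 1 - w1])))
print("Bayesian allocation (w1, w2):", (round(best_w1, 2), round(1 - best_w1, 2)))
```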
Table 2.1. True normal parameters for the simulated data used in the non-informative allocation example

    asset   A      B      C      D
    mean    2.88   6.50   10.88  16.00
    sd      10.00  20.00  30.00  40.00
Figure 2.4. Bayesian non-informative mean-variance allocation for varying sample size. The rule suggests allocating increasing wealth to the asset with the lowest variance as fewer observations are available.
Non-informative analysis In principle the prior probabilities have to constitute the synthesis of the information actually available on the unknown parameter, but often prior information about the model is too vague or unreliable, and a fully subjective derivation of the prior distribution is then obviously impossible. In general there cannot be any rule providing a predetermined mathematical form for the prior distribution. Consequently the prior is usually an element determined case by case, focusing essentially on the content of the specific problem. With the goal of developing a Bayesian inferential tool independent of the definition of the prior law, a major direction in the Bayesian literature has focused on the construction of non-informative (or reference, default) priors (see for example Jeffreys (1961) and Bernardo (2005)). While no development of a non-informative distribution has ever led to general consensus (Piccinato, 2009), there are some relevant results in this direction (see Robert (2007) for a review).

For T observations of an N-dimensional normal random vector with mean μ and covariance matrix Σ,

    Y_t ~ N_p(μ, Σ),  (t = 1, ..., T)    (2.27)

the probability density function reads,

    f(Y_t|μ, Σ⁻¹) = |Σ⁻¹|^{1/2} (2π)^{−p/2} exp{−(1/2)(Y_t − μ)^T Σ⁻¹ (Y_t − μ)},    (2.28)

for all t = 1, ..., T + 1. The typical non-informative distribution for this model reads,

    π(μ, Σ) ∝ |Σ|^{−(N+1)/2}    (2.29)

and the ensuing posterior predictive distribution for r_{T+1} is a multivariate Student-t with T − N degrees of freedom and first two moments reading,

    E(r_{T+1}|I_T) = μ̂
    V(r_{T+1}|I_T) = [(T + 1)/(T − N − 2)] Σ̂    (2.30)

where (μ̂, Σ̂) are the sample estimators defined as:

    μ̂ = (1/T) ∑_{t=1}^{T} Y_t
    Σ̂ = (1/(T − 1)) ∑_{t=1}^{T} (Y_t − Ȳ)(Y_t − Ȳ)^T    (2.31)

In this case the classical and the Bayesian portfolios are not very different from each other, but the gap widens as the sample dataset gets smaller (see Table 2.1 and Figure 2.4), reflecting a smooth process of immunization from risk in response to the increased ignorance about the market generating mechanism. To this extent the non-informative analysis is interesting, since it allows the effect of the new Bayesian framework on portfolio selection to be isolated (the two frameworks differ only at the conceptual level of considering the parameters as random variables, but are otherwise comparable in terms of the information fed into the process) (see Kan and Zhou (2003), Klein and Bawa (1976) and Zellner (1971) for non-informative Bayesian allocation).
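The predictive moments 2.30-2.31 are a direct computation; a small sketch on synthetic data (all values illustrative) makes the inflation factor (T + 1)/(T − N − 2) explicit:

```python
# A minimal sketch of the non-informative predictive moments 2.30-2.31,
# computed from synthetic normal data; inputs are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
T, N = 60, 4
Y = rng.multivariate_normal(np.zeros(N), np.eye(N), size=T)

mu_hat = Y.mean(axis=0)                        # eq. 2.31, sample mean
Sigma_hat = np.cov(Y, rowvar=False)            # eq. 2.31, unbiased covariance

pred_mean = mu_hat                             # eq. 2.30
pred_cov = (T + 1) / (T - N - 2) * Sigma_hat   # eq. 2.30, inflated by
                                               # parameter uncertainty
print("inflation factor:", (T + 1) / (T - N - 2))  # > 1 for any finite T
```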
Conjugate analysis Raiffa and Schlaifer (1961) were among the first to use conjugate models for portfolio selection. In a conjugate analysis the derivation of a prior distribution reduces to the choice of an element within a specific (i.e. conjugate) parametric class. A class of prior probability distributions is said to be conjugate to a sampling model if the ensuing posterior distribution is of the same form as the prior. This transformation is in general easier to interpret, since it affects only the parameters of the distribution, which are "updated" by the sample evidence. A conjugate model fitting the mean-variance framework is the Normal-inverse-Wishart (NIW) model. Denoting by (μ_0, Σ_0) the prior knowledge of the market parameters, following Meucci (2006) the distribution of (μ, Σ) is a Normal-Wishart centered on (μ_0, Σ_0) with uncertainty parameters (T_0, ν_0). Explicitly, this means that the density of μ|Σ⁻¹ is normal:

    π(μ|Σ⁻¹) = |T_0 Σ⁻¹|^{1/2} (2π)^{−p/2} exp{−(1/2)(μ − μ_0)^T T_0 Σ⁻¹ (μ − μ_0)},    (2.32)

The density is centered on our prior belief:

    E(μ|Σ⁻¹) = μ_0

and the parameter T_0 > 0 indicates the confidence level in the prior expected value: the larger T_0, the more confident we are that the true model parameter μ is close to our input μ_0.

As for the covariance matrix Σ, we assume that Σ⁻¹ is distributed as a Wishart:

    π(Σ⁻¹) = |(1/2) ν_0 Σ_0|^{ν_0/2} |Σ⁻¹|^{(ν_0 − p − 1)/2} exp{−(1/2) Tr(ν_0 Σ_0 Σ⁻¹)} / Γ_p(ν_0/2)    (2.33)

This density is also centered on our prior:

    E(Σ⁻¹) = Σ_0⁻¹,

and the parameter ν_0 > p − 1 indicates the confidence level in the prior covariance: the larger ν_0, the more confident we are that the true model parameter Σ is close to our input Σ_0. The NIW prior distribution can be written as follows:

    μ|Σ ~ N_p(μ_0, Σ/T_0)
    Σ⁻¹ ~ W_p(ν_0, (ν_0 Σ_0)⁻¹)    (2.34)

The posteriors for the parameters μ|Σ and Σ⁻¹ belong to the same class as the priors, but with updated hyper-parameters:

    T_1 = T_0 + T
    μ_1 = (T_0 μ_0 + T x̄) / (T_0 + T)
    ν_1 = ν_0 + T
    ν_1 Σ_1 = ν_0 Σ_0 + T S + [T T_0 / (T + T_0)] (x̄ − μ_0)(x̄ − μ_0)^T    (2.35)

The posterior mean μ_1 shows how an investor may include in his decision model additional views about the generating process underlying the market and obtain posterior distributions which blend this new information with that coming from the sample data, smoothly averaging the two according to the relative confidence in these views.
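The update 2.35 is equally mechanical. The sketch below applies it to synthetic data, with hyper-parameters (μ_0, T_0, ν_0, Σ_0) chosen purely for illustration, reading the last line of 2.35 as Σ_1 = [ν_0 Σ_0 + T S + (T T_0/(T + T_0))(x̄ − μ_0)(x̄ − μ_0)^T] / ν_1:

```python
# A minimal sketch of the conjugate NIW update 2.35 on synthetic data;
# data and hyper-parameters are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(4)
T, p = 60, 3
Y = rng.multivariate_normal([0.05, 0.07, 0.10], 0.02 * np.eye(p), size=T)

# Prior inputs
mu0 = np.full(p, 0.06)        # prior mean vector
T0 = 12                       # confidence in mu0 (pseudo-observations)
nu0 = 12                      # confidence in Sigma0
Sigma0 = 0.02 * np.eye(p)     # prior scatter matrix

# Sample statistics
xbar = Y.mean(axis=0)
S = np.cov(Y, rowvar=False, bias=True)   # ML covariance (divisor T)

# Posterior hyper-parameters (eq. 2.35)
T1 = T0 + T
mu1 = (T0 * mu0 + T * xbar) / T1
nu1 = nu0 + T
d = (xbar - mu0)[:, None]
Sigma1 = (nu0 * Sigma0 + T * S + (T * T0 / (T + T0)) * (d @ d.T)) / nu1

print("posterior mean mu1:", np.round(mu1, 4))
```

The posterior mean is a precision-weighted blend of μ_0 and x̄, which is exactly the shrinkage behavior described above.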
While the recourse to conjugate analysis cannot always be justified from a conceptual point of view, in practice it constitutes an effective tool to account for estimation risk and obtain flexible allocations, suited to the confidence of the investor in his information about the market. Moreover, in real-life applications the investor usually needs to find optimal portfolios comprising many assets with insufficient information. To cope with this practical issue, interesting lines of research have been proposed; the goal here is to achieve a working solution with the scarce information available. For example, Frost and Savarino (1986) assume a conjugate prior distribution on the market parameters which imposes identical means, variances and patterned covariances for all assets' return distributions, leading to a drastic reduction of the parameters to estimate³. These authors showed that such a prior improves out-of-sample performance. In Figure 2.5 the Bayesian estimated frontiers ensuing from the model of Frost and Savarino are compared to those ensuing from the use of sample estimators, highlighting the great improvement gained in terms of stability of the mean-variance allocations.

Figure 2.5. Simulated experiment with T = 60 N-dimensional normal observations: 100 Monte Carlo sample (right) and Bayesian (left) estimated frontiers (blue lines), true efficient frontier (red lines). The Bayesian estimated frontiers are the result of the conjugate patterned prior of Frost & Savarino.
Following this direction, there have been numerous proposals for prior distributions imposing a specific, homogeneous pattern on the parameters. These approaches aim to super-impose a predefined structure which contains the minimum information needed to yield a relevant reduction in the complexity of the problem (i.e. the number of parameters to estimate) and achieve more stable results, with a modest, controlled bias. This is the well-known trade-off between the efficiency and the unbiasedness of an estimator, which in the Bayesian context can be calibrated to suit specific, real-life applications.

³ The number of parameters to estimate in a mean-variance context increases more than linearly with the number of assets, since a covariance matrix has n(n+1)/2 unique parameters.
2.4.4 Bayesian paradigm justification
As suggested in Robert (2007), the impact of Bayes' theorem rests on the novel move that puts experimental data and parameters on the same conceptual level, since both of them are considered random entities. This is indeed revolutionary and has led to a clean fracture in statistical research between those taking advantage of the "rediscovered" framework and those who refuse it to some extent. The latter often appeal to the apparent lack of objectivity of the Bayesian paradigm, caused by the introduction of a "subjective" distribution into the inferential process. Undoubtedly, the definition of the prior is one of the most critical points of Bayesian statistics. As we have discussed previously, the information contained in the sample data is usually not sufficient to achieve a working stability of the results, and an a priori structure imposition generates a flexible framework potentially yielding better results in terms of the overall performance of an estimator. This point is relevant in financial decisions and is well summarized by a passage in Weitzman (2007):

"[...] The peso problem is defined as the financial equilibrium of a small-sample situation having a remote chance of a disastrous out-of-sample happening. In a peso problem, possible future realizations of low-probability bad events that are not included in the too-small sample are taken into account by real-world investors who conjecture on the true data-generating process. Naturally, these rare out-of-sample disaster possibilities are not calibrated since there is no historical data to really average them. A Bayesian translation of a peso problem is that there are not enough data to build a reliable posterior distribution based solely upon sample frequencies (i.e. a posterior that is independent of imposed priors). In a Bayesian-learning equilibrium where hidden structural parameters are evolving stochastically, it turns out that asset prices always depend critically upon subjective prior beliefs and there are never enough data on frequencies of rare tail events for asset prices to depend only upon the empirical distribution of past observations [...]"

Having a too-small sample dataset is the norm rather than the exception in finance, and the performance of frequentist estimators strictly relies on long-run (i.e. asymptotic) results. While a frequentist investor may point to the increasing availability of financial data, it is usually neglected that such long time series hardly satisfy the required stationarity conditions, and they should be processed either by fitting a more complex time-varying model (e.g. GARCH models) or by considering the sample as a collection of smaller datasets with different probability properties. To reinforce the arguments in favor of a Bayesian treatment of inferential problems, it should be recalled that, following the statistical decision framework presented in Subsection 2.4.1, frequentist optimality criteria naturally lead to the adoption of Bayesian decisions. That is, in a decision-theoretic framework every "good" decision turns out to be a Bayesian decision, even when there is not sufficient prior knowledge to formalize ex-ante a fully subjective distribution and an objective ad-hoc one is used just as a technical tool. Moreover, the criticisms leveled at Bayesian reasoning usually focus on inevitable deficiencies of any inferential procedure, which are common to every statistical toolkit, rather than being a prerogative of the Bayesian one. For instance, the criticisms which find inadmissible the introduction of external elements, along with the experiment, into the inferential process seem somewhat futile, since not even the experiment alone is sufficient to reach actual, quantitative conclusions about anything without a formal and "subjective" definition of the overall inferential mechanism (i.e. the selection of the sampling model) (Piccinato, 2009).
Chapter 3
Non-normal financial markets
To reconcile expected utility maximization with mean-variance optimization, it is usually assumed that financial markets follow a multivariate Gaussian distribution. The standard model for financial returns is a sufficient approximation for a large class of financial phenomena, such as regional stock indexes, mutual funds at long investment horizons, etc. The mean-variance framework is still a good approximation for elliptical markets, that is, a class of probability laws generalizing the multivariate normal case. An elliptical distribution retains the important feature of being completely characterized by a location and a scale parameter, although these do not necessarily correspond to the mean vector and covariance matrix of the distribution, as in the multivariate normal case. In these cases the mean-variance and utility optimizations are compatible, no matter the shape of the utility function, since the infinite-dimensional space of moments is reduced to a two-dimensional manifold parametrized by expected value and variance. Nevertheless, the assumption that a market is elliptical is very strong. For instance, in high-frequency, derivative or hedge fund markets, the elliptical assumption cannot be accepted. Besides, with skewed markets the investor's preferences are no longer well described by a quadratic utility, and in general the investor will care about higher-order moments of the distribution of the portfolio return, such as skewness. For example, in non-elliptical markets, for a given variance, an investor may further trade expected return for positive skewness, like buying a lottery ticket. Classical mean-variance portfolio optimization does not provide solutions in the presence of skewness in the market.

In a recent study Xiong (2011) points out that most asset class returns are not normally distributed, but the typical Markowitz Mean-Variance Optimization (MVO) framework that has dominated the asset allocation process for more than 50 years relies on only the first two moments of the return distribution. Equally important, considerable evidence shows that investor preferences go beyond mean and variance to higher moments: skewness and kurtosis. Investors are particularly concerned about significant losses, that is, downside risk, which is a function of skewness and kurtosis. Empirically, almost all asset classes and portfolios have returns that are not normally distributed. Many asset return distributions are asymmetrical; in other words, the distribution is skewed to the left (or occasionally the right) of the mean (expected) value. In addition, most asset return distributions are more leptokurtic, or fatter tailed, than normal distributions. The normal distribution assigns what most people would characterize as meaninglessly small probabilities to extreme events that empirically seem to occur approximately 10 times more often than the normal distribution predicts. Many statistical models have been put forth to account for fat tails. Well-known examples are the Lévy stable hypothesis (Mandelbrot, 1963), the Student's t-distribution, and mixtures of Gaussian distributions, among others. Summarizing, the strong empirical evidence against the normality of the returns suggests that the assumption of elliptically distributed asset returns is empirically violated and the mean-variance analysis needs to be extended. In particular, the focus here is on asymmetry in the return distribution, which has an impact on the portfolio selection task for investors who have a preference for skewness. Several researchers have proposed advances to the traditional mean-variance theory in order to include higher moments in the portfolio optimization task (see Athayde (2004) for example).
3.1 Skewness and portfolio selection
Although evidence of skewness and other higher moments in financial data is abundant, it is common for skewness to be ignored entirely in practice. Typically skewness is ignored both in the sampling models and in the assumed utility functions, while it can be claimed that it actually adds a dimension to the mean-variance framework developed by Markowitz. Consider the two-asset example proposed in the previous chapter. For each of these two-asset portfolios the mean and standard deviation behave as we would expect: the linear combination of the means equals the mean of the linear combination, and the linear combination of the standard deviations is greater than the standard deviation of the linear combination (i.e. the mean is additive and the standard deviation is sub-additive). Skewness is a different matter, as the skewness of the linear combination can be above or below the linear combination of the skewnesses. This suggests that an investor who is interested in skewness must consider an "extended efficient frontier" which includes the additional dimension of skewness. Indeed, empirically there is strong evidence that skewness matters in portfolio selection (Harvey, 2010). One effective method to measure the consequence of including higher moments in the asset allocation decisions with respect to the classical mean-variance criterion is to approximate the expected utility by a Taylor expansion as in 2.22. In particular, as far as skewness is concerned, it is possible to consider an investor with cubic utility, which reduces the infinite Taylor expansion to a third-order approximation,
$$U(W) = U(\bar{W}) + \sum_{k=1}^{3} \frac{U^{(k)}(\bar{W})}{k!}\,(W - \bar{W})^{k} \qquad (3.1)$$
which is easy to integrate and to compare with its second-order counterpart. Applying the expectation operator to both sides of 3.1 and assuming that the investor's "wealth" is completely determined by the end-of-period rate of return $r^P_{T+1}$, we obtain the following expression:
$$E[U(r^P_{T+1})] = U\big(E(r^P_{T+1})\big) + \frac{U^{(2)}\big(E(r^P_{T+1})\big)}{2}\,\mu^{(2)} + \frac{U^{(3)}\big(E(r^P_{T+1})\big)}{6}\,\mu^{(3)} \qquad (3.2)$$
where $\mu^{(i)}$ is the $i$-th central moment of $r^P_{T+1}$. According to the prescriptions of a Bayesian allocation decision, the expected value in 3.2 is taken with respect to the posterior predictive distribution of $\boldsymbol{r}_{T+1}$. Denoting by $m_p$, $V_p$ and $S_p$ the predictive expected value, covariance matrix and third-order tensor matrix of next-period returns for the assets composing the portfolio, the first three moments of $r^P_{T+1}$ can be written as linear combinations of $m_p$, $V_p$, $S_p$ and the allocation vector $\boldsymbol{w}$, and the expected utility ensuing from a third-order Taylor expansion approximation is given by:
$$E\big(U(r^P_{T+1}) \mid \boldsymbol{x}_T\big) = \boldsymbol{w}^{\top} m_p - \frac{U^{(2)}\big(E(r^P_{T+1})\big)}{2!}\, \boldsymbol{w}^{\top} V_p\, \boldsymbol{w} + \frac{U^{(3)}\big(E(r^P_{T+1})\big)}{3!}\, \boldsymbol{w}^{\top} S_p\, (\boldsymbol{w} \otimes \boldsymbol{w}) \qquad (3.3)$$
where $\otimes$ denotes the Kronecker product. Thus the expected utility is related to the investor's preferences for (or aversions to) the second and third central moments of the predictive distribution, whose contribution is directly weighted by the derivatives of the utility function. Scott and Horvath (1980) have put forward that, under the assumptions of positive marginal utility, decreasing absolute risk aversion at all wealth levels and strict consistency of moment preferences, one has:
$$\begin{cases} U^{(k)}(W) > 0 \quad \forall W & \text{if } k \text{ is odd} \\ U^{(k)}(W) < 0 \quad \forall W & \text{if } k \text{ is even} \end{cases}$$
Therefore 3.3 can be conveniently rewritten in terms of a risk aversion coefficient ($\lambda$) and a preference-for-skewness coefficient ($\gamma$), so that the ensuing optimization problem reads:
$$\max_{\boldsymbol{w}^{\top}\boldsymbol{1} = 1,\; \boldsymbol{w} \geq \boldsymbol{0}} \;\; \boldsymbol{w}^{\top} m_p - \frac{\lambda}{2}\,\boldsymbol{w}^{\top} V_p\, \boldsymbol{w} + \frac{\gamma}{6}\,\boldsymbol{w}^{\top} S_p\,(\boldsymbol{w} \otimes \boldsymbol{w}) \qquad (3.4)$$
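To make the optimization in 3.4 concrete, the following R sketch maximizes the third-order objective under the full-investment and no-short-selling constraints using the alabama package (the same package used for the application of Chapter 4). The inputs mp, Vp, Sp stand for the predictive moments, and the helper name solve_mvs is an illustrative assumption rather than the original implementation.

```r
## Illustrative sketch of problem (3.4); mp (d-vector), Vp (d x d) and
## Sp (d x d^2 tensor matrix) are assumed to be available.
library(alabama)

mvs_objective <- function(w, mp, Vp, Sp, lambda, gamma) {
  # negative of (3.4), since auglag() minimizes
  -(sum(w * mp)
    - (lambda / 2) * drop(t(w) %*% Vp %*% w)
    + (gamma / 6) * drop(t(w) %*% Sp %*% kronecker(w, w)))
}

solve_mvs <- function(mp, Vp, Sp, lambda, gamma) {
  d <- length(mp)
  fit <- auglag(par = rep(1 / d, d),                 # start at equal weights
                fn  = mvs_objective,
                heq = function(w, ...) sum(w) - 1,   # full investment: w'1 = 1
                hin = function(w, ...) w,            # no short selling: w >= 0
                mp = mp, Vp = Vp, Sp = Sp, lambda = lambda, gamma = gamma,
                control.outer = list(trace = FALSE))
  fit$par
}
```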
3.2 Skewed-elliptical models
Skewness is an indicator of the asymmetry of a distribution. In a financial return distribution it is a fundamental feature, since it can indicate a non-negligible likelihood of extreme events such as economic downturns (or upturns). The existence of skewed distributions in the financial markets makes it unreasonable to rely solely on symmetric laws, which do not capture potential skewness revealed by the data. As pointed out in Sahu (2003), the class of multivariate skewed elliptical distributions is convenient for modeling financial markets, which are known to be non-normal and affected by turbulence on both the left and the right side. Azzalini (1996) was among the first to study extensions of the normal distribution incorporating skewness. Since then there has been a vast literature studying skewed models and proposing effective solutions for their definition and implementation (see Ferreira (2007) for recent developments). One effective class of skewed distributions has been developed in Sahu (2003). The distributions within this class have the fundamental advantage of being practically implementable, since they are defined through a convenient hierarchical
structure. The definition of the elements within this class is based on the distribution of a generic multivariate symmetric elliptical random vector. Suppose $\Sigma$ is a $d \times d$ positive definite matrix and $\mu$ is a vector in $\mathbb{R}^d$. The random vector $\boldsymbol{X}$ is elliptically distributed with location parameter $\mu$ and scale parameter $\Sigma$,
$$\boldsymbol{X} \sim El(\mu, \Sigma; g^{(d)}) \qquad (3.5)$$
if its probability density function reads
$$f(\boldsymbol{x} \mid \mu, \Sigma; g^{(d)}) = |\Sigma|^{-\frac{1}{2}}\, g^{(d)}\big( (\boldsymbol{x}-\mu)^{\top} \Sigma^{-1} (\boldsymbol{x}-\mu) \big), \qquad \boldsymbol{x} \in \mathbb{R}^d \qquad (3.6)$$
where $g^{(d)}(u)$ is a mapping $g^{(d)}: \mathbb{R}^+ \to \mathbb{R}^+$ defined by
$$g^{(d)}(u) = \frac{\Gamma(d/2)}{\pi^{d/2}}\; \frac{g(u; d)}{\int_0^{\infty} r^{\frac{d}{2}-1}\, g(r; d)\, dr} \qquad (3.7)$$
and where $g(u; d)$ is a function $g: \mathbb{R}^+ \to \mathbb{R}^+$ such that the integral in the denominator of 3.7 exists. Then a skewed random vector $\boldsymbol{Y}$ within the class of Sahu is defined as
$$\boldsymbol{Y} = \boldsymbol{D}\boldsymbol{Z} + \boldsymbol{\epsilon} \qquad (3.8)$$
where
$$\boldsymbol{Z} \sim El_d(\boldsymbol{0}, \boldsymbol{I}; g^{(d)}), \qquad \boldsymbol{\epsilon} \sim El_d(\mu, \Sigma; g^{(d)}) \qquad (3.9)$$
Here $\boldsymbol{D}$ is a $d \times d$ (diagonal) matrix controlling the skewness of the distribution, and $(\mu, \Sigma)$ are the canonical location and scale parameters of the underlying symmetric elliptical distribution. Marginalizing $\boldsymbol{Y}$ with respect to $\boldsymbol{Z}\,\mathbb{I}_{(\boldsymbol{Z}>\boldsymbol{0})}$, i.e. integrating out the latent vector restricted to the positive orthant, we obtain the desired generic multivariate skewed elliptical distribution:
$$\boldsymbol{Y} \sim SE_d(\mu, \Sigma, \boldsymbol{D}; g^{(d)}) \qquad (3.10)$$
3.2.1 Skewed-normal model
To obtain the skewed-normal distribution it is sufficient to set
$$g^{(d)}(u) = (2\pi)^{-d/2} \exp(-u/2) \qquad (3.11)$$
in 3.9, which amounts to assuming a multivariate normal distribution for both the error term $\boldsymbol{\epsilon}$ and the latent variable $\boldsymbol{Z}$ in 3.8. Marginalizing with respect to the distribution of $\boldsymbol{Z}$ truncated to the positive real axis we have (see Sahu (2003) for a demonstration)
$$\boldsymbol{Y} \sim SN(\mu, \Sigma, \boldsymbol{D}) \qquad (3.12)$$
In the original derivation of Sahu et al. (2003) the matrix $\boldsymbol{D}$ is diagonal (for a generalization to a full-rank matrix see Harvey (2010)). Here we retain the original definition, with a diagonal skewness matrix denoted by $\boldsymbol{D}(\delta)$. A $d$-dimensional random vector $\boldsymbol{Y}$ follows a $d$-variate skewed-normal distribution if its pdf is given by
$$\begin{aligned} f(\boldsymbol{y} \mid \mu, \Sigma, \boldsymbol{D}(\delta)) = 2^d\, &\big|\Sigma + \boldsymbol{D}(\delta)^2\big|^{-\frac{1}{2}}\; \phi_d\Big( \big(\Sigma + \boldsymbol{D}(\delta)^2\big)^{-\frac{1}{2}} (\boldsymbol{y} - \mu) \Big) \\ \times\; &\Phi_d\Big( \big(\boldsymbol{I} - \boldsymbol{D}(\delta)^{\top} (\Sigma + \boldsymbol{D}(\delta)^2)^{-1} \boldsymbol{D}(\delta)\big)^{-\frac{1}{2}}\, \boldsymbol{D}(\delta)^{\top} \big(\Sigma + \boldsymbol{D}(\delta)^2\big)^{-1} (\boldsymbol{y} - \mu) \Big) \end{aligned} \qquad (3.13)$$
where $\phi_d$ is the multivariate normal density function with mean zero and identity covariance, and $\Phi_d$ is the multivariate normal cumulative distribution function, also with mean zero and identity covariance. The mean and the covariance matrix are given by
$$E(\boldsymbol{Y}) = \mu + \sqrt{\tfrac{2}{\pi}}\,\delta, \qquad \mathbb{C}(\boldsymbol{Y}) = \Sigma + \Big(1 - \tfrac{2}{\pi}\Big)\,\boldsymbol{D}(\delta)^2 \qquad (3.14)$$
Note that when $\delta = 0$ the SN distribution reduces to the usual normal distribution. Concerning the third-order moment of the SN distribution, when $\boldsymbol{D}$ is diagonal it can be represented by a $d \times d^2$ tensor matrix with non-zero entries only at the $(i,i,i)$-th coordinates, given by
$$s^{(3)}_{iii}(\boldsymbol{Y}) = \sqrt{\frac{2}{\pi^3}}\,(4 - \pi)\,\delta_i^3 \qquad (3.15)$$
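The stochastic representation 3.8 also yields a direct simulation recipe. The following R sketch, a minimal illustration assuming the normal kernel 3.11, a diagonal skewness matrix with diagonal vector delta and $d \geq 2$, generates draws from $SN(\mu, \Sigma, \boldsymbol{D}(\delta))$; the function name rskewnormal is hypothetical.

```r
## Minimal sketch of (3.8) under (3.11): Y = mu + D(delta)|Z| + eps,
## where |Z| is a componentwise truncated (half-) standard normal vector.
rskewnormal <- function(n, mu, Sigma, delta) {
  d <- length(mu)
  Z   <- abs(matrix(rnorm(n * d), n, d))       # Z | Z > 0, componentwise
  eps <- MASS::mvrnorm(n, rep(0, d), Sigma)    # symmetric (normal) error term
  sweep(Z %*% diag(delta) + eps, 2, mu, "+")   # rows are draws of Y
}
```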
The SN model has the desirable property that marginal distributions of subsets of skew-normal variables are skew normal (see Sahu (2003) for a proof). Unlike in the multivariate normal case, linear combinations of variables from a multivariate skew-normal density are not skew normal. While the skew normal is similar in concept to a mixture of normal random variables, it is fundamentally different: a mixture takes on the value of one of the underlying distributions with some probability, and a mixture of normal random variables results in a Lévy stable distribution. The skew normal is not a mixture of normal distributions; rather, it is the sum of two normal random variables, one of which is truncated, and it results in a distribution that is not Lévy stable. Though it is not stable, the skew normal has several attractive properties. Because it is the sum of two distributions it can, for instance, accommodate heavy tails, and the marginal distribution of any subset of assets is also skew normal. This is important in the portfolio selection setting because it ensures consistency in selecting optimal portfolio weights. For example, with short selling not allowed, if the optimal portfolio weights for a set of assets are such that the weight is zero for one of the assets, then removing that asset from the selection process and re-optimizing will not change the portfolio weights for the remaining assets.
3.3 Simulation-based inference
The resort to simulation-based algorithms is dictated by the complexity of the posterior distributions ensuing from the assumption of a skewed-normal model for the sample data. The main idea behind simulation-based algorithms may be summarized by the brief historical note reported by Andrieu (2003):
While convalescing from an illness in 1946, Stan Ulam was playing solitaire. It, then, occurred to him to try to compute the chances that a particular solitaire laid out with 52 cards would come out successfully. After attempting exhaustive combinatorial calculations, he decided to go for the more practical approach of laying out several solitaires at random and then observing and counting the number of successful plays. This idea of selecting a statistical sample to approximate a hard combinatorial problem by a much simpler problem is at the heart of modern Monte Carlo simulation.
Formally, the idea of Monte Carlo simulation is to draw an IID set of samples $\{x^{(i)}\}_{i=1}^{N}$ from a target density $\pi(x)$ defined on a high-dimensional space $\mathcal{X}$ (e.g. the support of the random variable with pdf $\pi(\cdot)$). These $N$ samples can be used to approximate the target density with the following empirical point-mass function
$$\pi_N(x) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x^{(i)}}(x) \qquad (3.16)$$
where $\delta_{x^{(i)}}(x)$ denotes the Dirac delta mass located at $x^{(i)}$.
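As an elementary illustration of 3.16 (not part of the original text), the following R fragment approximates $E[h(X)]$ for $X \sim N(0,1)$ and $h(x) = x^2$ by an empirical average over IID draws:

```r
## Plain Monte Carlo: approximate E[X^2] = 1 for X ~ N(0, 1)
set.seed(1)
N <- 1e5
x <- rnorm(N)   # IID draws from the target density
mean(x^2)       # empirical average; close to 1 for large N
```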
Simulation-based approaches turn out to be preferable to deterministic numerical techniques when the researcher needs to study the details of a likelihood surface or posterior distribution, or needs to simultaneously estimate several features of these functions. The fundamental ingredient of Monte Carlo simulation is the ability to draw uniform pseudo-random values, a feature by now implemented in most computer packages. Indeed, starting from this basic random simulation it is possible to derive draws from the most common probability distributions, since those distributions can be represented as deterministic transformations of uniform random variables. Nevertheless, there are many distributions from which it is difficult, or even impossible, to simulate directly starting from uniform random deviates. Moreover, in some cases we are not even able to represent the distribution in a usable form, such as a transformation or a mixture. In such settings it is possible to turn to another class of simulation techniques, an extension of MC methods named MCMC (Markov Chain Monte Carlo), which adopts the following strategy: it generates samples $x^{(i)}$ while exploring the state space $\mathcal{X}$ using a Markov chain mechanism. This mechanism is constructed so that the chain spends more time in the most important regions; in particular, it is constructed so that the samples $x^{(i)}$ mimic samples drawn from the target distribution $\pi(x)$.
3.3.1 Gibbs sampler
The use of Markov Chain Monte Carlo (MCMC) most often arises in cases where the construction of an IID sample of points under a target distribution is impossible. An MCMC algorithm then aims to generate values from an arbitrary target distribution through the trajectory of a Markov chain whose stationary distribution is precisely the target function. The main idea underlying MCMC is to choose an easy-to-handle proposal distribution, simulate from it, and accept or reject the proposed value depending on how likely it is that the value has been generated from the target distribution. This scheme is very general and its most popular implementation is known as the Metropolis-Hastings (MH) algorithm. MH imposes minimal regularity conditions on both the target and the proposal distribution in play, which makes it a universal algorithm, applicable to a multitude of scenarios. Almost every MCMC algorithm can be thought of as a special case of MH. Still, the use of more specialized methods may be preferable due to peculiarities which better suit the problem under exam. This is the case of the Gibbs sampler, which is particularly convenient for the inference of multivariate Bayesian hierarchical models. Suppose we want to draw simulations from a multivariate random vector
$$\boldsymbol{X} = (X_1, \ldots, X_p) \qquad (3.17)$$
where the $X_i$'s are either uni- or multidimensional. Suppose, moreover, that we are not able to draw simulations from its probability density function $f$, but we can simulate from the full conditionals $f_i(\cdot)$ $(i = 1, \ldots, p)$ of $\boldsymbol{X}$, defined as the probability density functions of the single components of the random vector conditional on all the remaining elements:
$$X_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_p \;\sim\; f_i(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_p), \qquad i = 1, \ldots, p \qquad (3.18)$$
Following Robert (2004), the associated Gibbs sampler is given by: given $\boldsymbol{x}^{(t)} = (x_1^{(t)}, \ldots, x_p^{(t)})$, generate
1. $X_1^{(t+1)} \sim f_1(x_1 \mid x_2^{(t)}, \ldots, x_p^{(t)})$;
2. $X_2^{(t+1)} \sim f_2(x_2 \mid x_1^{(t+1)}, x_3^{(t)}, \ldots, x_p^{(t)})$;
$\;\;\vdots$
p. $X_p^{(t+1)} \sim f_p(x_p \mid x_1^{(t+1)}, \ldots, x_{p-1}^{(t+1)})$.
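As a minimal illustration of this scheme (an addition to the original text), consider a bivariate normal target with unit variances and correlation $\rho$, for which both full conditionals are univariate normal, e.g. $X_1 \mid x_2 \sim N(\rho x_2,\, 1 - \rho^2)$:

```r
## Two-component Gibbs sampler for a standard bivariate normal with cor rho
set.seed(1)
rho <- 0.8; n_iter <- 10000
x <- matrix(0, nrow = n_iter, ncol = 2)
for (t in 2:n_iter) {
  x[t, 1] <- rnorm(1, rho * x[t - 1, 2], sqrt(1 - rho^2))  # draw from f_1
  x[t, 2] <- rnorm(1, rho * x[t, 1],     sqrt(1 - rho^2))  # draw from f_2
}
cor(x[, 1], x[, 2])   # close to rho once the chain has converged
```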
The transition from $\boldsymbol{X}^{(t)}$ to $\boldsymbol{X}^{(t+1)}$ described by this algorithm builds a Markov chain whose stationary probability distribution exists by construction and is $f$. Under fairly general conditions the chains produced by this algorithm are ergodic, so the use of the chain is fundamentally identical to the use of an IID sample from $f$, in the sense that the empirical average converges to the actual expected value:
$$\frac{1}{T} \sum_{t=1}^{T} h(X^{(t)}) \longrightarrow E[h(X)] \qquad (3.19)$$
The actual convergence of the chain to the target distribution is guaranteed by specific properties of the chain, such as irreducibility, recurrence and aperiodicity, which can be assessed empirically using the R package CODA.
3.3.2 Sampling from the Bayesian skewed-normal model
Since the parameters in 3.13 are not known in practice, they should be treated as random variables in order to account for the uncertainty about their true values in the optimization process. For this task one possible solution is to build a conjugate Bayesian model with low confidence in the prior parameters (unless subjective information is available). Let $(\theta, \Sigma)$ be the parameters of interest, where $\theta^{\top} = (\mu^{\top}, \text{vec}(\boldsymbol{D})^{\top})$ and $\text{vec}(\cdot)$ forms a vector by stacking the columns of a matrix. In a non-informative setting the conjugate model may read:
$$\theta \sim N_{d(d+1)}\big(\boldsymbol{0},\, 100\,\boldsymbol{I}_{d(d+1)}\big), \qquad \Sigma \sim IW(d,\, d\,\boldsymbol{I}_d) \qquad (3.20)$$
The ensuing posterior distributions $[\theta \mid \boldsymbol{y}]$ and $[\Sigma \mid \boldsymbol{y}]$ are not analytically tractable, but they can be approximated by means of stochastic simulation methods. In a Gibbs sampling framework, what is needed to obtain multivariate draws from this random vector is the ability to draw from the full conditionals $[\theta \mid \boldsymbol{y}, \Sigma]$ and $[\Sigma \mid \boldsymbol{y}, \theta]$. Combining the posterior distributions it is possible to obtain estimates of the mean, the covariance matrix and the $(i,i,i)$-th entries of the tensor matrix using formulas 3.14 and 3.15. Since in a Bayesian framework the expected utility is taken with respect to the posterior predictive distribution, what is actually needed to run the optimization are the predictive moments $m_p$, $V_p$ and $S_p$, which can be written in terms of the posterior means of the parameters as shown in Harvey (2010):
$$\begin{cases} m_p = \bar{m} \\[2pt] V_p = \bar{V} + \mathbb{V}(m \mid \boldsymbol{y}) \\[2pt] S_p = \bar{S} + 3\,E(V \otimes m \mid \boldsymbol{y}) - 3\,E(V \mid \boldsymbol{y}) \otimes m_p + E\big[(m - m_p)(m - m_p)^{\top} \otimes (m - m_p)^{\top} \,\big|\, \boldsymbol{y}\big] \end{cases} \qquad (3.21)$$
where bars denote posterior expectations (e.g. $\bar{V} = E(V \mid \boldsymbol{y})$) and $\mathbb{V}(m \mid \boldsymbol{y})$ is the posterior covariance matrix of $m$.
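In practice these quantities can be estimated from the MCMC output by averaging the per-draw model moments. The following R sketch computes $m_p$ and $V_p$ along these lines ($S_p$ is omitted for brevity); the inputs mu_draws, delta_draws ($N \times d$ matrices) and Sigma_draws (a $d \times d \times N$ array) are hypothetical names for the posterior draws.

```r
## Hedged sketch of the first two predictive moments in (3.21)
predictive_mV <- function(mu_draws, Sigma_draws, delta_draws) {
  m_draws <- mu_draws + sqrt(2 / pi) * delta_draws   # per-draw mean, eq. (3.14)
  N <- nrow(m_draws)
  V_draws <- Sigma_draws
  for (i in 1:N)                                     # per-draw covariance, eq. (3.14)
    V_draws[, , i] <- Sigma_draws[, , i] + (1 - 2 / pi) * diag(delta_draws[i, ]^2)
  list(mp = colMeans(m_draws),                            # m_p = E(m | y)
       Vp = apply(V_draws, c(1, 2), mean) + cov(m_draws)) # V_p = Vbar + V(m | y)
}
```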
Due to the non-standard form of the likelihood of the SN model, the posterior distributions cannot be simulated directly by a common random generator, and one would need to approximate them with an embedded MCMC step within the main iteration cycle¹. Nevertheless, the full conditionals become known probability laws (due to the conjugacy) once they are conditioned on the auxiliary random variable $\boldsymbol{Z}$, which is the reason why the distributions within the class of Sahu (2003) are easy to implement. The Gibbs sampler algorithm presented above can be generalized by a "demarginalisation" (Robert, 2004) construction. This practice can be referred to as data augmentation, and it represents the fundamental idea behind the construction of the skewed-elliptical class. Given two probability density functions $f$ and $g$ such that
$$\int_{\mathcal{Z}} g(x, z)\, dz = f(x) \qquad (3.22)$$
the density $g$ is said to be the completion of $f$, and it can be chosen in such a way that the full conditionals of $g$ are easy to simulate from. Following this scheme it is straightforward to obtain draws from the marginal posteriors of $\theta$ and $\Sigma$.
¹ These methodologies can be referred to as hybrid Gibbs samplers.
Chapter 4
Hedge fund portfolio application
This example addresses the new perspectives opened by simulation-based methods for solving portfolio optimization problems that adequately account for estimation risk and for a non-normal distribution of the reference market. The dataset used has been downloaded from the HFRI website and is composed of HFRI hedge fund indices. The methodology used to construct the HFRI Hedge Fund Indices is based on defined and predetermined rules and objective criteria to select and rebalance components so as to maximize representation of the Hedge Fund Universe¹.
4.1 Data description
Hedge funds are private investment vehicles whose manager is free to operate in a variety of markets using investment strategies not restricted in short exposures or leverage. A traditional mutual fund, by contrast, can be characterized as operating in equity and/or bond markets, with a buy-and-hold strategy and no leverage. Hedge funds offer more variety than traditional mutual funds, and therefore the hedge fund universe is usually segmented into styles. The hedge fund return-generating process is strictly linked to both the location and the style or strategy followed by the manager. Due to the strategies used, hedge fund returns typically exhibit strong deviations from the normal distribution. Here we consider four indices provided by HFRI which represent different investment strategies. The dataset contains monthly performance observations from August-2001 to August-2011.
HFRX Equity Hedge Index This index is a proxy for the performance of hedge fund managers who maintain both long and short positions in primarily equity and equity derivative securities. EH managers would typically maintain at least 50% exposure to, and may in some cases be entirely invested in, equities, both long and short.
¹ Refer to https://www.hedgefundresearch.com/ for more details on the data used in this study.
HFRX Event-Driven Index This index is a proxy for the performance of hedge funds whose managers maintain positions in companies currently or prospectively involved in corporate transactions of a wide variety, including but not limited to mergers, restructurings, financial distress, tender offers, shareholder buybacks, debt exchanges, security issuance or other capital structure adjustments. Security types can range from most senior in the capital structure to most junior or subordinated, and frequently involve additional derivative securities. Event-Driven exposure includes a combination of sensitivities to equity markets, credit markets and idiosyncratic, company-specific developments.
HFRI Relative Value Index This index is a proxy for the performance of hedge funds whose managers maintain positions in which the investment thesis is predicated on the realization of a valuation discrepancy in the relationship between multiple securities. Managers employ a variety of fundamental and quantitative techniques to establish investment theses, and security types range broadly across equity, fixed income, derivative or other security types.
HFRI Macro Index This index is a proxy for the performance of hedge funds whose managers trade a broad range of strategies in which the investment process is predicated on movements in underlying economic variables and the impact these have on equity, fixed income, hard currency and commodity markets. Managers employ a variety of techniques, both discretionary and systematic analysis, combinations of top-down and bottom-up theses, quantitative and fundamental approaches, and long- and short-term holding periods.
Figure 4.1. Hedge Fund Strategies Rates-of-Return from August-2001 to August-2011 (four panels: EqH, RelVal, EvDriven, Macro).
4.1.1 Univariate statistics
We first investigate the statistical properties of the marginal distributions of the four time series of hedge fund performance data. In Table 4.1 we report the mean, standard deviation and skewness of the single variables. Except for the Macro strategy, they all exhibit asymmetry in the left tail, that is, an "abnormal" concentration of very poor performances with respect to exceptional ones. In other terms, the first three distributions are left-skewed, which means that an extremely low performance is far more likely than an extremely high one. Recalling that the dataset covers the turbulent period of the global financial recession, these results are not surprising. On the other hand, the Macro strategy exhibits a slightly right-skewed distribution, the result of a good response to the crisis, during which the index did not fall dramatically. Asymmetries are rather common in financial markets and especially in the hedge fund environment, given that the strategies are usually built to outperform (rather than mimic, as is often the case for a mutual fund) a benchmark index; this is done by biasing the allocations, following theses which try to discover hidden value and inevitably lead to strong departures from neutral positions.
Table 4.1. Mean, standard deviation and skewness for the monthly rate-of-return time series from August 2001 to August 2011

            EqH    RelVal  EvDriven  Macro
mean        0.44    0.54     0.61     0.65
sd          2.44    1.38     2.02     1.53
skewness   -0.97   -2.85    -1.25     0.22
In Figure 4.2 the histograms clearly show these features and the ensuing poor fit provided by normal laws in the tails of the empirical distributions. To test normality, in Table 4.2 we report the Shapiro-Wilk statistics and the corresponding p-values, testing the null hypothesis that a sample came from a normally distributed population. Given the p-values close to zero for the first three distributions, we can safely reject the assumption of normality, while it cannot be rejected for the Macro index. The latter distribution is indeed the one that best resembles a bell curve, and its (positive) asymmetry is not pronounced enough to produce a negative normality test. Nevertheless, considered jointly, the non-normality is rather pronounced in this dataset.
Table 4.2. Shapiro-Wilk normality test

           Statistic   p-value
EqH         0.9479     0.000142
RelVal      0.7614     9.656e-13
EvDriven    0.9261     5.140e-06
Macro       0.9238     0.8712
Figure 4.2. Univariate histograms, kernel densities (blue solid lines) and fitted normal densities (red dashed lines) for EqH, RelVal, EvDriven and Macro.
The QQ-plots corroborate the results obtained with the normality tests.
4.1.2 Multivariate statistics
To test the plausibility of the assumptions underlying the mean-variance framework it is necessary to go beyond univariate statistics and check how the samples co-move. In Figure 4.4 multivariate normality is tested graphically using a QQ-plot of the Mahalanobis distance $d^2 = (\boldsymbol{x} - \mu)^{\top} \Sigma^{-1} (\boldsymbol{x} - \mu)$, which in the case of multivariate normality has an asymptotic chi-square distribution. In addition, a multivariate version of the Shapiro test is provided in Table 4.3. Both the graphical check and the hypothesis test lead to a clear rejection of the multivariate normal distribution.
Table 4.3. Multivariate Shapiro test

Statistic   p-value
 0.7319     1.424e-13
To further investigate the multivariate characteristics of the sample, in Figure 4.5 we report the fitted bivariate normal level curves against the actual data points.
Figure 4.3. QQ-plots for testing univariate normality: EqH, RelVal, EvDriven and Macro sample quantiles against theoretical normal quantiles.
As we can see, the bivariate normal level curves fail to include many points, which lie in the corners of the graphs. In particular, the scatterplots including the Macro index contain points in the top-right corner, meaning that extreme positive events for this index are correlated with exceptional performances of the remaining indexes. Moreover, the graphs not including the Macro index exhibit the opposite behavior, that is, a left-tail dependence. This can be explained by the existence of co-skewness and/or much fatter tails than the normal approximation allows. The graphical evidence may therefore suggest considering more complex models, accounting for co-skewness and/or kurtosis in the sample data. These models will not be considered here.
4.2 Model implementation
The empirical evidence of a non-normal market is formalized by adapting the skewed-normal model 3.12-3.13 to the data. In order to account for estimation risk and coherently base the final optimal decision on the (posterior) predictive distribution of the investor's objective, we choose a Bayesian model within the non-informative framework 3.20. To obtain the needed posterior distributions for the parameters of interest we implement the model using the BUGS software. We ran a total of 150000 iterations, discarding the first fifty thousand (burn-in = 50000) and keeping one iteration out of five (thin = 5), for a total of 20000 final draws from the posterior distributions. Two chains are run in parallel in order to check for convergence. In Figures 4.6-4.7 we report the traceplots of the pairs of parallel chains obtained for the posterior mean $[\boldsymbol{m} \mid \boldsymbol{y}]$ and variances $[\boldsymbol{V} \mid \boldsymbol{y}]$ of the model. These chains are obtained by combining the chains of $\mu$ and $\boldsymbol{D}(\delta)$ according to the formulas for the moments of the skewed-normal distribution given in 3.14.
Figure 4.4. QQ-plot of the Mahalanobis distances in the sample data against theoretical $\chi^2_4$ quantiles.
In both figures the parallel chains are fairly identical, indicating that they have "forgotten" their initial values (i.e. the chains suggest recurrence). Besides, the chains exhibit good mixing, indicating that they switch from one state to another easily, possibly visiting all the important regions of the support (i.e. the chains suggest irreducibility). These graphical diagnostics are important to assess the convergence of the MCMC algorithm and to justify the use of the simulated draws as an IID sample from the posterior distribution of interest. Along with the graphical diagnostics there exist more quantitative tools to assess the convergence of the algorithm; these statistical tests are provided in Appendix A, while the summaries of the MCMC posteriors are listed in Table 4.4.
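A typical post-processing step, sketched below under the assumption that the two BUGS chains have been imported into R as the matrices chain1 and chain2 (one column per parameter; both names are hypothetical), uses the coda package to produce exactly the kind of summaries reported in Table 4.4:

```r
## Combine the parallel chains and summarize the posteriors with coda
library(coda)
chains <- mcmc.list(mcmc(chain1, thin = 5), mcmc(chain2, thin = 5))
summary(chains)   # means, SDs, naive and time-series SEs (cf. Table 4.4)
plot(chains)      # traceplots and kernel densities (cf. Figures 4.6-4.9)
```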
4.3 Portfolio weights
Since we are primarily concerned with measuring the effect of including asymmetry in the asset allocation, we assume the third-order expected utility presented in 3.4. The predictive moments needed to compute such expected utility are retrieved from the (MCMC) posterior means using formulas 3.14-3.15 and 3.21. The expected utility is then maximized using numerical methods implemented in the software R². In Figures 4.10a-4.10b and 4.11a-4.11b we report the barplots of the maximum expected utility allocation for an investor with fixed risk aversion coefficient and varying preference for skewness, for different values of the risk aversion coefficient. The left-most bar corresponds to the allocation with zero preference for skewness, and thus coincides with the mean-variance allocation.
² The package used to run the non-linear constrained optimization is named alabama.
Figure 4.5. Fitted bivariate normal level curves (up to the 99-th percentile) against scatter-plots of actual data points for all six pairs of variables.
As the investor becomes more inclined to trade skewness, the allocation departs from the mean-variance one, favoring the asset with the highest skewness value (the Macro index in this case). This behavior is clearer when the preference-for-skewness coefficient is large relative to the risk aversion coefficient.
4.4 Out-of-sample performance
To empirically test the strategy that explicitly includes the third-order predictive moment in the allocation decision, we ran an out-of-sample performance analysis over 60 months, using a rolling estimation window of 60 monthly observations (i.e. 5 years). For each period we derive the minimum-variance portfolio and the best variance-skewness portfolio with identical coefficients for risk aversion and skewness preference ($\lambda = \gamma = 10$). This means that we are neglecting the contribution of the expected value here. Therefore the two investors are assumed to be indifferent with respect to the expected return of the investment, while they are fully concerned about its risk, identified with the variance for the MV investor and with a combination of variance and skewness for the skewness-sensitive investor.
Figure 4.6. Traceplots of two parallel MCMC simulations from the posterior distribution of the mean vector $m = \mu + \sqrt{2/\pi}\,\delta$: $m_1$ (top-left), $m_2$ (top-right), $m_3$ (bottom-left), $m_4$ (bottom-right).
It is worth noting that we are considering agents with an investment horizon of one month, which is a reasonable holding period in an asset allocation problem. In Figure 4.14 we report the sixty realized returns obtained by the two investors and the development of the compounded return assuming an initial budget of 1. The graphs are positioned horizontally and share the same scale, so it is easy to check that the mean-variance investor incurs much lower realized returns during the five years of trading, which do not seem to be compensated by much greater realized returns relative to the skewness-sensitive investor. In terms of compounded returns the picture is even clearer, showing how the mean-variance strategy can be very unstable, with very high profits followed by dramatic losses, while accounting for skewness turns out, in this case, to produce a much calmer strategy, immune to big downturns of the economy, at least relative to the Markowitz strategy. In Figure 4.15 we plot the compounded returns at a different scale, and in a single graph, to show how the MV strategy starts to lose earlier and how the MVS strategy follows to some extent the behavior of the MV strategy, but with a much higher degree of immunization from extreme left-tail events.
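The structure of this rolling exercise can be summarized by the following R skeleton; returns stands for the full data matrix, and fit_sn_posterior() and solve_mvs() are hypothetical helpers wrapping the MCMC fit and the optimizer sketched in Chapter 3:

```r
## Skeleton of the 60-month rolling out-of-sample exercise
W <- 60; n_oos <- 60
realized <- numeric(n_oos)
for (t in seq_len(n_oos)) {
  window  <- returns[t:(t + W - 1), ]        # 5-year estimation window
  moments <- fit_sn_posterior(window)        # predictive mp, Vp, Sp via MCMC
  w <- solve_mvs(moments$mp, moments$Vp, moments$Sp, lambda = 10, gamma = 10)
  realized[t] <- sum(w * returns[t + W, ])   # next-month realized return
}
compounded <- cumprod(1 + realized / 100)    # budget of 1; returns in percent
```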
Figure 4.7. Traceplots of two parallel MCMC simulations from the posterior distribution of the diagonal components of the covariance matrix $\Sigma + (1 - 2/\pi)\boldsymbol{D}(\delta)^2$: $V_{11}$ (top-left), $V_{22}$ (top-right), $V_{33}$ (bottom-left), $V_{44}$ (bottom-right).
Figure 4.8. Kernel densities of the aggregated MCMC chain for $\boldsymbol{m}$: $m_1$ (top-left), $m_2$ (top-right), $m_3$ (bottom-left), $m_4$ (bottom-right).
Figure 4.9. Kernel densities of the aggregated chain for the diagonal components of $\boldsymbol{V}$: $V_{11}$ (top-left), $V_{22}$ (top-right), $V_{33}$ (bottom-left), $V_{44}$ (bottom-right).
Figure 4.10. Asset allocation for increasing values of the preference for skewness (abscissa, $\gamma$ from 0 to 30) and fixed aversion to risk: $\lambda = 5$ (a), $\lambda = 10$ (b); bars show the weights of EqH, RelVal, EvDriven and Macro.
Table 4.4. WinBUGS summary statistics for the parameters of interest

             Mean      SD    Naive SE  Time-series SE
μ1          -0.41     0.63     0.01        0.08
μ2          -0.96     0.14     0.00        0.00
μ3          -0.81     0.24     0.00        0.02
μ4           1.68     0.60     0.01        0.06
Σ11          5.83     0.79     0.01        0.01
Σ12          2.45     0.37     0.00        0.01
Σ13          4.62     0.63     0.01        0.01
Σ14          1.90     0.41     0.00        0.01
Σ21          2.45     0.37     0.00        0.01
Σ22          1.28     0.24     0.00        0.00
Σ23          2.07     0.31     0.00        0.00
Σ24          0.61     0.20     0.00        0.00
Σ31          4.62     0.63     0.01        0.01
Σ32          2.07     0.31     0.00        0.00
Σ33          3.85     0.54     0.01        0.01
Σ34          1.35     0.33     0.00        0.00
Σ41          1.90     0.41     0.00        0.01
Σ42          0.61     0.20     0.00        0.00
Σ43          1.35     0.33     0.00        0.00
Σ44          1.41     0.39     0.00        0.01
δ1           0.76     0.53     0.01        0.06
δ2           1.28     0.15     0.00        0.01
δ3           1.25     0.26     0.00        0.02
δ4          -0.70     0.49     0.00        0.05
deviance  1692.82   107.85     1.08        4.51
Figure 4.11. Asset allocation for increasing values of the preference for skewness (abscissa, $\gamma$ from 0 to 30) and fixed aversion to risk: $\lambda = 20$ (a), $\lambda = 30$ (b).
Figure 4.12. Minimum-variance asset allocation over a period of five years using a rolling estimation window of W = 60 months: from August-2006 to February-2009 (a), and from March-2009 to August-2011 (b).
Figure 4.13. Minimum-variance-skewness asset allocation with $\lambda = \gamma = 10$ over a period of five years using a rolling estimation window of W = 60 months: from August-2006 to February-2009 (a), and from March-2009 to August-2011 (b).
Figure 4.14. Out-of-sample analysis: realized returns for the MV (top-left) and MVS (top-right) strategies, and compounded returns for the MV (bottom-left) and MVS (bottom-right) strategies, over the period 08/2006 - 08/2011.
Figure 4.15. Compounded returns at a lower scale: MVS profit & loss (dark red solid line), MV profit & loss (green points).
Chapter 5
Conclusions
The analysis provided in this study contributes to the empirical evidence on the need to account for higher-order moments in asset allocation decisions. This is done by elaborating an in-sample and an out-of-sample analysis for a real hedge-fund dataset. The computational issues arising when departing from the standard normal model are overcome using a simulation-based algorithm able to provide valid approximations to otherwise intractable inferential problems. Besides, the thesis argues in favor of the Bayesian paradigm as a firm foundation for the portfolio optimization process, due to the coherent framework it provides for handling statistical decision problems and to its sound generalization of the classical/frequentist inferential procedures, vital for building operative tools to solve complex problems and for including private information in the optimization process. In this thesis some simplifications have been made. First, our information is restricted to past returns; that is, investors make decisions based on past returns and do not use other conditioning information such as macro-economic variables. Second, the portfolio choice problem examined is a static one; there is a growing literature that considers the more challenging dynamic asset allocation problem, which allows portfolio weights to change with investment horizon, labor income and other economic variables. Third, the potential presence of co-skewness and kurtosis in the sample data and among the drivers of investor preferences is not taken into account. Therefore much progress can still be made in future research.
Appendices
Appendix A
MCMC diagnostics
A.1 Gelman-Rubin diagnostic
Gelman and Rubin's (1992) approach to monitoring convergence is based on detecting when the Markov chains have forgotten their starting points, by comparing several sequences drawn from different starting points and checking that they are indistinguishable. Using overlaid traceplots it is possible to gain qualitative information, while this test constitutes a more quantitative approach. Approximate convergence is diagnosed when the variance between the different sequences is no larger than the variance within each individual sequence. In the limit both variances approach the true variance of the distribution, but from opposite directions. One can then monitor the convergence of the Markov chain by estimating the factor by which the conservative estimate of the distribution might be reduced: that is, the ratio between the estimated upper and lower bounds for the standard deviation of the random variable, which is called the estimated potential scale reduction, or shrink factor. As the simulation converges this quantity declines to 1, meaning that the parallel Markov chains are essentially overlapping. If the shrink factor is high, one should proceed with further simulations. The Gelman and Rubin diagnostics calculated by the R package CODA are the 50% and 97.5% quantiles of the sampling distribution of the shrink factor; these quantiles are estimated from the second half of each chain only. In Tables A.1-A.2 and Figures A.1-A.2 we report the results of the Gelman-Rubin diagnostic test for the two parallel chains of the mean vector $\boldsymbol{m}$ and of the diagonal elements of the covariance matrix $\boldsymbol{V}$.
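In coda this diagnostic is available directly; a sketch, reusing the hypothetical mcmc.list object chains built in Section 4.2:

```r
library(coda)
gelman.diag(chains)   # shrink factor: point estimates and upper C.I.
gelman.plot(chains)   # shrink factor vs. iteration (cf. Figures A.1-A.2)
```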
A.2 Geweke diagnostic
Geweke (1992) proposes a convergence diagnostic based on standard time-series methods. The chain is divided into two parts, containing the first 10% and the last 50% of the iterations. If the whole chain is stationary, the means of the values early and late in the sequence should be similar. The convergence diagnostic Z is the difference between the two means divided by the asymptotic standard error of their difference. As $n \to \infty$, the sampling distribution of Z goes to N(0, 1) if the chain has converged; hence values of Z falling in the extreme tails of N(0, 1) indicate that the chain has not yet converged. In Tables A.3-A.4 and Figures A.3-A.4 we report the Geweke diagnostic results for the aggregated chains of the mean vector $\boldsymbol{m}$ and of the diagonal elements of the covariance matrix $\boldsymbol{V}$.
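The corresponding coda calls, again on the hypothetical chains object, would read:

```r
geweke.diag(chains, frac1 = 0.1, frac2 = 0.5)  # Z-scores, one per parameter
geweke.plot(chains)                            # cf. Figures A.3-A.4
```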
Table A.1. Gelman and Rubin convergence diagnostic for $\boldsymbol{m}$: shrink factor point estimate (left column) and upper bound of the confidence interval (right column)

      Point est.  Upper C.I.
m1       1.00        1.01
m2       1.00        1.00
m3       1.00        1.01
m4       1.00        1.00
Figure A.1. Gelman plot for $\boldsymbol{m}$: median and 97.5-th quantile of the shrink factor sampling distribution against the number of iterations.
Table A.2. Gelman-Rubin diagnostic for $\boldsymbol{V}$: shrink factor point estimate (left column) and upper bound of the confidence interval (right column)

      Point est.  Upper C.I.
v1       1.00        1.00
v2       1.00        1.00
v3       1.00        1.00
v4       1.00        1.01
Table A.3. Geweke diagnostic for $\boldsymbol{m}$: average Z-scores

   m1       m2       m3        m4
 0.3163   0.1699   0.1470   -0.1334
Figure A.2. Gelman plot for $\boldsymbol{V}$: median and 97.5-th quantile of the shrink factor sampling distribution against the number of iterations.
Table A.4. Geweke diagnostic for $\boldsymbol{V}$: average Z-scores

   V1       V2       V3       V4
-1.893   -2.911   -1.233    1.014
Figure A.3. Geweke plot for $\boldsymbol{m}$: Z-scores against the number of iterations discarded from the beginning of the chain.
Figure A.4. Geweke plot for $\boldsymbol{V}$: Z-scores against the number of iterations discarded from the beginning of the chain.
Bibliography
Andrieu, C., de Freitas, N., Doucet, A., Jordan, M. I. (2003). An introduction to MCMC for machine learning. Machine Learning, 50:5-43.
Athayde, F. (2004). Finding the maximum skewness portfolio - a general solution to three-moments portfolio choice. Journal of Economic Dynamics and Control.
Azzalini, D. V. (1996). The multivariate skew-normal distribution. Biometrika.
Bernardo, J. (2005). Reference analysis. Handbook of Statistics.
Best, M. J., G. R. (1991). On the sensitivity of mean-variance-efficient portfolios to changes in asset means: some analytical and computational results. Review of Financial Studies.
Castellani, G., De Felice, M., M. F. (2005). Manuale di finanza. Il Mulino.
Chen, L. X. (2011). Mean-variance portfolio optimization when means and covariances are unknown. The Annals of Applied Statistics.
Chopra, V. K., Z. W. (1993). The effect of errors in means, variances, and covariances on optimal portfolio choice. The Journal of Portfolio Management.
De Miguel, V., G. L. U. R. (2007). Optimal versus naive diversification: how inefficient is the 1/N portfolio strategy? The Review of Financial Studies.
Ferguson (1967). Mathematical Statistics. A Decision Theoretic Approach. Academic Press.
Ferreira, S. (2007). A new class of skewed multivariate distributions with applications to regression analysis. Statistica Sinica.
Harvey, C. (2010). Portfolio selection with higher moments. Quantitative Finance.
Jeffreys (1961). Theory of Probability. Clarendon Press.
Mandelbrot, B. (1963). The variation of certain speculative prices. The Journal of Business.
Mandelbrot, B., Hudson, R. (2006). The (Mis)Behavior of Markets. A Fractal View of Financial Turbulence. Basic Books.
Mantegna, R., S. H. (2000). An Introduction to Econophysics: Correlation and Complexity in Finance. Biddies Ltd, Guildford & Kings Lynn.
Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7:77-91.
McNeil, A. J., F. R. E. P. (2005). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press.
Meucci, A. (2005). Risk and Asset Allocation. Springer-Verlag.
Michaud (1989). The Markowitz optimization enigma: Is "optimized" optimal? Financial Analysts Journal.
Piccinato, L. (2009). Metodi per le Decisioni Statistiche. Springer-Verlag.
Polson, T. (2000). Bayesian portfolio selection: an empirical analysis of the S&P 500 index 1970-1996. Journal of Business & Economic Statistics.
Robert, C. P., C. G. (2004). Monte Carlo Statistical Methods. Springer.
Robert, C. P. (2007). The Bayesian Choice. Springer.
Sahu, D. (2003). A new class of multivariate skew distributions with applications to Bayesian regression models. The Canadian Journal of Statistics.
Savage, L. J. (1954). The Foundations of Statistics. John Wiley & Sons.
Scott, R. C., Horvath, P. A. (1980). On the direction of preference for moments of higher order than the variance. The Journal of Finance, 35:915-919.
Von Neumann, J., M. O. (1944). Theory of Games and Economic Behavior. Princeton University Press.
Wald (1950). Statistical Decision Functions. Wiley.
Weitzman (2007). Subjective expectations and asset-return puzzles. American Economic Review.
Xiong, J. X., I. T. (2011). The impact of skewness and fat tails on the asset allocation decision. Financial Analysts Journal.