
Volatility (finance)




In finance, volatility most frequently refers to the standard deviation of the continuously
compounded returns of a financial instrument within a specific time horizon. It is used to
quantify the risk of the financial instrument over the specified time period. Volatility is normally
expressed in annualized terms, and it may either be an absolute number ($5) or a fraction of the
mean (5%).
Volatility terminology
Volatility as described here refers to the actual current volatility of a financial instrument for a
specified period (for example 30 days or 90 days). It is the volatility of a financial instrument
based on historical prices over the specified period with the last observation the most recent
price. This phrase is used particularly when one wishes to distinguish between the actual current
volatility of an instrument and
• actual historical volatility which refers to the volatility of a financial instrument over a
specified period but with the last observation on a date in the past
• actual future volatility which refers to the volatility of a financial instrument over a
specified period starting at the current time and ending at a future date (normally the
expiry date of an option)
• historical implied volatility which refers to the implied volatility observed from
historical prices of the financial instrument (normally options)
• current implied volatility which refers to the implied volatility observed from current
prices of the financial instrument
• future implied volatility which refers to the implied volatility observed from future
prices of the financial instrument
For a financial instrument whose price follows a Gaussian random walk, or Wiener process, the
width of the distribution increases as time increases. This is because there is an increasing
probability that the instrument's price will be farther away from the initial price as time increases.
However, rather than increase linearly, the volatility increases with the square-root of time as
time increases, because some fluctuations are expected to cancel each other out, so the most
likely deviation after twice the time will not be twice the distance from zero.
Since observed price changes do not follow Gaussian distributions, others such as the Lévy
distribution are often used.[1] These can capture attributes such as "fat tails", although the
variance of a Lévy stable distribution with α < 2 is infinite unless the distribution is truncated.
Volatility for market players
When investing directly in a security, volatility is often viewed as a negative in that it represents
uncertainty and risk. However, with other investing strategies, volatility is often desirable. For
example, if an investor is short on the peaks, and long on the lows of a security, the profit will be
greatest when volatility is highest.
In today's markets, it is also possible to trade volatility directly, through the use of derivative
securities such as options and variance swaps. See Volatility arbitrage.
Volatility versus direction
Volatility does not measure the direction of price changes, merely how dispersed they are
expected to be. This is because when calculating standard deviation (or variance), all differences
are squared, so that negative and positive differences are combined into one quantity. Two
instruments with different volatilities may have the same expected return, but the instrument with
higher volatility will have larger swings in value at the end of a given period of time.
For example, a lower volatility stock may have an expected (average) return of 7%, with annual
volatility of 5%. This would indicate returns from approximately -3% to 17% most of the time
(19 times out of 20, or 95%). A higher volatility stock, with the same expected return of 7% but
with annual volatility of 20%, would indicate returns from approximately -33% to 47% most of
the time (19 times out of 20, or 95%).
Volatility is a poor measure of risk, as explained by Peter Carr, "it is only a good measure of risk
if you feel that being rich then being poor is the same as being poor then rich".
Volatility over time
Although the Black–Scholes equation assumes predictable constant volatility, this is not
observed in real markets. Among the models that address this are Bruno Dupire's local volatility,
Poisson processes where volatility jumps to new levels with a predictable frequency, and the
increasingly popular Heston model of stochastic volatility.[2]
It is common knowledge that many types of assets experience periods of high and low volatility. That
is, during some periods prices go up and down quickly, while during other times they might not
seem to move at all.
Periods when prices fall quickly (a crash) are often followed by prices going down even more, or
going up by an unusual amount. Also, a time when prices rise quickly (a bubble) may often be
followed by prices going up even more, or going down by an unusual amount.
The converse behavior, 'doldrums', can last for a long time as well.
Most typically, extreme movements do not appear 'out of nowhere'; they're presaged by larger
movements than usual. This is termed autoregressive conditional heteroskedasticity. Of course,
whether such large movements have the same direction, or the opposite, is more difficult to say.
And an increase in volatility does not always presage a further increase—the volatility may
simply go back down again.
Mathematical definition
The annualized volatility σ is the standard deviation of the instrument's logarithmic returns in a
year.
The generalized volatility σT for time horizon T in years is expressed as

σT = σ √T.
Therefore, if the daily logarithmic returns of a stock have a standard deviation of σSD and the
time period of returns is P, the annualized volatility is

σ = σSD / √P.

A common assumption is that P = 1/252 (there are 252 trading days in any given year). Then, if
σSD = 0.01, the annualized volatility is

σ = 0.01 √252 ≈ 0.1587, or about 15.9%.

The monthly volatility (i.e., T = 1/12 of a year) would be

σmonth = 0.1587 √(1/12) ≈ 0.0458, or about 4.6%.
Note that the formula used to annualize returns is not deterministic, but is an extrapolation valid
for a random walk process whose steps have finite variance. Generally, the relation between
volatility in different time scales is more complicated, involving the Lévy stability exponent α:

σT = T^(1/α) σ.
If α = 2 you get the Wiener process scaling relation, but some people believe α < 2 for financial
activities such as stocks, indexes and so on. This was discovered by Benoît Mandelbrot, who
looked at cotton prices and found that they followed a Lévy alpha-stable distribution with
α = 1.7. (See New Scientist, 19 April 1997.) Mandelbrot's conclusion is, however, not
accepted by mainstream financial econometricians.
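As a numerical illustration of the annualization formulas above, the following sketch in Python (assuming NumPy is available; the price series is invented for the example) computes σSD from daily log returns and then the annualized and monthly volatility.

import numpy as np

# Hypothetical daily closing prices (illustrative only).
prices = np.array([100.0, 100.5, 99.8, 101.2, 100.9, 102.3, 101.7, 103.0])

# Daily (continuously compounded) log returns.
log_returns = np.diff(np.log(prices))

# Daily standard deviation of returns (sigma_SD in the text).
sigma_sd = log_returns.std(ddof=1)

# Annualize: sigma = sigma_SD / sqrt(P) with P = 1/252 trading days.
P = 1.0 / 252.0
sigma_annual = sigma_sd / np.sqrt(P)          # equivalently sigma_sd * sqrt(252)

# Generalized volatility for horizon T years: sigma_T = sigma * sqrt(T).
sigma_monthly = sigma_annual * np.sqrt(1.0 / 12.0)

print(f"daily sigma   = {sigma_sd:.4f}")
print(f"annual sigma  = {sigma_annual:.4f}")
print(f"monthly sigma = {sigma_monthly:.4f}")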
Quick-and-dirty (percentage) volatility measurement
Suppose you notice that a market price index which is approximately 10,000, moves about 100
points a day on average. This would constitute a 1% (up or down) daily movement.
To annualize this, you can use the "rule of 16", that is, multiply by 16 to get 16% as the overall
(annual) volatility. The rationale for this is that 16 is the square root of 256, which is
approximately the number of actual trading days in a year. This uses the statistical result that the
standard deviation of the sum of n independent variables (with equal standard deviations) is √n
times the standard deviation of the individual variables.
It also takes the average magnitude of the observations as an approximation to the standard
deviation of the variables. Under the assumption that the variables are normally distributed with
mean zero and standard deviation σ, the expected value of the magnitude of the observations is
√(2/π)σ = 0.798σ, thus the observed average magnitude of the observations may be taken as a
rough approximation to σ.
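A minimal sketch of this rule-of-16 estimate (Python with NumPy; the index level and the daily point moves are made-up figures), also applying the 0.798 correction described above.

import numpy as np

index_level = 10_000.0
# Hypothetical absolute daily point moves of the index (illustrative only).
daily_moves = np.array([95.0, 110.0, 80.0, 120.0, 105.0, 90.0, 100.0])

avg_abs_move_pct = daily_moves.mean() / index_level   # roughly 1% per day
daily_sigma_est = avg_abs_move_pct / 0.798            # E|X| = 0.798 sigma for N(0, sigma^2)

annual_vol_rough = avg_abs_move_pct * 16              # "rule of 16": sqrt(256) ~ trading days per year
annual_vol_adjusted = daily_sigma_est * np.sqrt(252)  # with the 0.798 correction and 252 days

print(f"rough annual volatility    : {annual_vol_rough:.1%}")
print(f"adjusted annual volatility : {annual_vol_adjusted:.1%}")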

Wiener process
A single realization of a one-dimensional Wiener process

A single realization of a three-dimensional Wiener process

In mathematics, the Wiener process is a continuous-time stochastic process named in honor of


Norbert Wiener. It is often called Brownian motion, after Robert Brown. It is one of the best
known Lévy processes (càdlàg stochastic processes with stationary independent increments) and
occurs frequently in pure and applied mathematics, economics and physics.
The Wiener process plays an important role both in pure and applied mathematics. In pure
mathematics, the Wiener process gave rise to the study of continuous time martingales. It is a
key process in terms of which more complicated stochastic processes can be described. As such,
it plays a vital role in stochastic calculus, diffusion processes and even potential theory. It is the
driving process of Schramm-Loewner evolution. In applied mathematics, the Wiener process is
used to represent the integral of a Gaussian white noise process, and so is useful as a model of
noise in electronics engineering, instrument errors in filtering theory and unknown forces in
control theory.
The Wiener process has applications throughout the mathematical sciences. In physics it is used
to study Brownian motion, the diffusion of minute particles suspended in fluid, and other types
of diffusion via the Fokker-Planck and Langevin equations. It also forms the basis for the
rigorous path integral formulation of quantum mechanics (by the Feynman-Kac formula, a
solution to the Schrödinger equation can be represented in terms of the Wiener process) and the
study of eternal inflation in physical cosmology. It is also prominent in the mathematical theory
of finance, in particular the Black–Scholes option pricing model.
Characterizations of the Wiener process
The Wiener process Wt is characterized by three facts:[1]
1. W0 = 0
2. Wt is almost surely continuous

3. Wt has independent increments with distribution

Wt − Ws ~ N(0, t − s)

(for 0 ≤ s < t).
N(μ, σ²) denotes the normal distribution with expected value μ and variance σ². The condition that
it has independent increments means that if 0 ≤ s1 ≤ t1 ≤ s2 ≤ t2 then Wt1 − Ws1 and Wt2 − Ws2 are
independent random variables, and the similar condition holds for n increments.
An alternative characterization of the Wiener process is the so-called Lévy characterization that
says that the Wiener process is an almost surely continuous martingale with W0 = 0 and quadratic
variation [Wt, Wt] = t (which means that Wt² − t is also a martingale).
A third characterization is that the Wiener process has a spectral representation as a sine series
whose coefficients are independent N(0,1) random variables. This representation can be obtained
using the Karhunen-Loève theorem.
The Wiener process can be constructed as the scaling limit of a random walk, or other discrete-
time stochastic processes with stationary independent increments. This is known as Donsker's
theorem. Like the random walk, the Wiener process is recurrent in one or two dimensions
(meaning that it returns almost surely to any fixed neighborhood of the origin infinitely often)
whereas it is not recurrent in dimensions three and higher. Unlike the random walk, it is scale
invariant, meaning that

Vt = (1/α) W(α²t)

is a Wiener process for any nonzero constant α. The Wiener measure is the probability law on
the space of continuous functions g, with g(0) = 0, induced by the Wiener process. An integral
based on Wiener measure may be called a Wiener integral.
Properties of a one-dimensional Wiener process
Basic properties
The unconditional probability density function at a fixed time t:

fWt(x) = (1/√(2πt)) exp(−x²/(2t)).

The expectation is zero:

E(Wt) = 0.

The variance is t:

Var(Wt) = E(Wt²) − E(Wt)² = t.

The covariance and correlation:

cov(Ws, Wt) = min(s, t),   corr(Ws, Wt) = min(s, t)/√(st).

The results for the expectation and variance follow immediately from the definition that
increments have a normal distribution, centred at zero. Thus

Wt = Wt − W0 ~ N(0, t).

The results for the covariance and correlation follow from the definition that non-overlapping
increments are independent, of which only the property that they are uncorrelated is used.
Suppose that t1 < t2. Then

cov(Wt1, Wt2) = E[Wt1 · Wt2].

Substitute the simple identity Wt2 = (Wt2 − Wt1) + Wt1:

E[Wt1 · Wt2] = E[Wt1 · (Wt2 − Wt1)] + E[Wt1²].

Since Wt1 = Wt1 − W0 and Wt2 − Wt1 are independent,

E[Wt1 · (Wt2 − Wt1)] = E(Wt1) E(Wt2 − Wt1) = 0.

Thus

cov(Wt1, Wt2) = E[Wt1²] = t1.
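These basic properties can be checked numerically. The sketch below (Python with NumPy; the number of paths, the grid and the chosen times s and t are arbitrary) builds Wiener paths from independent N(0, dt) increments and compares sample moments with the formulas above.

import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps, T = 50_000, 200, 1.0
dt = T / n_steps

# Each increment is N(0, dt); cumulative sums give discretized Wiener paths W_t.
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(increments, axis=1)

s_idx, t_idx = n_steps // 4 - 1, n_steps - 1   # s = 0.25, t = 1.0
s, t = (s_idx + 1) * dt, (t_idx + 1) * dt

print("E[W_t]   sample:", W[:, t_idx].mean(), "  theory: 0")
print("Var[W_t] sample:", W[:, t_idx].var(), "  theory:", t)
print("Cov      sample:", np.cov(W[:, s_idx], W[:, t_idx])[0, 1], "  theory: min(s, t) =", min(s, t))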
Self-similarity
Brownian scaling

For every c > 0 the process Vt = (1/√c) W(ct) is another Wiener process.

A demonstration of Brownian scaling, showing Vt = (1/√c) W(ct) for decreasing c. Note that the
average features of the function do not change while zooming in, and note that it zooms in
quadratically faster horizontally than vertically.

Time reversal

The process Vt = W1 − W(1 − t) for 0 ≤ t ≤ 1 is distributed like Wt for 0 ≤ t ≤ 1.


Time inversion

The process Vt = t W(1/t) is another Wiener process.


A class of Brownian martingales
If a polynomial p(x, t) satisfies the PDE

∂p/∂t + (1/2) ∂²p/∂x² = 0,

then the stochastic process

Mt = p(Wt, t)

is a martingale.

Example: Wt² − t is a martingale, which shows that the quadratic variation of W on [0, t] is
equal to t. It follows that the expected time of first exit of W from (−c, c) is equal to c².

More generally, for every polynomial p(x, t) the following stochastic process is a martingale:

Mt = p(Wt, t) − ∫₀ᵗ a(Ws, s) ds,

where a is the polynomial

a = ∂p/∂t + (1/2) ∂²p/∂x².

Example: p(x, t) = (x² − t)², a(x, t) = 4x²; the process

(Wt² − t)² − 4 ∫₀ᵗ Ws² ds

is a martingale, which shows that the quadratic variation of the martingale Wt² − t on [0, t] is equal
to 4 ∫₀ᵗ Ws² ds.
About functions p(x,t) more general than polynomials, see local martingales.
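As a rough numerical check of the first example above (a sketch, not part of the article): the Monte Carlo average of Wt² − t over many simulated paths should stay close to zero at every time on the grid if it is indeed a martingale started at 0.

import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, T = 50_000, 200, 2.0
dt = T / n_steps

W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
times = dt * np.arange(1, n_steps + 1)

# E[W_t^2 - t] should stay near 0 for every t.
martingale_mean = (W**2 - times).mean(axis=0)
print("max |E[W_t^2 - t]| over the grid:", np.abs(martingale_mean).max())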
Some properties of sample paths
The set of all functions w with these properties is of full Wiener measure. That is, a path (sample
function) of the Wiener process has all these properties almost surely.
Qualitative properties
• For every ε>0, the function w takes both (strictly) positive and (strictly)
negative values on (0,ε).
• The function w is continuous everywhere but differentiable nowhere (like the
Weierstrass function).
• Points of local maximum of the function w are a dense countable set; the
maximum values are pairwise different; each local maximum is sharp in the
following sense: if w has a local maximum at t then

(w(s) − w(t)) / |s − t| → −∞

as s tends to t. The same holds for local minima.
• The function w has no points of local increase, that is, no t>0 satisfies the
following for some ε in (0,t): first, w(s) ≤ w(t) for all s in (t-ε,t), and second,
w(s) ≥ w(t) for all s in (t,t+ε). (Local increase is a weaker condition than that
w is increasing on (t-ε,t+ε).) The same holds for local decrease.
• The function w is of unbounded variation on every interval.
• Zeros of the function w are a nowhere dense perfect set of Lebesgue measure
0 and Hausdorff dimension 1/2.

Quantitative properties
Law of the iterated logarithm

limsup(t→∞) |w(t)| / √(2t log log t) = 1, almost surely.

Modulus of continuity
Local modulus of continuity:

limsup(ε→0+) |w(ε)| / √(2ε log log(1/ε)) = 1, almost surely.

Global modulus of continuity (Lévy):

limsup(h→0+) sup(0 ≤ s < t ≤ 1, t − s ≤ h) |w(t) − w(s)| / √(2h log(1/h)) = 1, almost surely.
Local time
The image of the Lebesgue measure on [0, t] under the map w (the pushforward measure) has a
density Lt(·). Thus,

∫₀ᵗ f(w(s)) ds = ∫ℝ f(x) Lt(x) dx
for a wide class of functions ƒ (namely: all continuous functions; all locally integrable functions;
all non-negative measurable functions). The density Lt is (more exactly, can and will be chosen
to be) continuous. The number Lt(x) is called the local time at x of w on [0, t]. It is strictly
positive for all x of the interval (a, b) where a and b are the least and the greatest value of w on
[0, t], respectively. (For x outside this interval the local time evidently vanishes.) Treated as a
function of two variables x and t, the local time is still continuous. Treated as a function of t
(while x is fixed), the local time is a singular function corresponding to a nonatomic measure on
the set of zeros of w.
These continuity properties are fairly non-trivial. Consider that the local time can also be defined
(as the density of the pushforward measure) for a smooth function. Then, however, the density is
discontinuous, unless the given function is monotone. In other words, there is a conflict between
good behavior of a function and good behavior of its local time. In this sense, the continuity of
the local time of the Wiener process is another manifestation of non-smoothness of the
trajectory.
Related processes
The generator of a Brownian motion is ½ times the Laplace-Beltrami operator. The
image above is of the Brownian motion on a special manifold: the surface of a
sphere.

The stochastic process defined by

Xt = μt + σWt

is called a Wiener process with drift μ and infinitesimal variance σ². These processes exhaust
continuous Lévy processes.
Two random processes on the time interval [0, 1] appear, roughly speaking, when conditioning
the Wiener process to vanish on both ends of [0,1]. With no further conditioning, the process
takes both positive and negative values on [0, 1] and is called Brownian bridge. Conditioned also
to stay positive on (0, 1), the process is called Brownian excursion. In both cases a rigorous
treatment involves a limiting procedure, since the formula

P(A | B) = P(A ∩ B) / P(B)

does not work when P(B) = 0.
A geometric Brownian motion can be written

St = S0 exp( (μ − σ²/2) t + σWt ).
It is a stochastic process which is used to model processes that can never take on negative values,
such as the value of stocks.
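A minimal simulation sketch of geometric Brownian motion as written above (Python with NumPy; the drift, volatility and horizon are illustrative values), obtained by exponentiating a drifted Wiener path; every simulated value stays positive.

import numpy as np

rng = np.random.default_rng(2)
S0, mu, sigma = 100.0, 0.05, 0.20      # illustrative parameters
T, n_steps = 1.0, 252
dt = T / n_steps

W = np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))
t = dt * np.arange(1, n_steps + 1)

# Geometric Brownian motion: S_t = S_0 * exp((mu - sigma^2/2) t + sigma W_t)
S = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W)

print("final price:", S[-1], "  minimum over the path (always > 0):", S.min())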
The stochastic process

Xt = e^(−t) W(e^(2t))

is distributed like the Ornstein–Uhlenbeck process.


The time of hitting a single point x > 0 by the Wiener process is a random variable with the Lévy
distribution. The family of these random variables (indexed by all positive numbers x) is a left-
continuous modification of a Lévy process. The right-continuous modification of this process is
given by times of first exit from closed intervals [0, x].
The local time Lt(0) treated as a random function of t is a random process distributed like the
process max{Ws : 0 ≤ s ≤ t}.
The local time Lt(x) treated as a random function of x (while t is constant) is a random process
described by Ray-Knight theorems in terms of Bessel processes.
Brownian martingales
Let A be an event related to the Wiener process (more formally: a set, measurable with respect to
the Wiener measure, in the space of functions), and Xt the conditional probability of A given the
Wiener process on the time interval [0, t] (more formally: the Wiener measure of the set of
trajectories whose concatenation with the given partial trajectory on [0, t] belongs to A). Then the
process Xt is a continuous martingale. Its martingale property follows immediately from the
definitions, but its continuity is a very special fact – a special case of a general theorem stating
that all Brownian martingales are continuous. A Brownian martingale is, by definition, a
martingale adapted to the Brownian filtration; and the Brownian filtration is, by definition, the
filtration generated by the Wiener process.
Time change
Every continuous martingale (starting at the origin) is a time changed Wiener process.
Example. 2Wt = V(4t) where V is another Wiener process (different from W but distributed like
W).

Example. Wt² − t = V(A(t)) where A(t) = 4 ∫₀ᵗ Ws² ds and V is another Wiener process.


In general, if M is a continuous martingale then Mt − M0 = V(A(t)) where A(t) is the quadratic
variation of M on [0, t], and V is a Wiener process.
Corollary. (See also Doob's martingale convergence theorems.) Let Mt be a continuous
martingale, and let

M⁻ = liminf(t→∞) Mt,   M⁺ = limsup(t→∞) Mt.

Then only the following two cases are possible: either −∞ < M⁻ = M⁺ < +∞, or M⁻ = −∞ and
M⁺ = +∞; other cases (such as M⁻ = M⁺ = +∞) are of probability 0.

In particular, a nonnegative continuous martingale has a finite limit (as t → ∞) almost surely.
Everything stated in this subsection for martingales holds also for local martingales.
Change of measure
A wide class of continuous semimartingales (especially, of diffusion processes) is related to the
Wiener process via a combination of time change and change of measure.
Using this fact, the qualitative properties stated above for the Wiener process can be generalized
to a wide class of continuous semimartingales.
Complex-valued Wiener process
The complex-valued Wiener process may be defined as a complex-valued random process of the
form Zt = Xt + iYt where Xt,Yt are independent Wiener processes (real-valued).
Self-similarity
Brownian scaling, time reversal, time inversion: the same as in the real-valued case.
Rotation invariance: for every complex number c such that |c|=1 the process cZt is another
complex-valued Wiener process.
Time change

If f is an entire function then the process f(Zt) − f(0) is a time-changed complex-valued Wiener
process.

Example. Zt² = U(A(t)) where A(t) = 4 ∫₀ᵗ |Zs|² ds and U is another complex-valued Wiener
process.
In contrast to the real-valued case, a complex-valued martingale is generally not a time-changed
complex-valued Wiener process. For example, the martingale 2Xt + iYt is not (here Xt,Yt are
independent Wiener processes, as before).

Implied volatility
In financial mathematics, the implied volatility of an option contract is the volatility implied by
the market price of the option based on an option pricing model. In other words, it is the
volatility that, when used in a particular pricing model, yields a theoretical value for the option
equal to the current market price of that option. Non-option financial instruments that have
embedded optionality, such as an interest rate cap, can also have an implied volatility. Implied
volatility, a forward-looking measure, differs from historical volatility because the latter is
calculated from known past returns of a security.
Motivation
An ordinary option pricing model, such as Black-Scholes, uses a variety of inputs to derive a
theoretical value for an option. Inputs to pricing models vary depending on the type of option
being priced and the pricing model used. However, in general, the value of an option depends on
an estimate of the future realized volatility, σ, of the underlying. Or, mathematically:

C = f(σ, ·),

where C is the theoretical value of an option, and f is a pricing model that depends on σ, along
with other inputs.

The function f is monotonically increasing in σ, meaning that a higher value for volatility results
in a higher theoretical value of the option. Conversely, by the inverse function theorem, there can
be at most one value for σ that, when applied as an input to f(σ, ·), will result in a particular
value for C.

Put in other terms, assume that there is some inverse function g = f⁻¹, such that

σ̄ = g(C̄, ·),

where C̄ is the market price for an option. The value σ̄ is the volatility implied by the market
price C̄, or the implied volatility.


Example

A standard call option contract, C, on 100 shares of non-dividend-paying XYZ Corp. The
option is struck at $50 and expires in 32 days. The risk-free interest rate is 5%. XYZ stock is
currently trading at $51.25 and the current market price of C is $2.00. Using a standard
Black–Scholes pricing model, the volatility implied by the market price is 18.7%, or:

σ̄ = g(C̄, ·) = 18.7%.

To verify, we apply the implied volatility back into the pricing model, and we generate a
theoretical value of $2.0004:

C = f(σ̄, ·) = $2.0004,

which confirms our computation of the market implied volatility.
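The verification can be reproduced with a short Black–Scholes sketch (Python, using SciPy's normal CDF; the 32/365 day count is an assumption, and small convention differences move the result by a fraction of a cent).

from math import exp, log, sqrt
from scipy.stats import norm

def black_scholes_call(S, K, r, T, sigma):
    # Black-Scholes value of a European call on a non-dividend-paying stock.
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

# Values from the example: S = 51.25, K = 50, r = 5%, 32 days to expiry.
price = black_scholes_call(S=51.25, K=50.0, r=0.05, T=32 / 365, sigma=0.187)
print(f"call value at sigma = 18.7%: {price:.4f}")   # approximately 2.00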


Solving the inverse pricing model function
In general, a pricing model function, f, does not have a closed-form solution for its inverse, g.
Instead, a root-finding technique is used to solve the equation

f(σ̄, ·) − C̄ = 0.
While there are many techniques for finding roots, two of the most commonly used are Newton's
method and Brent's method. Because options prices can move very quickly, it is often important
to use the most efficient method when calculating implied volatilities.
Newton's method provides rapid convergence; however, it requires the first partial derivative of
the option's theoretical value with respect to volatility, i.e. ∂C/∂σ, which is also known as vega (see
The Greeks). If the pricing model function yields a closed-form solution for vega, which is the
case for the Black–Scholes model, then Newton's method can be more efficient. However, for most
practical pricing models, such as a binomial model, this is not the case and vega must be derived
numerically. When forced to solve vega numerically, it usually turns out that Brent's method is
more efficient as a root-finding technique.
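A sketch of the root-finding approach (Python, using SciPy's brentq implementation of Brent's method; the Black–Scholes call pricer is restated from the earlier example, and the volatility bracket of [1e-6, 5] is an arbitrary choice).

from math import exp, log, sqrt
from scipy.optimize import brentq
from scipy.stats import norm

def black_scholes_call(S, K, r, T, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

def implied_vol(market_price, S, K, r, T):
    # Solve f(sigma) - market_price = 0 for sigma on a bracketing interval.
    objective = lambda sigma: black_scholes_call(S, K, r, T, sigma) - market_price
    return brentq(objective, 1e-6, 5.0)

print(f"implied volatility: {implied_vol(2.00, S=51.25, K=50.0, r=0.05, T=32 / 365):.1%}")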
Implied volatility as measure of relative value
Often, the implied volatility of an option is a more useful measure of the option's relative value
than its price. This is because the price of an option depends most directly on the price of its
underlying security. If an option is held as part of a delta neutral portfolio, that is, a portfolio that
is hedged against small moves in the underlier's price, then the next most important factor in
determining the value of the option will be its implied volatility.
Implied volatility is so important that options are often quoted in terms of volatility rather than
price, particularly between professional traders.
Example
A call option is trading at $1.50 with the underlying trading at $42.05. The implied volatility of
the option is determined to be 18.0%. A short time later, the option is trading at $2.10 with the
underlying at $43.34, yielding an implied volatility of 17.2%. Even though the option's price is
higher at the second measurement, it is still considered cheaper on a volatility basis. This is
because the underlying needed to hedge the call option can be sold for a higher price.
Implied volatility as a price
Another way to look at implied volatility is to think of it as a price, not as a measure of future
stock moves. In this view it simply is a more convenient way to communicate option prices than
currency. Prices are different in nature from statistical quantities: We can estimate volatility of
future underlying returns using any of a large number of estimation methods, however the
number we get is not a price. A price requires two counterparts, a buyer and a seller. Prices are
determined by supply and demand. Statistical estimates depend on the time-series and the
mathematical structure of the model used. It is a mistake to confuse a price, which implies a
transaction, with the result of a statistical estimation which is merely what comes out of a
calculation. Implied volatilities are prices: They have been derived from actual transactions. Seen
in this light, it should not be surprising that implied volatilities might not conform to what a
particular statistical model would predict.
Non-constant implied volatility
In general, options based on the same underlier but with different strike values and expiration
times will yield different implied volatilities. This is generally viewed as evidence that an
underlier's volatility is not constant, but, instead depends on factors such as the price level of the
underlier, the underlier's recent variance, and the passage of time. See stochastic volatility and
volatility smile for more information.
Volatility instruments
Volatility instruments are financial instruments that track the value of implied volatility of other
derivative securities. For instance, the CBOE Volatility Index (VIX) is calculated from a
weighted average of implied volatilities of various options on the S&P 500 Index. There are also
other commonly referenced volatility indices such as the VXN index (Nasdaq 100 index futures
volatility measure), the QQV (QQQQ volatility measure), IVX - Implied Volatility Index (an
expected stock volatility over a future period for any of US securities and exchange traded
instruments), as well as options and futures derivatives based directly on these volatility indices
themselves.

Random walk

Example of eight random walks in one dimension starting at 0. The plot shows the
current position on the line (vertical axis) versus the time steps (horizontal axis).
An animated example of three Brownian motion-like random walks on a torus,
starting at the centre of the image.

A random walk, sometimes denoted RW, is a mathematical formalisation of a trajectory that


consists of taking successive random steps. The results of random walk analysis have been
applied to computer science, physics, ecology, economics, and a number of other fields as a
fundamental model for random processes in time. For example, the path traced by a molecule as
it travels in a liquid or a gas, the search path of a foraging animal, the price of a fluctuating stock
and the financial status of a gambler can all be modeled as random walks. The term random walk
was first introduced by Karl Pearson in 1905[1].
Various different types of random walks are of interest. Often, random walks are assumed to be
Markov chains or Markov processes, but other, more complicated walks are also of interest.
Some random walks are on graphs, others on the line, in the plane, or in higher dimensions,
while some random walks are on groups. Random walks also vary with regard to the time
parameter. Often, the walk is in discrete time, and indexed by the natural numbers, as in
X0, X1, X2, .... However, some walks take their steps at random times, and in that case the
position Xt is defined for the continuum of times t ≥ 0. Specific cases or limits of random
walks include the drunkard's walk and Lévy flight. Random walks are related to the diffusion
models and are a fundamental topic in discussions of Markov processes. Several properties of
random walks, including dispersal distributions, first-passage times and encounter rates, have
been extensively studied.
Lattice random walk
A popular random walk model is that of a random walk on a regular lattice, where at each step
the walk jumps to another site according to some probability distribution. In simple random
walk, the walk can only jump to neighbouring sites of the lattice. In symmetric simple random
walk on a locally finite lattice, the probabilities of walk jumping to any one of its neighbours are
the same. The most well-studied example is of random walk on the d-dimensional integer lattice
(sometimes called the hypercubic lattice) ℤ^d.


One-dimensional random walk

A particularly elementary and concrete random walk is the random walk on the integers ℤ, which
starts at S0 = 0 and at each step moves by ±1 with equal probability. To define this walk formally,
take independent random variables Z1, Z2, ..., each of which is 1 with probability 1/2 and −1
with probability 1/2, and set Sn = Z1 + Z2 + ... + Zn. This sequence {Sn} is called the simple random
walk on ℤ.
This walk can be illustrated as follows. Say you flip a fair coin. If it lands on heads, you move
one to the right on the number line. If it lands on tails, you move one to the left. So after five
flips, you have the possibility of landing on 1, −1, 3, −3, 5, or −5. You can land on 1 by flipping
three heads and two tails in any order. There are 10 possible ways of landing on 1. Similarly,
there are 10 ways of landing on −1 (by flipping three tails and two heads), 5 ways of landing on
3 (by flipping four heads and one tail), 5 ways of landing on −3 (by flipping four tails and one
head), 1 way of landing on 5 (by flipping five heads), and 1 way of landing on −5 (by flipping
five tails). See the figure below for an illustration of this example.

Five flips of a fair coin

What can we say about the position Sn of the walk after n steps? Of course, it is random, so we
cannot calculate it. But we may say quite a bit about its distribution. It is not hard to see that the
expectation E(Sn) of Sn is zero. For example, this follows by the additivity property of
expectation: E(Sn) = E(Z1) + E(Z2) + ... + E(Zn) = 0. A similar calculation, using the independence
of the random variables Zn, shows that E(Sn²) = n. This hints that E|Sn|, the expected translation
distance after n steps, should be of the order of √n. In fact,

E|Sn| ~ √(2n/π)  as n → ∞.
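A quick simulation sketch (Python with NumPy; the sample sizes are arbitrary) comparing the observed E|Sn| of the simple random walk with the √(2n/π) asymptotics quoted above.

import numpy as np

rng = np.random.default_rng(3)
n_walks, n_steps = 20_000, 400

steps = rng.choice([-1, 1], size=(n_walks, n_steps))
S_n = steps.sum(axis=1)                     # position after n_steps steps

observed = np.abs(S_n).mean()
predicted = np.sqrt(2 * n_steps / np.pi)    # asymptotic value of E|S_n|
print(f"observed E|S_n| = {observed:.2f}, predicted ~ {predicted:.2f}")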

Suppose we draw a line some distance from the origin of the walk. How many times will the
random walk cross the line if permitted to continue walking forever? The following, perhaps
surprising theorem is the answer: simple random walk on ℤ will cross every point an infinite
number of times. This result has many names: the level-crossing phenomenon, recurrence or the
gambler's ruin. The reason for the last name is as follows: if you are a gambler with a finite
amount of money playing a fair game against a bank with an infinite amount of money, you will
surely lose. The amount of money you have will perform a random walk, and it will almost
surely, at some time, reach 0 and the game will be over.
If a and b are positive integers, then the expected number of steps until a one-dimensional simple
random walk starting at 0 first hits b or −a is ab. The probability that this walk will hit b before
−a is a/(a + b), which can be derived from the fact that simple random walk is a martingale.
Some of the results mentioned above can be derived from properties of Pascal's triangle. The
number of different walks of n steps where each step is +1 or −1 is clearly 2n. For the simple
random walk, each of these walks are equally likely. In order for Sn to be equal to a number k it is
necessary and sufficient that the number of +1 in the walk exceeds those of −1 by k. Thus, the
number of walks which satisfy Sn = k is precisely the number of ways of choosing (n + k)/2
elements from an n element set (for this to be non-zero, it is necessary that n + k be an even

number), which is an entry in Pascal's triangle denoted by the binomial coefficient C(n, (n + k)/2).
Therefore, the probability that Sn = k is equal to 2^(−n) C(n, (n + k)/2). By representing entries of
Pascal's triangle in terms of factorials and using Stirling's formula, one can obtain good estimates for
these probabilities for large values of n.
This relation with Pascal's triangle is easily demonstrated for small values of n. At zero turns, the
only possibility will be to remain at zero. However, at one turn, you can move either to the left or
the right of zero, meaning there is one chance of landing on −1 or one chance of landing on 1. At
two turns, you examine the turns from before. If you had been at 1, you could move to 2 or back
to zero. If you had been at −1, you could move to −2 or back to zero. So there is one chance of
landing on −2, two chances of landing on zero, and one chance of landing on 2.

              k:  −5   −4   −3   −2   −1    0    1    2    3    4    5
P[S0 = k]                                   1
2 P[S1 = k]                            1         1
2² P[S2 = k]                      1         2         1
2³ P[S3 = k]                 1         3         3         1
2⁴ P[S4 = k]            1         4         6         4         1
2⁵ P[S5 = k]       1         5        10        10         5         1

The central limit theorem and the law of the iterated logarithm describe important aspects of the
behavior of simple random walk on ℤ.
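The probabilities in the table above follow directly from the binomial-coefficient formula; a small Python sketch (using the standard library's math.comb) reproduces the n = 5 row.

from math import comb

def prob_S_n_equals_k(n, k):
    # P(S_n = k) for the simple random walk: 2^(-n) * C(n, (n + k)/2), zero if n + k is odd.
    if (n + k) % 2 != 0 or abs(k) > n:
        return 0.0
    return comb(n, (n + k) // 2) / 2**n

# Reproduce the n = 5 row of the table: 2^5 * P(S_5 = k) for k = -5..5.
print([round(2**5 * prob_S_n_equals_k(5, k)) for k in range(-5, 6)])
# -> [1, 0, 5, 0, 10, 0, 10, 0, 5, 0, 1]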
Gaussian random walk
A random walk having a step size that varies according to a normal distribution is used as a
model for real-world time series data such as financial markets. The Black-Scholes formula for
modeling equity option prices, for example, uses a gaussian random walk as an underlying
assumption.
Here, the step size is the inverse cumulative normal distribution Φ⁻¹(z, μ, σ) where 0 ≤ z ≤ 1 is a
uniformly distributed random number, and μ and σ are the mean and standard deviation of the
normal distribution, respectively.
The root mean squared expected translation distance after n steps is σ√n.
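A minimal sketch of a Gaussian random walk (Python with NumPy; μ = 0, σ = 1 and the walk length are illustrative choices) that checks the root-mean-square translation distance against σ√n.

import numpy as np

rng = np.random.default_rng(4)
n_walks, n_steps = 20_000, 250
mu, sigma = 0.0, 1.0                         # zero-mean steps, so drift does not inflate the distance

steps = rng.normal(mu, sigma, size=(n_walks, n_steps))
end_positions = steps.sum(axis=1)

rms_distance = np.sqrt((end_positions**2).mean())
print(f"RMS distance: {rms_distance:.2f},  sigma*sqrt(n) = {sigma * np.sqrt(n_steps):.2f}")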

Higher dimensions
Random walk in two dimensions.
Random walk in two dimensions with more, and smaller, steps. In the limit, for very
small steps, one obtains the Brownian motion.

Imagine now a drunkard walking randomly in a city. The city is realistically infinite and
arranged in a square grid, and at every intersection, the drunkard chooses one of the four possible
routes (including the one he came from) with equal probability. Formally, this is a random walk
on the set of all points in the plane with integer coordinates. Will the drunkard ever get back to
his home from the bar? It turns out that he will. This is the high dimensional equivalent of the
level crossing problem discussed above. The probability of returning to the origin decreases as
the number of dimensions increases. In three dimensions, the probability decreases to roughly
34%. A derivation, along with values of p(d) are discussed here:
http://mathworld.wolfram.com/PolyasRandomWalkConstants.html.
The trajectory of a random walk is the collection of sites it visited, considered as a set with
disregard to when the walk arrived at the point. In one dimension, the trajectory is simply all
points between the minimum height the walk achieved and the maximum (both are, on average,
on the order of √n). In higher dimensions the set has interesting geometric properties. In fact, one
gets a discrete fractal, that is a set which exhibits stochastic self-similarity on large scales, but on
small scales one can observe "jaggedness" resulting from the grid on which the walk is
performed. The two books of Lawler referenced below are a good source on this topic.

Three random walks in three dimensions.

Random walk on graphs


Assume now that our city is no longer a perfect square grid. When our drunkard reaches a certain
junction he picks between the various available roads with equal probability. Thus, if the
junction has seven exits the drunkard will go to each one with probability one seventh. This is a
random walk on a graph. Will our drunkard reach his home? It turns out that under rather mild
conditions, the answer is still yes. For example, if the lengths of all the blocks are between a and
b (where a and b are any two finite positive numbers), then the drunkard will, almost surely,
reach his home. Notice that we do not assume that the graph is planar, i.e. the city may contain
tunnels and bridges. One way to prove this result is using the connection to electrical networks.
Take a map of the city and place a one ohm resistor on every block. Now measure the "resistance
between a point and infinity". In other words, choose some number R and take all the points in
the electrical network with distance bigger than R from our point and wire them together. This is
now a finite electrical network and we may measure the resistance from our point to the wired
points. Take R to infinity. The limit is called the resistance between a point and infinity. It turns
out that the following is true (an elementary proof can be found in the book by Doyle and Snell):
Theorem: a graph is transient if and only if the resistance between a point and infinity is finite.
It is not important which point is chosen if the graph is connected.
In other words, in a transient system, one only needs to overcome a finite resistance to get to
infinity from any point. In a recurrent system, the resistance from any point to infinity is infinite.
This characterization of recurrence and transience is very useful, and specifically it allows us to
analyze the case of a city drawn in the plane with the distances bounded.
A random walk on a graph is a very special case of a Markov chain. Unlike a general Markov
chain, random walk on a graph enjoys a property called time symmetry or reversibility. Roughly
speaking, this property, also called the principle of detailed balance, means that the probabilities
to traverse a given path in one direction or in the other have a very simple connection between
them (if the graph is regular, they are just equal). This property has important consequences.
Starting in the 1980s, much research has gone into connecting properties of the graph to random
walks. In addition to the electrical network connection described above, there are important
connections to isoperimetric inequalities, see more here, functional inequalities such as Sobolev
and Poincaré inequalities and properties of solutions of Laplace's equation. A significant portion
of this research was focused on Cayley graphs of finitely generated groups. For example, the
proof of Dave Bayer and Persi Diaconis that 7 riffle shuffles are enough to mix a pack of cards
(see more details under shuffle) is in effect a result about random walk on the group Sn, and the
proof uses the group structure in an essential way. In many cases these discrete results carry over
to, or are derived from Manifolds and Lie groups.
A good reference for random walk on graphs is the online book by Aldous and Fill. For groups
see the book of Woess. If the graph itself is random, this topic is called "random walk in random
environment" — see the book of Hughes.
We can think about choosing every possible edge with the same probability as maximizing
uncertainty (entropy) locally. We could also do it globally – in maximal entropy random walk
(MERW) we want all paths to be equally probable, or in other words: for every two vertices, each
path of given length is equally probable. This random walk has much stronger localization
properties.
Relation to Wiener process
Simulated steps approximating a Wiener process in two dimensions.

A Wiener process is a stochastic process with similar behaviour to Brownian motion, the
physical phenomenon of a minute particle diffusing in a fluid. (Sometimes the Wiener process is
called "Brownian motion", although this is strictly speaking a confusion of a model with the
phenomenon being modeled.)
A Wiener process is the scaling limit of random walk in dimension 1. This means that if you take
a random walk with very small steps you get an approximation to a Wiener process (and, less
accurately, to Brownian motion). To be more precise, if the step size is ε, one needs to take a
walk of length L/ε2 to approximate a Wiener process walk of length L. As the step size tends to 0
(and the number of steps increases proportionally) random walk converges to a Wiener process
in an appropriate sense. Formally, if B is the space of all paths of length L with the maximum
topology, and if M is the space of measure over B with the norm topology, then the convergence
is in the space M. Similarly, a Wiener process in several dimensions is the scaling limit of
random walk in the same number of dimensions.
A random walk is a discrete fractal, but a Wiener process trajectory is a true fractal, and there is
a connection between the two. For example, take a random walk until it hits a circle of radius r
times the step length. The average number of steps it performs is r2. This fact is the discrete
version of the fact that a Wiener process walk is a fractal of Hausdorff dimension 2 [2]. In two
dimensions, the average number of points the same random walk has on the boundary of its
trajectory is r4/3. This corresponds to the fact that the boundary of the trajectory of a Wiener
process is a fractal of dimension 4/3, a fact predicted by Mandelbrot using simulations but
proved only in 2000 (Science, 2000).
A Wiener process enjoys many symmetries random walk does not. For example, a Wiener
process walk is invariant to rotations, but random walk is not, since the underlying grid is not
(random walk is invariant to rotations by 90 degrees, but Wiener processes are invariant to
rotations by, for example, 17 degrees too). This means that in many cases, problems on random
walk are easier to solve by translating them to a Wiener process, solving the problem there, and
then translating back. On the other hand, some problems are easier to solve with random walks
due to its discrete nature.
Random walk and Wiener process can be coupled, namely manifested on the same probability
space in a dependent way that forces them to be quite close. The simplest such coupling is the
Skorokhod embedding, but other, more precise couplings exist as well.
The convergence of a random walk toward the Wiener process is controlled by the central limit
theorem. For a particle in a known fixed position at t = 0, the theorem tells us that after a large
number of independent steps in the random walk, the walker's position is distributed according to
a normal distribution of total variance:

σ² = (t/δt) ε²,

where t is the time elapsed since the start of the random walk, ε is the size of a step of the random
walk, and δt is the time elapsed between two successive steps.
This corresponds to the Green function of the diffusion equation that controls the Wiener
process, which demonstrates that, after a large number of steps, the random walk converges
toward a Wiener process.
In 3D, the variance corresponding to the Green's function of the diffusion equation is:

σ² = 6 D t.
By equalizing this quantity with the variance associated to the position of the random walker,
one obtains the equivalent diffusion coefficient to be considered for the asymptotic Wiener
process toward which the random walk converges after a large number of steps:

D = ε² / (6 δt)   (valid only in 3D).

Remark: the two expressions of the variance above correspond to the distribution associated to
the vector R that links the two ends of the random walk, in 3D. The variance associated to each
component Rx, Ry or Rz is only one third of this value (still in 3D).

Self-interacting random walks


There are a number of interesting models of random paths in which each step depends on the
past in a complicated manner. All are more difficult to analyze than the usual random walk —
some notoriously so. For example
• The self-avoiding walk. See the Madras and Slade book.
• The loop-erased random walk. See the two books of Lawler.
• The reinforced random walk. See the review by Robin Pemantle.
• The exploration process.

Applications
The following are the applications of random walk:
• In economics, the "random walk hypothesis" is used to model shares prices
and other factors. Empirical studies found some deviations from this
theoretical model, especially in short term and long term correlations. See
share prices.
• In population genetics, random walk describes the statistical properties of
genetic drift
• In physics, random walks are used as simplified models of physical Brownian
motion and the random movement of molecules in liquids and gases. See for
example diffusion-limited aggregation. Also in physics, random walks and
some of the self interacting walks play a role in quantum field theory.
• In mathematical ecology, random walks are used to describe individual
animal movements, to empirically support processes of biodiffusion, and
occasionally to model population dynamics.
• In polymer physics, random walk describes an ideal chain. It is the simplest
model to study polymers.
• In other fields of mathematics, random walk is used to calculate solutions to
Laplace's equation, to estimate the harmonic measure, and for various
constructions in analysis and combinatorics.
• In computer science, random walks are used to estimate the size of the Web.
At the World Wide Web Conference 2006, Bar-Yossef et al. published their
findings and algorithms for this problem. (This was awarded the best paper for
the year 2006.)
In all these cases, random walk is often substituted for Brownian motion.
• In brain research, random walks and reinforced random walks are used to
model cascades of neuron firing in the brain.
• In vision science, fixational eye movements are well described by a random
walk.
• In psychology, random walks explain accurately the relation between the
time needed to make a decision and the probability that a certain decision
will be made. (Nosofsky, 1997)
• Random walk can be used to sample from a state space which is unknown or
very large, for example to pick a random page off the internet or, for
research of working conditions, a random worker in a given country.
• When this last approach is used in computer science it is known as Markov
Chain Monte Carlo or MCMC for short. Often, sampling from some
complicated state space also allows one to get a probabilistic estimate of the
space's size. The estimate of the permanent of a large matrix of zeros and
ones was the first major problem tackled using this approach.
• In wireless networking, random walk is used to model node movement.
• Motile bacteria engage in a biased random walk.
• Random walk is used to model gambling.
• In physics, random walks underlie the method of Fermi estimation.
• During World War II a random walk was used to model the distance that an
escaped prisoner of war would travel in a given time.

Probabilistic interpretation
A one-dimensional random walk can also be looked at as a Markov chain whose state space is
given by the integers i = 0, ±1, ±2, .... For some number p satisfying 0 < p < 1, the transition
probabilities are given by

Pi,i+1 = p = 1 − Pi,i−1.

We can call it a random walk because we may think of it as being


a model for an individual walking on a straight line who at each point of time either takes one
step to the right with probability p or one step to the left with probability 1 − p.
A random walk is a simple stochastic process.
Properties of random walks
In the tables below, R is the average end-to-end distance, R² is the average square of the
end-to-end distance, N is the length of the walk, and b is the step size.
Simple random walk

Dimension    R    R²     Transient
1            0    Nb²    No
2            0    Nb²    No
3            0    ...    Yes

Non-reversal random walk

Dimension    R    R²
2            ?    2Nb²
3            ?    (3/2)Nb²

Variance swap
A variance swap is an over-the-counter financial derivative that allows one to speculate on or
hedge risks associated with the magnitude of movement, i.e. volatility, of some underlying
product, like an exchange rate, interest rate, or stock index.
One leg of the swap will pay an amount based upon the realised variance of the price changes of
the underlying product. Conventionally, these price changes will be daily log returns, based upon
the most commonly used closing price. The other leg of the swap will pay a fixed amount, which
is the strike, quoted at the deal's inception. Thus the net payoff to the counterparties will be the
difference between these two and will be settled in cash at the expiration of the deal, though
some cash payments will likely be made along the way by one or the other counterparty to
maintain agreed upon margin.
Structure and features
The features of a variance swap include:
• the variance strike
• the realised variance
• the vega notional: Like other swaps, the payoff is determined based on a notional
amount that is never exchanged. However, in the case of a variance swap, the notional
amount is specified in terms of vega, to convert the payoff into dollar terms.
The payoff of a variance swap is given as follows:

Nvar × (σrealised² − Kvar),

where:
• Nvar = variance notional (a.k.a. variance units),
• σrealised² = annualised realised variance, and
• Kvar = variance strike.[1]
The annualised realised variance is calculated based on a prespecified set of sampling points over
the period. It does not always coincide with the classic statistical definition of variance as the
contract terms may not subtract the mean. For example, suppose that there are n+1 sample points
S0, S1, ..., Sn. Define, for i = 1 to n, Ri = ln(Si / Si−1), the natural log returns. Then

σrealised² = (A/n) × Σ(i=1..n) Ri²,
where A is an annualisation factor normally chosen to be approximately the number of sampling
points in a year (commonly 252). It can be seen that subtracting the mean return will decrease the
realised variance. If this is done, it is common to use n − 1 as the divisor rather than n,
corresponding to an unbiased estimate of the sample variance.
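The annualised realised variance defined above can be computed in a few lines of Python (NumPy assumed; the price samples are invented, and the mean is not subtracted, matching typical contract terms).

import numpy as np

# Hypothetical daily closing prices S_0, ..., S_n (illustrative only).
prices = np.array([100.0, 101.2, 100.4, 102.1, 101.5, 103.0, 102.2])

R = np.diff(np.log(prices))        # log returns R_i = ln(S_i / S_{i-1})
A, n = 252, len(R)                 # annualisation factor and number of returns

realised_variance = (A / n) * np.sum(R**2)   # mean not subtracted, per contract terms
realised_vol = np.sqrt(realised_variance)

print(f"annualised realised variance: {realised_variance:.4f}")
print(f"annualised realised volatility: {realised_vol:.2%}")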
It is market practice to determine the number of contract units as follows:

Nvar = Nvol / (2 √Kvar),
where Nvol is the corresponding vega notional for a volatility swap.[1] This makes the payoff of a
variance swap comparable to that of a volatility swap, another less popular instrument used to
trade volatility.
Pricing and valuation
The variance swap may be hedged and hence priced using a portfolio of European call and put
options with weights inversely proportional to the square of strike[2][3].
Any volatility smile model which prices vanilla options can therefore be used to price the
variance swap. For example, using the Heston model, a closed-form solution can be derived for
the fair variance swap rate. Care must be taken with the behaviour of the smile model in the
wings as this can have a disproportionate effect on the price.
We can derive the payoff of a variance swap using Ito's Lemma. We first assume that the
underlying stock is described as follows:

dSt / St = μ dt + σ dZt.

Applying Ito's formula, we get:

d(ln St) = (μ − σ²/2) dt + σ dZt.

Subtracting the two and taking integrals, the total variance is:

(1/T) ∫₀ᵀ σ² dt = (2/T) ( ∫₀ᵀ dSt/St − ln(ST/S0) ).

We can see that the total variance consists of a rebalanced hedge of 1/St and short a log contract.
A short log contract position is equal to being short a futures contract and a collection of puts and
calls:

−ln(ST/S*) = −(ST − S*)/S* + ∫₀^S* (1/K²) max(K − ST, 0) dK + ∫_S*^∞ (1/K²) max(ST − K, 0) dK.
Taking integrals and setting the value of the variance swap equal to zero, we can rearrange the
formula to solve for the fair variance swap strike:

Kvar = (2/T) ( rT − (S0 e^(rT)/S* − 1) − ln(S*/S0) + e^(rT) ∫₀^S* (1/K²) P(K) dK + e^(rT) ∫_S*^∞ (1/K²) C(K) dK ),

where:
S0 is the initial price of the underlying security,
S* is the at the money price,
K is the strike of each option in the collection of options used, and P(K) and C(K) are the prices of
puts and calls struck at K.
Uses
Many find variance swaps interesting or useful for their purity. An alternative way of
speculating on volatility is with an option, but if one only has interest in volatility risk, this
strategy will require constant delta hedging, so that direction risk of the underlying security is
approximately removed. What is more, a replicating portfolio of a variance swap would require
an entire strip of options, which would be very costly to execute. Finally, one might often find
the need to be regularly rolling this entire strip of options so that it remains centered around the
current price of the underlying security.
The advantage of variance swaps is that they provide pure exposure to the volatility of the
underlying price, as opposed to call and put options which may carry directional risk (delta). The
profit and loss from a variance swap depends directly on the difference between realized and
implied volatility.[4]
Another aspect that some speculators may find interesting is that the quoted strike is determined
by the implied volatility smile in the options market, whereas the ultimate payout will be based
upon actual realized variance. Historically, implied variance has been above realized variance[5],
a phenomenon known as the Variance risk premium, creating an opportunity for volatility
arbitrage, in this case known as the rolling short variance trade. For the same reason, these swaps
can be used to hedge Options on Realized Variance.

Volatility arbitrage
In finance, volatility arbitrage (or vol arb) is a type of statistical arbitrage that is implemented by
trading a delta neutral portfolio of an option and its underlier. The objective is to take advantage
of differences between the implied volatility of the option, and a forecast of future realized
volatility of the option's underlier. In volatility arbitrage, volatility is used as the unit of relative
measure rather than price - that is, traders attempt to buy volatility when it is low and sell
volatility when it is high.[1][2]
Overview
To an option trader engaging in volatility arbitrage, an option contract is a way to speculate in
the volatility of the underlying rather than a directional bet on the underlier's price. If a trader
buys options as part of a delta-neutral portfolio, he is said to be long volatility. If he sells
options, he is said to be short volatility. So long as the trading is done delta-neutral, buying an
option is a bet that the underlier's future realized volatility will be high, while selling an option is
a bet that future realized volatility will be low. Because of put-call parity, it doesn't matter if the
options traded are calls or puts. This is true because put-call parity posits a risk neutral
equivalence relationship between a call, a put and some amount of the underlier. Therefore,
being long a delta neutral call results in the same returns as being long a delta neutral put.
Forecast volatility
To engage in volatility arbitrage, a trader must first forecast the underlier's future realized
volatility. This is typically done by computing the historical daily returns for the underlier for a
given past sample such as 252 days, the number of trading days in a year. The trader may also
use other factors, such as whether the period was unusually volatile, or if there are going to be
unusual events in the near future, to adjust his forecast. For instance, if the current 252-day
volatility for the returns on a stock is computed to be 15%, but it is known that an important
patent dispute will likely be settled in the next year, the trader may decide that the appropriate
forecast volatility for the stock is 18%.
Market (Implied) Volatility
As described in option valuation techniques, there are a number of factors that are used to
determine the theoretical value of an option. However, in practice, the only two inputs to the
model that change during the day are the price of the underlier and the volatility. Therefore, the
theoretical price of an option can be expressed as:

C = f(σ, S),

where S is the price of the underlier, and σ is the estimate of future volatility. Because the
theoretical price function f is a monotonically increasing function of σ, there must be a
corresponding monotonically increasing function g that expresses the volatility implied by the
option's market price C̄, or

σ̄ = g(C̄, S).

Or, in other words, when all other inputs including the stock price S are held constant, there
exists no more than one implied volatility σ̄ for each market price C̄ for the option.
Because implied volatility of an option can remain constant even as the underlier's value changes,
traders use it as a measure of relative value rather than the option's market price. For instance, if
a trader can buy an option whose implied volatility is 10%, it's common to say that the trader
can "buy the option for 10%". Conversely, if the trader can sell an option whose implied
volatility is 20%, it is said the trader can "sell the option at 20%".
For example, assume a call option is trading at $1.90 with the underlier's price at $45.50,
yielding an implied volatility of 17.5%. A short time later, the same option might trade at $2.50
with the underlier's price at $46.36, yielding an implied volatility of 16.8%. Even though the
option's price is higher at the second measurement, the option is still considered cheaper on a
volatility basis because the implied volatility is lower. This is because the trader can sell the
stock needed to hedge the long call at a higher price.
[edit] Mechanism
Armed with a forecast volatility, and capable of measuring an option's market price in terms of
implied volatility, the trader is ready to begin a volatility arbitrage trade. A trader looks for
options where the implied volatility is either significantly lower than or higher than the
forecast realized volatility for the underlier. In the first case, the trader buys the option and
hedges with the underlier to make a delta-neutral portfolio. In the second case, the trader sells the
option and then hedges the position.
Over the holding period, the trader will realize a profit on the trade if the underlier's realized
volatility is closer to his forecast than it is to the market's forecast (i.e. the implied volatility).
The profit is extracted from the trade through the continual re-hedging required to keep the
portfolio delta neutral.
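
A minimal sketch of the long/short volatility decision described above; the two-volatility-point threshold is an arbitrary assumption used only for illustration.

def vol_arb_signal(implied_vol, forecast_vol, threshold=0.02):
    """Suggest a delta-neutral position from the implied vs. forecast comparison."""
    if implied_vol < forecast_vol - threshold:
        return "buy option and delta-hedge with the underlier (long volatility)"
    if implied_vol > forecast_vol + threshold:
        return "sell option and delta-hedge with the underlier (short volatility)"
    return "no trade"

print(vol_arb_signal(0.10, 0.18))   # long volatility
print(vol_arb_signal(0.20, 0.15))   # short volatility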
Black–Scholes
From Wikipedia, the free encyclopedia

The Black–Scholes model is a mathematical description of financial markets and derivative


investment instruments. The model develops partial differential equations whose solution, the
Black–Scholes formula, is widely used in the pricing of European-style options.
The model was first articulated by Fischer Black and Myron Scholes in their 1973 paper, "The
Pricing of Options and Corporate Liabilities." The foundation for their research relied on work
developed by scholars such as Jack L. Treynor, Paul Samuelson, A. James Boness, Sheen T.
Kassouf, and Edward O. Thorp. The fundamental insight of Black–Scholes is that the option is
implicitly priced if the stock is traded. Robert C. Merton was the first to publish a paper
expanding the mathematical understanding of the options pricing model and coined the term
Black–Scholes options pricing model.
Merton and Scholes received the 1997 Nobel Prize in Economics (The Sveriges Riksbank Prize
in Economic Sciences in Memory of Alfred Nobel) for their work. Though ineligible for the prize
because of his death in 1995, Black was mentioned as a contributor by the Swedish academy.[1]
The uniqueness and originality of the model developed by Black, Scholes, and Merton is
disputed today, because as early as 1908, the Italian mathematician Vinzenz Bronzin developed a
largely identical model[2].
[edit]Model assumptions
The Black–Scholes model of the market for a particular equity makes the following explicit
assumptions:
• It is possible to borrow and lend cash at a known constant risk-free interest
rate. This restriction has been removed in later extensions of the model.
• The price follows a Geometric Brownian motion with constant drift and
volatility. This often implies the validity of the efficient-market hypothesis.
• There are no transaction costs or taxes.
• Returns from the security follow a Log-normal distribution.
• The stock does not pay a dividend (see below for extensions to handle
dividend payments).
• All securities are perfectly divisible (i.e. it is possible to buy any fraction of a
share).
• There are no restrictions on short selling.
• There is no arbitrage opportunity
• Options use the European exercise terms, which dictate that options may
only be exercised on the day of expiration.
From these conditions in the market for an equity (and for an option on the equity), the authors
show that "it is possible to create a hedged position, consisting of a long position in the stock and
a short position in [calls on the same stock], whose value will not depend on the price of the
stock."[3]
Several of these assumptions of the original model have been removed in subsequent extensions
of the model. Modern versions account for changing interest rates (Merton, 1976), transaction
costs and taxes (Ingersoll, 1976), and dividend payout (Merton, 1973).
[edit]Notation
Define
S, the price of the stock (see also the remarks on notation below).

V(S,t), the price of a derivative as a function of time and stock price.

C(S,t) the price of a European call and P(S,t) the price of a European put
option.

K, the strike of the option.

r, the annualized risk-free interest rate, continuously compounded.

μ, the drift rate of S, annualized.

σ, the volatility of the stock; this is the square root of the quadratic variation
of the stock's log price process.

t a time in years; we generally use now = 0, expiry = T.

Π, the value of a portfolio.

R, the accumulated profit or loss following a delta-hedging trading strategy.


N(x) denotes the standard normal cumulative distribution function, N(x) = ∫_(−∞)^x e^(−z²/2) / √(2π) dz.

N'(x) denotes the standard normal probability density function, N'(x) = e^(−x²/2) / √(2π).

[edit]Mathematical model
Simulated Geometric Brownian Motions with Parameters from Market Data

As per the model assumptions above, we assume that the underlying asset (typically the stock)
follows a geometric Brownian motion. That is,

dS = μS dt + σS dW

where W is a Brownian motion; the dW term here stands in for any and all sources of uncertainty
in the price history of a stock.
The payoff of an option V(S,T) at maturity is known. To find its value at an earlier time we
need to know how V evolves as a function of S and t. By Itō's lemma for two variables we have

dV = (∂V/∂t + μS ∂V/∂S + (1/2)σ²S² ∂²V/∂S²) dt + σS (∂V/∂S) dW

Now consider a trading strategy under which one holds a single option and continuously trades
in the stock in order to hold −∂V/∂S shares. At time t, the value of these holdings will be

Π = V − (∂V/∂S) S

The composition of this portfolio, called the delta-hedge portfolio, will vary from time-step to
time-step. Let R denote the accumulated profit or loss from following this strategy. Then over the
time period [t, t + dt], the instantaneous profit or loss is

dR = dV − (∂V/∂S) dS

By substituting in the equations above we get

dR = (∂V/∂t + (1/2)σ²S² ∂²V/∂S²) dt

This equation contains no dW term. That is, it is entirely riskless (delta neutral). Black and
Scholes reason that under their ideal conditions, the rate of return on this portfolio must be equal
at all times to the rate of return on any other riskless instrument; otherwise, there would be
opportunities for arbitrage. Now assuming the risk-free rate of return is r we must have over the
time period [t, t + dt]

rΠ dt = dR

If we now substitute in for Π and dR and identify the left and right hand sides of the equation we
obtain the Black–Scholes partial differential equation (PDE):

∂V/∂t + (1/2)σ²S² ∂²V/∂S² + rS ∂V/∂S − rV = 0

With the assumptions of the Black–Scholes model, this partial differential equation holds
whenever V is twice differentiable with respect to S and once with respect to t.
[edit]Other derivations
See also: Martingale pricing

Above we used the method of arbitrage-free pricing ("delta-hedging") to derive some PDE
governing option prices given the Black–Scholes model. It is also possible to use a risk-
neutrality argument. This latter method gives the price as the expectation of the option payoff
under a particular probability measure, called the risk-neutral measure, which differs from the
real world measure.
[edit]Black–Scholes formula

Black–Scholes European Call Option Pricing Surface


The Black–Scholes formula calculates the price of European put and call options. It can be
obtained by solving the Black–Scholes partial differential equation.
The value of a call option in terms of the Black–Scholes parameters is:

C(S,t) = S N(d1) − K e^(−r(T−t)) N(d2)

where

d1 = [ln(S/K) + (r + σ²/2)(T − t)] / (σ √(T − t))
d2 = d1 − σ √(T − t)

The price of the corresponding put option is:

P(S,t) = K e^(−r(T−t)) N(−d2) − S N(−d1)

For both, as above:
• N(•) is the cumulative distribution function of the standard normal
distribution
• T - t is the time to maturity
• S is the spot price of the underlying asset
• K is the strike price
• r is the risk free rate (annual rate, expressed in terms of continuous
compounding)
• σ is the volatility in the log-returns of the underlying
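
A minimal sketch of the call and put formulas above, using only the Python standard library; the sample inputs are illustrative.

from math import log, sqrt, exp
from statistics import NormalDist

N = NormalDist().cdf   # standard normal cumulative distribution function

def black_scholes(S, K, T, r, sigma):
    """European call and put prices; T is the time to maturity in years."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    call = S * N(d1) - K * exp(-r * T) * N(d2)
    put = K * exp(-r * T) * N(-d2) - S * N(-d1)
    return call, put

print(black_scholes(S=100, K=100, T=1.0, r=0.05, sigma=0.2))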
[edit]Interpretation
N(d1) and N(d2) are the probabilities of the option expiring in-the-money under the equivalent
exponential martingale probability measure (numéraire = stock) and the equivalent martingale
probability measure (numéraire = risk free asset), respectively. The equivalent martingale
probability measure is also called the risk-neutral probability measure. Note that both of these
are probabilities in a measure theoretic sense, and neither of these is the true probability of
expiring in-the-money under the real probability measure.
[edit]Derivation
We now show how to get from the general Black–Scholes PDE to a specific valuation for an
option. Consider as an example the Black–Scholes price of a call option, for which the PDE
above has boundary conditions

C(0, t) = 0 for all t
C(S, t) → S as S → ∞
C(S, T) = max(S − K, 0)

The last condition gives the value of the option at the time that the option matures. The solution
of the PDE gives the value of the option at any earlier time t < T. In order to
solve the PDE we transform the equation into a diffusion equation which may be solved using
standard methods. To this end we introduce the change-of-variable transformation

τ = T − t,   u = C e^(rτ),   x = ln(S/K) + (r − σ²/2) τ

Then the Black–Scholes PDE becomes a diffusion equation

∂u/∂τ = (σ²/2) ∂²u/∂x²

The terminal condition C(S,T) = max(S − K, 0) now becomes an initial condition

u(x, 0) = K max(e^x − 1, 0)

Using the standard method for solving a diffusion equation we have

u(x, τ) = [1 / (σ √(2πτ))] ∫ u(y, 0) exp(−(x − y)² / (2σ²τ)) dy

with the integral taken over all real y. After some algebra we obtain

u(x, τ) = K e^(x + σ²τ/2) N(d1) − K N(d2)

where

d1 = (x + σ²τ) / (σ √τ)

and

d2 = x / (σ √τ) = d1 − σ √τ

Substituting for u, x, and τ, we obtain the value of a call option in terms of the Black–Scholes
parameters:

C(S, t) = S N(d1) − K e^(−r(T−t)) N(d2)

where

d1 = [ln(S/K) + (r + σ²/2)(T − t)] / (σ √(T − t)),   d2 = d1 − σ √(T − t)

The price of a put option may be computed from this by put-call parity and simplifies to

P(S, t) = K e^(−r(T−t)) N(−d2) − S N(−d1)

[edit]Greeks
The Greeks under Black–Scholes are given in closed form, below:

Delta (∂V/∂S): calls N(d1); puts N(d1) − 1 = −N(−d1)
Gamma (∂²V/∂S²): N'(d1) / (S σ √(T−t)) for both calls and puts
Vega (∂V/∂σ): S N'(d1) √(T−t) for both calls and puts
Theta (∂V/∂t): calls −S N'(d1) σ / (2 √(T−t)) − r K e^(−r(T−t)) N(d2); puts −S N'(d1) σ / (2 √(T−t)) + r K e^(−r(T−t)) N(−d2)
Rho (∂V/∂r): calls K (T−t) e^(−r(T−t)) N(d2); puts −K (T−t) e^(−r(T−t)) N(−d2)

Note that the gamma and vega formulas are the same for calls and puts. This can be seen directly
from put-call parity.
In practice, some sensitivities are usually quoted in scaled-down terms, to match the scale of
likely changes in the parameters. For example, rho is often reported divided by 10,000 (1bp rate
change), vega by 100 (1 vol point change), and theta by 365 or 252 (1 day decay based on either
calendar days or trading days per year).
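
A sketch of the closed-form call Greeks listed above, reported with the scaling conventions just described; the parameters are illustrative and only the Python standard library is used.

from math import log, sqrt, exp, pi
from statistics import NormalDist

N = NormalDist().cdf   # standard normal CDF

def npdf(x):
    # standard normal probability density N'(x)
    return exp(-0.5 * x * x) / sqrt(2 * pi)

def call_greeks(S, K, T, r, sigma):
    """Closed-form Black-Scholes Greeks for a European call; T is time to expiry in years."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return {
        "delta": N(d1),
        "gamma": npdf(d1) / (S * sigma * sqrt(T)),
        "vega": S * npdf(d1) * sqrt(T),
        "theta": -S * npdf(d1) * sigma / (2 * sqrt(T)) - r * K * exp(-r * T) * N(d2),
        "rho": K * T * exp(-r * T) * N(d2),
    }

g = call_greeks(S=100, K=100, T=1.0, r=0.05, sigma=0.2)
# Scaled as described above: vega per volatility point, rho per basis point, theta per calendar day.
print(g["delta"], g["vega"] / 100, g["rho"] / 10000, g["theta"] / 365)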
[edit]Extensions of the model
The above model can be extended for variable (but deterministic) rates and volatilities. The
model may also be used to value European options on instruments paying dividends. In this case,
closed-form solutions are available if the dividend is a known proportion of the stock price.
American options and options on stocks paying a known cash dividend (in the short term, more
realistic than a proportional dividend) are more difficult to value, and a choice of solution
techniques is available (for example lattices and grids).
[edit]Instruments paying continuous yield dividends
For options on indexes, it is reasonable to make the simplifying assumption that dividends are
paid continuously, and that the dividend amount is proportional to the level of the index.
The dividend payment paid over the time period [t, t + dt] is then modelled as

q S dt

for some constant q (the dividend yield).

Under this formulation the arbitrage-free price implied by the Black–Scholes model can be
shown to be

C(S, t) = e^(−r(T−t)) [F N(d1) − K N(d2)]

where now

F = S e^((r − q)(T−t))

is the modified forward price that occurs in the terms d1 and d2:

d1 = [ln(F/K) + (σ²/2)(T − t)] / (σ √(T − t)),   d2 = d1 − σ √(T − t)
Exactly the same formula is used to price options on foreign exchange rates, except that now q
plays the role of the foreign risk-free interest rate and S is the spot exchange rate. This is the
Garman-Kohlhagen model (1983).
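
A sketch of the dividend-yield call price using the modified forward price F above (equivalently the Garman-Kohlhagen price when q is read as the foreign risk-free rate); the parameters are illustrative.

from math import log, sqrt, exp
from statistics import NormalDist

N = NormalDist().cdf

def call_with_yield(S, K, T, r, q, sigma):
    """European call on an asset paying a continuous yield q (Garman-Kohlhagen for FX)."""
    F = S * exp((r - q) * T)                                 # modified forward price
    d1 = (log(F / K) + 0.5 * sigma ** 2 * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return exp(-r * T) * (F * N(d1) - K * N(d2))

print(call_with_yield(S=100, K=100, T=1.0, r=0.05, q=0.02, sigma=0.2))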
[edit]Instruments paying discrete proportional dividends
It is also possible to extend the Black–Scholes framework to options on instruments paying
discrete proportional dividends. This is useful when the option is struck on a single stock.
A typical model is to assume that a proportion δ of the stock price is paid out at pre-determined
times t1, t2, .... The price of the stock is then modelled as

S(t) = S0 (1 − δ)^n(t) e^(ut + σW(t))

where n(t) is the number of dividends that have been paid by time t.
The price of a call option on such a stock is again

C(S, t) = e^(−r(T−t)) [F N(d1) − K N(d2)]

where now

F = S(t) (1 − δ)^(n(T) − n(t)) e^(r(T−t))

is the forward price for the dividend paying stock.


[edit]Black–Scholes in practice
The normality assumption of the Black–Scholes model does not capture extreme
movements such as stock market crashes.

The Black–Scholes model disagrees with reality in a number of ways, some significant. It is
widely employed as a useful approximation, but proper application requires understanding its
limitations – blindly following the model exposes the user to unexpected risk.
Among the most significant limitations are:
• the underestimation of extreme moves, yielding tail risk, which can be
hedged with out-of-the-money options;
• the assumption of instant, cost-less trading, yielding liquidity risk, which is
difficult to hedge;
• the assumption of a stationary process, yielding volatility risk, which can be
hedged with volatility hedging;
• the assumption of continuous time and continuous trading, yielding gap risk,
which can be hedged with Gamma hedging.
In short, while in the Black–Scholes model one can perfectly hedge options by simply Delta
hedging, in practice there are many other sources of risk.
Results using the Black–Scholes model differ from real world prices due to simplifying
assumptions of the model. One significant limitation is that in reality security prices do not
follow a strict stationary log-normal process, nor is the risk-free interest rate actually known (or
constant over time). The variance has been observed to be non-constant, leading to models
such as GARCH to model volatility changes. Pricing discrepancies between empirical and the
Black–Scholes model have long been observed in options that are far out-of-the-money,
corresponding to extreme price changes; such events would be very rare if returns were
lognormally distributed, but are observed much more often in practice.
Nevertheless, Black–Scholes pricing is widely used in practice,[4] for it is easy to calculate and
explicitly models the relationship of all the variables. It is a useful approximation, particularly
when analyzing the directionality that prices move when crossing critical points. It is used both
as a quoting convention and a basis for more refined models. Although volatility is not constant,
results from the model are often useful in practice and helpful in setting up hedges in the correct
proportions to minimize risk. Even when the results are not completely accurate, they serve as a
first approximation to which adjustments can be made.
One reason for the popularity of the Black–Scholes model is that it is robust in that it can be
adjusted to deal with some of its failures. Rather than considering some parameters (such as
volatility or interest rates) as constant, one considers them as variables, and thus added sources
of risk. This is reflected in the Greeks (the change in option value for a change in these
parameters, or equivalently the partial derivatives with respect to these variables), and hedging
these Greeks mitigates the risk caused by the non-constant nature of these parameters. Other
defects cannot be mitigated by modifying the model, however, notably tail risk and liquidity risk,
and these are instead managed outside the model, chiefly by minimizing these risks and by stress
testing.
Additionally, rather than assuming a volatility a priori and computing prices from it, one can use
the model to solve for volatility, which gives the implied volatility of an option at given prices,
durations and exercise prices. Solving for volatility over a given set of durations and strike prices
one can construct an implied volatility surface. In this application of the Black–Scholes model, a
coordinate transformation from the price domain to the volatility domain is obtained. Rather than
quoting option prices in terms of dollars per unit (which are hard to compare across strikes and
tenors), option prices can thus be quoted in terms of implied volatility, which leads to trading of
volatility in option markets.
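
A sketch of backing out implied volatility by inverting the (monotone) Black-Scholes price with bisection; a root-finder such as Newton's method using vega would work equally well. The sample values are illustrative.

from math import log, sqrt, exp
from statistics import NormalDist

N = NormalDist().cdf

def bs_call(S, K, T, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * N(d1) - K * exp(-r * T) * N(d2)

def implied_vol(price, S, K, T, r, lo=1e-6, hi=5.0, tol=1e-8):
    """Invert the monotone price-volatility relationship by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

market_price = bs_call(100, 100, 1.0, 0.05, 0.2)
print(implied_vol(market_price, 100, 100, 1.0, 0.05))   # recovers roughly 0.20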
[edit]The volatility smile
Main article: Volatility smile

One of the attractive features of the Black–Scholes model is that the parameters in the model
(other than the volatility) — the time to maturity, the strike, and the current underlying price —
are unequivocally observable. All other things being equal, an option's theoretical value is a
monotonic increasing function of implied volatility. By computing the implied volatility for
traded options with different strikes and maturities, the Black–Scholes model can be tested. If the
Black–Scholes model held, then the implied volatility for a particular stock would be the same
for all strikes and maturities. In practice, the volatility surface (the three-dimensional graph of
implied volatility against strike and maturity) is not flat. The typical shape of the implied
volatility curve for a given maturity depends on the underlying instrument. Equities tend to have
skewed curves: compared to at-the-money, implied volatility is substantially higher for low
strikes, and slightly lower for high strikes. Currencies tend to have more symmetrical curves,
with implied volatility lowest at-the-money, and higher volatilities in both wings. Commodities
often have the reverse behaviour to equities, with higher implied volatility for higher strikes.
Despite the existence of the volatility smile (and the violation of all the other assumptions of the
Black–Scholes model), the Black–Scholes PDE and Black–Scholes formula are still used
extensively in practice. A typical approach is to regard the volatility surface as a fact about the
market, and use an implied volatility from it in a Black–Scholes valuation model. This has been
described as using "the wrong number in the wrong formula to get the right price."[5] This
approach also gives usable values for the hedge ratios (the Greeks).
Even when more advanced models are used, traders prefer to think in terms of volatility as it
allows them to evaluate and compare options of different maturities, strikes, and so on.
[edit]Valuing bond options
Black–Scholes cannot be applied directly to bond securities because of pull-to-par. As the bond
reaches its maturity date, all of the prices involved with the bond become known, thereby
decreasing its volatility, and the simple Black–Scholes model does not reflect this process. A
large number of extensions to Black–Scholes, beginning with the Black model, have been used
to deal with this phenomenon.
[edit]Interest rate curve
In practice, interest rates are not constant - they vary by tenor, giving an interest rate curve which
may be interpolated to pick an appropriate rate to use in the Black–Scholes formula. Another
consideration is that interest rates vary over time. This volatility may make a significant
contribution to the price, especially of long-dated options.
[edit]Short stock rate
It is not free to take a short stock position. Similarly, it may be possible to lend out a long stock
position for a small fee. In either case, this can be treated as a continuous dividend for the
purposes of a Black–Scholes valuation.
[edit]Alternative formula derivation
Let S0 be the current price of the underlying stock and S the price when the option matures at
time T. Then S0 is known, but S is a random variable. Assume that

ln(S / S0)

is a normal random variable with mean uT and variance σ²T. It follows that the mean of S is

E[S] = S0 e^(qT)

for some constant q (independent of T). Now a simple no-arbitrage argument shows that the
theoretical future value of a derivative paying one share of the stock at time T, and so with
payoff S, is

S0 e^(rT)

where r is the risk-free interest rate. This suggests making the identification q = r for the purpose
of pricing derivatives. Define the theoretical value of a derivative as the present value of the
expected payoff in this sense. For a call option with exercise price K this discounted expectation
(using risk-neutral probabilities) is

C = e^(−rT) E[max(S − K, 0)]
The derivation of the formula for C is facilitated by the following lemma: Let Z be a standard
normal random variable and let b be an extended real number. Define

m = E[ e^(aZ) 1{Z > b} ]

If a is a positive real number, then

m = e^(a²/2) N(a − b)

where N is the standard normal cumulative distribution function. In the special case b = −∞, we
have

m = e^(a²/2)

Now let

S = S0 e^((r − σ²/2)T + σ√T Z)   (taking q = r, i.e. u = r − σ²/2)

and use the corollary to the lemma to verify the statement above about the mean of S. Define

C = e^(−rT) E[ max(S − K, 0) ]

and observe that

max(S − K, 0) = (S − K) 1{Z > b}

for some b. Define

a = σ√T   and   b = [ln(K/S0) − (r − σ²/2)T] / (σ√T)

and observe that

C = e^(−rT) ( E[S 1{Z > b}] − K N(−b) ) = S0 N(a − b) − K e^(−rT) N(−b)
[citation needed]

The rest of the calculation (checking that a − b = d1 and −b = d2) is straightforward.


Although the "elementary" derivation leads to the correct result, it is incomplete, as it cannot
explain why the formula refers to the risk-free interest rate while a higher rate of return is
expected from risky investments. This limitation can be overcome using the risk-neutral
probability measure, but the concept of risk-neutrality and the related theory is far from
elementary. In elementary terms, the value of the option today is not the expectation of the value
of the option at expiry under the real-world probability distribution, discounted at the risk-free
rate. (So the basic capital asset pricing model (CAPM) results are not violated.) The value is
instead computed as the expectation, under another probability distribution, of the value of the
option at expiry, discounted at the risk-free rate. This other distribution is called the "risk
neutral" probability.
[citation needed]
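
A sketch of the risk-neutral valuation just described: simulate the terminal stock price with drift r under the risk-neutral measure, discount the average payoff at the risk-free rate, and compare with the closed-form call price. All parameters are illustrative.

from math import exp, sqrt, log
from statistics import NormalDist
import random

N = NormalDist().cdf

def bs_call(S0, K, T, r, sigma):
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    return S0 * N(d1) - K * exp(-r * T) * N(d1 - sigma * sqrt(T))

def mc_call(S0, K, T, r, sigma, n_paths=200_000, seed=1):
    """Discounted risk-neutral expectation of the call payoff, by Monte Carlo."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * sqrt(T)
    payoff_sum = 0.0
    for _ in range(n_paths):
        ST = S0 * exp(drift + vol * rng.gauss(0, 1))
        payoff_sum += max(ST - K, 0.0)
    return exp(-r * T) * payoff_sum / n_paths

print(mc_call(100, 100, 1.0, 0.05, 0.2), bs_call(100, 100, 1.0, 0.05, 0.2))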

[edit]Remarks on notation
The reader is warned of the inconsistent notation that appears in this article. Thus the
letter S is used as:
(1) a constant denoting the current price of the stock
(2) a real variable denoting the price at an arbitrary time

(3) a random variable denoting the price at maturity

(4) a stochastic process denoting the price at an arbitrary time

It is also used in the meaning of (4) with a subscript denoting time, but here the subscript is
merely a mnemonic.
In the partial derivatives, the letters in the numerators and denominators are, of course, real
variables, and the partial derivatives themselves are, initially, real functions of real variables. But
after the substitution of a stochastic process for one of the arguments they become stochastic
processes.
The Black–Scholes PDE is, initially, a statement about the stochastic process S, but when S is
reinterpreted as a real variable, it becomes an ordinary PDE. It is only then that we can ask about
its solution.
The parameter u that appears in the discrete-dividend model and the elementary derivation is not
the same as the parameter μ that appears elsewhere in the article. For the relationship between
them see Geometric Brownian motion.

Poisson process
From Wikipedia, the free encyclopedia
A Poisson process, named after the French mathematician Siméon-Denis Poisson (1781–1840),
is a stochastic process in which events occur continuously and independently of one another (the
word event used here is not an instance of the concept of event frequently used in probability
theory). Examples that are well-modeled as Poisson processes include the radioactive decay of
atoms, telephone calls arriving at a switchboard, page view requests to a website, and rainfall.
The Poisson process is a collection {N(t) : t ≥ 0} of random variables, where N(t) is the number
of events that have occurred up to time t (starting from time 0). The number of events between
time a and time b is given as N(b) − N(a) and has a Poisson distribution. Each realization of the
process {N(t)} is a non-negative integer-valued step function that is non-decreasing, but for
intuitive purposes it is usually easier to think of it as a point pattern on [0,∞) (the points in time
where the step function jumps, i.e. the points in time where an event occurs).
The Poisson process is a continuous-time process: its discrete-time counterpart is the Bernoulli
process. Poisson processes are also examples of continuous-time Markov processes. A Poisson
process is a pure-birth process, the simplest example of a birth-death process. By the
aforementioned interpretation as a random point pattern on [0, ∞) it is also a point process on the
real half-line.
[edit] Definition
The basic form of Poisson process, often referred to simply as "the Poisson process", is a
continuous-time counting process {N(t), t ≥ 0} that possesses the following properties:
• N(0) = 0
• Independent increments (the numbers of occurrences counted in disjoint intervals are
independent from each other)
• Stationary increments (the probability distribution of the number of occurrences counted
in any time interval only depends on the length of the interval)
• No counted occurrences are simultaneous.
Consequences of this definition include:
• The probability distribution of N(t) is a Poisson distribution.
• The probability distribution of the waiting time until the next occurrence is an
exponential distribution.
Other types of Poisson process are described below.
[edit] Types
[edit] Homogeneous

Sample Poisson process N(t)


The homogeneous Poisson process is one of the most well-known Lévy processes. This process
is characterized by a rate parameter λ, also known as intensity, such that the number of events in
time interval (t, t + τ] follows a Poisson distribution with associated parameter λτ. This relation is
given as

P[N(t + τ) − N(t) = k] = e^(−λτ) (λτ)^k / k!   for k = 0, 1, 2, …

where N(t + τ) − N(t) is the number of events in time interval (t, t + τ].
Just as a Poisson random variable is characterized by its scalar parameter λ, a homogeneous
Poisson process is characterized by its rate parameter λ, which is the expected number of
"events" or "arrivals" that occur per unit time.
N(t) is a sample homogeneous Poisson process, not to be confused with a density or distribution
function.
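
A minimal sketch of sampling a homogeneous Poisson process by drawing exponential inter-arrival times (Python standard library); the rate and horizon are illustrative.

import random

def poisson_arrival_times(rate, horizon, seed=0):
    """Arrival times on [0, horizon]: inter-arrival times are i.i.d. exponential with mean 1/rate."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)

arrivals = poisson_arrival_times(rate=5.0, horizon=10.0)
print(len(arrivals))   # N(10); on average rate * horizon = 50 events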
[edit] Non-homogeneous
Main article: Non-homogeneous Poisson process
In general, the rate parameter may change over time; such a process is called a non-
homogeneous Poisson process or inhomogeneous Poisson process. In this case, the
generalized rate function is given as λ(t). Now the expected number of events between time a
and time b is

λa,b = ∫_a^b λ(t) dt

Thus, the number of arrivals in the time interval (a, b], given as N(b) − N(a), follows a Poisson
distribution with associated parameter λa,b:

P[N(b) − N(a) = k] = e^(−λa,b) (λa,b)^k / k!   for k = 0, 1, 2, …

A homogeneous Poisson process may be viewed as a special case when λ(t) = λ, a constant rate.
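
One common way to sample a non-homogeneous Poisson process (not described above) is thinning: simulate at a constant rate that bounds λ(t) and keep each point with probability λ(t)/λmax. A sketch, assuming the hypothetical rate λ(t) = 0.5t on [0, 10].

import random

def nonhomogeneous_poisson(rate_fn, lambda_max, horizon, seed=0):
    """Thinning: simulate at constant rate lambda_max, keep each point with prob rate_fn(t)/lambda_max."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lambda_max)
        if t > horizon:
            return times
        if rng.random() < rate_fn(t) / lambda_max:
            times.append(t)

arrivals = nonhomogeneous_poisson(lambda t: 0.5 * t, lambda_max=5.0, horizon=10.0)
print(len(arrivals))   # expected count = integral of 0.5*t over [0, 10] = 25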
[edit] Spatial
A further variation on the Poisson process, called the spatial Poisson process, introduces a spatial
dependence on the rate function, which is given as λ(x, t) where x ∈ V for some vector space V
(e.g. R2 or R3). For any set S ⊆ V (e.g. a spatial region) with finite measure, the number of
events occurring inside this region can be modelled as a Poisson process with associated rate
function λS(t) such that

λS(t) = ∫_S λ(x, t) dx

In the special case that this generalized rate function is a separable function of time and space,
we have:

λ(x, t) = f(x) λ(t)

for some function f(x). Without loss of generality, let

∫_V f(x) dx = 1

(If this is not the case, λ(t) can be scaled appropriately.) Now f(x) represents the spatial
probability density function of these random events in the following sense. The act of sampling
this spatial Poisson process is equivalent to sampling a Poisson process with rate function λ(t),
and associating with each event a random vector x sampled from the probability density
function f(x). A similar result can be shown for the general (non-separable) case.
[edit] Properties
In its most general form, the only two conditions for a counting process to be a Poisson process
are:
• Orderliness: which roughly means

P[N(t + Δt) − N(t) > 1] = o(Δt)   as Δt → 0,

which implies that arrivals don't occur simultaneously (but this is actually a
mathematically stronger statement).
• Memorylessness (also called evolution without after-effects): the number of arrivals
occurring in any bounded interval of time after time t is independent of the number of
arrivals occurring before time t.
These seemingly unrestrictive conditions actually impose a great deal of structure on the Poisson
process. In particular, they imply that the times between consecutive events (called interarrival
times) are independent random variables. For the homogeneous Poisson process, these inter-
arrival times are exponentially distributed with parameter λ (mean 1/λ).
Proof: Let τ1 be the first arrival time of the Poisson process. Its distribution satisfies

P(τ1 > t) = P(N(t) = 0) = e^(−λt)

Also, the memorylessness property entails that the number of events in any time interval is
independent of the number of events in any other interval that is disjoint from it. This latter
property is known as the independent increments property of the Poisson process.
To illustrate the exponentially-distributed inter-arrival times property, consider a homogeneous
Poisson process N(t) with rate parameter λ, and let Tk be the time of the kth arrival, for k = 1, 2, 3,
... . Clearly the number of arrivals before some fixed time t is less than k if and only if the waiting
time until the kth arrival is more than t. In symbols, the event [N(t) < k] occurs if and only if the
event [Tk > t] occurs. Consequently the probabilities of these events are the same:

P(N(t) < k) = P(Tk > t)

In particular, consider the waiting time until the first arrival. Clearly that time is more than t if
and only if the number of arrivals before time t is 0. Combining this latter property with the
above probability distribution for the number of homogeneous Poisson process events in a fixed
interval gives

P(T1 > t) = P(N(t) = 0) = e^(−λt)

Consequently, the waiting time until the first arrival T1 has an exponential distribution, and is
thus memoryless. One can similarly show that the other interarrival times Tk − Tk−1 share the
same distribution. Hence, they are independent, identically distributed (i.i.d.) random variables
with parameter λ > 0 and expected value 1/λ. For example, if the average rate of arrivals is 5 per
minute, then the average waiting time between arrivals is 1/5 minute.
[edit] Examples
The following examples are well-modeled by the Poisson process:
• The arrival of "customers" in a queue.
• The number of raindrops falling over an area.
• The number of photons hitting a photodetector.
• The number of telephone calls arriving at a switchboard, or at an automatic phone-
switching system.
• The number of particles emitted via radioactive decay by an unstable substance, where
the rate decays as the substance stabilizes.
• The long-term behavior of the number of web page requests arriving at a server, except
for unusual circumstances such as coordinated denial of service attacks or flash crowds.
Such a model assumes homogeneity as well as weak stationarity.

Autoregressive conditional heteroskedasticity


In econometrics, a model featuring autoregressive conditional heteroskedasticity considers the
variance of the current error term or innovation to be a function of the actual sizes of the
previous time periods' error terms: often the variance is related to the squares of the previous
innovations. Such models are often called ARCH models (Engle, 1982), although a variety of
other acronyms is applied to particular structures of model which have a similar basis. ARCH
models are employed commonly in modeling financial time series that exhibit time-varying
volatility clustering, i.e. periods of swings followed by periods of relative calm.
[edit] ARCH(q) model Specification
Specifically, let εt denote the error terms (return residuals, w.r.t. a mean process) and assume that

εt = σt zt

where the zt are i.i.d. with zero mean and unit variance, and where the series σt² is modeled by

σt² = α0 + α1 εt−1² + … + αq εt−q²

and where α0 > 0 and αi ≥ 0 for i > 0.


An ARCH(q) model can be estimated using ordinary least squares. A methodology to test for the
lag length of ARCH errors using the Lagrange multiplier test was proposed by Engle (1982).
This procedure is as follows:
1. Estimate the best fitting AR(q) model

yt = a0 + a1 yt−1 + … + aq yt−q + εt.

2. Obtain the squares of the errors εt² and regress them on a constant and q lagged values:

εt² = α0 + α1 εt−1² + … + αq εt−q²

where q is the length of ARCH lags.


3. The null hypothesis is that, in the absence of ARCH components, we have αi = 0 for all
i = 1, …, q. The alternative hypothesis is that, in the presence of ARCH
components, at least one of the estimated αi coefficients must be significant. In a sample
of T residuals under the null hypothesis of no ARCH errors, the test statistic TR² follows a
χ² distribution with q degrees of freedom. If TR² is greater than the chi-square table
value, we reject the null hypothesis and conclude there is an ARCH effect in the ARMA
model. If TR² is smaller than the chi-square table value, we do not reject the null
hypothesis.
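
A sketch of the three-step Lagrange multiplier procedure above using ordinary least squares (numpy and scipy); the residual series here is a placeholder drawn from white noise, so the test should not reject.

import numpy as np
from scipy.stats import chi2

def arch_lm_test(resid, q):
    """Regress squared residuals on a constant and q lags; return T*R^2 and its chi-square p-value."""
    e2 = np.asarray(resid) ** 2
    y = e2[q:]
    X = np.column_stack([np.ones(len(y))] + [e2[q - i:-i] for i in range(1, q + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    r2 = 1 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    stat = len(y) * r2
    return stat, chi2.sf(stat, df=q)

resid = np.random.default_rng(0).standard_normal(1000)   # placeholder residuals from a mean model
print(arch_lm_test(resid, q=4))                          # large p-value: no evidence of ARCH effects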
[edit] GARCH
If an autoregressive moving average model (ARMA model) is assumed for the error variance,
the model is a generalized autoregressive conditional heteroskedasticity (GARCH,
Bollerslev (1986)) model.
In that case, the GARCH(p, q) model (where p is the order of the GARCH terms σ² and q is the
order of the ARCH terms ε²) is given by

σt² = α0 + Σ_{i=1..q} αi εt−i² + Σ_{i=1..p} βi σt−i²

Generally, when testing for heteroskedasticity in econometric models, the best test is the White
test. However, when dealing with time series data, this means testing for ARCH errors (as
described above) and GARCH errors (below).
Prior to GARCH, the exponentially weighted moving average (EWMA) was widely used for
modelling time-varying volatility; it has largely been superseded by GARCH, although some
practitioners use both.
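
A sketch of simulating a GARCH(1,1) recursion of the form given above, with illustrative parameter values chosen so that α1 + β1 < 1 (a stationary process); nothing here is estimated from real data.

import numpy as np

def simulate_garch11(alpha0, alpha1, beta1, n, seed=0):
    """epsilon_t = sigma_t * z_t with sigma_t^2 = alpha0 + alpha1*eps_{t-1}^2 + beta1*sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    eps = np.zeros(n)
    sigma2 = np.full(n, alpha0 / (1 - alpha1 - beta1))   # start at the unconditional variance
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * z[t]
    return eps, sigma2

eps, sigma2 = simulate_garch11(alpha0=1e-5, alpha1=0.05, beta1=0.90, n=2000)
print(eps.std(), np.sqrt(sigma2.mean()))   # sample vs. model volatility, roughly equal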
[edit] GARCH(p, q) model specification
The lag length p of a GARCH(p, q) process is established in three steps:
1. Estimate the best fitting AR(q) model

yt = a0 + a1 yt−1 + … + aq yt−q + εt.

2. Compute and plot the sample autocorrelations ρ(i) of the squared errors εt².
3. The asymptotic, that is for large samples, standard deviation of ρ(i) is 1/√T.
Individual values that are larger than this indicate GARCH errors. To estimate the total
number of lags, use the Ljung-Box test until the values of these are less than, say, 10%
significant. The Ljung-Box Q-statistic follows a χ² distribution with n degrees of freedom
if the squared residuals εt² are uncorrelated. It is recommended to consider up to T/4
values of n. The null hypothesis states that there are no ARCH or GARCH errors.
Rejecting the null thus means that such errors exist in the conditional
variance.
[edit] Nonlinear GARCH (NGARCH)
Nonlinear GARCH (NGARCH) also known as Nonlinear Asymmetric GARCH(1,1)
(NAGARCH) was introduced by Engle and Ng in 1993.

σt² = ω + α (εt−1 − θ σt−1)² + β σt−1².
For stock returns, the parameter θ is usually estimated to be positive; in this case, it reflects the
leverage effect, signifying that negative returns increase future volatility by a larger amount than
positive returns of the same magnitude.[1][2]
This model shouldn't be confused with the NARCH model, together with the NGARCH
extension, introduced by Higgins and Bera in 1992.[clarification needed]
[edit] IGARCH
Integrated Generalized Autoregressive Conditional Heteroskedasticity IGARCH is a restricted
version of the GARCH model, where the persistent parameters sum to one, and
therefore there is a unit root in the GARCH process. The condition for this is

α1 + … + αq + β1 + … + βp = 1.
[edit] EGARCH
The exponential general autoregressive conditional heteroskedastic (EGARCH) model by
Nelson (1991) is another form of the GARCH model. Formally, an EGARCH(p,q):

log σt² = ω + Σ_{k=1..q} βk g(Zt−k) + Σ_{k=1..p} αk log σt−k²

where g(Zt) = θZt + λ( | Zt | − E( | Zt | )), σt² is the conditional variance, ω, βk, αk, θ and λ
are coefficients, and Zt is a standard normal variable.

Since log σt² may be negative, there are no (or at least fewer) restrictions on the parameters.


[edit] GARCH-M
The GARCH-in-mean (GARCH-M) model adds a heteroskedasticity term into the mean
equation. It has the specification:

yt = β xt + λ σt + εt

(variants use σt² or log σt² as the in-mean term). The residual εt is defined as

εt = σt zt

where zt is i.i.d. with zero mean and unit variance and σt² follows a GARCH process.
[edit] QGARCH
The Quadratic GARCH (QGARCH) model by Sentana (1995) is used to model asymmetric
effects of positive and negative shocks.

In the example of a GARCH(1,1) model, the residual process is

εt = σt zt

where zt is i.i.d. and

σt² = K + α εt−1² + β σt−1² + φ εt−1

[edit] GJR-GARCH
Similar to QGARCH, The Glosten-Jagannathan-Runkle GARCH (GJR-GARCH) model by
Glosten, Jagannathan and Runkle (1993) also models asymmetry in the GARCH process. The
suggestion is to model εt = σt zt, where zt is i.i.d., and

σt² = K + δ σt−1² + α εt−1² + φ εt−1² It−1

where It−1 = 0 if εt−1 ≥ 0, and It−1 = 1 if εt−1 < 0.

[edit] TGARCH model


Finally, the Threshold GARCH (TGARCH) model by Zakoian (1994) is similar to GJR
GARCH, and the specification is one on the conditional standard deviation instead of the conditional
variance:

σt = K + δ σt−1 + α1⁺ εt−1⁺ + α1⁻ εt−1⁻

where εt−1⁺ = εt−1 if εt−1 > 0, and εt−1⁺ = 0 if εt−1 ≤ 0. Likewise, εt−1⁻ = εt−1 if
εt−1 ≤ 0, and εt−1⁻ = 0 if εt−1 > 0.
[edit] fGARCH
Hentschel's fGARCH model[3], also known as Family GARCH, is an omnibus model that nests
a variety of other popular symmetric and asymmetric GARCH models including APARCH, GJR,
AVGARCH, NGARCH, etc.

Rate of return
In finance, rate of return (ROR), also known as return on investment (ROI), rate of profit or
sometimes just return, is the ratio of money gained or lost (whether realized or unrealized) on an
investment relative to the amount of money invested. The amount of money gained or lost may
be referred to as interest, profit/loss, gain/loss, or net income/loss. The money invested may be
referred to as the asset, capital, principal, or the cost basis of the investment. ROI is usually
expressed as a percentage rather than a fraction.
[edit]Calculation
The initial value of an investment, Vi, does not always have a clearly defined monetary value,
but for purposes of measuring ROI, the initial value must be clearly stated along with the
rationale for this initial value. The final value of an investment, Vf, also does not always have a
clearly defined monetary value, but for purposes of measuring ROI, the final value must be
clearly stated along with the rationale for this final value.[citation needed]
The rate of return can be calculated over a single period, or expressed as an average over
multiple periods.
[edit]Single-period
[edit]Arithmetic return
The arithmetic return is:

rarith = (Vf − Vi) / Vi

rarith is sometimes referred to as the yield. See also: effective interest rate, effective annual rate
(EAR) or annual percentage yield (APY).
[edit]Logarithmic or continuously compounded return
The logarithmic return or continuously compounded return, also known as force of interest,
is defined as:

rlog = ln(Vf / Vi)

It is the reciprocal of the e-folding time.
[edit]Multiperiod average returns
[edit]Arithmetic average rate of return
The arithmetic average rate of return over n periods is defined as:

raverage = (r1 + r2 + … + rn) / n

[edit]Geometric average rate of return


The geometric average rate of return, also known as the time-weighted rate of return, over n
periods is defined as:

rgeometric = ((1 + r1)(1 + r2) … (1 + rn))^(1/n) − 1

The geometric average rate of return calculated over n years is also known as the annualized
return.
[edit]Internal rate of return
Main article: Internal rate of return

The internal rate of return (IRR), also known as the dollar-weighted rate of return, is defined
as the value(s) of r* that satisfies the following equation:

NPV = Σ_{t=0..n} Ct / (1 + r*)^t = 0

where:
• NPV = net present value of the investment
• Ct = cashflow at time t
When the rate of return r used for discounting is smaller than the IRR r*, the investment is profitable, i.e., NPV >
0. Otherwise, the investment is not profitable.
[edit]Comparisons between various rates of return
[edit]Arithmetic and logarithmic return

The value of an investment is doubled over a year if the annual ROR rarith = +100%, or
equivalently if rlog = ln 2 ≈ 69.3%. The value falls to zero when rarith = −100% or rlog = −∞.


Arithmetic and logarithmic returns are not equal, but are approximately equal for small returns.
The difference between them is large only when percent changes are high. For example, an
arithmetic return of +50% is equivalent to a logarithmic return of 40.55%, while an arithmetic
return of -50% is equivalent to a logarithmic return of -69.31%.
Logarithmic returns are often used by academics in their research. The main advantage is that the
continuously compounded return is symmetric, while the arithmetic return is not: positive and
negative percent arithmetic returns are not equal. This means that an investment of $100 that
yields an arithmetic return of 50% followed by an arithmetic return of -50% will result in $75,
while an investment of $100 that yields a logarithmic return of 50% followed by a logarithmic
return of -50% will return to $100.
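
A quick numerical check of the asymmetry described above; the figures reproduce the 40.55% and -69.31% values quoted earlier.

from math import exp, log

v = 100.0
arith = v * 1.5 * 0.5                 # +50% then -50% arithmetic -> 75.0
log_ret = v * exp(0.5) * exp(-0.5)    # +50% then -50% logarithmic -> 100.0
print(arith, log_ret)
print(log(1.5), log(0.5))             # 0.4055 and -0.6931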

Comparison of arithmetic and logarithmic returns for an initial investment of $100

Initial investment, Vi      $100     $100      $100    $100     $100
Final investment, Vf        $0       $50       $100    $150     $200
Profit/loss, Vf − Vi        -$100    -$50      $0      $50      $100
Arithmetic return, rarith   -100%    -50%      0%      50%      100%
Logarithmic return, rlog    -∞       -69.31%   0%      40.55%   69.31%

[edit]Arithmetic average and geometric average rates of return


Both arithmetic and geometric average rates of returns are averages of periodic percentage
returns. Neither will accurately translate to the actual dollar amounts gained or lost if percent
gains are averaged with percent losses. [1] A 10% loss on a $100 investment is a $10 loss, and a
10% gain on a $100 investment is a $10 gain. When percentage returns on investments are
calculated, they are calculated for a period of time – not based on original investment dollars, but
based on the dollars in the investment at the beginning and end of the period. So if an investment
of $100 loses 10% in the first period, the investment amount is then $90. If the investment then
gains 10% in the next period, the investment amount is $99.
A 10% gain followed by a 10% loss is a 1% loss. The order in which the loss and gain occurs
does not affect the result. A 50% gain and a 50% loss is a 25% loss. An 80% gain plus an 80%
loss is a 64% loss. To recover from a 50% loss, a 100% gain is required. The mathematics of this
are beyond the scope of this article, but since investment returns are often published as "average
returns", it is important to note that average returns do not always translate into dollar returns.

Example #1 Level Rates of Return

                                   Year 1    Year 2    Year 3    Year 4
Rate of Return                     5%        5%        5%        5%
Geometric Average at End of Year   5%        5%        5%        5%
Capital at End of Year             $105.00   $110.25   $115.76   $121.55
Dollar Profit/(Loss)               $5.00     $10.25    $15.76    $21.55
Compound Yield                                                   5.4%

Example #2 Volatile Rates of Return, including losses

                                   Year 1    Year 2    Year 3    Year 4
Rate of Return                     50%       -20%      30%       -40%
Geometric Average at End of Year   50%       9.5%      16%       -1.6%
Capital at End of Year             $150.00   $120.00   $156.00   $93.60
Dollar Profit/(Loss)                                             ($6.40)
Compound Yield                                                   -1.6%

Example #3 Highly Volatile Rates of Return, including losses

                                   Year 1    Year 2    Year 3    Year 4
Rate of Return                     -95%      0%        0%        115%
Geometric Average at End of Year   -95%      -77.6%    -63.2%    -42.7%
Capital at End of Year             $5.00     $5.00     $5.00     $10.75
Dollar Profit/(Loss)                                             ($89.25)
Compound Yield                                                   -22.3%

[edit]Annual returns and annualized returns


Care must be taken not to confuse annual and annualized returns. An annual rate of return is a
single-period return, while an annualized rate of return is a multi-period, geometric average
return.
An annual rate of return is the return on an investment over a one-year period, such as January 1
through December 31, or June 3 2006 through June 2 2007. Each ROI in the cash flow example
above is an annual rate of return.
An annualized rate of return is the return on an investment over a period other than one year
(such as a month, or two years) multiplied or divided to give a comparable one-year return. For
instance, a one-month ROI of 1% could be stated as an annualized rate of return of 12%. Or a
two-year ROI of 10% could be stated as an annualized rate of return of 5%. (For GIPS
compliance, returns for portfolios or composites are not annualized for periods of less than one
year; annualization starts with the 13th month.)
In the cash flow example below, the dollar returns for the four years add up to $265. The
annualized rate of return for the four years is: $265 ÷ ($1,000 x 4 years) = 6.625%.
[edit]Uses
• ROI is a measure of cash[citation needed] generated by or lost due to the
investment. It measures the cash flow or income stream from the investment
to the investor, relative to the amount invested. Cash flow to the investor can
be in the form of profit, interest, dividends, or capital gain/loss. Capital
gain/loss occurs when the market value or resale value of the investment
increases or decreases. Cash flow here does not include the return of
invested capital.

Cash Flow Example on $1,000 Investment

                Year 1    Year 2    Year 3    Year 4
Dollar Return   $100      $55       $60       $50
ROI             10%       5.5%      6%        5%

• ROI values typically used for personal financial decisions include Annual
Rate of Return and Annualized Rate of Return. For nominal risk investments
such as savings accounts or Certificates of Deposit, the personal investor
considers the effects of reinvesting/compounding on increasing savings
balances over time. For investments in which capital is at risk, such as stock
shares, mutual fund shares and home purchases, the personal investor
considers the effects of price volatility and capital gain/loss on returns.
• Profitability ratios typically used by financial analysts to compare a
company’s profitability over time or compare profitability between
companies include Gross Profit Margin, Operating Profit Margin, ROI ratio,
Dividend yield, Net profit margin, Return on equity, and Return on assets.[2]
• During capital budgeting, companies compare the rates of return of different
projects to select which projects to pursue in order to generate maximum
return or wealth for the company's stockholders. Companies do so by
considering the average rate of return, payback period, net present value,
profitability index, and internal rate of return for various projects. [3]
• A return may be adjusted for taxes to give the after-tax rate of return. This is
done in geographical areas or historical times in which taxes consumed or
consume a significant portion of profits or income. The after-tax rate of return
is calculated by multiplying the rate of return by the tax rate, then
subtracting that percentage from the rate of return.
• A return of 5% taxed at 15% gives an after-tax return of 4.25%
0.05 x 0.15 = 0.0075
0.05 - 0.0075 = 0.0425 = 4.25%

• A return of 10% taxed at 25% gives an after-tax return of 7.5%


0.10 x 0.25 = 0.025
0.10 - 0.025 = 0.075 = 7.5%

Investors usually seek a higher rate of return on taxable investment returns than on non-taxable
investment returns.
• A return may be adjusted for inflation to better indicate its true value in
purchasing power. Any investment with a nominal rate of return less than the
annual inflation rate represents a loss of value, even though the nominal rate
of return might well be greater than 0%. When ROI is adjusted for inflation,
the resulting return is considered an increase or decrease in purchasing
power. If an ROI value is adjusted for inflation, it is stated explicitly, such as
“The return, adjusted for inflation, was 2%.”
• Many online poker tools include ROI in a player's tracked statistics, assisting
users in evaluating an opponent's profitability.

[edit]Cash or potential cash returns


[edit]Time value of money
Investments generate cash flow to the investor to compensate the investor for the time value of
money.
Except for rare periods of significant deflation where the opposite may be true, a dollar in cash is
worth less today than it was yesterday, and worth more today than it will be worth tomorrow.
The main factors that are used by investors to determine the rate of return at which they are
willing to invest money include:
• estimates of future inflation rates
• estimates regarding the risk of the investment (e.g. how likely it is that
investors will receive regular interest/dividend payments and the return of
their full capital)
• whether or not the investors want the money available (“liquid”) for other
uses.
The time value of money is reflected in the interest rates that banks offer for deposits, and also in
the interest rates that banks charge for loans such as home mortgages. The “risk-free” rate is the
rate on U.S. Treasury Bills, because this is the highest rate available without risking capital.
The rate of return which an investor expects from an investment is called the Discount Rate.
Each investment has a different discount rate, based on the cash flow expected in future from the
investment. The higher the risk, the higher the discount rate (rate of return) the investor will
demand from the investment.
[edit]Compounding or reinvesting
Compound interest or other reinvestment of cash returns (such as interest and dividends) does
not affect the discount rate of an investment, but it does affect the Annual Percentage Yield,
because compounding/reinvestment increases the capital invested.
For example, if an investor put $1,000 in a 1-year Certificate of Deposit (CD) that paid an annual
interest rate of 4%, compounded quarterly, the CD would earn 1% interest per quarter on the
account balance. The account balance includes interest previously credited to the account.

Compound Interest Example

                                         1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
Capital at the beginning of the period   $1,000        $1,010        $1,020.10     $1,030.30
Dollar return for the period             $10           $10.10        $10.20        $10.30
Account Balance at end of the period     $1,010.00     $1,020.10     $1,030.30     $1,040.60
Quarterly ROI                            1%            1%            1%            1%

The concept of 'income stream' may express this more clearly. At the beginning of the year, the
investor took $1,000 out of his pocket (or checking account) to invest in a CD at the bank. The
money was still his, but it was no longer available for buying groceries. The investment provided
a cash flow of $10.00, $10.10, $10.20 and $10.30. At the end of the year, the investor got
$1,040.60 back from the bank. $1,000 was return of capital.
Once interest is earned by an investor it becomes capital. Compound interest involves
reinvestment of capital; the interest earned during each quarter is reinvested. At the end of the
first quarter the investor had capital of $1,010.00, which then earned $10.10 during the second
quarter. The extra dime was interest on his additional $10 investment. The Annual Percentage
Yield or Future value for compound interest is higher than for simple interest because the interest
is reinvested as capital and earns interest. The yield on the above investment was 4.06%.
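
The quarterly compounding in the table above, reproduced as a short calculation; the 4.06% figure is the annual percentage yield of a 4% nominal rate compounded quarterly.

balance = 1000.0
for quarter in range(4):
    interest = balance * 0.01            # 4% annual rate, credited quarterly
    balance += interest
    print(quarter + 1, round(interest, 2), round(balance, 2))

apy = (1 + 0.04 / 4) ** 4 - 1
print(round(balance, 2), round(apy * 100, 2))   # 1040.6 and 4.06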
Bank accounts offer contractually guaranteed returns, so investors cannot lose their capital.
Investors/Depositors lend money to the bank, and the bank is obligated to give investors back
their capital plus all earned interest. Because investors are not risking losing their capital on a
bad investment, they earn a quite low rate of return. But their capital steadily increases.
[edit]Returns when capital is at risk
[edit]Capital gains and losses
Many investments carry significant risk that the investor will lose some or all of the invested
capital. For example, investments in company stock shares put capital at risk. The value of a
stock share depends on what someone is willing to pay for it at a certain point in time. Unlike
capital invested in a savings account, the capital value (price) of a stock share constantly
changes. If the price is relatively stable, the stock is said to have "low volatility." If the price
often changes a great deal, the stock has "high volatility." All stock shares have some volatility,
and the change in price directly affects ROI for stock investments.

Example: Stock with low volatility and a regular quarterly dividend, reinvested

End of:             1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
Dividend            $1            $1.01         $1.02         $1.03
Stock Price         $98           $101          $102          $99
Shares Purchased    0.010204      0.01          0.01          0.010404
Total Shares Held   1.010204      1.020204      1.030204      1.040608
Investment Value    $99           $103.04       $105.08       $103.02
Quarterly ROI       -1%           4.08%         1.98%         -1.96%
Stock returns are usually calculated for holding periods such as a month, a quarter or a year.
[edit]Reinvestment when capital is at risk: rate of return and yield

Yield is the compound rate of return that includes the effect of reinvesting interest or dividends.
The table in the preceding section shows an example of a stock investment of one share purchased
at the beginning of the year for $100.
• The quarterly dividend is reinvested at the quarter-end stock price.
• The number of shares purchased each quarter = ($ Dividend)/($ Stock Price).
• The final investment value of $103.02 is a 3.02% Yield on the initial
investment of $100. This is the compound yield, and this return can be
considered to be the return on the investment of $100.
To calculate the rate of return, the investor includes the reinvested dividends in the total
investment. The investor received a total of $4.06 in dividends over the year, all of which were
reinvested, so the investment amount increased by $4.06.
• Total Investment = Cost Basis = $100 + $4.06 = $104.06.
• Capital gain/loss = $103.02 - $104.06 = -$1.04 (a capital loss)
• ($4.06 dividends - $1.04 capital loss ) / $104.06 total investment = 2.9% ROI
The disadvantage of this ROI calculation is that it does not take into account the fact that not all
the money was invested during the entire year (the dividend reinvestments occurred throughout
the year). The advantages are: (1) it uses the cost basis of the investment, (2) it clearly shows
which gains are due to dividends and which gains/losses are due to capital gains/losses, and (3)
the actual dollar return of $3.02 is compared to the actual dollar investment of $104.06.
For U.S. income tax purposes, if the shares were sold at the end of the year, dividends would be
$4.06, cost basis of the investment would be $104.06, sale price would be $103.02, and the
capital loss would be $1.04.
Since all returns were reinvested, the ROI might also be calculated as a continuously
compounded return or logarithmic return. The effective continuously compounded rate of
return is the natural log of the final investment value divided by the initial investment value:
• Vi is the initial investment ($100)
• Vf is the final value ($103.02)

rlog = ln(Vf / Vi) = ln(103.02 / 100) ≈ 2.98%.
[edit]Mutual fund and investment company returns
Mutual funds, exchange-traded funds (ETFs), and other equitized investments (such as unit
investment trusts or UITs, insurance separate accounts and related variable products such as
variable universal life insurance policies and variable annuity contracts, and bank-sponsored
commingled funds, collective benefit funds or common trust funds) are essentially portfolios of
various investment securities such as stocks, bonds and money market instruments which are
equitized by selling shares or units to investors. Investors and other parties are interested to know
how the investment has performed over various periods of time.
Performance is usually quantified by a fund's total return. In the 1990s, many different fund
companies were advertising various total returns-- some cumulative, some averaged, some with
or without deduction of sales loads or commissions, etc. To level the playing field and help
investors compare performance returns of one fund to another, the U.S. Securities and Exchange
Commission (SEC) began requiring funds to compute and report total returns based upon a
standardized formula-- so called "SEC Standardized total return" which is the average annual
total return assuming reinvestment of dividends and distributions and deduction of sales loads or
charges. Funds may compute and advertise returns on other bases (so-called "non-standardized"
returns), so long as they also publish no less prominently the "standardized" return data.
Subsequent to this, apparently investors who'd sold their fund shares after a large increase in the
share price in the late 1990s and early 2000s were ignorant of how significant the impact of
income/capital gain taxes was on their fund "gross" returns. That is, they had little idea how
significant the difference could be between "gross" returns (returns before federal taxes) and
"net" returns (after-tax returns). In reaction to this apparent investor ignorance, and perhaps for
other reasons, the SEC made further rule-making to require mutual funds to publish in their
annual prospectus, among other things, total returns before and after the impact of U.S federal
individual income taxes. And further, the after-tax returns would include 1) returns on a
hypothetical taxable account after deducting taxes on dividends and capital gain distributions
received during the illustrated periods and 2) the impacts of the items in #1) as well as assuming
the entire investment shares were sold at the end of the period (realizing capital gain/loss on
liquidation of the shares). These after-tax returns would apply of course only to taxable accounts
and not to tax-deferred or retirement accounts such as IRAs.
Lastly, in more recent years, "personalized" investment returns have been demanded by
investors. In other words, investors are saying more or less the fund returns may not be what
their actual account returns are based upon the actual investment account transaction history.
This is because investments may have been made on various dates and additional purchases and
withdrawals may have occurred which vary in amount and date and thus are unique to the
particular account. More and more fund and brokerage firms have begun providing personalized
account returns on investor's account statements in response to this need.
Basic earnings and gains/losses on a mutual fund work as follows. The fund records income from
dividends and interest earned, which typically increases the value of the mutual fund shares, while
expenses set aside have an offsetting impact on share value. When the fund's investments increase
(or decrease) in market value, so too does the value of the fund shares (or units) owned by the
investors. When the fund sells investments at a profit, it turns or reclassifies that paper profit,
or unrealized gain, into an actual or realized gain. The sale has no effect on the value of fund
shares, but it reclassifies a component of their value from one bucket to another on the fund
books, which will have a future impact on investors. At least annually, a fund usually pays out
dividends from its net income (income less expenses) and net realized capital gains to
shareholders, as the IRS requires. This way, the fund itself pays no taxes; rather, all the
investors in taxable accounts do. Mutual fund share prices are typically valued each day the stock
or bond markets are open, and the value of a share is typically the net asset value of the fund
shares investors own.
[edit]Total returns
This section addresses only total returns without the impact of U.S. federal individual income
and capital gains taxes.
Mutual funds report total returns assuming reinvestment of dividend and capital gain
distributions. That is, the dollar amounts distributed are used to purchase additional shares of the
funds as of the reinvestment/ex-dividend date. Reinvestment rates or factors are based on total
distributions (dividends plus capital gains) during each period.

Total Return = ((Final Price x Last Reinvestment Factor) - Beginning Price) / Beginning Price
[edit]Average annual total return (geometric)
US mutual funds are to compute average annual total return as prescribed by the U.S. Securities and
Exchange Commission (SEC) in instructions to form N-1A (the fund prospectus) as the average annual
compounded rates of return for 1-year, 5-year and 10-year periods (or for the life of the fund, if
shorter) as the "average annual total return" for each fund. The following formula is used:[4]
P(1 + T)^n = ERV
Where:
P = a hypothetical initial investment of $1,000.
T = average annual total return.
n = number of years.
ERV = ending redeemable value of a hypothetical $1,000 payment made at the beginning of the
1-, 5-, or 10-year period, measured as of the end of that period (or fractional portion thereof).
Solving for T gives T = (ERV/P)^(1/n) − 1.
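For illustration, the formula can be inverted in a few lines of Python; the function name and the sample figures below are hypothetical, not taken from any fund's filing.

# Sketch: solve the SEC formula P*(1+T)**n = ERV for the average annual total return T.

def average_annual_total_return(p, erv, n):
    """Return T such that p * (1 + T) ** n == erv."""
    return (erv / p) ** (1.0 / n) - 1.0

# Hypothetical example: a $1,000 payment grows to a redeemable value of $1,500 over 5 years.
print(f"{average_annual_total_return(1000.0, 1500.0, 5):.4%}")   # about 8.45% per year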
[edit]Example

Example: mutual fund with low volatility and a regular annual dividend, reinvested at year-end
share price; initial share value $100.

End of:                    Year 1    Year 2    Year 3    Year 4    Year 5
Dividend                   $5        $5        $5        $5        $5
Capital Gain Distribution                      $2
Total Distribution         $5        $5        $7        $5        $5
Share Price                $98       $101      $102      $99       $101
Shares Purchased           0.05102   0.04950   0.06863   0.05051   0.04950
Shares Owned               1.05102   1.10053   1.16915   1.21966   1.26916
Reinvestment Factor        1.05102   1.05203   1.07220   1.05415   1.05219

• Total Return = (($101 x 1.05219) - $100) / $100 = 6.27% (net of expenses)


• Average Annual Return (geometric) = ((28.19/100 + 1)^(1/5) − 1) × 100 = 5.09%
Using a Holding Period Return calculation, after 5 years an investor who reinvested owned
1.26916 shares valued at $101 per share ($128.19 in value). ($128.19 − $100)/$100/5 = 5.638%
return. An investor who did not reinvest received total cash payments of $27 in dividends and $1
in capital gain. ($27+$1)/$100/5 = 5.600% return.
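The two figures quoted in this example can be reproduced with a short Python sketch; it simply re-does the arithmetic on the cumulative numbers given above.

# Average annual (geometric) return and holding-period returns from the 5-year example.
cumulative_return = 0.2819           # 28.19% total growth with reinvestment
years = 5

geometric_annual = (1 + cumulative_return) ** (1 / years) - 1
print(f"{geometric_annual:.2%}")                 # ~5.09% per year

print(f"{cumulative_return / years:.3%}")        # 5.638% (reinvested, holding-period basis)
print(f"{(27 + 1) / 100 / years:.3%}")           # 5.600% (distributions taken in cash)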
Mutual funds include capital gains as well as dividends in their return calculations. Since the
market price of a mutual fund share is based on net asset value, a capital gain distribution is
offset by an equal decrease in mutual fund share value/price. From the shareholder's perspective,
a capital gain distribution is not a net gain in assets, but it is a realized capital gain.
[edit]Summary: overall rate of return
Rate of Return and Return on Investment indicate cash flow from an investment to the
investor over a specified period of time, usually a year.
ROI is a measure of investment profitability, not a measure of investment size. While compound
interest and dividend reinvestment can increase the size of the investment (thus potentially
yielding a higher dollar return to the investor), Return on Investment is a percentage return
based on capital invested.
In general, the higher the investment risk, the greater the potential investment return, and the
greater the potential investment loss.

Stable distribution

Stable (distribution summary)

Probability density function (figures: symmetric α-stable distributions with unit scale factor;
skewed centered stable distributions with unit scale factor)
Cumulative distribution function (figures: CDFs for symmetric α-stable distributions; CDFs for
skewed centered stable distributions)

parameters: α ∈ (0, 2], the exponent; β ∈ [−1, 1], the skewness parameter (note that the usual
skewness is undefined); c > 0, the scale parameter; μ ∈ R, the location parameter
support: the whole real line, or a half-line if |β| = 1
pdf: usually not analytically expressible (see text)
cdf: usually not analytically expressible (see text)
mean: undefined when α ≤ 1, otherwise μ
median: usually not analytically expressible (see text); equal to μ when β = 0
mode: usually not analytically expressible; equal to μ when β = 0
variance: infinite except when α = 2, when it is 2c²
skewness: undefined except when α = 2, when it is 0
kurtosis: undefined except when α = 2, when it is 0
entropy: not analytically expressible (see text)
mgf: undefined
cf: exp[ itμ − |ct|^α (1 − iβ sgn(t)Φ) ], with Φ = tan(πα/2) for α ≠ 1 and Φ = −(2/π) log|t| for α = 1

In probability theory, a random variable is said to be stable (or to have a stable distribution) if
it has the property that a linear combination of two independent copies of the variable has the
same distribution, up to location and scale parameters. The stable distribution family is also
sometimes referred to as the Lévy alpha-stable distribution.
The importance of stable probability distributions is that they are "attractors" for properly
normed sums of independent and identically-distributed (iid) random variables. The normal
distribution is one family of stable distributions. By the classical central limit theorem the
properly normed sum of a set of random variables, each with finite variance, will tend towards a
normal distribution as the number of variables increases. Without the finite variance assumption
the limit may be a stable distribution. Stable distributions that are non-normal are often called
stable Paretian distributions, after Vilfredo Pareto.
[edit] Definition
The stable distributions are defined by the following property:
Let X1 and X2 be independent copies of a random variable X. The random variable X is said to
be stable if for any constants a and b the random variable aX1 + bX2 has the same
distribution as cX + d for some constants c and d. The distribution is said to be strictly
stable if this holds with d = 0 (Nolan 2009).
Since the normal distribution, the Cauchy distribution, and the Lévy distribution all have the
above property, it follows that they are special cases of stable distributions.
Such distributions form a four-parameter family of continuous probability distributions
parametrized by location and scale parameters μ and c, respectively, and two shape parameters β
and α, roughly corresponding to measures of asymmetry and concentration, respectively (see the
figures).
Although the probability density function for a general stable distribution cannot be written
analytically, the general characteristic function can be. Any probability distribution is determined
by its characteristic function φ(t) through the inversion formula
f(x) = (1/2π) ∫ φ(t) e^(−ixt) dt,
with the integral taken over the whole real line.
A random variable X is called stable if its characteristic function is given by (Nolan 2009)(Voit
2003 § 5.4.3)

φ(t) = exp[ itμ − |ct|^α (1 − iβ sgn(t) Φ) ]
where sgn(t) is just the sign of t and Φ is given by
Φ = tan(πα/2)
for all α except α = 1, in which case:
Φ = −(2/π) log|t|
Here μ ∈ R is a shift parameter and β ∈ [−1, 1], called the skewness parameter, is a measure of
asymmetry. Notice that in this context the usual skewness is not well defined, as for α < 2 the
distribution does not admit 2nd or higher moments, and the usual skewness definition is the 3rd
central moment.
In the simplest case β = 0, the characteristic function is just a stretched exponential function; the
distribution is symmetric about μ and is referred to as a (Lévy) symmetric alpha-stable
distribution.

When β = 1 and μ = 0 the distribution is one-sided; for α < 1 it is supported on [0, ∞).
The parameter c > 0 is a scale factor which is a measure of the width of the distribution, and
α is the exponent or index of the distribution, which specifies the asymptotic behavior of the
distribution for α < 2. The parameters are not completely independent in their effects; for
example, the one-sided support just described occurs only when α < 1.
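As an aside, the defining stability property can be checked numerically. The sketch below uses SciPy's levy_stable distribution (assuming its default (α, β) parameterization agrees with the one above closely enough for a symmetric, centered example) and compares quantiles of aX1 + bX2 with those of cX, where c = (|a|^α + |b|^α)^(1/α); it is an illustration, not a formal test.

# Stability check for a symmetric alpha-stable law (beta = 0, mu = 0).
import numpy as np
from scipy.stats import levy_stable

alpha, beta = 1.7, 0.0
n = 50_000
x1 = levy_stable.rvs(alpha, beta, size=n, random_state=1)
x2 = levy_stable.rvs(alpha, beta, size=n, random_state=2)
x  = levy_stable.rvs(alpha, beta, size=n, random_state=3)

a, b = 1.0, 2.0
c = (abs(a) ** alpha + abs(b) ** alpha) ** (1 / alpha)

qs = [0.25, 0.50, 0.75]                 # central quantiles; tails are too noisy to compare
print(np.quantile(a * x1 + b * x2, qs))
print(np.quantile(c * x, qs))           # the two sets of quantiles should be close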
[edit] Parameterizations
The above definition is only one of the parameterizations in use for stable distributions; it is the
most common but is not continuous in the parameters. For example, for the case α = 1 we could
replace Φ by: (Nolan 2009)

and by

This parameterization has the advantage that we may define a standard distribution using

and

The pdf for all α will then have the following standardization property:

[edit] Applications
Stable distributions owe their importance in both theory and practice to the generalization of the
Central Limit Theorem to random variables without second (and possibly first) order moments
and the accompanying self-similarity of the stable family. It was the seeming departure from
normality along with the demand for a self-similar model for financial data (i.e. the shape of the
distribution for yearly asset price changes should resemble that of the constituent daily or
monthly price changes) that led Benoît Mandelbrot to propose that cotton prices follow an alpha-
stable distribution with α equal to 1.7. Lévy distributions are frequently found in analysis of
critical behavior and financial data (Voit 2003 § 5.4.3).
They are also found in spectroscopy as a general expression for a quasistatically pressure-
broadened spectral line (Peach 1981 § 4.5).
[edit] Properties
• All stable distributions are infinitely divisible.
• With the exception of the normal distribution (α = 2), stable distributions are
leptokurtotic and heavy-tailed distributions.
• Closure under convolution
Stable distributions are closed under convolution for a fixed value of α. Since convolution is
equivalent to multiplication of the Fourier-transformed function, it follows that the product of
two stable characteristic functions with the same α will yield another such characteristic
function. The product of two stable characteristic functions is given by:
exp[ it(μ1 + μ2) − |c1 t|^α (1 − iβ1 sgn(t)Φ) − |c2 t|^α (1 − iβ2 sgn(t)Φ) ]
Since Φ is not a function of the μ, c or β variables, it follows that these parameters for the
convolved function are given by:
c^α = c1^α + c2^α,   β = (β1 c1^α + β2 c2^α) / (c1^α + c2^α),   μ = μ1 + μ2
In each case, it can be shown that the resulting parameters lie within the required intervals for a
stable distribution.
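A tiny helper makes these parameter rules concrete (the function name is ours; it simply restates the formulas above):

# Parameters of X1 + X2 when both summands are stable with the same alpha.
def convolve_stable_params(alpha, c1, beta1, mu1, c2, beta2, mu2):
    """Return (c, beta, mu) of the sum of two independent stable variables."""
    c = (c1 ** alpha + c2 ** alpha) ** (1.0 / alpha)
    beta = (beta1 * c1 ** alpha + beta2 * c2 ** alpha) / (c1 ** alpha + c2 ** alpha)
    mu = mu1 + mu2
    return c, beta, mu

# Two symmetric Cauchy components (alpha = 1): scales add, locations add.
print(convolve_stable_params(1.0, 1.0, 0.0, 0.0, 2.0, 0.0, 1.0))   # (3.0, 0.0, 1.0)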
[edit] Other definitions of stability
Below we give frequently used equivalent definitions of stability (Nolan 2009),(Voit 2003 §
5.4.3).
A random variable X is called stable if for n independent copies Xi of X there exist constants
cn > 0 and dn such that
X1 + X2 + ⋯ + Xn has the same distribution as cn X + dn
(equality of distributions); the norming constant is necessarily cn = n^(1/α).
[edit] The distribution
A stable distribution is therefore specified by the above four parameters. It can be shown that any
stable distribution has a continuous (and even smooth) density function. If f(x; α, β, c, μ)
denotes the density of X and Y = X1 + X2 + ⋯ + Xn (the sum of n independent copies of X), then Y
has the density
s⁻¹ f(y/s; α, β, c, 0) with s = n^(1/α).

The asymptotic behavior is described, for α < 2, by (Nolan, Theorem 1.12):
f(x) ∼ c^α (1 + β) sin(πα/2) (Γ(α + 1)/π) · |x|^(−(1+α))   as x → +∞,
where Γ is the Gamma function (except that when α < 1 and β = 1 or −1, the tail vanishes to the
left or right, resp., of μ). This "heavy tail" behavior causes the variance of Lévy distributions to
be infinite for all α < 2. This property is illustrated in the log-log plots below.
When α = 2, the distribution is Gaussian (see below), with tails asymptotic to exp(−x²/(4c²))/(2c√π).
[edit] Special cases

Log-log plot of symmetric centered stable distribution PDFs showing the power-law behavior
for large x. The power-law behavior is evidenced by the straight-line appearance of the PDF for
large x, with the slope equal to −(α + 1). (The only exception is α = 2, in black, which is a
normal distribution.)
Log-log plot of skewed centered stable distribution PDFs showing the power-law behavior for
large x. Again the slope of the linear portions is equal to −(α + 1).
There is no general analytic solution for the form of p(x). There are, however, three special cases
which can be expressed analytically, as can be seen by inspection of the characteristic function.
• For α = 2 the distribution reduces to a Gaussian distribution with variance σ² = 2c² and
mean μ; the skewness parameter β has no effect (Nolan 2009)(Voit 2003 § 5.4.3).
• For α = 1 and β = 0 the distribution reduces to a Cauchy distribution with scale
parameter c and shift parameter μ (Voit 2003 § 5.4.3)(Nolan 2009).
• For α = 1/2 and β = 1 the distribution reduces to a Lévy distribution with scale
parameter c and shift parameter μ (Peach 1981 § 4.5)(Nolan 2009).
Note that the above three distributions are also connected, in the following way: A standard
Cauchy random variable can be viewed as a mixture of Gaussian random variables (all with
mean zero), with the variance being drawn from a standard Lévy distribution. And in fact this is
a special case of a more general theorem which allows any symmetric alpha-stable distribution to
be viewed in this way (with the alpha parameter of the mixture distribution equal to twice the
alpha parameter of the mixing distribution—and the beta parameter of the mixing distribution
always equal to unity).
Other special cases are:
• In the limit as c approaches zero or as α approaches zero, the distribution will approach a
Dirac delta function δ(x − μ).
• For α = 1 and β = 1, the distribution is a Landau distribution, which has a specific usage
in physics under this name.
[edit] The generalized central limit theorem
Another important property of stable distributions is the role that they play in a generalized
central limit theorem. The central limit theorem states that the sum of a number of random
variables with finite variances will tend to a normal distribution as the number of variables
grows. A generalization due to Gnedenko and Kolmogorov states that the sum of a number of
random variables with power-law tail distributions decreasing as 1/|x|^(α+1), where 1 < α < 2
(and therefore having infinite variance), will tend to a stable distribution f(x; α, 0, c, 0) as the
number of variables grows. (Voit 2003 § 5.4.3)
[edit] Series representation
The stable distribution can be restated as the real part of a simpler integral:(Peach 1981 § 4.5)

Expressing the second exponential as a Taylor series, we have:

where q = c^α(1 − iβΦ). Reversing the order of integration and summation, and carrying out
the integration yields:

which will be valid for x ≠ μ and will converge for appropriate values of the parameters. (Note
that the n = 0 term, which yields a delta function in x − μ, has therefore been dropped.) Expressing
the first exponential as a series will yield another series in positive powers of x − μ which is
generally less useful.

Absolute deviation
From Wikipedia, the free encyclopedia
Jump to: navigation, search

In statistics, the absolute deviation of an element of a data set is the absolute difference between
that element and a given point. Typically the point from which the deviation is measured is a
measure of central tendency, most often the median or sometimes the mean of the data set.
Di = | xi − m(X) |
where
Di is the absolute deviation,
xi is the data element, and
m(X) is the chosen measure of central tendency of the data set: sometimes the mean (x̄), but most
often the median.

[edit]Measures of dispersion
Several measures of statistical dispersion are defined in terms of the absolute deviation.
[edit]Average absolute deviation
The average absolute deviation, or simply average deviation of a data set is the average of the
absolute deviations and is a summary statistic of statistical dispersion or variability. It is also
called the mean absolute deviation, but this is easily confused with the median absolute
deviation.
The average absolute deviation of a set {x1, x2, ..., xn} is
D = (1/n) Σi |xi − m(X)|
The choice of measure of central tendency, m(X), has a marked effect on the value of the
average deviation. For example, for the data set {2, 2, 3, 4, 14}:

Measure of central tendency m(X)    Average absolute deviation
Mean = 5                            (3 + 3 + 2 + 1 + 9)/5 = 3.6
Median = 3                          (1 + 1 + 0 + 1 + 11)/5 = 2.8
Mode = 2                            (0 + 0 + 1 + 2 + 12)/5 = 3.0
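The table entries can be verified with a throwaway Python sketch for this particular data set:

# Average absolute deviation of {2, 2, 3, 4, 14} about the mean, median and mode.
data = [2, 2, 3, 4, 14]

def average_absolute_deviation(xs, center):
    return sum(abs(x - center) for x in xs) / len(xs)

mean = sum(data) / len(data)              # 5
median = sorted(data)[len(data) // 2]     # 3 (odd-length sample)
mode = max(set(data), key=data.count)     # 2

for name, m in [("mean", mean), ("median", median), ("mode", mode)]:
    print(name, m, average_absolute_deviation(data, m))   # 3.6, 2.8, 3.0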

The average absolute deviation from the median is less than or equal to the average absolute
deviation from the mean. In fact, the average absolute deviation from the median is always less
than or equal to the average absolute deviation from any other fixed number.
The average absolute deviation from the mean is less than or equal to the standard deviation; one
way of proving this relies on Jensen's inequality.
If x is a Gaussian random variable with a mean of 0, then, in expectation for large n, the ratio of
standard deviation to mean absolute deviation should satisfy the following equality:[1]
σ / E|x| = √(π/2) ≈ 1.2533
In other words, for a Gaussian, the mean absolute deviation is about 0.8 times the standard deviation.
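A quick simulation confirms this ratio (only approximately, since it uses a finite sample):

# For Gaussian data, (standard deviation) / (mean absolute deviation) ~ sqrt(pi/2) ~ 1.2533.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

ratio = np.std(x) / np.mean(np.abs(x - x.mean()))
print(ratio, np.sqrt(np.pi / 2))     # both close to 1.2533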
[edit]Mean absolute deviation
The mean absolute deviation (MAD) is the mean absolute deviation from the mean. A related
quantity, the mean absolute error (MAE), is a common measure of forecast error in time series
analysis, where this measures the average absolute deviation of observations from their forecasts.
Although the term mean deviation is used as a synonym for mean absolute deviation, to be precise it
is not the same: in its strict interpretation (namely, omitting the absolute value operation), the
mean deviation of any data set from its mean is always zero.
[edit]Median absolute deviation
Main article: Median absolute deviation

The median absolute deviation (also MAD) is the median absolute deviation from the median. It
is a robust estimator of dispersion.
For the example {2, 2, 3, 4, 14}: 3 is the median, so the absolute deviations from the median are
{1, 1, 0, 1, 11} (or reordered as {0, 1, 1, 1, 11}) with a median absolute deviation of 1, in this
case unaffected by the value of the outlier 14.
[edit]Maximum absolute deviation
The maximum absolute deviation about a point is the maximum of the absolute deviations of a
sample from that point. It is realized by the sample maximum or sample minimum and cannot be
less than half the range.
[edit]Minimization
The measures of statistical dispersion derived from absolute deviation characterize various
measures of central tendency as minimizing dispersion: the median is the measure of central
tendency most associated with the absolute deviation, in that
L2 norm statistics: just as the mean minimizes the standard deviation,
L1 norm statistics: the median minimizes the average absolute deviation,
L∞ norm statistics: the mid-range minimizes the maximum absolute deviation, and
trimmed L∞ norm statistics: for example, the midhinge (the average of the first and third
quartiles), which minimizes the median absolute deviation of the whole distribution, also
minimizes the maximum absolute deviation of the distribution after the top and bottom 25% have
been trimmed off.
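These minimization facts are easy to see numerically; the sketch below scans candidate centers for the small data set used earlier (purely illustrative):

# The mean minimizes mean squared deviation, the median minimizes mean absolute
# deviation, and the mid-range minimizes the maximum absolute deviation.
import numpy as np

data = np.array([2, 2, 3, 4, 14])
candidates = np.linspace(data.min(), data.max(), 10_001)

best_l2 = min((np.mean((data - c) ** 2), c) for c in candidates)[1]
best_l1 = min((np.mean(np.abs(data - c)), c) for c in candidates)[1]
best_linf = min((np.max(np.abs(data - c)), c) for c in candidates)[1]

print(best_l2, data.mean())                       # ~5.0
print(best_l1, np.median(data))                   # ~3.0
print(best_linf, (data.min() + data.max()) / 2)   # ~8.0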

[edit]Estimation

The mean absolute deviation of a sample is a biased estimator of the mean absolute deviation of
the population.

Normal distribution
From Wikipedia, the free encyclopedia
Jump to: navigation, search
This article is about the univariate normal distribution. For normally distributed vectors, see
Multivariate normal distribution. For matrices, see Matrix normal distribution. For stochastic
processes, see Gaussian process.

Probability density function (figure: the red line is the standard normal distribution)
Cumulative distribution function (figure: colors match the image above)

notation: N(μ, σ²)
parameters: μ ∈ R — mean (location); σ² ≥ 0 — variance (squared scale)
support: x ∈ R if σ² > 0; x = μ if σ² = 0
pdf: (1/(σ√(2π))) e^(−(x−μ)²/(2σ²))
cdf: ½ [1 + erf((x−μ)/(σ√2))]
mean: μ
median: μ
mode: μ
variance: σ²
skewness: 0
kurtosis: 0
entropy: ½ ln(2πeσ²)
mgf: exp(μt + σ²t²/2)
cf: exp(iμt − σ²t²/2)
Fisher information: diag(1/σ², 1/(2σ⁴))

In probability theory and statistics, the normal distribution or Gaussian distribution is a
continuous probability distribution that often gives a good description of data that cluster around
the mean. The graph of the associated probability density function is bell-shaped, with a peak at
the mean, and is known as the Gaussian function or bell curve.
The Gaussian distribution is one of many things named after Carl Friedrich Gauss, who used it to
analyze astronomical data,[1] and determined the formula for its probability density function.
However, Gauss was not the first to study this distribution or the formula for its density function
—that had been done earlier by Abraham de Moivre.
The normal distribution is often used to describe, at least approximately, any variable that tends
to cluster around the mean. For example, the heights of adult males in the United States are
roughly normally distributed, with a mean of about 70 in (1.8 m). Most men have a height close
to the mean, though a small number of outliers have a height significantly above or below the
mean. A histogram of male heights will appear similar to a bell curve, with the correspondence
becoming closer if more data are used.
By the central limit theorem, under certain conditions the sum of a number of random variables
with finite means and variances approaches a normal distribution as the number of variables
increases. For this reason, the normal distribution is commonly encountered in practice, and is
used throughout statistics, natural science, and social science[2] as a simple model for complex
phenomena. For example, the observational error in an experiment is usually assumed to follow a
normal distribution, and the propagation of uncertainty is computed using this assumption.
[edit] History
The bean machine is a device invented by Sir Francis Galton to demonstrate how the normal
distribution appears in nature. This machine consists of a vertical board with interleaved rows of
pins. Small balls are dropped from the top and then bounce randomly left or right as they hit the
pins. The balls are collected into bins at the bottom and settle down into a pattern resembling the
Gaussian curve.
The normal distribution was first introduced by de Moivre in an article in 1733,[3] which was
reprinted in the second edition of his “The Doctrine of Chances” (1738) in the context of
approximating certain binomial distributions for large n. His result was extended by Laplace in
his book “Analytical theory of probabilities” (1812), and is now called the theorem of de
Moivre–Laplace.
Laplace used the normal distribution in the analysis of errors of experiments. The important
method of least squares was introduced by Legendre in 1805. Gauss, who claimed to have used
the method since 1794, justified it rigorously in “Theoria Motus Corporum Coelestium in
Sectionibus Conicis Solem Ambientum” (1809) by assuming the normal distribution of the errors.
Gauss's notation was quite different from the modern one; for the error Δ he writes
φΔ = (h/√π) e^(−h²Δ²)
In the middle of the 19th century Maxwell demonstrated that the normal distribution is not only a
convenient mathematical tool, but that it also appears in nature. He writes[4]: “The number of
particles whose velocity, resolved in a certain direction, lies between x and x + dx is
N (1/(α√π)) e^(−x²/α²) dx”
It was Pearson who first wrote the distribution in terms of the standard deviation σ as in modern
notation. Soon after this, in 1915, Fisher added the location parameter to the formula for the
normal distribution, expressing it in the way it is written nowadays:
df = (1/√(2σ²π)) e^(−(x−m)²/(2σ²)) dx
Since its introduction, the normal distribution has been known by many different names: the law
of error, the law of facility of errors, Laplace’s second law, Gaussian law, etc. Curiously, it has
never been known under the name of its inventor, de Moivre. The name “normal distribution”
was coined independently by Peirce, Galton and Lexis around 1875; the term was derived from
the fact that this distribution was seen as typical, common, normal. This name was popularized
in the statistical community by Pearson around the turn of the 20th century.[5]
The term “standard normal”, which denotes the normal distribution with zero mean and unit
variance, came into general use around the 1950s, appearing in the popular textbooks by P.G. Hoel
(1947) “Introduction to Mathematical Statistics” and A.M. Mood (1950) “Introduction to the
Theory of Statistics”.[6]
[edit] Definition
The simplest case of a normal distribution is known as the standard normal distribution,
described by the probability density function
ϕ(x) = (1/√(2π)) e^(−x²/2)
The constant 1/√(2π) in this expression ensures that the total area under the curve ϕ(x) is equal
to one, and the 1⁄2 in the exponent makes the “width” of the curve (measured as half of the
distance between the inflection points of the curve) also equal to one. It is traditional[7] in
statistics to denote this function with the Greek letter ϕ (phi), whereas density functions for all
other distributions are usually denoted with letters ƒ or p. The alternative glyph φ is also used
quite often; however, within this article we reserve “φ” to denote characteristic functions.
More generally, a normal distribution results from exponentiating a quadratic function (just as an
exponential distribution results from exponentiating a linear function):
f(x) = e^(ax² + bx + c)
This yields the classic “bell curve” shape (provided that a < 0 so that the quadratic function is
concave). Notice that f(x) > 0 everywhere. One can adjust a to control the “width” of the bell,
then adjust b to move the central peak of the bell along the x-axis, and finally adjust c to control
the “height” of the bell. For f(x) to be a true probability density function over R, one must choose
c such that ∫ f(x) dx = 1 over the whole real line (which is only possible when a < 0).

Rather than using a, b, and c, it is far more common to describe a normal distribution by its
mean μ = −b/(2a) and variance σ² = −1/(2a). Changing to these new parameters allows us to
rewrite the probability density function in a convenient standard form,
f(x) = (1/√(2πσ²)) e^(−(x−μ)²/(2σ²)) = (1/σ) ϕ((x − μ)/σ)
Notice that for a standard normal distribution, μ = 0 and σ2 = 1. The last part of the equation
above shows that any other normal distribution can be regarded as a version of the standard
normal distribution that has been stretched horizontally by a factor σ and then translated
rightward by a distance μ. Thus, μ specifies the position of the bell curve’s central peak, and σ
specifies the “width” of the bell curve.
The parameter μ is at the same time the mean, the median and the mode of the normal
distribution. The parameter σ² is called the variance; as for any real-valued random variable, it
describes how concentrated the distribution is around its mean. The square root of σ² is called the
standard deviation and is the width of the density function.
Some authors[8] use the reciprocal τ = σ⁻² instead of σ²; this is called the precision. This
parameterization has an advantage in numerical applications where σ² is very close to zero, and is
more convenient to work with in analysis, as τ is a natural parameter of the normal distribution.
Another advantage of using this parameterization is in the study of conditional distributions in
the multivariate normal case.
The normal distribution is denoted N(μ, σ²). Commonly the letter N is written in calligraphic font
(typed as \mathcal{N} in LaTeX). Thus when a random variable X is distributed normally with
mean μ and variance σ², we write
X ∼ N(μ, σ²)
[edit] Characterization
In the previous section the normal distribution was defined by specifying its probability density
function. However there are other ways to characterize a probability distribution. They include:
the cumulative distribution function, the moments, the cumulants, the characteristic function, the
moment-generating function, etc.
[edit] Probability density function
The continuous probability density function of the normal distribution exists only when the
variance parameter σ² is not equal to zero. Then it is given by the Gaussian function
f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²))
When σ² = 0, the density can be represented as a Dirac delta function:
f(x) = δ(x − μ)
This is not a function in the usual sense, but rather a generalized function: it is equal to infinity
at x = μ and zero elsewhere.
Properties:
• Function ƒ(x) is symmetric around x = μ, which is at the same time the mode, the median
and the mean of the distribution.
• The inflection points of the curve occur one standard deviation away from the mean (i.e.,
at x = μ − σ and x = μ + σ).
• The standard normal density ϕ(x) is an eigenfunction of the Fourier transform.
• The function is supersmooth of order 2, implying that it is infinitely differentiable.
• The derivative of ϕ(x) is ϕ′(x) = −x·ϕ(x); the second derivative is ϕ′′(x) = (x² − 1)ϕ(x).
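The last identity can be checked numerically with a few lines of Python (a finite-difference sanity check, nothing more):

# Standard normal density and a check that phi'(x) = -x * phi(x).
import numpy as np

def phi(x):
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

x = np.linspace(-3, 3, 13)
h = 1e-6
numeric = (phi(x + h) - phi(x - h)) / (2 * h)      # central finite difference
print(np.max(np.abs(numeric + x * phi(x))))        # tiny (~1e-10), as expected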
[edit] Cumulative distribution function
See also: Error function, Q-function, and Standard normal table
The cumulative distribution function (cdf) of a random variable X evaluated at a number x, is the
probability of the event that X is less than or equal to x. The cdf of the standard normal
distribution is denoted with the capital Greek letter Φ (phi), and can be computed as an integral of
the probability density function:
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^(−t²/2) dt
This integral cannot be expressed in terms of elementary functions; however, with the use of the
special function erf, called the error function, the standard normal cdf Φ(x) can be written as
Φ(x) = ½ [1 + erf(x/√2)]
The complement of the standard normal cdf, 1 − Φ(x), is often denoted Q(x), and is referred to as
the Q-function, especially in engineering texts.[9][10] This represents the tail probability of the
Gaussian distribution, that is, the probability that a standard normal random variable X is greater
than the number x:
Q(x) = Pr[X > x] = 1 − Φ(x)
Other definitions of the Q-function, all of which are simple transformations of Φ, are also used
occasionally.[11]
The inverse of the standard normal cdf, called the quantile function or probit function, can be
expressed in terms of the inverse error function:
Φ⁻¹(p) = √2 erf⁻¹(2p − 1)
It is recommended to use letter z to denote the quantiles of the standard normal cdf, unless that
letter is already used for some other purpose.
The values Φ(x) may be approximated very accurately by a variety of methods, such as
numerical integration, Taylor series, asymptotic series and continued fractions. For large values
of x it is usually easier to work with the Q-function.
For a generic normal random variable with mean μ and variance σ² > 0 the cdf will be equal to
F(x) = Φ((x − μ)/σ)
and the corresponding quantile function is
F⁻¹(p) = μ + σ Φ⁻¹(p)
For a normal distribution with zero variance, the cdf is the Heaviside step function:
F(x) = 1{x ≥ μ}
Properties:
• The standard normal cdf is symmetric around point (0, ½): Φ(−x) = 1 − Φ(x).
• The derivative of Φ(x) is equal to the standard normal pdf ϕ(x): Φ’(x) = ϕ(x).
• The antiderivative of Φ(x) is: ∫ Φ(x) dx = xΦ(x) + ϕ(x).
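The cdf, Q-function and quantile function above are straightforward to evaluate with SciPy's error-function routines; the short sketch below is one possible implementation, shown only as an illustration.

# Standard normal cdf, tail probability and quantile function via erf / erfinv.
import numpy as np
from scipy.special import erf, erfinv

def Phi(x):
    return 0.5 * (1 + erf(x / np.sqrt(2)))

def Q(x):
    return 1 - Phi(x)

def probit(p):
    return np.sqrt(2) * erfinv(2 * p - 1)

print(Phi(1.96), Q(1.96), probit(0.975))   # ~0.975, ~0.025, ~1.96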
[edit] Characteristic function
The characteristic function φ_X(t) of a random variable X is defined as the expected value of
e^(itX), where i is the imaginary unit, and t ∈ R is the argument of the characteristic function.
Thus the characteristic function is the Fourier transform of the density ϕ(x).
For the standard normal random variable, the characteristic function is
φ(t) = e^(−t²/2)
For a generic normal distribution with mean μ and variance σ², the characteristic function is [12]
φ(t) = e^(iμt − σ²t²/2)
[edit] Moment generating function


The moment generating function is defined as the expected value of e^(tX). For a normal
distribution, the moment generating function exists and is equal to
M(t) = E[e^(tX)] = e^(μt + σ²t²/2)
The cumulant generating function is the logarithm of the moment generating function:
g(t) = ln M(t) = μt + ½ σ²t²
Since this is a quadratic polynomial in t, only the first two cumulants are nonzero.
[edit] Properties
1. The family of normal distributions is closed under linear transformations. That is, if X is
normally distributed with mean μ and variance σ2, then a linear transform aX + b (for
some real numbers a ≠ 0 and b) is also normally distributed:

aX + b ∼ N(aμ + b, a²σ²)
Also if X1, X2 are two independent normal random variables, with means μ1, μ2 and
standard deviations σ1, σ2, then their linear combination will also be normally distributed:
aX1 + bX2 ∼ N(aμ1 + bμ2, a²σ1² + b²σ2²)
2. The converse of (1) is also true: if X1 and X2 are independent and their sum X1 + X2 is
distributed normally, then both X1 and X2 must also be normal. This is known as
Cramér's theorem.
3. Normal distribution is infinitely divisible: for a normally distributed X with mean μ and
variance σ2 we can find n independent random variables {X1, …, Xn} each distributed
normally with means μ/n and variances σ²/n such that X1 + X2 + ⋯ + Xn ∼ N(μ, σ²).
4. The normal distribution is stable (with exponent α = 2): if X1, X2 are two independent N(μ, σ²)
random variables and a, b are arbitrary real numbers, then
aX1 + bX2 ∼ cX3 + d for suitable constants c and d,
where X3 is also N(μ, σ²). This relationship directly follows from property (1).
5. The Kullback–Leibler divergence between two normal distributions X1 ∼ N(μ1, σ1²) and
X2 ∼ N(μ2, σ2²) is given by:[13]
D_KL(X1 ∥ X2) = ln(σ2/σ1) + (σ1² + (μ1 − μ2)²)/(2σ2²) − 1/2
The squared Hellinger distance between the same distributions is equal to
H²(X1, X2) = 1 − √(2σ1σ2/(σ1² + σ2²)) · e^(−(μ1 − μ2)²/(4(σ1² + σ2²)))
6. The Fisher information matrix for the normal distribution is diagonal and takes the form
I(μ, σ²) = diag( 1/σ², 1/(2σ⁴) )
7. The normal distribution belongs to an exponential family with natural parameters θ1 = μ/σ² and
θ2 = −1/(2σ²), and natural statistics x and x². The dual, expectation parameters for the normal
distribution are η1 = μ and η2 = μ² + σ².
8. Of all probability distributions over the reals with mean μ and variance σ2, the normal
distribution N(μ, σ2) is the one with the maximum entropy.
9. The family of normal distributions forms a manifold with constant curvature −1. The
same family is flat with respect to the (±1)-connections ∇(e) and ∇(m).[14]
[edit] Standardizing normal random variables
As a consequence of property 1, it is possible to relate all normal random variables to the
standard normal. For example if X is normal with mean μ and variance σ², then
Z = (X − μ)/σ
has mean zero and unit variance, that is, Z has the standard normal distribution. Conversely,
having a standard normal random variable Z we can always construct another normal random
variable with specific mean μ and variance σ²:
X = μ + σZ
This “standardizing” transformation is convenient, as it allows one to compute the pdf and
especially the cdf of a normal distribution from a table of pdf and cdf values for the standard
normal. They will be related via
F_X(x) = Φ((x − μ)/σ),   f_X(x) = (1/σ) ϕ((x − μ)/σ)
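For example (arbitrary numbers, shown only to illustrate the standardization), a probability for X ∼ N(μ, σ²) reduces to two evaluations of Φ:

# P(a < X < b) for X ~ N(mu, sigma^2), via the standard normal cdf.
from scipy.stats import norm

mu, sigma = 10.0, 2.0
a, b = 9.0, 13.0

p = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
print(p)                                                                    # ~0.6247
print(norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma))  # same value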

[edit] Moments
The normal distribution has moments of all orders. That is, for a normally distributed X with
mean μ and variance σ², the expectation E[|X|^p] exists and is finite for all p such that Re[p] > −1.
Usually we are interested only in moments of integer order: p = 1, 2, 3, ….
• Central moments are the moments of X around its mean μ. Thus, the central moment of
order p is the expected value of (X − μ)^p. Using standardization of the normal distribution,
this expectation will be equal to σ^p·E[Z^p], where Z is standard normal:
E[(X − μ)^p] = σ^p (p − 1)!! · 1{p is even}
Here n!! denotes the double factorial, that is, the product of every other number from n down to
1, and 1{…} is the indicator function.
• Central absolute moments are the moments of |X − μ|. They coincide with regular
moments for all even orders, but are nonzero for all odd p's:
E|X − μ|^p = σ^p (p − 1)!! · √(2/π) for odd p, and σ^p (p − 1)!! for even p.
• Raw moments and raw absolute moments are the moments of X and |X| respectively.
The formulas for these moments are much more complicated, and are given in terms of the
confluent hypergeometric functions 1F1 and U. These expressions remain valid even if p is not
an integer.


• The first two cumulants are equal to μ and σ² respectively, whereas all higher-order
cumulants are equal to zero.

Order   Raw moment                                    Central moment   Cumulant
1       μ                                             0                μ
2       μ² + σ²                                       σ²               σ²
3       μ³ + 3μσ²                                     0                0
4       μ⁴ + 6μ²σ² + 3σ⁴                              3σ⁴              0
5       μ⁵ + 10μ³σ² + 15μσ⁴                           0                0
6       μ⁶ + 15μ⁴σ² + 45μ²σ⁴ + 15σ⁶                   15σ⁶             0
7       μ⁷ + 21μ⁵σ² + 105μ³σ⁴ + 105μσ⁶                0                0
8       μ⁸ + 28μ⁶σ² + 210μ⁴σ⁴ + 420μ²σ⁶ + 105σ⁸       105σ⁸            0
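The central-moment column can be cross-checked against numerical integration (a small sketch; the choice μ = 1, σ = 2 is arbitrary):

# Central moments of N(mu, sigma^2): 0 for odd p, sigma**p * (p - 1)!! for even p.
import numpy as np
from scipy.integrate import quad

mu, sigma = 1.0, 2.0

def double_factorial(n):
    return 1 if n <= 0 else n * double_factorial(n - 2)

def pdf(x):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

for p in range(1, 9):
    numeric, _ = quad(lambda x: (x - mu) ** p * pdf(x), -np.inf, np.inf)
    closed_form = 0.0 if p % 2 else sigma ** p * double_factorial(p - 1)
    print(p, round(numeric, 4), closed_form)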

[edit] Central limit theorem


Main article: Central limit theorem
The theorem states that under certain, fairly common conditions, the sum of a large number of
random variables will have an approximately normal distribution. For example if X1, …, Xn is a
sequence of iid random variables, each having mean μ and variance σ², but otherwise the
distributions of the Xi's can be arbitrary, then the central limit theorem states that
√n ( (X1 + ⋯ + Xn)/n − μ ) →d N(0, σ²)   as n → ∞.
The theorem will hold even if the summands Xi are not iid, although some constraints on the
degree of dependence and the growth rate of moments still have to be imposed.
The importance of the central limit theorem cannot be overemphasized. A great number of test
statistics, scores, and estimators encountered in practice contain sums of certain random
variables, and even more estimators can be represented as sums of random variables through
the use of influence functions; all of these quantities are governed by the central limit theorem
and will have an asymptotically normal distribution as a result.
(Figure: plot of the pdf of a normal distribution with μ = 12 and σ = 3, approximating the pdf of a
binomial distribution with n = 48 and p = 1/4.)
Another practical consequence of the central limit theorem is that certain other distributions can
be approximated by the normal distribution, for example:
• The binomial distribution B(n, p) is approximately normal N(np, np(1 − p)) for large n and
for p not too close to zero or one.
• The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
• The chi-squared distribution χ²(k) is approximately normal N(k, 2k) for large k.
• The Student's t-distribution t(ν) is approximately normal N(0, 1) when ν is large.
Whether these approximations are sufficiently accurate depends on the purpose for which they
are needed, and the rate of convergence to the normal distribution. It is typically the case that
such approximations are less accurate in the tails of the distribution.
A general upper bound for the approximation error in the central limit theorem is given by the
Berry–Esseen theorem.
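A short sketch of the binomial example from the figure caption above (n = 48, p = 1/4); the continuity correction used here is a customary refinement of ours, not something prescribed in the text.

# Normal approximation N(np, np(1-p)) = N(12, 9) to Binomial(48, 1/4).
import numpy as np
from scipy.stats import binom, norm

n, p = 48, 0.25
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

k = 15
exact = binom.cdf(k, n, p)
approx = norm.cdf(k + 0.5, loc=mu, scale=sigma)   # +0.5 is the continuity correction
print(exact, approx)                              # the two numbers should be close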
[edit] Standard deviation and confidence intervals

Dark blue is less than one standard deviation from the mean. For the normal distribution, this
accounts for about 68% of the set (dark blue), while two standard deviations from the mean
(medium and dark blue) account for about 95%, and three standard deviations (light, medium,
and dark blue) account for about 99.7%.
About 68% of values drawn from a normal distribution are within one standard deviation σ > 0
away from the mean μ; about 95% of the values are within two standard deviations and about
99.7% lie within three standard deviations. This is known as the 68-95-99.7 rule, or the
empirical rule, or the 3-sigma rule.
To be more precise, the area under the bell curve between μ − nσ and μ + nσ in terms of the
cumulative normal distribution function is given by
F(μ + nσ) − F(μ − nσ) = Φ(n) − Φ(−n) = erf(n/√2),
where erf is the error function. To 12 decimal places, the values for the 1-, 2-, up to 6-sigma
points are:

n    area between μ − nσ and μ + nσ
1    0.682689492137
2    0.954499736104
3    0.997300203937
4    0.999936657516
5    0.999999426697
6    0.999999998027

The next table gives the reverse relation of sigma multiples corresponding to a few often used
values for the area under the bell curve. These values are useful to determine (asymptotic)
confidence intervals of the specified levels based on normally distributed (or asymptotically
normal) estimators:

area (confidence level)    n
0.80                       1.281551565545
0.90                       1.644853626951
0.95                       1.959963984540
0.98                       2.326347874041
0.99                       2.575829303549
0.995                      2.807033768344
0.998                      3.090232306168
0.999                      3.290526731492
0.9999                     3.890591886413
0.99999                    4.417173413469

where the value on the left of the table is the proportion of values that will fall within a given
interval and n is a multiple of the standard deviation that specifies the width of the interval.
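Both tables can be regenerated directly from the formulas above; the two-loop sketch below uses SciPy and reproduces the entries to the quoted precision.

# Left table: erf(n / sqrt(2)); right table: z such that the central area equals p.
import numpy as np
from scipy.special import erf
from scipy.stats import norm

for n in range(1, 7):
    print(n, f"{erf(n / np.sqrt(2)):.12f}")

for p in [0.80, 0.90, 0.95, 0.98, 0.99, 0.995, 0.998, 0.999, 0.9999, 0.99999]:
    print(p, f"{norm.ppf((1 + p) / 2):.12f}")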
[edit] Related and derived distributions
• If X is distributed normally with mean μ and variance σ², then
○ The exponential of X is distributed log-normally: e^X ∼ ln N(μ, σ²).
○ The absolute value of X has a folded normal distribution: |X| ∼ N_f(μ, σ²).
○ The square of X, scaled down by the variance σ², has the noncentral chi-square
distribution with 1 degree of freedom: X²/σ² ∼ χ²₁(μ²/σ²). If μ = 0, the distribution is
called simply chi-square.
○ Variable X restricted to an interval [a, b] is called the truncated normal
distribution.
○ (X − μ)⁻² has a Lévy distribution with location 0 and scale σ⁻².
• If X1 and X2 are two independent standard normal random variables, then
○ Their sum and difference are distributed normally with mean zero and variance
two: X1 ± X2 ∼ N(0, 2).
○ Their product Z = X1·X2 follows an (unnamed?) distribution with density function[15]
f_Z(z) = (1/π) K₀(|z|),
where K₀ is the modified Bessel function of the second kind. This distribution is
symmetric around zero, unbounded at z = 0, and has the characteristic
function φ_Z(t) = (1 + t²)^(−1/2).
○ Their ratio follows the standard Cauchy distribution: X1 ÷ X2 ∼ Cauchy(0, 1) (a small
simulation check of this fact is sketched after this list).
○ Their Euclidean norm √(X1² + X2²) has the Rayleigh distribution (also known as the chi
distribution with 2 degrees of freedom).
• If X1, X2, …, Xn are independent standard normal random variables, then the sum of their
squares has the chi-square distribution with n degrees of freedom: X1² + ⋯ + Xn² ∼ χ²(n).


• If X1, X2, …, Xn are independent normally distributed random variables with means μ and
variances σ², then their sample mean is independent of the sample standard deviation,
which can be demonstrated using Basu's theorem or Cochran's theorem. The ratio of
these two quantities will have the Student's t-distribution with n − 1 degrees of freedom:
t = (X̄ − μ) / (s/√n) ∼ t(n − 1)
• If X1, …, Xn, Y1, …, Ym are independent standard normal random variables, then the ratio
of their normalized sums of squares will have the F-distribution with (n, m) degrees of
freedom:
F = ( (X1² + ⋯ + Xn²)/n ) / ( (Y1² + ⋯ + Ym²)/m ) ∼ F(n, m)
[edit] Descriptive and inferential statistics


[edit] Scores
Many scores are derived from the normal distribution, including percentile ranks ("percentiles"
or "quantiles"), normal curve equivalents, stanines, z-scores, and T-scores. Additionally, a
number of behavioral statistical procedures are based on the assumption that scores are normally
distributed; for example, t-tests and ANOVAs (see below). Bell curve grading assigns relative
grades based on a normal distribution of scores.


[edit] Normality tests


Main article: Normality tests
Normality tests assess the likelihood that the given data set {x1, …, xn} comes from a normal
distribution. Typically the null hypothesis H0 is that the observations are distributed normally
with unspecified mean μ and variance σ², versus the alternative Ha that the distribution is
arbitrary. A great number of tests (over 40) have been devised for this problem; the more
prominent of them are outlined below:
• "Visual" tests are more intuitively appealing but subjective at the same time, as they rely
on informal human judgement to accept or reject the null hypothesis.
○ Q-Q plot: a plot of the sorted values from the data set against the expected
values of the corresponding quantiles from the standard normal distribution. That
is, it is a plot of points of the form (Φ⁻¹(pk), x(k)), where the plotting points pk are equal
to pk = (k − α)/(n + 1 − 2α) and α is an adjustment constant which can be anything
between 0 and 1. If the null hypothesis is true, the plotted points should
approximately lie on a straight line.
○ P-P plot: similar to the Q-Q plot, but used much less frequently. This method
consists of plotting the points (Φ(z(k)), pk), where z(k) = (x(k) − μ̂)/σ̂ are the
standardized order statistics. For normally distributed data this plot should lie on a
45° line between (0, 0) and (1, 1).
○ Wilk–Shapiro test employs the fact that the line in the Q-Q plot has the slope of σ.
The test compares the least squares estimate of that slope with the value of the
sample variance, and rejects the null hypothesis if these two quantities differ
significantly.
○ Normal probability plot (rankit plot)
• Moment tests:
○ D’Agostino’s K-squared test
○ Jarque–Bera test
• Empirical distribution function tests:
○ Kolmogorov–Smirnov test
○ Lilliefors test
○ Anderson–Darling test
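Several of the tests listed above are available off the shelf; the sketch below applies three of them to a simulated sample (the data and the choice of tests are ours, purely for illustration):

# Normality tests on a simulated sample, using scipy.stats.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=500)

print(stats.shapiro(sample))                  # Shapiro-Wilk
print(stats.normaltest(sample))               # D'Agostino's K-squared
print(stats.anderson(sample, dist="norm"))    # Anderson-Darling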
[edit] Estimation of parameters
It is often the case that we don’t know the parameters of the normal distribution, but instead want
to estimate them. That is, having a sample X1, …, Xn from a normal N(μ,σ2) population we would
like to learn the approximate values of parameters μ and σ2.
The standard approach to this problem is the maximum likelihood method, which gives as
estimates the values that maximize the log-likelihood function:
ln L(μ, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σi (xi − μ)²
Maximizing this function with respect to μ and σ² yields the maximum likelihood estimates
μ̂ = x̄ = (1/n) Σi xi,    σ̂² = (1/n) Σi (xi − x̄)²
The estimator μ̂ is called the sample mean, as it is the arithmetic mean of the sample observations.
The estimator σ̂² is similarly called the sample variance. Sometimes instead of σ̂² another
estimator is considered, s², which differs from the former by having (n − 1) instead of n in the
denominator (the so-called Bessel's correction):
s² = (1/(n − 1)) Σi (xi − x̄)²
This quantity s² is also called the sample variance, and its square root the sample standard
deviation. The difference between s² and σ̂² becomes negligibly small for large n.
These estimators have the following properties:

• μ̂ is the uniformly minimum variance unbiased (UMVU) estimator for μ, by the
Lehmann–Scheffé theorem.
• μ̂ is a consistent estimator of μ, that is, it converges in probability to μ as n → ∞.
• μ̂ has a normal finite-sample distribution:
μ̂ ∼ N(μ, σ²/n),
which implies that the standard error of μ̂ is equal to σ/√n; that is, if one wishes to
decrease the standard error by a factor of 10, one must increase the number of samples by
a factor of 100. This fact is widely used in determining sample sizes for opinion polls and
the number of trials in Monte Carlo simulations.

• σ̂² is a biased estimator of σ², whereas s² is unbiased. On the other hand, σ̂² is a superior
estimator in terms of the mean squared error (MSE) criterion.
• σ̂² is a consistent and asymptotically normal estimator:
√n (σ̂² − σ²) →d N(0, 2σ⁴)
• σ̂² has a distribution proportional to chi-squared in finite samples:
σ̂² ∼ (σ²/n) · χ²(n − 1)
• σ̂² is independent from μ̂, by Cochran's theorem. The normal distribution is the only
distribution whose sample mean and sample variance are independent.
• The ratio
t = (x̄ − μ) / (s/√n)
has Student's t-distribution with n − 1 degrees of freedom. This t-statistic is ancillary, and is
used for testing the hypothesis H0: μ = μ0 and in the construction of confidence intervals.
• The 1 − α confidence intervals for μ and σ² are
μ ∈ [ x̄ − q s/√n,  x̄ + q s/√n ]  with q = t_(n−1, 1−α/2),
σ² ∈ [ (n − 1)s²/χ²_(n−1, 1−α/2),  (n − 1)s²/χ²_(n−1, α/2) ],
where q… denotes the corresponding quantile function. For large n it is possible to replace the
quantiles of the t- and χ²-distributions with the normal quantiles. For example, the approximate
95% confidence intervals will be given by
μ ∈ x̄ ± 1.96 s/√n,    σ² ∈ s² ± 1.96 √2 s²/√n,
where 1.96 is the 97.5%-th quantile of the standard normal distribution.
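The estimators and the approximate interval for μ look as follows in code (simulated data; the seed and sample size are arbitrary choices of ours):

# MLE estimates for a normal sample and the large-n 95% confidence interval for mu.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.5, size=400)
n = len(x)

mu_hat = x.mean()                              # sample mean
sigma2_hat = np.mean((x - mu_hat) ** 2)        # MLE variance (divides by n)
s2 = np.sum((x - mu_hat) ** 2) / (n - 1)       # unbiased variance (Bessel's correction)

half_width = 1.96 * np.sqrt(s2 / n)
print(mu_hat, sigma2_hat, s2)
print(mu_hat - half_width, mu_hat + half_width)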


[edit] Occurrence
The occurrence of normal distribution in practical problems can be loosely classified into three
categories:
1. Exactly normal distributions;
2. Approximately normal laws, for example when such approximation is justified by the
central limit theorem; and
3. Distributions modeled as normal — the normal distribution being one of the simplest and
most convenient to use, frequently researchers are tempted to assume that certain quantity
is distributed normally, without justifying such assumption rigorously. In fact, the
maturity of a scientific field can be judged by the prevalence of the normality assumption
in its methods.[citation needed]
[edit] Exact normality
Certain quantities in physics are distributed normally, as was first demonstrated by James Clerk
Maxwell. Examples of such quantities are:
• Velocities of the molecules in the ideal gas. More generally, velocities of the particles in
any system in thermodynamic equilibrium will have normal distribution, due to the
maximum entropy principle.
• The position of a particle which experiences diffusion, starting at a point x = μ at t = 0.
• Probability density function of a ground state in a quantum harmonic oscillator.
• The density of an electron cloud in 1s state.
[edit] Approximate normality
Approximately normal distributions occur in many situations, as explained by the central limit
theorem. When the outcome is produced by a large number of small effects acting additively and
independently, its distribution will be close to normal.
The normal approximation will not be valid if the effects act multiplicatively (instead of
additively), or if there is a single external influence which has a considerably larger magnitude
than the rest of the effects.
• In counting problems, where the central limit theorem includes a discrete-to-continuum
approximation and where infinitely divisible and decomposable distributions are
involved, such as
○ Binomial random variables, associated with yes/no questions;
○ Poisson random variables, associated with rare events;
• Light intensity
○ The intensity of laser light is normally distributed;
○ Thermal light has a Bose–Einstein distribution on very short time scales, and a
normal distribution on longer timescales due to the central limit theorem.
[edit] Assumed normality
There are statistical methods to empirically test that assumption; see the Normality tests section above.
• In physiological measurements of biological specimens:
○ The logarithm of measures of size of living tissue (length, height, skin area,
weight);
○ The length of inert appendages (hair, claws, nails, teeth) of biological specimens,
in the direction of growth; presumably the thickness of tree bark also falls under
this category;
○ Other physiological measures may be normally distributed, but there is no reason
to expect that a priori. Of relevance to biology and economics is the fact that
complex systems tend to display power laws rather than normality.
• Financial variables, in the Black–Scholes model
○ Changes in the logarithm of exchange rates, price indices, and stock market
indices; these variables behave like compound interest, not like simple interest,
and so are multiplicative;
○ While the Black–Scholes model assumes normality, in reality these variables
exhibit heavy tails, as seen in stock market crashes;
○ Other financial variables may be normally distributed, but there is no reason to
expect that a priori;
• Measurement errors are often assumed to be normally distributed, and any deviation from
normality is considered something which should be explained;
[edit] Photon counting
Light intensity from a single source varies with time, as thermal fluctuations can be observed if
the light is analyzed at sufficiently high time resolution. Quantum mechanics interprets
measurements of light intensity as photon counting, where the natural assumption is to use the
Poisson distribution. When light intensity is integrated over times much longer than the
coherence time, the Poisson-to-normal approximation is appropriate.
[edit] Measurement errors
Normality is the central assumption of the mathematical theory of errors. Similarly, in statistical
model-fitting, an indicator of goodness of fit is that the residuals (as the errors are called in that
setting) be independent and normally distributed. The assumption is that any deviation from
normality needs to be explained. In that sense, both in model-fitting and in the theory of errors,
normality is the only observation that need not be explained, being expected. However, if the
original data are not normally distributed (for instance if they follow a Cauchy distribution), then
the residuals will also not be normally distributed. This fact is usually ignored in practice.
Repeated measurements of the same quantity are expected to yield results which are clustered
around a particular value. If all major sources of errors have been taken into account, it is
assumed that the remaining error must be the result of a large number of very small additive
effects, and hence normal. Deviations from normality are interpreted as indications of systematic
errors which have not been taken into account. Whether this assumption is valid is debatable.
A famous and oft-quoted remark attributed to Gabriel Lippmann says: "Everyone believes in the
[normal] law of errors: the mathematicians, because they think it is an experimental fact; and the
experimenters, because they suppose it is a theorem of mathematics." [16]
[edit] Physical characteristics of biological specimens
The sizes of full-grown animals are approximately lognormal. The evidence and an explanation
based on models of growth were first published in the 1932 book Problems of Relative Growth by
Julian Huxley.
Differences in size due to sexual dimorphism, or other polymorphisms like the
worker/soldier/queen division in social insects, further make the distribution of sizes deviate
from lognormality.
The assumption that linear size of biological specimens is normal (rather than lognormal) leads
to a non-normal distribution of weight (since weight or volume is roughly proportional to the 2nd
or 3rd power of length, and Gaussian distributions are only preserved by linear transformations),
and conversely assuming that weight is normal leads to non-normal lengths. This is a problem,
because there is no a priori reason why one of length, or body mass, and not the other, should be
normally distributed. Lognormal distributions, on the other hand, are preserved by powers so the
"problem" goes away if lognormality is assumed.
On the other hand, there are some biological measures where normality is assumed, such as
blood pressure of adult humans. This is supposed to be normally distributed, but only after
separating males and females into different populations (each of which is normally distributed).
[edit] Financial variables
The normal model of asset price movements does not capture extreme movements such as stock
market crashes.
Already in 1900 Louis Bachelier proposed representing price changes of stocks using the normal
distribution. This approach has since been modified slightly. Because of the multiplicative nature
of compounding of returns, financial indicators such as stock values and commodity prices
exhibit "multiplicative behavior". As such, their periodic changes (e.g., yearly changes) are not
normal, but rather lognormal; i.e., logarithmic returns, as opposed to values, are normally
distributed. This is still the most commonly used hypothesis in finance, in particular in option
pricing in the Black–Scholes model.
However, in reality financial variables exhibit heavy tails, and thus the assumption of normality
understates the probability of extreme events such as stock market crashes. Corrections to this
model have been suggested by mathematicians such as Benoît Mandelbrot, who observed that
the changes in logarithm over short periods (such as a day) are approximated well by
distributions that do not have a finite variance, and therefore the central limit theorem does not
apply. Rather, the sum of many such changes gives log-Levy distributions.
The two paragraphs above reflect the earlier notions about finance data. Later, by more careful
analysis based on the construction of a statistical ensemble it was discovered that financial
returns have finite variance[citation needed]. Fat tails were not observed in the analysis of FX time
series. Moreover, it has been pointed out that all existing finance data analyses save two have
assumed that time averages converge to ensemble averages. But finance data are nonstationary,
so that no such convergence exists or can exist. In addition, it has been shown how spurious
stylized facts like the appearance of fat tails are generated by using a time average on a variable
like FX returns where fat tails are not present[citation needed]. In other words, the field has evolved far
beyond the initial speculations by Bachelier, Mandelbrot and others. The class of models
describing finance data is that of martingales, which are therefore diffusive. It was Osborne who
introduced the most useful starting point in 1958: the (diffusive) lognormal model that Black and
Scholes used to price options.
No existing model of finance returns captures the notion of a market crash. Certainly, no fat
tailed model can do so. The reason for this is that both fat and non-fat tailed models predict
returns in normal liquid markets, whereas a market crash is a liquidity drought. Crashes produce
far too few statistics to permit falsifiable modelling.
[edit] Distribution in standardized testing and intelligence
In standardized testing, results can be scaled to have a normal distribution; for example, the
SAT's traditional range of 200–800 is based on a normal distribution with a mean of 500 and a
standard deviation of 100. As the entire population is known, this normalization can be done, and
allows the use of the Z test in standardized testing.
Sometimes, the difficulty and number of questions on an IQ test are selected in order to yield
normally distributed results. Or else, the raw test scores are converted to IQ values by fitting them
to the normal distribution. In either case, it is the deliberate result of test construction or score
interpretation that leads to IQ scores being normally distributed for the majority of the
population. However, the question whether intelligence itself is normally distributed is more
involved, because intelligence is a latent variable, therefore its distribution cannot be observed
directly.
[edit] Diffusion equation
The probability density function of the normal distribution is closely related to the
(homogeneous and isotropic) diffusion equation and therefore also to the heat equation. This
partial differential equation describes the time evolution of a mass-density function under
diffusion. In particular, the probability density function

f(x, t) = \frac{1}{\sqrt{2\pi t}}\, e^{-x^2/(2t)}

for the normal distribution with expected value 0 and variance t satisfies the diffusion equation:

\frac{\partial}{\partial t} f(x, t) = \frac{1}{2}\, \frac{\partial^2}{\partial x^2} f(x, t).
If the mass-density at time t = 0 is given by a Dirac delta, which essentially means that all mass
is initially concentrated in a single point, then the mass-density function at time t will have the
form of the normal probability density function with variance linearly growing with t. This
connection is no coincidence: diffusion is due to Brownian motion which is mathematically
described by a Wiener process, and such a process at time t will also result in a normal
distribution with variance linearly growing with t.
More generally, if the initial mass-density is given by a function ϕ(x), then the mass-density at
time t will be given by the convolution of ϕ and a normal probability density function.
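As a quick symbolic check, a minimal SymPy sketch (assuming the sympy package is available) that the N(0, t) density satisfies the diffusion equation above:

import sympy as sp

x, t = sp.symbols("x t", positive=True)
f = sp.exp(-x**2 / (2 * t)) / sp.sqrt(2 * sp.pi * t)      # N(0, t) density

# d f / d t minus (1/2) d^2 f / d x^2 should simplify to zero
print(sp.simplify(sp.diff(f, t) - sp.diff(f, x, 2) / 2))  # 0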
[edit] Generating values from normal distribution
For computer simulations, especially in applications of the Monte Carlo method, it is often useful to
generate values that have a normal distribution. All algorithms described here are concerned with
generating the standard normal, since a N(μ, σ²) variable can be generated as X = μ + σZ, where Z is
standard normal. The algorithms rely on the availability of a random number generator capable
of producing random values distributed uniformly.
• The most straightforward method is based on the probability integral transform property:
if U is distributed uniformly on (0,1), then Φ−1(U) will have the standard normal
distribution. The drawback of this method is that it relies on calculation of the probit
function Φ−1, which cannot be done analytically. Some approximate methods are
described in Hart (1968) and in the erf article.
• A simple approximate approach that is easy to program is as follows: simply sum 12
uniform (0,1) deviates and subtract 6; the resulting random variable will have an
approximately standard normal distribution. In truth, the distribution will be Irwin–Hall,
which is a 12-section eleventh-order polynomial approximation to the normal
distribution. This random deviate will have a limited range of (−6, 6).[17]
• The Box–Muller method uses two independent random numbers U and V distributed
uniformly on (0,1]. Then two random variables X and Y

X = \sqrt{-2\ln U}\,\cos(2\pi V), \qquad Y = \sqrt{-2\ln U}\,\sin(2\pi V)

will both have the standard normal distribution, and be independent. This formulation
arises because for a bivariate normal random vector (X Y) the squared norm X2 + Y2 will
have the chi-square distribution with two degrees of freedom, which is an easily-
generated exponential random variable corresponding to the quantity −2ln(U) in these
equations; and the angle is distributed uniformly around the circle, chosen by the random
variable V.
• Marsaglia polar method is a modification of the Box–Muller method algorithm, which
does not require computation of functions sin() and cos(). In this method U and V are
drawn from the uniform (−1,1) distribution, and then S = U2 + V2 is computed. If S is
greater than or equal to one then the method starts over, otherwise the two quantities

X = U\sqrt{\frac{-2\ln S}{S}}, \qquad Y = V\sqrt{\frac{-2\ln S}{S}}

are returned. Again, X and Y here will be independent and standard normally distributed (a code sketch of this and the Box–Muller method follows this list).
• The ziggurat algorithm (Marsaglia & Tsang 2000) is faster than the Box–Muller
transform and still exact. In about 97% of all cases it uses only two random numbers, one
random integer and one random uniform, one multiplication and an if-test. Only in the 3% of
cases where the combination of those two falls outside the “core of the ziggurat” must a
kind of rejection sampling using logarithms, exponentials and more uniform random
numbers be employed.
• There is also some investigation into the connection between the fast Hadamard
transform and the normal distribution, since the transform employs just addition and
subtraction and by the central limit theorem random numbers from almost any
distribution will be transformed into the normal distribution. In this regard a series of
Hadamard transforms can be combined with random permutations to turn arbitrary data
sets into normally distributed data.
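A minimal Python sketch of the Box–Muller and Marsaglia polar methods described above, built only on a uniform random number generator; the helper names are illustrative:

import math
import random

def box_muller():
    # two independent standard normal deviates from two uniforms (Box–Muller)
    u = random.random() or 1e-12          # keep U away from 0 so log(U) is defined
    v = random.random()
    r = math.sqrt(-2.0 * math.log(u))
    return r * math.cos(2 * math.pi * v), r * math.sin(2 * math.pi * v)

def marsaglia_polar():
    # two independent standard normal deviates without sin()/cos() (polar method)
    while True:
        u = random.uniform(-1.0, 1.0)
        v = random.uniform(-1.0, 1.0)
        s = u * u + v * v
        if 0 < s < 1:                     # reject points outside the unit disc
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor

# A N(mu, sigma^2) deviate is then mu + sigma * z for a standard normal z.
z1, z2 = box_muller()
print(5.0 + 2.0 * z1)                     # one draw from N(5, 4)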
[edit] Numerical approximations of the normal cdf
The standard normal cdf is widely used in scientific and statistical computing. Different
approximations are used depending on the desired level of accuracy.
• Abramowitz & Stegun (1964) give the approximation for Φ(x) with absolute error
|ε(x)| < 7.5·10⁻⁸ (algorithm 26.2.17):

\Phi(x) \approx 1 - \phi(x)\left( b_1 t + b_2 t^2 + b_3 t^3 + b_4 t^4 + b_5 t^5 \right), \qquad t = \frac{1}{1 + b_0 x}, \quad x \ge 0,

where ϕ(x) is the standard normal pdf, and b0 = 0.2316419, b1 = 0.319381530, b2 =
−0.356563782, b3 = 1.781477937, b4 = −1.821255978, b5 = 1.330274429 (a code sketch appears after this list).
• Hart (1968) lists almost a hundred rational function approximations for the erfc()
function. His algorithms vary in the degree of complexity and the resulting precision,
with maximum absolute precision of 24 digits. An algorithm by West (2009) combines
Hart’s algorithm 5666 with a continued fraction approximation in the tail to provide a fast
computation algorithm with 16-digit precision.
• Marsaglia (2004) suggested a simple algorithm[18] based on the Taylor series expansion

\Phi(x) = \frac{1}{2} + \phi(x)\left( x + \frac{x^3}{3} + \frac{x^5}{3\cdot 5} + \frac{x^7}{3\cdot 5\cdot 7} + \cdots \right)

for calculating Φ(x) with arbitrary precision. The drawback of this algorithm is
comparatively slow calculation time (for example it takes over 300 iterations to calculate
the function with 16 digits of precision when x = 10).
• The GNU Scientific Library calculates values of the standard normal cdf using Hart’s
algorithms and approximations with Chebyshev polynomials.
For a more detailed discussion of how to calculate the normal distribution, see Knuth’s The
Art of Computer Programming, volume 2, section 3.4.1C.
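A minimal Python sketch of the Abramowitz & Stegun approximation quoted above; the helper names are illustrative, and the symmetry Φ(−x) = 1 − Φ(x) is used for negative arguments:

import math

B = (0.2316419, 0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    # Abramowitz & Stegun 26.2.17; absolute error below 7.5e-8
    if x < 0:
        return 1.0 - std_normal_cdf(-x)
    t = 1.0 / (1.0 + B[0] * x)
    poly = t * (B[1] + t * (B[2] + t * (B[3] + t * (B[4] + t * B[5]))))
    return 1.0 - std_normal_pdf(x) * poly

print(std_normal_cdf(1.96))    # ≈ 0.9750021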

Central limit theorem


From Wikipedia, the free encyclopedia
Jump to: navigation, search
Histogram plot of average proportion of heads in a fair coin toss, over a large number of
sequences of coin tosses
In probability theory, the central limit theorem (CLT) states conditions under which the mean
of a sufficiently large number of independent random variables, each with finite mean and
variance, will be approximately normally distributed (Rice 1995). The central limit theorem also
requires the random variables to be identically distributed, unless certain conditions are met.
Since real-world quantities are often the balanced sum of many unobserved random events, this
theorem provides a partial explanation for the prevalence of the normal probability distribution.
The CLT also justifies the approximation of large-sample statistics to the normal distribution in
controlled experiments.
For other generalizations for finite variance which do not require identical distribution, see
Lindeberg's condition, Lyapunov's condition, and the results of Gnedenko and Kolmogorov.
In more general probability theory, a central limit theorem is any of a set of weak-convergence
theorems. They all express the fact that a sum of many independent random variables will tend to
be distributed according to one of a small set of "attractor" (i.e. stable) distributions. Specifically,
the sum of a number of random variables with power-law tail distributions decreasing as 1/|x|^{α+1}
where 1 < α < 2 (and therefore having infinite variance) will tend to a stable distribution
f(x; α, 0, c, 0) as the number of variables grows.[1] This article is concerned only with the
classical (i.e. finite variance) central limit theorem.
[edit] History
Tijms (2004, p. 169) writes:

Sir Francis Galton (Natural Inheritance, 1889) described the Central Limit Theorem as:

The actual term "central limit theorem" (in German: "zentraler Grenzwertsatz") was first used by
George Pólya in 1920 in the title of a paper.[2](Le Cam 1986) Pólya referred to the theorem as
"central" due to its importance in probability theory. According to Le Cam, the French school of
probability interprets the word central in the sense that "it describes the behaviour of the centre
of the distribution as opposed to its tails" (Le Cam 1986). The abstract of the paper On the
central limit theorem of calculus of probability and the problem of moments by Pólya in 1920
translates as follows.

A thorough account of the theorem's history, detailing Laplace's foundational work, as well as
Cauchy's, Bessel's and Poisson's contributions, is provided by Hald.[3] Two historic accounts, one
covering the development from Laplace to Cauchy, the second the contributions by von Mises,
Pólya, Lindeberg, Lévy, and Cramér during the 1920s, are given by Hans Fischer.[4] A period
around 1935 is described in (Le Cam 1986). See Bernstein (1945) for a historical discussion
focusing on the work of Pafnuty Chebyshev and his students Andrey Markov and Aleksandr
Lyapunov that led to the first proofs of the CLT in a general setting.
A curious footnote to the history of the Central Limit Theorem is that a proof of a result similar
to the 1922 Lindeberg CLT was the subject of Alan Turing's 1934 Fellowship Dissertation for
King's College at the University of Cambridge. Only after submitting the work did Turing learn
it had already been proved. Consequently, Turing's dissertation was never published.[5][6]
[edit] Classical central limit theorem
A distribution being "smoothed out" by summation, showing original density of distribution and
three subsequent summations; see Illustration of the central limit theorem for further details.
The central limit theorem is also known as the second fundamental theorem of probability.[citation needed]
(The law of large numbers is the first.)
Let X1, X2, X3, …, Xn be a sequence of n independent and identically distributed (iid) random
variables each having finite values of expectation µ and variance σ² > 0. The central limit theorem
states[citation needed] that as the sample size n increases, the distribution of the sample average of these
random variables approaches the normal distribution with mean µ and variance σ²/n,
irrespective of the shape of the common distribution of the individual terms Xi.
For a more precise statement of the theorem, let Sn be the sum of the n random variables, given
by

S_n = X_1 + X_2 + \cdots + X_n.

Then, if we define new random variables

Z_n = \frac{S_n - n\mu}{\sigma \sqrt{n}},

then they will converge in distribution to the standard normal distribution N(0,1) as n approaches
infinity. N(0,1) is thus the asymptotic distribution of the Zn's. This is often written as

Z_n \;\overset{d}{\longrightarrow}\; N(0,1).

Zn can also be expressed as

Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}},

where

\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}

is the sample mean.

Convergence in distribution means that, if Φ(z) is the cumulative distribution function of N(0,1),
then for every real number z, we have

\lim_{n\to\infty} \Pr(Z_n \le z) = \Phi(z),

or

\lim_{n\to\infty} \Pr\!\left( \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le z \right) = \Phi(z).
[edit] Proof
For a theorem of such fundamental importance to statistics and applied probability, the central
limit theorem has a remarkably simple proof using characteristic functions. It is similar to the
proof of a (weak) law of large numbers. For any random variable, Y, with zero mean and unit
variance (var(Y) = 1), the characteristic function of Y is, by Taylor's theorem,

\varphi_Y(t) = 1 - \frac{t^2}{2} + o(t^2), \qquad t \to 0,

where o(t²) is "little o notation" for some function of t that goes to zero more rapidly than t².
Letting Yi be (Xi − μ)/σ, the standardized value of Xi, it is easy to see that the standardized mean
of the observations X1, X2, ..., Xn is

Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i.

By simple properties of characteristic functions, the characteristic function of Zn is

\varphi_{Z_n}(t) = \left[ \varphi_Y\!\left( \frac{t}{\sqrt{n}} \right) \right]^{n} = \left[ 1 - \frac{t^2}{2n} + o\!\left( \frac{t^2}{n} \right) \right]^{n} \;\longrightarrow\; e^{-t^2/2} \quad \text{as } n \to \infty.

But this limit is just the characteristic function of a standard normal distribution N(0, 1), and the
central limit theorem follows from the Lévy continuity theorem, which confirms that the
convergence of characteristic functions implies convergence in distribution.
[edit] Convergence to the limit
The central limit theorem gives only an asymptotic distribution. As an approximation for a finite
number of observations, it provides a reasonable approximation only when close to the peak of
the normal distribution; it requires a very large number of observations to stretch into the tails.
If the third absolute moment E(|X1 − μ|³) exists and is finite, then the above convergence is
uniform and the speed of convergence is at least on the order of 1/n^{1/2} (see the Berry–Esseen
theorem).
The convergence to the normal distribution is monotonic, in the sense that the entropy of Zn
increases monotonically to that of the normal distribution, as proven in Artstein, Ball, Barthe and
Naor (2004).
The central limit theorem applies in particular to sums of independent and identically distributed
discrete random variables. A sum of discrete random variables is still a discrete random variable,
so that we are confronted with a sequence of discrete random variables whose cumulative
probability distribution function converges towards a cumulative probability distribution
function corresponding to a continuous variable (namely that of the normal distribution). This
means that if we build a histogram of the realisations of the sum of n independent identical
discrete variables, the curve that joins the centers of the upper faces of the rectangles forming the
histogram converges toward a Gaussian curve as n approaches infinity. The binomial distribution
article details such an application of the central limit theorem in the simple case of a discrete
variable taking only two possible values.
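A small NumPy sketch illustrating this convergence for a discrete variable (fair-coin flips), standardized as in the classical statement above:

import numpy as np

rng = np.random.default_rng(0)

# Sums of fair-coin flips: a discrete variable with mu = 1/2, sigma^2 = 1/4 per flip.
n, trials = 400, 20_000
flips = rng.integers(0, 2, size=(trials, n))
s = flips.sum(axis=1)                                # discrete sums S_n
z = (s - n * 0.5) / np.sqrt(n * 0.25)                # standardized sums Z_n

# Empirical quantiles of Z_n are close to those of N(0, 1).
print(np.quantile(z, [0.025, 0.5, 0.975]))           # roughly [-1.96, 0.0, 1.96]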
[edit] Relation to the law of large numbers
The law of large numbers as well as the central limit theorem are partial solutions to a general
problem: "What is the limiting behavior of Sn as n approaches infinity?" In mathematical
analysis, asymptotic series are one of the most popular tools employed to approach such
questions.
Suppose we have an asymptotic expansion of ƒ(n):

f(n) = a_1 \varphi_1(n) + a_2 \varphi_2(n) + O(\varphi_3(n)) \qquad (n \to \infty).

Dividing both parts by φ1(n) and taking the limit will produce a1, the coefficient of the highest-
order term in the expansion, which represents the rate at which ƒ(n) changes in its leading term:

\lim_{n\to\infty} \frac{f(n)}{\varphi_1(n)} = a_1.

Informally, one can say: "ƒ(n) grows approximately as a1 φ1(n)". Taking the difference between
ƒ(n) and its approximation and then dividing by the next term in the expansion, we arrive at a
more refined statement about ƒ(n):

\lim_{n\to\infty} \frac{f(n) - a_1 \varphi_1(n)}{\varphi_2(n)} = a_2.
Here one can say that the difference between the function and its approximation grows
approximately as a2 φ2(n). The idea is that dividing the function by appropriate normalizing
functions, and looking at the limiting behavior of the result, can tell us much about the limiting
behavior of the original function itself.
Informally, something along these lines is happening when the sum, Sn, of independent
identically distributed random variables, X1, ..., Xn, is studied in classical probability theory. If
each Xi has finite mean μ, then by the Law of Large Numbers, Sn/n → μ.[7] If in addition each Xi
has finite variance σ², then by the Central Limit Theorem,

\frac{S_n - n\mu}{\sqrt{n}} \;\to\; \xi,

where ξ is distributed as N(0, σ²). This provides values of the first two constants in the informal
expansion

S_n \approx n\mu + \xi \sqrt{n}.

In the case where the Xi's do not have finite mean or variance, convergence of the shifted and
rescaled sum can also occur with different centering and scaling factors:

\frac{S_n - a_n}{b_n} \;\to\; \Xi,

or informally

S_n \approx a_n + \Xi\, b_n.
Distributions Ξ which can arise in this way are called stable.[8] Clearly, the normal distribution is
stable, but there are also other stable distributions, such as the Cauchy distribution, for which the
mean or variance are not defined. The scaling factor bn may be proportional to n^c, for any c ≥ 1/2;
it may also be multiplied by a slowly varying function of n.[9][10]
The Law of the Iterated Logarithm tells us what is happening "in between" the Law of Large
Numbers and the Central Limit Theorem. Specifically it says that the normalizing function

\sqrt{n \log\log n},

intermediate in size between n of the Law of Large Numbers and √n of the central limit theorem,
provides a non-trivial limiting behavior.
[edit] Illustration
Main article: Illustration of the central limit theorem
Given its importance to statistics, a number of papers and computer packages are available that
demonstrate the convergence involved in the central limit theorem. [11]
[edit] Alternative statements of the theorem
[edit] Density functions
The density of the sum of two or more independent variables is the convolution of their densities
(if these densities exist). Thus the central limit theorem can be interpreted as a statement about
the properties of density functions under convolution: the convolution of a number of density
functions tends to the normal density as the number of density functions increases without
bound, under the conditions stated above.
[edit] Characteristic functions
Since the characteristic function of a convolution is the product of the characteristic functions of
the densities involved, the central limit theorem has yet another restatement: the product of the
characteristic functions of a number of density functions becomes close to the characteristic
function of the normal density as the number of density functions increases without bound, under
the conditions stated above. However, to state this more precisely, an appropriate scaling factor
needs to be applied to the argument of the characteristic function.
An equivalent statement can be made about Fourier transforms, since the characteristic function
is essentially a Fourier transform.
[edit] Extensions to the theorem
[edit] Multidimensional central limit theorem
We can easily extend proofs using characteristic functions for cases where each individual Xi is
an independent and identically distributed random vector, with mean vector μ and covariance
matrix Σ (amongst the individual components of the vector). Now, if we take the summations of
these vectors as being done componentwise, then the Multidimensional central limit theorem
states that when scaled, these converge to a multivariate normal distribution.

[edit] Products of positive random variables


The logarithm of a product is simply the sum of the logarithms of the factors. Therefore when the
logarithm of a product of random variables that take only positive values approaches a normal
distribution, the product itself approaches a log-normal distribution. Many physical quantities
(especially mass or length, which are a matter of scale and cannot be negative) are the products
of different random factors, so they follow a log-normal distribution.
Whereas the central limit theorem for sums of random variables requires the condition of finite
variance, the corresponding theorem for products requires the corresponding condition that the
density function be square-integrable (see Rempala 2002).
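A brief NumPy sketch illustrating the point: the logarithm of a product of many positive factors is a sum of i.i.d. terms, so the log is approximately normal and the product approximately log-normal:

import numpy as np

rng = np.random.default_rng(1)

n, trials = 200, 20_000
factors = rng.uniform(0.5, 1.5, size=(trials, n))     # positive i.i.d. factors
log_products = np.log(factors.prod(axis=1))           # log of the product = sum of logs

m, s = log_products.mean(), log_products.std()
print(np.quantile(log_products, [0.16, 0.5, 0.84]))   # close to [m - s, m, m + s], as for a normal
print(m, s)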
[edit] Lack of identical distribution
The central limit theorem also applies in the case of sequences that are not identically distributed,
provided one of a number of conditions apply.
[edit] Lyapunov condition
Main article: Lyapunov condition
Let Xn be a sequence of independent random variables defined on the same probability space.
Assume that Xn has finite expected value μn and finite standard deviation σn. We define

s_n^2 = \sum_{i=1}^{n} \sigma_i^2.

If for some δ > 0 the expected values \mathrm{E}\left( |X_i - \mu_i|^{2+\delta} \right) are finite for every i, and
Lyapunov's condition

\lim_{n\to\infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} \mathrm{E}\left( |X_i - \mu_i|^{2+\delta} \right) = 0

is satisfied, then the distribution of the random variable

Z_n = \frac{\sum_{i=1}^{n} (X_i - \mu_i)}{s_n}

converges to the standard normal distribution N(0, 1).
[edit] Lindeberg condition
Main article: Lindeberg's condition
In the same setting and with the same notation as above, we can replace the Lyapunov condition
with the following weaker one (from Lindeberg in 1922). For every ε > 0

\lim_{n\to\infty} \frac{1}{s_n^2} \sum_{i=1}^{n} \mathrm{E}\left( (X_i - \mu_i)^2 \, \mathbf{1}\{ |X_i - \mu_i| > \varepsilon s_n \} \right) = 0,

where 1{…} is the indicator function. Then the distribution of the standardized sum Zn converges
towards the standard normal distribution N(0,1).
[edit] Beyond the classical framework
Asymptotic normality, that is, convergence to the normal distribution after appropriate shift and
rescaling, is a phenomenon much more general than the classical framework treated above,
namely, sums of independent random variables (or vectors). New frameworks are revealed from
time to time; no single unifying framework is available for now.
[edit] Under weak dependence
A useful generalization of a sequence of independent, identically distributed random variables is
a mixing random process in discrete time; "mixing" means, roughly, that random variables
temporally far apart from one another are nearly independent. Several kinds of mixing are used
in ergodic theory and probability theory. See especially strong mixing (also called α-mixing),
defined by α(n) → 0, where α(n) is the so-called strong mixing coefficient.


A simplified formulation of the central limit theorem under strong mixing is given in (Billingsley
1995, Theorem 27.4):

Theorem. Suppose that X_1, X_2, \ldots is stationary and α-mixing with α_n = O(n^{−5}), and that
E(X_n) = 0 and E(X_n^{12}) < ∞. Denote S_n = X_1 + \cdots + X_n; then the limit

\sigma^2 = \lim_{n\to\infty} \frac{\mathrm{E}(S_n^2)}{n}

exists, and if σ ≠ 0 then S_n / (\sigma\sqrt{n}) converges in distribution to

N(0,1).

In fact,

\sigma^2 = \mathrm{E}(X_1^2) + 2 \sum_{k=1}^{\infty} \mathrm{E}(X_1 X_{1+k}),

where the series converges absolutely.


The assumption σ ≠ 0 cannot be omitted, since the asymptotic normality fails for

X_n = Y_n - Y_{n-1},

where Y_n are another stationary sequence.


For the theorem in full strength see (Durrett 1996, Sect. 7.7(c), Theorem (7.8)); the assumption

E(X_n^{12}) < ∞ is replaced with E(|X_n|^{2+\delta}) < ∞, and the assumption αn = O(n− 5) is

replaced with \sum_n \alpha_n^{\delta/(2+\delta)} < \infty. Existence of such δ > 0 ensures the conclusion. For


encyclopedic treatment of limit theorems under mixing conditions see (Bradley 2005).
[edit] Martingale central limit theorem
Main article: Martingale central limit theorem
Theorem. Let a martingale Mn satisfy

• \frac{1}{n} \sum_{k=1}^{n} \mathrm{E}\left( (M_k - M_{k-1})^2 \mid M_1, \dots, M_{k-1} \right) \to 1 in probability as n tends to
infinity,

• for every ε > 0, \frac{1}{n} \sum_{k=1}^{n} \mathrm{E}\left( (M_k - M_{k-1})^2 ;\ |M_k - M_{k-1}| > \varepsilon \sqrt{n} \right) \to 0 as
n tends to infinity,

then M_n / \sqrt{n} converges in distribution to N(0,1) as n tends to infinity.


See (Durrett 1996, Sect. 7.7, Theorem (7.4)) or (Billingsley 1995, Theorem 35.12).
Caution: The restricted expectation E(X; A) should not be confused with the conditional
expectation E(X | A) = E(X; A)/P(A).
[edit] Convex bodies

Theorem (Klartag 2007, Theorem 1.2). There exists a sequence for which the following

holds. Let , and let random variables have a log-concavejoint densityf such

that for all and for all

Then the distribution of is -close to N(0,1) in the


total variation distance.
These two close distributions have densities (in fact, log-concave densities); thus, the total
variation distance between them is the integral of the absolute value of the difference between the
densities. Convergence in total variation is stronger than weak convergence.
An important example of a log-concave density is a function constant inside a given convex
body and vanishing outside; it corresponds to the uniform distribution on the convex body,
which explains the term "central limit theorem for convex bodies".

Another example: where α >


1 and αβ > 1. If β = 1 then factorizes into

which means independence of In


general, however, they are dependent.

The condition ensures that are of zero


mean and uncorrelated; still, they need not be independent, nor even pairwise independent. By
the way, pairwise independence cannot replace independence in the classical central limit
theorem (Durrett 1996, Section 2.4, Example 4.5).
Here is a Berry-Esseen type result.

Theorem (Klartag 2008, Theorem 1). Let satisfy the assumptions of the previous
theorem, then

for all here is a universal (absolute) constant. Moreover, for every

such that

A more general case is treated in (Klartag 2007, Theorem 1.1). The condition

is replaced with much weaker conditions: E(Xk) = 0,

E(XkXl) = 0 for The distribution of

need not be approximately normal (in fact, it can be uniform).

However, the distribution of is close to N(0,1) (in the total variation


distance) for most of vectors according to the uniform distribution on the sphere

[edit] Lacunary trigonometric series


Theorem (Salem - Zygmund). Let U be a random variable distributed uniformly on (0, 2π), and
Xk = rk cos(nkU + ak), where
• nk satisfy the lacunarity condition: there exists q > 1 such that nk+1 ≥ qnk for all k,
• rk are such that

• 0 ≤ ak< 2π.
Then

converges in distribution to N(0, 1/2).


See (Zygmund 1959, Sect. XVI.5, Theorem (5-5)) or (Gaposhkin 1966, Theorem 2.1.13).
[edit] Gaussian polytopes
Theorem (Barany & Vu 2007, Theorem 1.1). Let A1, ..., An be independent random points on the
plane R2 each having the two-dimensional standard normal distribution. Let Kn be the convex
hull of these points, and Xn the area of Kn. Then

\frac{X_n - \mathrm{E}(X_n)}{\sqrt{\operatorname{Var}(X_n)}}

converges in distribution to N(0,1) as n tends to infinity.


The same holds in all dimensions (2, 3, ...).
The polytope Kn is called a Gaussian random polytope.
A similar result holds for the number of vertices (of the Gaussian polytope), the number of
edges, and in fact, faces of all dimensions (Barany & Vu 2007, Theorem 1.2).
[edit] Linear functions of orthogonal matrices
A linear function of a matrix M is a linear combination of its elements (with given coefficients),

M \mapsto \operatorname{Tr}(AM) = \sum_{i,j} A_{ij} M_{ji},

where A is the matrix of the coefficients; see Trace_(linear_algebra)#Inner product.
A random orthogonal matrix is said to be distributed uniformly, if its distribution is the
normalized Haar measure on the orthogonal group O(n,R); see Rotation matrix#Uniform random
rotation matrices.
Theorem (Meckes 2008). Let M be a random orthogonal n×n matrix distributed uniformly, and

A a fixed n×n matrix such that Tr(AAᵀ) = n, and let X = Tr(AM). Then the distribution of
X is close to N(0,1) in the total variation metric up to 2√3/(n − 1).
[edit] Subsequences

Theorem (Gaposhkin 1966, Sect. 1.5). Let random variables X_1, X_2, \ldots \in L_2(\Omega) be such
that X_n \to 0 weakly in L2(Ω) and X_n^2 \to 1 weakly in L1(Ω). Then there exist integers
n_1 < n_2 < \cdots such that

\frac{X_{n_1} + X_{n_2} + \cdots + X_{n_k}}{\sqrt{k}}

converges in distribution to N(0, 1) as k
tends to infinity.
[edit] Applications and examples

A histogram plot of monthly accidental deaths in the US, between 1973 and 1978 exhibits
normality, due to the central limit theorem
There are a number of useful and interesting examples and applications arising from the central
limit theorem (Dinov, Christou & Sanchez 2008). See e.g. [1], presented as part of the SOCR
CLT Activity.
• The probability distribution for total distance covered in a random walk (biased or
unbiased) will tend toward a normal distribution.
• Flipping a large number of coins will result in a normal distribution for the total number
of heads (or equivalently total number of tails).
From another viewpoint, the central limit theorem explains the common appearance of the "Bell
Curve" in density estimates applied to real world data. In cases like electronic noise, examination
grades, and so on, we can often regard a single measured value as the weighted average of a
large number of small effects. Using generalisations of the central limit theorem, we can then see
that this would often (though not always) produce a final distribution that is approximately
normal.
In general, the more a measurement is like the sum of independent variables with equal influence
on the result, the more normality it exhibits. This justifies the common use of this distribution to
stand in for the effects of unobserved variables in models like the linear model.
[edit] Signal processing
Signals can be smoothed by applying a Gaussian filter, which is just the convolution of a signal
with an appropriately scaled Gaussian function. Due to the central limit theorem this smoothing
can be approximated by several filter steps that can be computed much faster, like the simple
moving average.
The central limit theorem implies that to achieve a Gaussian of variance σ², n filters with window
variances σ₁², σ₂², …, σₙ² satisfying σ² = σ₁² + σ₂² + ⋯ + σₙ² must be applied.
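A minimal NumPy sketch of this idea: repeated convolution with a short moving-average (box) kernel approximates a Gaussian filter; the helper name is illustrative.

import numpy as np

def repeated_moving_average(signal, window=5, passes=4):
    # each pass convolves with a box kernel of the given window length;
    # a box of length w has variance (w**2 - 1) / 12, and variances add,
    # so `passes` passes approximate a Gaussian of variance passes * (w**2 - 1) / 12
    kernel = np.ones(window) / window
    out = np.asarray(signal, dtype=float)
    for _ in range(passes):
        out = np.convolve(out, kernel, mode="same")
    return out

impulse = np.zeros(101)
impulse[50] = 1.0
response = repeated_moving_average(impulse)      # approximately Gaussian, centred at index 50
print(response.argmax(), response.max())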

Cauchy distribution
From Wikipedia, the free encyclopedia
Jump to: navigation, search
Not to be confused with the Lorenz curve.
This article does not cite any references or sources.
Please help improve this article by adding citations to reliable sources. Unsourced material may be
challenged and removed. (August 2009)

Cauchy–Lorentz

Probability density function (the purple curve is the standard Cauchy distribution)
Cumulative distribution function

parameters: x0 (location, real); γ > 0 (scale, real)
support: x ∈ (−∞, +∞)
pdf: \frac{1}{\pi\gamma\left[ 1 + \left( \frac{x - x_0}{\gamma} \right)^2 \right]}
cdf: \frac{1}{\pi} \arctan\!\left( \frac{x - x_0}{\gamma} \right) + \frac{1}{2}
mean: not defined
median: x0
mode: x0
variance: not defined
skewness: not defined
kurtosis: not defined
entropy: ln(4πγ)
mgf: not defined
cf: \exp(i x_0 t - \gamma |t|)

The Cauchy–Lorentz distribution, named after Augustin Cauchy and Hendrik Lorentz, is a
continuous probability distribution. As a probability distribution, it is known as the Cauchy
distribution, while among physicists, it is known as the Lorentz distribution, Lorentz(ian)
function, or Breit–Wigner distribution.
Its importance in physics is due to its being the solution to the differential equation describing
forced resonance.[citation needed] In mathematics, it is closely related to the Poisson kernel, which is
the fundamental solution for the Laplace equation in the upper half-plane. In spectroscopy, it is
the description of the shape of spectral lines which are subject to homogeneous broadening in
which all atoms interact in the same way with the frequency range contained in the line shape.
Many mechanisms cause homogeneous broadening, most notably collision broadening, and
Chantler–Alda radiation.[1]
[edit] Characterization
[edit] Probability density function
The Cauchy distribution has the probability density function

f(x; x_0, \gamma) = \frac{1}{\pi \gamma \left[ 1 + \left( \frac{x - x_0}{\gamma} \right)^2 \right]},
where x0 is the location parameter, specifying the location of the peak of the distribution, and γ is
the scale parameter which specifies the half-width at half-maximum (HWHM). γ is also equal to
half the interquartile range. Cauchy himself exploited such a density function in 1827, with
infinitesimal scale parameter, in defining a Dirac delta function (see there).
The amplitude of the above Lorentzian function is given by

\text{Amplitude} = \frac{1}{\pi \gamma}.
In physics, a three-parameter Lorentzian function is often used, as follows:

f(x; x_0, \gamma, I) = I \left[ \frac{\gamma^2}{(x - x_0)^2 + \gamma^2} \right],

where I is the height of the peak.
The special case when x0 = 0 and γ = 1 is called the standard Cauchy distribution with the
probability density function

f(x; 0, 1) = \frac{1}{\pi (1 + x^2)}.
[edit] Cumulative distribution function


The cumulative distribution function (cdf) is:

F(x; x_0, \gamma) = \frac{1}{\pi} \arctan\!\left( \frac{x - x_0}{\gamma} \right) + \frac{1}{2},

and the inverse cumulative distribution function of the Cauchy distribution is

F^{-1}(p; x_0, \gamma) = x_0 + \gamma \tan\!\left[ \pi \left( p - \tfrac{1}{2} \right) \right].
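A small Python sketch that draws Cauchy variates by inverse-transform sampling, using the inverse cdf above; the helper name is illustrative:

import math
import random

def sample_cauchy(x0=0.0, gamma=1.0):
    # inverse-transform sampling: plug a uniform deviate into the inverse cdf
    p = random.random()                  # uniform on [0, 1)
    return x0 + gamma * math.tan(math.pi * (p - 0.5))

draws = sorted(sample_cauchy() for _ in range(10_000))
print(draws[len(draws) // 2])            # sample median close to x0 = 0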

[edit] Properties
The Cauchy distribution is an example of a distribution which has no mean, variance or higher
moments defined. Its mode and median are well defined and are both equal to x0.
When U and V are two independent normally distributedrandom variables with expected value 0
and variance 1, then the ratio U/V has the standard Cauchy distribution.
If X1, ..., Xn are independent and identically distributed random variables, each with a standard
Cauchy distribution, then the sample mean (X1 + ... + Xn)/n has the same standard Cauchy
distribution (the sample median, which is not affected by extreme values, can be used as a
measure of central tendency). To see that this is true, compute the characteristic function of the
sample mean:

\varphi_{\bar{X}}(t) = \mathrm{E}\left( e^{i \bar{X} t} \right) = \left( e^{-|t|/n} \right)^{n} = e^{-|t|},

where X̄ is the sample mean. This example serves to show that the hypothesis of finite variance
in the central limit theorem cannot be dropped. It is also an example of a more generalized
version of the central limit theorem that is characteristic of all stable distributions, of which the
Cauchy distribution is a special case.
The Cauchy distribution is an infinitely divisible probability distribution. It is also a strictly
stable distribution.
The standard Cauchy distribution coincides with the Student's t-distribution with one degree of
freedom.
Like all stable distributions, the location-scale family to which the Cauchy distribution belongs is
closed under linear transformations with real coefficients. In addition, the Cauchy distribution is
the only univariate distribution which is closed under linear fractional transformations with real
coefficients. In this connection, see also McCullagh's parametrization of the Cauchy
distributions.
[edit] Characteristic function
Let X denote a Cauchy distributed random variable. The characteristic function of the Cauchy
distribution is given by

\varphi_X(t) = \mathrm{E}\left( e^{iXt} \right) = e^{i x_0 t - \gamma |t|},

which is just the Fourier transform of the probability density. It follows that the probability density may
be expressed in terms of the characteristic function by:

f(x; x_0, \gamma) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \varphi_X(t)\, e^{-ixt}\, dt.
[edit] Explanation of undefined moments


[edit] Mean
If a probability distribution has a density function f(x) then the mean is

\int_{-\infty}^{\infty} x f(x)\,dx. \qquad (1)

The question is now whether this is the same thing as

\int_{0}^{\infty} x f(x)\,dx - \int_{-\infty}^{0} |x| f(x)\,dx. \qquad (2)

If at most one of the two terms in (2) is infinite, then (1) is the same as (2). But in the case of the
Cauchy distribution, both the positive and negative terms of (2) are infinite. This means (2) is
undefined. Moreover, if (1) is construed as a Lebesgue integral, then (1) is also undefined, since
(1) is then defined simply as the difference (2) between positive and negative parts.
However, if (1) is construed as an improper integral rather than a Lebesgue integral, then (2) is
undefined, and (1) is not necessarily well-defined. We may take (1) to mean

\lim_{a\to\infty} \int_{-a}^{a} x f(x)\,dx,

and this is its Cauchy principal value, which is zero, but we could also take (1) to mean, for
example,

\lim_{a\to\infty} \int_{-2a}^{a} x f(x)\,dx,

which is not zero, as can be seen easily by computing the integral.


Various results in probability theory about expected values, such as the strong law of large
numbers, will not work in such cases.
[edit] Second moment
Without a defined mean, it is impossible to consider the variance or standard deviation of a
standard Cauchy distribution, as these are defined with respect to the mean. But the second
moment about zero can be considered. It turns out to be infinite:

\mathrm{E}(X^2) = \int_{-\infty}^{\infty} \frac{x^2}{\pi (1 + x^2)}\,dx = \infty.
[edit] Estimation of parameters


Since the mean and variance of the Cauchy distribution are not defined, attempts to estimate
these parameters will not be successful. For example, if N samples are taken from a Cauchy
distribution, one may calculate the sample mean as:

\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i.
Although the sample values xi will be concentrated about the central value x0, the sample mean
will become increasingly variable as more samples are taken, due to the increased likelihood of
encountering sample points with a large absolute value. In fact, the distribution of the sample
mean will be equal to the distribution of the samples themselves. Similarly, calculating the
sample variance will result in values that grow larger as more samples are taken.
A more robust means of estimating the central value x0 and the scaling parameter γ is needed.
For example, a simple method is to take the median value of the sample as an estimator of x0 and
half the sample interquartile range as an estimator of γ. Other, more precise and robust methods
have been developed.[2]
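A short NumPy sketch of this behaviour: the sample mean of standard Cauchy data does not settle down as the sample size grows, while the sample median does:

import numpy as np

rng = np.random.default_rng(2)

# Standard Cauchy samples: the sample mean stays erratic as N grows,
# but the sample median converges to the location parameter x0 = 0.
for n in (100, 10_000, 1_000_000):
    x = rng.standard_cauchy(n)
    print(n, "mean:", round(x.mean(), 3), "median:", round(np.median(x), 3))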
[edit] Related distributions
• The ratio of two independent standard normal random variables is a standard Cauchy
variable, a Cauchy(0,1). Thus the Cauchy distribution is a ratio distribution.
• The standard Cauchy(0,1) distribution arises as a special case of Student's t distribution
with one degree of freedom.
• Relation to stable distribution: if X ~ Stable(1, 0, γ, μ), then X ~ Cauchy(μ, γ).

[edit] Relativistic Breit–Wigner distribution


Main article: Relativistic Breit–Wigner distribution
In nuclear and particle physics, the energy profile of a resonance is described by the relativistic
Breit–Wigner distribution, while the Cauchy distribution is the (non-relativistic) Breit–Wigner
distribution.

Lévy distribution
From Wikipedia, the free encyclopedia
Jump to: navigation, search
For the more general family of Lévy alpha-stable distributions, of which this distribution is a
special case, see stable distribution.

Lévy (unshifted)

Probability density function
Cumulative distribution function

parameters: μ (location); c > 0 (scale)
support: x ∈ [μ, ∞)
pdf: \sqrt{\frac{c}{2\pi}}\; \frac{e^{-c/(2(x-\mu))}}{(x-\mu)^{3/2}}
cdf: \operatorname{erfc}\left( \sqrt{\frac{c}{2(x-\mu)}} \right)
mean: infinite
median: \frac{c}{2\left[ \operatorname{erfc}^{-1}(1/2) \right]^{2}}, for μ = 0
mode: c/3, for μ = 0
variance: infinite
skewness: undefined
kurtosis: undefined
entropy: \frac{1 + 3\gamma + \ln(16\pi c^2)}{2}, where
γ is Euler gamma
mgf: undefined
cf: e^{i\mu t - \sqrt{-2ict}}

In probability theory and statistics, the Lévy distribution, named after Paul Pierre Lévy, is a
continuous probability distribution for a non-negative random variable. In spectroscopy this
distribution, with frequency as the dependent variable, is known as a van der Waals profile.[note 1]

It is one of the few distributions that are stable and that have probability density functions that
are analytically expressible, the others being the normal distribution and the Cauchy distribution.
All three are special cases of the stable distributions, which do not generally have
analytically expressible probability density functions.
[edit] Definition
The probability density function of the Lévy distribution over the domain x ≥ μ is

f(x; \mu, c) = \sqrt{\frac{c}{2\pi}}\; \frac{e^{-\frac{c}{2(x-\mu)}}}{(x-\mu)^{3/2}},
where μ is the location parameter and c is the scale parameter. The cumulative distribution
function is

F(x; \mu, c) = \operatorname{erfc}\left( \sqrt{\frac{c}{2(x-\mu)}} \right),

where erfc(z) is the complementary error function. The shift parameter μ has the effect of
shifting the curve to the right by an amount μ, and changing the support to the interval [μ, ∞).
Like all stable distributions, the Lévy distribution has a standard form f(x; 0, 1) which has the
following property:

f(x; \mu, c)\,dx = f(y; 0, 1)\,dy,

where y is defined as

y = \frac{x - \mu}{c}.
The characteristic function of the Lévy distribution is given by

\varphi(t; \mu, c) = e^{i\mu t - \sqrt{-2ict}}.

Note that the characteristic function can also be written in the same form used for the stable
distribution with α = 1/2 and β = 1:

\varphi(t; \mu, c) = e^{i\mu t - |ct|^{1/2} \left( 1 - i\,\operatorname{sign}(t) \right)}.
Assuming μ = 0, the nth moment of the unshifted Lévy distribution is formally defined by:

m_n = \sqrt{\frac{c}{2\pi}} \int_{0}^{\infty} \frac{x^{n}\, e^{-c/(2x)}}{x^{3/2}}\,dx,

which diverges for all n > 0, so that the moments of the Lévy distribution do not exist. The
moment generating function is then formally defined by:

M(t; c) = \sqrt{\frac{c}{2\pi}} \int_{0}^{\infty} \frac{e^{tx - c/(2x)}}{x^{3/2}}\,dx,

which diverges for t > 0 and is therefore not defined in an interval around zero, so that the
moment generating function is not defined per se. Like all stable distributions except the normal
distribution, the wing of the probability density function exhibits heavy tail behavior falling off
according to a power law:

f(x; \mu, c) \sim \sqrt{\frac{c}{2\pi}}\; \frac{1}{x^{3/2}} \qquad \text{as } x \to \infty.
This is illustrated in the diagram below, in which the probability density functions for various
values of c and μ = 0 are plotted on a log-log scale.

Probability density function for the Lévy distribution

[edit] Related distributions


• Relation to stable distribution: If X ~ Lévy(μ, c), then X ~ Stable(1/2, 1, c, μ).

• Relation to Scale-inverse-chi-square distribution: If X ~ Lévy(0, c), then X ~ Scale-inv-χ²(1, c).

• Relation to inverse gamma distribution: If X ~ Lévy(0, c), then X ~ Inv-Gamma(1/2, c/2).

• Relation to Normal distribution: If Y ~ Normal(0, σ²), then 1/Y² ~ Lévy(0, 1/σ²) (see the sketch below).

• Relation to Folded normal distribution: If Y ~ FoldedNormal(0, σ), then 1/Y² ~ Lévy(0, 1/σ²).
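A small NumPy sketch that draws Lévy variates via the inverse-square relation to the normal distribution noted above; the helper name sample_levy is illustrative:

import numpy as np

rng = np.random.default_rng(3)

def sample_levy(mu=0.0, c=1.0, size=1):
    # if Z ~ N(0, 1), then c / Z**2 ~ Lévy(0, c); shifting by mu gives Lévy(mu, c)
    z = rng.normal(0.0, 1.0, size)
    return mu + c / z**2

x = sample_levy(c=1.0, size=100_000)
print(np.median(x))     # finite, roughly 2.2 for c = 1, even though the mean is infinite
print(x.mean())         # the sample mean is large and unstable: the true mean is infinite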

[edit] Applications
• The Lévy distribution is of interest to the financial modeling community due to its
empirical similarity to the returns of securities.
• It is claimed that fruit flies follow a form of the distribution to find food (Lévy flight).[1]
• The frequency of geomagnetic reversals appears to follow a Lévy distribution
• The time of hitting a single point (different from the starting point 0) by the Brownian
motion has the Lévy distribution.
• The length of the path followed by a photon in a turbid medium follows the Lévy
distribution. [2]
• The Lévy distribution has been used post 1987 crash by the Options Clearing Corporation
for setting margin requirements because its parameters are more robust to extreme events
than those of a normal distribution, and thus extreme events do not suddenly increase
margin requirements which may worsen a crisis.[3]
• The statistics of solar flares are described by a non-Gaussian distribution. The solar flare
statistics were shown to be describable by a Lévy distribution and it was assumed that
intermittent solar flares perturb the intrinsic fluctuations in Earth’s average temperature.
The end result of this perturbation is that the statistics of the temperature anomalies
inherit the statistical structure that was evident in the intermittency of the solar flare data.
[4]

Probability distribution
From Wikipedia, the free encyclopedia
Jump to: navigation, search

This article relies largely or entirely upon a single source. Please help
improve this article by introducing appropriate citations of additional
sources. (November 2008)
In probability theory and statistics, a probability distribution identifies either the probability of
each value of a random variable (when the variable is discrete), or the probability
of the value falling within a particular interval (when the variable is continuous).[1] The
probability distribution describes the range of possible values that a random variable can attain
and the probability that the value of the random variable is within any (measurable) subset of that
range.

The Normal distribution, often called the "bell curve".

When the random variable takes values in the set of real numbers, the probability distribution is
completely described by the cumulative distribution function, whose value at each real x is the
probability that the random variable is smaller than or equal to x.
The concept of the probability distribution and the random variables which they describe
underlies the mathematical discipline of probability theory, and the science of statistics. There is
spread or variability in almost any value that can be measured in a population (e.g. height of
people, durability of a metal, etc.); almost all measurements are made with some intrinsic error;
in physics many processes are described probabilistically, from the kinetic properties of gases to
the quantum mechanical description of fundamental particles. For these and many other reasons,
simple numbers are often inadequate for describing a quantity, while probability distributions are
often more appropriate.
There are various probability distributions that show up in various different applications. One of
the more important ones is the normal distribution, which is also known as the Gaussian
distribution or the bell curve and approximates many different naturally occurring distributions.
The toss of a fair coin yields another familiar distribution, where the possible values are heads or
tails, each with probability 1/2.
[edit]Formal definition
In the measure-theoretic formalization of probability theory, a random variable is defined as a
measurable function X from a probability space (Ω, F, P) to a measurable space (E, ℰ), its observation space. A
probability distribution is the pushforward measure X*P = PX−1 on (E, ℰ).
In other words, a probability distribution is a probability measure over the observation space
instead of the underlying probability space.
[edit]Probability distributions of real-valued random variables
Because a probability distribution Pr on the real line is determined by the probability of a real-
valued random variable X being in a half-open interval (-∞, x], the probability distribution is
completely characterized by its cumulative distribution function:

F(x) = \Pr\left[ X \le x \right] \qquad \text{for all } x \in \mathbb{R}.
[edit]Discrete probability distribution


Main article: Discrete probability distribution

A probability distribution is called discrete if its cumulative distribution function only increases
in jumps. More precisely, a probability distribution is discrete if there is a finite or countable set
whose probability is 1.
For many familiar discrete distributions, the set of possible values is topologically discrete in the
sense that all its points are isolated points. But, there are discrete distributions for which this
countable set is dense on the real line.
Discrete distributions are characterized by a probability mass function, p, such that

\Pr\left[ X = x_i \right] = p(x_i), \quad i = 1, 2, \dots, \qquad \text{with } \sum_i p(x_i) = 1.
[edit]Continuous probability distribution


Main article: Continuous probability distribution

By one convention, a probability distribution is called continuous if its cumulative distribution
function is continuous and, therefore, the probability measure of singletons is zero:
Pr[X = x] = 0 for all x.


Another convention reserves the term continuous probability distribution for absolutely
continuous distributions. These distributions can be characterized by a probability density
function: a non-negative Lebesgue integrable function f defined on the real numbers such that

F(x) = \Pr\left[ X \le x \right] = \int_{-\infty}^{x} f(t)\,dt.
Discrete distributions and some continuous distributions (like the Cantor distribution) do not
admit such a density.
[edit]Terminology
The support of a distribution is the smallest closed interval/set whose complement has
probability zero. It may be understood as the points or elements that are actual members of the
distribution.
A discrete random variable is a random variable whose probability distribution is discrete.
Similarly, a continuous random variable is a random variable whose probability distribution is
continuous.
[edit]Simulated sampling
Main article: Inverse transform sampling

If one is programming and one wishes to sample from a probability distribution (either discrete
or continuous), the following algorithm lets one do so. This algorithm assumes that one has
access to the inverse of the cumulative distribution (easy to calculate with a discrete distribution,
can be approximated for continuous distributions) and a computational primitive called
"random()" which returns an arbitrary-precision floating-point-value in the range of [0,1).
import random

def sampleFrom(cdfInverse):
    # input:
    #   cdfInverse(p) - the inverse of the CDF of the probability distribution
    #     example: if the distribution is Gaussian, one can use a numerical
    #     approximation of the inverse of erf(x)
    #     example: if the distribution is discrete, see the helper below
    # output:
    #   a real number sampled from the probability distribution represented by cdfInverse

    r = random.random()
    while r == 0:          # make sure r is not equal to 0; a discontinuity is possible there
        r = random.random()
    return cdfInverse(r)
For discrete distributions, the function cdfInverse (the inverse of the cumulative distribution function)
can be calculated from samples as follows: for each element in the sample range (discrete values
along the x-axis), calculate the total number of samples at or before it, then normalize this new discrete distribution.
This new discrete distribution is the CDF, and can be turned into an object which acts like a
function: calling cdfInverse(query) returns the smallest x-value such that the CDF is greater than
or equal to the query.
def dataToCdfInverse(discreteDistribution):
    # input:
    #   discreteDistribution - a dictionary mapping possible values to frequencies/probabilities
    #     example: {0: 1 - p, 1: p} is a Bernoulli distribution with chance = p
    #     example: setting p = 0.5 above gives a fair coin where X = 1 is "heads" and X = 0 is "tails"
    # output:
    #   a function that represents (CDF^-1)(x)

    total = float(sum(discreteDistribution.values()))    # normalize frequencies to probabilities

    def cdfInverse(x):
        integral = 0.0
        for key in sorted(discreteDistribution):         # go through the mapping in sorted order
            integral += discreteDistribution[key] / total
            if integral >= x:                            # stop when the running CDF reaches x
                return key
        return key                                       # x == 1 edge case: return the last key

    return cdfInverse
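A brief usage sketch combining the two helpers above to sample from a fair coin:

coin = dataToCdfInverse({0: 0.5, 1: 0.5})      # a fair coin: P(X=0) = P(X=1) = 0.5
flips = [sampleFrom(coin) for _ in range(10)]
print(flips)                                   # e.g. [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]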
Note that often, mathematics environments and computer algebra systems will have some way to
represent probability distributions and sample from them. This functionality might even have
been developed in third-party libraries. Such packages greatly facilitate such sampling, most
likely have optimizations for common distributions, and are likely to be more elegant than the
above bare-bones solution.
[edit]Some properties
• The probability density function of the sum of two independent random
variables is the convolution of each of their density functions.
• The probability density function of the difference of two independent random
variables is the cross-correlation of their density functions.
• Probability distributions are not a vector space – they are not closed under
linear combinations, as these do not preserve non-negativity or total integral
1 – but they are closed under convex combination, thus forming a convex
subset of the space of functions (or measures).

Self-similarity
From Wikipedia, the free encyclopedia
Jump to: navigation, search

A Koch curve has an infinitely repeating self-similarity when it is magnified.


In mathematics, a self-similar object is exactly or approximately similar to a part of itself (i.e.
the whole has the same shape as one or more of the parts). Many objects in the real world, such
as coastlines, are statistically self-similar: parts of them show the same statistical properties at
many scales.[1] Self-similarity is a typical property of fractals.
Scale invariance is an exact form of self-similarity where at any magnification there is a smaller
piece of the object that is similar to the whole. For instance, a side of the Koch snowflake is both
symmetrical and scale-invariant; it can be continually magnified 3x without changing shape.
[edit] Definition
A compact topological space X is self-similar if there exists a finite set S indexing a set of non-surjective homeomorphisms { f_s : s ∈ S } for which

X = \bigcup_{s \in S} f_s(X).

If X ⊂ Y, we call X self-similar if it is the only non-empty subset of Y such that the equation
above holds for { f_s : s ∈ S }. We call

L = (X, S, \{ f_s : s \in S \})
a self-similar structure. The homeomorphisms may be iterated, resulting in an iterated function


system. The composition of functions creates the algebraic structure of a monoid. When the set S
has only two elements, the monoid is known as the dyadic monoid. The dyadic monoid can be
visualized as an infinite binary tree; more generally, if the set S has p elements, then the monoid
may be represented as a p-adic tree.
The automorphism group of the dyadic monoid is the modular group; the automorphisms can be
pictured as hyperbolic rotations of the binary tree.
[edit] Examples

Self-similarity in the Mandelbrot set shown by zooming in on the Feigenbaum Point at (-


1.401155189...,0)
An image of a fern which exhibits affine self-similarity
The Mandelbrot set is also self-similar around Misiurewicz points.
Self-similarity has important consequences for the design of computer networks, as typical
network traffic has self-similar properties. For example, in teletraffic engineering, packet
switched data traffic patterns seem to be statistically self-similar[2]. This property means that
simple models using a Poisson distribution are inaccurate, and networks designed without taking
self-similarity into account are likely to function in unexpected ways.
Similarly, stock market movements are described as displaying self-affinity, i.e. they appear self-
similar when transformed via an appropriate affine transformation for the level of detail being
shown.[3]
Some very natural self-similar objects are plants. The fern image above is self-similar, albeit
mathematically generated; true ferns, however, come extremely close to true self-similarity.
Other plants, such as Romanesco broccoli, are extremely self-similar.

Central tendency
From Wikipedia, the free encyclopedia
Jump to: navigation, search

In statistics, the term central tendency relates to the way in which quantitative data tend to
cluster around some value.[1] A measure of central tendency is any of a number of ways of
specifying this "central value". In practical statistical analyses, the terms are often used before
one has chosen even a preliminary form of analysis: thus an initial objective might be to "choose
an appropriate measure of central tendency".
In the simplest cases, the measure of central tendency is an average of a set of measurements, the
word average being variously construed as mean, median, or other measure of location,
depending on the context. However, the term is applied to multidimensional data as well as to
univariate data and in situations where a transformation of the data values for some or all
dimensions would usually be considered necessary: in the latter cases, the notion of a "central
location" is retained in converting an "average" computed for the transformed data back to the
original units. In addition, there are several different kinds of calculations for central tendency,
where the kind of calculation depends on the type of data (level of measurement).
Both "central tendency" and "measure of central tendency" apply to either statistical populations
or to samples from a population.
[edit]Basic measures of central tendency
The following may be applied to individual dimensions of multidimensional data, after
transformation, although some of these involve their own implicit transformation of the data.
• Arithmetic mean - the sum of all measurements divided by the number of
observations in the data set
• Median - the middle value that separates the higher half from the lower half
of the data set
• Mode - the most frequent value in the data set
• Geometric mean - the nth root of the product of the data values
• Harmonic mean - the reciprocal of the arithmetic mean of the reciprocals of
the data values
• Weighted mean - an arithmetic mean that incorporates weighting to certain
data elements
• Truncated mean - the arithmetic mean of data values after a certain number
or proportion of the highest and lowest data values have been discarded.
• Midrange - the arithmetic mean of the maximum and minimum values of a
data set.

Geometric mean
From Wikipedia, the free encyclopedia
Jump to: navigation, search

The geometric mean, in mathematics, is a type of mean or average, which indicates the central
tendency or typical value of a set of numbers. It is similar to the arithmetic mean, which is what
most people think of with the word "average", except that instead of adding the set of numbers
and then dividing the sum by the count of numbers in the set, n, the numbers are multiplied and
then the nth root of the resulting product is taken.
For instance, the geometric mean of two numbers, say 2 and 8, is just the square root of their
product, which equals 4; that is, √(2 × 8) = 4. As another example, the geometric mean of three
numbers 1, ½, ¼ is the cube root of their product (1/8), which is 1/2; that is, ∛(1 × ½ × ¼) =
½.
The geometric mean can also be understood in terms of geometry. The geometric mean of two
numbers, a and b, is the length of one side of a square whose area is equal to the area of a
rectangle with sides of lengths a and b. Similarly, the geometric mean of three numbers, a, b, and
c, is the length of one side of a cube whose volume is the same as that of a cuboid with sides
whose lengths are equal to the three given numbers.
The geometric mean only applies to positive numbers.[1] It is also often used for a set of numbers
whose values are meant to be multiplied together or are exponential in nature, such as data on the
growth of the human population or interest rates of a financial investment. The geometric mean
is also one of the three classic Pythagorean means, together with the aforementioned arithmetic
mean and the harmonic mean.
[edit]Calculation
The geometric mean of a data set [a1, a2, ..., an] is given by

\left( \prod_{i=1}^{n} a_i \right)^{1/n} = \sqrt[n]{a_1 a_2 \cdots a_n}.
The geometric mean of a data set is less than or equal to the data set's arithmetic mean (the two
means are equal if and only if all members of the data set are equal). This allows the definition of
the arithmetic-geometric mean, a mixture of the two which always lies in between.
The geometric mean is also the arithmetic-harmonic mean in the sense that if two sequences
(an) and (hn) are defined:

a_{n+1} = \frac{a_n + h_n}{2}, \qquad a_1 = \frac{x + y}{2},

and

h_{n+1} = \frac{2 a_n h_n}{a_n + h_n}, \qquad h_1 = \frac{2xy}{x + y},

then an and hn will converge to the geometric mean of x and y.

This can be seen easily from the fact that the sequences do converge to a common limit (which
can be shown by the Bolzano–Weierstrass theorem) and the fact that the geometric mean is preserved:

\sqrt{a_{n+1} h_{n+1}} = \sqrt{\frac{a_n + h_n}{2} \cdot \frac{2 a_n h_n}{a_n + h_n}} = \sqrt{a_n h_n}.
Replacing arithmetic and harmonic mean by a pair of generalized means of opposite, finite
exponents yields the same result.
[edit]Relationship with arithmetic mean of logarithms
By using logarithmic identities to transform the formula, we can express the multiplications as a
sum and the power as a multiplication:

\left( \prod_{i=1}^{n} a_i \right)^{1/n} = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n} \ln a_i \right).

This is sometimes called the log-average. It is simply computing the arithmetic mean of the
logarithm transformed values of ai (i.e., the arithmetic mean on the log scale) and then using the
exponentiation to return the computation to the original scale, i.e., it is the generalised f-mean
with f(x) = log x.
In simple terms, for more than two observations:
1) for ungrouped data, GM = antilog( (Σ log aᵢ) / n );
2) for a frequency distribution with frequencies fᵢ, GM = antilog( (Σ fᵢ log aᵢ) / Σ fᵢ ).
(A short code sketch follows.)

[edit]Applications
[edit]Exponential growth
The geometric mean may be more appropriate than the arithmetic mean for describing
exponential growth.
Suppose an orange tree yields 100 oranges one year and then 180, 210 and 300 the following
years, so the growth is 80%, 16.7% and 42.9% for each year respectively.[Note: (210-
180)/180=0.167 and 0.167*100=16.7%]. Using the arithmetic mean calculates an average
growth of 46.5% (80% + 16.7% + 42.9% divided by 3). However, if we start with 100 oranges
and let it grow 46.5% each year, the result is 314 oranges, not 300.
Instead, we can use the geometric mean. Growing with 80% corresponds to multiplying with
1.80, so we take the geometric mean of 1.80, 1.167 and 1.429, i.e.

\sqrt[3]{1.80 \times 1.167 \times 1.429} \approx 1.443;

thus the "average" growth per year is 44.3%. If we start
with 100 oranges and let the number grow by 44.3% each year, the result is 300 oranges.
[edit]Spectral flatness
In signal processing, spectral flatness is described by the ratio of the geometric mean of the
power spectrum to its arithmetic mean.
1. ^ The geometric mean only applies to positive numbers in order to avoid
taking the root of a negative product, which would result in imaginary
numbers, and also to satisfy certain properties about means, which is
explained later in the article.

Harmonic mean
From Wikipedia, the free encyclopedia
Jump to: navigation, search

In mathematics, the harmonic mean (formerly sometimes called the subcontrary mean) is one
of several kinds of average. Typically, it is appropriate for situations when the average of rates is
desired.
The harmonic mean H of the positive real numbers x1, x2, ..., xn is defined to be

H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}.
Equivalently, the harmonic mean is the reciprocal of the arithmetic mean of the reciprocals.
[edit]Relationship with other means
A geometric construction of the three Pythagorean means (of two numbers only).
Harmonic mean denoted by H in purple color.

The harmonic mean is one of the three Pythagorean means. For all data sets containing at least
one pair of nonequal values, the harmonic mean is always the least of the three means, while the
arithmetic mean is always the greatest of the three and the geometric mean is always in between.
(If all values in a nonempty dataset are equal, the three means are always equal to one another;
e.g. the harmonic, geometric, and arithmetic means of {2, 2, 2} are all 2.)
It is the special case M−1 of the power mean, i.e. the power mean with exponent −1.
Since the harmonic mean of a list of numbers tends strongly toward the least elements of the list,
it tends (compared to the arithmetic mean) to mitigate the impact of large outliers and aggravate
the impact of small ones.
The arithmetic mean is often mistakenly used in places calling for the harmonic mean.[1] In the
speed example below, for instance, the arithmetic mean of 50 is incorrect, and too big.
[edit]Weighted harmonic mean
If a set of weights w1, ..., wn is associated with the dataset x1, ..., xn, the weighted harmonic mean
is defined by

  H = (w1 + w2 + ... + wn) / (w1/x1 + w2/x2 + ... + wn/xn)
The harmonic mean is the special case where all of the weights are equal to 1.
[edit]Examples
[edit]In physics
In certain situations, especially many situations involving rates and ratios, the harmonic mean
provides the truest average. For instance, if a vehicle travels a certain distance at a speed x (e.g.
60 kilometres per hour) and then the same distance again at a speed y (e.g. 40 kilometres per
hour), then its average speed is the harmonic mean of x and y (48 kilometres per hour), and its
total travel time is the same as if it had traveled the whole distance at that average speed.
However, if the vehicle travels for a certain amount of time at a speed x and then the same
amount of time at a speed y, then its average speed is the arithmetic mean of x and y, which in
the above example is 50 kilometres per hour. The same principle applies to more than two
segments: given a series of sub-trips at different speeds, if each sub-trip covers the same
distance, then the average speed is the harmonic mean of all the sub-trip speeds, and if each sub-
trip takes the same amount of time, then the average speed is the arithmetic mean of all the sub-
trip speeds. (If neither is the case, then a weighted harmonic mean or weighted arithmetic mean
is needed.)
Similarly, if one connects two electrical resistors in parallel, one having resistance x (e.g. 60Ω)
and one having resistance y (e.g. 40Ω), then the effect is the same as if one had used two
resistors with the same resistance, both equal to the harmonic mean of x and y (48Ω): the
equivalent resistance in either case is 24Ω (one-half of the harmonic mean). However, if one
connects the resistors in series, then the average resistance is the arithmetic mean of x and y (with
total resistance equal to the sum of x and y). And, as with previous example, the same principle
applies when more than two resistors are connected, provided that all are in parallel or all are in
series.
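A small Python sketch of these two cases (using the numbers from the examples above):

def harmonic_mean(values):
    # Reciprocal of the arithmetic mean of the reciprocals.
    return len(values) / sum(1.0 / v for v in values)

# Average speed over two equal-distance legs driven at 60 km/h and 40 km/h:
print(harmonic_mean([60, 40]))        # 48.0 km/h

# Two resistors of 60 ohm and 40 ohm in parallel: the equivalent resistance is
# one-half of the harmonic mean of the two resistances:
print(harmonic_mean([60, 40]) / 2)    # 24.0 ohm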
[edit]In other sciences
In Information retrieval and some other fields, the harmonic mean of the precision and the recall
is often used as an aggregated performance score: the F-score (or F-measure).
An interesting consequence arises from basic algebra in problems of working together. As an
example, if a gas-powered pump can drain a pool in 4 hours and a battery-powered pump can
drain the same pool in 6 hours, then it will take both pumps (6 · 4)/(6 + 4), which is equal to 2.4
hours, to drain the pool together. Interestingly, this is one-half of the harmonic mean of 6 and 4.
In hydrology the harmonic mean is used to average hydraulic conductivity values for flow that is
perpendicular to layers (e.g. geologic or soil). On the other hand, for flow parallel to layers the
arithmetic mean is used.
In sabermetrics, the Power-speed number of a player is the harmonic mean of his home run and
stolen base totals.
When considering fuel economy in automobiles two measures are commonly used - miles per
gallon (mpg), and litres per 100 km. As the dimensions of these quantities are the inverse of each
other (one is distance per volume, the other volume per distance) when taking the mean value of
the fuel-economy of a range of cars one measure will produce the harmonic mean of the other -
i.e. converting the mean value of fuel economy expressed in litres per 100 km to miles per gallon
will produce the harmonic mean of the fuel economy expressed in miles-per-gallon.
[edit]In finance
The harmonic mean is the preferable method for averaging multiples, such as the price/earning
ratio, in which price is in the numerator. If these ratios are averaged using an arithmetic mean (a
common error), high data points are given greater weights than low data points. The harmonic
mean, on the other hand, gives equal weight to each data point. See "Fairness Opinions:
Common Errors and Omissions" in The Handbook of Business Valuation and Intellectual
Property Analysis (McGraw Hill, 2004).
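An illustrative Python comparison of the two ways of averaging multiples (the P/E values are made up):

pe_ratios = [10.0, 20.0, 40.0]                                  # hypothetical price/earnings multiples

arithmetic = sum(pe_ratios) / len(pe_ratios)                    # 23.3..., overweights the high-P/E stock
harmonic = len(pe_ratios) / sum(1.0 / pe for pe in pe_ratios)   # 17.1..., equal weight per data point
print(arithmetic, harmonic)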
[edit]Harmonic mean of two numbers
For the special case of just two numbers x1 and x2, the harmonic mean can be written

  H = 2·x1·x2 / (x1 + x2)

In this special case, the harmonic mean is related to the arithmetic mean A = (x1 + x2) / 2 and
the geometric mean G = √(x1·x2) by

  H = G² / A

So G = √(A·H), which means the geometric mean, for two numbers, is the geometric mean of
the arithmetic mean and the harmonic mean.

Weighted mean
From Wikipedia, the free encyclopedia

The weighted mean is similar to an arithmetic mean (the most common type of average), where
instead of each of the data points contributing equally to the final average, some data points
contribute more than others. The notion of weighted mean plays a role in descriptive statistics
and also occurs in a more general form in several other areas of mathematics.
If all the weights are equal, then the weighted mean is the same as the arithmetic mean. While
weighted means generally behave in a similar fashion to arithmetic means, they do have a few
counter-intuitive properties, as captured for instance in Simpson's paradox.
The term weighted average usually refers to a weighted arithmetic mean, but weighted versions
of other means can also be calculated, such as the weighted geometric mean and the weighted
harmonic mean.
[edit]Example
Given two school classes, one with 20 students, and one with 30 students, the grades in each
class on a test were:
Morning class = 62, 67, 71, 74, 76, 77, 78, 79, 79, 80, 80, 81, 81, 82, 83, 84,
86, 89, 93, 98

Afternoon class = 81, 82, 83, 84, 85, 86, 87, 87, 88, 88, 89, 89, 89, 90, 90,
90, 90, 91, 91, 91, 92, 92, 93, 93, 94, 95, 96, 97, 98, 99

The straight average for the morning class is 80 and the straight average of the afternoon class is
90. The straight average of 80 and 90 is 85, the mean of the two class means. However, this does
not account for the difference in number of students in each class, and the value of 85 does not
reflect the average student grade (independent of class). The average student grade can be
obtained by either averaging all the numbers without regard to classes, or weighting the class
means by the number of students in each class:

  x̄ = (62 + 67 + ... + 98 + 99) / 50 = 4300 / 50 = 86

Or, using a weighted mean of the class means:

  x̄ = (20 × 80 + 30 × 90) / (20 + 30) = 4300 / 50 = 86
The weighted mean makes it possible to find the average student grade also in the case where
only the class means and the number of students in each class are available.
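A quick Python check of this example (the grade lists repeat the data given above):

morning = [62, 67, 71, 74, 76, 77, 78, 79, 79, 80, 80, 81, 81, 82, 83, 84, 86, 89, 93, 98]
afternoon = [81, 82, 83, 84, 85, 86, 87, 87, 88, 88, 89, 89, 89, 90, 90,
             90, 90, 91, 91, 91, 92, 92, 93, 93, 94, 95, 96, 97, 98, 99]

# Mean of all 50 grades, ignoring the class split:
print(sum(morning + afternoon) / 50)                      # 86.0

# Weighted mean of the two class means, weighted by class size:
class_means = [sum(morning) / 20, sum(afternoon) / 30]    # [80.0, 90.0]
weights = [20, 30]
print(sum(w * m for w, m in zip(weights, class_means)) / sum(weights))   # 86.0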
[edit]Mathematical definition
Formally, the weighted mean of a non-empty set of data

  x1, x2, ..., xn

with non-negative weights

  w1, w2, ..., wn

is the quantity

  x̄ = (Σ wi·xi) / (Σ wi)

which means:

  x̄ = (w1·x1 + w2·x2 + ... + wn·xn) / (w1 + w2 + ... + wn)

Therefore data elements with a high weight contribute more to the weighted mean than do
elements with a low weight. The weights cannot be negative. Some may be zero, but not all of
them (since division by zero is not allowed).
The formulas are simplified when the weights are normalized such that they sum up to 1, i.e.
Σ wi = 1. For such normalized weights the weighted mean is simply x̄ = Σ wi·xi.

The common mean x̄ = (1/n) Σ xi is a special case of the weighted mean where all data have equal
weights, wi = w.

[edit]Length-weighted mean
This is used for weighting a response variable y based upon its dependency on x, a distance
variable:

  ȳ = (Σ xi·yi) / (Σ xi)
[edit]Convex combination
Since only the relative weights are relevant, any weighted mean can be expressed using
coefficients that sum to one. Such a linear combination is called a convex combination.
Using the previous example, we would get the following weights:

  20 / (20 + 30) = 0.4 and 30 / (20 + 30) = 0.6

This simplifies to:

  x̄ = 0.4 × 80 + 0.6 × 90 = 86
[edit]Statistical properties
The weighted sample mean with normalized weights, x̄ = Σ wi·xi, is itself a random variable. Its expected
value and standard deviation are related to the expected values and standard deviations of the
observations as follows.
If the observations have expected values E(xi) = μi, then the weighted sample mean has
expectation E(x̄) = Σ wi·μi. Particularly, if the expectations of all observations are equal,
μi = μ, then the expectation of the weighted sample mean will be the same, E(x̄) = μ.
For uncorrelated observations with standard deviations σi, the weighted sample mean has
standard deviation

  σ(x̄) = √( Σ wi²·σi² )

Consequently, when the standard deviations of all observations are equal, σi = d, the weighted
sample mean will have standard deviation σ(x̄) = d·√V2. Here V2 is the quantity V2 = Σ wi²,
such that 1/n ≤ V2 ≤ 1. It attains its minimum value for equal weights, and its maximum when all
weights except one are zero. In the former case we have σ(x̄) = d/√n, which is related to the
central limit theorem.
[edit]Dealing with variance
For the weighted mean of a list of data for which each element xi comes from a different
probability distribution with known variance σi², one possible choice for the weights is given by:

  wi = 1/σi²

The weighted mean in this case is:

  x̄ = (Σ xi/σi²) / (Σ 1/σi²)

and the variance of the weighted mean is:

  σ²(x̄) = 1 / (Σ 1/σi²)

which reduces to σ²(x̄) = σ0²/n, when all σi = σ0.
The significance of this choice is that this weighted mean is the maximum likelihood estimator
of the mean of the probability distributions under the assumption that they are independent and
normally distributed with the same mean.
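An illustrative numpy sketch of this inverse-variance weighting (the measurement values and standard deviations are made up):

import numpy as np

x = np.array([10.2, 9.8, 10.6])        # hypothetical measurements of the same quantity
sigma = np.array([0.5, 0.2, 1.0])      # their known standard deviations

w = 1.0 / sigma**2                      # inverse-variance weights
x_bar = np.sum(w * x) / np.sum(w)       # weighted mean
var_bar = 1.0 / np.sum(w)               # variance of the weighted mean

print(x_bar, np.sqrt(var_bar))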
[edit]Correcting for over/under dispersion
Weighted means are typically used to find the weighted mean of experimental data, rather than
theoretically generated data. In this case, there will be some error in the variance of each data
point. Typically experimental errors may be underestimated due to the experimenter not taking
into account all sources of error in calculating the variance of each data point. In this event, the
variance in the weighted mean must be corrected to account for the fact that χ² is too large. The
correction that must be made is

  σ²(x̄) → σ²(x̄)·χν²

where χν² is χ² divided by the number of degrees of freedom, in this case n − 1. This gives the
variance in the weighted mean as:

  σ²(x̄) = (1 / Σ(1/σi²)) · (1/(n − 1)) · Σ (xi − x̄)²/σi²
[edit]Weighted sample variance


Typically when a mean is calculated it is important to know the variance and standard deviation
of that mean. When a weighted mean μ* with normalized weights is used, the variance of the
weighted sample is different from the variance of the unweighted sample. The biased weighted
sample variance is defined similarly to the normal biased sample variance:

  σw² = Σ wi·(xi − μ*)²

For small samples, it is customary to use an unbiased estimator for the population variance. In
normal unweighted samples, the N in the denominator (corresponding to the sample size) is
changed to N − 1. While this is simple in unweighted samples, it is not straightforward when the
sample is weighted. The unbiased estimator of a weighted population variance is given by [1]:

  s² = Σ wi·(xi − μ*)² / (1 − V2)

where V2 = Σ wi², as introduced previously. The degrees of freedom of the weighted, unbiased
sample variance vary accordingly from N − 1 down to 0.
The standard deviation is simply the square root of the variance above.
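A short numpy sketch of the biased and unbiased weighted variances for normalized weights (the data and weights are made up):

import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0])
w = np.array([0.1, 0.2, 0.3, 0.4])      # normalized weights (they sum to 1)

mu_star = np.sum(w * x)                  # weighted mean
biased = np.sum(w * (x - mu_star)**2)    # biased weighted sample variance
V2 = np.sum(w**2)
unbiased = biased / (1.0 - V2)           # unbiased estimator of the population variance

print(mu_star, biased, unbiased)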
[edit]Accounting for correlations

In the general case, suppose that C is the covariance matrix relating the
quantities xi, x̄ is the common mean to be estimated, and J is the design matrix [1, ..., 1] (of
length n). The Gauss–Markov theorem states that the estimate of the mean having minimum
variance is given by:

  x̄ = σ²(x̄) · (J^T C^(−1) x)

and

  σ²(x̄) = (J^T C^(−1) J)^(−1)
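An illustrative numpy sketch of this minimum-variance estimate (the covariance matrix and observations are made up):

import numpy as np

C = np.array([[1.0, 0.5, 0.0],
              [0.5, 2.0, 0.3],
              [0.0, 0.3, 1.5]])          # hypothetical covariance matrix of the observations
x = np.array([10.1, 10.4, 9.9])          # the observations
J = np.ones(len(x))                      # design matrix [1, ..., 1]

Cinv = np.linalg.inv(C)
var_mean = 1.0 / (J @ Cinv @ J)          # variance of the estimated common mean
x_bar = var_mean * (J @ Cinv @ x)        # minimum-variance estimate of the common mean

print(x_bar, var_mean)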

[edit]Decreasing strength of interactions


Consider the time series of an independent variable x and a dependent variable y, with n
observations sampled at discrete times ti. In many common situations, the value of y at time ti
depends not only on xi but also on its past values. Commonly, the strength of this dependence
decreases as the separation of observations in time increases. To model this situation, one may
replace the independent variable by its sliding mean z for a window size m:

  zk = Σ (i = 1..m) wi·x(k+1−i)
Range of weighted mean and its interpretation:

  Weighted Mean Range (Equivalence 1-5)    Interpretation
  3.34 - 5.00                              Strong
  1.67 - 3.33                              Satisfactory
  0.00 - 1.66                              Weak

[edit]Exponentially decreasing weights


In the scenario described in the previous section, most frequently the decrease in interaction
strength obeys a negative exponential law. If the observations are sampled at equidistant times,
then exponential decrease is equivalent to decrease by a constant fraction 0 < Δ < 1 at each
time step. Setting w = 1 − Δ we can define m normalized weights by

  wi = w^(i−1) / V1,   i = 1, ..., m

where V1 is the sum of the unnormalized weights. In this case V1 is simply

  V1 = 1 + w + w² + ... + w^(m−1) = (1 − w^m) / (1 − w)

approaching V1 = 1 / (1 − w) for large values of m.
The damping constant w must correspond to the actual decrease of interaction strength. If this
cannot be determined from theoretical considerations, then the following properties of
exponentially decreasing weights are useful in making a suitable choice: at step (1 − w)^(−1), the
weight approximately equals e^(−1)·(1 − w) ≈ 0.39·(1 − w), the tail area the value e^(−1), and the head
area 1 − e^(−1) ≈ 0.61. The tail area at step n is approximately w^n ≈ e^(−n(1−w)). Where primarily the
closest n observations matter and the effect of the remaining observations can be safely ignored,
choose w such that the tail area is sufficiently small.
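A small numpy sketch of such exponentially decreasing weights (the window size, damping constant and data series are arbitrary):

import numpy as np

def exponential_weights(w, m):
    # Normalized weights proportional to w**(i - 1), i = 1..m, most recent observation first.
    raw = w ** np.arange(m)              # 1, w, w**2, ..., w**(m - 1)
    return raw / raw.sum()               # raw.sum() is V1 = (1 - w**m) / (1 - w)

weights = exponential_weights(w=0.9, m=20)
x = np.random.default_rng(0).normal(size=100)   # some observed series
z_latest = np.dot(weights, x[:-21:-1])          # sliding mean at the latest time step
print(weights.sum(), z_latest)                  # the weights sum to 1.0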

[edit]Weighted averages of functions


The concept of weighted average can be extended to functions.[2]

Volatility smile
From Wikipedia, the free encyclopedia

In finance, the volatility smile is a long-observed pattern in which at-the-money options tend to
have lower implied volatilities than in- or out-of-the-money options. The pattern displays
different characteristics for different markets and results from the probability of extreme moves.
Equity options traded in American markets did not show a volatility smile before the Crash of
1987 but began showing one afterwards.[1]
Modelling the volatility smile is an active area of research in quantitative finance. Typically, a
quantitative analyst will calculate the implied volatility from liquid vanilla options and use
models of the smile to calculate the price of more exotic options.
A closely related concept is that of term structure of volatility, which refers to how implied
volatility differs for related options with different maturities. An implied volatility surface is a
3-D plot that combines volatility smile and term structure of volatility into a consolidated view
of all options for an underlier.
[edit]Volatility smiles and implied volatility
In the Black-Scholes model, the theoretical value of a vanilla option is a monotonic increasing
function of the Black-Scholes volatility. Furthermore, except in the case of American options
with dividends whose early exercise could be optimal, the price is a strictly increasing function
of volatility. This means it is usually possible to compute a unique implied volatility from a
given market price for an option. This implied volatility is best regarded as a rescaling of option
prices which makes comparisons between different strikes, expirations, and underlyings easier
and more intuitive.
When implied volatility is plotted against strike price, the resulting graph is typically downward
sloping for equity markets, or valley-shaped for currency markets. For markets where the graph
is downward sloping, such as for equity options, the term "volatility skew" is often used. For
other markets, such as FX options or equity index options, where the typical graph turns up at
either end, the more familiar term "volatility smile" is used. For example, the implied volatility
for upside (i.e. high strike) equity options is typically lower than for at-the-money equity options.
However, the implied volatilities of options on foreign exchange contracts tend to rise in both the
downside and upside directions. In equity markets, a small tilted smile is often observed near the
money as a kink in the general downward-sloping implied volatility graph. Sometimes the term
"smirk" is used to describe a skewed smile.
Market practitioners use the term implied volatility to indicate the volatility parameter for an ATM
(at-the-money) option. Adjustments to this value are made by incorporating the values of
risk reversals and flys (skews) to determine the actual volatility measure that may be used for
options with a delta which is not 50:
Callx = ATMx + 0.5 RRx + Flyx
Putx = ATMx - 0.5 RRx + Flyx
Risk reversals are generally quoted as X% delta risk reversals; a risk reversal is essentially long an
X% delta call and short an X% delta put.
A butterfly, on the other hand, is a Y% delta fly, which means long a Y% delta call, long a Y% delta
put, and short the at-the-money options.
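A naive Python sketch of these two quoting equations (the ATM, risk-reversal and butterfly numbers are made up; real smile construction is more involved):

def call_put_vols(atm, rr, fly):
    # Implied volatilities for the X%-delta call and put from ATM vol, risk reversal and butterfly.
    call_vol = atm + 0.5 * rr + fly
    put_vol = atm - 0.5 * rr + fly
    return call_vol, put_vol

# Hypothetical 25-delta quotes (in volatility points): ATM 10%, risk reversal -1.5%, butterfly 0.4%.
call25, put25 = call_put_vols(atm=0.10, rr=-0.015, fly=0.004)
print(call25, put25)    # 0.0965 and 0.1115: the downside (put) wing is richer, a typical skew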

[edit]Implied volatility and historical volatility


It is helpful to note that implied volatility is related to historical volatility; however, the two are
distinct. Historical volatility is a direct measure of the movement of the underlier’s price
(realized volatility) over recent history (e.g. a trailing 21-day period). Implied volatility, in
contrast, is set by the market price of the derivative contract itself, and not the underlier.
Therefore, different derivative contracts on the same underlier have different implied volatilities.
For instance, the IBM call option, struck at $100 and expiring in 6 months, may have an implied
volatility of 18%, while the put option struck at $105 and expiring in 1 month may have an
implied volatility of 21%. At the same time, the historical volatility for IBM for the previous 21
day period might be 17% (all volatilities are expressed in annualized percentage moves).
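An illustrative numpy calculation of a trailing 21-day historical volatility (the price series is made up; 252 trading days per year is assumed):

import numpy as np

prices = np.array([100.0, 101.2, 100.5, 102.0, 101.1, 103.4, 102.8, 104.0,
                   103.1, 105.2, 104.8, 106.0, 105.5, 107.1, 106.3, 108.0,
                   107.2, 109.1, 108.5, 110.0, 109.4, 111.0])   # 22 closes -> 21 daily returns

log_returns = np.diff(np.log(prices))     # trailing 21 daily log returns
daily_vol = log_returns.std(ddof=1)       # sample standard deviation of daily returns
print(daily_vol * np.sqrt(252))           # annualized trailing historical volatility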
[edit]Term structure of volatility
For options of different maturities, we also see characteristic differences in implied volatility.
However, in this case, the dominant effect is related to the market's implied impact of upcoming
events. For instance, it is well-observed that realized volatility for stock prices rises significantly
on the day that a company reports its earnings. Correspondingly, we see that implied volatility
for options will rise during the period prior to the earnings announcement, and then fall again as
soon as the stock price absorbs the new information. Options that mature earlier exhibit a larger
swing in implied volatility than options with longer maturities.
Other option markets show other behavior. For instance, options on commodity futures typically
show increased implied volatility just prior to the announcement of harvest forecasts. Options on
US Treasury Bill futures show increased implied volatility just prior to meetings of the Federal
Reserve Board (when changes in short-term interest rates are announced).
The market incorporates many other types of events into the term structure of volatility. For
instance, the impact of upcoming results of a drug trial can cause implied volatility swings for
pharmaceutical stocks. The anticipated resolution date of patent litigation can impact technology
stocks, etc.
Volatility term structures list the relationship between implied volatilities and time to expiration.
The term structures provide another method for traders to gauge cheap or expensive options.
[edit]Implied volatility surface
It is often useful to plot implied volatility as a function of both strike price and time to maturity.
The result is a 3-D surface whereby the current market implied volatility (Z-axis) for all options
on the underlier is plotted against strike price and time to maturity (X & Y-axes).
The implied volatility surface simultaneously shows both volatility smile and term structure of
volatility. Option traders use an implied volatility plot to quickly determine the shape of the
implied volatility surface, and to identify any areas where the slope of the plot (and therefore
relative implied volatilities) seems out of line.
The graph shows an implied volatility surface for all the call options on a particular underlying
stock price. The Z-axis represents implied volatility in percent, and X and Y axes represent the
option delta, and the days to maturity. Note that to maintain put-call parity, a 20 delta put must
have the same implied volatility as an 80 delta call. For this surface, we can see that the
underlying symbol has both volatility skew (a tilt along the delta axis), as well as a volatility
term structure indicating an anticipated event in the near future.
[edit]Evolution: Sticky
An implied volatility surface is static: it describes the implied volatilities at a given moment in
time. How the surface changes over time (especially as spot changes) is called the evolution of
the implied volatility surface.
Common heuristics include:
• "sticky strike" (or "sticky-by-strike", or "stick-to-strike"): if spot changes, the
implied volatility of an option with a given absolute strike does not change.
• "sticky moneyness" (aka, "sticky delta"; see moneyness for why these are
equivalent terms): if spot changes, the implied volatility of an option with a
given moneyness does not change.
So if spot moves from $100 to $120, sticky strike would predict that the implied volatility of a
$120 strike option would be whatever it was before the move (though it has moved from being
OTM to ATM), while sticky delta would predict that the implied volatility of the $120 strike
option would be whatever the $100 strike option's implied volatility was before the move (as
these are both ATM at the time).
[edit]Modeling volatility
Methods of modelling the volatility smile include stochastic volatility models and local volatility
models.
[edit]
Risk reversal
[edit]Description
Risk reversal refers to the manner in which similar out-of-the-money call and put options,
usually foreign exchange options, are quoted by finance dealers. Instead of quoting these
options' prices, dealers quote their volatility. The greater the demand for an options contract, the
greater its implied volatility and its price. A positive risk reversal means the volatility of calls is
greater than the volatility of similar puts, which implies a skewed distribution of expected spot
returns composed of a relatively large number of small down moves and a relatively small
number of large up moves.

Risk reversal as an investment strategy can be used to simulate being long in a stock, while
reducing the down side risk. Using a risk reversal is a high leverage technique.
From an article at QuantPrinciple: "A risk reversal is a position in which you simulate the
behavior of a long, therefore it is sometimes called a synthetic long. This is an investment strategy
that amounts to both buying and selling out-of-money options simultaneously. In this strategy,
the investor will first make a market hunch, if that hunch is bullish he will want to go long.
However, instead of going long on the stock, he will buy an out of the money call option, and
simultaneously sell an out of the money put option. Presumably he will use the money from the
sale of the put option to purchase the call option. Then as the stock goes up in price, the call
option will be worth more, and the put option will be worth less." [1]
[edit]
Put–call parity
From Wikipedia, the free encyclopedia

In financial mathematics, put-call parity defines a relationship between the price of a call option
and a put option—both with the identical strike price and expiry. To derive the put-call parity
relationship, the assumption is that the options are not exercised before expiration day, which
necessarily applies to European options. Put-call parity can be derived in a manner that is largely
model independent.
[edit]Derivation
An example using stock options follows, though this may be generalised to other options.
Consider a call option and a put option with the same strike K for expiry at the same date T on
some stock, which pays no dividend. Let S denote the (unknown) underlying value at expiration.
First consider a portfolio that consists of one put option and one share. This portfolio at time T
has value:

  P(T) + S = max(K − S, 0) + S = max(K, S)

Now consider a portfolio that consists of one call option and K bonds that each pay 1 (with
certainty) at time T. This portfolio at T has value:

  C(T) + K = max(S − K, 0) + K = max(K, S)

Notice that, whatever the final share price S is at time T, each portfolio is worth the same as the
other. This implies that these two portfolios must have the same value at any time t before T. To
prove this suppose that, at some time t, one portfolio were cheaper than the other. Then one
could purchase (go long) the cheaper portfolio and sell (go short) the more expensive. Our
overall portfolio would, for any value of the share price, have zero value at T. We would be left
with the profit we made at time t. This is known as a risk-less profit and represents an arbitrage
opportunity.
Thus the following relationship exists between the value of the various instruments at a general
time t:

  C(t) + K·B(t,T) = P(t) + S(t)
where
C(t) is the value of the call at time t,

P(t) is the value of the put,

S(t) is the value of the share,

K is the strike price, and

B(t,T) is the value of a bond that matures at time T. If a stock pays dividends,
they should be included in B(t,T), because option prices are typically not
adjusted for ordinary dividends.

If the bond interest rate, r, is assumed to be constant then

  B(t,T) = e^(−r(T−t))
Using the above, and given no arbitrage opportunities, for any three prices of the call, put, bond
and stock one can compute the implied price of the fourth.
When valuing European options written on stocks with known dividends that will be paid out
during the life of the option, the formula becomes:

  C(t) + D(t) + K·B(t,T) = P(t) + S(t)
Where D(t) represents the present value of the dividends to be paid out before expiration of the
option.
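A numerical sketch of the parity relation (the prices and parameters are arbitrary; a constant continuously compounded rate and no dividends are assumed):

import math

S, K, r, T = 100.0, 105.0, 0.05, 0.5    # spot, strike, rate, time to expiry in years
B = math.exp(-r * T)                    # zero-coupon bond price B(t, T)

call = 4.58                             # a quoted call price (made up)
put = call + K * B - S                  # implied by C(t) + K*B(t,T) = P(t) + S(t)
print(put)                              # about 6.99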
[edit]History
Nelson, an option arbitrage trader in New York, published a book: "The ABC of Option
Arbitrage" in 1904 that describes the put-call parity in detail. His book was re-discovered by
Espen Gaarder Haug in the early 2000s, and many references from Nelson's book are given in
Haug's book "Derivatives Models on Models".
Henry Deutsch describes the put-call parity in 1910 in his book "Arbitrage in Bullion, Coins,
Bills, Stocks, Shares and Options, 2nd Edition" (London: Engham Wilson), but in less detail than
Nelson (1904).
Mathematics professor Vinzenz Bronzin also derives the put-call parity in 1908 and uses it as
part of his arbitrage argument to develop a series of mathematical option models under a series
of different distributions. The work of professor Bronzin was just recently rediscovered by
professor Wolfgang Hafner and professor Heinz Zimmermann. The original work of Bronzin is a
book written in German and is now translated and published in English in an edited work by
Hafner and Zimmermann (Vinzenz Bronzin's option pricing models, Springer Verlag).
Michael Knoll, in The Ancient Roots of Modern Financial Innovation: The Early History of
Regulatory Arbitrage, describes the important role that put-call parity played in developing the
equity of redemption, the defining characteristic of a modern mortgage, in Medieval England.
Russell Sage used put-call parity to create synthetic loans, which had higher interest rates than
the usury laws of the time would have normally allowed.
Its first description in the "modern" literature appears to be Hans Stoll's paper, The Relation
Between Put and Call Prices, from 1969.
[edit]Implications
Put-call parity implies:
• Equivalence of calls and puts: Parity implies that a call and a put can be used
interchangeably in any delta-neutral portfolio. If d is the call's delta, then
buying a call, and selling d shares of stock, is the same as buying a put and
buying 1 − d shares of stock. Equivalence of calls and puts is very
important when trading options.
• Parity of implied volatility: In the absence of dividends or other costs of carry
(such as when a stock is difficult to borrow or sell short), the implied volatility
of calls and puts must be identical.[1]

[edit]Other arbitrage relationships


Note that there are several other (theoretical) properties of option prices which may be derived
via arbitrage considerations. These properties define price limits, the relationship between price,
dividends and the risk free rate, the appropriateness of early exercise, and the relationship
between the prices of various types of options. See links below.
[edit]Put-call Parity and American Options
For American options, where you have the right to exercise before expiration, this affects the B(t,
T) term in the above equation. Put-call parity only holds for European options, or American
options if they are not exercised early.
c + PV(x) = p + s
• the left part of the equation is called "fiduciary call"
• the right side of the equation is called "protective put"

Volatility swap
In finance, a volatility swap is a forward contract on the future realised volatility of a given
underlying asset. Volatility swaps allow investors to trade the volatility of an asset directly, much
as they would trade a price index.
The underlying is usually a foreign exchange (FX) rate (very liquid market) but could be as well
a single name equity or index. However, the variance swap is preferred in the equity market due
to the fact it can be replicated with a linear combination of options and a dynamic position in
futures.
Unlike a stock option, whose volatility exposure is contaminated by its stock price dependence,
these swaps provide pure exposure to volatility alone. You can use these instruments to speculate
on future volatility levels, to trade the spread between realized and implied volatility, or to hedge
the volatility exposure of other positions or businesses.

Heston model
In finance, the Heston model, named after Steven Heston, is a mathematical model describing
the evolution of the volatility of an underlying asset [1]. It is a stochastic volatility model: such a
model assumes that the volatility of the asset is not constant, nor even deterministic, but follows
a random process.
[edit]Basic Heston model
The basic Heston model assumes that St, the price of the asset, is determined by a stochastic
process:

  dSt = μ·St·dt + √νt·St·dWtS

where νt, the instantaneous variance, is a CIR process:

  dνt = κ·(θ − νt)·dt + ξ·√νt·dWtν

and WtS, Wtν are Wiener processes (i.e., random walks) with correlation ρ.
The parameters in the above equations represent the following:
• μ is the rate of return of the asset.
• θ is the long vol, or long run average price volatility; as t tends to infinity,
the expected value of νt tends to θ.
• κ is the rate at which νt reverts to θ.
• ξ is the vol of vol, or volatility of the volatility; as the name suggests, this
determines the variance of νt.
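For illustration, a minimal Euler-Maruyama simulation of the two equations above (a sketch only: log-Euler for the price, full truncation to keep νt non-negative, and made-up parameter values):

import numpy as np

def simulate_heston(S0=100.0, v0=0.04, mu=0.05, kappa=1.5, theta=0.04,
                    xi=0.3, rho=-0.7, T=1.0, steps=252, seed=0):
    # One simulated path of (St, vt): log-Euler step for the price, Euler step with
    # full truncation (max(v, 0)) for the CIR variance, correlated normal increments.
    rng = np.random.default_rng(seed)
    dt = T / steps
    S = np.empty(steps + 1)
    v = np.empty(steps + 1)
    S[0], v[0] = S0, v0
    for i in range(steps):
        z1 = rng.standard_normal()
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal()
        v_pos = max(v[i], 0.0)
        S[i + 1] = S[i] * np.exp((mu - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z1)
        v[i + 1] = v[i] + kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z2
    return S, v

S_path, v_path = simulate_heston()
print(S_path[-1], v_path[-1])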
[edit]Extensions
In order to take into account all the features of the volatility surface, the Heston model may be
too rigid a framework. It may be necessary to add degrees of freedom to the original model. A
first straightforward extension is to allow the parameters to be time-dependent. The model
dynamics then read:

  dSt = μ·St·dt + √νt·St·dWtS

where νt, the instantaneous variance, is a time-dependent CIR process:

  dνt = κ(t)·(θ(t) − νt)·dt + ξ(t)·√νt·dWtν

and WtS, Wtν are Wiener processes (i.e., random walks) with correlation ρ. In order to retain
model tractability, one may impose the parameters to be piecewise-constant.
Another approach is to add a second process of variance, independent of the first one.

A significant extension of the Heston model, making both volatility and mean stochastic, is given by
Lin Chen (1996). In the Chen model the dynamics of the instantaneous interest rate are specified by

[edit]Risk-neutral measure
See Risk-neutral measure for the complete article

A fundamental concept in derivatives pricing is that of the Risk-neutral measure; this is


explained in further depth in the above article. For our purposes, it is sufficient to note the
following:
1. To price a derivative whose payoff is a function of one or more underlying
assets, we evaluate the expected value of its discounted payoff under a risk-
neutral measure.
2. A risk-neutral measure, also known as an equivalent martingale measure, is
one which is equivalent to the real-world measure, and which is arbitrage-
free: under such a measure, the discounted price of each of the underlying
assets is a martingale.
3. In the Black-Scholes and Heston frameworks (where filtrations are generated
from a linearly independent set of Wiener processes alone), any equivalent
measure can be described in a very loose sense by adding a drift to each of
the Wiener processes.
4. By selecting certain values for the drifts described above, we may obtain an
equivalent measure which fulfills the arbitrage-free condition.
Consider a general situation where we have n underlying assets and a linearly independent set of
m Wiener processes. The set of equivalent measures is isomorphic to Rm, the space of possible
drifts. Let us consider the set of equivalent martingale measures to be isomorphic to a manifold
M embedded in Rm; initially, consider the situation where we have no assets and M is
isomorphic to Rm.
Now let us consider each of the underlying assets as providing a constraint on the set of
equivalent measures, as its expected discount process must be equal to a constant (namely, its
initial value). By adding one asset at a time, we may consider each additional constraint as
reducing the dimension of M by one dimension. Hence we can see that in the general situation
described above, the dimension of the set of equivalent martingale measures is m − n.
In the Black-Scholes model, we have one asset and one Wiener process. The dimension of the set
of equivalent martingale measures is zero; hence it can be shown that there is a single value for
the drift, and thus a single risk-neutral measure, under which the discounted asset e− ρtSt will be
a martingale.
In the Heston model, we still have one asset (volatility is not considered to be directly observable
or tradeable in the market) but we now have two Wiener processes - the first in the Stochastic
Differential Equation (SDE) for the asset and the second in the SDE for the stochastic volatility.
Here, the dimension of the set of equivalent martingale measures is one; there is no unique risk-
free measure.
This is of course problematic; while any of the risk-free measures may theoretically be used to
price a derivative, it is likely that each of them will give a different price. In theory, however,
only one of these risk-free measures would be compatible with the market prices of volatility-
dependent options (for example, European calls, or more explicitly, variance swaps). Hence we
could add a volatility-dependent asset; by doing so, we add an additional constraint, and thus
choose a single risk-free measure which is compatible with the market. This measure may be
used for pricing.
[edit]Implementation
A recent discussion of implementation of the Heston model is given in a paper by Kahl and
Jäckel [2].
Information about how to use the Fourier transform to value options is given in a paper by Carr
and Madan [3].
Extension of the Heston model with stochastic interest rates is given in [4].
Derivation of closed-form option prices for time-dependent Heston model is presented in [5].
Derivation of closed-form option prices for double Heston model are presented in [6] and [7]

Delta neutral
From Wikipedia, the free encyclopedia
(Redirected from Delta hedging)


In finance, delta neutral describes a portfolio of related financial securities in which the
portfolio value remains unchanged under small changes in the value of the underlying security.
Such a portfolio typically contains options and their corresponding underlying securities such
that positive and negative delta components offset, resulting in the portfolio's value being
relatively insensitive to changes in the value of the underlying security.
A related term, delta hedging is the process of setting or keeping the delta of a portfolio as close
to zero as possible. In practice, maintaining a zero delta is very complex because there are risks
associated with re-hedging on large movements in the underlying stock's price, and research
indicates portfolios tend to have lower cash flows if re-hedged too frequently.[1]
[edit]Nomenclature
δ The sensitivity of an option's value to a change in the underlying stock's price.
V0 The initial value of the option.
V The current value of the option.
S0 The initial value of the underlying stock.
[edit]Mathematical interpretation
Main article: Greeks (finance)

Delta measures the sensitivity of the value of an option to changes in the price of the underlying
stock, assuming all other variables remain unchanged.[2]

Mathematically, delta is represented as the partial derivative ∂V/∂S of the option's fair value with
respect to the price of the underlying security.
Delta is clearly a function of S; however, delta is also a function of the strike price and the time to
expiry. [3]
Therefore, if a position is delta neutral (or, instantaneously delta-hedged) its instantaneous
change in value, for an infinitesimal change in the value of the underlying security, will be zero;
see Hedge (finance). Since delta measures the exposure of a derivative to changes in the value of
the underlying, a portfolio that is delta neutral is effectively hedged. That is, its overall value will
not change for small changes in the price of its underlying instrument.
[edit]Creating the position
Delta hedging - i.e. establishing the required hedge - may be accomplished by buying or selling
an amount of the underlier that corresponds to the delta of the portfolio. By adjusting the amount
bought or sold on new positions, the portfolio delta can be made to sum to zero, and the portfolio
is then delta neutral.
Options market makers, or others, may form a delta neutral portfolio using related options
instead of the underlying. The portfolio's delta (assuming the same underlier) is then the sum of
all the individual options' deltas. This method can also be used when the underlier is difficult to
trade, for instance when an underlying stock is hard to borrow and therefore cannot be sold short.
[edit]Theory
The existence of a delta neutral portfolio was shown as part of the original proof of the Black-
Scholes model, the first comprehensive model to produce correct prices for some classes of
options.
From the Taylor expansion of the value of an option, we get the change in the value of an option,
dV, for a change in the value of the underlier, dS:

  dV = δ·dS + (1/2)·Γ·(dS)² + ...

where δ = ∂V/∂S (delta) and Γ = ∂²V/∂S² (gamma); see The Greeks.

For any small change in the underlier, we can ignore the second-order term and use the quantity
δ to determine how much of the underlier to buy or sell to create a hedged portfolio.

When the change in the value of the underlier is not small, the second-order term, (1/2)·Γ·(dS)², cannot be
ignored. In practice, maintaining a delta neutral portfolio requires continual recalculation of the
position's Greeks and rebalancing of the underlier's position. Typically, this rebalancing is
performed daily or weekly.
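As an illustrative sketch, the Black-Scholes delta of a European call and the corresponding hedge position (the parameters are arbitrary; the formula assumes no dividends):

import math
from statistics import NormalDist

def bs_call_delta(S, K, T, r, sigma):
    # Black-Scholes delta of a European call on a non-dividend-paying stock.
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return NormalDist().cdf(d1)

S, K, T, r, sigma = 100.0, 100.0, 0.5, 0.02, 0.25
delta = bs_call_delta(S, K, T, r, sigma)
print(delta)                 # about 0.56 for this at-the-money call

# Delta hedging a short position in 1000 such calls: buy about delta * 1000 shares.
print(delta * 1000)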

Variance risk premium


From Wikipedia, the free encyclopedia

The Variance risk premium is the phenomenon on the Variance swap market of the variance
swap strike being greater than the realized variance on average. For most trades, the buyer of
variance ends up with a loss on the trade, while the seller profits.[1] The amount that the buyer of
variance typically loses in entering into the variance swap is known as the variance risk
premium. The variance risk premium can be naively justified by taking into account the large
negative convexity of a short variance position; variance during the rare times of crisis can be
50-100 times that of normal market conditions.
Using insurance as an analogy, the variance buyer typically pays a premium to be able to receive
the large positive payoff of a variance swap in times of market turmoil, to "insure" against times
of market turmoil.
The variance risk premium can also be analysed from the perspective of asset allocation. Carr
and Wu (2007) examine whether the excess returns of selling or buying variance swaps can be
explained using common factor models such as the CAPM model and the Fama-French factors,
which include returns of different segments of stocks on the market. Despite the intuitive
connection between stock price volatility and stock price, none of these models is able to
strongly explain the excess returns on variance swaps. This implies that there is another factor,
unrelated to stock prices, that affects how much, on average, one will pay to enter into a
variance swap contract. This suggests that investors are willing to pay extra money to enter into
variance swaps because they dislike variance, not just because it is anti-correlated with stock
prices, but in its own right. This leads many to consider variance as an asset class in and of itself.
In the years before the 2008 Financial Crisis, selling variance on a rolling basis was a popular
trade among hedge funds and other institutional investors.

Efficient-market hypothesis
From Wikipedia, the free encyclopedia

In finance, the efficient-market hypothesis (EMH) asserts that financial markets are
"informationally efficient". The weak version of EMH suppose that prices on traded assets
(e.g.,stocks, bonds, or property) already reflect all past publicly available information. The semi-
strong version supposes that prices reflect all publicly available information and instantly change
to reflect new information. The strong version supposes that market reflects even hidden/inside
information. There is some disputed evidence to suggest that the weak and semi-strong versions
are valid while there is powerful evidence against the strong version. Therefore, according to
theory, it is improbable to consistently outperform the market by using any information that the
market already has, except through inside trading. Information or news in the EMH is defined as
anything that may affect prices that is unknowable in the present and thus appears randomly in
the future. The hypothesis has been attacked by critics who blame the belief in rational markets
for much of the financial crisis of 2007–2010,[1][2] with noted financial journalist Roger
Lowenstein declaring "The upside of the current Great Recession is that it could drive a stake
through the heart of the academic nostrum known as the efficient-market hypothesis."[3]
[edit]Historical background
The efficient-market hypothesis was first expressed by Louis Bachelier, a French mathematician,
in his 1900 dissertation, "The Theory of Speculation".[4] His work was largely ignored until the
1950s; however, beginning in the 1930s, scattered independent work corroborated his thesis. A
small number of studies indicated that US stock prices and related financial series followed a
random walk model.[5] Research by Alfred Cowles in the ’30s and ’40s suggested that
professional investors were in general unable to outperform the market.
The efficient-market hypothesis was developed by Professor Eugene Fama at the University of
Chicago Booth School of Business as an academic concept of study through his published Ph.D.
thesis in the early 1960s at the same school. It was widely accepted up until the 1990s, when
behavioral finance economists, who were a fringe element, became mainstream.[6] Empirical
analyses have consistently found problems with the efficient-market hypothesis, the most
consistent being that stocks with low price to earnings (and similarly, low price to cash-flow or
book value) outperform other stocks.[7][8] Alternative theories have proposed that cognitive biases
cause these inefficiencies, leading investors to purchase overpriced growth stocks rather than
value stocks.[6] Although the efficient-market hypothesis has become controversial because
substantial and lasting inefficiencies are observed, Beechey et al. (2000) consider that it remains
a worthwhile starting point.[9]
The efficient-market hypothesis emerged as a prominent theory in the mid-1960s. Paul
Samuelson had begun to circulate Bachelier's work among economists. In 1964 Bachelier's
dissertation along with the empirical studies mentioned above were published in an anthology
edited by Paul Cootner.[10] In 1965 Eugene Fama published his dissertation arguing for the
random walk hypothesis,[11] and Samuelson published a proof for a version of the efficient-
market hypothesis.[12] In 1970 Fama published a review of both the theory and the evidence for
the hypothesis. The paper extended and refined the theory, and included the definitions for three
forms of financial market efficiency: weak, semi-strong and strong (see below).[13]
In addition to evidence that the UK stock market is weak-form efficient, other studies of capital
markets have pointed toward their being semi-strong-form efficient. Studies by Firth (1976,
1979, and 1980) in the United Kingdom have compared the share prices existing after a takeover
announcement with the bid offer. Firth found that the share prices were fully and instantaneously
adjusted to their correct levels, thus concluding that the UK stock market was semi-strong-form
efficient. However, the market's ability to efficiently respond to a short term, widely publicized
event such as a takeover announcement does not necessarily prove market efficiency related to
other more long term, amorphous factors. David Dreman has criticized the evidence provided by
this instant "efficient" response, pointing out that an immediate response is not necessarily
efficient, and that the long-term performance of the stock in response to certain movements is a
better indication. A study of stock price responses to dividend cuts or increases over three years
found that after an announcement of a dividend cut, stocks underperformed the market by 15.3%
over the three-year period, while after a dividend increase announcement stocks outperformed
the market by 24.8% over the following three years.[14]
[edit]Theoretical background
Beyond the normal utility maximizing agents, the efficient-market hypothesis requires that
agents have rational expectations; that on average the population is correct (even if no one
person is) and whenever new relevant information appears, the agents update their expectations
appropriately. Note that it is not required that the agents be rational. EMH allows that when
faced with new information, some investors may overreact and some may underreact. All that is
required by the EMH is that investors' reactions be random and follow a normal distribution
pattern so that the net effect on market prices cannot be reliably exploited to make an abnormal
profit, especially when considering transaction costs (including commissions and spreads). Thus,
any one person can be wrong about the market — indeed, everyone can be — but the market as a
whole is always right. There are three common forms in which the efficient-market hypothesis is
commonly stated — weak-form efficiency, semi-strong-form efficiency and strong-form
efficiency, each of which has different implications for how markets work.
In weak-form efficiency, future prices cannot be predicted by analyzing price from the past.
Excess returns can not be earned in the long run by using investment strategies based on
historical share prices or other historical data. Technical analysis techniques will not be able to
consistently produce excess returns, though some forms of fundamental analysis may still
provide excess returns. Share prices exhibit no serial dependencies, meaning that there are no
"patterns" to asset prices. This implies that future price movements are determined entirely by
information not contained in the price series. Hence, prices must follow a random walk. This
'soft' EMH does not require that prices remain at or near equilibrium, but only that market
participants not be able to systematically profit from market 'inefficiencies'. However, while
EMH predicts that all price movement (in the absence of change in fundamental information) is
random (i.e., non-trending), many studies have shown a marked tendency for the stock markets
to trend over time periods of weeks or longer[15] and that, moreover, there is a positive correlation
between degree of trending and length of time period studied[16] (but note that over long time
periods, the trending is sinusoidal in appearance). Various explanations for such large and
apparently non-random price movements have been promulgated. But the best explanation seems
to be that the distribution of stock market prices is non-Gaussian (in which case EMH, in any of
its current forms, would not be strictly applicable).[17][18]
In semi-strong-form efficiency, it is implied that share prices adjust to publicly available new
information very rapidly and in an unbiased fashion, such that no excess returns can be earned by
trading on that information. Semi-strong-form efficiency implies that neither fundamental
analysis nor technical analysis techniques will be able to reliably produce excess returns. To test
for semi-strong-form efficiency, the adjustments to previously unknown news must be of a
reasonable size and must be instantaneous. To test for this, consistent upward or downward
adjustments after the initial change must be looked for. If there are any such adjustments it
would suggest that investors had interpreted the information in a biased fashion and hence in an
inefficient manner.
In strong-form efficiency, share prices reflect all information, public and private, and no one
can earn excess returns. If there are legal barriers to private information becoming public, as with
insider trading laws, strong-form efficiency is impossible, except in the case where the laws are
universally ignored. To test for strong-form efficiency, a market needs to exist where investors
cannot consistently earn excess returns over a long period of time. Even if some money
managers are consistently observed to beat the market, no refutation even of strong-form
efficiency follows: with hundreds of thousands of fund managers worldwide, even a normal
distribution of returns (as efficiency predicts) should be expected to produce a few dozen "star"
performers.
[edit]Criticism and behavioral finance
Price-Earnings ratios as a predictor of twenty-year returns based upon the plot by
Robert Shiller (Figure 10.1[19]). The horizontal axis shows the real price-
earnings ratio of the S&P Composite Stock Price Index as computed in Irrational
Exuberance (inflation adjusted price divided by the prior ten-year mean of inflation-
adjusted earnings). The vertical axis shows the geometric average real annual
return on investing in the S&P Composite Stock Price Index, reinvesting dividends,
and selling twenty years later. Data from different twenty-year periods is color-
coded as shown in the key. See also ten-year returns. Shiller states that this plot
"confirms that long-term investors—investors who commit their money to an
investment for ten full years—did do well when prices were low relative to earnings
at the beginning of the ten years. Long-term investors would be well advised,
individually, to lower their exposure to the stock market when it is high, as it has
been recently, and get into the market when it is low."[19] Burton Malkiel stated that
this correlation may be consistent with an efficient market due to differences in
interest rates.[20]

Investors and researchers have disputed the efficient-market hypothesis both empirically and
theoretically. Behavioral economists attribute the imperfections in financial markets to a
combination of cognitive biases such as overconfidence, overreaction, representative bias,
information bias, and various other predictable human errors in reasoning and information
processing. These have been researched by psychologists such as Daniel Kahneman, Amos
Tversky, Richard Thaler, and Paul Slovic. These errors in reasoning lead most investors to avoid
value stocks and buy growth stocks at expensive prices, which allow those who reason correctly
to profit from bargains in neglected value stocks and the overreacted selling of growth stocks.
Empirical evidence has been mixed, but has generally not supported strong forms of the
efficient-market hypothesis[7][8][21] According to Dreman, in a 1995 paper, low P/E stocks have
greater returns.[22] In an earlier paper he also refuted the assertion by Ray Ball that these higher
returns could be attributed to higher beta,[23] whose research had been accepted by efficient
market theorists as explaining the anomaly[24] in neat accordance with modern portfolio theory.

One can identify "losers" as stocks that have had poor returns over some number of past years.
"Winners" would be those stocks that had high returns over a similar period. The main result of
one such study is that losers have much higher average returns than winners over the following
period of the same number of years.[25] A later study showed that beta (β) cannot account for this
difference in average returns.[26] This tendency of returns to reverse over long horizons (i.e.,
losers become winners) is yet another contradiction of EMH. Losers would have to have much
higher betas than winners in order to justify the return difference. The study showed that the beta
difference required to save the EMH is just not there.
Speculative economic bubbles are an obvious anomaly, in that the market often appears to be
driven by buyers operating on irrational exuberance, who take little notice of underlying value.
These bubbles are typically followed by an overreaction of frantic selling, allowing shrewd
investors to buy stocks at bargain prices. Rational investors have difficulty profiting by shorting
irrational bubbles because, as John Maynard Keynes commented, "Markets can remain irrational
longer than you can remain solvent."[27] Sudden market crashes as happened on Black Monday in
1987 are mysterious from the perspective of efficient markets, but allowed as a rare statistical
event under the Weak-form of EMH.
Burton Malkiel, a well-known proponent of the general validity of EMH, has warned that certain
emerging markets such as China are not empirically efficient; that the Shanghai and Shenzhen
markets, unlike markets in the United States, exhibit considerable serial correlation (price trends),
non-random walk, and evidence of manipulation.[28]
Behavioral psychology approaches to stock market trading are among some of the more
promising alternatives to EMH (and some investment strategies seek to exploit exactly such
inefficiencies). But Nobel Laureate co-founder of the programme—Daniel Kahneman—
announced his skepticism of investors beating the market: "They're [investors] just not going to
do it [beat the market]. It's just not going to happen."[29] Indeed, defenders of EMH maintain that
Behavioral Finance strengthens the case for EMH in that it highlights biases in individuals and
committees rather than in competitive markets. For example, one prominent finding in Behavioral
Finance is that individuals employ hyperbolic discounting. It is palpably true that bonds,
mortgages, annuities and other similar financial instruments subject to competitive market forces
do not exhibit such discounting. Any manifestation of hyperbolic discounting in the pricing of these obligations would
invite arbitrage thereby quickly eliminating any vestige of individual biases. Similarly,
diversification, derivative securities and other hedging strategies assuage if not eliminate
potential mispricings from the severe risk-intolerance (loss aversion) of individuals underscored
by behavioral finance. On the other hand, economists, behavioral psychologists and mutual fund
managers are drawn from the human population and are therefore subject to the biases that
behavioralists showcase. By contrast, the price signals in markets are far less subject to
individual biases highlighted by the Behavioral Finance programme. Richard Thaler has started a
fund based on his research on cognitive biases. In a 2008 report he identified complexity and
herd behavior as central to the global financial crisis of 2008.[30]
Further empirical work has highlighted the impact transaction costs have on the concept of
market efficiency, with much evidence suggesting that any anomalies pertaining to market
inefficiencies are the result of a cost benefit analysis made by those willing to incur the cost of
acquiring the valuable information in order to trade on it. Additionally the concept of liquidity is
a critical component to capturing "inefficiencies" in tests for abnormal returns. Any test of this
proposition faces the joint hypothesis problem, where it is impossible to ever test for market
efficiency, since to do so requires the use of a measuring stick against which abnormal returns
are compared - one cannot know if the market is efficient if one does not know if a model
correctly stipulates the required rate of return. Consequently, a situation arises where either the
asset pricing model is incorrect or the market is inefficient, but one has no way of knowing
which is the case.[citation needed]
Key work on the random walk hypothesis was done in the late 1980s by Profs. Andrew Lo and Craig
MacKinlay; they effectively argue that a pure random walk does not exist, nor ever has. Their paper
took almost two years to be accepted by academia, and in 2001 they published "A Non-Random
Walk Down Wall St.", which explained the paper in layman's terms.[citation needed]
[edit]Recent financial crisis
The recent global financial crisis has led to renewed scrutiny and criticism of the hypothesis.[31]
Market strategist Jeremy Grantham has stated flatly that EMH is responsible for the current
financial crisis, claiming that belief in the hypothesis caused financial leaders to have a "chronic
underestimation of the dangers of asset bubbles breaking".[2]
At the International Organization of Securities Commissions annual conference, held in June
2009, the hypothesis took center stage. Martin Wolf, the chief economics commentator for the
Financial Times, dismissed the hypothesis as being a useless way to examine how markets
function in reality. Paul McCulley, managing director of PIMCO, was less extreme in his
criticism, saying that the hypothesis had not failed, but was "seriously flawed" in its neglect of
human nature.[32]
The financial crisis has led Richard Posner, a prominent judge, University of Chicago law
professor, and innovator in the field of Law and Economics, to back away from the hypothesis
and express some degree of belief in Keynesian economics. Posner accused some of his 'Chicago
School' colleagues of being "asleep at the switch", saying that "the movement to deregulate the
financial industry went too far by exaggerating the resilience - the self healing powers - of
laissez-faire capitalism."[33] Others, such as Fama himself, said that the theory held up well
during the crisis and that the markets were a casualty of the recession, not the cause of it.
[edit]Popular reception
Despite the best efforts of EMH proponents such as Burton Malkiel, whose book A Random
Walk Down Wall Street achieved best-seller status, the EMH has not caught the public's
imagination. Various forms of stock picking, such as active management, are promoted by
popular CNBC commentator Jim Cramer and former Fidelity Investments fund manager Peter
Lynch, whose books and articles have popularised the notion that investors can "beat the
market".[citation needed]
Many believe that EMH says that a security's price is a correct representation of the value of that
business, as calculated by what the business's future returns will actually be. In other words, they
believe that EMH says a stock's price correctly predicts the underlying company's future results.
Since stock prices clearly do not reflect company future results in many cases, many people
reject EMH as clearly wrong.[citation needed]


Martingale (probability theory)


A stopped Brownian motion as an example of a martingale.

In probability theory, a martingale is a stochastic process (i.e., a sequence of random variables)
such that the conditional expected value of an observation at some time t, given all the
observations up to some earlier time s, is equal to the observation at that earlier time s. Precise
definitions are given below.
[edit]History
Originally, martingale referred to a class of betting strategies that was popular in 18th century
France.[1] The simplest of these strategies was designed for a game in which the gambler wins his
stake if a coin comes up heads and loses it if the coin comes up tails. The strategy had the
gambler double his bet after every loss so that the first win would recover all previous losses plus
win a profit equal to the original stake. As the gambler's wealth and available time jointly
approach infinity, his probability of eventually flipping heads approaches 1, which makes the
martingale betting strategy seem like a sure thing. However, the exponential growth of the bets
eventually bankrupts its users.
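The exponential growth is easy to see in a short simulation. Below is a minimal R sketch (the $1,000 bankroll, $1 base stake and round limit are arbitrary assumptions for illustration): because the coin is fair, the expected profit of any such bounded strategy is zero, but the doubling rule trades many small wins for an eventual large loss.

    # Doubling ("martingale") betting strategy against a finite bankroll (R sketch).
    # Assumptions for illustration: fair coin, $1 base stake, $1,000 bankroll.
    simulate_doubling <- function(bankroll = 1000, stake = 1, max_rounds = 100000) {
      bet <- stake
      for (i in seq_len(max_rounds)) {
        if (bet > bankroll) {
          return(c(rounds = i, bankroll = bankroll))   # cannot cover the next doubled bet: ruined
        }
        if (runif(1) < 0.5) {                          # heads: win the current bet
          bankroll <- bankroll + bet
          bet <- stake                                 # reset to the base stake after a win
        } else {                                       # tails: lose and double the next bet
          bankroll <- bankroll - bet
          bet <- 2 * bet
        }
      }
      c(rounds = max_rounds, bankroll = bankroll)
    }
    set.seed(1)
    simulate_doubling()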
The concept of martingale in probability theory was introduced by Paul Pierre Lévy, and much
of the original development of the theory was done by Joseph Leo Doob among others. Part of
the motivation for that work was to show the impossibility of successful betting strategies.
[edit]Definitions
A discrete-time martingale is a discrete-time stochastic process (i.e., a sequence of random
variables) X1, X2, X3, ... that satisfies, for all n,
$\mathbf{E}(X_{n+1} \mid X_1, \ldots, X_n) = X_n,$
i.e., the conditional expected value of the next observation, given all the past observations, is
equal to the last observation.
Somewhat more generally, a sequence Y1, Y2, Y3 ... is said to be a martingale with respect to
another sequence X1, X2, X3 ... if, for all n,
$\mathbf{E}(Y_{n+1} \mid X_1, \ldots, X_n) = Y_n.$
The sequence Xi is sometimes known as the filtration.

Similarly, a continuous-time martingale with respect to the stochastic process Xt is a stochastic
process Yt such that for all t
$\mathbf{E}(Y_t \mid \{X_\tau : \tau \le s\}) = Y_s \quad \text{for all } s \le t.$
This expresses the property that the conditional expectation of an observation at time t, given all
the observations up to time s, is equal to the observation at time s (of course, provided that s ≤ t).
In full generality, a stochastic process Y : T × Ω → S is a martingale with respect to a filtration
Σ∗ and probability measure P if
• Σ∗ is a filtration of the underlying probability space (Ω, Σ, P);
• Y is adapted to the filtration Σ∗, i.e., for each t in the index set T, the random
variable Yt is a Σt-measurable function;
• for each t, Yt lies in the Lp space L1(Ω, Σt, P; S), i.e.
$\mathbf{E}(|Y_t|) < +\infty;$
• for all s and t with s < t and all F ∈ Σs,
$\mathbf{E}\big((Y_t - Y_s)\,\chi_F\big) = 0,$
where χF denotes the indicator function of the event F. In Grimmett and
Stirzaker's Probability and Random Processes, this last condition is denoted as
$Y_s = \mathbf{E}(Y_t \mid \Sigma_s),$
which is a general form of conditional expectation.[2]

It is important to note that the property of being a martingale involves both the filtration and the
probability measure (with respect to which the expectations are taken). It is possible that Y could
be a martingale with respect to one measure but not another one; the Girsanov theorem offers a
way to find a measure with respect to which an Itō process is a martingale.
[edit]Examples of martingales
• Suppose Xn is a gambler's fortune after n tosses of a fair coin, where the
gambler wins $1 if the coin comes up heads and loses $1 if the coin comes
up tails. The gambler's conditional expected fortune after the next trial, given
the history, is equal to his present fortune, so this sequence is a martingale.
This is also known as the D'Alembert system.
• Let Yn = Xn2 − n where Xn is the gambler's fortune from the preceding
example. Then the sequence { Yn : n = 1, 2, 3, ... } is a martingale. This can
be used to show that the gambler's total gain or loss varies roughly between
plus or minus the square root of the number of steps.
• (de Moivre's martingale) Now suppose an "unfair" or "biased" coin, with
probability p of "heads" and probability q = 1 − p of "tails". Let
$X_{n+1} = X_n \pm 1,$
with "+" in case of "heads" and "−" in case of "tails". Let
$Y_n = (q/p)^{X_n}.$
Then { Yn : n = 1, 2, 3, ... } is a martingale with respect to { Xn : n = 1, 2, 3, ... }.
To show this, note that
$\mathbf{E}(Y_{n+1} \mid X_1, \ldots, X_n) = p\,(q/p)^{X_n+1} + q\,(q/p)^{X_n-1} = (q/p)^{X_n}(q + p) = Y_n.$
• (Polya's urn) An urn initially contains r red and b blue marbles. One is chosen
randomly. Then it is put back in the urn along with another marble of the
same colour. Let Xn be the number of red marbles in the urn after n iterations
of this procedure, and let Yn = Xn/(n + r + b). Then the sequence { Yn : n = 1,
2, 3, ... } is a martingale (a simulation sketch is given after this list).
• (Likelihood-ratio testing in statistics) A population is thought to be distributed
according to either a probability density f or another probability density g. A
random sample is taken, the data being X1, ..., Xn. Let Yn be the "likelihood
ratio"
$Y_n = \prod_{i=1}^{n} \frac{g(X_i)}{f(X_i)}$
(which, in applications, would be used as a test statistic). If the population is
actually distributed according to the density f rather than according to g,
then { Yn : n = 1, 2, 3, ... } is a martingale with respect to { Xn : n = 1, 2, 3, ...
}.

• Suppose each amoeba either splits into two amoebas, with probability p, or
eventually dies, with probability 1 − p. Let Xn be the number of amoebas
surviving in the nth generation (in particular Xn = 0 if the population has
become extinct by that time). Let r be the probability of eventual extinction.
(Finding r as function of p is an instructive exercise. Hint: The probability that
the descendants of an amoeba eventually die out is equal to the probability
that either of its immediate offspring dies out, given that the original amoeba
has split.) Then
$\{\, r^{X_n} : n = 1, 2, 3, \ldots \,\}$
is a martingale with respect to { Xn : n = 1, 2, 3, ... }.


Software-created martingale series.

• The number of individuals of any particular species in an ecosystem of fixed
size is a function of (discrete) time, and may be viewed as a sequence of
random variables. This sequence is a martingale under the unified neutral
theory of biodiversity.
• If { Nt : t ≥ 0 } is a Poisson process with intensity λ, then the Compensated
Poisson process { Nt − λt : t ≥ 0 } is a continuous-time martingale with right-
continuous/left-limit sample paths.
• An example martingale series can easily be produced with computer
software:
• Microsoft Excel or similar spreadsheet software. Enter 0.0 in the A1
(top left) cell, and in the cell below it (A2) enter
=A1+NORMINV(RAND(),0,1). Now copy that cell by dragging down to
create 300 or so copies. This will create a martingale series with a
mean of 0 and standard deviation of 1. With the cells still highlighted
go to the chart creation tool and create a chart of these values. Now
every time a recalculation happens (in Excel the F9 key does this) the
chart will display another martingale series.
• R. To recreate the example above, issue plot(cumsum(rnorm(100,
mean=0, sd=1)), t="l", col="darkblue", lwd=3). To display another
martingale series, reissue the command.
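In the same spirit, the Polya's urn example above can be checked numerically. Here is a minimal R sketch (the urn composition, number of draws and number of runs are arbitrary assumptions): the average of Yn over many runs stays close to the initial red fraction r/(r + b), as the martingale property predicts.

    # Polya's urn (R sketch): verify that Yn = Xn/(n + r + b) has constant expectation.
    # Assumptions for illustration: r = 2 red, b = 3 blue, 50 draws, 10,000 independent runs.
    polya_fraction <- function(r = 2, b = 3, n_steps = 50) {
      red <- r; blue <- b
      for (i in seq_len(n_steps)) {
        if (runif(1) < red / (red + blue)) red <- red + 1 else blue <- blue + 1
      }
      red / (red + blue)                      # equals Xn / (n + r + b)
    }
    set.seed(42)
    y <- replicate(10000, polya_fraction())
    mean(y)                                   # close to r/(r + b) = 0.4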

[edit]Submartingales and supermartingales


A (discrete-time) submartingale is a sequence X1, X2, X3, ... of integrable random variables
satisfying
$\mathbf{E}(X_{n+1} \mid X_1, \ldots, X_n) \ge X_n.$
Analogously, a (discrete-time) supermartingale satisfies
$\mathbf{E}(X_{n+1} \mid X_1, \ldots, X_n) \le X_n.$
The more general definitions of both discrete-time and continuous-time martingales given earlier
can be converted into the corresponding definitions of sub/supermartingales in the same way by
replacing the equality for the conditional expectation by an inequality.
Here is a mnemonic for remembering which is which: "Life is a supermartingale; as time
advances, expectation decreases."
[edit]Examples of submartingales and supermartingales
• Every martingale is also a submartingale and a supermartingale. Conversely,
any stochastic process that is both a submartingale and a supermartingale is
a martingale.
• Consider again the gambler who wins $1 when a coin comes up heads and
loses $1 when the coin comes up tails. Suppose now that the coin may be
biased, so that it comes up heads with probability p.
○ If p is equal to 1/2, the gambler on average neither wins nor loses
money, and the gambler's fortune over time is a martingale.
○ If p is less than 1/2, the gambler loses money on average, and the
gambler's fortune over time is a supermartingale.
○ If p is greater than 1/2, the gambler wins money on average, and the
gambler's fortune over time is a submartingale.
• A convex function of a martingale is a submartingale, by Jensen's inequality.
For example, the square of the gambler's fortune in the fair coin game is a
submartingale (which also follows from the fact that Xn2 − n is a martingale).
Similarly, a concave function of a martingale is a supermartingale.
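A quick numerical check of the biased-coin case above, as a sketch (the bias p = 0.48 and the simulation sizes are arbitrary assumptions): averaging many simulated fortunes shows the steady downward drift characteristic of a supermartingale.

    # Biased-coin gambler (R sketch): with p < 1/2 the fortune is a supermartingale.
    # Assumptions for illustration: p = 0.48, 200 tosses, 5,000 simulated gamblers.
    set.seed(3)
    p <- 0.48; n_steps <- 200; n_paths <- 5000
    steps <- matrix(sample(c(1, -1), n_steps * n_paths, replace = TRUE, prob = c(p, 1 - p)),
                    nrow = n_steps)
    fortunes <- apply(steps, 2, cumsum)              # one column of running fortunes per gambler
    round(rowMeans(fortunes)[c(1, 50, 100, 200)], 2) # roughly -0.04, -2, -4, -8: expectation decreases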

[edit]Martingales and stopping times


See also: optional stopping theorem

A stopping time with respect to a sequence of random variables X1, X2, X3, ... is a random
variable τ with the property that for each t, the occurrence or non-occurrence of the event τ = t
depends only on the values of X1, X2, X3, ..., Xt. The intuition behind the definition is that at any
particular time t, you can look at the sequence so far and tell if it is time to stop. An example in
real life might be the time at which a gambler leaves the gambling table, which might be a
function of his previous winnings (for example, he might leave only when he goes broke), but he
can't choose to go or stay based on the outcome of games that haven't been played yet.
Some mathematicians defined the concept of stopping time by requiring only that the occurrence
or non-occurrence of the event τ = t be probabilistically independent of Xt + 1, Xt + 2, ... but not that
it be completely determined by the history of the process up to time t. That is a weaker condition
than the one appearing in the paragraph above, but is strong enough to serve in some of the
proofs in which stopping times are used.
One of the basic properties of martingales is that, if (Xt)t > 0 is a (sub-/super-) martingale and τ is a
stopping time, then the corresponding stopped process, defined by
$X_t^{\tau} := X_{\min(\tau,\, t)},$
is also a (sub-/super-) martingale.
The concept of a stopped martingale leads to a series of important theorems, for example the optional
stopping theorem (or optional sampling theorem), which says that, under certain conditions, the
expected value of a martingale at a stopping time is equal to its initial value. It can be used, for
example, to prove the impossibility of successful betting strategies for a gambler with a finite
lifetime and a house limit on bets.
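The theorem is easy to see numerically. The following R sketch (the $10 starting fortune and the $0/$20 exit levels are arbitrary assumptions) simulates a fair ±$1 game stopped at the first time the fortune hits either level; the average stopped fortune stays close to the initial value, as the optional stopping theorem predicts.

    # Optional stopping (R sketch): a fair game stopped at 0 or 20 keeps its expected value.
    # Assumptions for illustration: start at $10, absorb at $0 and $20, 20,000 runs.
    stopped_fortune <- function(start = 10, lower = 0, upper = 20) {
      x <- start
      while (x > lower && x < upper) {
        x <- x + sample(c(-1, 1), 1)            # fair ±$1 step
      }
      x
    }
    set.seed(7)
    mean(replicate(20000, stopped_fortune()))   # close to 10, the initial fortune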

Measure (mathematics)
Informally, a measure has the property of being monotone in the sense that if A is a
subset of B, the measure of A is less than or equal to the measure of B.
Furthermore, the measure of the empty set is required to be 0.

In mathematics, more specifically in measure theory, a measure on a set is a systematic way to
assign to each suitable subset a number, intuitively interpreted as the size of the subset. In this
sense, a measure is a generalization of the concepts of length, area, volume, et cetera. A
particularly important example is the Lebesgue measure on a Euclidean space, which assigns the
conventional length, area and volume of Euclidean geometry to suitable subsets of Rn,
n=1,2,3,.... For instance, the Lebesgue measure of [0,1] in the real numbers is its length in the
everyday sense of the word, specifically 1.
To qualify as a measure (see Definition below), a function that assigns a non-negative real
number or +∞ to a set's subsets must satisfy a few conditions. One important condition is
countable additivity. This condition states that the size of the union of a sequence of disjoint
subsets is equal to the sum of the sizes of the subsets. However, it is in general impossible to
consistently associate a size to each subset of a given set and also satisfy the other axioms of a
measure. This problem was resolved by defining measure only on a sub-collection of all subsets;
the subsets on which the measure is to be defined are called measurable and they are required to
form a sigma-algebra, meaning that unions, intersections and complements of sequences of
measurable subsets are measurable. Non-measurable sets in a Euclidean space, on which the
Lebesgue measure cannot be consistently defined, are necessarily complex to the point of
incomprehensibility, in a sense badly mixed up with their complement; indeed, their existence is
a non-trivial consequence of the axiom of choice.
Measure theory was developed in successive stages during the late 19th and early 20th century
by Emile Borel, Henri Lebesgue, Johann Radon and Maurice Fréchet, among others. The main
applications of measures are in the foundations of the Lebesgue integral, in Andrey
Kolmogorov's axiomatisation of probability theory and in ergodic theory. In integration theory,
specifying a measure allows one to define integrals on spaces more general than subsets of
Euclidean space; moreover, the integral with respect to the Lebesgue measure on Euclidean
spaces is more general and has a richer theory than its predecessor, the Riemann integral.
Probability theory considers measures that assign to the whole set the size 1, and considers
measurable subsets to be events whose probability is given by the measure. Ergodic theory
considers measures that are invariant under, or arise naturally from, a dynamical system.
[edit]Definition
Let Σ be a σ-algebra over a set X. A function μ from Σ to the extended real number line is called
a measure if it satisfies the following properties:
• Non-negativity: $\mu(E) \ge 0$ for all $E \in \Sigma$.
• Null empty set: $\mu(\varnothing) = 0$.
• Countable additivity (or σ-additivity): for all countable collections {Ei} of
pairwise disjoint sets in Σ:
$\mu\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} \mu(E_i).$
The second condition may be treated as a special case of countable additivity, if the empty
collection is allowed as a countable collection (and the empty sum is interpreted as 0).
Otherwise, if the empty collection is disallowed (but finite collections are allowed), the second
condition still follows from countable additivity provided, however, that there is at least one set
having finite measure.
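For instance, for the Lebesgue measure λ on the real line and the disjoint intervals [0, 1) and [2, 2.5), additivity gives
$\lambda\big([0,1) \cup [2,2.5)\big) = \lambda\big([0,1)\big) + \lambda\big([2,2.5)\big) = 1 + 0.5 = 1.5.$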
The pair (X, Σ) is called a measurable space, the members of Σ are called measurable sets,
and the triple (X, Σ, μ) is called a measure space.
If only the second and third conditions are met, and μ takes on at most one of the values ±∞,
then μ is called a signed measure.
A probability measure is a measure with total measure one (i.e., μ(X) = 1); a probability
space is a measure space with a probability measure.
For measure spaces that are also topological spaces various compatibility conditions can be
placed for the measure and the topology. Most measures met in practice in analysis (and in many
cases also in probability theory) are Radon measures. Radon measures have an alternative
definition in terms of linear functionals on the locally convex space of continuous functions with
compact support. This approach is taken by Bourbaki (2004) and a number of other authors. For
more details see Radon measure.
[edit]Properties
Several further properties can be derived from the definition of a countably additive measure.
[edit]Monotonicity
A measure μ is monotonic: if E1 and E2 are measurable sets with E1 ⊆ E2, then
$\mu(E_1) \le \mu(E_2).$
[edit]Measures of infinite unions of measurable sets


A measure μ is countably subadditive: if E1, E2, E3, … is a countable sequence of sets in Σ, not
necessarily disjoint, then
$\mu\Big(\bigcup_{i=1}^{\infty} E_i\Big) \le \sum_{i=1}^{\infty} \mu(E_i).$
A measure μ is continuous from below: if E1, E2, E3, … are measurable sets and En is a subset of
En + 1 for all n, then the union of the sets En is measurable, and
$\mu\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \lim_{i \to \infty} \mu(E_i).$
[edit]Measures of infinite intersections of measurable sets


A measure μ is continuous from above: if E1, E2, E3, … are measurable sets and En + 1 is a subset
of En for all n, then the intersection of the sets En is measurable; furthermore, if at least one of the
En has finite measure, then
$\mu\Big(\bigcap_{i=1}^{\infty} E_i\Big) = \lim_{i \to \infty} \mu(E_i).$
This property is false without the assumption that at least one of the En has finite measure. For
instance, for each n ∈ N, let
$E_n = [n, \infty) \subset \mathbb{R},$
which all have infinite Lebesgue measure, but the intersection is empty.
[edit]Sigma-finite measures
Main article: Sigma-finite measure

A measure space (X, Σ, μ) is called finite if μ(X) is a finite real number (rather than ∞). It is
called σ-finite if X can be decomposed into a countable union of measurable sets of finite
measure. A set in a measure space has σ-finite measure if it is a countable union of sets with
finite measure.
For example, the real numbers with the standard Lebesgue measure are σ-finite but not finite.
Consider the closed intervals [k, k+1] for all integers k; there are countably many such intervals,
each has measure 1, and their union is the entire real line. Alternatively, consider the real
numbers with the counting measure, which assigns to each finite set of reals the number of points
in the set. This measure space is not σ-finite, because every set with finite measure contains only
finitely many points, and it would take uncountably many such sets to cover the entire real line.
The σ-finite measure spaces have some very convenient properties; σ-finiteness can be compared
in this respect to the Lindelöf property of topological spaces. They can also be thought of as a
vague generalization of the idea that a measure space may have 'uncountable measure'.
[edit]Completeness
A measurable set X is called a null set if μ(X)=0. A subset of a null set is called a negligible set.
A negligible set need not be measurable, but every measurable negligible set is automatically a
null set. A measure is called complete if every negligible set is measurable.
A measure can be extended to a complete one by considering the σ-algebra of subsets Y which
differ by a negligible set from a measurable set X, that is, such that the symmetric difference of X
and Y is contained in a null set. One defines μ(Y) to equal μ(X).
[edit]Examples
Some important measures are listed here.
• The counting measure is defined by μ(S) = number of elements in S.
• The Lebesgue measure on R is a complete translation-invariant measure on a
σ-algebra containing the intervals in R such that μ([0,1]) = 1; and every other
measure with these properties extends Lebesgue measure.
• Circular angle measure is invariant under rotation.
• The Haar measure for a locally compact topological group is a generalization
of the Lebesgue measure (and also of counting measure and circular angle
measure) and has similar uniqueness properties.
• The Hausdorff measure is a refinement of the Lebesgue measure to some
fractal sets.
• Every probability space gives rise to a measure which takes the value 1 on
the whole space (and therefore takes all its values in the unit interval [0,1]).
Such a measure is called a probability measure. See probability axioms.
• The Dirac measure δa (cf. Dirac delta function) is given by δa(S) = χS(a),
where χS is the characteristic function of S. The measure of a set is 1 if it
contains the point a and 0 otherwise.
Other 'named' measures used in various theories include: Borel measure, Jordan measure,
ergodic measure, Euler measure, Gaussian measure, Baire measure, Radon measure.
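As a toy illustration of the counting and Dirac measures above, here is a minimal R sketch in which finite sets are represented as vectors; this covers only a tiny special case of the general definitions.

    # Counting measure and Dirac measure on finite sets represented as vectors (R sketch).
    counting_measure <- function(S) length(unique(S))              # mu(S) = number of elements in S
    dirac_measure <- function(a) function(S) as.integer(a %in% S)  # delta_a(S) = 1 if a is in S, else 0
    counting_measure(c(2, 3, 5, 7))   # 4
    delta0 <- dirac_measure(0)
    delta0(c(-1, 0, 1))               # 1
    delta0(c(2, 3))                   # 0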
[edit]Non-measurable sets
Main article: Non-measurable set
If the axiom of choice is assumed to be true, not all subsets of Euclidean space are Lebesgue
measurable; examples of such sets include the Vitali set, and the non-measurable sets postulated
by the Hausdorff paradox and the Banach–Tarski paradox.
[edit]Generalizations
For certain purposes, it is useful to have a "measure" whose values are not restricted to the non-
negative reals or infinity. For instance, a countably additive set function with values in the
(signed) real numbers is called a signed measure, while such a function with values in the
complex numbers is called a complex measure. Measures that take values in Banach spaces have
been studied extensively. A measure that takes values in the set of self-adjoint projections on a
Hilbert space is called a projection-valued measure; these are used mainly in functional analysis
for the spectral theorem. When it is necessary to distinguish the usual measures which take non-
negative values from generalizations, the term positive measure is used. Positive measures are
closed under conical combination but not general linear combination, while signed measures are
the linear closure of positive measures.
Another generalization is the finitely additive measure, sometimes called a content.
This is the same as a measure except that instead of requiring countable additivity we require
only finite additivity. Historically, this definition was used first, but proved to be not so useful. It
turns out that in general, finitely additive measures are connected with notions such as Banach
limits, the dual of L∞ and the Stone–Čech compactification. All these are linked in one way or
another to the axiom of choice.
A charge is a generalization in both directions: it is a finitely additive, signed measure.
The remarkable result in integral geometry known as Hadwiger's theorem states that the space of
translation-invariant, finitely additive, not-necessarily-nonnegative set functions defined on finite
unions of compact convex sets in Rn consists (up to scalar multiples) of one "measure" that is
"homogeneous of degree k" for each k = 0, 1, 2, ..., n, and linear combinations of those
"measures". "Homogeneous of degree k" means that rescaling any set by any factor c> 0
multiplies the set's "measure" by ck. The one that is homogeneous of degree n is the ordinary n-
dimensional volume. The one that is homogeneous of degree n − 1 is the "surface volume". The
one that is homogeneous of degree 1 is a mysterious function called the "mean width", a
misnomer. The one that is homogeneous of degree 0 is the Euler characteristic.


Tail risk
Tail risk is the risk of an asset or portfolio of assets moving more than three standard deviations from
its current price, as measured under an assumed probability distribution.[1] This risk is often
underestimated by standard statistical methods for calculating the probability of changes in the price
of financial assets.
The normal distribution, which is often used to calculate the probability of sudden asset price
changes, is particularly prone to this type of error. However, many if not most types of analysis
are prone to this error to a lesser degree.[2]
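To make the underestimation concrete, the following R sketch compares the probability of a move beyond three standard deviations under a normal distribution and under a heavier-tailed Student's t distribution rescaled to the same variance; the choice of three degrees of freedom is an arbitrary assumption for illustration.

    # Probability of a move beyond 3 standard deviations (R sketch).
    # Assumption: returns standardised to unit variance; the heavy-tailed alternative is a
    # Student's t with 3 degrees of freedom, rescaled to unit variance.
    p_normal <- 2 * pnorm(-3)                          # about 0.27% under the normal
    df <- 3
    p_student <- 2 * pt(-3 * sqrt(df / (df - 2)), df)  # the same 3-sd event under the rescaled t
    c(normal = p_normal, student_t = p_student)        # the t assigns several times more probability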

Liquidity risk
In finance, liquidity risk is the risk that a given security or asset cannot be traded quickly
enough in the market to prevent a loss (or make the required profit).
[edit]Types of Liquidity Risk
Asset liquidity - An asset cannot be sold due to lack of liquidity in the market; essentially a
subset of market risk. This can be accounted for by:
• Widening the bid/offer spread
• Making explicit liquidity reserves
• Lengthening the holding period for VaR calculations
Funding liquidity - The risk that liabilities:
• Cannot be met when they fall due
• Can only be met at an uneconomic price
This risk can be name-specific or systemic.

[edit]Causes of liquidity risk


Liquidity risk arises from situations in which a party interested in trading an asset cannot do it
because nobody in the market wants to trade that asset. Liquidity risk becomes particularly
important to parties who are about to hold or currently hold an asset, since it affects their ability
to trade.
The manifestation of liquidity risk is very different from a price dropping to zero. If an asset's
price drops to zero, the market is saying that the asset is worthless. However, if one party
cannot find another party interested in trading the asset, this may only be a problem of
the market participants finding each other. This is why liquidity risk is usually higher
in emerging markets or low-volume markets.
Liquidity risk is financial risk due to uncertain liquidity. An institution might lose liquidity if its
credit rating falls, it experiences sudden unexpected cash outflows, or some other event causes
counterparties to avoid trading with or lending to the institution. A firm is also exposed to
liquidity risk if markets on which it depends are subject to loss of liquidity.
Liquidity risk tends to compound other risks. If a trading organization has a position in an
illiquid asset, its limited ability to liquidate that position at short notice will compound its market
risk. Suppose a firm has offsetting cash flows with two different counterparties on a given day. If
the counterparty that owes it a payment defaults, the firm will have to raise cash from other
sources to make its payment. Should it be unable to do so, it too will default. Here, liquidity risk
is compounding credit risk.
A position can be hedged against market risk but still entail liquidity risk. This is true in the
above credit risk example—the two payments are offsetting, so they entail credit risk but not
market risk. Another example is the 1993 Metallgesellschaft debacle. Futures contracts were
used to hedge an over-the-counter (OTC) obligation. It is debatable whether the hedge
was effective from a market risk standpoint, but it was the liquidity crisis caused by staggering
margin calls on the futures that forced Metallgesellschaft to unwind the positions.
Accordingly, liquidity risk has to be managed in addition to market, credit and other risks.
Because of its tendency to compound other risks, it is difficult or impossible to isolate liquidity
risk. In all but the most simple of circumstances, comprehensive metrics of liquidity risk do not
exist. Certain techniques of asset-liability management can be applied to assessing liquidity risk.
A simple test for liquidity risk is to look at future net cash flows on a day-by-day basis. Any day
that has a sizeable negative net cash flow is of concern. Such an analysis can be supplemented
with stress testing. Look at net cash flows on a day-to-day basis assuming that an important
counterparty defaults.
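A minimal sketch of such a day-by-day test is shown below; the counterparty names and cash flow amounts are entirely hypothetical.

# Hypothetical sketch: day-by-day net cash flow test with a simple stress scenario.
from collections import defaultdict

# (day, counterparty, amount) -- positive = inflow, negative = outflow; figures are made up.
cash_flows = [
    (1, "CptyA", +50.0), (1, "Bank", -40.0),
    (2, "CptyB", +30.0), (2, "Bank", -45.0),
    (3, "CptyA", +20.0), (3, "Bank", -15.0),
]

def daily_net(flows, exclude=None):
    """Net cash flow per day, optionally ignoring inflows from a defaulted counterparty."""
    net = defaultdict(float)
    for day, cpty, amount in flows:
        if exclude and cpty == exclude and amount > 0:
            continue  # a defaulted counterparty does not pay us
        net[day] += amount
    return dict(sorted(net.items()))

print("Base case:     ", daily_net(cash_flows))
print("CptyA defaults:", daily_net(cash_flows, exclude="CptyA"))
# Any day with a sizeable negative net flow flags a potential funding shortfall.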
Analyses such as these cannot easily take into account contingent cash flows, such as cash flows
from derivatives or mortgage-backed securities. If an organization's cash flows are largely
contingent, liquidity risk may be assessed using some form of scenario analysis. A general
approach using scenario analysis might entail the following high-level steps:
• Construct multiple scenarios for market movements and defaults over a
given period of time
• Assess day-to-day cash flows under each scenario.
Because balance sheets differ so significantly from one organization to the next, there is little
standardization in how such analyses are implemented.
Regulators are primarily concerned about the systemic implications of liquidity risk.
[edit]Measures of liquidity risk
[edit]Liquidity gap
Culp defines the liquidity gap as the net liquid assets of a firm: the excess value of the firm's
liquid assets over its volatile liabilities. A company with a negative liquidity gap should focus on
its cash balances and possible unexpected changes in their values.
As a static measure of liquidity risk it gives no indication of how the gap would change with an
increase in the firm's marginal funding cost.
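In the simplest reading of the definition (with made-up figures):

# Hypothetical figures: liquidity gap = liquid assets - volatile liabilities.
liquid_assets = 120.0         # e.g. cash, T-bills, committed credit lines
volatile_liabilities = 150.0  # e.g. short-term wholesale funding that may not roll over
liquidity_gap = liquid_assets - volatile_liabilities
print(f"Liquidity gap: {liquidity_gap:+.1f}")  # a negative gap -> watch cash balances closely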
[edit]Liquidity risk elasticity
Culp defines the liquidity risk elasticity (LRE) as the change in net assets over funded liabilities
that occurs when the liquidity premium on the bank's marginal funding cost rises by a small
amount. For banks this would be measured as a spread over LIBOR; for nonfinancials the LRE
would be measured as a spread over commercial paper rates.
Problems with the use of liquidity risk elasticity are that it assumes parallel changes in funding
spread across all maturities and that it is only accurate for small changes in funding spreads.
[edit]Measures of Asset Liquidity
[edit]Bid-offer spread
The bid-offer spread is used by market participants as an asset liquidity measure. To compare
different products the ratio of the spread to the product's mid price can be used. The smaller the
ratio the more liquid the asset is.
This spread is composed of operational costs, administrative and processing costs as well as the
compensation required for the possibility of trading with a more informed trader.
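For example, a small sketch of the spread-to-mid comparison (the quotes are hypothetical):

# Hypothetical quotes: compare asset liquidity via the spread-to-mid ratio.
def relative_spread(bid, ask):
    mid = (bid + ask) / 2.0
    return (ask - bid) / mid

print(relative_spread(99.95, 100.05))   # ~0.0010 -> more liquid
print(relative_spread(98.50, 101.50))   # ~0.0300 -> less liquid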
[edit]Market depth
Hachmeister refers to market depth as the amount of an asset that can be bought and sold at
various bid-ask spreads. Slippage is related to the concept of market depth. Knight and Satchell
mention that a flow trader needs to consider the effect of executing a large order on the market and to
adjust the bid-ask spread accordingly. They calculate the liquidity cost as the difference between the
execution price and the initial execution price.
[edit]Immediacy
Immediacy refers to the time needed to successfully trade a certain amount of an asset at a
prescribed cost.
[edit]Resilience
Hachmeister identifies the fourth dimension of liquidity as the speed with which prices return to
former levels after a large transaction. Unlike the other measures resilience can only be
determined over a period of time.
[edit]Managing Liquidity Risk
[edit]Liquidity-adjusted value at risk
Liquidity-adjusted VaR incorporates exogenous liquidity risk into Value at Risk. It can be
defined as VaR + ELC (Exogenous Liquidity Cost). The ELC is the worst expected half-spread
at a particular confidence level.[1]
Another adjustment is to consider VaR over the period of time needed to liquidate the portfolio.
VaR can be calculated over this time period. The BIS mentions "... a number of institutions are
exploring the use of liquidity adjusted-VAR, in which the holding periods in the risk assessment
are adjusted by the length of time required to unwind positions." [2]
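A hedged sketch of both adjustments is given below; it assumes normally distributed returns, a single position, and made-up inputs (spread, volatility, liquidation horizon), so it is an illustration of the idea rather than a prescribed methodology.

# Hypothetical sketch of a liquidity-adjusted VaR for a single position.
# Assumes normal returns and square-root-of-time scaling; all inputs are illustrative.
from scipy.stats import norm

position_value = 1_000_000.0
daily_vol = 0.02                 # daily return volatility (assumed)
confidence = 0.99
z = norm.ppf(confidence)

var_1d = z * daily_vol * position_value        # plain 1-day parametric VaR

# Exogenous liquidity cost: worst expected half-spread at the confidence level.
worst_half_spread = 0.005                      # 0.5% of mid, assumed
elc = worst_half_spread * position_value
lvar = var_1d + elc                            # VaR + ELC

# Alternative: scale VaR to the time needed to unwind the position.
days_to_liquidate = 5
var_liquidation_horizon = var_1d * days_to_liquidate ** 0.5

print(f"1-day VaR:           {var_1d:,.0f}")
print(f"Liquidity-adjusted:  {lvar:,.0f}")
print(f"{days_to_liquidate}-day horizon VaR:   {var_liquidation_horizon:,.0f}")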
[edit]Liquidity at risk
Greenspan (1999) discusses management of foreign exchange reserves. The Liquidity at risk
measure is suggested. A country's liquidity position under a range of possible outcomes for
relevant financial variables (exchange rates, commodity prices, credit spreads, etc.) is
considered. It might be possible to express a standard in terms of the probabilities of different
outcomes. For example, an acceptable debt structure could have an average maturity—averaged
over estimated distributions for relevant financial variables—in excess of a certain limit. In
addition, countries could be expected to hold sufficient liquid reserves to ensure that they could
avoid new borrowing for one year with a certain ex ante probability, such as 95 percent of the
time.[3]
[edit]Scenario analysis-based contingency plans
The FDIC discuss liquidity risk management and write "Contingency funding plans should
incorporate events that could rapidly affect an institution’s liquidity, including a sudden inability
to securitize assets, tightening of collateral requirements or other restrictive terms associated
with secured borrowings, or the loss of a large depositor or counterparty.".[4] Greenspan's
liquidity at risk concept is an example of scenario based liquidity risk management.
[edit]Diversification of liquidity providers
If several liquidity providers are on call then if any of those providers increases its costs of
supplying liquidity, the impact of this is reduced. The American Academy of Actuaries wrote
"While a company is in good financial shape, it may wish to establish durable, ever-green (i.e.,
always available) liquidity lines of credit. The credit issuer should have an appropriately high
credit rating to increase the chances that the resources will be there when needed." [5]
[edit]Derivatives
Bhaduri, Meissner and Youn discuss five derivatives created specifically for hedging liquidity
risk:
• Withdrawal option: A put of the illiquid underlying at the market price.
• Bermudan-style return put option: Right to put the option at a specified
strike.
• Return swap: Swap the underlying's return for LIBOR paid periodically.
• Return swaption: Option to enter into the return swap.
• Liquidity option: "Knock-in" barrier option, where the barrier is a liquidity
metric.

[edit]Case Studies
[edit]Amaranth Advisors LLC - 2006
Amaranth Advisors lost roughly $6bn in the natural gas futures market back in September 2006.
Amaranth had a concentrated, undiversified position in its natural gas strategy. The trader had
used leverage to build a very large position. Amaranth’s positions were staggeringly large,
representing around 10% of the global market in natural gas futures.[6] Chincarini notes that firms
need to manage liquidity risk explicitly. The inability to sell a futures contract at or near the
latest quoted price is related to one’s concentration in the security. In Amaranth’s case, the
concentration was far too high and there were no natural counterparties when they needed to
unwind the positions.[7] Chincarini (2006) argues that part of the loss Amaranth incurred was due
to asset illiquidity. Regression analysis of the 3-week return on natural gas futures contracts from
August 31, 2006 to September 21, 2006 against the excess open interest suggested that contracts
whose open interest on August 31, 2006 was much higher than the historical normalized value
experienced larger negative returns.[8]
[edit]Northern Rock - 2007
Main article: Nationalisation of Northern Rock

Northern Rock suffered from funding liquidity risk back in September 2007 following the
subprime crisis. The firm suffered from liquidity issues despite being solvent at the time, because
maturing loans and deposits could not be renewed in the short-term money markets [9]. In
response, the FSA now places greater supervisory focus on liquidity risk especially with regard
to "high-impact retail firms".[10]
[edit]LTCM - 1998
Long-Term Capital Management (LTCM) was bailed out by a consortium of 14 banks in 1998
after being caught in a cash-flow crisis when economic shocks resulted in excessive mark-to-
market losses and margin calls. The fund suffered from a combination of funding and asset
liquidity problems. The asset liquidity problem arose from LTCM's failure to account for liquidity
becoming more valuable (as it did following the crisis). Since much of its balance sheet was exposed
to the liquidity risk premium, its short positions would increase in price relative to its long positions. This was
essentially a massive, unhedged exposure to a single risk factor.[11] LTCM had been aware of
funding liquidity risk. Indeed, they estimated that in times of severe stress, haircuts on AAA-
rated commercial mortgages would increase from 2% to 10%, and similarly for other securities.
In response to this, LTCM had negotiated long-term financing with margins fixed for several
weeks on many of their collateralized loans. Due to an escalating liquidity spiral, LTCM could
ultimately not fund its positions in spite of its numerous measures to control funding risk.[12]

Pull to par
Pull to Par is the effect in which the price of a bond converges to par value as time passes. At
maturity the price of a debt instrument in good standing should equal its par (or face value).
Another name for this effect is reduction of maturity.
It results from the difference between market interest rate and the nominal yield on the bond.
The Pull to Par effect is one of two factors that influence the market value of the bond and its
volatility (the second one is the level of market interest rates).
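The effect can be illustrated with a small sketch that reprices a hypothetical discount bond (5% annual coupon, 7% market yield, annual compounding; all figures are made up) as its remaining maturity shrinks:

# Hypothetical illustration of pull to par: price of a 5% annual-coupon bond
# (face value 100) when the market yield is 7%, as maturity approaches.
def bond_price(face, coupon_rate, market_rate, years_to_maturity):
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + market_rate) ** t
                     for t in range(1, years_to_maturity + 1))
    pv_face = face / (1 + market_rate) ** years_to_maturity
    return pv_coupons + pv_face

for years in (10, 5, 2, 1, 0):
    print(f"{years:2d} years to maturity: {bond_price(100, 0.05, 0.07, years):6.2f}")
# The price rises toward 100 as maturity nears (it would fall toward 100 for a premium bond).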

Yield curve
From Wikipedia, the free encyclopedia
Jump to: navigation, search

The US dollar yield curve as of 9 February 2005. The curve has a typical upward
sloping shape.

This article is about yield curves as used in finance. For the term's use in physics,
see Yield curve (physics).

In finance, the yield curve is the relation between the interest rate (or cost of borrowing) and the
time to maturity of the debt for a given borrower in a given currency. For example, the U.S.
dollar interest rates paid on U.S. Treasury securities for various maturities are closely watched by
many traders, and are commonly plotted on a graph such as the one on the right which is
informally called "the yield curve." More formal mathematical descriptions of this relation are
often called the term structure of interest rates.
The yield of a debt instrument is the overall rate of return available on the investment. For
instance, a bank account that pays an interest rate of 4% per year has a 4% yield. In general the
percentage per year that can be earned is dependent on the length of time that the money is
invested. For example, a bank may offer a "savings rate" higher than the normal checking
account rate if the customer is prepared to leave money untouched for five years. Investing for a
period of time t gives a yield Y(t).
This function Y is called the yield curve, and it is often, but not always, an increasing function of
t. Yield curves are used by fixed income analysts, who analyze bonds and related securities, to
understand conditions in financial markets and to seek trading opportunities. Economists use the
curves to understand economic conditions.
The yield curve function Y is actually only known with certainty for a few specific maturity
dates, while the other maturities are calculated by interpolation (see Construction of the full yield
curve from market data below).
[edit]The typical shape of the yield curve

The British pound yield curve as of 9 February 2005. This curve is unusual in that
long-term rates are lower than short-term ones.

Yield curves are usually upward sloping asymptotically: the longer the maturity, the higher the
yield, with diminishing marginal increases (that is, as one moves to the right, the curve flattens
out). There are two common explanations for upward sloping yield curves. First, it may be that
the market is anticipating a rise in the risk-free rate. If investors hold off investing now, they may
receive a better rate in the future. Therefore, under the arbitrage pricing theory, investors who are
willing to lock their money in now need to be compensated for the anticipated rise in rates—thus
the higher interest rate on long-term investments.
However, interest rates can fall just as they can rise. Another explanation is that longer maturities
entail greater risks for the investor (i.e. the lender). A risk premium is needed by the market,
since at longer durations there is more uncertainty and a greater chance of catastrophic events
that impact the investment. This explanation depends on the notion that the economy faces more
uncertainties in the distant future than in the near term. This effect is referred to as the liquidity
spread. If the market expects more volatility in the future, even if interest rates are anticipated to
decline, the increase in the risk premium can influence the spread and cause an increasing yield.
The opposite position (short-term interest rates higher than long-term) can also occur. For
instance, in November 2004, the yield curve for UK Government bonds was partially inverted.
The yield for the 10 year bond stood at 4.68%, but was only 4.45% for the 30 year bond. The
market's anticipation of falling interest rates causes such incidents. Negative liquidity premiums
can exist if long-term investors dominate the market, but the prevailing view is that a positive
liquidity premium dominates, so only the anticipation of falling interest rates will cause an
inverted yield curve. Strongly inverted yield curves have historically preceded economic
depressions.
The shape of the yield curve is influenced by supply and demand: for instance if there is a large
demand for long bonds, for instance from pension funds to match their fixed liabilities to
pensioners, and not enough bonds in existence to meet this demand, then the yields on long
bonds can be expected to be low, irrespective of market participants' views about future events.
The yield curve may also be flat or hump-shaped, due to anticipated interest rates being steady,
or short-term volatility outweighing long-term volatility.
Yield curves move continually while the markets are open, reflecting the market's
reaction to news. A further "stylized fact" is that yield curves tend to move in parallel (i.e., the
yield curve shifts up and down as interest rate levels rise and fall).
[edit]Types of yield curve
There is no single yield curve describing the cost of money for everybody. The most important
factor in determining a yield curve is the currency in which the securities are denominated. The
economic position of the countries and companies using each currency is a primary factor in
determining the yield curve. Different institutions borrow money at different rates, depending on
their creditworthiness. The yield curves corresponding to the bonds issued by governments in
their own currency are called the government bond yield curve (government curve). Banks with
high credit ratings (Aa/AA or above) borrow money from each other at the LIBOR rates. These
yield curves are typically a little higher than government curves. They are the most important
and widely used in the financial markets, and are known variously as the LIBOR curve or the
swap curve. The construction of the swap curve is described below.
Besides the government curve and the LIBOR curve, there are corporate (company) curves.
These are constructed from the yields of bonds issued by corporations. Since corporations have
less creditworthiness than most governments and most large banks, these yields are typically
higher. Corporate yield curves are often quoted in terms of a "credit spread" over the relevant
swap curve. For instance the five-year yield curve point for Vodafone might be quoted as LIBOR
+0.25%, where 0.25% (often written as 25 basis points or 25bps) is the credit spread.
[edit]Normal yield curve
From the post-Great Depression era to the present, the yield curve has usually been "normal"
meaning that yields rise as maturity lengthens (i.e., the slope of the yield curve is positive). This
positive slope reflects investor expectations for the economy to grow in the future and,
importantly, for this growth to be associated with a greater expectation that inflation will rise in
the future rather than fall. This expectation of higher inflation leads to expectations that the
central bank will tighten monetary policy by raising short term interest rates in the future to slow
economic growth and dampen inflationary pressure. It also creates a need for a risk premium
associated with the uncertainty about the future rate of inflation and the risk this poses to the
future value of cash flows. Investors price these risks into the yield curve by demanding higher
yields for maturities further into the future.
However, a positively sloped yield curve has not always been the norm. Through much of the
19th century and early 20th century the US economy experienced trend growth with persistent
deflation, not inflation. During this period the yield curve was typically inverted, reflecting the
fact that deflation made current cash flows less valuable than future cash flows. During this
period of persistent deflation, a 'normal' yield curve was negatively sloped.
[edit]Steep yield curve
Historically, the 20-year Treasury bond yield has averaged approximately two percentage points
above that of three-month Treasury bills. In situations when this gap increases (e.g. 20-year
Treasury yield rises higher than the three-month Treasury yield), the economy is expected to
improve quickly in the future. This type of curve can be seen at the beginning of an economic
expansion (or after the end of a recession). Here, economic stagnation will have depressed short-
term interest rates; however, rates begin to rise once the demand for capital is re-established by
growing economic activity.
In January 2010, the gap between yields on two-year Treasury notes and 10-year notes widened
to 2.90 percentage points, its highest ever.
[edit]Flat or humped yield curve
A flat yield curve is observed when all maturities have similar yields, whereas a humped curve
results when short-term and long-term yields are equal and medium-term yields are higher than
those of the short-term and long-term. A flat curve sends signals of uncertainty in the economy.
This mixed signal can revert to a normal curve or could later result in an inverted curve. It
cannot be explained by the Segmented Market theory discussed below.
[edit]Inverted yield curve
An inverted yield curve occurs when long-term yields fall below short-term yields. Under
unusual circumstances, long-term investors will settle for lower yields now if they think the
economy will slow or even decline in the future. An inverted curve has indicated a worsening
economic situation in the future 6 out of 7 times since 1970.[citation needed] The New York Federal
Reserve regards it as a valuable forecasting tool in predicting recessions two to six quarters
ahead. In addition to potentially signaling an economic decline, inverted yield curves also imply
that the market believes inflation will remain low. This is because, even if there is a recession, a
low bond yield will still be offset by low inflation. However, technical factors, such as a flight to
quality or global economic or currency situations, may cause an increase in demand for bonds on
the long end of the yield curve, causing long-term rates to fall. This was seen in 1998 during the
Long Term Capital Management failure when there was a slight inversion on part of the curve.
[edit]Theory
There are four main economic theories attempting to explain how yields vary with maturity. Two
of the theories are extreme positions, while the others attempt to find a middle ground between
them.
[edit]Market expectations (pure expectations) hypothesis

This hypothesis assumes that the various maturities are perfect substitutes and suggests that the
shape of the yield curve depends on market participants' expectations of future interest rates.
These expected rates, along with an assumption that arbitrage opportunities will be minimal, are
enough information to construct a complete yield curve. For example, if investors have an
expectation of what 1-year interest rates will be next year, the 2-year interest rate can be
calculated as the compounding of this year's interest rate by next year's interest rate. More
generally, rates on a long-term instrument are equal to the geometric mean of the yield on a
series of short-term instruments. This theory perfectly explains the observation that yields
usually move together. However, it fails to explain the persistence in the shape of the yield
curve.
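As a worked example with hypothetical rates:

# Hypothetical rates: under the pure expectations hypothesis the 2-year rate is the
# geometric mean of this year's 1-year rate and the expected 1-year rate next year.
r_1y_now = 0.03            # current 1-year rate (assumed)
r_1y_next_expected = 0.05  # expected 1-year rate one year from now (assumed)

two_year_rate = ((1 + r_1y_now) * (1 + r_1y_next_expected)) ** 0.5 - 1
print(f"Implied 2-year rate: {two_year_rate:.4%}")   # ~3.995%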
A shortcoming of the expectations theory is that it neglects the risks inherent in investing in bonds
(because forward rates are not perfect predictors of future rates), namely interest rate risk and
reinvestment rate risk.
[edit]Liquidity preference theory
The Liquidity Preference Theory, also known as the Liquidity Premium Theory, is an offshoot of
the Pure Expectations Theory. The Liquidity Preference Theory asserts that long-term interest
rates not only reflect investors’ assumptions about future interest rates but also include a
premium for holding long-term bonds (investors prefer short term bonds to long term bonds),
called the term premium or the liquidity premium. This premium compensates investors for the
added risk of having their money tied up for a longer period, including the greater price
uncertainty. Because of the term premium, long-term bond yields tend to be higher than short-
term yields, and the yield curve slopes upward. Long term yields are also higher not just because
of the liquidity premium, but also because of the risk premium added by the risk of default from
holding a security over the long term. The market expectations hypothesis is combined with the
liquidity preference theory: the yield on an n-year bond is the yield implied by expected future
short-term rates (their geometric mean) plus a risk premium rpn associated with holding the n-year bond.


[edit]Market segmentation theory
This theory is also called the segmented market hypothesis. In this theory, financial
instruments of different terms are not substitutable. As a result, the supply and demand in the
markets for short-term and long-term instruments is determined largely independently.
Prospective investors decide in advance whether they need short-term or long-term instruments.
If investors prefer their portfolio to be liquid, they will prefer short-term instruments to long-
term instruments. Therefore, the market for short-term instruments will receive a higher demand.
Higher demand for the instrument implies higher prices and lower yield. This explains the
stylized fact that short-term yields are usually lower than long-term yields. This theory explains
the predominance of the normal yield curve shape. However, because the supply and demand of
the two markets are independent, this theory fails to explain the observed fact that yields tend to
move together (i.e., upward and downward shifts in the curve).
In a 2000 empirical study, Alexandra E. MacKay, Eliezer Z. Prisman, and Yisong S. Tian found
segmentation in the market for Canadian government bonds and attributed it to differential
taxation.
For a brief period in the last week of 2005, and again in early 2006, the US Dollar yield curve
inverted, with short-term yields actually exceeding long-term yields. Market segmentation theory
would attribute this to an investor preference for longer term securities, particularly from pension
funds and foreign investors who prefer guaranteed longer term yields.
[edit]Preferred habitat theory
The Preferred Habitat Theory is another guise of the Market Segmentation theory, and states that
in addition to interest rate expectations, investors have distinct investment horizons and require a
meaningful premium to buy bonds with maturities outside their "preferred" maturity, or habitat.
Proponents of this theory believe that short-term investors are more prevalent in the fixed-
income market, and therefore longer-term rates tend to be higher than short-term rates, for the
most part, but short-term rates can be higher than long-term rates occasionally. This theory is
consistent with both the persistence of the normal yield curve shape and the tendency of the yield
curve to shift up and down while retaining its shape.
[edit]Historical development of yield curve theory
On 15 August 1971, U.S. President Richard Nixon announced that the U.S. dollar would no
longer be based on the gold standard, thereby ending the Bretton Woods system and initiating the
era of floating exchange rates.
Floating exchange rates made life more complicated for bond traders, including importantly
those at Salomon Brothers in New York. By the middle of the 1970s, encouraged by the head of
bond research at Salomon, Marty Liebowitz, traders began thinking about bond yields in new
ways. Rather than think of each maturity (a ten year bond, a five year, etc.) as a separate
marketplace, they began drawing a curve through all their yields. The bit nearest the present time
became known as the short end—yields of bonds further out became, naturally, the long end.
Academics had to play catch up with practitioners in this matter. One important theoretic
development came from a Czech mathematician, Oldrich Vasicek, who argued in a 1977 paper
that bond prices all along the curve are driven by the short end (under risk neutral equivalent
martingale measure) and accordingly by short-term interest rates. The mathematical model for
Vasicek's work was given by an Ornstein-Uhlenbeck process, but has since been discredited
because the model predicts a positive probability that the short rate becomes negative and is
inflexible in creating yield curves of different shapes. Vasicek's model has been superseded by
many different models including the Hull-White model (which allows for time varying
parameters in the Ornstein-Uhlenbeck process), the Cox-Ingersoll-Ross model, which is a
modified Bessel process, and the Heath-Jarrow-Morton framework. There are also many
modifications to each of these models, but see the article on short rate model. Another modern
approach is the LIBOR Market Model, introduced by Brace, Gatarek and Musiela in 1997 and
advanced by others later. In 1996 a group of derivatives traders led by Olivier Doria (then head
of swaps at Deutsche Bank) and Michele Faissola, contributed to an extension of the swap yield
curves in all the major European currencies. Until then the market would give prices until 15
years maturities. The team extended the maturity of European yield curves up to 50 years (for the
lira, French franc, Deutsche mark, Danish krone and many other currencies including the ecu).
This innovation was a major contribution towards the issuance of long dated zero coupon bonds
and the creation of long dated mortgages.
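To make the short-rate discussion above concrete, the following is a minimal sketch of an Euler discretisation of the Ornstein-Uhlenbeck dynamics dr = a(b - r)dt + σ dW used in Vasicek's model; the parameters are purely illustrative, and the simulation makes the model's noted shortcoming visible, since the simulated rate can dip below zero.

# Hypothetical sketch: Euler simulation of the Vasicek short-rate model
# dr = a*(b - r)*dt + sigma*dW (an Ornstein-Uhlenbeck process). Parameters are illustrative.
import random

def simulate_vasicek(r0=0.05, a=0.5, b=0.04, sigma=0.02, dt=1/252, steps=252, seed=42):
    random.seed(seed)
    rates = [r0]
    for _ in range(steps):
        r = rates[-1]
        dr = a * (b - r) * dt + sigma * (dt ** 0.5) * random.gauss(0.0, 1.0)
        rates.append(r + dr)
    return rates

path = simulate_vasicek()
print(f"Short rate after one year: {path[-1]:.4%}")
print(f"Lowest simulated rate:     {min(path):.4%}")  # can be negative, as noted above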
[edit]Construction of the full yield curve from market data
The usual representation of the yield curve is a function P, defined on all future times t, such that
P(t) represents the value today of receiving one unit of currency t years in the future. If P is defined
for all future t then we can easily recover the yield (i.e. the annualized interest rate) for borrowing
money for that period of time; for example, under continuous compounding, Y(t) = -ln(P(t)) / t.
The significant difficulty in defining a yield curve therefore is to determine the function P(t). P is
called the discount factor function.
Yield curves are built from either prices available in the bond market or the money market. Whilst
the yield curves built from the bond market use prices only from a specific class of bonds (for
instance bonds issued by the UK government), yield curves built from the money market use prices
of "cash" from today's LIBOR rates, which determine the "short end" of the curve (i.e. for t ≤ 3m),
futures, which determine the mid-section of the curve (3m ≤ t ≤ 15m), and interest rate swaps, which
determine the "long end" (1y ≤ t ≤ 60y).

Typical inputs to the money market curve (a list of standard instruments used to build a money market yield curve):

Type     Settlement date      Rate (%)
Cash     Overnight rate       5.58675
Cash     Tomorrow next rate   5.59375
Cash     1m                   5.625
Cash     3m                   5.71875
Future   Dec-97               5.76
Future   Mar-98               5.77
Future   Jun-98               5.82
Future   Sep-98               5.88
Future   Dec-98               6.00
Swap     2y                   6.01253
Swap     3y                   6.10823
Swap     4y                   6.16
Swap     5y                   6.22
Swap     7y                   6.32
Swap     10y                  6.42
Swap     15y                  6.56
Swap     20y                  6.56
Swap     30y                  6.56
In either case the available market data provides a matrix A of cash flows, each row representing
a particular financial instrument and each column representing a point in time. The (i,j)-th
element of the matrix represents the amount that instrument i will pay out on day j. Let the vector
F represent today's prices of the instrument (so that the i-th instrument has value F(i)), then by
definition of our discount factor function P we should have that F = A*P (this is a matrix
multiplication). In practice, noise in the financial markets means it is not possible to find a P that
solves this equation exactly, and our goal becomes to find a vector P such that
A*P=F+ε
where ε is as small a vector as possible (where the size of a vector might be measured by taking
its norm, for example).
Note that even if we can solve this equation, we will only have determined P(t) for those t which
have a cash flow from one or more of the original instruments we are creating the curve from.
Values for other t are typically determined using some sort of interpolation scheme.
Practitioners and researchers have suggested many ways of solving the A*P = F equation. It
transpires that the most natural method - that of minimizing ε by least squares regression - leads
to unsatisfactory results. The large number of zeroes in the matrix A means that the function P turns
out to be "bumpy".
In their comprehensive book on interest rate modelling James and Webber note that the
following techniques have been suggested to solve the problem of finding P:
1. Approximation using Lagrange polynomials
2. Fitting using parameterised curves (such as splines, the Nelson-Siegel family,
the Svensson family or the Cairns restricted-exponential family of curves).
Van Deventer, Imai and Mesler summarize three different techniques for
curve fitting that satisfy the maximum smoothness of either forward interest
rates, zero coupon bond prices, or zero coupon bond yields
3. Local regression using kernels
4. Linear programming
In the money market practitioners might use different techniques to solve for different areas of
the curve. For example at the short end of the curve, where there are few cashflows, the first few
elements of P may be found by bootstrapping from one to the next. At the long end, a regression
technique with a cost function that values smoothness might be used.
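A simplified bootstrapping sketch for the short and long ends is shown below; it assumes annual swap payments, simple compounding for the deposit, and illustrative rates, so real curve construction would also involve proper day-count and settlement conventions.

# Hypothetical sketch of bootstrapping discount factors P(t), working from the short
# end outwards. Conventions are simplified (annual periods, simple deposit compounding).
deposit_rates = {1.0: 0.0570}                  # 1y cash rate (illustrative)
par_swap_rates = {2.0: 0.0601, 3.0: 0.0611}    # par rates of annual-pay swaps (illustrative)

P = {0.0: 1.0}
# Short end: a deposit paying (1 + r*t) at t implies P(t) = 1 / (1 + r*t).
for t, r in deposit_rates.items():
    P[t] = 1.0 / (1.0 + r * t)

# Long end: for a par swap of maturity n, 1 = s * sum(P(1..n)) + P(n),
# so each new P(n) can be solved from the discount factors already found.
for n, s in sorted(par_swap_rates.items()):
    annuity = sum(P[t] for t in sorted(P) if 0.0 < t < n)
    P[n] = (1.0 - s * annuity) / (1.0 + s)

print({t: round(df, 5) for t, df in sorted(P.items())})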

Risk-neutral measure
In mathematical finance, a risk-neutral measure, equivalent martingale measure, or Q-
measure is a probability measure that results when one assumes that the current value of all
financial assets is equal to the expected value of the future payoff of the asset discounted at the
risk-free rate. The concept is used in the pricing of derivatives.
[edit]Idea
In an actual economy, prices of assets depend crucially on their risk. Investors typically demand
payment for bearing uncertainty. Therefore, today's price of a claim on a risky amount realised
tomorrow will generally differ from its expected value. Most commonly,[1] investors are risk-
averse and today's price is below the expectation, remunerating those who bear the risk.
To price assets, consequently, the calculated expected values need to be adjusted for the risk
involved (see also Sharpe ratio).
It turns out, under certain weak conditions (absence of arbitrage) there is an alternative way to do
this calculation: Instead of first taking the expectation and then adjusting for risk, one can first
adjust the probabilities of future outcomes such that they incorporate the effects of risk, and then
take the expectation under those different probabilities. Those adjusted, 'virtual' probabilities are
called risk-neutral probabilities; together they constitute the risk-neutral measure.
It is important to note that the real-world probabilities of asset outcomes are not being changed;
the constructed probabilities are counterfactual. They are computed only because the second way
of pricing, called risk-neutral pricing, is often much simpler to carry out than the first.
The main benefit stems from the fact that once the risk-neutral probabilities are found, every
asset can be priced by simply taking its expected payoff (i.e. calculating as if investors were risk
neutral). If we used the real-world, physical probabilities, every security would require a
different adjustment (as they differ in riskiness).
Note that under the risk-neutral measure all assets have the same expected rate of return, the risk-
free rate (or short rate). This does not imply an assumption that investors are risk neutral. On
the contrary, the point is to price given exactly the risk aversion we observe in the physical
world. Towards that aim, we hypothesize about parallel universes where everybody is risk
neutral. The risk-neutral measure is the probability measure of that parallel universe where all
claims have exactly the prices they have in our real world.
Mathematically, adjusting the probabilities is a measure transformation to an equivalent
martingale measure; it is possible if there are no arbitrage opportunities. If the markets are
complete, the risk-neutral measure is unique.
Often, the physical measure is called P, and the risk-neutral one Q. The term physical measure
is sometimes loosely used to denote the Lebesgue measure or, occasionally, the measure induced
by the corresponding normal density with respect to the Lebesgue measure.
[edit]Usage
Risk-neutral measures make it easy to express the value of a derivative in a formula. Suppose at
a future time T a derivative (e.g., a call option on a stock) pays HT units, where HT is a random
variable on the probability space describing the market. Further suppose that the discount factor
from now (time zero) until time T is P(0,T). Then today's fair value of the derivative is
H_0 = P(0,T) E_Q[ H_T ],
where the risk-neutral measure is denoted by Q. This can be re-stated in terms of the physical
measure P as
H_0 = P(0,T) E_P[ (dQ/dP) H_T ],
where dQ/dP is the Radon–Nikodym derivative of Q with respect to P.
Another name for the risk-neutral measure is the equivalent martingale measure. If in a financial
market there is just one risk-neutral measure, then there is a unique arbitrage-free price for each
asset in the market. This is the fundamental theorem of arbitrage-free pricing. If there are
more such measures, then in an interval of prices no arbitrage is possible. If no equivalent
martingale measure exists, arbitrage opportunities do.
[edit]Example 1 — Binomial model of stock prices
Given a probability space (Ω, F, P), consider a single-period binomial model. A probability
measure Q is called risk neutral if, for every traded asset, today's price equals the expected value
under Q of the next-period price discounted at the risk-free rate. Suppose we have a two-state
economy: the initial stock price S can go either up to Su or down to Sd. If the interest rate is
R > 0, and Sd ≤ (1 + R)S ≤ Su, then the risk-neutral probability π of an upward stock movement
is given by the number
π = ((1 + R)S - Sd) / (Su - Sd).
Given a derivative with payoff Xu when the stock price moves up and Xd when it goes down, we
can price the derivative via
X = (π Xu + (1 - π) Xd) / (1 + R).
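A numerical sketch of this one-period example, with made-up prices and a call-style payoff, is:

# Hypothetical numbers for the one-period binomial example above.
S, S_u, S_d = 100.0, 120.0, 90.0   # today's price and the two possible prices next period
R = 0.05                           # one-period risk-free rate
X_u, X_d = 15.0, 0.0               # payoff of a call struck at 105 in each state

pi = ((1 + R) * S - S_d) / (S_u - S_d)        # risk-neutral probability of an up-move
price = (pi * X_u + (1 - pi) * X_d) / (1 + R)
print(f"Risk-neutral up probability: {pi:.3f}")    # 0.5
print(f"Derivative price today:      {price:.3f}") # ~7.14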

[edit]Example 2 — Brownian motion model of stock prices


Suppose our economy consists of 2 assets, a stock and a risk-free bond, and that we use the
Black-Scholes model. In the model the evolution of the stock price can be described by
Geometric Brownian Motion:
dS_t = μ S_t dt + σ S_t dW_t,
where W_t is a standard Brownian motion with respect to the physical measure. If we define a
new process
W*_t = W_t + ((μ - r)/σ) t,
Girsanov's theorem states that there exists a measure Q under which W*_t is a Brownian motion.
The quantity (μ - r)/σ is known as the market price of risk. Differentiating and rearranging yields:
dW_t = dW*_t - ((μ - r)/σ) dt.
Put this back in the original equation:
dS_t = r S_t dt + σ S_t dW*_t.
Q is the unique risk-neutral measure for the model. The discounted payoff process of a
derivative on the stock is a martingale under Q. Since the discounted processes of S and H are Q-
martingales, we can invoke the martingale representation theorem to find a replicating strategy - a
holding of stocks and bonds that pays off H_t at all times t ≤ T.
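A minimal Monte Carlo sketch of this idea, pricing a European call by simulating the terminal stock price with drift r under Q (all parameters are illustrative), is:

# Hypothetical sketch: Monte Carlo pricing of a European call under the risk-neutral
# measure, where the stock drifts at the risk-free rate r rather than mu.
import math, random

S0, K, r, sigma, T = 100.0, 105.0, 0.03, 0.2, 1.0
n_paths = 100_000
random.seed(0)

payoffs = []
for _ in range(n_paths):
    z = random.gauss(0.0, 1.0)
    ST = S0 * math.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    payoffs.append(max(ST - K, 0.0))

price = math.exp(-r * T) * sum(payoffs) / n_paths   # discounted Q-expectation
print(f"Monte Carlo call price: {price:.3f}")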

[edit]Notes
1. ^ At least in large financial markets. Examples of risk-seeking markets are
casinos and lotteries.

Growth investing
Growth investing is a style of investment strategy. Those who follow this style, known as
growth investors, invest in companies that exhibit signs of above-average growth, even if the
share price appears expensive in terms of metrics such as price-to-earnings or price-to-book
ratios. In typical usage, the term "growth investing" contrasts with the strategy known as value
investing.
However, some notable investors such as Warren Buffett have stated that there is no theoretical
difference between the concepts of value and growth ("Growth and Value Investing are joined at
the hip"), in consideration of the concept of an asset's intrinsic value. In addition, when just
investing in one style of stocks, diversification could be negatively impacted.
Thomas Rowe Price, Jr. has been called "the father of growth investing".[1]
[edit]Growth at reasonable price
After the bursting of the dotcom bubble, "growth at any price" has fallen from favour. Attaching a
high price to a security in the hope of high growth may be risky, since if the growth rate fails to
live up to expectations, the price of the security can plummet. It is often more fashionable now to
seek out stocks with high growth rates that are trading at reasonable valuations.
[edit]Growth investment vehicles
There are many ways to execute a growth investment strategy. Some of these include:
• Emerging markets
• Recovery shares
• Blue chips
• Internet and technology stocks
• Smaller companies
• Special situations
• Second-hand life policies

Value investing
Value investing is an investment paradigm that derives from the ideas on investment and
speculation that Ben Graham & David Dodd began teaching at Columbia Business School in 1928
and subsequently developed in their 1934 text Security Analysis. Although value investing has
taken many forms since its inception, it generally involves buying securities whose shares appear
underpriced by some form(s) of fundamental analysis.[1] As examples, such securities may be
stock in public companies that trade at discounts to book value or tangible book value, have high
dividend yields, have low price-to-earnings multiples or have low price-to-book ratios.
High-profile proponents of value investing, including Berkshire Hathaway chairman Warren
Buffett, have argued that the essence of value investing is buying stocks at less than their
intrinsic value.[2] The discount of the market price to the intrinsic value is what Benjamin
Graham called the "margin of safety". The intrinsic value is the discounted value of all future
distributions.
However, the future distributions and the appropriate discount rate can only be assumptions.
Warren Buffett has taken the value investing concept even further as his thinking has evolved to
where for the last 25 years or so his focus has been on "finding an outstanding company at a
sensible price" rather than generic companies at a bargain price.
[edit]History
[edit]Benjamin Graham
Value investing was established by Benjamin Graham and David Dodd, both professors at
Columbia Business School and teachers of many famous investors. In Graham's book The
Intelligent Investor, he advocated the important concept of margin of safety — first introduced in
Security Analysis, a 1934 book he coauthored with David Dodd — which calls for a cautious
approach to investing. In terms of picking stocks, he recommended defensive investment in
stocks trading below their tangible book value as a safeguard against adverse future developments
often encountered in the stock market.
[edit]Further evolution
However, the concept of value (as well as "book value") has evolved significantly since the
1970s. Book value is most useful in industries where most assets are tangible. Intangible assets
such as patents, software, brands, or goodwill are difficult to quantify, and may not survive the
break-up of a company. When an industry is going through fast technological advancements, the
value of its assets is not easily estimated. Sometimes, the production power of an asset can be
significantly reduced due to competitive disruptive innovation and therefore its value can suffer
permanent impairment. One good example of decreasing asset value is a personal computer. An
example of where book value does not mean much is the service and retail sectors. One modern
model of calculating value is the discounted cash flow model (DCF). The value of an asset is the
sum of its future cash flows, discounted back to the present.
[edit]Value investing performance
[edit]Performance, value strategies
Value investing has proven to be a successful investment strategy. There are several ways to
evaluate its success. One way is to examine the performance of simple value strategies, such as
buying low PE ratio stocks, low price-to-cash-flow ratio stocks, or low price-to-book ratio
stocks. Numerous academics have published studies investigating the effects of buying value
stocks. These studies have consistently found that value stocks outperform growth stocks and the
market as a whole.[3][4][5]
[edit]Performance, value investors
Another way to examine the performance of value investing strategies is to examine the
investing performance of well-known value investors. Simply examining the performance of the
best known value investors would not be instructive, because investors do not become well
known unless they are successful. This introduces a selection bias. A better way to investigate
the performance of a group of value investors was suggested by Warren Buffett, in his May 17,
1984 speech that was published as The Superinvestors of Graham-and-Doddsville. In this
speech, Buffett examined the performance of those investors who worked at Graham-Newman
Corporation and were thus most influenced by Benjamin Graham. Buffett's conclusion is
identical to that of the academic research on simple value investing strategies--value investing is,
on average, successful in the long run.
During about a 25-year period (1965-90), published research and articles in leading journals of
the value ilk were few. Warren Buffett once commented, "You couldn't advance in a finance
department in this country unless you taught that the world was flat."[6]
[edit]Well known value investors
Benjamin Graham is regarded by many to be the father of value investing. Along with David
Dodd, he wrote Security Analysis, first published in 1934. The most lasting contribution of this
book to the field of security analysis was to emphasize the quantifiable aspects of security
analysis (such as the evaluations of earnings and book value) while minimizing the importance
of more qualitative factors such as the quality of a company's management. Graham later wrote
The Intelligent Investor, a book that brought value investing to individual investors. Aside from
Buffett, many of Graham's other students, such as William J. Ruane, Irving Kahn and Charles
Brandes have gone on to become successful investors in their own right.
Graham's most famous student, however, is Warren Buffett, who ran successful investing
partnerships before closing them in 1969 to focus on running Berkshire Hathaway. Charlie
Munger joined Buffett at Berkshire Hathaway in the 1970s and has since worked as Vice
Chairman of the company. Buffett has credited Munger with encouraging him to focus on long-
term sustainable growth rather than on simply the valuation of current cash flows or assets.[7]
Columbia Business School has played a significant role in shaping the principles of the Value
Investor, with Professors and students making their mark on history and on each other. Ben
Graham’s book, The Intelligent Investor, was Warren Buffett’s bible and he referred to it as "the
greatest book on investing ever written.” A young Warren Buffett studied under Prof. Ben
Graham, took his course and worked for his small investment firm, Graham Newman, from 1954
to 1956. Twenty years after Ben Graham, Prof. Roger Murray arrived and taught value investing
to a young student named Mario Gabelli. About a decade or so later, Prof. Bruce Greenwald
arrived and produced his own protégés, including Mr. Paul Sonkin - just as Ben Graham had Mr.
Buffett as a protégé, and Roger Murray had Mr. Gabelli.
Mutual Series has a well known reputation of producing top value managers and analysts in this
modern era. This tradition stems from two individuals: the late great value mind Max Heine,
founder of the well regarded value investment firm Mutual Shares fund in 1949 and his protégé
legendary value investor Michael F. Price. Mutual Series was sold to Franklin Templeton in
1996. The disciples of Heine and Price quietly practice value investing at some of the most
successful investment firms in the country.
Seth Klarman, a Mutual Series alum and the founder and president of The Baupost Group, a
Boston-based private investment partnership, authored Margin of Safety: Risk Averse Investing
Strategies for the Thoughtful Investor, which has since become a value investing classic. Now
out of print, Margin of Safety has sold on Amazon for $1,200 and eBay for $2,000.[8] Another
famous value investor is John Templeton. He first achieved investing success by buying shares
of a number of companies in the aftermath of the stock market crash of 1929.
Martin J. Whitman is another well-regarded value investor. His approach is called safe-and-
cheap, which was previously referred to as the financial-integrity approach. Martin Whitman focuses
on acquiring common shares of companies with an extremely strong financial position at a price
reflecting a meaningful discount to the estimated NAV of the company concerned. Martin
Whitman believes it is ill-advised for investors to pay much attention to the trend of macro-
factors (like employment, movement of interest rate, GDP, etc.) not so much because they are
not important as because attempts to predict their movement are almost always futile. Martin
Whitman's letters to shareholders of his Third Avenue Value Fund (TAVF) are described as
valuable resources "for investors to pirate good ideas" by another famous investor, Joel
Greenblatt, in his book on special-situation investing, You Can Be a Stock Market Genius
(ISBN 0-684-84007-3, p. 247).
Joel Greenblatt achieved annual returns at the hedge fund Gotham Capital of over 50% per year
for 10 years from 1985 to 1995 before closing the fund and returning his investors' money. He is
known for investing in special situations such as spin-offs, mergers, and divestitures.
Charles de Vaulx and Jean-Marie Eveillard are well known global value managers. For a time,
these two were paired up at the First Eagle Funds, compiling an enviable track record of risk-
adjusted outperformance. For example, Morningstar designated them the 2001 "International
Stock Manager of the Year" and de Vaulx earned second place from Morningstar for 2006.
Eveillard is known for his Bloomberg appearances where he insists that securities investors
never use margin or leverage. The point made is that margin should be considered the anathema
of value investing, since a negative price move could prematurely force a sale. In contrast, a
value investor must be able and willing to be patient for the rest of the market to recognize and
correct whatever pricing issue created the momentary value. Eveillard correctly labels the use of
margin or leverage as speculation, the opposite of value investing.
[edit]Criticism
An issue with buying shares in a bear market is that despite appearing undervalued at one time,
prices can still drop along with the market.[9]
An issue with not buying shares in a bull market is that despite appearing overvalued at one time,
prices can still rise along with the market.
Another issue is the method of calculating the "intrinsic value". Two investors can analyze the
same information and reach different conclusions regarding the intrinsic value of the company.
There is no systematic or standard way to value a stock.[10]

Financial market efficiency


In the 1970s Eugene Fama defined an efficient financial market as "one in which prices always
fully reflect available information”.[1]
The most common type of efficiency referred to in financial markets is the allocative efficiency,
or the efficiency of allocating resources.
This includes producing the right goods for the right people at the right price.
A trait of an allocatively efficient financial market is that it channels funds from the ultimate lenders
to the ultimate borrowers in a way that the funds are used in the most socially useful manner.
[edit]Market efficiency levels
Eugene Fama identified three levels of market efficiency:
1. Weak-form efficiency
Prices of the securities instantly and fully reflect all information contained in past prices. This means
future price movements cannot be predicted by using past prices.
2. Semi-strong efficiency
Asset prices fully reflect all of the publicly available information. Therefore, only investors with
additional inside information could have an advantage in the market.
3. Strong-form efficiency
Asset prices fully reflect all of the public and inside information available. Therefore, no one can
have an advantage in the market in predicting prices since there is no data that would provide any
additional value to the investors.
[edit]Efficient Market Hypothesis (EMH)
Fama also created the Efficient Market Hypothesis (EMH), which states that at any given time,
prices on the market already reflect all known information and change quickly to reflect new
information.
Therefore, no one could outperform the market by using the same information that is already
available to all investors, except through luck.[2]
[edit]Random Walk theory
Another theory related to the efficient market hypothesis created by Louis Bachelier is the
“random walk” theory, which states that the prices in the financial markets evolve randomly and
are not connected; they are independent of each other.
Therefore, identifying trends or patterns of price changes in a market couldn’t be used to predict
the future value of financial instruments.
[edit]Evidence
[edit]Evidence of Financial Market Efficiency
• Predicting future asset prices is not always accurate (represents weak
efficiency form)
• Asset prices always reflect all new available information quickly (represents
semi-strong efficiency form)
• Investors can't consistently outperform the market (represents strong efficiency
form)
[edit]Evidence of Financial Market In-Efficiency
• January effect (repeating and predictable price movements and patterns
occur on the market)
• Stock market crashes
• Investors who often outperform the market, such as Warren Buffett[3]

[edit]Market efficiency types


James Tobin identified four efficiency types that could be present in a financial market:[4]
1. Information arbitrage efficiency
Asset prices fully reflect all of the publicly available information (the least demanding
requirement for efficient market, since arbitrage includes realizable, risk free transactions)
Arbitrage involves taking advantage of price differences of financial instruments between 2 or
more markets by trading to generate profit.
It involves only risk-free transactions and the information used for trading is obtained at no cost.
Therefore, the profit opportunities are not fully exploited, and it can be said that arbitrage is a
result of market inefficiency.
This reflects the weak-information efficiency model.
2. Fundamental valuation efficiency
Asset prices reflect the expected future flows of payments associated with holding the assets
(profit forecasts are correct, they attract investors)
Fundamental valuation involves greater risks and greater profit opportunities. It refers to the
accuracy of the predicted return on the investment.
Financial markets are characterized by unpredictability and constant misalignments that force the
prices to always deviate from their fundamental valuations.
This reflects the semi-strong information efficiency model.
3. Full insurance efficiency
It ensures the continuous delivery of goods and services in all contingencies.
4. Functional/Operational efficiency
The products and services available at the financial markets are provided for the least cost and
are directly useful to the participants.
Every financial market will contain a unique mixture of the identified efficiency types.
[edit]Conclusion
Financial market efficiency is an important topic in the world of Finance. While most
financiers believe the markets are neither 100% efficient nor 100% inefficient, many disagree
about where on the efficiency line the world's markets fall.
It can be concluded that in reality a financial market can’t be considered to be extremely
efficient, or completely inefficient.
The financial markets are a mixture of both, sometimes the market will provide fair returns on
the investment for everyone, while at other times certain investors will generate above average
returns on their investment.
Ironically, thinking that the financial market is inefficient and that it can be “beaten” is what is
actually keeping the financial market functioning efficiently.

Modern portfolio theory


From Wikipedia, the free encyclopedia
Jump to: navigation, search

Capital Market Line

Modern portfolio theory (MPT) is a theory of investment which tries to maximize return and
minimize risk by carefully choosing different assets. Although MPT is widely used in practice in
the financial industry and several of its creators won a Nobel prize for the theory, in recent years
the basic assumptions of MPT have been widely challenged by fields such as behavioral
economics.
MPT is a mathematical formulation of the concept of diversification in investing, with the aim of
selecting a collection of investment assets that has collectively lower risk than any individual
asset. This is possible, in theory, because different types of assets often change in value in
opposite ways. For example, when the prices in the stock market fall, the prices in the bond
market often increase, and vice versa. A collection of both types of assets can therefore have
lower overall risk than either individually.
More technically, MPT models an asset's return as a normally distributed random variable,
defines risk as the standard deviation of return, and models a portfolio as a weighted combination
of assets so that the return of a portfolio is the weighted combination of the assets' returns. By
combining different assets whose returns are not correlated, MPT seeks to reduce the total
variance of the portfolio. MPT also assumes that investors are rational and markets are efficient.
MPT was developed in the 1950s through the early 1970s and was considered an important
advance in the mathematical modeling of finance. Since then, much theoretical and practical
criticism has been leveled against it. These include the fact that financial returns do not follow a
Gaussian distribution and that correlations between asset classes are not fixed but can vary
depending on external events (especially in crises). Further, there is growing evidence that
investors are not rational and markets are not efficient.

[edit]Concept
The fundamental concept behind MPT is that the assets in an investment portfolio should not be
selected individually, each on its own merits. Rather, it is important to consider how each asset
changes in price relative to how every other asset in the portfolio changes in price.
Investing is a tradeoff between risk and return. In general, assets with higher returns are riskier.
For a given amount of risk, MPT describes how to select a portfolio with the highest possible
return. Or, for a given return, MPT explains how to select a portfolio with the lowest possible
risk (the desired return cannot be more than the highest-returning available security, of course.)[1]
MPT is therefore a form of diversification. Under certain assumptions and for specific
quantitative definitions of risk and return, MPT explains how to find the best possible
diversification strategy.
[edit]History
Harry Markowitz introduced MPT in a 1952 article and a 1959 book.[1]
[edit]Mathematical model
In some sense the mathematical derivation below is MPT, although the basic concepts behind the
model have also been very influential.[1]
This section develops the "classic" MPT model. There have been many extensions since.
[edit]Risk and return
MPT assumes that investors are risk averse, meaning that given two assets that offer the same
expected return, investors will prefer the less risky one. Thus, an investor will take on increased
risk only if compensated by higher expected returns. Conversely, an investor who wants higher
returns must accept more risk. The exact trade-off will differ by investor based on individual risk
aversion characteristics. The implication is that a rational investor will not invest in a portfolio if
a second portfolio exists with a more favorable risk-return profile – i.e., if for that level of risk an
alternative portfolio exists which has better expected returns.
MPT further assumes that the investor's risk / reward preference can be described via a
quadratic utility function. The effect of this assumption is that only the expected return and the
volatility (i.e., mean return and standard deviation) matter to the investor. The investor is
indifferent to other characteristics of the distribution of returns, such as its skew (measures the
level of asymmetry in the distribution) or kurtosis (measure of the thickness or so-called "fat
tail").
Note that the theory uses a parameter, volatility, as a proxy for risk, while return is an
expectation on the future. This is in line with the efficient market hypothesis and most of the
classical findings in finance. There are problems with this, see criticism.
Under the model:
• Portfolio return is the proportion-weighted combination of the constituent
assets' returns.
• Portfolio volatility is a function of the correlation ρ of the component assets.
The change in volatility is non-linear as the weighting of the component
assets changes.
In general:
• Expected return:
E(Rp) = Σi wi E(Ri)
where Rp is the return on the portfolio, Ri is the return on asset i, and wi is the weighting of component asset i (the proportion of the portfolio held in asset i).

• Portfolio return variance:
σp² = Σi Σj wi wj σi σj ρij
where ρij is the correlation coefficient between the returns on assets i and j. Alternatively the expression can be written as:
σp² = Σi wi² σi² + Σi Σj≠i wi wj σi σj ρij
where ρij = 1 for i = j.

• Portfolio return volatility (standard deviation):
σp = √σp²

For a two-asset portfolio:
• Portfolio return: E(Rp) = wA E(RA) + wB E(RB) = wA E(RA) + (1 − wA) E(RB)
• Portfolio variance: σp² = wA² σA² + wB² σB² + 2 wA wB σA σB ρAB
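As a quick numerical illustration of these formulas, the following Python sketch (the weights, expected returns, volatilities and correlation are made-up numbers, and numpy is assumed to be available) computes the expected return and volatility of a hypothetical two-asset portfolio:

import numpy as np

# Hypothetical inputs: two assets A and B (illustrative numbers only)
w = np.array([0.6, 0.4])          # portfolio weights, summing to 1
mu = np.array([0.08, 0.05])       # expected returns E(R_A), E(R_B)
sigma = np.array([0.20, 0.10])    # volatilities (standard deviations)
rho = 0.25                        # correlation between the two assets

# Covariance matrix built from the volatilities and the correlation
cov = np.array([[sigma[0]**2,               rho * sigma[0] * sigma[1]],
                [rho * sigma[0] * sigma[1], sigma[1]**2]])

port_return = w @ mu              # E(R_p) = sum_i w_i E(R_i)
port_variance = w @ cov @ w       # sigma_p^2 = w' Sigma w
port_vol = np.sqrt(port_variance)

print(f"Expected portfolio return: {port_return:.4f}")
print(f"Portfolio volatility:      {port_vol:.4f}")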
[edit]Diversification
An investor can reduce portfolio risk simply by holding combinations of instruments which are
not perfectly positively correlated (correlation coefficient −1 ≤ ρ < 1). In other words, investors
can reduce their exposure to individual asset risk by holding a diversified portfolio of assets.
Diversification will allow for the same portfolio return with reduced risk.
If all the assets of a portfolio have a correlation of +1, i.e., perfect positive correlation, the
portfolio volatility (standard deviation) will be equal to the weighted sum of the individual asset
volatilities. Hence the portfolio variance will be equal to the square of the total weighted sum of
the individual asset volatilities.[2]
If all the assets have a correlation of 0, i.e., perfectly uncorrelated, the portfolio variance is the
sum of the individual asset weights squared times the individual asset variance (and the standard
deviation is the square root of this sum).
If the correlation coefficient is less than zero (ρ < 0), i.e., the assets are inversely correlated, the
portfolio variance and hence volatility will be less than if the correlation coefficient were 0.
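The effect of correlation described above can be checked with a short sketch (a hypothetical equal-weighted two-asset portfolio; the volatilities below are illustrative, not taken from the text):

import numpy as np

w_a, w_b = 0.5, 0.5               # equal weights (illustrative)
sigma_a, sigma_b = 0.20, 0.10     # hypothetical asset volatilities

for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    # Two-asset portfolio variance for correlation rho
    var_p = (w_a**2 * sigma_a**2 + w_b**2 * sigma_b**2
             + 2 * w_a * w_b * sigma_a * sigma_b * rho)
    print(f"rho = {rho:+.1f}  ->  portfolio volatility = {np.sqrt(var_p):.4f}")

At rho = +1 the portfolio volatility equals the weighted sum of the individual volatilities, and it falls steadily as the correlation decreases.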
[edit]The risk-free asset
The risk-free asset is the (hypothetical) asset which pays a risk-free rate. In practice, short-term
Government securities (such as US treasury bills) are used as a risk-free asset, because they pay
a fixed rate of interest and have exceptionally low default risk. The risk-free asset has zero
variance in returns (hence is risk-free); it is also uncorrelated with any other asset (by definition:
since its variance is zero). As a result, when it is combined with any other asset, or portfolio of
assets, the change in return and also in risk is linear.
Because both risk and return change linearly as the risk-free asset is introduced into a portfolio,
this combination will plot a straight line in risk-return space. The line starts at 100% in the risk-
free asset and weight of the risky portfolio = 0 (i.e., intercepting the return axis at the risk-free
rate) and goes through the portfolio in question where risk-free asset holding = 0 and portfolio
weight = 1.
Using the formulae for a two asset portfolio as above:
• Return is the weighted average of the risk-free asset, f, and the risky portfolio, p, and is therefore linear:
E(R) = wf Rf + wp E(Rp)
• Since the asset is risk free, the portfolio standard deviation is simply the weight of the risky portfolio times its volatility, σ = wp σp. This relationship is linear.
[edit]Capital allocation line
The capital allocation line (CAL) is the line of expected return plotted against risk (standard
deviation) that connects all portfolios that can be formed using a risky asset and a riskless asset.
It can be proven that it is a straight line and that it has the following equation:
E(RC) = RF + σC · (E(RP) − RF) / σP
In this formula P is the risky portfolio, F is the riskless portfolio, and C is a combination of
portfolios P and F.
[edit]The efficient frontier

Efficient Frontier. The hyperbola is sometimes referred to as the 'Markowitz Bullet'

Every possible asset combination can be plotted in risk-return space, and the collection of all
such possible portfolios defines a region in this space. The line along the upper edge of this
region is known as the efficient frontier (sometimes "the Markowitz frontier"). Combinations
along this line represent portfolios (explicitly excluding the risk-free alternative) for which there
is lowest risk for a given level of return. Conversely, for a given amount of risk, the portfolio
lying on the efficient frontier represents the combination offering the best possible return.
Mathematically the Efficient Frontier is the intersection of the Set of Portfolios with Minimum
Variance (MVS) and the Set of Portfolios with Maximum Return. Formally, the efficient frontier
is the set of maximal elements with respect to the partial order of product order on risk and
return, the set of portfolios for which one cannot improve both risk and return.
The efficient frontier is illustrated above, with return μp on the y-axis, and risk σp on the x-axis;
an alternative illustration from the diagram in the CAPM article is at right.
The efficient frontier will be convex – this is because the risk-return characteristics of a portfolio
change in a non-linear fashion as its component weightings are changed. (As described above,
portfolio risk is a function of the correlation of the component assets, and thus changes in a non-
linear fashion as the weighting of component assets changes.) The efficient frontier is a parabola
(hyperbola) when expected return is plotted against variance (standard deviation).
The region above the frontier is unachievable by holding risky assets alone. No portfolios can be
constructed corresponding to the points in this region. Points below the frontier are suboptimal.
A rational investor will hold a portfolio only on the frontier.
Matrices are preferred for calculations of the efficient frontier. In matrix form, for a given "risk tolerance" q ≥ 0, the efficient frontier is found by minimizing the following expression:
wTΣw − q * RTw
where
• w is a vector of portfolio weights, with Σi wi = 1 (each wi is the proportion of the portfolio held in asset i)
• Σ is the covariance matrix for the returns on the assets in the portfolio
• q is a "risk tolerance" factor, where q = 0 results in the portfolio with minimal risk and q → ∞ results in the portfolio with maximal expected return
• R is a vector of expected returns
The frontier is calculated by repeating the optimization for various values of q.


Many software packages, including Microsoft Excel, MATLAB, Mathematica and R, provide
optimization routines suitable for the above problem.
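As a rough sketch of how such an optimization might be set up, assuming a long-only portfolio, invented expected returns and covariances, and the availability of numpy and scipy:

import numpy as np
from scipy.optimize import minimize

# Hypothetical inputs for three assets (illustrative numbers only)
R = np.array([0.08, 0.10, 0.12])            # vector of expected returns
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])      # covariance matrix of asset returns

def efficient_weights(q):
    """Minimize w'Sigma w - q * R'w subject to sum(w) = 1 and w >= 0 (long-only assumption)."""
    n = len(R)
    objective = lambda w: w @ Sigma @ w - q * (R @ w)
    constraints = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * n
    result = minimize(objective, x0=np.full(n, 1.0 / n),
                      bounds=bounds, constraints=constraints)
    return result.x

# Trace out the frontier by repeating the optimization for several values of q
for q in (0.0, 0.5, 1.0, 2.0):
    w = efficient_weights(q)
    print(f"q={q}: weights={np.round(w, 3)}, "
          f"return={R @ w:.4f}, volatility={np.sqrt(w @ Sigma @ w):.4f}")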
[edit]Portfolio leverage
An investor adds leverage to the portfolio by borrowing the risk-free asset. The addition of the
risk-free asset allows for a position in the region above the efficient frontier. Thus, by combining
a risk-free asset with risky assets, it is possible to construct portfolios whose risk-return profiles
are superior to those on the efficient frontier.
• An investor holding a portfolio of risky assets, with a holding in cash, has a
positive risk-free weighting (a de-leveraged portfolio). The return and
standard deviation will be lower than for the risky portfolio alone, but since the
efficient frontier is concave, this combination will sit above the
efficient frontier – i.e., offering a higher return for the same risk than the point
below it on the frontier.
• The investor who borrows money to fund his/her purchase of the risky assets
has a negative risk-free weighting – i.e., a leveraged portfolio. Here the return
is geared to the risky portfolio. This combination will again offer a return
superior to those on the frontier.
[edit]The market portfolio
The efficient frontier is a collection of portfolios, each one optimal for a given amount of risk. A
quantity known as the Sharpe ratio represents a measure of the amount of additional return
(above the risk-free rate) a portfolio provides compared to the risk it carries. The portfolio on the
efficient frontier with the highest Sharpe Ratio is known as the market portfolio, or sometimes
the super-efficient portfolio; it is the tangency-portfolio in the above diagram. This portfolio has
the property that any combination of it and the risk-free asset will produce a return that is above
the efficient frontier—offering a larger return for a given amount of risk than a portfolio of risky
assets on the frontier would.
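For illustration, choosing the tangency (market) portfolio among a handful of hypothetical candidate frontier portfolios amounts to picking the one with the highest Sharpe ratio; the risk-free rate and the candidate return/volatility pairs below are invented:

# Hypothetical candidate frontier portfolios: (expected return, volatility)
candidates = [(0.06, 0.08), (0.08, 0.12), (0.10, 0.18), (0.12, 0.26)]
risk_free_rate = 0.03  # illustrative risk-free rate

def sharpe(expected_return, volatility, rf=risk_free_rate):
    # Sharpe ratio: excess return over the risk-free rate per unit of risk
    return (expected_return - rf) / volatility

best = max(candidates, key=lambda p: sharpe(*p))
print("Tangency (highest-Sharpe) portfolio:", best,
      "Sharpe ratio:", round(sharpe(*best), 3))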
[edit]Capital market line
When the market portfolio is combined with the risk-free asset, the result is the Capital Market
Line. All points along the CML have superior risk-return profiles to any portfolio on the efficient
frontier. Just the special case of the market portfolio with zero cash weighting is on the efficient
frontier. Additions of cash or leverage with the risk-free asset in combination with the market
portfolio are on the Capital Market Line. All of these portfolios represent the highest possible
Sharpe ratio. The CML is illustrated above, with return μp on the y-axis, and risk σp on the x-axis.
One can prove that the CML is the optimal CAL and that its equation is:
E(RC) = RF + σC · (E(RM) − RF) / σM
where M denotes the market portfolio.
[edit]Asset pricing using MPT


A rational investor would not invest in an asset which does not improve the risk-return
characteristics of his existing portfolio. Since a rational investor would hold the market portfolio,
the asset in question will be added to the market portfolio. MPT derives the required return for a
correctly priced asset in this context.
[edit]Systematic risk and specific risk
Specific risk is the risk associated with individual assets - within a portfolio these risks can be
reduced through diversification (specific risks "cancel out"). Specific risk is also called
diversifiable, unique, unsystematic, or idiosyncratic risk. Systematic risk (a.k.a. portfolio risk or
market risk) refers to the risk common to all securities - except for selling short as noted below,
systematic risk cannot be diversified away (within one market). Within the market portfolio,
asset specific risk will be diversified away to the extent possible. Systematic risk is therefore
equated with the risk (standard deviation) of the market portfolio.
Since a security will be purchased only if it improves the risk / return characteristics of the
market portfolio, the risk of a security will be the risk it adds to the market portfolio. In this
context, the volatility of the asset, and its correlation with the market portfolio, is historically
observed and is therefore a given (there are several approaches to asset pricing that attempt to
price assets by modelling the stochastic properties of the moments of assets' returns - these are
broadly referred to as conditional asset pricing models). The (maximum) price paid for any
particular asset (and hence the return it will generate) should also be determined based on its
relationship with the market portfolio.
Systematic risks within one market can be managed through a strategy of using both long and
short positions within one portfolio, creating a "market neutral" portfolio.
[edit]Security characteristic line
The security characteristic line (SCL) represents the relationship between the market excess
return and the excess return of a given asset i at a given time t. In general, it is reasonable to
assume that the SCL is a straight line and can be written as the statistical equation:
Ri,t − Rf = αi + βi (RM,t − Rf) + εi,t
where αi is called the asset's alpha, βi is the asset's beta coefficient and εi,t is the error term.
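A minimal sketch of estimating alpha and beta from this relationship by ordinary least squares, using simulated (entirely invented) excess returns rather than real market data:

import numpy as np

rng = np.random.default_rng(0)

# Simulated excess returns (purely illustrative data)
market_excess = rng.normal(0.005, 0.04, size=250)               # market excess return
true_alpha, true_beta = 0.001, 1.2
asset_excess = (true_alpha + true_beta * market_excess
                + rng.normal(0, 0.02, size=250))                # asset excess return

# OLS fit of asset excess return on market excess return: slope = beta, intercept = alpha
beta_hat, alpha_hat = np.polyfit(market_excess, asset_excess, deg=1)
print(f"estimated alpha = {alpha_hat:.4f}, estimated beta = {beta_hat:.3f}")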
[edit]Capital asset pricing model
Main article: Capital Asset Pricing Model

The asset return depends on the amount paid for the asset today. The price paid must ensure that the
market portfolio's risk / return characteristics improve when the asset is added to it. The CAPM
is a model which derives the theoretical required return (i.e., discount rate) for an asset in a
market, given the risk-free rate available to investors and the risk of the market as a whole. The
CAPM is usually expressed:
E(Ri) = Rf + βi (E(Rm) − Rf)
• β, Beta, is the measure of asset sensitivity to a movement in the overall
market; Beta is usually found via regression on historical data. Betas
exceeding one signify more than average "riskiness"; betas below one
indicate lower than average.
• (E(Rm) − Rf) is the market premium, the historically observed excess
return of the market over the risk-free rate.
Once the expected return, E(ri), is calculated using CAPM, the future cash flows of the asset
can be discounted to their present value using this rate to establish the correct price for the asset.
A more risky stock will have a higher beta and will be discounted at a higher rate; less sensitive
stocks will have lower betas and be discounted at a lower rate. In theory, an asset is correctly
priced when its observed price is the same as its value calculated using the CAPM derived
discount rate. If the observed price is higher than the valuation, the asset is overvalued; if the
observed price is lower, it is undervalued.
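A short numerical sketch of this pricing step, with all inputs invented for illustration: compute the CAPM required return and then discount a single expected cash flow at that rate.

# Illustrative inputs (not from the text)
risk_free = 0.03          # risk-free rate
market_return = 0.09      # expected market return
beta = 1.4                # asset's beta

# CAPM required return: E(R_i) = R_f + beta * (E(R_m) - R_f)
required_return = risk_free + beta * (market_return - risk_free)

# Discount an expected cash flow of 100 received in one year at this rate
expected_cash_flow = 100.0
fair_price = expected_cash_flow / (1.0 + required_return)

print(f"CAPM required return: {required_return:.2%}")
print(f"Fair price today:     {fair_price:.2f}")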
(1) The incremental impact on risk and return when an additional risky asset, a, is added to the market
portfolio, m, follows from the formulae for a two-asset portfolio. These results are used to derive the
asset-appropriate discount rate.

Market portfolio's risk = (wm² σm² + [wa² σa² + 2 wm wa ρam σa σm])

Hence, risk added to portfolio = [wa² σa² + 2 wm wa ρam σa σm]

but since the weight of the asset will be relatively low, wa² ≈ 0,

i.e. additional risk = [2 wm wa ρam σa σm]

Market portfolio's expected return = (wm E(Rm) + [wa E(Ra)])

Hence additional expected return = [wa E(Ra)]

(2) If an asset, a, is correctly priced, the improvement in its risk-to-return ratio achieved by
adding it to the market portfolio, m, will at least match the gains of spending that
money on an increased stake in the market portfolio. The assumption is that the
investor will purchase the asset with funds borrowed at the risk-free rate, Rf; this is
rational if E(Ra) > Rf.

Thus: [wa (E(Ra) − Rf)] / [2 wm wa ρam σa σm] = [wa (E(Rm) − Rf)] / [2 wm wa σm σm]

i.e.: E(Ra) = Rf + (E(Rm) − Rf) × [ρam σa σm] / [σm σm]

i.e.: E(Ra) = Rf + (E(Rm) − Rf) × [σam] / [σmm]

[σam] / [σmm] is the "beta", β – the covariance between the asset and the
market compared to the variance of the market, i.e. the sensitivity of the
asset price to movement in the market portfolio's value.

[edit]Criticism
Despite its theoretical importance, some people question whether MPT is an ideal investing
strategy, because its model of financial markets does not match the real world in many ways.
[edit]Assumptions
The mathematical framework of MPT makes many assumptions about investors and markets.
Some are explicit in the equations, such as the use of Normal distributions to model returns.
Others are implicit, such as the neglect of taxes and transaction fees. None of these assumptions
are entirely true, and each of them compromises MPT to some degree.
• Asset returns are (jointly) normally distributed random variables. In
fact, it is frequently observed that returns in equity and other markets are not
normally distributed. Large swings (3 to 6 standard deviations from the
mean) occur in the market far more frequently than the normal distribution
assumption would predict. [3]
• Correlations between assets are fixed and constant forever.
Correlations depend on systemic relationships between the underlying
assets, and change when these relationships change. Examples include one
country declaring war on another, or a general market crash. During times of
financial crisis all assets tend to become positively correlated, because they
all move (down) together. In other words, MPT breaks down precisely when
investors are most in need of protection from risk.
• All investors aim to maximize economic utility (in other words, to
make as much money as possible, regardless of any other
considerations). This is a key assumption of the efficient market
hypothesis, upon which MPT relies.
• All investors are rational and risk-averse. This is another assumption of
the efficient market hypothesis, but we now know from behavioral economics
that market participants are not rational. The theory does not allow for "herd behavior"
or investors who will accept lower returns for higher risk. Casino gamblers
clearly pay for risk, and it is possible that some stock traders will pay for risk
as well.
• All investors have access to the same information at the same time.
This also comes from the efficient market hypothesis. In fact, real markets
contain information asymmetry, insider trading, and those who are simply
better informed than others.
• Investors have an accurate conception of possible returns, i.e., the
probability beliefs of investors match the true distribution of
returns. A different possibility is that investors' expectations are biased,
causing market prices to be informationally inefficient. This possibility is
studied in the field of behavioral finance, which uses psychological
assumptions to provide alternatives to the CAPM such as the overconfidence-
based asset pricing model of Kent Daniel, David Hirshleifer, and Avanidhar
Subrahmanyam (2001)[4].
• There are no taxes or transaction costs. Real financial products are
subject both to taxes and transaction costs (such as broker fees), and taking
these into account will alter the composition of the optimum portfolio. These
assumptions can be relaxed with more complicated versions of the model.
[citation needed]

• All investors are price takers, i.e., their actions do not influence
prices. In reality, sufficiently large sales or purchases of individual assets can
shift market prices for that asset and others (via cross-elasticity of demand.)
An investor may not even be able to assemble the theoretically optimal
portfolio if the market moves too much while they are buying the required
securities.
• Any investor can lend and borrow an unlimited amount at the risk
free rate of interest. In reality, every investor has a credit limit.
• All securities can be divided into parcels of any size. In reality,
fractional shares usually cannot be bought or sold, and some assets have
minimum order sizes.
More complex versions of MPT can take into account a more sophisticated model of the world
(such as one with non-normal distributions and taxes) but all mathematical models of finance
still rely on many unrealistic premises.
[edit]MPT does not really model the market
The risk, return, and correlation measures used by MPT are expected values, which means that
they are mathematical statements about the future (the expected value of returns is explicit in the
above equations, and implicit in the definitions of variance and covariance.) In practice investors
must substitute predictions based on historical measurements of asset return and volatility for
these values in the equations. Very often such predictions are wrong, as captured in the classic
disclaimer "past performance is not necessarily indicative of future results."
More fundamentally, investors are stuck with estimating key parameters from past market data
because MPT attempts to model risk in terms of the likelihood of losses, but says nothing about
why those losses might occur. The risk measurements used are probabilistic in nature, not
structural. This is a major difference as compared to many engineering approaches to risk
management.
Options theory and MPT have at least one important conceptual difference from the probabilistic risk
assessment done by nuclear power [plants]. A PRA is what economists would call a structural model. The
components of a system and their relationships are modeled in Monte Carlo simulations. If valve X fails,
it causes a loss of back pressure on pump Y, causing a drop in flow to vessel Z, and so on.
But in the Black-Scholes equation and MPT, there is no attempt to explain an underlying structure to
price changes. Various outcomes are simply given probabilities. And, unlike the PRA, if there is no
history of a particular system-level event like a liquidity crisis, there is no way to compute the odds of it.
If nuclear engineers ran risk management this way, they would never be able to compute the odds of a
meltdown at a particular plant until several similar events occurred in the same reactor design.
—Douglas W. Hubbard, 'The Failure of Risk Management', p. 67, John Wiley & Sons, 2009. ISBN
978-0-470-38795-5

Essentially, the mathematics of MPT view the markets as a collection of dice. By examining past
market data we can develop hypotheses about how the dice are weighted, but this isn't helpful if
the markets are actually dependent upon a much bigger and more complicated chaotic system --
the world. For this reason, accurate structural models of real financial markets are unlikely to be
forthcoming because they would essentially be structural models of the entire world. Nonetheless
there is growing awareness of the concept of systemic risk in financial markets, which should
lead to more sophisticated market models.
[edit]Variance is not a good measure of risk
Mathematical risk measurements are also useful only to the degree that they reflect investors'
true concerns – there is no point minimizing a variable that nobody cares about in practice. MPT
uses the mathematical concept of variance to quantify risk, and this might be justified under the
assumption of normally distributed returns, but for general return distributions other risk
measures (like coherent risk measures) might better reflect investors' true preferences.
In particular, variance is a symmetric measure that counts abnormally high returns as just as
risky as abnormally low returns. In reality, investors are only concerned about losses, which
shows that our intuitive concept of risk is fundamentally asymmetric in nature.
[edit]"Optimal" doesn't necessarily mean "most profitable"
MPT does not account for the social, environmental, strategic, or personal dimensions of
investment decisions. It only attempts to maximize returns, without regard to other
consequences. In a narrow sense, its complete reliance on asset prices makes it vulnerable to all
the standard market failures such as those arising from information asymmetry, externalities, and
public goods. It also rewards corporate fraud and dishonest accounting. More broadly, a firm
may have strategic or social goals that shape its investment decisions, and an individual investor
might have personal goals. In either case, information other than historical returns is relevant.
See also socially-responsible investing, fundamental analysis.
[edit]Extensions
Since MPT's introduction in 1952, many attempts have been made to improve the model,
especially by using more realistic assumptions.
Post-modern portfolio theory extends MPT by adopting non-normally distributed, asymmetric
measures of risk. This helps with some of these problems, but not others.
[edit]Other Applications
[edit]Applications to project portfolios and other "non-financial" assets
Some experts apply MPT to portfolios of projects and other assets besides financial instruments.
[5]
When MPT is applied outside of traditional financial portfolios, some differences between the
different types of portfolios must be considered.
1. The assets in financial portfolios are, for practical purposes, continuously
divisible while portfolios of projects like new software development are
"lumpy". For example, while we can compute that the optimal portfolio
position for 3 stocks is, say, 44%, 35%, 21%, the optimal position for an IT
portfolio may not allow us to simply change the amount spent on a project. IT
projects might be all or nothing or, at least, have logical units that cannot be
separated. A portfolio optimization method would have to take the discrete
nature of some IT projects into account.
2. The assets of financial portfolios are liquid and can be assessed or re-assessed at
any point in time, while opportunities for new projects may be limited and
may appear only in limited windows of time, and projects that have already been
initiated cannot be abandoned without the loss of the sunk costs (i.e., there is
little or no recovery/salvage value for a half-complete IT project).
Neither of these necessarily eliminates the possibility of using MPT on such portfolios. They
simply indicate the need to run the optimization with an additional set of mathematically-
expressed constraints that would not normally apply to financial portfolios.
Furthermore, some of the simplest elements of Modern Portfolio Theory are applicable to
virtually any kind of portfolio. The concept of capturing the risk tolerance of an investor by
documenting how much risk is acceptable for a given return could be and is applied to a variety
of decision analysis problems. MPT, however, uses historical variance as a measure of risk and
portfolios of assets like IT projects don't usually have an "historical variance" for a new piece of
software. In this case, the MPT investment boundary can be expressed in more general terms like
"chance of an ROI less than cost of capital" or "chance of losing more than half of the
investment". When risk is put in terms of uncertainty about forecasts and possible losses then the
concept is transferable to various types of investment.[5]
[edit]Application to other disciplines
In the 1970s, concepts from Modern Portfolio Theory found their way into the field of regional
science. In a series of seminal works, Michael Conroy modeled the labor force in the economy
using portfolio-theoretic methods to examine growth and variability in the labor force. This was
followed by a long literature on the relationship between economic growth and volatility.[6]
More recently, modern portfolio theory has been used to model the self-concept in social
psychology. When the self attributes comprising the self-concept constitute a well-diversified
portfolio, then psychological outcomes at the level of the individual such as mood and self-
esteem should be more stable than when the self-concept is undiversified. This prediction has
been confirmed in studies involving human subjects.[7]
Recently, modern portfolio theory has been applied to modelling the uncertainty and correlation
between documents in information retrieval. Given a query, the aim is to maximize the overall
relevance of a ranked list of documents and at the same time minimize the overall uncertainty of
the ranked list [1].
[edit]Comparison with arbitrage pricing theory
The SML and CAPM are often contrasted with the arbitrage pricing theory (APT), which holds
that the expected return of a financial asset can be modeled as a linear function of various macro-
economic factors, where sensitivity to changes in each factor is represented by a factor specific
beta coefficient.
The APT is less restrictive in its assumptions: it allows for an explanatory (as opposed to
statistical) model of asset returns, and assumes that each investor will hold a unique portfolio
with its own particular array of betas, as opposed to the identical "market portfolio". Unlike the
CAPM, however, the APT does not itself reveal the identity of its priced factors – the number
and nature of these factors is likely to change over time and between economies.

Hyperbolic discounting
From Wikipedia, the free encyclopedia
Given two similar rewards, humans show a preference for one that arrives sooner rather than
later. Humans are said to discount the value of the later reward, by a factor that increases with
the length of the delay. In behavioral economics, hyperbolic discounting is a particular
mathematical model thought to approximate this discounting process; that is, it models how
humans actually make such valuations. Hyperbolic discounting is sharply different in form from
exponential discounting, the model of rational discounting conventionally used in finance for the analysis of choice over
time. Hyperbolic discounting has been observed in both humans and animals.
In hyperbolic discounting, valuations fall very rapidly for small delay periods, but then fall
slowly for longer delay periods. This contrasts with exponential discounting, in which valuation
falls by a constant factor per unit delay, regardless of the total length of the delay. The standard
experiment used to reveal a test subject's discounting curve is to ask: "Would you prefer A today
or B tomorrow?" and then, "Would you prefer A in one year, or B in one year and one day?"
For example in studies of pigeons [1] the pigeon is given two buttons: button A provides a small
amount of food quickly while button B provides more seed but after a delay. The bird then
experiments for a while and settles on preferring A or B. With humans, the typical experiment
might ask: "Would you prefer a dollar today or three dollars tomorrow?" and "Would you prefer
a dollar in one year or three dollars in one year and one day?" Typically, subjects will take the
smaller amount today rather than wait until tomorrow for more, but will gladly wait one extra day
in a year in order to receive more money.[citation needed]
Subjects using hyperbolic discounting reveal a strong tendency to make choices that are
inconsistent over time. In other words, they make choices today that their future self would
prefer not to make, despite using the same reasoning. This dynamic inconsistency [2] happens
because, relative to exponential discounting, hyperbolic discounting devalues near-term rewards steeply
but devalues distant rewards only slowly, so preferences between two fixed rewards can reverse as they draw nearer.
[edit]Observations
The phenomenon of hyperbolic discounting is implicit in Richard Herrnstein's "matching law,"
the discovery that most subjects allocate their time or effort between two non-exclusive, ongoing
sources of reward (concurrent variable interval schedules) in direct proportion to the rate and size
of rewards from the two sources, and in inverse proportion to their delays. That is, subjects'
choices "match" these parameters.
After the report of this effect in the case of delay (Chung and Herrnstein, 1967), George Ainslie
pointed out that in a single choice between a larger, later and a smaller, sooner reward, inverse
proportionality to delay would be described by a plot of value by delay that had a hyperbolic
shape, and that this shape should produce a reversal of preference from the larger, later to the
smaller, sooner reward for no other reason but that the delays to the two rewards got shorter. He
demonstrated the predicted reversal in pigeons[vague] (Ainslie, 1974).
A large number of subsequent experiments have confirmed that spontaneous preferences by both
human and nonhuman subjects follow a hyperbolic curve rather than the conventional,
"exponential" curve that would produce consistent choice over time (Green et al., 1994; Kirby,
1997). For instance, when offered the choice between $50 now and $100 a year from now, many
people will choose the immediate $50. However, given the choice between $50 in five years or
$100 in six years almost everyone will choose $100 in six years, even though that is the same
choice seen at five years' greater distance.
Hyperbolic discounting has also been found to relate to real-world examples of self control.
Indeed, a variety of studies have used measures of hyperbolic discounting to find that drug-
dependent individuals discount delayed consequences more than matched nondependent
controls, suggesting that extreme delay discounting is a fundamental behavioral process in drug
dependence (e.g., Bickel & Johnson, 2003; Madden et al., 1997; Vuchinich & Simpson, 1998).
Some evidence suggests pathological gamblers also discount delayed outcomes at higher rates
than matched controls (e.g., Petry & Casarella, 1999). Whether high rates of hyperbolic
discounting precede addictions or vice-versa is currently unknown, although some studies have
reported that high-rate discounting rats are more likely to consume alcohol (e.g., Poulos et al.,
1995) and cocaine (Perry et al., 2005) than lower-rate discounters. Likewise, some have
suggested that high-rate hyperbolic discounting makes unpredictable (gambling) outcomes more
satisfying (Madden et al., 2007).
The degree of discounting is vitally important in describing hyperbolic discounting, especially in
the discounting of specific rewards such as money. The discounting of monetary rewards varies
across age groups due to the varying discount rate. (Green, Frye, and Myerson, 1994). The rate
depends on a variety of factors, including the species being observed, age, experience, and the
amount of time needed to consume the reward (Lowenstein and Prelec, 1992; Raineri and
Rachlin, 1993).
[edit]Mathematical model

Comparison of the discount factors of hyperbolic and exponential discounting. In both cases, k = 1. Hyperbolic discounting is shown to over-value future assets compared to exponential discounting.

Hyperbolic discounting is mathematically described as:
fH(D) = 1 / (1 + kD)
where fH(D) is the discount factor that multiplies the value of the reward, D is the delay in the
reward, and k is a parameter governing the degree of discounting. This is compared with the
formula for exponential discounting:
fE(D) = e^(−kD)
[edit]Quasi-hyperbolic approximation
The "quasi-hyperbolic" discount function, which approximates the hyperbolic discount function
above, is given (in discrete time) by
fQH(0) = 1, and fQH(D) = β·δ^D for D = 1, 2, 3, ...,
where β and δ are constants between 0 and 1; and again D is the delay in the reward, and f(D)
is the discount factor. The condition f(0) = 1 is stating that rewards taken at the present time are
not discounted.
Quasi-hyperbolic time preferences are also referred to as "present-biased" or "beta-delta"
preferences. They retain much of the analytical tractability of exponential discounting while
capturing the key qualitative feature of discounting with true hyperbolas.
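The sketch below compares the three discount functions given above for a few delays; k = 1 follows the figure caption, while β and δ are illustrative choices, not values from the text:

import math

k = 1.0                    # discounting parameter (as in the figure, k = 1)
beta, delta = 0.7, 0.95    # illustrative quasi-hyperbolic parameters

def hyperbolic(D):
    return 1.0 / (1.0 + k * D)

def exponential(D):
    return math.exp(-k * D)

def quasi_hyperbolic(D):
    return 1.0 if D == 0 else beta * delta ** D

for D in (0, 1, 2, 5, 10):
    print(f"D={D:2d}  hyperbolic={hyperbolic(D):.3f}  "
          f"exponential={exponential(D):.3f}  quasi-hyperbolic={quasi_hyperbolic(D):.3f}")

For large delays the hyperbolic factor stays well above the exponential one, which is the over-valuation of distant rewards shown in the figure.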
[edit]Explanations
[edit]Uncertain risks
Notice that whether discounting future gains is rational or not – and at what rate such gains
should be discounted – depends greatly on circumstances. Many examples exist in the financial
world, for example, where it is reasonable to assume that there is an implicit risk that the reward
will not be available at the future date, and furthermore that this risk increases with time.
Consider: Paying $50 for your dinner today or delaying payment for sixty years but paying
$100,000. In this case the restaurateur would be reasonable to discount the promised future value
as there is significant risk that it might not be paid (possibly due to your death, his death, etc).
Uncertainty of this type can be quantified with Bayesian analysis. [3] For example, suppose that
the probability for the reward to be available after time t is, for known hazard rate λ
P(Rt | λ) = exp( − λt)
but the rate is unknown to the decision maker. If the prior probability distribution of λ is
p(λ) = exp( − λ / k) / k
then the decision maker will expect that the probability of the reward after time t is
P(Rt) = ∫0∞ exp(−λt) · exp(−λ/k)/k dλ = 1 / (1 + kt),
which is exactly the hyperbolic discount factor. Similar conclusions can be obtained from other
plausible distributions for λ. [3]
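The claim that averaging the exponential survival probability over this prior gives the hyperbolic factor 1/(1 + kt) can be checked numerically, for example with the following sketch (arbitrary k and t values; numpy and scipy are assumed to be available):

import numpy as np
from scipy.integrate import quad

def expected_survival(t, k):
    # Integrate P(R_t | lambda) * p(lambda) over lambda in (0, infinity)
    integrand = lambda lam: np.exp(-lam * t) * np.exp(-lam / k) / k
    value, _ = quad(integrand, 0, np.inf)
    return value

for k in (0.5, 1.0, 2.0):
    for t in (1.0, 5.0):
        numeric = expected_survival(t, k)
        hyperbolic = 1.0 / (1.0 + k * t)
        print(f"k={k}, t={t}:  integral={numeric:.4f}  1/(1+kt)={hyperbolic:.4f}")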

[edit]Applications
More recently these observations about discount functions have been used to study saving for
retirement, borrowing on credit cards, and procrastination. However, hyperbolic discounting has
been most frequently used to explain addiction.

Adaptive market hypothesis


The Adaptive Market Hypothesis, as proposed by Andrew Lo (2004, 2005), is an attempt to
reconcile theories that imply that the markets are efficient with behavioral alternatives, by
applying the principles of evolution - competition, adaptation, and natural selection - to financial
interactions. [1]
Under this approach the traditional models of modern financial economics can coexist alongside
behavioral models. He argues that much of what behavioralists cite as counterexamples to
economic rationality - loss aversion, overconfidence, overreaction, and other behavioral biases -
are, in fact, consistent with an evolutionary model of individuals adapting to a changing
environment using simple heuristics.
According to Lo, the Adaptive Markets Hypothesis can be viewed as a new version of the
efficient market hypothesis, derived from evolutionary principles. "Prices reflect as much
information as dictated by the combination of environmental conditions and the number and
nature of "species" in the economy." By species, he means distinct groups of market participants,
each behaving in a common manner (i.e. pension funds, retail investors, market makers, and
hedge-fund managers, etc.). If multiple members of a single group are competing for rather
scarce resources within a single market, that market is likely to be highly efficient, e.g., the
market for 10-Year US Treasury Notes, which reflects most relevant information very quickly
indeed. If, on the other hand, a small number of species are competing for rather abundant
resources in a given market, that market will be less efficient, e.g., the market for oil paintings
from the Italian Renaissance. Market efficiency cannot be evaluated in a vacuum, but is highly
context-dependent and dynamic. Briefly stated, the degree of market efficiency is related to
environmental factors characterizing market ecology, such as the number of competitors in the
market, the magnitude of profit opportunities available, and the adaptability of the market
participants (Lo, 2005).
[edit]Implications
The AMH has several implications that differentiate it from the EMH such as:
1. To the extent that a relation between risk and reward exists, it is unlikely to
be stable over time.
2. Contrary to the classical EMH, there are arbitrage opportunities from time to
time.
3. Investment strategies will also wax and wane, performing well in certain
environments and performing poorly in other environments. This includes
quantitatively-, fundamentally- and technically-based methods.
4. Survival is the only objective that matters; profit and utility maximization
are secondary.
5. Innovation is the key to survival: because the risk/reward relation varies
through time, the best way of achieving a consistent level of expected
returns is to adapt to changing market conditions.

Market anomaly
From Wikipedia, the free encyclopedia
Jump to: navigation, search

A market anomaly (or inefficiency) is a price and/or return distortion on a financial market.
It is usually related to:
• either structural factors (unfair competition, lack of market transparency, ...)
• or behavioral biases by economic agents (see behavioral economics)
It sometimes refers to phenomena contradicting the efficient market hypothesis. There are
anomalies in relation to the economic fundamentals of the equity, technical trading rules, and
economic calendar events.
Anomalies can be fundamental, technical or calendar related. Fundamental anomalies include
the value effect and the small-cap effect (on average, low P/E stocks and small-cap companies do better
than the index). Calendar anomalies involve patterns in stock returns from year to year or month
to month, while technical anomalies include the momentum effect. Some further information is
available at [1]
See also efficient market

Transparency (market)
From Wikipedia, the free encyclopedia
In economics, a market is transparent if much is known by many about:
• What products, services or capital assets are available.
• At what price.
• Where.
There are two types of price transparency: 1) I know what price will be charged to me, and 2) I
know what price will be charged to you. The two types of price transparency have different
implications for differential pricing.[1]
This is a special case of the topic at transparency (humanities).
A high degree of market transparency can result in disintermediation due to the buyer's
increased knowledge of supply pricing.
Transparency is important since it is one of the theoretical conditions required for a free market
to be efficient.
Price transparency can, however, lead to higher prices, if it makes sellers reluctant to give steep
discounts to certain buyers, or if it facilitates collusion.

Noisy market hypothesis


From Wikipedia, the free encyclopedia
Jump to: navigation, search

In finance, the noisy market hypothesis contrasts with the efficient market hypothesis in that it claims
that the prices of securities are not always the best estimate of the true underlying value of the
firm. It argues that prices can be influenced by speculators and momentum traders, as well as by
insiders and institutions that often buy and sell stocks for reasons unrelated to fundamental value,
such as for diversification, liquidity and taxes. These temporary shocks, referred to as "noise", can
obscure the true value of securities and may result in mispricing of these securities for many
years. [1]

Dumb agent theory


The Dumb Agent Theory states that many people making individual buying and selling
decisions will better reflect true value than any one individual can. In finance this theory is
predicated on the efficient-market hypothesis (EMH). One of the first instances of the Dumb
Agent Theory in action was with the Policy Analysis Market (PAM); a futures exchange
developed by DARPA[1]. While this project was quickly abandoned by the Pentagon[2], its idea is
now implemented in futures exchanges and prediction markets such as Intrade, Newsfutures and
Predictify.
While first mentioned strictly by name in relation to PAM in 2003, the Dumb Agent Theory was
originally conceived (as the Dumb Smart Market) by James Surowiecki in 1999[3]. Here,
Surowiecki differentiated from the EMH stating that it "doesn't mean that markets are always
right." Instead, he argues that markets are subject to manias and panics because "people are
always shouting out" their stock picks. This, in turn, results in other investors worrying about
these picks and become influenced by them, which ultimately drives the markets (irrationally) up
or down. His argument states that if market decisions were made independently of each other,
and with the sole goal of being correct (as opposed to being in line with what others are
choosing), then the markets would produce the best choice possible[3] and eliminate biases such
as Groupthink, the Bandwagon effect and the Abilene Paradox.

[edit]PAM
The Policy Analysis Market (PAM), while technically a futures market, was described as
utilizing the Dumb Agent Theory[4]. The main difference, as argued by James Surowiecki, is that
in a futures market the current stock prices are known in advance, while in order for the Dumb
Agent Theory to work, they should be unknown to the investor prior to the decision making
period (which is possible only with a prediction market).

[edit]Prediction Markets
For the Dumb Agent Theory to hold, investors should not know what other investors are doing
prior to making their decision[3]. While this is technically impossible in a futures exchange
(because what other people are deciding dictates the price of the security), it can be done in a
Prediction Market. Certain prediction markets are set up in this manner (Such as Predictify,
although it allows for participants to change their answers after their initial prediction).

[edit]Dumb Agent Theory outside the Financial Markets


USS Scorpion (SSN-589) was a nuclear submarine of the United States Navy lost at sea on June
5, 1968. While a public search did not yield any clues as to its location, Dr. John Craven, the
Chief Scientist of the U.S. Navy's Special Projects Division, decided to employ the Bayesian
search theory in order to establish its location. This involved formulating different hypotheses as
to its location and using a probability distribution to combine the information and find the point
of highest probability. The different hypotheses were taken from various independent sources,
such as mathematicians, submarine specialists, and salvage men. The point Craven found ended
up being 220 yards from the actual position of the sunken vessel[5].

Bid-offer spread
From Wikipedia, the free encyclopedia
(Redirected from Bid/offer spread)

Jump to: navigation, search

The bid/offer spread (also known as bid/ask or buy/sell spread) for securities (such as stock,
futures contracts, options, or currency pairs) is the difference between the price quoted by a
market maker for an immediate sale (bid) and an immediate purchase (ask). The size of the bid-
offer spread in a given commodity is a measure of the liquidity of the market and the size of the
transaction cost. [1]
The trader initiating the transaction is said to demand liquidity, and the other party (counterparty)
to the transaction supplies liquidity. Liquidity demanders place market orders and liquidity
suppliers place limit orders. For a round trip (a purchase and sale together) the liquidity
demander pays the spread and the liquidity supplier earns the spread. All limit orders outstanding
at a given time (i.e., limit orders that have not been executed) are together called the Limit Order
Book. In some markets such as NASDAQ, dealers supply liquidity. However, on most
exchanges, such as the Australian Securities Exchange, there are no designated liquidity
suppliers, and liquidity is supplied by other traders. On these exchanges, and even on NASDAQ,
institutions and individuals can supply liquidity by placing limit orders.
The bid-ask spread is an accepted measure of liquidity costs in exchange traded securities and
commodities. On any standardized exchange two elements comprise almost all of the transaction
cost – brokerage fees and bid-ask spreads. Under competitive conditions the bid-ask spread
measures the cost of making transactions without delay. The difference in price paid by an urgent
buyer and received by an urgent seller is the liquidity cost. Since brokerage commissions do not
vary with the time taken to complete a transaction, differences in bid-ask spread indicate
differences in the liquidity cost [2].
[edit]Example: Currency spread
Suppose the current bid price for the EUR/USD currency pair is 1.5760 and the current ask price is
1.5763. This means that currently you can sell EUR/USD at 1.5760 and buy it at 1.5763. The
difference between those prices is the spread. If the USD/JPY currency pair is currently trading
at 101.89/92, that is another way of saying that the bid for the USD/JPY 101.89 and the ask is
101.92. This means that holders of JPY can currently sell JPY for 1 US dollar at 101.89 and
investors who wish to buy JPY can do so at 101.92 per US dollar.[3]
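A trivial sketch of the spread arithmetic for the EUR/USD quote above:

# EUR/USD quote from the example above
bid, ask = 1.5760, 1.5763

spread = ask - bid                      # absolute spread
mid = (bid + ask) / 2                   # mid price
relative_spread = spread / mid          # spread as a fraction of the mid price

print(f"spread = {spread:.4f} ({spread / 0.0001:.0f} pips)")
print(f"relative spread = {relative_spread:.6f} ({relative_spread:.4%})")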
[edit]Example: Stock spread
On United States stock exchanges, the minimum spread (also known as the tick size) for many
shares was 12.5 cents (one-eighth of a dollar) until 2001, when the exchanges converted from
fractional to decimal pricing, enabling spreads as small as one cent. The change was mandated
by the U.S. Securities and Exchange Commission in order to provide a fairer market for the
individual investor.

Market depth
From Wikipedia, the free encyclopedia
Jump to: navigation, search

In finance, market depth is the size of an order needed to move the market a given amount. If
the market is deep, a large order is needed to change the price. Market depth closely relates to the
notion of liquidity, the ease to find a trading partner for a given order: a deep market is also a
liquid market.
Factors influencing market depth include:
• Tick size. This refers to the minimum price increment at which trades may be
made on the market. The major stock markets in the United States went
through a process of decimalisation in April 2001. This switched the minimum
increment from a sixteenth to a one hundredth of a dollar. This decision
improved market depth.[1]
• Price movement restrictions. Most major financial markets do not allow
completely free exchange of the products they trade, but instead restrict
price movement in well-intentioned ways. These include session price change
limits on major commodity markets and program trading curbs on the NYSE,
which disallow certain large basket trades after the Dow Jones Industrial
Average has moved up or down 200 points in a session.
• Trading restrictions. These include futures contract and options position
limits, as well as the widely used uptick rule for US stocks. These prevent
market participants from adding to depth when they might otherwise choose
to do so.
• Allowable leverage. Major markets and governing bodies typically set
minimum margin requirements for trading various products. While this may
act to stabilize the marketplace, it decreases the market depth simply
because participants otherwise willing to take on very high leverage cannot
do so without providing more capital.
• Market transparency. While the latest bid or ask price is usually available
for most participants, additional information about the size of these offers
and pending bids or offers that are not the best are sometimes hidden for
reasons of technical complexity or simplicity. This decrease in available
information can affect the willingness of participants to add to market depth.
In some cases, the term refers to financial data feeds available from exchanges or brokers. An
example would be NASDAQ Level II quote data.

Slippage (finance)
From Wikipedia, the free encyclopedia
Jump to: navigation, search

This article is about the financial concept. For other uses, see Slippage.

With regards to futures contracts as well as other financial instruments, slippage is the difference
between estimated transaction costs and the amount actually paid. Brokers may not always be
effective enough at executing orders. Market impact, liquidity, and frictional costs may also
contribute. Algorithmic trading is often used to reduce slippage.
[edit]Measurement
Using initial mid price
Taleb (1997) defines slippage as the difference between the average execution price and the
initial midpoint of the bid and the offer for a given quantity to be executed.

Using initial execution price


Knight and Satchell mention a flow trader needs to consider the effect of executing a large order
on the market and to adjust the bid-ask spread accordingly. They calculate the liquidity cost as
the difference of the execution price and the initial execution price.
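A minimal sketch of the mid-price definition of slippage described above, using an invented quote and invented partial fills for a hypothetical buy order:

# Quote at the time the (hypothetical) buy order is sent
bid, ask = 99.95, 100.05
initial_mid = (bid + ask) / 2

# Invented partial fills: (price, quantity) as the order walks up the book
fills = [(100.05, 300), (100.08, 400), (100.12, 300)]

total_qty = sum(q for _, q in fills)
avg_execution_price = sum(p * q for p, q in fills) / total_qty

# Slippage per unit: average execution price versus the initial mid price
slippage_per_unit = avg_execution_price - initial_mid
print(f"average execution price = {avg_execution_price:.4f}")
print(f"slippage vs initial mid = {slippage_per_unit:.4f} per unit "
      f"({slippage_per_unit * total_qty:.2f} total)")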

Speculation
In statistical terms, I figure I have traded about 2 million contracts, with an average profit of $70
per contract (after slippage of perhaps $20). This average is approximately 700 standard
deviations away from randomness. [1]
[edit]Reverse Slippage
Reverse slippage as described by Taleb occurs when the purchase of a large position is done at
increasing prices, so that the mark to market value of the position increases. The danger occurs
when the trader attempts to exit his position. If the trader manages to create a squeeze large
enough then this phenomenon can be profitable.
[edit]Leveraged portfolio
A portfolio of securities that is leveraged with borrowed funds will encounter magnified slippage,
since the portfolio's increases and decreases are multiplied by the leverage (see leverage (finance)).
• Nassim Taleb (1997). Dynamic Hedging managing vanilla and exotic options.
John Wiley & Sons. ISBN 978-0471152804.
• John L. Knight, Stephen Satchell (2003). Forecasting Volatility in the Financial
Markets. Butterworth-Heinemann. ISBN 978-0750655156.

Value at risk
From Wikipedia, the free encyclopedia
Jump to: navigation, search

In financial mathematics and financial risk management, Value at Risk (VaR) is a widely used
measure of the risk of loss on a specific portfolio of financial assets. For a given portfolio,
probability and time horizon, VaR is defined as a threshold value such that the probability that
the mark-to-market loss on the portfolio over the given time horizon exceeds this value
(assuming normal markets and no trading in the portfolio) is the given probability level.[1]
For example, if a portfolio of stocks has a one-day 5% VaR of $1 million, there is a 5%
probability that the portfolio will fall in value by more than $1 million over a one day period,
assuming markets are normal and there is no trading. Informally, a loss of $1 million or more on
this portfolio is expected on 1 day in 20. A loss which exceeds the VaR threshold is termed a
“VaR break.”[2]
The 10% Value at Risk of normally distributed portfolio returns

VaR has five main uses in finance: risk management, risk measurement, financial control,
financial reporting and computing regulatory capital. VaR is sometimes used in non-financial
applications as well.[3]
Important related ideas are economic capital, backtesting, stress testing and expected shortfall.[4]
[edit]Details
Common parameters for VaR are 1% and 5% probabilities and one day and two week horizons,
although other combinations are in use.[5]
The reason for assuming normal markets and no trading, and for restricting the loss to things
measured in daily accounts, is to make the loss observable. In some extreme financial events it
can be impossible to determine losses, either because market prices are unavailable or because
the loss-bearing institution breaks up. Some longer-term consequences of disasters, such as
lawsuits, loss of market confidence and employee morale and impairment of brand names can
take a long time to play out, and may be hard to allocate among specific prior decisions. VaR
marks the boundary between normal days and extreme events. Institutions can lose far more than
the VaR amount; all that can be said is that they will not do so very often.[6]
The probability level is about equally often specified as one minus the probability of a VaR
break, so that the VaR in the example above would be called a one-day 95% VaR instead of one-
day 5% VaR. This generally does not lead to confusion because the probability of VaR breaks is
almost always small, certainly less than 0.5.[1]
Although it virtually always represents a loss, VaR is conventionally reported as a positive
number. A negative VaR would imply the portfolio has a high probability of making a profit, for
example a one-day 5% VaR of negative $1 million implies the portfolio has a 95% chance of
making more than $1 million over the next day. [7]
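As a rough illustration of the definition (using simulated returns and an assumed portfolio value, none of it from the text), a one-day 5% VaR can be estimated either from an empirical quantile of historical returns or parametrically under a normal assumption:

import numpy as np

rng = np.random.default_rng(42)
portfolio_value = 10_000_000                      # assumed portfolio value
daily_returns = rng.normal(0.0005, 0.01, 500)     # simulated daily returns

# Historical (empirical) one-day 5% VaR: 5th percentile loss, reported as a positive number
hist_var = -np.percentile(daily_returns, 5) * portfolio_value

# Parametric (normal) one-day 5% VaR using the sample mean and standard deviation
mu, sigma = daily_returns.mean(), daily_returns.std(ddof=1)
z_5pct = 1.645                                    # 5% one-sided normal quantile
param_var = (z_5pct * sigma - mu) * portfolio_value

print(f"Historical 5% one-day VaR: {hist_var:,.0f}")
print(f"Parametric 5% one-day VaR: {param_var:,.0f}")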
[edit]Varieties of VaR
The definition of VaR is nonconstructive: it specifies a property VaR must have, but not how to
compute VaR. Moreover, there is wide scope for interpretation in the definition.[8] This has led to
two broad types of VaR, one used primarily in risk management and the other primarily for risk
measurement. The distinction is not sharp, however, and hybrid versions are typically used in
financial control, financial reporting and computing regulatory capital. [9]
To a risk manager, VaR is a system, not a number. The system is run periodically (usually daily)
and the published number is compared to the computed price movement in opening positions
over the time horizon. There is never any subsequent adjustment to the published VaR, and there
is no distinction between VaR breaks caused by input errors (including Information Technology
breakdowns, fraud and rogue trading), computation errors (including failure to produce a VaR on
time) and market movements.[10]
A frequentist claim is made, that the long-term frequency of VaR breaks will equal the specified
probability, within the limits of sampling error, and that the VaR breaks will be independent in
time and independent of the level of VaR. This claim is validated by a backtest, a comparison of
published VaRs to actual price movements. In this interpretation, many different systems could
produce VaRs with equally good backtests, but wide disagreements on daily VaR values.[1]
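A minimal sketch of such a backtest (hypothetical data; it assumes the published VaRs and the realised P&L are available as arrays) simply counts breaks and compares their frequency with the stated probability; a fuller test would also check that the breaks are independent in time:

```python
import numpy as np

def count_var_breaks(published_var, realised_pnl):
    """Count days on which the realised loss exceeded the published (positive) VaR."""
    losses = -np.asarray(realised_pnl)               # losses as positive numbers
    breaks = losses > np.asarray(published_var)
    return int(breaks.sum()), float(breaks.mean())

# Hypothetical example: 500 days of P&L and a constant published 5% VaR.
rng = np.random.default_rng(1)
pnl = rng.normal(0.0, 1.0, 500)
published = np.full(500, 1.645)                      # 5% VaR under a unit-normal assumption
n_breaks, freq = count_var_breaks(published, pnl)
print(f"{n_breaks} breaks, observed frequency {freq:.1%} vs. 5% specified")
```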
For risk measurement a number is needed, not a system. A Bayesian probability claim is made,
that given the information and beliefs at the time, the subjective probability of a VaR break was
the specified level. VaR is adjusted after the fact to correct errors in inputs and computation, but
not to incorporate information unavailable at the time of computation.[7] In this context,
“backtest” has a different meaning. Rather than comparing published VaRs to actual market
movements over the period of time the system has been in operation, VaR is retroactively
computed on scrubbed data over as long a period as data are available and deemed relevant. The
same position data and pricing models are used for computing the VaR as determining the price
movements.[2]
Although some of the sources listed here treat only one kind of VaR as legitimate, most of the
recent ones seem to agree that risk management VaR is superior for making short-term and
tactical decisions today, while risk measurement VaR should be used for understanding the past,
and making medium term and strategic decisions for the future. When VaR is used for financial
control or financial reporting it should incorporate elements of both. For example, if a trading
desk is held to a VaR limit, that is both a risk-management rule for deciding what risks to allow
today, and an input into the risk measurement computation of the desk’s risk-adjusted return at
the end of the reporting period.[4]
[edit]VaR in governance
An interesting application of VaR is in governance for endowments, trusts, and pension plans. Essentially, trustees adopt portfolio Value-at-Risk metrics for the entire pooled account and for the diversified parts that are individually managed. Instead of probability estimates they simply define maximum levels of acceptable loss for each. Doing so provides an easy metric for oversight and adds accountability, as managers are directed to manage within the additional constraint of avoiding losses beyond a defined risk parameter. VaR used in this manner adds relevance as well as an easy-to-monitor risk measurement control that is far more intuitive than standard deviation of return. Use of VaR in this context, as well as a worthwhile critique of board governance practices as they relate to investment management oversight in general, can be found in "Best Practices in Governance".[11]
[edit]Risk measure and risk metric
The term “VaR” is used both for a risk measure and a risk metric. This sometimes leads to
confusion. Sources earlier than 1995 usually emphasize the risk measure; later sources are more
likely to emphasize the metric.
The VaR risk measure defines risk as mark-to-market loss on a fixed portfolio over a fixed time
horizon, assuming normal markets. There are many alternative risk measures in finance. Instead
of mark-to-market, which uses market prices to define loss, loss is often defined as change in
fundamental value. For example, if an institution holds a loan that declines in market price
because interest rates go up, but has no change in cash flows or credit quality, some systems do
not recognize a loss. Or we could try to incorporate the economic cost of things not measured in
daily financial statements, such as loss of market confidence or employee morale, impairment of
brand names or lawsuits.[4]
Rather than assuming a fixed portfolio over a fixed time horizon, some risk measures incorporate
the effect of expected trading (such as a stop loss order) and consider the expected holding
period of positions. Finally, some risk measures adjust for the possible effects of abnormal
markets, rather than excluding them from the computation.[4]
The VaR risk metric summarizes the distribution of possible losses by a quantile, a point with a
specified probability of greater losses. Common alternative metrics are standard deviation, mean
absolute deviation, expected shortfall and downside risk.[1]
[edit]VaR risk management
Supporters of VaR-based risk management claim the first and possibly greatest benefit of VaR is
the improvement in systems and modeling it forces on an institution. In 1997, Philippe Jorion
wrote:[12]
[T]he greatest benefit of VAR lies in the imposition of a structured methodology for critically thinking
about risk. Institutions that go through the process of computing their VAR are forced to confront their
exposure to financial risks and to set up a proper risk management function. Thus the process of getting to
VAR may be as important as the number itself.
Publishing a daily number, on-time and with specified statistical properties holds every part of a
trading organization to a high objective standard. Robust backup systems and default
assumptions must be implemented. Positions that are reported, modeled or priced incorrectly
stand out, as do data feeds that are inaccurate or late and systems that are too-frequently down.
Anything that affects profit and loss that is left out of other reports will show up either in inflated
VaR or excessive VaR breaks. “A risk-taking institution that does not compute VaR might
escape disaster, but an institution that cannot compute VaR will not.” [13]
The second claimed benefit of VaR is that it separates risk into two regimes. Inside the VaR
limit, conventional statistical methods are reliable. Relatively short-term and specific data can be
used for analysis. Probability estimates are meaningful, because there are enough data to test
them. In a sense, there is no true risk because you have a sum of many independent observations
with a left bound on the outcome. A casino doesn't worry about whether red or black will come
up on the next roulette spin. Risk managers encourage productive risk-taking in this regime,
because there is little true cost. People tend to worry too much about these risks, because they
happen frequently, and not enough about what might happen on the worst days.[14]
Outside the VaR limit, all bets are off. Risk should be analyzed with stress testing based on long-
term and broad market data.[15] Probability statements are no longer meaningful.[16] Knowing the
distribution of losses beyond the VaR point is both impossible and useless. The risk manager
should concentrate instead on making sure good plans are in place to limit the loss if possible,
and to survive the loss if not.[1]
One specific system uses three regimes.[17]
1. Out to three times VaR are normal occurrences. You expect periodic VaR
breaks. The loss distribution typically has fat tails, and you might get more
than one break in a short period of time. Moreover, markets may be
abnormal and trading may exacerbate losses, and you may take losses not
measured in daily marks such as lawsuits, loss of employee morale and
market confidence and impairment of brand names. So an institution that
can't deal with three times VaR losses as routine events probably won't
survive long enough to put a VaR system in place.
2. Three to ten times VaR is the range for stress testing. Institutions should be
confident they have examined all the foreseeable events that will cause
losses in this range, and are prepared to survive them. These events are too
rare to estimate probabilities reliably, so risk/return calculations are useless.
3. Foreseeable events should not cause losses beyond ten times VaR. If they do
they should be hedged or insured, or the business plan should be changed to
avoid them, or VaR should be increased. It's hard to run a business if
foreseeable losses are orders of magnitude larger than very large everyday
losses. It's hard to plan for these events, because they are out of scale with
daily experience. Of course there will be unforeseeable losses more than ten
times VaR, but it's pointless to anticipate them, you can't know much about
them and it results in needless worrying. Better to hope that the discipline of
preparing for all foreseeable three-to-ten times VaR losses will improve
chances for surviving the unforeseen and larger losses that inevitably occur.
"A risk manager has two jobs: make people take more risk the 99% of the time it is safe to do so,
and survive the other 1% of the time. VaR is the border."[13]
[edit]VaR risk measurement
The VaR risk measure is a popular way to aggregate risk across an institution. Individual
business units have risk measures such as duration for a fixed income portfolio or beta for an
equity business. These cannot be combined in a meaningful way.[1] It is also difficult to aggregate
results available at different times, such as positions marked in different time zones, or a high
frequency trading desk with a business holding relatively illiquid positions. But since every
business contributes to profit and loss in an additive fashion, and many financial businesses
mark-to-market daily, it is natural to define firm-wide risk using the distribution of possible
losses at a fixed point in the future.[4]
In risk measurement, VaR is usually reported alongside other risk metrics such as standard
deviation, expected shortfall and “greeks” (partial derivatives of portfolio value with respect to
market factors). VaR is a distribution-free metric, that is, it does not depend on assumptions about
the probability distribution of future gains and losses.[13] The probability level is chosen deep
enough in the left tail of the loss distribution to be relevant for risk decisions, but not so deep as
to be difficult to estimate with accuracy.[18]
Risk measurement VaR is sometimes called parametric VaR. This usage can be confusing,
however, because it can be estimated either parametrically (for examples, variance-covariance
VaR or delta-gamma VaR) or nonparametrically (for examples, historical simulation VaR or
resampled VaR). The inverse usage makes more logical sense, because risk management VaR is
fundamentally nonparametric, but it is seldom referred to as nonparametric VaR.[4][6]
[edit]History of VaR
The problem of risk measurement is an old one in statistics, economics and finance. Financial
risk management has been a concern of regulators and financial executives for a long time as
well. Retrospective analysis has found some VaR-like concepts in this history. But VaR did not
emerge as a distinct concept until the late 1980s. The triggering event was the stock market crash
of 1987. This was the first major financial crisis in which a lot of academically-trained quants
were in high enough positions to worry about firm-wide survival.[1]
The crash was so unlikely given standard statistical models, that it called the entire basis of quant
finance into question. A reconsideration of history led some quants to decide there were
recurring crises, about one or two per decade, that overwhelmed the statistical assumptions
embedded in models used for trading, investment management and derivative pricing. These
affected many markets at once, including ones that were usually not correlated, and seldom had
discernible economic cause or warning (although after-the-fact explanations were plentiful).[16]
Much later, they were named "Black Swans" by Nassim Taleb and the concept extended far
beyond finance.[19]
If these events were included in quantitative analysis they dominated results and led to strategies
that did not work day to day. If these events were excluded, the profits made in between "Black
Swans" could be much smaller than the losses suffered in the crisis. Institutions could fail as a
result.[13][16][19]
VaR was developed as a systematic way to segregate extreme events, which are studied
qualitatively over long-term history and broad market events, from everyday price movements,
which are studied quantitatively using short-term data in specific markets. It was hoped that
"Black Swans" would be preceded by increases in estimated VaR or increased frequency of VaR
breaks, in at least some markets. The extent to which this has proven to be true is controversial.[16]
Abnormal markets and trading were excluded from the VaR estimate in order to make it
observable.[14] It is not always possible to define loss if, for example, markets are closed as after
9/11, or severely illiquid, as happened several times in 2008.[13] Losses can also be hard to define
if the risk-bearing institution fails or breaks up.[14] A measure that depends on traders taking
certain actions, and avoiding other actions, can lead to self reference.[1]
This is risk management VaR. It was well-established in quantitative trading groups at several
financial institutions, notably Bankers Trust, before 1990, although neither the name nor the
definition had been standardized. There was no effort to aggregate VaRs across trading desks.[16]
The financial events of the early 1990s found many firms in trouble because the same underlying
bet had been made at many places in the firm, in non-obvious ways. Since many trading desks
already computed risk management VaR, and it was the only common risk measure that could be
both defined for all businesses and aggregated without strong assumptions, it was the natural
choice for reporting firmwide risk. J. P. Morgan CEO Dennis Weatherstone famously called for
a “4:15 report” that combined all firm risk on one page, available within 15 minutes of the
market close.[8]
Risk measurement VaR was developed for this purpose. Development was most extensive at J.
P. Morgan, which published the methodology and gave free access to estimates of the necessary
underlying parameters in 1994. This was the first time VaR had been exposed beyond a
relatively small group of quants. Two years later, the methodology was spun off into an
independent for-profit business now part of RiskMetrics Group.[8]
In 1997, the U.S. Securities and Exchange Commission ruled that public corporations must
disclose quantitative information about their derivatives activity. Major banks and dealers chose
to implement the rule by including VaR information in the notes to their financial statements.[1]
Worldwide adoption of the Basel II Accord, beginning in 1999 and nearing completion today,
gave further impetus to the use of VaR. VaR is the preferred measure of market risk, and
concepts similar to VaR are used in other parts of the accord.[1]
[edit]Mathematics
"Given some confidence level the VaR of the portfolio at the confidence level α is
given by the smallest number l such that the probability that the loss L exceeds l is not larger
than (1 − α)"[3]
VaR_α(L) = inf{ l ∈ ℝ : P(L > l) ≤ 1 − α } = inf{ l ∈ ℝ : F_L(l) ≥ α }
The left equality is a definition of VaR. The right equality assumes an underlying probability
distribution, which makes it true only for parametric VaR. Risk managers typically assume that
some fraction of the bad events will have undefined losses, either because markets are closed or
illiquid, or because the entity bearing the loss breaks apart or loses the ability to compute
accounts. Therefore, they do not accept results based on the assumption of a well-defined
probability distribution.[6] Nassim Taleb has labeled this assumption, "charlatanism."[20] On the
other hand, many academics prefer to assume a well-defined distribution, albeit usually one with
fat tails.[1] This point has probably caused more contention among VaR theorists than any other.[8]
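As an illustration of the definition (a sketch with hypothetical figures, not an endorsement of any particular distributional assumption), VaR can be computed either as an empirical quantile of a loss sample or in closed form under an assumed normal loss distribution:

```python
import numpy as np
from scipy.stats import norm

alpha = 0.99                          # confidence level
mu, sigma = 0.0, 1_000_000            # hypothetical mean and std. dev. of the one-day loss L

# Parametric VaR under an assumed normal loss distribution: the alpha-quantile of L,
# i.e. the smallest l with P(L > l) <= 1 - alpha.
var_parametric = mu + sigma * norm.ppf(alpha)

# Non-parametric VaR: the alpha-quantile of a historical or simulated loss sample.
losses = np.random.default_rng(2).normal(mu, sigma, 100_000)
var_empirical = np.quantile(losses, alpha)

print(f"parametric 99% VaR: {var_parametric:,.0f}")
print(f"empirical 99% VaR:  {var_empirical:,.0f}")
```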
[edit]Criticism
VaR has been controversial since it moved from trading desks into the public eye in 1994. A
famous 1997 debate between Nassim Taleb and Philippe Jorion set out some of the major points
of contention. Taleb claimed VaR:[21]
1. Ignored 2,500 years of experience in favor of untested models built by non-
traders
2. Was charlatanism because it claimed to estimate the risks of rare events,
which is impossible
3. Gave false confidence
4. Would be exploited by traders
More recently David Einhorn and Aaron Brown debated VaR in the Global Association of Risk
Professionals Review.[13][22] Einhorn compared VaR to “an airbag that works all the time, except
when you have a car accident.” He further charged that VaR:
1. Led to excessive risk-taking and leverage at financial institutions
2. Focused on the manageable risks near the center of the distribution and
ignored the tails
3. Created an incentive to take “excessive but remote risks”
4. Was “potentially catastrophic when its use creates a false sense of security
among senior executives and watchdogs.”
New York Times reporter Joe Nocera wrote an extensive piece Risk Mismanagement[23] on
January 4, 2009 discussing the role VaR played in the Financial crisis of 2007-2008. After
interviewing risk managers (including several of the ones cited above) the article suggests that
VaR was very useful to risk experts, but nevertheless exacerbated the crisis by giving false
security to bank executives and regulators. A powerful tool for professional risk managers, VaR
is portrayed as both easy to misunderstand, and dangerous when misunderstood.
A common complaint among academics is that VaR is not subadditive.[4] That means the VaR of
a combined portfolio can be larger than the sum of the VaRs of its components. To a practicing
risk manager this makes sense. For example, the average bank branch in the United States is
robbed about once every ten years. A single-branch bank has about 0.004% chance of being
robbed on a specific day, so the risk of robbery would not figure into one-day 1% VaR. It would
not even be within an order of magnitude of that, so it is in the range where the institution should
not worry about it; it should insure against it and take advice from insurers on precautions. The
whole point of insurance is to aggregate risks that are beyond individual VaR limits, and bring
them into a large enough portfolio to get statistical predictability. It does not pay for a one-
branch bank to have a security expert on staff.
As institutions get more branches, the risk of a robbery on a specific day rises to within an order
of magnitude of VaR. At that point it makes sense for the institution to run internal stress tests
and analyze the risk itself. It will spend less on insurance and more on in-house expertise. For a
very large banking institution, robberies are a routine daily occurrence. Losses are part of the
daily VaR calculation, and tracked statistically rather than case-by-case. A sizable in-house
security department is in charge of prevention and control, the general risk manager just tracks
the loss like any other cost of doing business.
As portfolios or institutions get larger, specific risks change from low-probability/low-
predictability/high-impact to statistically predictable losses of low individual impact. That means
they move from the range of far outside VaR, to be insured, to near outside VaR, to be analyzed
case-by-case, to inside VaR, to be treated statistically.[13]
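The non-subadditivity point can be made with a small numerical sketch (hypothetical, insurance-style exposures): two independent positions that each lose 100 with 4% probability have zero 5% VaR individually, yet a strictly positive 5% VaR when combined.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
loss_a = np.where(rng.random(n) < 0.04, 100.0, 0.0)   # loses 100 with 4% probability
loss_b = np.where(rng.random(n) < 0.04, 100.0, 0.0)   # independent, same distribution

def var(losses, alpha=0.95):
    return np.quantile(losses, alpha)                  # alpha-quantile of the loss

print("VaR(A):  ", var(loss_a))           # 0: a 4% loss probability sits inside the 5% tail
print("VaR(B):  ", var(loss_b))           # 0
print("VaR(A+B):", var(loss_a + loss_b))  # 100: P(loss >= 100) is about 7.8% > 5%
```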
Even VaR supporters generally agree there are common abuses of VaR:[6][8]
1. Referring to VaR as a "worst-case" or "maximum tolerable" loss. In fact, you
expect two or three losses per year that exceed one-day 1% VaR.
2. Making VaR control or VaR reduction the central concern of risk
management. It is far more important to worry about what happens when
losses exceed VaR.
3. Assuming plausible losses will be less than some multiple, often three, of
VaR. The entire point of VaR is that losses can be extremely large, and
sometimes impossible to define, once you get beyond the VaR point. To a risk
manager, VaR is the level of losses at which you stop trying to guess what
will happen next, and start preparing for anything.
4. Reporting a VaR that has not passed a backtest. Regardless of how VaR is
computed, it should have produced the correct number of breaks (within
sampling error) in the past. A common specific violation of this is to report a
VaR based on the unverified assumption that everything follows a
multivariate normal distribution.

Arbitrage pricing theory


From Wikipedia, the free encyclopedia

Arbitrage pricing theory (APT), in finance, is a general theory of asset pricing, that has
become influential in the pricing of stocks.
APT holds that the expected return of a financial asset can be modeled as a linear function of
various macro-economic factors or theoretical market indices, where sensitivity to changes in
each factor is represented by a factor-specific beta coefficient. The model-derived rate of return
will then be used to price the asset correctly - the asset price should equal the expected end of
period price discounted at the rate implied by model. If the price diverges, arbitrage should bring
it back into line.
The theory was initiated by the economist Stephen Ross in 1976.
[edit]The APT model
Risky asset returns are said to follow a factor structure if they can be expressed as:
rj = E(rj) + bj1F1 + bj2F2 + … + bjnFn + εj
where
• E(rj) is the jth asset's expected return,
• Fk is a systematic factor (assumed to have mean zero),
• bjk is the sensitivity of the jth asset to factor k, also called the factor loading,
• and εj is the risky asset's idiosyncratic random shock with mean zero.
Idiosyncratic shocks are assumed to be uncorrelated across assets and uncorrelated with the
factors.
The APT states that if asset returns follow a factor structure then the following relation exists
between expected returns and the factor sensitivities:
E(rj) = rf + bj1RP1 + bj2RP2 + … + bjnRPn
where
• RPk is the risk premium of the kth factor,
• rf is the risk-free rate.
That is, the expected return of an asset j is a linear function of the asset's sensitivities to the n
factors.
Note that there are some assumptions and requirements that have to be fulfilled for this relation to
hold: there must be perfect competition in the market, and the total number of factors may
never surpass the total number of assets (in order to avoid the problem of matrix singularity).
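As an illustrative sketch only (simulated data; the factor identities and risk premia are hypothetical), the loadings bjk can be estimated by regressing an asset's excess returns on the factor realisations, and the APT expected return then follows from the factor risk premia:

```python
import numpy as np

rng = np.random.default_rng(4)
T, n_factors = 500, 3

F = rng.normal(0.0, 0.02, size=(T, n_factors))    # factor realisations, mean zero
true_b = np.array([1.2, -0.5, 0.8])               # the asset's true loadings (unknown in practice)
rf = 0.0001                                       # hypothetical per-period risk-free rate
eps = rng.normal(0.0, 0.01, size=T)               # idiosyncratic shocks
r = rf + F @ true_b + eps                         # returns following the factor structure

# Estimate the loadings by ordinary least squares of excess returns on the factors.
X = np.column_stack([np.ones(T), F])
coef, *_ = np.linalg.lstsq(X, r - rf, rcond=None)
b_hat = coef[1:]

RP = np.array([0.004, 0.002, 0.003])              # hypothetical factor risk premia
expected_return = rf + b_hat @ RP                 # E(r) = rf + sum_k b_k * RP_k
print("estimated loadings:", np.round(b_hat, 2))
print("APT expected return per period:", round(float(expected_return), 5))
```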
[edit]Arbitrage and the APT
Arbitrage is the practice of taking advantage of a state of imbalance between two (or possibly
more) markets and thereby making a risk-free profit; see Rational pricing.
[edit]Arbitrage in expectations
The CAPM and its extensions are based on specific assumptions about investors' asset demand. For example:
• Investors care only about mean return and variance.
• Investors hold only traded assets.
The APT, by contrast, relies on arbitrage in expectations: arbitrage by investors is assumed to eliminate any mispricing relative to the factor model, even though such a trade locks in only a positive expected payoff rather than a guaranteed one.
[edit]Arbitrage mechanics
In the APT context, arbitrage consists of trading in two assets – with at least one being
mispriced. The arbitrageur sells the asset which is relatively too expensive and uses the proceeds
to buy one which is relatively too cheap.
Under the APT, an asset is mispriced if its current price diverges from the price predicted by the
model. The asset price today should equal the sum of all future cash flows discounted at the APT
rate, where the expected return of the asset is a linear function of various factors, and sensitivity
to changes in each factor is represented by a factor-specific beta coefficient.
A correctly priced asset here may be in fact a synthetic asset - a portfolio consisting of other
correctly priced assets. This portfolio has the same exposure to each of the macroeconomic
factors as the mispriced asset. The arbitrageur creates the portfolio by identifying n + 1 correctly
priced assets (one per factor plus one) and then weighting the assets such that portfolio beta per
factor is the same as for the mispriced asset.
When the investor is long the asset and short the portfolio (or vice versa) he has created a
position which has a positive expected return (the difference between asset return and portfolio
return) and which has a net-zero exposure to any macroeconomic factor and is therefore risk free
(other than for firm specific risk). The arbitrageur is thus in a position to make a risk-free profit:
Where today's price is too low:
The implication is that at the end of the period the portfolio would have
appreciated at the rate implied by the APT, whereas the mispriced asset
would have appreciated at more than this rate. The arbitrageur could
therefore:

Today:

1 short sell the portfolio

2 buy the mispriced asset with the proceeds.

At the end of the period:

1 sell the mispriced asset

2 use the proceeds to buy back the portfolio

3 pocket the difference.

Where today's price is too high:


The implication is that at the end of the period the portfolio would have
appreciated at the rate implied by the APT, whereas the mispriced asset
would have appreciated at less than this rate. The arbitrageur could
therefore:
Today:

1 short sell the mispriced asset

2 buy the portfolio with the proceeds.

At the end of the period:

1 sell the portfolio

2 use the proceeds to buy back the mispriced asset

3 pocket the difference.
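A minimal sketch (hypothetical betas, using numpy) of the portfolio-construction step above: with n factors, take n + 1 correctly priced assets and solve a small linear system so that the weights sum to one and the portfolio's beta on each factor matches that of the mispriced asset:

```python
import numpy as np

# Hypothetical data: 2 factors, so 3 correctly priced assets are used.
B = np.array([[1.0, 0.2],      # factor betas of asset 1
              [0.5, 1.1],      # factor betas of asset 2
              [0.0, 0.4]])     # factor betas of asset 3
beta_mispriced = np.array([0.7, 0.6])

# Conditions: weights sum to 1, and the portfolio beta per factor matches the mispriced asset.
A = np.vstack([np.ones(3), B.T])                   # [sum-to-one row; one row per factor]
rhs = np.concatenate([[1.0], beta_mispriced])
w = np.linalg.solve(A, rhs)

print("portfolio weights:", np.round(w, 3))
print("portfolio betas:  ", np.round(B.T @ w, 3))  # equal to beta_mispriced
```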

[edit]Relationship with the capital asset pricing model


The APT along with the capital asset pricing model (CAPM) is one of two influential theories on
asset pricing. The APT differs from the CAPM in that it is less restrictive in its assumptions. It
allows for an explanatory (as opposed to statistical) model of asset returns. It assumes that each
investor will hold a unique portfolio with its own particular array of betas, as opposed to the
identical "market portfolio". In some ways, the CAPM can be considered a "special case" of the
APT in that the securities market line represents a single-factor model of the asset price, where
beta is the exposure to changes in the value of the market.
Additionally, the APT can be seen as a "supply-side" model, since its beta coefficients reflect the
sensitivity of the underlying asset to economic factors. Thus, factor shocks would cause
structural changes in assets' expected returns, or, in the case of stocks, in firms' profitability.
On the other side, the capital asset pricing model is considered a "demand side" model. Its
results, although similar to those of the APT, arise from a maximization problem of each
investor's utility function, and from the resulting market equilibrium (investors are considered to
be the "consumers" of the assets).
[edit]Using the APT
[edit]Identifying the factors
As with the CAPM, the factor-specific Betas are found via a linear regression of historical
security returns on the factor in question. Unlike the CAPM, the APT, however, does not itself
reveal the identity of its priced factors - the number and nature of these factors is likely to change
over time and between economies. As a result, this issue is essentially empirical in nature.
Several a priori guidelines as to the characteristics required of potential factors are, however,
suggested:
1. their impact on asset prices manifests in their unexpected movements
2. they should represent undiversifiable influences (these are, clearly, more
likely to be macroeconomic rather than firm-specific in nature)
3. timely and accurate information on these variables is required
4. the relationship should be theoretically justifiable on economic grounds
Chen, Roll and Ross (1986) identified the following macro-economic factors as significant in
explaining security returns:
• surprises in inflation;
• surprises in GNP as indicated by an industrial production index;
• surprises in investor confidence due to changes in default premium in
corporate bonds;
• surprise shifts in the yield curve.
As a practical matter, indices or spot or futures market prices may be used in place of macro-
economic factors, which are reported at low frequency (e.g. monthly) and often with significant
estimation errors. Market indices are sometimes derived by means of factor analysis. More direct
"indices" that might be used are:
• short term interest rates;
• the difference in long-term and short-term interest rates;
• a diversified stock index such as the S&P 500 or NYSE Composite Index;
• oil prices;
• gold or other precious metal prices;
• currency exchange rates.
[edit]APT and asset management
The linear factor model structure of the APT is used as the basis for many of the commercial risk
systems employed by asset managers. These include MSCI Barra, APT, Northfield and Axioma.

Liquidity premium
From Wikipedia, the free encyclopedia

Liquidity premium is a term used to explain a difference between two types of financial
securities (e.g. stocks) that have all the same qualities except liquidity. For example:
Liquidity premium is a segment of a three-part theory that works to explain the behavior of
yield curves for interest rates. The upwards-curving component of the interest yield can be
explained by the liquidity premium. The reason behind this is that short-term securities are less
risky than long-term securities because of the difference in maturity dates. Therefore investors
expect a premium, or risk premium, for investing in the riskier, longer-dated security.
or
Assets that are traded on an organized market are more liquid. Financial disclosure requirements
are more stringent for quoted companies. For a given economic result, organized liquidity and
transparency make the value of a quoted share higher than the market value of an unquoted share.
The difference in the prices of two assets, which are similar in all aspects except liquidity, is
called the liquidity premium.
Flight-to-quality
From Wikipedia, the free encyclopedia


A flight-to-quality is a stock market phenomenon occurring when investors sell what they
perceive to be higher-risk investments and purchase safer investments, such as US Treasuries,
gold or land. This is considered a sign of fear in the marketplace, as investors seek less risk in
exchange for lower profits.
It is also reflected in increased demand for government-backed assets and a corresponding
decline in demand for assets backed by private agents.

Fundamental theorem of arbitrage-free pricing
From Wikipedia, the free encyclopedia


In a general sense, the fundamental theorem of arbitrage/finance is a way to relate arbitrage
opportunities with risk neutral measures that are equivalent to the original probability measure.
[edit]In a finite state market
In a finite state market, the fundamental theorem of arbitrage has two parts. The first part relates
to existence of a risk neutral measure, while the second relates to the uniqueness of the measure
(see Harrison and Pliska):
1. The first part states that there is no arbitrage if and only if there exists a risk
neutral measure that is equivalent to the original probability measure.
2. The second part states that, assuming absence of arbitrage, a market is complete if
and only if there is a unique risk neutral measure that is equivalent to the
original probability measure.
The fundamental theorem of pricing is a way for the concept of arbitrage to be converted to a
question about whether or not a risk neutral measure exists.
[edit]In more general markets
When stock price returns follow a single Brownian motion, there is a unique risk neutral
measure. When the stock price process is assumed to follow a more general semi-martingale (see
Delbaen and Schachermayer), then the concept of arbitrage is too strong, and a weaker concept
such as no free lunch with vanishing risk must be used to describe these opportunities in an
infinite dimensional setting.

Forward measure
In finance, a T-forward measure is a pricing measure absolutely continuous with respect to a
risk-neutral measure but rather than using the money market as numeraire, it uses a bond with
maturity T.
[edit]Mathematical definition
Let
B(t) = exp( ∫_0^t r(u) du )
be the bank account or money market account numeraire and
P(0,T) = E_Q*[ 1 / B(T) ]
be the discount factor in the market at time 0 for maturity T. If Q* is the risk neutral measure,
then the forward measure Q_T is defined via the Radon–Nikodym derivative given by
dQ_T / dQ* = 1 / ( B(T) P(0,T) )
Note that this implies that the forward measure and the risk neutral measure coincide when
interest rates are deterministic. Also, this is a particular form of the change of numeraire formula
by changing the numeraire from the money market or bank account B(t) to a T-maturity bond
P(t,T). Indeed, if in general
P(t,T) = E_Q*[ B(t) / B(T) | F(t) ]
is the price of a zero coupon bond at time t for maturity T, where F(t) is the filtration denoting
market information at time t, then we can write
E_Q*[ dQ_T / dQ* | F(t) ] = P(t,T) / ( B(t) P(0,T) )
from which it is indeed clear that the forward T measure is associated to the T-maturity zero
coupon bond as numeraire. For a more detailed discussion see Brigo and Mercurio (2001).
[edit]Consequences
Under the forward measure, forward prices are martingales. Compare with futures prices, which
are martingales under the risk neutral measure. Note that when interest rates are deterministic,
this implies that forward prices and futures prices are the same.
For example, the discounted stock price is a martingale under the risk-neutral measure:
S(t) / B(t) = E_Q*[ S(T) / B(T) | F(t) ]
The forward price is given by F_S(t,T) = S(t) / P(t,T). Thus, we have F_S(T,T) = S(T) and
F_S(t,T) = S(t) / P(t,T) = E_QT[ S(T) / P(T,T) | F(t) ]
by the abstract Bayes' rule. The last term is equal to unity by definition of the bond price so that
we get
F_S(t,T) = E_QT[ S(T) | F(t) ]
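A small numerical sketch of the deterministic-rate case (hypothetical parameters; a risk-neutral geometric Brownian motion is assumed purely for illustration): with a constant rate r, P(0,T) = e^(−rT), so the forward price S(0)/P(0,T) coincides with the risk-neutral expectation of S(T), i.e. with the futures price.

```python
import numpy as np

S0, r, sigma, T = 100.0, 0.03, 0.2, 1.0
P0T = np.exp(-r * T)                         # zero-coupon bond price with deterministic rates

# Risk-neutral GBM simulation of S(T).
rng = np.random.default_rng(5)
z = rng.standard_normal(1_000_000)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)

forward_price = S0 / P0T                     # forward price (expectation under the T-forward measure)
futures_price = ST.mean()                    # futures price (risk-neutral expectation of S(T))
print(round(forward_price, 2), round(futures_price, 2))   # approximately equal
```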

Law of one price


From Wikipedia, the free encyclopedia

The law of one price is an economic law stated as: "In an efficient market all identical goods
must have only one price."
The intuition for this law is that all sellers will flock to the highest prevailing price, and all
buyers to the lowest current market price. In an efficient market the convergence on one price is
instant.
[edit]An example: Financial markets
Commodities can be traded on financial markets, where there will be a single offer price and a single bid
price. Although there is a small spread between these two values, the law of one price applies (to
each). No trader will sell the commodity at a lower price than the market maker's offer-level or
buy at a higher price than the market maker's bid-level. In either case moving away from the
prevailing price would either leave no takers, or be charity.
In the derivatives market the law applies to financial instruments which appear different, but
which resolve to the same set of cash flows; see Rational pricing. Thus:
"a security must have a single price, no matter how that security is created.
For example, if an option can be created using two different sets of
underlying securities, then the total price for each would be the same or else
an arbitrage opportunity would exist." A similar argument can be used by
considering Arrow securities as alluded to by Arrow and Debreu (1944).[1]
[edit]Where the law does not apply
• The law does not apply intertemporally, so prices for the same item can be
different at different times in one market. The application of the law to
financial markets in the example above is obscured by the fact that the
market maker's prices are continually moving in liquid markets. However, at
the moment each trade is executed, the law is in force (it would normally be
against exchange rules to break it).
• The law also need not apply if buyers have less than perfect information
about where to find the lowest price. In this case, sellers face a tradeoff
between the frequency and the profitability of their sales. That is, firms may
be indifferent between posting a high price (thus selling infrequently,
because most consumers will search for a lower one) and a low price (at
which they will sell more often, but earn less profit per sale).[2]
• The Balassa-Samuelson effect argues that the law of one price is not
applicable to all goods internationally, because some goods are not tradable.
It argues that the consumption may be cheaper in some countries than
others, because nontradables (especially land and labor) are cheaper in less
developed countries. This can make a typical consumption basket cheaper in
a less developed country, even if some goods in that basket have their prices
equalized by international trade.

[edit]Apparent violations
• The best-known example of an apparent violation of the law was Royal
Dutch / Shell shares. After merging in 1907, holders of Royal Dutch Petroleum
(traded in Amsterdam) and Shell Transport shares (traded in London) were
entitled to 60% and 40% respectively of all future profits. Royal Dutch shares
should therefore automatically have been priced at 50% more than Shell
shares. However, they diverged from this by up to 15%.[3] This discrepancy
disappeared with their final merger in 2005.

Rational pricing
From Wikipedia, the free encyclopedia

Rational pricing is the assumption in financial economics that asset prices (and hence asset
pricing models) will reflect the arbitrage-free price of the asset as any deviation from this price
will be "arbitraged away". This assumption is useful in pricing fixed income securities,
particularly bonds, and is fundamental to the pricing of derivative instruments.
[edit]Arbitrage mechanics
Arbitrage is the practice of taking advantage of a state of imbalance between two (or possibly
more) markets. Where this mismatch can be exploited (i.e. after transaction costs, storage costs,
transport costs, dividends etc.) the arbitrageur "locks in" a risk free profit without investing any
of his own money.
In general, arbitrage ensures that "the law of one price" will hold; arbitrage also equalises the
prices of assets with identical cash flows, and sets the price of assets with known future cash
flows.
[edit]The law of one price
The same asset must trade at the same price on all markets ("the law of one price"). Where this is
not true, the arbitrageur will:
1. buy the asset on the market where it has the lower price, and simultaneously
sell it (short) on the second market at the higher price
2. deliver the asset to the buyer and receive that higher price
3. pay the seller on the cheaper market with the proceeds and pocket the
difference.
[edit]Assets with identical cash flows
Two assets with identical cash flows must trade at the same price. Where this is not true, the
arbitrageur will:
1. sell the asset with the higher price (short sell) and simultaneously buy the
asset with the lower price
2. fund his purchase of the cheaper asset with the proceeds from the sale of the
expensive asset and pocket the difference
3. deliver on his obligations to the buyer of the expensive asset, using the cash
flows from the cheaper asset.
[edit]An asset with a known future-price
An asset with a known price in the future, must today trade at that price discounted at the risk
free rate.
Note that this condition can be viewed as an application of the above, where the two assets in
question are the asset to be delivered and the risk free asset.
(a) where the discounted future price is higher than today's price:
1. The arbitrageur agrees to deliver the asset on the future date (i.e. sells
forward) and simultaneously buys it today with borrowed money.
2. On the delivery date, the arbitrageur hands over the underlying, and receives
the agreed price.
3. He then repays the lender the borrowed amount plus interest.
4. The difference between the agreed price and the amount owed is the
arbitrage profit.
(b) where the discounted future price is lower than today's price:
1. The arbitrageur agrees to pay for the asset on the future date (i.e. buys
forward) and simultaneously sells (short) the underlying today; he invests the
proceeds.
2. On the delivery date, he cashes in the matured investment, which has
appreciated at the risk free rate.
3. He then takes delivery of the underlying and pays the agreed price using the
matured investment.
4. The difference between the maturity value and the agreed price is the
arbitrage profit.
It will be noted that (b) is only possible for those holding the asset but not needing it until the
future date. There may be few such parties if short-term demand exceeds supply, leading to
backwardation.
[edit]Fixed income securities
Rational pricing is one approach used in pricing fixed rate bonds. Here, each cash flow can be
matched by trading in (a) some multiple of a zero-coupon bond (ZCB) corresponding to the coupon
date, of equivalent credit worthiness (if possible, from the same issuer as the bond being
valued) and with the corresponding maturity, or (b) in a corresponding strip and ZCB.
Given that the cash flows can be replicated, the price of the bond must today equal the sum of
each of its cash flows discounted at the same rate as each ZCB, as above. Were this not the case,
arbitrage would be possible and would bring the price back into line with the price based on
ZCBs; see Bond valuation: Arbitrage-free pricing approach.

The pricing formula is as below, where each cash flow Ct is discounted at the rate rt which
matches that of the coupon date:
Price = C1 / (1 + r1) + C2 / (1 + r2)^2 + … + Cn / (1 + rn)^n
Often, the formula is expressed as
Price = C1 × P1 + C2 × P2 + … + Cn × Pn,
using prices instead of rates (here Pt is the price of a zero-coupon bond paying 1 at coupon date t), as
prices are more readily available.
See also Fixed income arbitrage; Bond credit rating.
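As a minimal sketch with hypothetical numbers, the two equivalent forms of the formula can be written down directly: discount each cash flow at the zero rate for its date, or multiply it by the corresponding ZCB price.

```python
# Hypothetical 3-year bond: 5 annual coupon, face value 100 repaid at maturity.
cash_flows = [5.0, 5.0, 105.0]                 # C_t at t = 1, 2, 3 years
zero_rates = [0.03, 0.035, 0.04]               # ZCB yields for the same dates

# Price from rates: sum of C_t / (1 + r_t)^t.
price_from_rates = sum(c / (1 + r) ** t
                       for t, (c, r) in enumerate(zip(cash_flows, zero_rates), start=1))

# Price from ZCB prices P_t (face value 1) for the same maturities.
zcb_prices = [(1 + r) ** -t for t, r in enumerate(zero_rates, start=1)]
price_from_zcbs = sum(c * p for c, p in zip(cash_flows, zcb_prices))

print(round(price_from_rates, 4), round(price_from_zcbs, 4))   # identical by construction
```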
[edit]Pricing derivatives
A derivative is an instrument which allows for buying and selling of the same asset on two
markets – the spot market and the derivatives market. Mathematical finance assumes that any
imbalance between the two markets will be arbitraged away. Thus, in a correctly priced
derivative contract, the derivative price, the strike price (or reference rate), and the spot price will
be related such that arbitrage is not possible.
see: Fundamental theorem of arbitrage-free pricing

[edit]Futures
In a futures contract, for no arbitrage to be possible, the price paid on delivery (the forward
price) must be the same as the cost (including interest) of buying and storing the asset. In other
words, the rational forward price represents the expected future value of the underlying
discounted at the risk free rate (the "asset with a known future-price", as above). Thus, for a
simple, non-dividend paying asset, the value of the future/forward, F, will be found by
accumulating the present value S at time t to maturity T by the rate of risk-free return r:
F = S × (1 + r)^(T − t)
This relationship may be modified for storage costs, dividends, dividend yields, and convenience
yields; see futures contract pricing.
Any deviation from this equality allows for arbitrage as follows.
• In the case where the forward price is higher:
1. The arbitrageur sells the futures contract and buys the underlying today (on
the spot market) with borrowed money.
2. On the delivery date, the arbitrageur hands over the underlying, and receives
the agreed forward price.
3. He then repays the lender the borrowed amount plus interest.
4. The difference between the two amounts is the arbitrage profit.
• In the case where the forward price is lower:
1. The arbitrageur buys the futures contract and sells the underlying today (on
the spot market); he invests the proceeds.
2. On the delivery date, he cashes in the matured investment, which has
appreciated at the risk free rate.
3. He then receives the underlying and pays the agreed forward price using the
matured investment. [If he was short the underlying, he returns it now.]
4. The difference between the two amounts is the arbitrage profit.
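A small numerical sketch (hypothetical prices) of the rational forward price and of the cash-and-carry profit available when the quoted forward deviates from it:

```python
S, r, T = 50.0, 0.04, 1.0                  # spot price, risk-free rate, years to delivery
fair_forward = S * (1 + r) ** T            # rational forward price: 52.00

# Forward quoted too high: borrow S, buy the asset, sell the forward; repay S(1+r)^T at delivery.
traded_high = 53.5
print("cash-and-carry profit:        ", round(traded_high - fair_forward, 2))

# Forward quoted too low: short the asset, invest S, buy the forward.
traded_low = 51.0
print("reverse cash-and-carry profit:", round(fair_forward - traded_low, 2))
```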
[edit]Options
As above, where the value of an asset in the future is known (or expected), this value can be used
to determine the asset's rational price today. In an option contract, however, exercise is
dependent on the price of the underlying, and hence payment is uncertain. Option pricing models
therefore include logic which either "locks in" or "infers" this future value; both approaches
deliver identical results. Methods which lock-in future cash flows assume arbitrage free pricing,
and those which infer expected value assume risk neutral valuation.
To do this, (in their simplest, though widely used form) both approaches assume a “Binomial
model” for the behavior of the underlying instrument, which allows for only two states - up or
down. If S is the current price, then in the next period the price will either be S up or S down.
Here, the value of the share in the up-state is S × u, and in the down-state is S × d (where u and d
are multipliers with d < 1 < u and assuming d < 1+r < u; see the binomial options model). Then,
given these two states, the "arbitrage free" approach creates a position which will have an
identical value in either state - the cash flow in one period is therefore known, and arbitrage
pricing is applicable. The risk neutral approach infers expected option value from the intrinsic
values at the later two nodes.
Although this logic appears far removed from the Black-Scholes formula and the lattice
approach in the Binomial options model, it in fact underlies both models; see The Black-Scholes
PDE. The assumption of binomial behaviour in the underlying price is defensible as the number
of time steps between today (valuation) and exercise increases, and the period per time-step is
increasingly short. The Binomial options model allows for a high number of very short time-
steps (if coded correctly), while Black-Scholes, in fact, models a continuous process.
The examples below have shares as the underlying, but may be generalised to other instruments.
The value of a put option can be derived as below, or may be found from the value of the call
using put-call parity.
[edit]Arbitrage free pricing
Here, the future payoff is "locked in" using either "delta hedging" or the "replicating portfolio"
approach. As above, this payoff is then discounted, and the result is used in the valuation of the
option today.
[edit]Delta hedging
It is possible to create a position consisting of Δ calls sold and 1 share, such that the position’s
value will be identical in the S up and S down states, and hence known with certainty (see Delta
hedging). This certain value corresponds to the forward price above ("An asset with a known
future price"), and as above, for no arbitrage to be possible, the present value of the position
must be its expected future value discounted at the risk free rate, r. The value of a call is then
found by equating the two.
1) Solve for Δ such that:
value of position in one period = S up - Δ × MAX ( 0, S up – strike price ) = S
down - Δ × MAX ( 0, S down – strike price )

2) solve for the value of the call, using Δ, where:


value of position today = value of position in one period ÷ (1 + r) = S current
– Δ × value of call
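A one-period numerical sketch of these two steps (hypothetical prices: S = 100, u = 1.2, d = 0.8, r = 5%, strike 100): solve for Δ so the hedged position is worth the same in both states, then back out the call value from the discounted certain value.

```python
S, u, d, r, K = 100.0, 1.2, 0.8, 0.05, 100.0
S_up, S_down = S * u, S * d
payoff_up = max(0.0, S_up - K)
payoff_down = max(0.0, S_down - K)

# 1) Δ such that (1 share, Δ calls sold) has the same value in both states:
#    S_up - Δ*payoff_up = S_down - Δ*payoff_down
delta = (S_up - S_down) / (payoff_up - payoff_down)

# 2) the certain end-of-period value, discounted, equals today's position value:
#    certain_value / (1 + r) = S - Δ*call
certain_value = S_up - delta * payoff_up
call = (S - certain_value / (1 + r)) / delta
print(round(delta, 2), round(call, 4))     # Δ = 2.0 calls sold per share, call ≈ 11.9048
```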

[edit]The replicating portfolio


It is possible to create a position consisting of Δ shares and $B borrowed at the risk free rate,
which will produce identical cash flows to one option on the underlying share. The position
created is known as a "replicating portfolio" since its cash flows replicate those of the option. As
shown above ("Assets with identical cash flows"), in the absence of arbitrage opportunities, since
the cash flows produced are identical, the price of the option today must be the same as the value
of the position today.
1) Solve simultaneously for Δ and B such that:
i) Δ × S up - B × (1 + r) = MAX ( 0, S up – strike price )

ii) Δ × S down - B × (1 + r) = MAX ( 0, S down – strike price )

2) solve for the value of the call, using Δ and B, where:


call = Δ × S current - B
Note that here there is no discounting - the interest rate appears only as part of the construction.
This approach is therefore used in preference to others where it is not clear whether the risk free
rate may be applied as the discount rate at each decision point, or whether, instead, a premium
over risk free would be required. The best example of this would be under Real options analysis
where managements' actions actually change the risk characteristics of the project in question,
and hence the Required rate of return could differ in the up- and down-states. Here, in the above
formulae, we then have: "Δ × S up - B × (1 + r up)..." and "Δ × S down - B × (1 + r down)..." .
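The same hypothetical numbers in the replicating-portfolio form (a sketch; Δ and B are solved from the two simultaneous equations) give the same call value as the delta-hedging sketch above; note that Δ here is shares held per call, the reciprocal of the calls-per-share figure above.

```python
import numpy as np

S, u, d, r, K = 100.0, 1.2, 0.8, 0.05, 100.0
S_up, S_down = S * u, S * d
payoff_up, payoff_down = max(0.0, S_up - K), max(0.0, S_down - K)

# Solve simultaneously:  Δ*S_up   - B*(1+r) = payoff_up
#                        Δ*S_down - B*(1+r) = payoff_down
A = np.array([[S_up,   -(1 + r)],
              [S_down, -(1 + r)]])
delta, B = np.linalg.solve(A, np.array([payoff_up, payoff_down]))

call = delta * S - B                       # value of the replicating position today
print(round(delta, 2), round(B, 4), round(call, 4))   # Δ = 0.5 shares, call ≈ 11.9048
```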
[edit]Risk neutral valuation
Here the value of the option is calculated using the risk neutrality assumption. Under this
assumption, the “expected value” (as opposed to "locked in" value) is discounted. The expected
value is calculated using the intrinsic values from the later two nodes: “Option up” and “Option
down”, with u and d as price multipliers as above. These are then weighted by their respective
probabilities: “probability” p of an up move in the underlying, and “probability” (1-p) of a down
move. The expected value is then discounted at r, the risk free rate.
1) solve for p
for no arbitrage to be possible in the share, today’s price must represent its
expected value discounted at the risk free rate:

S = [ p × (up value) + (1-p) × (down value) ] ÷ (1+r) = [ p × S × u + (1-p) × S × d ] ÷ (1+r)

then, p = [ (1+r) - d ] ÷ [ u - d ]

2) solve for call value, using p


for no arbitrage to be possible in the call, today’s price must represent its
expected value discounted at the risk free rate:

Option value = [ p × Option up + (1-p) × Option down ] ÷ (1+r)

= [ p × MAX ( 0, S up - strike ) + (1-p) × MAX ( 0, S down - strike ) ] ÷ (1+r)
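And the same hypothetical numbers under risk neutral valuation (a sketch) give an identical call value, illustrating the equivalence discussed next:

```python
S, u, d, r, K = 100.0, 1.2, 0.8, 0.05, 100.0

# 1) solve for the risk-neutral "probability" p
p = ((1 + r) - d) / (u - d)

# 2) discount the expected intrinsic value at the risk-free rate
option_up = max(0.0, S * u - K)
option_down = max(0.0, S * d - K)
call = (p * option_up + (1 - p) * option_down) / (1 + r)
print(round(p, 3), round(call, 4))         # p = 0.625, call ≈ 11.9048 – same as above
```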

[edit]The risk neutrality assumption


Note that above, the risk neutral formula does not refer to the volatility of the underlying – p as
solved, relates to the risk-neutral measure as opposed to the actual probability distribution of
prices. Nevertheless, both arbitrage free pricing and risk neutral valuation deliver identical
results. In fact, it can be shown that “Delta hedging” and “Risk neutral valuation” use identical
formulae expressed differently. Given this equivalence, it is valid to assume “risk neutrality”
when pricing derivatives.
[edit]Swaps
Rational pricing underpins the logic of swap valuation. Here, two counterparties "swap"
obligations, effectively exchanging cash flow streams calculated against a notional principal
amount, and the value of the swap is the present value (PV) of both sets of future cash flows
"netted off" against each other.
[edit]Valuation at initiation
To be arbitrage free, the terms of a swap contract are such that, initially, the Net present value of
these future cash flows is equal to zero; see swap valuation. For example, consider the valuation
of a fixed-to-floating Interest rate swap where Party A pays a fixed rate, and Party B pays a
floating rate. Here, the fixed rate would be such that the present value of future fixed rate
payments by Party A is equal to the present value of the expected future floating rate payments
(i.e. the NPV is zero). Were this not the case, an Arbitrageur, C, could:
1. assume the position with the lower present value of payments, and borrow
funds equal to this present value
2. meet the cash flow obligations on the position by using the borrowed funds,
and receive the corresponding payments - which have a higher present value
3. use the received payments to repay the debt on the borrowed funds
4. pocket the difference - where the difference between the present value of the
loan and the present value of the inflows is the arbitrage profit.
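A minimal sketch (hypothetical discount factors and forward rates) of setting the fixed rate so that the swap has zero NPV at initiation, i.e. so that the present value of the fixed leg equals that of the expected floating leg:

```python
# Hypothetical annual-payment swap on a notional of 1, maturing in 3 years.
discount_factors = [0.97, 0.93, 0.89]      # P(0, t) for t = 1, 2, 3 years
forward_rates = [0.030, 0.034, 0.038]      # expected floating fixings for each period

pv_floating = sum(f * df for f, df in zip(forward_rates, discount_factors))

# The par fixed rate K solves:  K * sum(P(0,t)) = PV(floating leg)
par_fixed_rate = pv_floating / sum(discount_factors)
print(f"par fixed rate: {par_fixed_rate:.4%}")   # the swap's NPV is zero at this rate
```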

[edit]Subsequent valuation
Once traded, swaps can also be priced using rational pricing. For example, the Floating leg of an
interest rate swap can be "decomposed" into a series of Forward rate agreements. Here, since the
swap has identical payments to the FRA, arbitrage free pricing must apply as above - i.e. the
value of this leg is equal to the value of the corresponding FRAs. Similarly, the "receive-fixed"
leg of a swap, can be valued by comparison to a Bond with the same schedule of payments.
(Relatedly, given that their underlyings have the same cash flows, bond options and swaptions
are equatable.)
[edit]Pricing shares
The Arbitrage pricing theory (APT), a general theory of asset pricing, has become influential in
the pricing of shares. APT holds that the expected return of a financial asset can be modelled as
a linear function of various macro-economic factors, where sensitivity to changes in each factor
is represented by a factor specific beta coefficient:
E(rj) = rf + bj1F1 + bj2F2 + … + bjnFn + εj
where

• E(rj) is the risky asset's expected return,


• rf is the risk free rate,
• Fk is the macroeconomic factor,
• bjk is the sensitivity of the asset to factor k,
• and εj is the risky asset's idiosyncratic random shock with mean zero.
The model derived rate of return will then be used to price the asset correctly - the asset price
should equal the expected end of period price discounted at the rate implied by model. If the
price diverges, arbitrage should bring it back into line. Here, to perform the arbitrage, the
investor “creates” a correctly priced asset (a synthetic asset) being a portfolio which has the same
net-exposure to each of the macroeconomic factors as the mispriced asset but a different
expected return; see the APT article for detail on the construction of the portfolio. The
arbitrageur is then in a position to make a risk free profit as follows:
• Where the asset price is too low, the portfolio should have appreciated at the
rate implied by the APT, whereas the mispriced asset would have appreciated
at more than this rate. The arbitrageur could therefore:
1. Today: short sell the portfolio and buy the mispriced-asset with the proceeds.
2. At the end of the period: sell the mispriced asset, use the proceeds to buy
back the portfolio, and pocket the difference.
• Where the asset price is too high, the portfolio should have appreciated at
the rate implied by the APT, whereas the mispriced asset would have
appreciated at less than this rate. The arbitrageur could therefore:
1. Today: short sell the mispriced-asset and buy the portfolio with the proceeds.
2. At the end of the period: sell the portfolio, use the proceeds to buy back the
mispriced-asset, and pocket the difference.
Note that under "true arbitrage", the investor locks-in a guaranteed payoff, whereas under APT
arbitrage, the investor locks-in a positive expected payoff. The APT thus assumes "arbitrage in
expectations" - i.e. that arbitrage by investors will bring asset prices back into line with the
returns expected by the model.
The Capital asset pricing model (CAPM) is an earlier, (more) influential theory on asset pricing.
Although based on different assumptions, the CAPM can, in some ways, be considered a "special
case" of the APT; specifically, the CAPM's Securities market line represents a single-factor
model of the asset price, where Beta is exposure to changes in value of the Market.

Risk-return spectrum
From Wikipedia, the free encyclopedia

The risk-return spectrum is the relationship between the amount of return gained on an
investment and the amount of risk undertaken in that investment.[citation needed] The more return
sought, the more risk that must be undertaken.
[edit]The progression
There are various classes of possible investments, each with their own positions on the overall
risk-return spectrum. The general progression is: short-term debt; long-term debt; property; high-
yield debt; equity. There is considerable overlap of the ranges for each investment class.
All this can be visualised by plotting expected return on the vertical axis against risk (represented
by the standard deviation of that return) on the horizontal axis. This line starts at the
risk-free rate and rises as risk rises. The line will tend to be straight, and will be straight at
equilibrium - see discussion below on domination.
For any particular investment type, the line drawn from the risk-free rate on the vertical axis to
the risk-return point for that investment has a slope called the Sharpe ratio.
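A minimal sketch of that slope, computed from hypothetical annualised figures:

```python
def sharpe_ratio(expected_return, risk_free_rate, return_std):
    """Slope of the line from the risk-free rate to an investment's risk-return point."""
    return (expected_return - risk_free_rate) / return_std

# Hypothetical annualised figures for an equity portfolio.
print(round(sharpe_ratio(expected_return=0.08, risk_free_rate=0.03, return_std=0.15), 3))  # 0.333
```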
[edit]Short-term loans to good government bodies
At the lowest end are short-dated loans to government and government-guaranteed entities
(usually semi-independent government departments). The lowest of all is the risk-free rate of
return. The risk-free rate has zero risk (most modern major governments will inflate and
monetise their debts rather than default upon them), but the return is positive because there is
still both the time-preference and inflation premium components of minimum expected rates of
return that must be met or exceeded if the funding is to be forthcoming from providers. The risk-
free rate is commonly approximated by the return paid upon 30-day T-Bills or their equivalent,
but in reality that rate has more to do with the monetary policy of that country's central bank than
the market supply conditions for credit.
[edit]Mid- and long-term loans to good government bodies
The next type of investment is longer-term loans to government, such as 3-year bonds. The
range width is larger, and follows the influence of increasing risk premium required as the
maturity of that debt grows longer. Nevertheless, because it is debt of good government the
highest end of the range is still comparatively low compared to the ranges of other investment
types discussed below.
Also, if the government in question is not at the highest jurisdiction (ie, is a state or municipal
government), or the smaller that government is, the more along the risk-return spectrum that
government's securities will be.
[edit]Short term loans to blue-chip corporations
Following the lowest-risk investments are short-dated bills of exchange from major blue-chip
corporations with the highest credit ratings. The further the credit rating is from perfect, the
further along the risk-return spectrum that particular investment will be.
[edit]Mid- and long-term loans to blue-chip corporations
Overlapping the range for short-term debt is the longer-term debt from those same well-rated
corporations. It sits higher up the range because the maturity has increased. The overlap occurs
between the mid-term debt of the best-rated corporations and the short-term debt of corporations
that are nearly, but not quite, perfectly rated.
In this arena, the debts are called investment grade by the rating agencies. The lower the credit
rating, the higher the yield and thus the expected return.
[edit]Rental property
A commercial property that the investor rents out is comparable in risk and return to low-
investment-grade debt. Industrial property has higher risk and returns, followed by residential
(with the possible exception of the investor's own home).
[edit]High-yield debt
After the returns upon all classes of investment-grade debt come the returns on speculative-grade
high-yield debt (also known derisively as junk bonds). These may come from mid- and low-rated
corporations and from less politically stable governments.
[edit]Equity
Equity returns are the profits earned by businesses after interest and tax. Even the equity returns
on the highest rated corporations are notably risky. Small-cap stocks are generally riskier than
large-cap; companies that primarily service governments, or provide basic consumer goods such
as food or utilities, tend to be less volatile than those in other industries. Note that since stocks
tend to rise when corporate bonds fall and vice-versa, a portfolio containing a small percentage
of stocks can be less risky than one containing only debts.
[edit]Options and futures
Option and futures contracts often provide leverage on underlying stocks, bonds or commodities;
this increases the returns but also the risks. Note that in some cases, derivatives can be used to
hedge, decreasing the overall risk of the portfolio due to negative correlation with other
investments.
[edit]Why the progression?
The existence of risk requires the investor to incur a number of expenses. For example, the more
risky the investment, the more time and effort is usually required to obtain information about it
and monitor its progress. For another, the importance of a loss of X amount of value is greater
than the importance of a gain of X amount of value, so a riskier investment will attract a higher
risk premium even if the forecast return is the same as for a less risky investment. Risk is
therefore something that must be compensated for, and the more the risk, the more compensation
is required.
If an investment had a high return with low risk, eventually everyone would want to invest there.
That action would drive down the actual rate of return achieved, until it reached the rate of return
the market deems commensurate with the level of risk. Similarly, if an investment had a low
return with high risk, all the present investors would want to leave that investment, which would
then increase the actual return until again it reached the rate of return the market deems
commensurate with the level of risk. That part of total returns which sets this appropriate level is
called the risk premium.
[edit]Leverage extends the spectrum
The use of leverage can extend the progression out even further. Examples of this include
borrowing funds to invest in equities, or use of derivatives.
If leverage is used then there are two lines instead of one. This is because although one can
invest at the risk-free rate, one can only borrow at an interest rate according to one's own credit
rating. This is visualised by the new line starting at the point of the riskiest unleveraged
investment (equities) and rising at a lower slope than the original line. If this new line were
traced back to the vertical axis of zero risk, it would cross it at the borrowing rate.
[edit]Domination
All investment types compete against each other, even though they are on different positions on
the risk-return spectrum. Any of the mid-range investments can have their performances
simulated by a portfolio consisting of a risk-free component and the highest-risk component.
This principle, called the separation property, is a crucial feature of Modern Portfolio Theory.
The line is then called the capital market line.
If at any time there is an investment that has a higher Sharpe ratio than another, then that
investment is said to dominate. When there are two or more investments above the spectrum line,
the one with the highest Sharpe ratio is the most dominant, even if the risk and return on that
particular investment are lower than on another. If every mid-range return falls below the
spectrum line, this means that the highest-risk investment has the highest Sharpe ratio and so
dominates over all others.
If at any time there is an investment that dominates then funds will tend to be withdrawn from all
others and be redirected to that dominating investment. This action will lower the return on that
investment and raise it on others. The withdrawal and redirection of capital ceases when all
returns are at the levels appropriate for the degrees of risk and commensurate with the
opportunity cost arising from competition with the other investment types on the spectrum,
which means they all tend to end up having the same Sharpe Ratio.
[edit]See also
• Modern portfolio theory
• Risk
• Financial capital
• Investment
• Credit
• Interest
• Ownership equity
• Profit
• Leverage


Linear combination
From Wikipedia, the free encyclopedia
Jump to: navigation, search


In mathematics, a linear combination is a concept central to linear algebra and related fields of
mathematics. Most of this article deals with linear combinations in the context of a vector space
over a field, with some generalizations given at the end of the article.

Contents
• 1 Definition
• 2 Examples and counterexamples
○ 2.1 Vectors
○ 2.2 Functions
○ 2.3 Polynomials
• 3 The linear span
• 4 Linear independence
• 5 Affine, conical, and convex combinations
• 6 Operad theory
• 7 Generalizations

[edit]Definition
Suppose that K is a field and V is a vector space over K. As usual, we call elements of V vectors
and call elements of K scalars. If v1,...,vn are vectors and a1,...,an are scalars, then the linear
combination of those vectors with those scalars as coefficients is
a1v1 + a2v2 + a3v3 + ··· + anvn.
There is some ambiguity in the use of the term "linear combination" as to whether it refers to the
expression or to its value. In most cases the value is meant, like in the assertion "the set of all
linear combinations of v1,...,vn always forms a subspace"; however one could also say "two
different linear combinations can have the same value" in which case the expression must have
been meant. The subtle difference between these uses is the essence of the notion of linear
dependence: a family F of vectors is linearly independent precisely if any linear combination of
the vectors in F (as value) is uniquely so (as expression). In any case, even when viewed as
expressions, all that matters about a linear combination is the coefficient of each vi; trivial
modifications such as permuting the terms or adding terms with zero coefficient are not
considered to give new linear combinations.
In a given situation, K and V may be specified explicitly, or they may be obvious from context.
In that case, we often speak of a linear combination of the vectors v1,...,vn, with the coefficients
unspecified (except that they must belong to K). Or, if S is a subset of V, we may speak of a
linear combination of vectors in S, where both the coefficients and the vectors are unspecified,
except that the vectors must belong to the set S (and the coefficients must belong to K). Finally,
we may speak simply of a linear combination, where nothing is specified (except that the vectors
must belong to V and the coefficients must belong to K); in this case one is probably referring to
the expression, since every vector in V is certainly the value of some linear combination.
Note that by definition, a linear combination involves only finitely many vectors (except as
described in Generalizations below). However, the set S that the vectors are taken from (if one
is mentioned) can still be infinite; each individual linear combination will only involve finitely
many vectors. Also, there is no reason that n cannot be zero; in that case, we declare by
convention that the result of the linear combination is the zero vector in V.
[edit]Examples and counterexamples
[edit]Vectors
Let the field K be the set R of real numbers, and let the vector space V be the Euclidean space R3.
Consider the vectors e1 = (1,0,0), e2 = (0,1,0) and e3 = (0,0,1). Then any vector in R3 is a linear
combination of e1, e2 and e3.
To see that this is so, take an arbitrary vector (a1,a2,a3) in R3, and write:
(a1, a2, a3) = a1(1,0,0) + a2(0,1,0) + a3(0,0,1) = a1e1 + a2e2 + a3e3.
[edit]Functions
Let K be the set C of all complex numbers, and let V be the set CC(R) of all continuous functions
from the real line R to the complex plane C. Consider the vectors (functions) f and g defined by
f(t) := e^{it} and g(t) := e^{−it}. (Here, e is the base of the natural logarithm, about 2.71828..., and
i is the imaginary unit, a square root of −1.) Some linear combinations of f and g are:
cos(t) = (1/2)e^{it} + (1/2)e^{−it}
2 sin(t) = (−i)e^{it} + (i)e^{−it}
On the other hand, the constant function 3 is not a linear combination of f and g. To see this,
suppose that 3 could be written as a linear combination of e^{it} and e^{−it}. This means that there
would exist complex scalars a and b such that ae^{it} + be^{−it} = 3 for all real numbers t. Setting
t = 0 and t = π gives the equations a + b = 3 and a + b = −3, and clearly this cannot happen. See
Euler's identity.
[edit]Polynomials
Let K be R, C, or any field, and let V be the set P of all polynomials with coefficients taken from
the field K. Consider the vectors (polynomials) p1 := 1, p2 := x + 1, and p3 := x^2 + x + 1.
Is the polynomial x^2 − 1 a linear combination of p1, p2, and p3? To find out, consider an arbitrary
linear combination of these vectors and try to see when it equals the desired vector x^2 − 1.
Picking arbitrary coefficients a1, a2, and a3, we want
a1(1) + a2(x + 1) + a3(x^2 + x + 1) = x^2 − 1.
Multiplying the polynomials out, this means
a1 + (a2x + a2) + (a3x^2 + a3x + a3) = x^2 − 1,
and collecting like powers of x, we get
a3x^2 + (a2 + a3)x + (a1 + a2 + a3) = x^2 − 1.
Two polynomials are equal if and only if their corresponding coefficients are equal, so we can
conclude
a3 = 1,  a2 + a3 = 0,  a1 + a2 + a3 = −1.
This system of linear equations can easily be solved. First, the first equation simply says that a3
is 1. Knowing that, we can solve the second equation for a2, which comes out to −1. Finally, the
last equation tells us that a1 is also −1. Therefore, the only possible way to get a linear
combination is with these coefficients. Indeed,
x^2 − 1 = −1·p1 − 1·p2 + 1·p3 = −1 − (x + 1) + (x^2 + x + 1),
so x^2 − 1 is a linear combination of p1, p2, and p3.
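The same system of equations can be checked numerically; a small sketch using NumPy follows (the monomial-basis ordering is an assumption of this example, not part of the article):

    import numpy as np

    # Columns are p1, p2, p3 written in the monomial basis {1, x, x^2}.
    A = np.array([
        [1, 1, 1],   # constant terms
        [0, 1, 1],   # coefficients of x
        [0, 0, 1],   # coefficients of x^2
    ], dtype=float)

    target = np.array([-1, 0, 1], dtype=float)   # x^2 - 1 in the same basis

    coeffs = np.linalg.solve(A, target)
    print(coeffs)   # [-1. -1.  1.], i.e. a1 = -1, a2 = -1, a3 = 1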


On the other hand, what about the polynomial x^3 − 1? If we try to make this vector a linear
combination of p1, p2, and p3, then following the same process as before, we’ll get the equation
a3x^2 + (a2 + a3)x + (a1 + a2 + a3) = x^3 − 1.
However, when we set corresponding coefficients equal in this case, the equation for x^3 is
0 = 1,
which is always false. Therefore, there is no way for this to work, and x^3 − 1 is not a linear
combination of p1, p2, and p3.
[edit]The linear span
Main article: linear span
Take an arbitrary field K, an arbitrary vector space V, and let v1,...,vn be vectors (in V). It is
interesting to consider the set of all linear combinations of these vectors. This set is called the
linear span (or just span) of the vectors, say S = {v1,...,vn}. We write the span of S as span(S) or
sp(S):
span(S) = { a1v1 + ··· + anvn : a1,...,an ∈ K }.
[edit]Linear independence
Main article: Linear independence

For some sets of vectors v1,...,vn, a single vector can be written in two different ways as a linear
combination of them:
v = a1v1 + ··· + anvn = b1v1 + ··· + bnvn, where the two lists of coefficients differ.
Equivalently, by subtracting these (ci := ai − bi), a non-trivial combination is zero:
0 = c1v1 + ··· + cnvn, with not all ci equal to zero.
If that is possible, then v1,...,vn are called linearly dependent; otherwise, they are linearly
independent. Similarly, we can speak of linear dependence or independence of an arbitrary set S
of vectors.
If S is linearly independent and the span of S equals V, then S is a basis for V.
[edit]Affine, conical, and convex combinations
By restricting the coefficients used in linear combinations, one can define the related concepts of
affine combination, conical combination, and convex combination, and the associated notions of
sets closed under these operations.
Type of combination  | Restrictions on coefficients        | Name of set      | Model space
Linear combination   | no restrictions                     | vector subspace  | R^n
Affine combination   | coefficients sum to 1               | affine subspace  | affine hyperplane
Conical combination  | coefficients non-negative           | convex cone      | quadrant/octant
Convex combination   | non-negative and sum to 1           | convex set       | simplex
Because these are more restricted operations, more subsets will be closed under them, so affine
subsets, convex cones, and convex sets are generalizations of vector subspaces: a vector
subspace is also an affine subspace, a convex cone, and a convex set, but a convex set need not
be a vector subspace, affine, or a convex cone.
These concepts often arise when one can take certain linear combinations of objects, but not any:
for example, probability distributions are closed under convex combination (they form a convex
set), but not conical or affine combinations (or linear), and positive measures are closed under
conical combination but not affine or linear – hence one defines signed measures as the linear
closure.
Linear and affine combinations can be defined over any field (or ring), but conical and convex
combination require a notion of "positive", and hence can only be defined over an ordered field
(or ordered ring), generally the real numbers.
If one allows only scalar multiplication, not addition, one obtains a (not necessarily convex)
cone; one often restricts the definition to only allowing multiplication by positive scalars.
All of these concepts are usually defined as subsets of an ambient vector space (except for affine
spaces, which are also considered as "vector spaces forgetting the origin"), rather than being
axiomatized independently.
[edit]Operad theory
More abstractly, in the language of operad theory, one can consider vector spaces to be algebras
over the operad R^∞ (the infinite direct sum, so only finitely many terms are non-zero; this
corresponds to only taking finite sums), which parametrizes linear combinations: the vector
(2, 3, −5, 0, ...), for instance, corresponds to the linear combination 2v1 + 3v2 − 5v3 + 0v4 + ···.
Similarly, one can consider affine combinations, conical combinations, and convex combinations
to correspond to the sub-operads where the terms sum to 1, the terms are all non-negative, or
both, respectively. Graphically, these are the infinite affine hyperplane, the infinite hyper-octant,
and the infinite simplex. This formalizes what is meant by R^n or the standard simplex being
model spaces, and such observations as that every bounded convex polytope is the image of a
simplex. Here sub-operads correspond to more restricted operations and thus more general
theories.
From this point of view, we can think of linear combinations as the most general sort of
operation on a vector space – saying that a vector space is an algebra over the operad of linear
combinations is precisely the statement that all possible algebraic operations in a vector space
are linear combinations.
The basic operations of addition and scalar multiplication, together with the existence of an
additive identity and additive inverses, cannot be combined in any more complicated way than
the generic linear combination: the basic operations are a generating set for the operad of all
linear combinations.
Ultimately, this fact lies at the heart of the usefulness of linear combinations in the study of
vector spaces.
[edit]Generalizations
If V is a topological vector space, then there may be a way to make sense of certain infinite linear
combinations, using the topology of V. For example, we might be able to speak of a1v1 + a2v2 +
a3v3 + ..., going on forever. Such infinite linear combinations do not always make sense; we call
them convergent when they do. Allowing more linear combinations in this case can also lead to a
different concept of span, linear independence, and basis. The articles on the various flavours of
topological vector spaces go into more detail about these.
If K is a commutative ring instead of a field, then everything that has been said above about
linear combinations generalizes to this case without change. The only difference is that we call
spaces like V modules instead of vector spaces. If K is a noncommutative ring, then the concept
still generalizes, with one caveat: Since modules over noncommutative rings come in left and
right versions, our linear combinations may also come in either of these versions, whatever is
appropriate for the given module. This is simply a matter of doing scalar multiplication on the
correct side.
A more complicated twist comes when V is a bimodule over two rings, KL and KR. In that case,
the most general linear combination looks like
a1v1b1 + ··· + anvnbn,
where a1,...,an belong to KL, b1,...,bn belong to KR, and v1,...,vn belong to V.

Sharpe ratio
From Wikipedia, the free encyclopedia
Jump to: navigation, search
The Sharpe ratio or Sharpe index or Sharpe measure or reward-to-variability ratio is a
measure of the excess return (or Risk Premium) per unit of risk in an investment asset or a
trading strategy, named after William Forsyth Sharpe. Since its revision by the original author in
1994, it is defined as:
S = E[R − Rf] / σ = E[R − Rf] / sqrt(var[R − Rf]),
where R is the asset return, Rf is the return on a benchmark asset, such as the risk-free rate of
return, E[R − Rf] is the expected value of the excess of the asset return over the benchmark
return, and σ is the standard deviation of this excess return.[1]
Note that if Rf is a constant risk-free return throughout the period, then
sqrt(var[R − Rf]) = sqrt(var[R]), the standard deviation of the asset return.
The Sharpe ratio is used to characterize how well the return of an asset compensates the investor
for the risk taken; the higher the Sharpe ratio, the better. When comparing two assets, each with
the expected return E[R], against the same benchmark with return Rf, the asset with the higher
Sharpe ratio gives more return for the same risk. Investors are often advised to pick investments
with high Sharpe ratios. However, like any mathematical model, it relies on the data being
correct. Pyramid schemes with a long duration of operation would typically provide a high
Sharpe ratio when derived from reported returns, but the inputs are false. When examining the
investment performance of assets with smoothing of returns (such as with-profits funds), the
Sharpe ratio should be derived from the performance of the underlying assets rather than the
fund returns.
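A minimal sketch of the calculation in Python; the monthly return series, benchmark rate and annualization convention are assumed for illustration, not taken from the article:

    import numpy as np

    def sharpe_ratio(returns, benchmark, periods_per_year=12):
        """Annualized Sharpe ratio of a series of periodic returns."""
        excess = np.asarray(returns) - benchmark          # R - Rf each period
        return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

    # Made-up monthly returns and a 0.25% monthly risk-free rate.
    monthly = [0.02, -0.01, 0.03, 0.005, 0.015, -0.02, 0.01, 0.025]
    print(round(sharpe_ratio(monthly, 0.0025), 2))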
Sharpe ratios, along with Treynor ratios and Jensen's alphas, are often used to rank the
performance of portfolio or mutual fund managers.
[edit]History
This ratio was developed by William Forsyth Sharpe in 1966.[2] Sharpe originally called it the
"reward-to-variability" ratio before it began being called the Sharpe Ratio by later academics and
financial operators.
Sharpe's 1994 revision acknowledged that the risk-free rate changes with time. Prior to this
revision the definition was
S = (E[R] − Rf) / σ,
assuming a constant Rf.
Recently, the (original) Sharpe ratio has often been challenged with regard to its appropriateness
as a fund performance measure during evaluation periods of declining markets.[3]
[edit]Examples
Suppose the asset has an expected return of 15% in excess of the risk-free rate. We typically do
not know whether the asset will have this return; suppose we assess the risk of the asset, defined
as the standard deviation of the asset's excess return, as 10%. The risk-free return is constant.
Then the Sharpe ratio (using the new definition) will be 1.5 (R − Rf = 0.15 and σ = 0.10).
As a guidepost, one could substitute the longer-term return of the S&P 500, about 10%. Assume
the risk-free return is 3.5% and the average standard deviation of the S&P 500 is about 16%.
Doing the math, the average long-term Sharpe ratio of the US market is about
0.40625 ((10%-3.5%)/16%). But note that if one were to calculate the ratio over, for
example, three-year rolling periods, the Sharpe ratio could vary dramatically.
[edit]Strengths and weaknesses
The Sharpe ratio has as its principal advantage that it is directly computable from any observed
series of returns without need for additional information surrounding the source of profitability.
Other ratios such as the bias ratio have recently been introduced into the literature to handle
cases where the observed volatility may be an especially poor proxy for the risk inherent in a
time-series of observed returns.
[edit]References
1. ^ Sharpe, W. F. (1994). "The Sharpe Ratio". Journal of Portfolio Management 21 (1): 49–58.
2. ^ Sharpe, W. F. (1966). "Mutual Fund Performance". Journal of Business 39 (S1): 119–138.
doi:10.1086/294846.
3. ^ Scholz, Hendrik (2007). "Refinements to the Sharpe ratio: Comparing alternatives for bear
markets". Journal of Asset Management 7 (5): 347–357. doi:10.1057/palgrave.jam.2250040.

[edit]See also
• Capital asset pricing model
• Jensen's alpha
• Modern portfolio theory
• Roy's safety-first criterion
• Sortino ratio
• Bias ratio (finance)
• Calmar ratio
• Treynor ratio
• Upside potential ratio
• Information ratio
• Coefficient of variation

[edit]External links
• The Sharpe ratio
• How sharp is the Sharpe ratio
[edit]Further reading
• Bruce J. Feibel. Investment Performance Measurement. New York: Wiley,
2003. ISBN 0471268496

Market portfolio
From Wikipedia, the free encyclopedia
Jump to: navigation, search

A market portfolio is a portfolio consisting of a weighted sum of every asset in the market, with
weights in the proportions that they exist in the market (with the necessary assumption that these
assets are infinitely divisible).
Richard Roll's critique (1977) states that this is only a theoretical concept, since creating a market
portfolio for investment purposes would in practice necessarily include every single possible
available asset, including real estate, precious metals, stamp collections, jewelry, and anything
with any worth, as the theoretical market being referred to would be the world market. As a
result, proxies for the market (such as the FTSE 100 in the UK, the DAX in Germany or the
S&P 500 in the US) are used in practice by investors. Roll's critique states that these proxies
cannot provide an accurate representation of the entire market.
The concept of a market portfolio plays an important role in many financial theories and models,
including the capital asset pricing model, where it is the only fund in which investors need to
invest, to be supplemented only by a risk-free asset (depending upon each investor's attitude
towards risk). In the CAPM, the excess return of security j is related to the excess return of the
market portfolio by
(rj − rf) = βj(rm − rf),
where rj is the return to security j, rf is the risk-free rate, βj is the security's beta, and rm is the
return of the market portfolio.


Earnings response coefficient


From Wikipedia, the free encyclopedia
Jump to: navigation, search

Contents
• 1 Introduction
• 2 Use & Debate
• 3 See also
• 4 References

[edit]Introduction
The earnings response coefficient, or ERC, is the estimated relationship between equity returns
and the unexpected portion of (i.e., new information in) companies' earnings announcements.
In financial economics, arbitrage pricing theory describes the theoretical relationship between
information that is known to market participants about a particular equity (e.g., a common stock
share of a particular company) and the price of that equity. Under the efficient market
hypothesis, equity prices are expected in the aggregate to reflect all relevant information at a
given time. Market participants with superior information are expected to exploit that
information until share prices have effectively impounded the information. Therefore, in the
aggregate, a portion of changes in a company's share price is expected to result from changes in
the relevant information available to the market. The ERC is an estimate of the change in a
company's stock price due to the information provided in a company's earnings announcement.
The ERC is expressed mathematically as follows:
R = a + b(ern − u) + e
where
R = the expected return,
a = the benchmark rate,
b = the earnings response coefficient,
(ern − u) = actual earnings less expected earnings, i.e. unexpected earnings,
e = random movement.
[edit]Use & Debate


ERCs are used primarily in research in accounting and finance. In particular, ERCs have been
used in research in positive accounting, a branch of financial accounting research, as they
theoretically describe how markets react to different information events. Research in Finance has
used ERCs to study, among other things, how different investors react to information events.
(Hotchkiss & Strickland 2003)
There is some debate concerning the true nature and strength of the ERC relationship. As
demonstrated in the above model, the ERC is generally considered to be the slope coefficient of
a linear equation between unexpected earnings and equity return. However, certain research
results suggest that the relationship is nonlinear.(Freeman & Tse 1992)
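A sketch of how such a slope coefficient might be estimated by ordinary least squares; the arrays of announcement-window returns and earnings surprises below are hypothetical, not data from the article:

    import numpy as np

    # Hypothetical announcement-window returns and unexpected earnings
    # (actual earnings less the consensus forecast), one pair per event.
    unexpected_earnings = np.array([-0.04, -0.01, 0.00, 0.02, 0.03, 0.05])
    returns             = np.array([-0.06, -0.02, 0.01, 0.03, 0.04, 0.08])

    # Fit returns = a + b * unexpected_earnings; the slope b is the ERC.
    b, a = np.polyfit(unexpected_earnings, returns, 1)
    print(f"benchmark rate a = {a:.4f}, earnings response coefficient b = {b:.4f}")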

Price dispersion
From Wikipedia, the free encyclopedia
Jump to: navigation, search

In economics, price dispersion is variation in prices across sellers of the same item, holding
fixed the item's characteristics. Price dispersion can be viewed as a measure of trading frictions
(or, tautologically, as a violation of the law of one price). It is often attributed to consumer search
costs or unmeasured attributes (such as the reputation) of the retailing outlets involved. There is a
difference between price dispersion and price discrimination. The latter concept involves a single
provider charging different prices to different customers for an identical good. Price dispersion,
on the other hand, is best thought of as the outcome of many firms potentially charging different
prices, where customers of one firm find it difficult to patronize (or are perhaps unaware of)
other firms due to the existence of search costs.
Price dispersion measures include the range of prices, the percentage difference of highest and
lowest price, the standard deviation of the price distribution, the variance of the price
distribution, and the coefficient of variation of the price distribution.
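A minimal sketch computing these measures for a made-up set of seller prices (the numbers are purely illustrative):

    import numpy as np

    # Hypothetical prices quoted by different sellers for the same item.
    prices = np.array([19.99, 21.50, 18.75, 24.00, 20.25])

    price_range = prices.max() - prices.min()
    pct_difference = (prices.max() - prices.min()) / prices.min()
    std_dev = prices.std(ddof=1)
    variance = prices.var(ddof=1)
    coeff_of_variation = std_dev / prices.mean()

    print(price_range, pct_difference, std_dev, variance, coeff_of_variation)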
In most of the theoretical literature, price dispersion is argued to result from spatial differences
and the existence of significant search costs. With the development of the internet and
shopping-agent programs, conventional wisdom suggests that price dispersion should be
alleviated and may eventually disappear in the online market because of the reduced search cost
for both price and product features. However, recent studies found a surprisingly high level of price dispersion
online, even for standardized items such as books, CDs and DVDs. There is some evidence of a
shrinking of this online price dispersion, but it remains significant. Recently, work has also been
done in the area of e-commerce, specifically the Semantic Web, and its effects on price
dispersion.
Hal Varian, an economist at U. C. Berkeley, argued in a 1980 article that price dispersion may be
an intentional marketing technique to encourage shoppers to explore their options.[1]
A related concept is that of wage dispersion.
[edit]See also
• Law of one price
• Search theory

[edit]References
1. ^ Varian, Hal R., "A Model of Sales" (Sep., 1980), The American Economic
Review, Vol. 70, No. 4 , pp. 651-659.
• Baye, Michael, John Morgan and Patrick Scholten, "Information, Search, and
Price Dispersion," (in Handbook on Economics and Information Systems, T.
Hendershott, Ed., Elsevier, forthcoming)
• Dahlby, Bev and Douglas West, (1986), "Price Dispersion in an Automobile
Market," Journal of Political Economy, 94(2): 418-438.
• Nash-Equilibrium.com
• Venkatesh Shankar, Xing Pan, and Brian T. Ratchford, (2002), "Do Drivers of
Online Price Dispersion Change as Online Markets Grow?," working paper,
December, University of Maryland.
• Cooper, Sean, "Why You Can't Get iPods At a Discount", Slate.
• Gupta, Tanya, and Abir Qasem,(2002), "Reduction of Price Dispersion through
Semantic E-commerce," (in Workshop at WWW2002 International Workshop
on the Semantic Web, Hawaii, May 7, 2002)
• Thiel, Stuart E., "A New Model of Persistent Retail Price Dispersion" (July 6,
2005). Available at SSRN: http://ssrn.com/abstract=757357

Search theory
From Wikipedia, the free encyclopedia
Jump to: navigation, search

This article is about the economics of search problems. For other uses of 'search',
see Searching.

In economics, search theory (or just search) is the study of an individual's optimal strategy
when choosing from a series of potential opportunities of random quality, given that delaying
choice is costly. Search models illustrate how best to balance the cost of delay against the value
of the option to try again.
Two common settings for these models (and their empirical applications) are a worker's search
for a job, in labor economics, and a consumer's search for a product they wish to purchase, in
consumer theory. From a worker's perspective, an acceptable job would be one that pays a high
wage, one that offers desirable benefits, and/or one that offers pleasant and safe working
conditions. From a consumer's perspective, a product worth purchasing would have sufficiently
high quality, and be offered at a sufficiently low price. In both cases, whether a given job or
product is acceptable depends on the searcher's beliefs about the alternatives available in the
market.
Contents
• 1 Search from a known distribution
• 2 Search from an unknown distribution
• 3 Endogenizing the price distribution
• 4 Matching theory
• 5 See also
• 6 References

[edit]Search from a known distribution


George J. Stigler proposed thinking of searching for bargains or jobs as an economically
important problem,[1][2] but the problem was first solved mathematically by John J. McCall.
McCall's paper studied the problem of which job offers an unemployed worker should accept,
and which reject, when the distribution of alternatives is known and constant, and the value of
money is constant.[3] Holding fixed job characteristics, he characterized the job search decision in
terms of the reservation wage, that is, the lowest wage the worker is willing to accept. The
worker's optimal strategy is simply to reject any wage offer lower than the reservation wage, and
accept any wage offer higher than the reservation wage.
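A minimal sketch of the reservation-wage calculation in this setting, assuming a discrete known wage distribution, unemployment income c and discount factor beta (all made up for illustration); it iterates a standard textbook form of the reservation-wage condition:

    import numpy as np

    # Hypothetical wage offers and their probabilities (known, constant distribution).
    wages = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
    probs = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

    c = 15.0      # income while unemployed (assumed)
    beta = 0.95   # discount factor (assumed)

    # Fixed-point iteration on w* = (1 - beta) * c + beta * E[max(w, w*)],
    # which is a contraction, so the iteration converges.
    w_star = c
    for _ in range(10_000):
        w_new = (1 - beta) * c + beta * np.sum(probs * np.maximum(wages, w_star))
        if abs(w_new - w_star) < 1e-10:
            break
        w_star = w_new

    # Offers above w_star are accepted; offers below it are rejected.
    print(round(w_star, 2))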
The reservation wage may change over time if some of the conditions assumed by McCall are
not met. For example, a worker who fails to find a job might lose skills or face stigma, in which
case the distribution of potential offers that worker might receive will get worse, the longer he or
she is unemployed. In this case, the worker's optimal reservation wage will decline over time.
Likewise, if the worker is risk averse, the reservation wage will decline over time if the worker
gradually runs out of money while searching.[4] The reservation wage would also differ for two
jobs of different characteristics; that is, there will be a compensating differential between
different types of jobs.
An interesting observation about McCall's model is that greater variance of offers may make the
searcher better off, and prolong optimal search, even if he or she is risk averse. This is because
when there is more variation in wage offers (holding fixed the mean), the searcher may want to
wait longer (that is, set a higher reservation wage) in hopes of receiving an exceptionally high
wage offer. The possibility of receiving some exceptionally low offers has less impact on the
reservation wage, since bad offers can be turned down.
While McCall framed his theory in terms of the wage search decision of an unemployed worker,
similar insights are applicable to a consumer's search for a low price. In that context, the highest
price a consumer is willing to pay for a particular good is called the reservation price.
[edit]Search from an unknown distribution
When the searcher does not even know the distribution of offers, then there is an additional
motive for search: by searching longer, more is learned about the range of offers available.
Search from one or more unknown distributions is called a multi-armed bandit problem. The
name comes from the slang term 'one-armed bandit' for a casino slot machine, and refers to the
case in which the only way to learn about the distribution of rewards from a given slot machine
is by actually playing that machine. Optimal search strategies for an unknown distribution have
been analyzed using allocation indices such as the Gittins index.
[edit]Endogenizing the price distribution
Studying optimal search from a given distribution of prices led economists to ask why the same
good should ever be sold, in equilibrium, at more than one price. After all, this is by definition a
violation of the law of one price. However, when buyers do not have perfect information about
where to find the lowest price (that is, whenever search is necessary), not all sellers may wish to
offer the same price, because there is a tradeoff between the frequency and the profitability of
their sales. That is, firms may be indifferent between posting a high price (thus selling
infrequently, only to those consumers with the highest reservation prices) and a low price (at
which they will sell more often, because it will fall below the reservation price of more
consumers).[5],[6]
[edit]Matching theory
Main article: Matching theory (macroeconomics)

More recently, especially since the 1990s, many economists have been working on integrating
job search into models of the macroeconomy, using a framework called 'matching theory'
originally developed by Dale Mortensen and extended by Peter A. Diamond and Christopher A.
Pissarides. In this framework, the rate at which new jobs are formed is assumed to depend both
on workers' search decisions, and on firms' decisions to open job vacancies. While some
matching models include a distribution of different wages,[7] others are simplified by ignoring
wage differences. The simplified versions of the model focus instead on the main reduced form
implication of search: namely, the fact that optimal job search takes time, so that workers are
likely to pass through a spell of unemployment before beginning work.[8]

Bias ratio (finance)


From Wikipedia, the free encyclopedia
Jump to: navigation, search

The bias ratio is an indicator used in finance to analyze the returns of investment portfolios, and
in performing due diligence.
The bias ratio is a concrete metric that detects valuation bias or deliberate price manipulation of
portfolio assets by a manager of a hedge fund, mutual fund or similar investment vehicle,
without requiring disclosure (transparency) of the actual holdings. This metric measures
abnormalities in the distribution of returns that indicate subjective pricing. The formulation of
the Bias Ratio stems from an insight into the behavior of asset managers as they address the
expectations of investors with the valuation of assets that determine their performance.
The bias ratio measures how far the returns from an investment portfolio - e.g. one managed by a
hedge fund - are from an unbiased distribution. Thus the bias ratio of a pure equity index will
usually be close to 1. However, if a fund smooths its returns using subjective pricing of illiquid
assets the bias ratio will be higher. As such, it can help identify the presence of illiquid securities
where they are not expected.
The bias ratio was first defined by Adil Abdulali, a risk manager at the investment firm Protégé
Partners. The concepts behind the Bias Ratio were formulated between 2001 and 2003 and
privately used to screen money managers. The first public discussions on the subject took place
in 2004 at New York University's Courant Institute and in 2006 at Columbia University. In 2006,
the Bias Ratio was published in a letter to investors and made available to the public by Riskdata,
a risk management solution provider, which included it in its standard suite of analytics.
The Bias Ratio has since been used by a number of risk management professionals to spot
suspicious funds that subsequently turned out to be frauds. The most spectacular example of this
was reported in the Financial Times on January 22, 2009, under the headline "Bias ratio seen to
unmask Madoff".

Contents
• 1Explanation
• 2Mathematical formulation
• 3Examples and Context
○ 3.1Natural Bias Ratios of asset returns
• 4Contrast to other metrics
○ 4.1Bias Ratios vs. Sharpe Ratios
○ 4.2Serial correlation
• 5Practical thresholds
• 6Uses and limitations
• 7See also
• 8References

[edit]Explanation
Imagine that you are a hedge fund manager who invests in securities that are hard to value, such
as mortgage backed derivatives. Your peer group consists of funds with similar mandates, and all
have track records with high Sharpe ratios, very few down months, and investor demand from
the “one per cent per month” crowd. You are keenly aware that your potential investors look
carefully at the characteristics of returns, including such calculations as the percentage of months
with negative and positive returns.
Furthermore, assume that no pricing service can reliably price your portfolio, and the assets are
often sui generis with no quoted market. In order to price the portfolio for return calculations,
you poll dealers for prices on each security monthly and get results that vary widely on each
asset. The following real world example illustrates this theoretical construct.
[Table 1 not reproduced]
When pricing this portfolio, standard market practice allows a manager to discard outliers and
average the remaining prices. But what constitutes an outlier? Market participants contend that
outliers are difficult to characterize methodically and thus use the heuristic rule “you know it
when you see it.” Identifying an outlier requires considering the particular security’s characteristics and liquidity,
as well as the market environment in which quotes are solicited. After discarding outliers, a
manager sums up the relevant figures and determines the net asset value (“NAV”). Now let’s
consider what happens when this NAV calculation results in a small monthly loss, such as
-0.01%. Lo and behold, just before the CFO publishes the return, an aspiring junior analyst
notices that the pricing process included a dealer quote 50% below all the other prices for that
security. Throwing out that one quote would raise the monthly return to +0.01%.
A manager with high integrity faces two pricing alternatives. Either the manager can close the
books, report the -0.01% return, and ignore new information, ensuring the consistency of the
pricing policy (Option 1) or the manager can accept the improved data, report the +0.01% return,
and document the reasons for discarding the quote (Option 2).
[Figure 1 not reproduced]
The smooth blue histogram represents a manager who employed Option 1, and the kinked red
histogram represents a manager who chose Option 2 in those critical months. Given the
proclivity of Hedge Fund investors for consistent, positive monthly returns, many a smart
businessman might choose Option 2, resulting in more frequent small positive results and far
fewer small negative ones than in Option 1. The “reserve” that allows “false positives” with
regularity is evident in the unusual hump at the -1.5 Standard Deviation point. This psychology
is summed up in a phrase often heard on trading desks on Wall Street, “let us take the pain now!”
The geometry of this behavior in figure 1 is the area in between the blue line and the red line
from -1σ to 0.0, which has been displaced, like toothpaste squeezed from a tube, farther out into
negative territory.
By itself, such a small cover-up might not concern anyone beyond the irritation of misstated return
volatility. However, the empirical evidence that justifies using a “Slippery Slope” argument here
includes almost every mortgage backed fund that has blown up because of valuation problems,
such as the Safe Harbor fund, and equity funds such as the Bayou fund. Both funds ended up
perpetrating outright fraud born from minor cover ups. More generally, financial history has
several well known examples where hiding small losses eventually led to fraud such as the
Sumitomo copper affair as well as the demise of Barings Bank.
[edit]Mathematical formulation
Although the hump at -σ is difficult to model, behavior-induced modifications manifest
themselves in the shape of the return histogram around a small neighborhood of zero. They are
approximated by a straightforward formula.
Let: [0, +σ] = the closed interval from zero to +1 standard deviation of returns (including zero)
Let: [-σ, 0) = the half-open interval from -1 standard deviation of returns to zero (including -σ
and excluding zero)
Let: ri = return in month i, 1 ≤ i ≤ n, and n = number of monthly returns
Then:
BR = |{ ri : ri ∈ [0, +σ] }| / (1 + |{ ri : ri ∈ [-σ, 0) }|)
The Bias Ratio roughly approximates the ratio between the area under the return histogram near
zero in the first quadrant and the similar area in the second quadrant. It holds the following
properties:

a. 0 ≤ BR ≤ n.
b. If ri < 0 for all i, then BR = 0.
c. If ri > σ for all i such that ri ≥ 0, then BR = 0.
d. If the distribution of ri is Normal with mean 0, then BR approaches 1 as n
goes to infinity.

The Bias Ratio defined by a 1σ interval around zero works well to discriminate amongst hedge
funds. Other intervals provide metrics with varying resolutions, but these tend towards 0 as the
interval shrinks.
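A minimal sketch of the count-based definition above in Python (the return series is made up; a real screen would use a fund's actual monthly returns):

    import numpy as np

    def bias_ratio(returns):
        """Bias Ratio of a return series, using a 1-sigma window around zero."""
        r = np.asarray(returns, dtype=float)
        sigma = r.std(ddof=1)
        in_upper = np.sum((r >= 0) & (r <= sigma))    # returns in [0, +sigma]
        in_lower = np.sum((r >= -sigma) & (r < 0))    # returns in [-sigma, 0)
        return in_upper / (1 + in_lower)

    # Made-up monthly returns with a suspicious cluster of small positive months.
    returns = [0.011, 0.009, 0.012, 0.010, 0.008, -0.035,
               0.009, 0.011, 0.013, 0.010]
    print(round(bias_ratio(returns), 2))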
[edit]Examples and Context
[edit]Natural Bias Ratios of asset returns
[Table 2 not reproduced]
The Bias Ratios of market and hedge fund indices gives some insight into the natural shape of
returns near zero. Theoretically one would not expect demand for markets with normally
distributed returns around a zero mean. Such markets have distributions with a Bias Ratio of less
than 1.0. Major market indices support this intuition and have Bias Ratios generally greater than
1.0 over long time periods. The returns of equity and fixed income markets as well as alpha
generating strategies have a natural positive skew that manifests in a smoothed return histogram
as a positive slope near zero. Fixed income strategies with a relatively constant positive return
(“carry”) also exhibit total return series with a naturally positive slope near zero. Cash
investments such as 90-day T-Bills have large Bias Ratios, because they generally do not
experience periodic negative returns. Consequently the Bias Ratio is less reliable for the
theoretical hedge fund that has an unlevered portfolio with a high cash balance.
[edit]Contrast to other metrics
[edit]Bias Ratios vs. Sharpe Ratios
Since the Sharpe Ratio measures risk-adjusted returns, and valuation biases are expected to
understate volatility, one might reasonably expect a relationship between the two. For example,
an unexpectedly high Sharpe Ratio may be a flag for skeptical practitioners to detect smoothing.
The data does not support a strong statistical relationship between a high Bias Ratio and a high
Sharpe Ratio. High Bias Ratios exist only in strategies that have traditionally exhibited high
Sharpe Ratios, but plenty of examples exist of funds in such strategies with high Bias Ratios and
low Sharpe Ratios. The prevalence of low Bias Ratio funds within all strategies further
attenuates any relationship between the two.
[edit]Serial correlation
Hedge fund investors use serial correlation to detect smoothing in hedge fund returns. Market
frictions such as transaction costs and information processing costs that cannot be arbitraged
away lead to serial correlation, as do stale prices for illiquid assets. Managed prices are a
more nefarious cause of serial correlation. Confronted with illiquid, hard-to-price assets,
managers may use some leeway to arrive at the fund’s NAV. When returns are smoothed by
marking securities conservatively in the good months and aggressively in the bad months a
manager adds serial correlation as a side effect. The more liquid the fund’s securities are, the less
leeway the manager has to make up the numbers.
The most common measure of serial correlation is the Ljung-Box Q-Statistic. The p-values of the
Q-statistic establish the significance of the serial correlation. The Bias Ratio compared to the
serial correlation metric gives different results.
[Table 3 not reproduced]
Serial correlations appear in many cases that are likely not the result of willful manipulation but
rather the result of stale prices and illiquid assets. Both Sun Asia and Plank are emerging market
hedge funds for which the author has full transparency and whose NAVs are based on objective
prices. However, both funds show significant serial correlation. The presence of serial
correlation in several market indices such as the JASDAQ and the SENSEX argues further that
serial correlation might be too blunt a tool for uncovering manipulation. However, the two
admitted frauds, namely Bayou, an equity fund, and Safe Harbor, an MBS fund (Table 4 shows
the critical Bias Ratio values for these strategies), are uniquely flagged by the Bias Ratio in this
sample set, with none of the false-positive problems suffered by the serial correlation metric.
The Bias Ratio's unremarkable values for market indices add further credence to its
effectiveness in detecting fraud.
[edit]Practical thresholds
[Figure 2 not reproduced]
Hedge fund strategy indices cannot generate benchmark Bias Ratios because aggregated monthly
returns mask individual manager behavior. All else being equal, managers face the difficult
pricing options outlined in the introductory remarks in non-synchronous periods, and their
choices should average out in aggregate. However, Bias Ratios can be calculated at the manager
level and then aggregated to create useful benchmarks.

[Table 4 not reproduced]
Strategies that employ illiquid assets can have Bias Ratios an order of magnitude
higher than the Bias Ratios of indices representing the underlying asset class. For
example, most equity indices have Bias Ratios falling between 1.0 and 1.5. A sample of equity
hedge funds may have Bias Ratios ranging from 0.3 to 3.0 with an average of 1.29 and standard
deviation of 0.5. On the other hand, the Lehman Aggregate MBS Index had a Bias Ratio of 2.16,
while MBS hedge funds may have Bias Ratios from a respectable 1.7 to an astounding 31.0, with
an average of 7.7 and standard deviation of 7.5.
[edit]Uses and limitations
Ideally, a Hedge Fund investor would examine the price of each individual underlying asset that
comprises a manager's portfolio. With limited transparency, this ideal falls short in practice;
furthermore, even with full transparency, time constraints make such an examination impractical,
so the Bias Ratio is a more efficient way to highlight problems.
differentiate among a universe of funds within a strategy. If a fund has a Bias Ratio above the
median level for the strategy, perhaps a closer look at the execution of its pricing policy is
warranted; whereas, well below the median might warrant only a cursory inspection.
The Bias Ratio is also useful to detect illiquid assets forensically. The table above offers some
useful benchmarks. If a database search for Long/Short Equity managers reveals a fund with a
reasonable history and a Bias Ratio greater than 2.5, detailed diligence will no doubt reveal some
fixed income or highly illiquid equity investments in the portfolio.
The Bias Ratio gives a strong indication of the presence of (a) illiquid assets in a portfolio
combined with (b) a subjective pricing policy. Most of the valuation-related hedge fund debacles
have exhibited high Bias Ratios. However, the converse is not always true. Managers often have
legitimate reasons for subjective pricing, including restricted securities, private investments in
public equities (PIPEs), and deeply distressed securities. Therefore, it would be unwise to use
the Bias Ratio as a stand-alone due diligence tool. In many cases, the author has found that the
subjective policies causing high Bias Ratios also lead to “conservative” pricing that would
receive higher grades on a “prudent man” test than would an unbiased policy. Nevertheless, the
coincidence of historical blow-ups with high Bias Ratios encourages the diligent investor to use
the tool as a warning flag to investigate the implementation of a manager’s pricing policies.
[edit] See also
• Sharpe ratio
• Treynor ratio
• Jensen's alpha
• Sortino ratio
• Beta (finance)
• Modern portfolio theory

[edit] References
• Weinstein, Eric; Abdulali, Adil, “Hedge Fund Transparency: Quantifying Valuation Bias for Illiquid Assets”, Risk, June 2002.
• Abdulali, Adil; Rahl, Leslie; Weinstein, Eric, “Phantom Prices & Liquidity: The Nuisance of Translucence”, AIMA, 2002.
• “Bias Ratio: Detecting Hedge-Fund Return Smoothing”.
• “The Madoff Case: Quantitative Beats Qualitative!”.
• “Risk Indicator Detects When Hedge Funds Trading Illiquid Securities Are Smoothing Returns”.
• “Riskdata Research Shows That 30% of Funds Trading Illiquid Securities Smooth Their Returns”.
• “Bias ratio seen to unmask Madoff”, Financial Times, 22 January 2009.
• Riskdata [1]
• Pension Risk Matters [2]
• Getmansky, Mila; Lo, Andrew; Makarov, Igor, “An Econometric Model of Serial Correlation and Illiquidity in Hedge Fund Returns”, NBER Working Paper No. w9571, March 2003.
• Asness, Clifford S.; Krail, Robert J.; Liew, John M., “Alternative Investments: Do Hedge Funds Hedge?”, Journal of Portfolio Management, Volume 28, Number 1, 2001.
• SEC Litigation Release No. 18950, October 28, 2004.
• SEC Litigation Release No. 19692, May 9, 2006.
• Weisman, Andrew, “Dangerous Attractions: Informationless Investing and Hedge Fund Performance Measurement Bias”, Journal of Portfolio Management, 2002.
• Lo, Andrew W., “Risk Management for Hedge Funds: Introduction and Overview”, White Paper, June 2001.
• Ljung, G. M.; Box, G. E. P., “On a Measure of Lack of Fit in Time Series Models”, Biometrika, 65(2), pp. 297–303, 1978.
• Chan, Nicholas; Getmansky, Mila; Haas, Shane M.; Lo, Andrew, “Systemic Risk and Hedge Funds”, NBER draft, August 1, 2005.
Roll's critique
From Wikipedia, the free encyclopedia

Roll's critique is a famous analysis of the validity of the Capital Asset Pricing Model (CAPM).
It concerns methods to formally test the statement of the CAPM, the equation

$$E(R_i) = R_f + \beta_{im}\left[E(R_m) - R_f\right], \qquad \beta_{im} = \frac{\operatorname{Cov}(R_i, R_m)}{\operatorname{Var}(R_m)}.$$

This equation relates the asset's expected return E(Ri) to its beta βim, the asset's covariance
with the market portfolio return Rm scaled by the market variance; Rf denotes the risk-free rate.
The market return is defined as the wealth-weighted sum of all investment returns in the economy.
Roll's critique makes two statements regarding the market portfolio:
1. Mean-Variance Tautology: Any mean-variance efficient portfolio Rp satisfies the CAPM
equation exactly:

$$E(R_i) = R_f + \beta_{ip}\left[E(R_p) - R_f\right], \qquad \beta_{ip} = \frac{\operatorname{Cov}(R_i, R_p)}{\operatorname{Var}(R_p)}.$$
Mean-variance efficiency of the market portfolio is equivalent to the CAPM equation holding.
This statement is a mathematical fact, requiring no model assumptions.
Given a proxy for the market portfolio, testing the CAPM equation is equivalent to testing mean-
variance efficiency of that portfolio. The CAPM is tautological if the market is assumed to be
mean-variance efficient (see the proof of the mean-variance tautology).
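The tautology is straightforward to verify numerically. The following Python sketch (hypothetical random data; not part of Roll's analysis) forms an in-sample mean-variance efficient portfolio from sample means and covariances and checks that each asset's sample mean excess return equals its beta against that portfolio times the portfolio's mean excess return, exactly up to floating-point error.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: monthly returns for 5 assets over 60 months
T, N = 60, 5
returns = rng.normal(0.01, 0.04, size=(T, N))
rf = 0.002                              # assumed per-period risk-free rate

mu = returns.mean(axis=0)               # sample mean returns
Sigma = np.cov(returns, rowvar=False)   # sample covariance matrix

# In-sample mean-variance efficient (tangency) portfolio
w = np.linalg.solve(Sigma, mu - rf)
w /= w.sum()                            # normalize weights to sum to one

Rp = returns @ w                        # efficient portfolio's return series
betas = np.array([np.cov(returns[:, i], Rp)[0, 1] for i in range(N)]) / Rp.var(ddof=1)

# The CAPM equation holds exactly for this in-sample efficient portfolio
lhs = mu - rf                           # each asset's mean excess return
rhs = betas * (Rp.mean() - rf)          # beta times the portfolio's mean excess return
print(np.allclose(lhs, rhs))            # True, up to floating-point error

This is precisely why, given a proxy for the market, testing the CAPM equation amounts to nothing more than testing whether the proxy happens to be mean-variance efficient in the sample.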
2. The Market Portfolio is Unobservable: The market portfolio in practice would necessarily
include every possible available asset, including real estate, precious metals, stamp
collections, jewelry, and anything with any worth. The returns on all possible investment
opportunities are unobservable.
From statement 1, the validity of the CAPM is equivalent to the market being mean-variance
efficient with respect to all investment opportunities. Without observing all investment
opportunities, it is not possible to test whether this portfolio, or indeed any portfolio, is
mean-variance efficient. Consequently, it is not possible to test the CAPM.
[edit] Relationship to the APT
The mean-variance tautology argument applies to the Arbitrage Pricing Theory and to all asset-
pricing models of the form

$$E(R_i) = \lambda_0 + \beta_{i1}\lambda_1 + \cdots + \beta_{in}\lambda_n,$$

where the λk are premia on unspecified factors and βik is the exposure of asset i to factor k.
If the factors are returns on mean-variance efficient portfolios, the equation holds exactly.
It is always possible to identify in-sample mean-variance efficient portfolios within a dataset of
returns. Consequently, it is also always possible to construct in-sample asset pricing models that
exactly satisfy the above pricing equation. This is an example of data dredging.
[edit] Discussion
Roll's critique has received a large number of citations in the financial economics literature[1].
The majority of these citations refer to the second statement of the critique; few papers address
the first statement. Many researchers and practitioners interpret Roll's critique as stating only
"The Market Portfolio is Unobservable."
